Why some messages are more interesting than others

“The sun rose today” = not news
“The sun didn’t rise today” – interesting!

Why?

Claude Shannon, in his 1948 paper “A Mathematical Theory of Communication”, wrote that information is related to surprise.
Specifically, a message informing us of an event that has probability p of happening conveys

-log2(p) bits of information.

For example, if an event has a 50% probability of happening, then it carries -log2(0.5) = 1 bit of information.

If the event has only a 10% probability of happening, then it carries -log2(0.1) ≈ 3.3 bits of information.
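
The two examples above can be checked with a few lines of Python. This is only a minimal sketch: the function name information_bits is made up for illustration and does not come from Shannon's paper.

```python
import math


def information_bits(p: float) -> float:
    """Information, in bits, conveyed by an event of probability p.

    Hypothetical helper for illustration: it simply evaluates -log2(p).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return -math.log2(p)


print(information_bits(0.5))            # 1.0
print(round(information_bits(0.1), 1))  # 3.3
```

The rarer the event, the larger the value: halving the probability always adds exactly one bit.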

Shannon’s diagram of a communication system – from Wikipedia

When we pick up a newspaper, we are looking for maximum information, so more “surprising” events make for better news.

See also what the physicist John Wheeler wrote in 1990:

“… every it — every particle, every field of force — derives its function, its meaning, its very existence entirely from yes-or-no questions, binary choices, _bits_.”


Google Drive Site Publishing

Last November Google announced that Google Drive can be used to publish static web pages. It supports HTML and CSS.

How is this different from Google Sites, the service that lets you create your own micro-site?

Sites seems to be more user-friendly, with a GUI-based way to edit the HTML pages and ready-made widgets that help create common page elements, such as maps.
GDrive, on the other hand, lets you write your HTML directly, and for pages that already exist it could be the better option (just upload them to GDrive).

How-to

1. Create a folder in GDrive where you will keep all your web pages. The folder must be public, so right-click the new folder, select “Share” and change the permissions from Private to “Public on the web”.

2. Now you can upload all your files (HTML pages, CSS files, images and so on). This is a bit tricky: if you create them directly in GDrive they will not appear, because they need to be of the correct type. The easiest way is to create them offline as plain text with the correct extension (e.g. .html) and then upload them. Note that if you later try to edit the files as Google documents, they will be changed from .html files to .html.gdoc files and will no longer appear. You could use the GDrive local app on your computer or a GDrive editor like Neutron Drive.

3. Now, to access the page you need to find out the GDrive Folder ID. When you open the folder, look at the URL. It is something like:

https://drive.google.com/?tab=co&authuser=0#folders/<long list of chars and numbers>/

The unfriendly-looking part after “#folders/” is the Folder ID. Copy it and append it to this string:

https://googledrive.com/host/<add here the Folder ID>/

For example, mine is visible at:

https://googledrive.com/host/0B3PCsVYMU3-Eek9EWUg5NWlPdTg/ 

4 (extra): I know, the Folder ID is not very memorable. You could use a URL shortener service to beautify it, such as http://gdriv.es/

My (better) URL for the test page would then be http://gdriv.es/offmu
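
Step 3 can also be done with a small script. The sketch below assumes the folder URL has the “#folders/<Folder ID>/” shape shown above; the function names and the sample folder URL (the URL pattern from step 3 combined with the Folder ID of my test page) are made up for illustration.

```python
from urllib.parse import urlsplit

# Prefix taken from step 3 of this how-to.
HOST_PREFIX = "https://googledrive.com/host/"


def folder_id_from_url(folder_url: str) -> str:
    """Return the part after '#folders/' in a GDrive folder URL (hypothetical helper)."""
    fragment = urlsplit(folder_url).fragment  # e.g. "folders/<Folder ID>/"
    marker = "folders/"
    if not fragment.startswith(marker):
        raise ValueError("the URL has no '#folders/' part")
    folder_id = fragment[len(marker):].strip("/")
    if not folder_id:
        raise ValueError("the Folder ID is empty")
    return folder_id


def hosted_url(folder_id: str) -> str:
    """Append the Folder ID to the hosting prefix (hypothetical helper)."""
    return f"{HOST_PREFIX}{folder_id}/"


# Assumed sample URL, built from the pattern in step 3 and my Folder ID.
sample = "https://drive.google.com/?tab=co&authuser=0#folders/0B3PCsVYMU3-Eek9EWUg5NWlPdTg/"
print(hosted_url(folder_id_from_url(sample)))
```

Run against the sample URL, this prints the address of my test page from step 3.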

Enjoy!

[Link] Top 10 things that JavaScript got wrong

Jeffrey Way, a JavaScript developer, compiled a list of the Top 10 Things that JavaScript Got Wrong, just for fun.
The list is quite enjoyable, and a couple of its points ring very true.

Here’s Way’s list:

  1. The Name. JavaScript is NOT Java
  2. Null is an Object?
  3. NaN !== NaN
  4. Global Variables
  5. User-Agent Strings Report Mozilla. Ever Wonder Why?
  6. Scope Inconsistencies
  7. The Use of Bitwise Operators
  8. Too Many Falsy/Bottom Values
  9. It Can’t Do Arithmetic (he notes that he’s 99% teasing with this one)
  10. Code Styling Isn’t your Choice!

Number 4, JavaScript (officially standardized as ECMAScript) almost requiring global variables, is the one that strikes me most.
It is followed by #10: JavaScript inserting semicolons wherever it thinks they are necessary!

[Link] Great overview of text extraction methods

A great overview from computer science student Tomaž Kovačič about extracting article text from HTML documents (also known as web scraping or text mining).

He goes through several modern methods and tools.

In the world of web scraping, text mining and article reading utilities (readability bookmarklet) there is an ever growing demand for utilities that are capable of distinguishing parts of a HTML document which represent an article apart from other common website building blocks like menus, headers, footers, ads etc.
[…] In the following chapters I’ll try to review some article text extraction methods that are applicable to today’s websites. They mostly leverage on machine learning, statistics and a wide range of heuristics.

Google releases Java WindowBuilder Pro to everyone

Google announced that they are releasing WindowBuilder (a tool for creating GUIs for Java applications) as an open source project to the Eclipse Foundation.

The tool is very complete (maybe a bit over-engineered) and includes powerful functionality for creating user interfaces (in a drag & drop editor) based on Swing, SWT (Standard Widget Toolkit) or GWT (Google Web Toolkit).
It is also bi-directional: it can read hand-written Java GUI code and reverse-engineer it into GUI components.

I tried it with my test application (a BMI calculator) and was able to create a Java GUI in one or two hours.

More information and the download are available on the Eclipse site and on Google Code.

New graph tool from Google Labs: Books N-grams Viewer

A fascinating tool launched recently by Google Labs: the Books N-grams Viewer.

It graphs how frequently the terms you enter (words or phrases) occur in the more than 5 million books (starting from 1500?) that Google has scanned.
The tool works best in English, but other languages such as Chinese, French, German, Russian and Spanish (unfortunately not Italian) are available.
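
To get a feel for what “frequency of occurrence” means here, below is a toy sketch in Python. It is not how Google computes its graphs: the two-sentence “corpus”, the years and the function names are all invented for illustration.

```python
from collections import Counter
from typing import List


def ngrams(text: str, n: int) -> List[str]:
    """Split text on whitespace and return all runs of n consecutive words."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def relative_frequency(term: str, text: str) -> float:
    """Share of the text's n-grams that equal the term (n = words in the term)."""
    n = len(term.split())
    grams = ngrams(text, n)
    if not grams:
        return 0.0
    return Counter(grams)[term.lower()] / len(grams)


# Invented toy data: one "book" per year.
toy_corpus = {
    1900: "the sun rose and the sun set",
    2000: "the moon rose",
}

for year, text in sorted(toy_corpus.items()):
    print(year, relative_frequency("the sun", text))
```

Plotting one such value per year for each term gives a curve of the kind the viewer draws.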

You could play with it for hours …

ReadWriteWeb has some nice examples.

N-grams for religions
Major religions of the world.