September 3, 2010
Google Priority Inbox launched this week and there has been a huge fuss about it on sites such as TechCrunch and Wired. Even the BBC is getting in on the action. Most people are calling what Google is doing a miracle, but that’s not necessarily the case.
A lot of the stuff Google is doing has been around for a while. In fact, during the final year of my undergraduate computer science course I created a system myself to help deal with email overload. The key to the system was determining which emails were important to the user and which were not. The basics of how a system such as Priority Inbox works are fairly simple. That’s not to say that what Google has done isn’t impressive, but more on that later. I figured I would share some of the knowledge that I learnt while I created my project all those years ago.
Determining the importance of an Email
So how do we determine how important an email is? For a while now spam has been a real issue for those of us that receive email. For a long time people spent huge amounts of time working out by hand which characteristics of an email meant it was a spam message. Things like coloring the message red were amongst the first key indicators. The thing was, the spammers soon caught on to this and a new solution needed to be found. In 2002 Paul Graham (of Hacker News and Y Combinator fame) released an article called A Plan for Spam, followed in 2003 by Better Bayesian Filtering.
The articles outlined a method that could be trained to determine which messages were spam and which were important. Those articles were hugely interesting for me, not only because they were my first introduction to Bayesian algorithms but also because of the potential to let users give feedback so the system could better understand which messages were spam.
Without going into too much detail, the basic premise of these articles is that you take each word within the email and look at it. If the email is a good email then you put a good mark next to that word. If the email is bad you put a bad mark next to that word. You can then see how likely it is that a given word is contained in a good email or a bad email. When a new email arrives you look at the words of that email. If the email contains words that are in more bad messages than good ones then the chances are the new email is a spam one.
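To make the good mark and bad mark idea a bit more concrete, here is a minimal sketch in Python. It is not the exact method from Paul Graham's articles and it is not my project code. The class name, the tokenizer, the add-one smoothing and the way the per-word scores are combined are all assumptions I have made to keep the example short.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class WordMarkFilter:
    """Hypothetical filter that keeps 'good' and 'bad' marks for every word."""

    def __init__(self):
        self.marks = {"good": Counter(), "bad": Counter()}
        self.emails = {"good": 0, "bad": 0}

    def train(self, text, label):
        # Put one mark next to each distinct word of the email.
        self.emails[label] += 1
        self.marks[label].update(set(tokenize(text)))

    def word_badness(self, word):
        # Share of bad emails containing the word versus share of good
        # emails containing it. The +1 / +2 smoothing is an assumption that
        # keeps the result strictly between 0 and 1.
        bad = (self.marks["bad"][word] + 1) / (self.emails["bad"] + 2)
        good = (self.marks["good"][word] + 1) / (self.emails["good"] + 2)
        return bad / (bad + good)

    def spam_probability(self, text):
        # Add up the evidence from each known word in log-odds form.
        log_odds = 0.0
        for word in set(tokenize(text)):
            if word not in self.marks["bad"] and word not in self.marks["good"]:
                continue  # never seen this word, so it tells us nothing
            p = self.word_badness(word)
            log_odds += math.log(p) - math.log(1 - p)
        # Clamp so that math.exp cannot overflow on very long emails.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1 / (1 + math.exp(-log_odds))


spam_filter = WordMarkFilter()
spam_filter.train("Buy cheap pills now", "bad")
spam_filter.train("Cheap pills click now", "bad")
spam_filter.train("Meeting agenda for Monday", "good")
spam_filter.train("Lunch on Monday with the project team", "good")

print(spam_filter.spam_probability("cheap pills"))
print(spam_filter.spam_probability("project meeting on Monday"))
```

With this tiny training set the first message should score well above 0.5 and the second well below it. A real filter needs far more training data and a smarter tokenizer, but the mechanics are the same: count marks per word, then combine them for each new email.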
This concept led to a huge leap in spam filters, but it’s easy to see that you can take this concept further. Instead of treating words as good or bad you can treat the words as important or unimportant. That’s exactly how that piece of my final year project worked. But why stop there? What if instead of counting good and bad words you created a bunch of different categories? You could automatically filter emails into any category you choose.
Of course this all gets far cleverer when you look at how people as a whole classify things as well as how each person classifies them individually. These algorithms have also gone much further than I have described, often using ontologies and other techniques to improve things.
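Here is a rough sketch of what going from two buckets to any number of categories might look like. It is a plain naive Bayes classifier with add-one smoothing, written in Python. The class name, the category names and the training sentences are all made up for illustration, and this is an assumption about one reasonable way to do it rather than a description of how my project or Priority Inbox actually works.

```python
import math
import re
from collections import Counter, defaultdict


class CategoryClassifier:
    """Hypothetical multi-category word counter (naive Bayes, add-one smoothing)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.email_counts = Counter()            # category -> emails seen
        self.vocabulary = set()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, category):
        # Record every word of the email against the chosen category.
        words = self.tokenize(text)
        self.word_counts[category].update(words)
        self.email_counts[category] += 1
        self.vocabulary.update(words)

    def scores(self, text):
        # Log score per category: prior for the category plus the
        # smoothed log likelihood of each word in the email.
        total_emails = sum(self.email_counts.values())
        vocab_size = len(self.vocabulary)
        words = self.tokenize(text)
        result = {}
        for category, n_emails in self.email_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            score = math.log(n_emails / total_emails)
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            result[category] = score
        return result

    def classify(self, text):
        scores = self.scores(text)
        if not scores:
            raise ValueError("train the classifier before classifying")
        return max(scores, key=scores.get)


classifier = CategoryClassifier()
classifier.train("Contract deadline is tomorrow please review", "important")
classifier.train("Your flight booking is confirmed", "important")
classifier.train("Weekly digest of top stories", "newsletter")
classifier.train("This week's newsletter and top offers", "newsletter")
classifier.train("Your order receipt and invoice", "receipts")
classifier.train("Invoice for your recent payment", "receipts")

print(classifier.classify("Please review the contract before the deadline"))
print(classifier.classify("Invoice for your order"))
```

The only real change from the spam case is that the marks are stored per category instead of per good/bad bucket, and the winner is whichever category scores highest. Adding a new category is just a matter of training with a new label.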
Why is Google Priority Inbox so great then, and what else can I do?
Well, this is the killer. Spam filters existed long before Google launched Gmail, but somehow Google just managed to get it right. Switching to Google Apps I personally saw my 100 spam messages per day (that’s 100 getting through from 4,000) cut down to maybe one or two a week. Google has access to a lot of data, and more specifically a lot of email accounts. As more and more people use this sort of system the data can be refined. It’s for this reason that I think Google’s Priority Inbox could be a real killer app and a real asset in our current overloaded times. However, it’s not the only option.
In case you aren’t aware, though, there are a couple of other things that you can do to help take control of your inbox. The first is to take a look at OtherInbox. These guys have been doing a lot of what Priority Inbox is offering for a long time now, and by doing it I mean doing it well. The OtherInbox system can even do things like pull out delivery company notices to tell you when your parcel is going to arrive. It’s basically Google’s Priority Inbox on steroids.
If you’re a dev then you could also look at the SpamBayes project. It’s open source and has a load of different options. The other key is training. Of course there’s a lot of email coming from external advertising sources, but have you ever considered that you might be part of someone else’s problem? A lot of the research conducted at the university I was at showed that the sender felt that an email was far more important than the recipient felt it was. It also showed that people are often too cavalier in their attitudes to email, including too many other people in their emails or sending email when it really wasn’t necessary. There’s a bunch of research on the subject of information overload on Tom Jackson’s website that’s really worth taking a look at if you’re serious about reducing the problem for your business.
So yes, Google Priority Inbox is cool, but take a look at the other options. Above all, think about how you are contributing to the solution rather than the problem of information overload.