Multipart Body – A gem for working with multipart data

Multipart content is used quite a lot when transferring data around the Internet. A number of projects already generate multipart content, such as email libraries and web frameworks that upload and work with files. When we came to create parts of CloudMailin we couldn’t find a gem that would easily let us encode multipart content the way we wanted to. We could have used a library that already had this ability baked in, but most of them didn’t work with eventmachine, and if they did we couldn’t be sure they would work with any testing tools we created later that didn’t rely on eventmachine. Although loads of libraries implement this code, we couldn’t find anything standalone that we could use across any of the different libraries that can post content.

In order to solve this issue we created our own internal multipart creation code. This weekend we released that code as a gem called multipart_body. The gem is far from perfect: we have a list of things that we don’t have time to add and would love some help with. The code has been useful to us though, so we hope it will be useful to others too.

The gem itself consists of two pieces: the multipart body and the parts it contains. To get started just install the gem:


$ gem install multipart_body

Once the gem is installed you can create a form-data multipart body using this quick hash shorthand.

require 'multipart_body'
multipart = MultipartBody.new(:field1 => 'content', :field2 => 'something else')

To get a little more control you can create the parts yourself and use them to create the body:

# using a hash
part = Part.new(:name => 'name', :body => 'body', :filename => 'f.txt', :content_type => 'text/plain')

# or just with the name, body and an optional filename
part = Part.new('name', 'content', 'file.txt')
multipart = MultipartBody.new([part])

You can also pass a file to the multipart hash to automatically assign the filename:

require 'multipart_body'
multipart = MultipartBody.new(:field1 => 'content', :field2 => File.new('test.txt'))

The resulting output can then be created as follows:

part.to_s #=> The part with headers and content
multipart.to_s #=> The full list of parts joined by boundaries

So the following code example will create the output that follows:

multipart = MultipartBody.new(:test => 'content', :myfile => File.new('test.txt'))

------multipart-boundary-808358
Content-Disposition: form-data; name="myfile"; filename="test.txt"

hello
------multipart-boundary-808358
Content-Disposition: form-data; name="test"

content
------multipart-boundary-808358--
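
To show what is going on in that output, here is a rough standalone sketch of how a form-data body can be assembled. It does not use the gem and is not its implementation; `build_form_data` is a hypothetical helper, and the boundary format is simply copied from the example above.

```ruby
require 'securerandom'

# Hypothetical helper: build a multipart/form-data body from a hash of
# field names to values. A value that responds to #path (such as a File)
# also contributes a filename and is read to get its content.
def build_form_data(fields, boundary = "----multipart-boundary-#{SecureRandom.hex(3)}")
  parts = fields.map do |name, value|
    disposition = %(Content-Disposition: form-data; name="#{name}")
    if value.respond_to?(:path)
      disposition += %(; filename="#{File.basename(value.path)}")
      value = value.read
    end
    # Each part is: boundary line, headers, blank line, content.
    "--#{boundary}\r\n#{disposition}\r\n\r\n#{value}\r\n"
  end
  # The closing boundary carries two extra trailing dashes.
  parts.join + "--#{boundary}--\r\n"
end
```

Whatever sends this body also needs to put the same boundary in its Content-Type header (multipart/form-data; boundary=...), otherwise the receiver cannot split the parts apart.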

Like I said before the gem is far from perfect. At the moment it doesn’t have any documentation and it is missing quite a few features. By default it assumes you are creating form-data content and encodings are completely missing at the moment.

Hopefully though with a little bit of help it can provide a great starting block for anyone wishing to implement multipart bodies so that each library doesn’t have to re-invent this. If anyone has any time I’d love to see patches to bring this up to something much more useful.

Receiving Incoming Email in Rails 3 – choosing the right approach

When it comes to sending mail in Rails there are plenty of solutions and the best practices are fairly widely known. There are also a number of third party systems that can help you out. Receiving email in Rails however is a slightly less documented prospect.

Here I will cover some of the basic options for receiving email, along with some of their advantages and disadvantages. In later articles I will cover how to set up some of the solutions, including the server. In each case I will also give a small example showing how you can find a user from the message’s to field and update an attribute of that user from the message body. I don’t want to get too far into the setup specifics of each approach at this point. Instead I want to point out the alternatives and how you can make use of each. From what I can tell there are four main alternatives:

  • The ‘Official Way’ – using a mail server and script/rails runner
  • Using a mail server and cURL
  • Polling using IMAP/POP3
  • Use a Service Provider

It should be noted that I am the creator of one of the service providers (CloudMailin). However, I appreciate that not everyone wants to use an external service and that people have different needs, so I am trying to make this article as objective as possible. Having said that, if you do have comments please feel free to contact me.

Receiving Email the ‘Official Way’

The Rails documentation is pretty sparse on incoming email, but the guides do contain a small snippet. Firstly you need to create a receive method in your ActionMailer class. Something like the following for our example:

class MyMailer < ActionMailer::Base
  def receive(message)
    for recipient in message.to
      User.find_by_email(recipient).update_attribute(:bio, message.body)
    end
  end
end

As you can see the ActionMailer class is quite simple. All that is left is to wire up your server so that any incoming email is sent directly to ActionMailer. This can be done by making sure that your mail server executes the following command:

app_dir/script/rails runner 'MyMailer.receive(STDIN.read)'

This approach has some serious disadvantages though, especially when it comes to scalability. Firstly, every time you receive an email you are spawning a new instance of your environment with script/rails. This is a nightmare in itself. Along with this you also need a copy of your app on the same server as the mail server. So you either have to add the mail server to your app server, or you need another server and a copy of your app running for the mail. You also have the hassle of setting up a dedicated mail server just for the purpose of receiving these incoming emails.

The same approach using cURL

In order to improve this method it is possible to remove the call to script/rails runner and replace it with a call to the web app via HTTP using cURL. Using this method when a new email arrives the following is called:

ruby receiver.rb

Then we create our receiver something like the following:

# note the backticks here execute the command
`curl -d "message=#{STDIN.read}" http://localhost/incoming_messages`

Update: In the comments it turns out that some people have reported problems with this method. You may need to escape the content so that your app receives the message correctly. The following method should help:

require 'cgi'
# note the backticks here execute the command
`curl -d "message=#{CGI.escape(STDIN.read)}" http://localhost/incoming_messages`

You could of course remove Ruby from the mix here entirely, but using a Ruby script allows you to perform any processing you want in a more complex example. cURL’s -d option sends the request as application/x-www-form-urlencoded, but you could also send the data as multipart/form-data if you wish.
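
Interpolating the raw message into a shell command also leaves it open to shell quoting problems, so one alternative is to skip cURL and post from Ruby directly. This is a minimal sketch using Ruby’s standard Net::HTTP library rather than anything from the original setup; `build_incoming_request` is a hypothetical helper name, and the URL is the same placeholder used above.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build (but do not send) a form-encoded POST that
# carries the raw email in a "message" parameter.
def build_incoming_request(raw_message, url = 'http://localhost/incoming_messages')
  uri = URI.parse(url)
  request = Net::HTTP::Post.new(uri.request_uri)
  # set_form_data escapes the value and sets the
  # application/x-www-form-urlencoded content type for us.
  request.set_form_data('message' => raw_message)
  [uri, request]
end

# In receiver.rb the request could then be sent with something like:
#   uri, request = build_incoming_request(STDIN.read)
#   Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```

Because the message never passes through a shell, quotes, backticks and ampersands in the email arrive in params[:message] unchanged.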

You can then simply create a normal controller and use the create method to receive your email as an HTTP POST. Something like the following:

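def create
  # parse the raw email posted in the message parameter
  message = Mail.new(params[:message])
  # update the bio of each user the message was addressed to
  for recipient in message.to
    User.find_by_email(recipient).update_attribute(:bio, message.body)
  end
end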

This method has the advantage of being a little more scalable, as nothing really changes in terms of your app. You simply receive the message over HTTP like any other form post or file upload. You may want to move the processing out to a background job though if you are doing anything complex with the message. You will, however, still need to install and set up your own mail server.

Using a Third Party

In the last example we received email via an HTTP POST as a webhook. There are a couple of options for taking the setup and monitoring stress out of receiving mail in this manner without having to install and configure a mail server. Two of the options here are CloudMailin and smtp2web.

CloudMailin is currently in free beta and allows you to register to receive email via HTTP Post. The system was designed to be scalable and provide some additional features like delivery logs to make sure your app is receiving the emails. That’s enough about that one as I don’t want to be biased.

smtp2web is a Google App Engine instance that can be used to achieve a similar goal. It makes use of App Engine’s ability to receive email and then forwards the messages on to your web app.

Both of these options are designed to operate in ‘the cloud’ and don’t require you to own or set up a mail server to do the work. Again, you will probably want to move processing over to a background worker if you have anything complex to do, so that the processing doesn’t take up resources that should be serving your app to your customers.

Polling Using IMAP or POP3

Finally this solution makes sense when you need to collect messages from an existing mailbox. You don’t have to own your own mail server but you will need to be able to run cron or a daemon to collect mail at regular intervals.

Although you could roll your own collector there are a couple already out there. Take a look at mailman for example. This approach can either rely on direct access to your app or can again POST via HTTP.

I will also look to write a separate post on MailMan as I think the power offered by MailMan is worth a blog post in itself. Although there will be a delay with any polling, as you can only poll every few minutes, in some situations using an existing mailbox is the only option.
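
If you did want to roll your own collector, the core loop is small. The sketch below is an assumption about how such a collector might look, not how mailman works: `collect_unseen` is a hypothetical helper that expects an already authenticated connection that behaves like Ruby’s Net::IMAP (select, search, fetch and store) and hands each raw message to a block. The host and credentials in the usage comment are placeholders.

```ruby
# Hypothetical helper: collect unread messages from a mailbox and pass each
# raw message to the given block. `imap` is assumed to be an authenticated
# connection with the same interface as Net::IMAP.
def collect_unseen(imap, mailbox = 'INBOX')
  imap.select(mailbox)
  imap.search(['UNSEEN']).map do |id|
    # RFC822 asks the server for the complete raw message.
    raw = imap.fetch(id, 'RFC822').first.attr['RFC822']
    # Flag the message as seen so the next poll skips it.
    imap.store(id, '+FLAGS', [:Seen])
    yield raw if block_given?
    raw
  end
end

# Run from cron or a daemon loop, for example:
#   require 'net/imap'
#   imap = Net::IMAP.new('imap.example.com', :port => 993, :ssl => true)
#   imap.login('user', 'password')
#   collect_unseen(imap) { |raw| MyMailer.receive(raw) }
#   imap.logout
```

The block is where you would either call into your app directly or POST the raw message over HTTP, as in the earlier sections.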

Although this was brief, it should have given a quick introduction into some of the approaches available (I’m sure there are more too). I also plan to write a number of follow up articles showing how to implement options described here. If you have any advice, an alternative option or even an approach you would prefer to see covered first then please jump in and comment. Again if you have any comments on CloudMailin please let me know on here, twitter or via email at blog-comments [you know what goes here] cloudmailin.com

Thoughts on Google Priority Inbox, The How and the Why

Google Priority Inbox launched this week and there has been a huge fuss about it on sites such as TechCrunch and Wired, even the BBC is getting in on the action. Most people are calling what Google is doing a miracle but that’s not necessarily the case.

A lot of what Google is doing has been around for a while. In fact, during the final year of my undergraduate computer science course I created a system myself to help deal with email overload. The key to the system was determining which emails were important to the user and which were not. The basics of how a system such as Priority Inbox works are fairly simple. That’s not to say that what Google has done isn’t impressive, but more on that later. I figured I would share some of the knowledge that I picked up while I created my project all those years ago.

Determining the importance of an Email

So how do we determine how important an email is? For a while now spam has been a real issue for those of us that receive email. For a long time people spent huge amounts of time determining the characteristics of an email that meant it was a spam message. Things like coloring the message red were amongst the first key indicators. The thing was, the spammers soon caught on to this and a new solution needed to be found. In 2002 Paul Graham (of Hacker News and Y Combinator fame) published an article called A Plan for Spam, followed in 2003 by Better Bayesian Filtering.

The articles outlined a method that could be trained to determine which messages were spam and which were important. Those articles were hugely interesting for me, not only because they were my first introduction to Bayesian algorithms but also because of the potential to allow users to give feedback to better understand which messages were spam.

Without going into too much detail, the basic premise of these articles is that you take each word within the email and look at it. If the email is a good email then you put a good mark next to that word. If the email is bad you put a bad mark next to that word. You can then see how likely it is that a given word is contained in a good email or a bad email. When a new email arrives you look at the words of that email. If the email contains words that are in more bad messages than good ones then the chances are the new email is a spam one.

This concept led to a huge leap in spam filters, but it’s easy to see that you can take this concept further. Instead of treating words as good or bad you can treat the words as important or unimportant. That’s exactly how that piece of my final year project worked. But why stop there? What if instead of counting good and bad words you created a bunch of different categories? You could automatically filter emails into any category you choose.
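
To make the idea concrete, here is a minimal sketch of word counting with categories. It is a simplified naive Bayes classifier with add-one smoothing, not Paul Graham’s exact formula and not a reconstruction of any real filter; the class name `WordClassifier` and its methods are made up for illustration.

```ruby
# Hypothetical example class: counts how often each word appears in each
# category, then picks the category whose words best match a new message.
class WordClassifier
  def initialize
    @word_counts = Hash.new { |hash, key| hash[key] = Hash.new(0) } # category => word => count
    @message_counts = Hash.new(0)                                   # category => messages seen
  end

  # Put a "mark" for this category next to every word in the message.
  def train(category, text)
    @message_counts[category] += 1
    tokenize(text).each { |word| @word_counts[category][word] += 1 }
  end

  # Return the category with the highest score for this message.
  def classify(text)
    words = tokenize(text)
    @message_counts.keys.max_by { |category| score(category, words) }
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9']+/)
  end

  def vocabulary_size
    @word_counts.values.flat_map(&:keys).uniq.size
  end

  # Log probability of the category plus the log probability of each word
  # given the category. Adding one to every count stops an unseen word
  # from zeroing out the whole score.
  def score(category, words)
    total_words = @word_counts[category].values.sum
    vocabulary = vocabulary_size
    prior = Math.log(@message_counts[category].to_f / @message_counts.values.sum)
    words.sum(prior) do |word|
      Math.log((@word_counts[category][word] + 1.0) / (total_words + vocabulary))
    end
  end
end

classifier = WordClassifier.new
classifier.train(:important, 'meeting with client tomorrow about contract')
classifier.train(:important, 'invoice attached please review contract')
classifier.train(:unimportant, 'weekly newsletter deals and offers')
classifier.train(:unimportant, 'special offers newsletter unsubscribe')
classifier.classify('contract review meeting') # => :important
```

Nothing in the class is specific to two categories, so training it with :receipts, :travel or anything else gives the automatic filing described above.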

Of course this all gets far cleverer when you look at how people place things as a whole and how they classify them individually. These algorithms have also gone much, much further than I have described, often using ontologies and other techniques to improve things.

Why is Google Priority Inbox so great then and What else can I do?

Well this is the killer. Spam filters existed long before Google launched Gmail, but somehow Google just managed to get it right. Switching to Google Apps I personally saw my 100 spam messages per day (that’s 100 getting through from 4,000) cut down to maybe one or two a week. Google has access to a lot of data, and more specifically a lot of email accounts. As more and more people use this sort of system the data can be refined. It’s for this reason that I think Google’s Priority Inbox could be a real killer app and a real asset in our current overloaded times. However, it’s not the only option.

In case you aren’t aware though there are a couple of other things that you can do to help take control of your inbox. The first is to take a look at OtherInbox. These guys have been doing a lot of what Priority Inbox is offering for a long time now and by doing it I mean doing it well. The OtherInbox system can even do things like pull out delivery company notices to tell you when your parcel is going to arrive. It’s basically Google’s Priority Inbox on steroids.

If you’re a dev then you could also look at the SpamBayes project. It’s open source and has a load of different options. The other key is training. Of course there’s a lot of email coming from external advertising sources, but have you ever considered that you might be part of someone else’s problem? A lot of the research conducted at the university I was at showed that the sender felt that an email was far more important than the recipient felt it was. It also showed that people are often too cavalier in their attitudes to email, including too many other people in their emails or sending email when it really wasn’t necessary. There’s a bunch of research on the subject of information overload on Tom Jackson’s website that’s really worth taking a look at if you’re serious about reducing the problem for your business.

So yes Google Priority inbox is cool, but take a look at the other options. Above all think about how you are contributing to the solution rather than the problem of Information Overload.
