Multipart Body – A gem for working with multipart data

Multipart bodies are used a great deal when transferring data around the Internet. There are a number of projects that generate multipart content, such as email libraries and web frameworks that upload and work with files. When we came to create parts of CloudMailin, however, we couldn't find a gem that would easily let us encode multipart content the way we wanted. We could have used a library that already had this ability baked in, but most of them didn't work with EventMachine, and if they did we couldn't be sure they would work with any testing tools we created later that didn't rely on EventMachine. Although plenty of libraries implement this code internally, we couldn't find anything standalone that we could reuse across any of the different libraries that post content.

To solve this we created our own internal multipart creation code, and this weekend we released it as a gem called multipart_body. The gem is far from perfect and there is a list of things we haven't had time to add (help is very welcome), but the code has been useful to us so we hope it will be useful to others too.

The gem itself consists of two pieces: the MultipartBody and its Parts. To get started just install the gem:


$ gem install multipart_body

Once the gem is installed you can create a form-data multipart body using this quick hash shorthand.

require 'multipart_body'
multipart = MultipartBody.new(:field1 => 'content', :field2 => 'something else')

To get a little more control you can create the parts yourself and use them to create the body:

# using a hash
part = Part.new(:name => 'name', :body => 'body', :filename => 'f.txt', :content_type => 'text/plain')

# or just with the name, body and an optional filename
part = Part.new('name', 'content', 'file.txt')
multipart = MultipartBody.new([part])

You can also pass a file to the multipart hash to automatically assign the filename:

require 'multipart_body'
multipart = MultipartBody.new(:field1 => 'content', :field2 => File.new('test.txt'))

The resulting output can then be created as follows:

part.to_s #=> The part with headers and content
multipart.to_s #=> The full list of parts joined by boundaries

So the following code example will create the output that follows:

multipart = MultipartBody.new(:test => 'content', :myfile => File.new('test.txt'))
multipart.to_s #=> produces the following (assuming test.txt contains "hello"):
------multipart-boundary-808358
Content-Disposition: form-data; name="myfile"; filename="test.txt"

hello
------multipart-boundary-808358
Content-Disposition: form-data; name="test"

content
------multipart-boundary-808358--
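
If you then want to send that body over HTTP yourself, something like the following rough sketch should work with Net::HTTP. This isn't taken from the gem's documentation: the example.com URL is just a placeholder and it assumes MultipartBody exposes the boundary it used via a boundary reader.

require 'net/http'
require 'uri'

# Reuse the multipart body built above and post it to a placeholder URL
uri = URI.parse('http://example.com/upload')
request = Net::HTTP::Post.new(uri.path)
request.body = multipart.to_s
# The boundary in the Content-Type header must match the one used in the body;
# this assumes the gem exposes it via multipart.boundary
request['Content-Type'] = "multipart/form-data; boundary=#{multipart.boundary}"

response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }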

As I said before, the gem is far from perfect. At the moment it has no documentation and is missing quite a few features; by default it assumes you are creating form-data content, and encodings are completely missing.

Hopefully, with a little help, it can provide a great starting point for anyone wishing to implement multipart bodies so that each library doesn't have to reinvent this. If anyone has the time I'd love to see patches that bring it up to something much more useful.

Receiving Incoming Email in Rails 3 – choosing the right approach

When it comes to sending mail in Rails there are plenty of solutions and the best practices are fairly widely known. There are also a number of third-party systems that can help you out. Receiving email in Rails, however, is a rather less documented prospect.

Here I will cover some of the basic options for receiving email, along with their advantages and disadvantages. In later articles I will cover how to set up some of these solutions, including the server. In each case I will also give a small example showing how you can find a user from the message's to field and update them from the message body. I don't want to get too far into the setup specifics of each approach at this point; instead I want to point out the alternatives and how you can make use of each. From what I can tell there are four main alternatives:

  • The ‘Official Way’ – using a mail server and script/rails runner
  • Using a mail server and cURL
  • Polling using IMAP/POP3
  • Use a Service Provider

It should be noted that I am the creator of one of the service providers (CloudMailin). However, I appreciate that not everyone wants to use an external service, and that different people have different needs, so I am trying to make this article as objective as possible. Having said that, if you do have comments please feel free to contact me.

Receiving Email the ‘Official Way’

The Rails documentation is pretty sparse on incoming email, but the guides do contain a small snippet. First you need to create a receive method in your ActionMailer, something like the following for our example:

class MyMailer < ActionMailer::Base
  def receive(message)
    for recipient in message.to
      User.find_by_email(recipient).update_attribute(:bio, message.body)
    end
  end
end

As you can see the ActionMailer class is quite simple. All that is left is to wire up your mail server so that any incoming email is piped directly to ActionMailer. This can be done by making sure that your mail server executes the following command for each message:

app_dir/script/rails runner 'MyMailer.receive(STDIN.read)'

This approach has some serious disadvantages though, especially when it comes to scalability. Firstly, every time you receive an email you spawn a new instance of your Rails environment with script/rails runner, which is a nightmare in itself. You also need a copy of your app on the same server as the mail server, so you either have to add the mail server to your app server or run another server with a copy of your app just for mail. On top of that you have the hassle of setting up a dedicated mail server purely to receive these incoming emails.

The same approach using cURL

To improve on this it is possible to replace the call to script/rails runner with a call to the web app over HTTP using cURL. With this method, when a new email arrives the mail server calls the following:

ruby receiver.rb

Then we create our receiver something like the following:

# note the backticks here execute the command
`curl -d "message=#{STDIN.read}" http://localhost/incoming_messages`

Update: In the comments some people have reported problems with this method. You may need to escape the content so that your app receives the message correctly; the following should help:

require 'cgi'
# note the backticks here execute the command
`curl -d "message=#{CGI.escape(STDIN.read)}" http://localhost/incoming_messages`

You could of course remove Ruby from the mix entirely, but using a Ruby script allows you to perform additional processing in a more complex example. curl -d sends the request as application/x-www-form-urlencoded, but you could also send the data as multipart/form-data if you wish.
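
If you would rather not shell out to curl at all, a roughly equivalent receiver can be written with Ruby's standard Net::HTTP library; this is just a sketch assuming the same /incoming_messages endpoint as above:

# receiver.rb - an alternative to the curl version above
require 'net/http'
require 'uri'

# post_form takes care of the application/x-www-form-urlencoded encoding
Net::HTTP.post_form(URI.parse('http://localhost/incoming_messages'),
                    'message' => STDIN.read)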

You can then simply create a normal controller and use its create action to receive the email as an HTTP POST, something like the following:

def create
  message = Mail.new(params[:message])
  for recipient in message.to
    User.find_by_email(recipient).update_attribute(:bio, message.body)
  end
end
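
For completeness, the app also needs a route pointing at that create action. With Rails 3 routing something like the following would do; the incoming_messages name is an assumption chosen to match the URL used in the cURL receiver:

# in config/routes.rb
resources :incoming_messages, :only => [:create]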

This method has the advantage of being a little more scalable, as nothing really changes in terms of your app: you simply receive the message over HTTP like any other form post or file upload. You may want to move the processing out to a background job though if you are doing anything complex with the message. You will still, however, need to install and set up your own mail server.

Using a Third Party

In the last example we received email via an HTTP POST, as a webhook. There are a couple of options that take the setup and monitoring stress out of receiving mail in this manner, without having to install and configure a mail server. Two of the options here are CloudMailin and smtp2web.

CloudMailin is currently in free beta and allows you to register to receive email via HTTP POST. The system was designed to be scalable and provides some additional features, like delivery logs, to make sure your app is receiving its emails. That's enough about that one as I don't want to be biased.

smtp2web is a Google App Engine instance that can be used to achieve a similar goal. It makes use of App Engine's ability to receive email and then forwards the messages on to your web app.

Both of these options are designed to operate in 'the cloud' and don't require you to own or set up a mail server to do the work. Again, you will probably want to move processing over to a background worker if you have anything complex to do, so that the processing doesn't take up resources that should be serving your app to your customers.

Polling Using IMAP or POP3

Finally this solution makes sense when you need to collect messages from an existing mailbox. You don’t have to own your own mail server but you will need to be able to run cron or a daemon to collect mail at regular intervals.

Although you could roll your own collector there are a couple already out there; take a look at Mailman, for example. This approach can either rely on direct access to your app or can again POST via HTTP.

I will also look to write a separate post on Mailman, as I think the power it offers is worth a blog post in itself. Although there will be a delay with any polling, since you can only poll every few minutes, in some situations using an existing mailbox is the only option.
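
To give a rough idea of what that looks like, here is a minimal sketch based on the Mailman gem's documented configuration; the mail server details are placeholders and the User lookup simply mirrors the earlier examples:

require 'mailman'

# Poll an existing POP3 mailbox every 60 seconds (placeholder credentials)
Mailman.config.pop3 = {
  :server   => 'pop.example.com',
  :port     => 110,
  :username => 'incoming@example.com',
  :password => 'secret'
}
Mailman.config.poll_interval = 60

Mailman::Application.run do
  default do
    # message is a Mail::Message, so the earlier example translates directly
    user = User.find_by_email(message.to.first)
    user.update_attribute(:bio, message.body.decoded) if user
  end
end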

Although this was brief, it should have given a quick introduction to some of the approaches available (I'm sure there are more too). I also plan to write a number of follow-up articles showing how to implement the options described here. If you have any advice, an alternative option, or an approach you would prefer to see covered first then please jump in and comment. Again, if you have any comments on CloudMailin please let me know here, on Twitter, or via email at blog-comments [you know what goes here] cloudmailin.com.

Thoughts on Google Priority Inbox, The How and the Why

Google Priority Inbox launched this week and there has been a huge fuss about it on sites such as TechCrunch and Wired; even the BBC is getting in on the action. Most people are calling what Google is doing a miracle, but that's not necessarily the case.

A lot of what Google is doing has been around for a while; in fact, during the final year of my undergraduate computer science course I created a system myself to help deal with email overload. The key to the system was determining which emails were important to the user and which were not. The basics of how a system such as Priority Inbox works are fairly simple, which is not to say that what Google has done isn't impressive, but more on that later. I figured I would share some of the knowledge I gained while creating my project all those years ago.

Determining the importance of an Email

So how do we determine how important an email is? For a while now spam has been a real issue for anyone who receives email, and people spent a long time working out the characteristics that marked a message as spam. Things like coloring the message red were amongst the first key indicators. The trouble was that spammers soon caught on, and a new solution was needed. In 2002 Paul Graham (of Hacker News and Y Combinator fame) released an article called A Plan for Spam, followed in 2003 by Better Bayesian Filtering.

The articles outlined a method that could be trained to determine which messages were spam and which were legitimate. They were hugely interesting to me, not only because they were my first introduction to Bayesian algorithms but also because of the potential for users to give feedback that would improve the understanding of which messages were spam.

Without going into too much detail, the basic premise of these articles is that you look at each word within an email. If the email is a good email you put a good mark next to that word; if the email is bad you put a bad mark next to it. Over time you can see how likely a given word is to appear in a good email or a bad one. When a new email arrives you look at its words, and if it contains words that appear in more bad messages than good ones then the chances are the new email is spam.

This concept led to a huge leap in spam filters, but it's easy to see that it can be taken further. Instead of treating words as good or bad you can treat them as important or unimportant; that's exactly how that piece of my final year project worked. But why stop there? What if, instead of counting good and bad words, you created a bunch of different categories? You could automatically filter emails into any category you choose.
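
To make that concrete, here is a toy sketch of the word-counting idea in Ruby (my own illustration rather than Graham's actual formula): tally how often each word appears in good and bad messages, then score new text against those tallies.

class ToyWordClassifier
  def initialize
    # word => count, kept separately for each class of message
    @counts = { :good => Hash.new(0), :bad => Hash.new(0) }
  end

  def train(label, text)
    text.downcase.scan(/\w+/).each { |word| @counts[label][word] += 1 }
  end

  # Returns a value between 0.0 and 1.0; higher means more spam-like
  def score(text)
    words = text.downcase.scan(/\w+/)
    good  = words.inject(0) { |sum, w| sum + @counts[:good][w] }
    bad   = words.inject(0) { |sum, w| sum + @counts[:bad][w] }
    bad.to_f / (good + bad + 1)
  end
end

classifier = ToyWordClassifier.new
classifier.train(:bad,  'buy now limited offer')
classifier.train(:good, 'meeting notes attached')
classifier.score('limited time offer') #=> roughly 0.67, leaning towards spam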

Of course this all gets far cleverer when you look at how people classify things as a whole and how they classify them individually. These algorithms have also gone much, much further than I have described, often using ontologies and other techniques to improve things.

Why is Google Priority Inbox so great then, and what else can I do?

Well, this is the killer. Spam filters existed long before Google launched Gmail, but somehow Google just managed to get it right. Switching to Google Apps I personally saw my 100 spam messages per day (that's 100 getting through from 4,000) cut down to maybe one or two a week. Google has access to a lot of data and, more specifically, a lot of email accounts, and as more and more people use this sort of system the data can be refined. It's for this reason that I think Google's Priority Inbox could be a real killer app and a real asset in our current overloaded times. However, it's not the only option.

In case you aren’t aware though there are a couple of other things that you can do to help take control of your inbox. The first is to take a look at OtherInbox. These guys have been doing a lot of what Priority Inbox is offering for a long time now and by doing it I mean doing it well. The OtherInbox system can even do things like pull out delivery company notices to tell you when your parcel is going to arrive. It’s basically Google’s Priority Inbox on steroids.

If you’re a dev then you could also look at the SPAMBayes project, its open source and has a load of different options. The other key is training. Of course there’s a lot of email coming from external advertising sources but have you ever considered that you might be part of someone else’s problem? A lot of the research conducted at the university I was at showed that the sender felt that an email was far more important than the recipient felt it was. It also showed that people are often to cavalier in their attitudes to email, including too many other people in their emails or sending email when it really wasn’t necessary. There’s a bunch of research on the subject of information overload on Tom Jackson’s website that’s really worth taking a look at if your serious about reducing the problem for your business.

So yes, Google Priority Inbox is cool, but take a look at the other options too. Above all, think about how you can contribute to the solution rather than the problem of information overload.

MySQL, Snow Leopard and Rails 2.2.x, where has my Gem gone?

A couple of days ago I was updating some legacy code on one of our old sites. The setup was using MySQL and Rails 2.2.x. When trying to run one of the rake tasks I received the error:

!!! The bundled mysql.rb driver has been removed from Rails 2.2. Please install the mysql gem and try again: gem install mysql.

I thought it was quite odd that I had never seen this before, but it turns out I had installed a new version of MySQL for a different project and this seems to have caused some issues. So, no problem, just run:

gem install mysql # right?

Well, no. Running that command gives you the following on Snow Leopard:

Building native extensions. This could take a while...
ERROR: Error installing mysql:
ERROR: Failed to build gem native extension.

/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby extconf.rb
checking for mysql_query() in -lmysqlclient... no
checking for main() in -lm... yes
checking for mysql_query() in -lmysqlclient... no
checking for main() in -lz... yes
checking for mysql_query() in -lmysqlclient... no
checking for main() in -lsocket... no
checking for mysql_query() in -lmysqlclient... no
checking for main() in -lnsl... no
checking for mysql_query() in -lmysqlclient... no
checking for main() in -lmygcc... no
checking for mysql_query() in -lmysqlclient... no

I remembered seeing something like this before with MySQL: it happens because the libraries the native extensions need to compile against are placed in a non-standard location on Snow Leopard. The following command should help though:

sudo env ARCHFLAGS="-arch x86_64" gem install mysql -- --with-mysql-config=/usr/local/mysql/bin/mysql_config

I did see a bunch of compilation info and errors, but ultimately the gem installed fine and works as it should. Hopefully this will help anyone else who runs into these problems.

Oh, and it seems Bundler was the root cause of my issues, along with a reinstallation/upgrade of MySQL.

Automatically prepending URLs with http://

Recently we added functionality that allowed users to include links to images they had uploaded to one of our sites. To make the experience as easy as possible we allowed them to enter the URL with or without the protocol (http:// or https://).

To make sure that any of our models storing this information would always return a link with the protocol included, I wanted to create a simple mixin that overrides the existing link method returned from the database and prepends http:// when needed.

Checking for the protocol and inserting it

This is actually quite a simple method. The following code overrides the source_url method that returns the link from the database.

def source_url
  link = super
  "#{link.match(/(http|https):\/\//i) ? '' : 'http://'}#{link}"
end

Since I was going to add this to a number of models it made sense to convert it into a mixin that could be used in any of the models.

module Protocolize
  def self.included(klass)
    klass.class_eval do
      def self.protocolize(link_method)
        define_method link_method.to_sym do
          link = super()
          return nil if link.blank?
          "#{link.match(/(http|https):\/\//i) ? '' : 'http://'}#{link}"
        end
      end
    end
  end
end

This can then be called using the following in your model:

include Protocolize
protocolize :method_name
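
For example, with a hypothetical Image model that has a source_url column (made up purely for illustration), the behaviour would be:

class Image < ActiveRecord::Base
  include Protocolize
  protocolize :source_url  # wraps the source_url attribute reader
end

Image.new(:source_url => 'example.com/logo.png').source_url
#=> "http://example.com/logo.png"

Image.new(:source_url => 'https://example.com/logo.png').source_url
#=> "https://example.com/logo.png"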

Notice that within define_method you have to call super() with explicit arguments (here, empty parentheses) rather than plain super. If you don't you will get the following error:

implicit argument passing of super from method defined by define_method() is not supported. Specify all arguments explicitly.

Just a tiny snippet that might help people ensure their links work correctly.

Rails 3, Rake and url_for

Before I start I just want to make it clear that I know the arguments against using url_for in models and even in rake tasks. Sometimes, however, it makes sense to use url_for in a rake task. In my case I am querying another site's API, which requires the URI of the page on my site that I want to gather information about.

The approach in Rails 2.x

task :collect_stats => :environment do
  include ActionController::UrlWriter

  default_url_options[:host] = 'www.example.com'
  url = url_for(:controller => 'foo', :action => 'bar')
end

Notice that because there is no current request you have to specify the

default_url_options[:host]

as the helper has no idea what the host will be otherwise.

Doing the same thing in Rails 3

The following code does the same thing in Rails 3.

task :collect_stats => :environment do
  include ActionDispatch::Routing::UrlFor
  # include ActionController::UrlFor  # requires a request object
  include ActionController::PolymorphicRoutes
  include Rails.application.routes.url_helpers

  default_url_options[:host] = 'www.example.com'
  url = url_for(post) # post is a model instance loaded elsewhere in the task
end

There are two key points to notice here.

  1. The first is that I have included ActionDispatch::Routing::UrlFor rather than ActionController::UrlFor. The latter requires a request object and will attempt to automatically fill in the host name. Since we are in a rake task there is no request and the method will fail.
  2. The second is that there are two additional includes. These will allow you to work with polymorphic routes and named routes, giving a bit more flexibility (see the brief example after this list).
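
For example (assuming a posts resource in config/routes.rb and a post record, neither of which is shown in the original snippet), named and polymorphic helpers then work alongside url_for inside the task:

url = post_url(post)          # named route helper from url_helpers
url = polymorphic_url(post)   # from ActionController::PolymorphicRoutes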

Just a short snippet that might be of use, but if there are any improvements out there please let me know and I will update this. You can of course hard-code the URLs, but there are scenarios where it makes much more sense to use the helpers provided, especially when using polymorphic routes.

Update: 08/06/2010

In the comments Jakub has pointed out that in the latest version of Rails you no longer need the following include:

include ActionController::PolymorphicRoutes

Render ‘Rails Style’ Partials in Sinatra

We love Sinatra. Not only is it a great framework in its own right, it can also be used to mimic parts of Rails in a really simple environment for front-end designers. Instead of having to get them set up with, and explain, the whole of Rails, they get a nice simple app to work on without having to worry about creating controllers or even models.

Although there is not a one-to-one translation between a Rails app and a Sinatra one, it does allow these developers to work with things like Haml in a really approachable environment.

One of the questions I was asked recently though was "How do you render a partial in Sinatra?"

Rendering Partials in Sinatra

Sinatra is a super-lightweight framework, and because of this it doesn't have the notion of partials built in. However, a partial, in its simplest form, is nothing more than a call to render a template as a string and embed that string into your page.

A quick look at the Sinatra site's FAQ shows that partials can be rendered in the following way in ERB:

<%= erb(:mypartial, :layout => false) %>

In Haml you can do exactly the same thing but call haml instead, like so:

= haml(:mypartial, :layout => false)

Notice that

:layout => false

is set to ensure that the layout is not also rendered.

Going a little further

The FAQ also recommends the code in the following gist:

http://gist.github.com/119874

The gist defines a helper method called partial that can be used to render a partial from your code. The helper also allows you to pass collections and is a really useful piece of code.

Making things work the Rails way

The above helpers are great and really useful for Sinatra. However, what if you want to render a partial the 'Rails way'? In our situation we were using Sinatra as a mock-up of what would eventually be brought into a Rails app. Rails allows partials to be rendered like so:

<%= render :partial => 'partial_name' %>

By overriding Sinatra's built-in render method it is actually possible to mimic Rails partials. I came up with the following helper to quickly mock things up. It checks whether the first argument passed in is a hash containing the key :partial; if so it renders the partial, and if not it falls back to the default render method.

helpers do
  # Assumes partial templates are Haml files named with a leading underscore,
  # e.g. views/_partial_name.haml rendered via render :partial => 'partial_name'
  def render(*args)
    if args.first.is_a?(Hash) && args.first.keys.include?(:partial)
      return haml "_#{args.first[:partial]}".to_sym, :layout => false
    else
      super
    end
  end
end

The helper could easily be extended to allow for collections etc., but for now it does the job. Any better solutions?
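
If you did want to take it further, a rough, untested sketch of passing locals through to the partial might look like this, still assuming Haml templates named with a leading underscore:

helpers do
  def render(*args)
    options = args.first
    if options.is_a?(Hash) && options.key?(:partial)
      # Pass any :locals hash straight through to the partial template
      haml "_#{options[:partial]}".to_sym,
           :layout => false, :locals => options[:locals] || {}
    else
      super
    end
  end
end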