Language List Gem – a list of languages (ISO-639-1 or ISO-639-3) for Ruby

For a project I had been working on, we needed to allow users to select the languages they speak as part of a profile page. After a bit of research, I found it difficult to find a comprehensive list of languages in a format that could easily be used from Ruby.

Rails’ built-in language files mean that we’re all used to shorthand prefixes such as en for English. The issue was finding a nice list of all possible languages that resolve to these codes. If we ignore localised dialects such as en-gb or en-us, we can make use of the standard lists of languages defined in ISO-639-1 and ISO-639-3.

After creating a script to compile these languages into a yml file that could easily be retrieved, I launched this as a Ruby Gem that can be used in the following way:

all_languages = LanguageList::ALL_LANGUAGES

# Finding a language based on its ISO-639-1 or ISO-639-3 code
english = LanguageList::LanguageInfo.find('en')
english.name.inspect #=> "English"
english.iso_639_1.inspect #=> "en"
english.iso_639_3.inspect #=> "eng"
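
To give a feel for how a yml file like the one mentioned above can back this kind of lookup, here is a minimal sketch using only Ruby's standard library. It is not the gem's actual implementation: the data layout, the LangEntry struct, the BY_CODE index and the find_language helper are all hypothetical names made up for illustration.

```ruby
require 'yaml'

# Hypothetical sample data; the gem's real yml layout may differ.
# Codes are quoted because YAML would otherwise read a bare `no`
# (Norwegian) as the boolean false.
SAMPLE_DATA = <<~DATA
  - name: English
    iso_639_1: 'en'
    iso_639_3: 'eng'
  - name: Welsh
    iso_639_1: 'cy'
    iso_639_3: 'cym'
  - name: Cantonese
    iso_639_1: ~        # no two-letter code exists
    iso_639_3: 'yue'
DATA

# Hypothetical value object, not the gem's LanguageInfo class.
LangEntry = Struct.new(:name, :iso_639_1, :iso_639_3)

ENTRIES = YAML.safe_load(SAMPLE_DATA).map do |row|
  LangEntry.new(row['name'], row['iso_639_1'], row['iso_639_3'])
end.freeze

# Index every entry under both of its codes so that a two- or
# three-letter code resolves with a single hash lookup.
BY_CODE = ENTRIES.each_with_object({}) do |entry, index|
  index[entry.iso_639_1] = entry if entry.iso_639_1
  index[entry.iso_639_3] = entry
end.freeze

# Returns the matching entry, or nil when the code is unknown.
def find_language(code)
  BY_CODE[code.to_s.downcase]
end

find_language('EN').name   # => "English"
find_language('yue').name  # => "Cantonese"
find_language('xx')        # => nil
```

Indexing by both code types up front keeps the lookup simple, at the cost of holding two hash keys for most languages.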

The only thing left was to add some helpers to retrieve a list of common languages. Rather than re-invent the wheel, I found a list of common languages on this website. The author has compiled the list of common languages based upon the list of languages that Microsoft use and a Wikipedia page of common languages.

I then added a common? flag to these languages, allowing for the following:

all_languages = LanguageList::ALL_LANGUAGES
common_languages = LanguageList::COMMON_LANGUAGES

# Finding a language based on its ISO-639-1 or ISO-639-3 code
english = LanguageList::LanguageInfo.find('en')
english.name.inspect #=> "English"
english.iso_639_1.inspect #=> "en"
english.iso_639_3.inspect #=> "eng"
english.common? #=> true
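
As a rough illustration of how a common? flag can drive a profile-page dropdown, the sketch below filters a small hand-written list and builds label/value pairs. The SketchLanguage struct, the sample data and the select_options helper are assumptions made for this example and are not part of the gem.

```ruby
# Hypothetical stand-in for a language record with a common flag.
SketchLanguage = Struct.new(:name, :iso_639_1, :iso_639_3, :common) do
  def common?
    common == true
  end
end

SKETCH_ALL = [
  SketchLanguage.new('Spanish', 'es', 'spa', true),
  SketchLanguage.new('English', 'en', 'eng', true),
  SketchLanguage.new('Bislama', 'bi', 'bis', false)
].freeze

# Keep only the languages flagged as common.
SKETCH_COMMON = SKETCH_ALL.select(&:common?).freeze

# Build [label, value] pairs sorted by name, the shape that Rails
# select helpers such as options_for_select accept.
def select_options(languages)
  languages.sort_by(&:name).map { |language| [language.name, language.iso_639_3] }
end

select_options(SKETCH_COMMON)
# => [["English", "eng"], ["Spanish", "spa"]]
```

Using the three-letter code as the stored value is a design choice here: every language in the list has one, whereas the two-letter code is missing for many.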

I’m not 100% certain of the license of the ISO language list, so use it at your own risk, but all of the code I have created is released under the MIT license.

I am also sure that the data file itself could be tweaked to include better names and punctuation, so please send any pull requests to the project on GitHub.

About Steve Smith
Software developer (often Ruby and Rails, but I enjoy loads of languages), semantic tech fanboy, skydiver, all-round geek. Owner of dynamic:edge (hire us), the makers of CloudMailin.com

3 Responses to Language List Gem – a list of languages (ISO-639-1 or ISO-639-3) for Ruby

  1. Sprachprofi says:

    I used the language table from Rails’ globalize plugin. It’s a good idea to consult the list of most spoken languages, because iso-639-1 is weird, containing some obscure languages (Bislama, 6200 speakers) and even constructed languages while ignoring more commonly-spoken languages (Wu, 77 million speakers).

    Still, commonly spoken does not necessarily mean commonly spoken on the web. I’m pretty sure that you can find more speakers of Icelandic (320 000) on the web than speakers of Fula (40-65 million). Esperanto may only have 1-2 million speakers worldwide, but they certainly all seem to be online, judging from its net presence. Google, Facebook and Ubuntu are all translated into Esperanto, while Marathi (70 million) is noticeably missing. In terms of Wikipedia size, Esperanto even outdoes Indonesian and Arabic.

    So, what does this tell us? Prepare for the unexpected. Provide a mechanism through which users can add other languages. I recently programmed a bot tracking scores in a language-learning contest (http://6wc.learnlangs.com) and really thought that I could get away with supporting just the iso-639-1 languages – who’d want to learn anything more obscure than that? – but had to find out the hard way that Cantonese, Ancient Greek, Klingon and American Sign Language are not on that list.

    • Steve Smith says:

      This is a really useful comment, thanks! I’m especially surprised to hear that the ISO list doesn’t include Cantonese or sign language.

      In our project we worked under the assumption that the list of common languages was good enough, as this wasn’t a particularly important feature. We certainly didn’t put much time into selecting the list, hence using the existing blog post. I guess it’s fairly easy to add new items to the list as it’s only an array, but you’re right to expect the unexpected for sure.

  2. jrochkind says:

    Thank you for this, this is awesome.

    Do you know if anyone has localized _names_ for the ISO codes? Like “Español” instead of “Spanish” for code ‘es’?
