September 18, 2011

Tales from upgrading to Ruby 1.9.2 – character encoding

In an effort to contribute some of our engineering knowledge back to the community, this is the first post with a more engineering/technical focus, written by our VP of engineering, Nir Tzur.

In an ongoing effort to improve application performance, we recently made it a high priority to upgrade our Rails (2.3.14) environment from Ruby 1.8.7 to 1.9.2. We briefly considered a move to Ruby Enterprise Edition (REE), but decided against it: the internal improvements in 1.9.2 are expected to be superior and, frankly, we’re not sure about REE’s future.

Well, I have much to say in favor of Rails and Ruby, but someone had to stop me from “praising” 1.9.2 Encoding during this upgrade. There are tons of blogs and articles over the net discussing this issue, but apparently none of them covered all that we needed. So, I thought it would be a good idea to list what we’ve had to do, hoping that it will help someone, somewhere, sometime.

The first thing we did was to find the source files containing non-ASCII characters and insert the utf-8 magic comment on their first line. These files were mostly related to string analysis methods that use gsub. For the sake of web searches, here is the comment we use:

# -*- encoding: utf-8 -*-
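
As a minimal sketch of why the comment matters: Ruby 1.9 reads source files as US-ASCII unless told otherwise, so a file with non-ASCII literals in its gsub calls needs the magic comment on its first line. The strip_accents helper and its sample string are hypothetical, made up only for illustration.

```ruby
# -*- encoding: utf-8 -*-

# Hypothetical helper: replaces a few accented letters with ASCII ones.
# The non-ASCII literals below are the reason this file needs the
# magic comment on its first line under Ruby 1.9.
def strip_accents(text)
  text.gsub(/[èé]/, "e").gsub("û", "u")
end

puts strip_accents("Crème brûlée")
puts __ENCODING__   # the encoding Ruby assumed for this source file
```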

We use mysql2, so we were lucky enough to be spared the mysql patches otherwise needed to support utf-8 encoding.

Then, we set Encoding.default_external and Encoding.default_internal to Encoding::UTF_8 in our environment.rb. We did this early enough in the file that it runs before our boot.rb gets loaded.
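
A minimal sketch of that step, assuming the lines sit at the very top of config/environment.rb. The defined? guard is our own addition for illustration; it keeps the file loadable on Ruby 1.8.7, which has no Encoding class, while the migration is in progress.

```ruby
# Assumed to be the first lines of config/environment.rb, before boot.rb
# is required. The guard is an illustrative addition for 1.8.7 compatibility.
if defined?(Encoding)
  # Encoding assumed for data that comes from outside, e.g. files read in text mode.
  Encoding.default_external = Encoding::UTF_8
  # Encoding that such data is transcoded to when it is read.
  Encoding.default_internal = Encoding::UTF_8

  puts Encoding.default_external
  puts Encoding.default_internal
end
```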

That was clearly not enough. We found that utf-8 encoded strings posted to our controllers were actually tagged as ASCII. To fix that, we had to install the following monkeypatch:

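# Patching ActionController so that request parameters are tagged as UTF-8.
module ActionController
  class Request
    private

    # Wraps the original normalize_parameters and forces UTF-8 on any
    # result that supports it (strings); other values pass through as-is.
    def normalize_parameters_with_force_encoding(value)
      normalized = normalize_parameters_without_force_encoding(value)
      if normalized.respond_to?(:force_encoding)
        normalized.force_encoding(Encoding::UTF_8)
      else
        normalized
      end
    end
    alias_method_chain :normalize_parameters, :force_encoding
  end
end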
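
Outside Rails, the effect of the patch can be sketched in plain Ruby. The force_utf8 helper below is hypothetical and not part of Rails; it assumes the incoming bytes really are valid utf-8 and only relabels them, which is all force_encoding does.

```ruby
# Hypothetical helper (not part of Rails): relabels every string in a
# nested params-like structure as UTF-8 without changing its bytes.
def force_utf8(value)
  case value
  when String then value.dup.force_encoding(Encoding::UTF_8)
  when Array  then value.map { |item| force_utf8(item) }
  when Hash
    value.inject({}) { |result, (key, item)| result.merge(key => force_utf8(item)) }
  else value
  end
end

# Simulate a posted value: the UTF-8 bytes of "café", tagged as binary.
raw    = "caf\xC3\xA9".dup.force_encoding(Encoding::ASCII_8BIT)
params = { "name" => raw, "tags" => [raw] }
clean  = force_utf8(params)

puts raw.encoding                    # tag before relabeling
puts clean["name"].encoding          # tag after relabeling
puts clean["name"].valid_encoding?   # whether the bytes are valid for the new tag
```

Because only the tag changes, this approach gives wrong results if a client posts bytes in some other encoding; the valid_encoding? check is one way to detect that case.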

In addition, we had to set Encoding.locale_charmap to utf-8 as well. Before discussing this in depth, I’d like to point you to this wonderful post that led us to make this change. Encoding.locale_charmap follows the process locale, so we had to set the ENV['LANG'] environment variable before starting Rails. This should be easy to do by setting LANG=en_US.UTF-8 in /etc/profile; the result can be checked with Unix’s `locale` command.

Well, it didn’t work, so we tried setting it in /etc/environment, but that didn’t work either. It took us a while to understand that our Passenger (3.0.8) and nginx setup was blocking environment variables from being passed to Ruby. We ended up wrapping the ruby invocation in Passenger as follows:

#!/bin/sh
export LANG=en_US.utf8
exec "/usr/bin/ruby" "$@"
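
One way to confirm that the wrapper took effect is to print what Ruby actually sees from inside the Passenger-spawned process, for example through a temporary log line. The locale_report helper is hypothetical; on a correctly configured server one would expect the charmap to report UTF-8.

```ruby
# Hypothetical helper: collects the locale-related settings Ruby sees
# in the current process.
def locale_report
  {
    :lang            => ENV['LANG'],                   # nil if the variable was not passed through
    :locale_charmap  => Encoding.locale_charmap,       # charmap derived from the process locale
    :locale_encoding => Encoding.find('locale').name   # the Encoding Ruby maps that charmap to
  }
end

locale_report.each { |key, value| puts "#{key}: #{value.inspect}" }
```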

That was almost it!

To finalize our actions, we had to:

  1. Set the same environment variables on all our servers, so that rake invocations work well with utf-8 strings.
  2. Use force_encoding('utf-8') in some specific areas. For example, we use net/imap to retrieve e-mails, and had to call the .toutf8 method on each message’s subject. For some reason, the string representation of the message body worked well without our interference.
  3. Replace ‘render :inline’ of mixed ASCII and utf-8 strings with a ‘render :partial => …’ that does the same thing.
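
The distinction behind item 2 can be sketched as follows: force_encoding only changes the encoding tag of bytes that are already utf-8, whereas text that arrives in another encoding has to be transcoded. The sketch uses String#encode rather than .toutf8 so that it needs no extra require; the helper names and sample strings are hypothetical.

```ruby
# Hypothetical helpers contrasting relabeling with transcoding.

# The bytes are already UTF-8 but carry the wrong tag: change only the tag.
def relabel_as_utf8(bytes)
  bytes.dup.force_encoding('utf-8')
end

# The bytes are in another encoding: tag them correctly, then rewrite
# them as UTF-8.
def transcode_to_utf8(text, source_encoding)
  text.dup.force_encoding(source_encoding).encode('utf-8')
end

# The same word as raw bytes, once in UTF-8 and once in ISO-8859-1.
utf8_bytes   = "r\xC3\xA9sum\xC3\xA9".dup.force_encoding('ASCII-8BIT')
latin1_bytes = "r\xE9sum\xE9".dup.force_encoding('ASCII-8BIT')

relabeled  = relabel_as_utf8(utf8_bytes)
transcoded = transcode_to_utf8(latin1_bytes, 'ISO-8859-1')

puts relabeled.valid_encoding?
puts transcoded.valid_encoding?
puts relabeled == transcoded
```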

So, I hope this info will serve you and save you the countless hours we spent on encoding while upgrading to 1.9.2.

Nir