i18n

Internationalization (i18n) refers to design that enables easy localization. This includes support for:

  • saving all data in a multibyte character encoding like UTF-8
  • regional time/date formats
  • CSS for right-to-left languages
  • ability to choose language preferences

This page will be for taking notes on i18n.

unicode database

On another Rails project I had utf8 specified in database.yml, but the MySQL tables were not created with a utf8 charset when the migrations ran. I think the charset setting in database.yml does not actually make the database use a utf8 charset and collation; it perhaps only applies to Rails' connection to the database. Everything appeared to work until we tried to add Russian and Arabic translations.
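One way to spot this mismatch is to look at the collation MySQL reports for each table. The following is a minimal sketch, not something from the original project: `tables_not_in_charset` is a hypothetical helper, and it assumes rows shaped like the output of MySQL's `SHOW TABLE STATUS` (which includes `Name` and `Collation` columns). In a Rails app the rows would presumably come from something like `ActiveRecord::Base.connection.select_all("SHOW TABLE STATUS")`; here sample data stands in for them.

```ruby
# Hypothetical helper (not part of Rails): given rows shaped like the output
# of MySQL's SHOW TABLE STATUS, return the names of tables whose collation
# does not belong to the expected charset.
def tables_not_in_charset(status_rows, charset = "utf8")
  status_rows.select { |row|
    collation = row["Collation"]
    # Views report a NULL collation, so rows without one are skipped.
    collation && !collation.start_with?("#{charset}_")
  }.map { |row| row["Name"] }
end

# Sample rows standing in for a real SHOW TABLE STATUS result (assumed data).
sample = [
  { "Name" => "users", "Collation" => "latin1_swedish_ci" },
  { "Name" => "pages", "Collation" => "utf8_general_ci" },
]

puts tables_not_in_charset(sample).inspect  # prints ["users"]
```

Any table listed by a check like this would still be stored in a non-utf8 charset, whatever database.yml says.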

This migration forced the MySQL tables to actually be utf8:

class ConvertTablesToUnicode < ActiveRecord::Migration
  def self.up
    charset = 'utf8'
    collation = 'utf8_general_ci'
    # Change the database default so that newly created tables are utf8.
    execute "ALTER DATABASE `#{connection.current_database}` CHARACTER SET #{charset} COLLATE #{collation}"
    # Convert every existing table, including the data in its text columns.
    connection.tables.each do |table|
      execute "ALTER TABLE `#{table}` CONVERT TO CHARACTER SET #{charset} COLLATE #{collation}"
    end
  end

  # The previous charset of each table is not recorded, so there is no way back.
  def self.down
    raise ActiveRecord::IrreversibleMigration
  end

  def self.connection
    ActiveRecord::Base.connection
  end
end

After running this migration, all the old data was preserved with correct accents, and Arabic and Russian started working.

The same may be needed for Crabgrass.

setting up charsets for sphinx

Non-Latin charsets have to be configured for Sphinx to work.

Here is a blog post on how to get Sphinx to work with Arabic.

Basically:

codetitle. sphinx.yml

development: &my_settings
  charset_table: "0..9, a..z, _, A..Z->a..z, U+621..U+63a, U+640..U+64a, U+66e..U+66f, U+671..U+6d3, U+6d5, U+6e5..U+6e6, U+6ee..U+6ef, U+6fa..U+6fc, U+6ff"
test:
  <<: *my_settings
production:
  <<: *my_settings
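
Sphinx treats any character that is not listed in charset_table as a word separator, so it helps to check which scripts a given table actually covers. The sketch below is only an illustration under that assumption: `codepoint`, `charset_ranges`, and `indexable?` are hypothetical helpers, not part of Sphinx or any Rails plugin, and the parser only handles the entry forms used in the table above (single characters, `a..z` ranges, `U+xxx` codepoints, and `->` mappings, of which only the source side is read).

```ruby
# The charset_table value from sphinx.yml above.
CHARSET_TABLE = "0..9, a..z, _, A..Z->a..z, U+621..U+63a, U+640..U+64a, " \
                "U+66e..U+66f, U+671..U+6d3, U+6d5, U+6e5..U+6e6, " \
                "U+6ee..U+6ef, U+6fa..U+6fc, U+6ff"

# "U+621" is a hexadecimal codepoint; anything else is a literal character.
def codepoint(token)
  match = token.match(/\AU\+(\h+)\z/i)
  match ? match[1].to_i(16) : token.ord
end

# Turn each comma-separated entry into a Range of accepted codepoints.
def charset_ranges(table)
  table.split(",").map do |entry|
    source = entry.strip.split("->").first  # ignore the case-folding target
    first, last = source.split("..")
    codepoint(first)..codepoint(last || first)
  end
end

# True when every character of the word falls inside one of the ranges.
def indexable?(word, ranges)
  word.each_char.all? { |ch| ranges.any? { |range| range.cover?(ch.ord) } }
end

RANGES = charset_ranges(CHARSET_TABLE)

puts indexable?("hello", RANGES)                               # prints true
puts indexable?("\u0633\u0644\u0627\u0645", RANGES)             # Arabic word, prints true
puts indexable?("\u043F\u0440\u0438\u0432\u0435\u0442", RANGES) # Cyrillic word, prints false
```

The last line suggests that this table covers Latin and Arabic only; Russian text would presumably need Cyrillic ranges added to charset_table in the same way.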