How to Support Emojis With Rails and Amazon Aurora (MySQL)


I have recently started a new Ruby on Rails project which I hope will exist for a very long time. I decided, after getting burned in the past regarding this issue, to ensure the application supports emojis in all text fields. Here is the stack:

  • Ruby on Rails 5.2.0
  • Ruby 2.3.1
  • Elastic Beanstalk (Passenger with Ruby 2.3 running on 64bit Amazon Linux/2.7.2)
  • Aurora MySQL 5.6.10a

The Problem

  YourModel.create!(your_column_name: '😀')
  ActiveRecord::StatementInvalid: Mysql2::Error: 
  Incorrect string value: '\xF0\x9F\x98\x80' for column 'your_column_name' at row 1: 
  INSERT INTO `your_models` (`your_column_name`, `created_at`, `updated_at`) VALUES ('😀', '2018-05-07 00:43:47', '2018-05-07 00:43:47')

In my experience, if users are allowed to enter text, they WILL enter emojis. Rather than do insane things to strip these characters out prior to writing a record to the database, I decided to do this the right way and support the use of these characters (without switching away from this stack).

The Solution

First, try to find this article when you create your project. There are additional headaches to convert an EXISTING stack TO this state.

In config/database.yml, add two lines related to the character encoding of the DB,

your_environment:
  ...
  encoding: utf8mb4
  collation: utf8mb4_bin

Then, in config/initializers, add a file entitled mysql_utf8mb4_fix.rb, containing the following code. Supporting emojis limits the number of characters that can go into a VARCHAR field to 191, if you are going to index the field. I got this code here.

require 'active_record/connection_adapters/abstract_mysql_adapter'

module ActiveRecord
  module ConnectionAdapters
    class AbstractMysqlAdapter
      NATIVE_DATABASE_TYPES[:string] = { :name => "varchar", :limit => 191 }
    end
  end
end

The Final Step

I will not go into the details of creating an Aurora cluster, but once you have created your cluster, you must create a custom Parameter Group in the RDS console, modify a few values, and attach it to your cluster.

Create the parameter group:

Parameter Group Creation 1 Parameter Group Creation 2

Set the following Parameter Group values to utf8mb4:

character_set_client
character_set_connection
character_set_database
character_set_filesystem
character_set_results
character_set_server

Set the following Parameter Group values to utf8mb4_bin:

collation_connection
collation_server

Save your changes, and attach the parameter group to your cluster.

Confirm This Works

Connect to the Aurora database with something like Sequel Pro and confirm that your tables’ character set is utf8mb4, and your VARCHAR columns have a max length of 191.

Parameter Group Confirmation 3

Then, ssh into your instance and make sure code similar to this works:

Mannequin.create!(email: '😀', longer_text: 'this is a mix of emoji and 😀')
=> #<Mannequin id: 1, email: "😀", longer_text: "this is a mix of emoji and 😀">
Mannequin.find_by_email('😀')
=> #<Mannequin id: 1, email: "😀", longer_text: "this is a mix of emoji and 😀">

(note that I have a model called ‘Mannequin’ and have indexed the string field ‘email’)

Let me know what you think in the comments, and let me know if there’s a better way to do this with this particular stack!