Igor Dobryn
about IT

ActiveRecord::Base#find_by_sql to wrap results of composite INSERT query

find_by_sql is supposed to run an SQL query as a string. Fingers of one hand will be enough to count theoretical use-cases. On practice, I could remember a few usages in old Rails version 2/3. This was due to limited API or lack of time transitioning SQL query into low-level AR scopes and merging them. A few months ago I met a place where find_by_sql was particularly useful and it wasn’t a trivial use case for find.

So, a few more details. We often run data imports. Some records should be inserted, some deleted and some updated. The current use-case is about inserting records. We receive a batch of data in a temporary table, filter it and insert it into the main table. After all, we have to run some high-level ruby code on ActiveRecord models (i.e. send creation notification).

Here is a highly simplified example:

CREATE TABLE students (
  id integer NOT NULL,
  school_id integer NOT NULL,
  email character varying NOT NULL,
  first_name character varying NOT NULL,
  last_name character varying NOT NULL
);

 
The staged table is almost the same, but all columns are nullable and strings:

CREATE TABLE staged_students (
  school_id character varying,
  email character varying,
  first_name character varying,
  last_name character varying
);

 
So, we executed SQL to insert records into the main table and ran callback on the ActiveRecord object:

where_sql = "school_id IS NOT NULL AND email IS NOT NULL AND first_name IS NOT NULL AND last_name IS NOT NULL"

Student.connection.execute <<~SQL
  INSERT INTO ews_students (
    school_id,
    email,
    first_name,
    last_name
  )
    SELECT
      school_id,
      email,
      first_name,
      last_name
        FROM staged_students WHERE #{where_sql}
SQL

students = Student.where("email IN (SELECT email FROM staged_students WHERE #{where_sql})")
student.each(&:send_notification)

 
Note it makes separate query which is quite bad for several reasons:

  • 2 queries - not as optimal
  • it’s cumbersome - difficult to read/maintain
  • error-prone - students to send notification can be desynced with originally inserted students (i.e., due to human error)

PostgreSQL allows us to specify what to return from the INSERT query by using RETURNING keyword. What is extremely easy to combine with ActiveRecord::Base#find_by_sql:

students = Student.find_by_sql <<~SQL
  INSERT INTO ews_students (
    school_id,
    email,
    first_name,
    last_name
  )
    SELECT
      school_id,
      email,
      first_name,
      last_name
        FROM staged_students WHERE school_id IS NOT NULL AND email IS NOT NULL AND first_name IS NOT NULL AND last_name IS NOT NULL
  RETURNING *
SQL

student.each(&:send_notification)

 
That’s all. No need for extra query anymore.

 

Conclusions

Before the improvement, we had no issues with the code. Curiosity and willingness to improve do the job and we gained lots of pros:

  • the code became shorter
  • intention clearer
  • performance better
  • and most importantly students are always up to date. Desync is impossible as we return exactly what created.

With just a single con - in fact, we do insert_by_sql which is more than just find_by_sql.

Rails ActiveRecord find_by_sql PostgreSQL