.

Coffee Powered

code and content

Mass inserting data in Rails without killing your performance

Mass inserting is one of those operations that isn’t really well-supported by ActiveRecord, but which has to be done nonethless. You might say, “Well hey, I’ll just run a loop and create a bunch of AR objects, no sweat”.

That’ll work, but if speed is a factor, it might not be your best option.

ActiveRecord makes interface to the DB very easy, but it doesn’t necessarily make it fast. Instantiating an ActiveRecord object is costly, and if you do a lot of ‘em, that’s going to cause you to bump up against the garbage collector, which will significantly hinder performance. There are several options, though, depending on how much speed you need.

There are benchmarks at the bottom of the post, so if you’re just interested in those, scroll down.

Option 1: Use transactions

This is definitely the easiest method, and while you’ll realize gains from it, you aren’t going to be breaking any speed records using only this method. However, it’s well worth it if you are doing mass inserts via ActiveRecord.

Instead of

1000.times { Model.create(options) }

You want:

ActiveRecord::Base.transaction do
  1000.times { Model.create(options) }
end

The net effect is that the database performs all of your inserts in a single transaction, rather than starting and committing a new transaction for every request.

Options 2: Get down and dirty with the raw SQL

If you know that your data is valid and can afford to skip validations, you can save a lot of time by just jumping directly to raw SQL.

Imagine, for example, that you’re running the following:

1000.times {|i| Foo.create(:counter => i) }

That’s going to create 1000 ActiveRecord objects, run validations, generate the insert SQL, and dump it into the database. You can realize large performance gains by just jumping directly to the generated SQL.

1000.times do |i|
  Foo.connection.execute "INSERT INTO foos (counter) values (#{i})"
end

You should use sanitize_sql and such as necessary to sanitize input values if they are not already sanitized, but with this technique you can realize extremely large performance gains. Of course, wrapping all those inserts in a single transaction, as in Option 1 gets you even more performance.

Foo.transaction do
  1000.times do |i|
    Foo.connection.execute "INSERT INTO foos (counter) values (#{i})"
  end
end

Option 3: A single mass insert

Many databases support mass inserts of data in a single insert statement. They are able to significantly optimize this operation under the hood, and if you’re comfortable using it, will be your fastest option by far.

inserts = []
TIMES.times do
  inserts.push "(3.0, '2009-01-23 20:21:13', 2, 1)"
end
sql = "INSERT INTO user_node_scores (`score`, `updated_at`, `node_id`, `user_id`) VALUES #{inserts.join(", ")}"

No transaction block is necessary here, since it’s just a single statement, and the DB will wrap it in a transaction. We build an array of value sets to include, then just join them into the INSERT statement. We don’t use string concatenation, since it will result in significantly more string garbage generated, which could potentially get us into the GC, which we’re trying to avoid (and hey, memory savings are always good).

Option 4: ActiveRecord::Extensions

njero in #rubyonrails pointed me at this nifty little gem and I decided to include it. It seems to try to intelligently do mass inserts of data. I wasn’t able to get it to emulate the single mass insert for a MySQL database, but it does provide a significant speed increase without much additional work, and can preserve your validations and such.

There’s the obvious added benefit that you stay in pure Ruby, and don’t have to get into the raw SQL.

columns = [:score, :node_id, :user_id]
values = []
TIMES.times do
	values.push [3, 2, 1]
end

UserNodeScore.import columns, values

Benchmarks

I used a simple script to test each of the methods described here.

require "ar-extensions"

CONN = ActiveRecord::Base.connection
TIMES = 10000

def do_inserts
	TIMES.times { UserNodeScore.create(:user_id => 1, :node_id => 2, :score => 3) }
end

def raw_sql
	TIMES.times { CONN.execute "INSERT INTO `user_node_scores` (`score`, `updated_at`, `node_id`, `user_id`) VALUES(3.0, '2009-01-23 20:21:13', 2, 1)" }
end

def mass_insert
	inserts = []
	TIMES.times do
		inserts.push "(3.0, '2009-01-23 20:21:13', 2, 1)"
	end
	sql = "INSERT INTO user_node_scores (`score`, `updated_at`, `node_id`, `user_id`) VALUES #{inserts.join(", ")}"
	CONN.execute sql
end

def activerecord_extensions_mass_insert(validate = true)
	columns = [:score, :node_id, :user_id]
	values = []
	TIMES.times do
		values.push [3, 2, 1]
	end

	UserNodeScore.import columns, values, {:validate => validate}
end

puts "Testing various insert methods for #{TIMES} inserts\n"
puts "ActiveRecord without transaction:"
puts base = Benchmark.measure { do_inserts }

puts "ActiveRecord with transaction:"
puts bench = Benchmark.measure { ActiveRecord::Base.transaction { do_inserts } }
puts sprintf("  %2.2fx faster than base", base.real / bench.real)

puts "Raw SQL without transaction:"
puts bench = Benchmark.measure { raw_sql }
puts sprintf("  %2.2fx faster than base", base.real / bench.real)

puts "Raw SQL with transaction:"
puts bench = Benchmark.measure { ActiveRecord::Base.transaction { raw_sql } }
puts sprintf("  %2.2fx faster than base", base.real / bench.real)

puts "Single mass insert:"
puts bench = Benchmark.measure { mass_insert }
puts sprintf("  %2.2fx faster than base", base.real / bench.real)

puts "ActiveRecord::Extensions mass insert:"
puts bench = Benchmark.measure { activerecord_extensions_mass_insert }
puts sprintf("  %2.2fx faster than base", base.real / bench.real)

puts "ActiveRecord::Extensions mass insert without validations:"
puts bench = Benchmark.measure { activerecord_extensions_mass_insert(true)  }
puts sprintf("  %2.2fx faster than base", base.real / bench.real)

And the results:

Testing various insert methods for 10000 inserts
ActiveRecord without transaction:
 14.930000   0.640000  15.570000 ( 18.898352)
ActiveRecord with transaction:
 13.420000   0.310000  13.730000 ( 14.619136)
  1.29x faster than base
Raw SQL without transaction:
  0.920000   0.170000   1.090000 (  3.731032)
  5.07x faster than base
Raw SQL with transaction:
  0.870000   0.150000   1.020000 (  1.648834)
  11.46x faster than base
Single mass insert:
  0.000000   0.000000   0.000000 (  0.268634)
  70.35x faster than base
ActiveRecord::Extensions mass insert:
  6.580000   0.280000   6.860000 (  9.409169)
  2.01x faster than base
ActiveRecord::Extensions mass insert without validations:
  6.550000   0.240000   6.790000 (  9.446273)
  2.00x faster than base

The results are fairly self-explainatory, but of particular note is the specific single INSERT statement. At 70x faster than the non-transactional ActiveRecord insert, if you need speed, it’s hard to beat.

Conclusions

ActiveRecord is great, but sometimes it’ll hold you back. Finding the balance between ease of use (full ActiveRecord) and performance (bare metal mass inserts) can have a profound effect on the performance of your app.

  • http://marketing.postedpost.com/2008/02/28/the-mis Net Worth Statement

    In fact, stored procedures that insert, update, and delete data in Oracle work in the same manner as they do for SQL Server. Net Worth Statement

  • http://mikeperham.com Mike Perham

    ar-extensions is great. You didn't get the multi-insert capability because you didn't require the mysql connector extensions:

    require 'ar-extensions/adapters/mysql'

    require 'ar-extensions/import/mysql'

    Also, you can get a big performance increase from ar-extensions when saving a lot of models by disabling validations:

    Model.import data, :validate => false

    Yes, I've spent a lot of time on this too. :-)

  • BJ Clark

    AR-Extensions is great, except that it monkey patches lots of stuff in ActiveRecord which can create crazy weird bugs if you're doing any other modifications of AR. We use Masochism (with mods) for DB load balancing, and AR-Extensions borked a bunch of stuff with it. Which ended up costing us more dev time than just doing a transaction with a big loop.

  • Chris Heald

    Thanks for the tip, Mike. The docs on ar-extensions seem to be a bit light!

    I'll give that a shot. It's nice to stay in Ruby when at all possible!

  • http://danielharan.com/ Daniel Haran

    On MySQL, I used LOAD DATA LOCAL INFILE:
    http://danielharan.wordpress.com/2007/06/12/ruby-…

  • http://www.davidalison.com David Alison

    Excellent post Chris – exactly what I was looking for when I was trying to Google up solutions for bulk inserts in Rails. I appreciate you taking the time to write this down.

  • http://4loc.wordpress.com/ jfromm

    Good overview.

  • http://www.kalivo.com/convs/show/2044-rails-batch-inserts-updates-benchmark Kalivo.com – Rails Batch Inserts/updates benchmark

    [...] Here is a basic benchmark of ar-extensions.  I used a slightly modified version of this script. [...]

  • Michael

    Is it possible to tell Rails to *not* wrap Model.save in a transaction — we're uploading large volumes of data and the BEGIN/INSERT/COMMIT is surely much slower than a simple INSERT. Anyone know how to tell Rails to not wrap a 'save' in a transaction?

  • http://www.startbreakingfree.com Brian Armstrong

    Awesome, you da man! Thanks for sharing this.

  • http://www.thirdpartycode.com Vid Luther

    Great post, just an FYI for all who are interested, the reason why the single mass insert in MySQL is so much faster, is because you're doing what's known as an "extended insert". What this does is, tells MySQL to rebuild the index AFTER the data has been inserted, not in the middle, or after each insert.

  • Jeremy

    This is great !! My records are owned by users, how can I ensure that this INSERT statement would set ‘owner_id” to the value of rails current_user ?

  • zeeshan malik

    hey, very helpful tutorial. Ran into sql exception for queries with string values having single quote (e.g. Don’t) in them. Follow the link to see how I sanitized or escaped bulk insert SQL queries before executing ‘em  http://stackoverflow.com/questions/5976708/rails3-sql-execution-with-hash-substitution-like-where/8007422#8007422

    Best,
    Zeeshan

  • spencer roan

    In your benchmark AR::Extensions performs the same regardless of validations? it seems to me that there must be a typo and you ran it twice with validations.

  • http://www.facebook.com/dcac.lab Dcac Lab

    Excuse me, I couldn’t find that nifty little gem, it seems that the page is no more served, can you please tell us where to get it ?

  • Mason

    Thank you! 3 years later it’s still very helpful.

  • Hisamuddin Riza

    Thanks for sharing Chris, your post really help me

  • http://twitter.com/ReadySetRails ReadySetRails

    Great write up, thank you. I was working with a large dataset (5k followers returned by the twitter API), and saving the result was a pain. I ended up using .in_groups_of, and building my batch insert that way. 

    1 insert of 5000 records hit parser issues and memory overload issues on my system and on Heroku. But inserting 500 records 10 times gets the job done in about 5-7 seconds.

    Thanks again!

  • Nitin

    Nice article :)
    Awesome way taught here without any gem :)
    http://ramanavel.blogspot.in/2011/05/bulk-inserting-with-out-any-gem.html

  • Nitin

    Sorry my mistake. The article I mentioned above doesnt do a bulk insert.

  • http://www.facebook.com/id.email003new Allone M Hiveone

    Hello , I was searching for something similar as my system generates 1000 buttons, updates and deletes them and takes lot of time on every commit .. Using transaction i can do this in seconds now.

  • http://www.facebook.com/id.email003new Allone M Hiveone

    Thank you so much 

  • sachin prasad

    I always thought active Record was cool, but your post forced me to rethink.

  • http://karthikeyanblogs.wordpress.com/2012/12/05/mass-inserting-data-in-rails-without-killing-your-performance/ Mass inserting data in Rails without killing your performance « Karthikeyan's Blog

    [...] Mass inserting data in Rails without killing your performance [...]

  • http://acupofrails.wordpress.com/2012/12/05/mass-inserting-data-in-rails-without-killing-your-performance/ Mass inserting data in Rails without killing your performance « Ruby On Rails Blog

    [...] Mass inserting data in Rails without killing your performance [...]

  • Ismael

    ‘Option 3: A single mass insert’ made my day! Thanks!!

  • Gourav Tiwari

    like it!

  • Jack Royal-Gordon

    Thanks, Chris.  Now it’s been four years and the advice is still invaluable. My performance improvement was ~32x.  And I posted some code (at http://stackoverflow.com/questions/15317837/bulk-insert-records-into-active-record-table/15317838#15317838) that developers can use to extend ActiveRecord so that the feature’s easily available.

  • Lee Sail

    In Oracle DB, it must use the SQL like this:
    insert all
    into foo(counter) values (‘counter1′)
    into foo(counter) values (‘counter2′)
    into foo(counter) values (‘counter3′)
    select * from dual

  • Guest

    I just wrote a little monkey-patch for ActiveRecord 3.2.12 which allows mass SQL inserts using a simple, high-level, “Rubyish” interface. Check it out: https://github.com/alexdowad/showcase/blob/master/activerecord/bulk_db_operations.rb

  • Alex Dowad

    Sorry about the spam, I don’t know what I did to post that 3 times!