It seems that on every Rails project I work on, I end up writing utility scripts that make changes to the production data in some way or another. Perhaps it’s pre-loading hundreds of user accounts for a customer that wants to provide a spreadsheet of users, or populating an account with fake data that can be used for a demo, or manually fixing a data integration issue with an external system. Often, this requires parsing and processing a source file (like a CSV file).
In the past, I’d write a script to parse/process the file, add the source file to the repository, deploy it to production, and then remotely execute the appropriate script. But recently, I’ve been trying something different, and it’s been working really well.
In this post, I’m going to talk about doing the processing locally, generating Ruby code, and then executing the generated code remotely.
Caveat
This technique is not appropriate for all cases. It’s best suited for situations where the deployed production codebase provides the building blocks that a generated script can use. In many cases, the only building blocks needed are the application’s ActiveRecord models.
Overview
As an example, I’m going to walk through writing a script that will import fake data into an account to be used for a demo. The product is a fitness tracker, and the web app displays charts and analyzes a user’s historical activity data. A CSV has been populated with the desired daily values, and the contents of the CSV need to be loaded into the production database.
To get that data into production, I’m going to write a script that will be executed locally (on my development laptop). The output of that script will be Ruby code meant to be executed in a production console (on the production server). I’ll use a technique I described in a previous post, Run a Local Rails Script on Heroku, to pipe the generated script to a Heroku console.
The huge advantage I see with using this technique is that I don’t need to deploy anything to production to import the demo data. And if the structure of the CSV changes, or the data changes and it needs to be run again, that won’t require a production deploy, either.
Start with a Test
It can be tempting to treat something like this as a one-off situation that doesn’t need to be tested with the same rigor as the rest of the application code. But let’s not fall for that temptation; always start with a test. The test should invoke the generator, providing the path to a CSV file, then capture the generated Ruby code and evaluate it. The test can then assert that the database was updated in all of the expected ways based on the input CSV.
csv_content = <<~EOS
Date,Steps,Calories,Duration
02/15/17,6432,300,94
02/16/17,9076,900,143
02/17/17,3012,180,45
EOS
csv_file = Tempfile.new('demo-data')
csv_file << csv_content
csv_file.close
generated_import_code = DemoDataGenerator.process_csv_demo_data "[email protected]", csv_file.path
eval generated_import_code
assert_daily_values "2017-02-15", steps: 6432, calories: 300, duration: 94
assert_daily_values "2017-02-16", steps: 9076, calories: 900, duration: 143
assert_daily_values "2017-02-17", steps: 3012, calories: 180, duration: 45
Parse the CSV
I’m not going to go into the details of parsing a CSV file in Ruby here (there are plenty of other resources that cover it). For the purposes of this example, let’s just assume we’ve got some code that turns the CSV file into an Array of Hashes.
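For completeness, here’s a minimal sketch of what that step might look like using Ruby’s standard CSV library. The column names match the example file above; the two-digit year format and the method name are assumptions:

```ruby
require 'csv'
require 'date'

# Parse the demo-data CSV into an Array of Hashes with symbol keys.
# Assumes the Date,Steps,Calories,Duration header from the example file.
def parse_demo_csv(path)
  CSV.foreach(path, headers: true).map do |row|
    {
      date:     Date.strptime(row['Date'], '%m/%d/%y'),
      steps:    Integer(row['Steps']),
      calories: Integer(row['Calories']),
      duration: Integer(row['Duration'])
    }
  end
end
```

Converting the numeric columns with Integer (rather than to_i) means a malformed cell raises immediately instead of silently becoming 0.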
Generate the Code
Using an ERB template (here’s a good Introduction to ERB Templating) is an easy and straightforward way to go from the Hash that was parsed from the CSV file into Ruby code.
def generate(email_address, data)
template = <<~EOS
user = User.find_by_email "<%= email_address %>"
records = []
<% data.each do |row_data| %>
records << user.activity_values.build steps: <%= row_data[:steps] %>, calories: <%= row_data[:calories] %>, duration: <%= row_data[:duration] %>; nil
<% end %>
run = lambda do
ActivityValue.transaction do
ActivityValue.where(user_id: user.id).delete_all
records.each { |r| r.save!; nil }
end
end
run.call
EOS
renderer = ERB.new(template, 3, '>')
renderer.result(binding)
end
There are a couple of things to point out here. First, you’ll notice that a couple of the lines end with ; nil. Since the script is going to be executed in a console, this prevents unnecessary output by making the result of those lines be nil.
The second is the creation of the ERB object. You can find out more about that in the Introduction to ERB Templating, but those arguments tell ERB to run in a sandbox (safe level 3) and to not print newlines after tags. (On Ruby 2.6 and later, the same trim behavior is spelled ERB.new(template, trim_mode: '>'); safe levels were removed entirely in Ruby 3.0.)
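Here’s a small self-contained illustration of that trim mode, using the keyword form; the template content is made up for the example:

```ruby
require 'erb'

template = <<~EOS
  records = []
  <% [6432, 9076].each do |steps| %>
  records << <%= steps %>; nil
  <% end %>
EOS

# With trim mode '>', the newline after each line ending in %> is omitted,
# so the loop scaffolding (<% ... %> lines) leaves no blank lines behind.
puts ERB.new(template, trim_mode: '>').result(binding)
```

On Rubies older than 2.6, the equivalent positional form is ERB.new(template, nil, '>').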
Test It
Once the code generation code is fully fleshed out, you should be able to run the test you wrote earlier, iterating until you get all of the kinks worked out.
But once that passes, it’s a good idea to run a manual test. Make your code generator accessible via a script that can be executed with rails runner, and run it, redirecting the output to a file. You can then run the generated code against your local development database like this:
bin/rails console < generated_script.rb
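The rails runner wrapper might look something like this sketch. The file path, argument handling, and defaults are assumptions, and the stand-in module is only there so the example runs on its own; in the real app, the generator class already exists:

```ruby
# script/generate_demo_data.rb
# Usage: bin/rails runner script/generate_demo_data.rb EMAIL CSV_PATH > generated_script.rb

# Stand-in so this sketch is self-contained; the real DemoDataGenerator
# lives in the application codebase.
module DemoDataGenerator
  def self.process_csv_demo_data(email, csv_path)
    "# generated import code for #{email} from #{csv_path}"
  end
end

email    = ARGV[0] || 'demo@example.com'   # hypothetical defaults
csv_path = ARGV[1] || 'demo_data.csv'

script = DemoDataGenerator.process_csv_demo_data(email, csv_path)
# The trailing exit is what lets the remote Heroku console terminate
# once the piped script has finished (see Execute It below).
script << "\nexit"
puts script
```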
Execute It
In my previous post, I wrote about how to run a local script in a remote Heroku console. Of particular interest will be putting an exit at the end of your script and using the --no-tty option. (Note: you’ll want to add the exit in the rails runner wrapper script, not in the code that you run through your automated test.)
heroku run console --app=my-heroku-app-production --no-tty < generated_script.rb
Final Thoughts
This was a pretty simple example, but you should get the idea. You can take it further by doing things like adding sanity checks that must pass before anything is changed in production. And the best part is, you don’t need to deploy anything to production if you decide to change the import logic, if the CSV structure changes, or if you need to run an entirely different utility script in the production environment.
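As a concrete example of such a sanity check, the generator could embed guard clauses at the top of the generated script. Here’s a sketch with made-up values:

```ruby
# Values the generator embeds at generation time (all hypothetical).
user_email    = "demo@example.com"  # the account the script is about to modify
expected_rows = 3                   # row count taken from the source CSV

parsed_records = [
  { steps: 6432 }, { steps: 9076 }, { steps: 3012 }
]

# Fail loudly before any writes if anything looks off.
raise "refusing to touch a non-demo account" unless user_email.end_with?("@example.com")
unless parsed_records.size == expected_rows
  raise "expected #{expected_rows} rows, got #{parsed_records.size}"
end
```

Because the checks run inside the generated script, a mismatch aborts the console session before the transaction ever starts.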