In this post, I will review options for parsing Excel files using Ruby. I’ll discuss the different types of Excel files and introduce some of the Ruby libraries that exist for working with them. Note that I focus mostly on reading Excel files in this post, but there is some discussion around writing/updating them as well.
If you’d like to go straight to the code, I’ve put a project on GitHub with example snippets for reading an Excel file with each library discussed in this post.
Excel File Types
Before we get into the different Ruby libraries, let’s talk about Excel files. It is important to identify the type of Excel files that you are going to be using. There are two main types: legacy files and the newer OOXML file format introduced in Microsoft Office 2007.
There is a nice description of the differences on Wikipedia. The tldr; version is that the legacy file format includes files with the following extensions:
|.xls||legacy Excel file|
|.xlt||legacy Excel file template|
|.xlm||legacy Excel file with macros|
Microsoft Excel 2007 abandoned the legacy binary format and switched to the Open Office XML (OOXML) format that is used today. These files use the following extensions:
|.xlsx||OOXML Excel file|
|.xlst||OOXML Excel file template|
|.xlsm||OOXML Excel file with macros|
It’s important to determine if you are working with legacy or OOXML Excel files. You can usually convert back and forth between the different formats if you have a copy of Excel, but chances are you don’t want to rely on a manual file format conversion. In my scenario, I was receiving Excel files from an external source and couldn’t control the file format.
It is also worth noting that the modern
.xlsx files usually work in other spreadsheet programs that can read the OOXML format, such as Numbers and LibreOffice.
Ruby Excel Libraries
There are many Ruby libraries for working with Excel–perhaps too many. When researching the different libraries, it took some effort to understand their features and limitations. I found the following questions helpful when reviewing each library:
- What types of Excel files does it support?
- Does it support reading, writing, or both?
- Can it handle large files? Is it fast?
- Does it require a file as input, or can it use a stream?
Depending on your application, some or all of those questions may be important.
Based on your needs, one or more of the above libraries may be helpful. Consider the following usage scenarios.
Writing .xlsx Files
If you need to write
.xlsx files, axslx is a good place to start. It allows writing cell values, along with generating charts and graphs. If you need something lighter-weight, rubyXL is a nice option.
Reading .xlsx Files
If you only need to read
.xlsx files, you can choose between rubyXL, roo, creek, and simple_xlsx_reader. roo is very popular–likely because it also supports the legacy
.xls format. However, if you read on to the performance section, creek and simple_xlsx_reader are much better at handling larger files. If you want to read from an
IO stream instead of a file, rubyXL is the only library that supports that.
Reading and Writing .xlsx Files
If you need to read and write
.xlsx files, you have a couple of options. You can use rubyXL, which supports reading and writing. Alternatively, you could use two different libraries–one for reading and the other for writing.
Reading and Writing Legacy Files
Support for the legacy
.xls format is more limited. If you only need to support
.xls, I would recommend spreadsheet. It supports reading and writing. If you need to also support
.xlsx files, I would probably recommend a second gem…unless you only need to read the files, in which case, roo allows reading legacy and modern Excel files.
The good news is that regardless of which library you end up choosing, the code to open a file and read is it pretty simple, and it looks very similar across the different libraries. As an example, the below code reads all of the rows in an Excel file using creek.
require 'creek' workbook = Creek::Book.new 'path/to/file.xlsx' worksheets = workbook.sheets worksheets.each do |worksheet| worksheet.rows.each do |row| row_cells = row.values # do something with row_cells end end
I have samples for reading
.xlsx files using each of the above libraries in the linked GitHub project.
If you need to read large Excel files, you may want to compare the performance between the different libraries. I created a quick and dirty benchmark comparing the four libraries that can read
.xlsx files in the above table.
I created sample
.xlsx files with 500, 10000, 50000, 200000, and 500000 rows. Then, I ran a code snippet using each library that parsed each file (e.g., read every row in the file). The snippets used for each library and the sample Excel files are available here.
I ran each library on each file three times and recorded the median time (the times didn’t vary much at all).
Both rubyXL and roo performed about the same, requiring over two minutes to parse a 500000-row Excel file. creek and simple_xlsx_reader were both much faster, requiring less than a minute to parse the 500000-row file.
I hope this post gave you some direction for working with Excel files in Ruby. If you are using a library that I didn’t cover and you like it, please let me know.
This was super-helpful, and I really appreciate your taking the time to run benchmarks. This was exactly what I needed. Thanks so much!
Thanks Darin, glad you found it helpful.
Nice article, very useful. Thank you.
Exactly what I was looking for, thanks!
Thanks for this comparison Matt, much appreciated!
Nice and elegant comparison Matt, thanks much!!
This was very helpful! I would like to know something though. Is it possible to use one gem to read a file then use a different gem to add content to it?
That seems like it should be possible.
I don’t think so. The library (gem) needs to have read the content to change it. You cannot load data to spreedshet gem and then use axlsx classes to operate on it, as axlsx is empty.
You you could build your own xlsx on the content of source file. You read input with spreedshet and generate new output file with axlsx.
The simple is to use the gem with read/write support. So you will have update option in gem.
Very good summary. I’m using a mix of 3 to support xls/x. axlsx for writing, spreadsheet and simple_xlsx_reader for reading.
Roo Doesn’t support conditional formatting. Does any of the above library supports reading sheets with conditional formatting
Thank you, sir! This was very helpful!
Very well written post.
Does any of them support reading xls with formulae values( end result )?
Did you run perf tests for the ‘spreadsheet’ gem, by any chance?
Thank you very much for this nice overview and code snippet. Exactly what I needed.
I need to parse an excel sheet & insert rows in between. This new rows should have same template & formulas and data similar to previous rows. Something like adding a new row which is a clone of previous row.
Which gem will support this?
Thanks in advance
Comments are closed.