Having written my fair share of application data exports and imports, using CSV for tabular data has usually appeared to be an obvious choice. But lately, I’ve changed my mind on this. I’d argue you should write XLSX instead.
Let’s take a look at the tradeoffs.
- Not a proprietary format
- Simple text data, no presentational concerns to worry about
- Very easy to generate
- CSV has no way to specify the character encoding used by a file.
- People make up their own rules regarding delimiters and escaping special characters. The file itself does not specifically indicate what these rules are.
- A lot of complexity shakes out of handling the above.
- There are no data types built in. Everything is a string.
The attractiveness of CSV clearly lies in its apparent (and misleading) simplicity and the ease with which one can generate it. On the other hand, there’s a significant amount of complexity and uncertainty involved in reading it.
Take, for example, importing it into Excel. It’s a multi-step configuration process. Yes, you can open a CSV file in Excel without first configuring how to interpret it. If you do this, Excel will just make some assumptions about the data. One of these assumptions, however, is that your data is plain ASCII. You’d better hope that there are no unicode characters, because Excel will happily interpret them incorrectly by default.
- Solves the problem of escaping delimiters and special characters
- Specifies the character encoding (safe for UTF)
- Has explicit data types, e.g., integers and dates
- Much more difficult to generate and parse
- Added complexity of dealing with sparse data
- Added complexity from the concepts of workbooks and sheets
- Added complexity of dealing with presentational concerns (e.g., cell colors)
There’s essentially no chance that you’re going to be able to export data in an XLSX spreadsheet without using a library to do all the work. If a good library doesn’t exist, you are probably out of luck.
If a library does exist, it’s not much more work than properly exporting CSV. In return, you get more control over how the data is presented and formatted. There’s a standard way to format date values, for example.
CSVs simply have inherent problems, and it’s more likely than you think that you’ll run into them. If, for example, your data includes the names of real people, you’re already guaranteed to run into problems with unicode.
When you analyze the tradeoffs, there really isn’t any category where CSV actually wins. Moreover, XLSX files are also simply more familiar to most people and provide a better user experience.
From now on, I’m defaulting to exporting XLSX files instead.