ReBuildAll Blog
Thoughts (mostly) on .NET development

Splitting CSV data with Regex (Tips & Tricks)
The other day I needed to process CSV data from a .NET program. The basic processing is quite simple: you can just call string.Split() to get the job done. But I was faced with a CSV file that contained items in quotes. And of course a comma is allowed inside the quotes, which makes string.Split() quite useless, because it cuts quoted items apart at every comma.
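To make the problem concrete, here is a minimal sketch of what goes wrong. The sample line is made up for illustration and is not taken from the file I was processing.

```csharp
using System;

public static class NaiveSplitDemo
{
    public static void Main()
    {
        // Hypothetical sample line: the second field contains a comma inside quotes.
        string csvLine = "1,\"Doe, John\",Helsinki";

        // Plain Split knows nothing about quotes and cuts at every comma.
        string[] parts = csvLine.Split(',');

        // Four pieces come back instead of the three intended fields:
        // the quoted field is cut in two at its inner comma.
        Console.WriteLine(parts.Length);
        foreach (string part in parts)
        {
            Console.WriteLine(part);
        }
    }
}
```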

I asked around and found that people were using a library available from CodeProject. You can find the actual project here. It provides quite robust CSV processing capabilities. Unfortunately, I was in a hurry and did not want to play around with a library (I know: bad me). I needed a regular expression that could split my string.

As it turned out, there were some articles to be found around the internet, but none offered a perfect solution. I finally stitched together a regex from various sources and Google cache entries.

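// Split on a comma only if an even number of quotes follows it, i.e. the comma is outside a quoted item.
Regex rex = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");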

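This worked for my case. It is not a general CSV parser: the resulting items keep their surrounding quotes, and a quoted item that spans several lines will not survive line-by-line processing. But if you have CSVs with quotes and need a fast solution, you can just take the above regex and use it to split: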

string[] result = rex.Split(csvLine);
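Put together, the whole thing might look like the sketch below. The class name, the SplitLine and Unquote helpers, and the sample line are my own illustrative additions, not part of any library. The quote stripping assumes that a quoted item starts and ends exactly with a quote character (no padding spaces) and that a doubled quote inside it stands for a literal quote.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CsvSplitDemo
{
    // Same pattern as above: a comma followed by an even number of quotes.
    private static readonly Regex Rex =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

    // Hypothetical helper: splits one CSV line and strips surrounding quotes.
    public static string[] SplitLine(string csvLine)
    {
        return Rex.Split(csvLine)
                  .Select(Unquote)
                  .ToArray();
    }

    // Removes one pair of enclosing quotes and collapses doubled quotes.
    // Items that are not fully enclosed in quotes are returned unchanged.
    private static string Unquote(string item)
    {
        if (item.Length >= 2 && item[0] == '"' && item[item.Length - 1] == '"')
        {
            return item.Substring(1, item.Length - 2).Replace("\"\"", "\"");
        }
        return item;
    }

    public static void Main()
    {
        // Hypothetical sample line with a comma inside a quoted item.
        string[] items = SplitLine("1,\"Doe, John\",Helsinki");

        // Three items: 1 / Doe, John / Helsinki
        foreach (string item in items)
        {
            Console.WriteLine(item);
        }
    }
}
```

The regex itself only decides where to cut. Cleaning up the quotes is kept in a separate step so that it can be dropped if the raw items are what you need.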

Comments

Jouni Re: Splitting CSV data with Regex
Yes. My point exactly. ;-)
Lenard Gunda Re: Splitting CSV data with Regex
I actually asked around here at work at the time, and all I got was the recommendation for the CodeProject library referenced in the blog post :)
Jouni Re: Splitting CSV data with Regex
You most definitely should have used my implementation of a CSV parser, available at http://www.heikniemi.net/hardcoded/2004/10/csv-parser-for-c/ ;-)

(oh, and it's also included inside the basic tool library at work - the fact that I'm telling you this here reminds me that the internal library documentation definitely needs some polish)