One downside of using the normal Unix utilities for csv files is that handling quoted fields is complicated. If you have a comma inside a field, the field needs to be quoted.
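To make the quoting problem concrete (a minimal Python sketch, not from the thread itself): naive comma-splitting, which is effectively what cut -d, or awk -F, does, miscounts fields as soon as one is quoted, while a real CSV parser gets it right:

```python
import csv
import io

line = 'id,name,notes\n1,"Doe, Jane",ok\n'

# Naive splitting (what cut -d, / awk -F, effectively do): the comma
# inside the quoted field is treated as a separator, one field too many.
naive = line.splitlines()[1].split(",")
print(naive)  # ['1', '"Doe', ' Jane"', 'ok']

# A real CSV parser respects the quotes.
rows = list(csv.reader(io.StringIO(line)))
print(rows[1])  # ['1', 'Doe, Jane', 'ok']
```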
I’ve used xsv table, and xsv includes other useful commands for csv files that you can pipe together: https://github.com/BurntSushi/xsv

Yeah, quoting is one reason I started using TSV over CSV.
There’s an RFC that actually defines TSV not to contain tabs! So you can’t express them at all. But at least it’s consistent, unlike CSV. I have seen some really horrible things people call “CSV”, e.g. emitted by hardware!
So I have an extension planned which I’m now calling QTSV: https://github.com/oilshell/oil/wiki/TSV2-Proposal
It’s very simple in that you’re allowed to put QSN in cells: http://www.oilshell.org/release/latest/doc/qsn.html
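As a rough illustration of the idea (my own sketch; qsn_encode is a hypothetical helper, not Oil’s actual API — see the linked QSN doc for the real spec), a QSN-style encoder wraps the value in single quotes and backslash-escapes the characters TSV can’t carry:

```python
# Hypothetical sketch of QSN-style cell encoding; the real QSN spec
# has more escapes (\xFF, \u{...}) than shown here.
def qsn_encode(s: str) -> str:
    escapes = {"\\": "\\\\", "'": "\\'", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
    return "'" + "".join(escapes.get(ch, ch) for ch in s) + "'"

cell = "has\ta tab"
encoded = qsn_encode(cell)
print(encoded)  # contains no literal tab or newline, so it is safe in a TSV field
assert "\t" not in encoded
```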
So a QTSV file is syntactically a TSV file, and it will work fine with column, and be able to express tabs in fields.

I wonder why tooling doesn’t emit the record separator and friends, which are specified by ASCII…
I can think of a couple reasons:

text editors don’t support them well. If tools don’t support them, then they might as well not exist. Imagine if text editors didn’t support the \n character to mean the end of line – would people still use it?

ditto for terminals. It’s nice to be able to view a TSV or CSV file in a terminal.

grep doesn’t work anymore, since it’s inherently line-based.

You still have an escaping problem. Just like TSV fields can’t contain tabs, RS-delimited files can’t have fields containing RS. So it just shifts the problem around.

(QTSV doesn’t have these problems. Being able to just grep is a feature, but you could also make your own qtsv-grep, etc.)

Yes, this is a classic problem. Fortunately CSV is one of the formats supported by Relational pipes, so you can do e.g.

cat data.csv | relpipe-in-csv | relpipe-out-tabular | less -RSi
and see the well-parsed CSV content. Or do some filtering or transformation using relpipe-tr-sql, relpipe-tr-awk, relpipe-tr-cut, relpipe-tr-grep, relpipe-tr-sed, relpipe-tr-scheme etc. You can also convert data from different formats to valid CSV using relpipe-out-csv.
FWIW I’ve had relational pipes on my wiki for a while:
https://github.com/oilshell/oil/wiki/Structured-Data-in-Oil
Although I don’t think the use of binary formats plays well with shell-like tools. Also, it seems there is no spec, e.g. for how to escape the record separator.
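For what it’s worth, the “how do you escape the record separator” question bites any framing scheme. A small sketch of mine (not relpipe’s actual format) using the ASCII unit/record separators shows that a field containing the separator silently corrupts the framing:

```python
US, RS = "\x1f", "\x1e"  # ASCII unit separator / record separator

def encode(records):
    # Join fields with US and records with RS -- no escaping at all.
    return RS.join(US.join(fields) for fields in records)

def decode(blob):
    return [rec.split(US) for rec in blob.split(RS)]

ok = [["a", "b"], ["c", "d"]]
assert decode(encode(ok)) == ok   # round-trips fine...

bad = [["a" + RS + "x", "b"]]     # ...until a field contains RS itself:
assert decode(encode(bad)) != bad # the field is split into two records
```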
Any idea why we do not use escaping for commas that are not part of the field spec?
My guess is that someone had the idea to separate numeric and string fields by enclosing strings in double quotes. This is supported by a SuperCalc manual from 1983 (linked from the Wikipedia page for CSV).
Quotes are also used to embed newlines into a field, so you’d need some mechanism for escaping those as well.
Indeed. But I wonder if the same mechanism can be used.
Yeah, I think it could. There might be some trickiness involved in handling CRLF though.
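To make the trade-off concrete, here’s a small Python sketch (mine, not from the thread) of the standard CSV mechanism: the writer quotes any field containing a newline or a quote, doubling embedded quotes, and note the CRLF record terminator that makes a backslash-escaping scheme tricky:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["multi\nline", 'has "quotes"'])

raw = buf.getvalue()
# Both fields get quoted; embedded quotes are doubled, and the
# record terminator is CRLF per RFC 4180.
print(repr(raw))  # '"multi\nline","has ""quotes"""\r\n'

# The reader undoes the quoting, including the embedded newline.
rows = list(csv.reader(io.StringIO(raw)))
assert rows == [["multi\nline", 'has "quotes"']]
```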
Nothing against combining some shell utilities, and today I was reminded of column. But my go-to for handling csv / tsv files is definitely csvkit.

There’s also csvtk.
Came here just to say that :) csvkit is pretty great. For me, it’s a must for handling CSV files.
This is so useful, especially with those 250-column tables.
As alternatives to piecing together column, less, and perl, some full-featured tools for tabular data at the command line I’d recommend:

VisiData. Think “vim for tabular data”. Rewards time spent learning it. By default it’s a superior tabular data viewer that can save you from having to open a csv/spreadsheet in Excel/LibreOffice.

xsv. Also recommended elsewhere in this thread. It’s useful for quick tasks like grabbing the columns with xsv headers, giving you a table view — like the original author is trying to achieve — with xsv table, or counting the records with xsv count (keep in mind wc -l is inadequate for CSV files because fields may contain newlines).

Nu shell. Give it a <file.csv nu -c 'cat | from csv' and you’re “in”, and you can do anything nu can do with tables. This includes outputting a table view by default (so, this is a third way to view tabular data nicely). It can also output into another format such as tsv or json (<file.csv nu -c 'cat | from csv | to tsv'), which I suppose makes Nu similar to relational pipes (mentioned elsewhere in the thread).

Finally, while not directly related to command line usage, I’d recommend using a “rainbow csv” extension for your text editor that makes it easier to quickly look at csv files without requiring another tool.
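The wc -l caveat is easy to demonstrate (a Python sketch of mine, standing in for xsv count):

```python
import csv
import io

data = 'name,notes\nalice,"line one\nline two"\nbob,plain\n'

physical_lines = data.count("\n")  # what wc -l would report
records = len(list(csv.reader(io.StringIO(data)))) - 1  # minus the header row

print(physical_lines, records)  # 4 2 -- the quoted newline inflates the line count
```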
I work with TSV files heavily in my day job but was somehow entirely unaware of column. Thank you!