1. 21
  1.  

  2. 12

    One downside of using the normal Unix utilities for csv files is that handling quoted fields is complicated. If you have a comma inside a field, the field needs to be quoted.

    I’ve used xsv table, and xsv includes other useful commands for csv files that you can pipe together: https://github.com/BurntSushi/xsv

    1. 2

      Yeah, quoting is one reason I started using TSV over CSV.

      There’s an RFC that actually defines TSV not to contain tabs! So you can’t express them at all. But at least it’s consistent, unlike CSV. I have seen some really horrible things people call “CSV”, e.g. emitted by hardware!


      So I have an extension planned which I’m now calling QTSV: https://github.com/oilshell/oil/wiki/TSV2-Proposal

      It’s very simple in that you’re allowed to put QSN in cells: http://www.oilshell.org/release/latest/doc/qsn.html

      So a QTSV file is syntactically a TSV file, and it will work fine with column, and be able to express tabs in fields.

      1. 3

        I wonder why tooling doesn’t emit the record separator and friends, which are specified by ASCII…

        1. 2

          I can think of a couple reasons:

          • text editors don’t support them well. If tools don’t support them, then they might as well not exist. Imagine if text editors didn’t support the \n character to mean the end of line – would people still use it?
          • ditto for terminals. It’s nice to be able to view a TSV or CSV file in a termial.
          • grep doesn’t work anymore, since it’s inherently line-based.
          • You still have an escaping problem. Just like TSV fields can’t contain tabs, the RS delimited files can’t have fields with RS. So it just shifts the problem around.

          (QTSV doesn’t have these problems. Being able to just grep is a feature, but you could also make your own qtsv-grep, etc.)

      2. 2

        One downside of using the normal Unix utilities for csv files is that handling quoted fields is complicated. If you have a comma inside a field, the field needs to be quoted.

        Yes, this is a classic problem. Fortunately CSV is one of formats supported by Relational pipes, so you can do e.g.

        cat data.csv | relpipe-in-csv | relpipe-out-tabular | less -RSi
        

        and see the well-parsed CSV content. Or do some filtering or transformation using relpipe-tr-sql, relpipe-tr-awk, relpipe-tr-cut, relpipe-tr-grep, relpipe-tr-sed, relpipe-tr-scheme etc. You can also convert data from different formats to valid CSV using relpipe-out-csv.

        1. 1

          FWIW I’ve had relational pipes on my wiki for awhile:

          https://github.com/oilshell/oil/wiki/Structured-Data-in-Oil

          Although I don’t think the use of binary formats plays well with shell-like tools. Also it seems there is not a spec, e.g. how do you escape the record separator, etc.

        2. 1

          Any idea why we do not use escaping for commas that are not part of the field spec?

          sentence, here is an escaped\, comma, and more

          1. 1

            My guess is that someone had the idea to separate numeric and string fields by enclosing strings in double quotes. This is supported by a SuperCalc manual from 1983 (linked from the Wikipedia page for CSV).

            1. 1

              Quotes are also used to embed newlines into a field, so you’d need some mechanism for escaping those as well.

              1. 1

                Indeed. But I wonder if the same mechanism can be used

                 sentence, here is an escaped\
                 newline, and more
                
                1. 1

                  Yeah, I think it could. There might be some trickiness involved in handling CRLF though.

          2. 4

            Nothing against combining some shell utilities and today I was reminded of column. But my goto for handling csv / tsv files is definitely csvkit.

            1. 3

              There’s also csvtk

              1. 2

                Came here just to say that :) csvkit is pretty great. For me, it’s a must for handling CSV files.

              2. 3

                This is so useful. Especially with those 250 column tables

                1. 3

                  As alternatives to piecing together column, less, and perl, some full-featured tools for tabular data at the command line I’d recommend:

                  1. VisiData. Think “vim for tabular data”. Rewards time spent learning it. By default it’s a superior tabular data viewer that can save you from having to open a csv/spreadsheet in Excel/LibreOffice.
                  2. xsv. Also recommended elsewhere in this thread. It’s useful for quick tasks like grabbing the columns with xsv headers, giving you a table view — like the original author is trying to achieve — with xsv table, or counting the records with xsv count (keep in mind wc -l is inadequate for CSV files because fields may contain newlines).
                  3. Nu shell. Give it a <file.csv nu -c 'cat | from csv' and you’re “in”, and you can do anything nu can do with tables. This includes outputting a table view by default (so, this is a third way to view tabular data nicely). It can also output into another format such as tsv or json (<file.csv nu -c 'cat | from csv | to tsv'), which I suppose makes Nu similar to relational pipes (mentioned elsewhere in the thread).

                  Finally, while not directly related to command line usage, I’d recommend using a “rainbow csv” extension for your text editor that makes it easier to quickly look at csv files without requiring another tool.

                  1. 2

                    I work with TSV files heavily in my day job but was somehow entirely unaware of column. Thank you!