1. 10
  1.  

  2. 2

    All valid points, in the most general case. For any kind of ‘production’ system which could handle unknown data or format variants, there are plenty of good libraries out there, and it would be irresponsible not to use them.

    But… for disposable scripts to process known-boring CSV data, I think it’s usually fine to just read lines and split on commas. I do that a lot, and I’ve never regretted it. On the other hand, I’ve often regretted overcomplicating otherwise simple programs by adding unused generality.
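
    A minimal sketch of that read-lines-and-split approach in Python. It assumes the data has no quoted fields and no commas or newlines inside values; the function name and the sample rows are made up for illustration.

    ```python
    def read_simple_csv(lines):
        """Parse known-simple CSV: no quoting, no embedded commas or newlines."""
        rows = []
        for line in lines:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            rows.append(line.split(","))
        return rows


    # Hypothetical sample data; any iterable of lines (e.g. an open file) works.
    sample = ["name,qty\n", "apple,3\n", "pear,5\n"]
    header, *records = read_simple_csv(sample)
    total = sum(int(qty) for _name, qty in records)
    print(header, total)
    ```

    If a quoted field or an embedded comma ever shows up, this silently produces wrong columns, which is the point at which a real CSV library earns its keep.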

    1. 2

      I have no idea why tab-separated files are not more popular; they are supported everywhere, including Excel and Google Sheets. The format solves a lot of problems around custom importers (e.g. you can even just use awk), as most datasets do not include tabs or newlines in their values.
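
      As a rough sketch of how little a tab-separated importer needs, here is a Python version, assuming no value contains a tab or a newline. The function name and the sample rows are hypothetical.

      ```python
      def read_tsv(lines):
          """Split each non-empty line on tabs; commas in values need no quoting."""
          return [
              line.rstrip("\r\n").split("\t")
              for line in lines
              if line.strip("\r\n")  # ignore empty lines
          ]


      # Hypothetical sample: the second field contains a comma, which is harmless here.
      sample = ["city\tnote\n", "Paris\tcold, wet\n"]
      rows = read_tsv(sample)
      print(rows[1][1])
      ```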

      1. 2

        There are a couple of issues with tabs:

        • It’s not always clear whether a character is a tab or a space.
        • Most programming editors support converting tabs to spaces and this option is often enabled.

        If you can choose the format, then pipe (|) delimiters are often a good option.
        Pipes show up in data values much more rarely than newlines or tabs.

        1. 1

          wot?! This is about data ingestion between tools, not manually editing a CSV file in a text editor.

          1. 2

            CSV files are opened and edited manually all the time in real life.

        2. 2

          By extension, I suppose ASCII-separated values (using the dedicated unit and record separator control characters) would be even better.
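
          A small sketch of what that could look like in Python, using the ASCII unit separator (0x1F) between fields and the record separator (0x1E) between rows. The helper names are made up, and it assumes values never contain those two control characters.

          ```python
          US = "\x1f"  # ASCII unit separator: between fields
          RS = "\x1e"  # ASCII record separator: between records


          def dumps(rows):
              """Join fields with US and records with RS; no quoting or escaping."""
              return RS.join(US.join(fields) for fields in rows)


          def loads(text):
              """Inverse of dumps, assuming values never contain US or RS."""
              return [record.split(US) for record in text.split(RS)] if text else []


          # Commas, tabs, pipes and newlines inside values all survive the round trip.
          rows = [["id", "note"], ["1", "has, commas\tand | pipes\nand newlines"]]
          round_tripped = loads(dumps(rows))
          print(round_tripped == rows)
          ```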

          But, hey, network effect strikes again. I’m not gonna tell data producers how to format their products. When in Rome, etc.