1. 12
  1.  

  2. 2

    It might sound like a silly question, but if I already have the file on my local filesystem, what do I gain by adding a network layer on top of it?

    1. 3

      In my company we have a lot of slowly-changing data sets. This looks like a way to distribute that data via API without all the machinery of a database underneath it.

      Combine it with immutable infrastructure and we could have a version-controlled file distributed at deployment time, without the need for custom source code for each dataset.

      1. 3

        This is one of the main inspirations behind roapi! We have lots of datasets that get updated once a day by ETL pipelines. Right now we export that data back to MySQL at the end of every pipeline run, but MySQL is huge overkill for this particular use-case. The export process can easily overload the MySQL instance and impact ongoing production traffic too. Just serving the data from a stateless read-only API backed by an embedded analytical query engine is a much simpler and more scalable setup. Newly produced data can be served by deploying a new version of the API without any impact on existing traffic.
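
        For a sense of what that looks like from the consumer side, here is a minimal Rust sketch that queries such an API through its SQL frontend (`POST /api/sql`, per the ROAPI README). The host, table, and column names are made up, and it assumes the `reqwest` crate with the `blocking` feature enabled:

        ```rust
        // Hypothetical client for a ROAPI instance serving a daily ETL export.
        // The endpoint follows the SQL frontend documented in the ROAPI README;
        // host, table, and column names are placeholders.
        use std::error::Error;

        fn main() -> Result<(), Box<dyn Error>> {
            let client = reqwest::blocking::Client::new();

            // The request body is a plain SQL string; the server executes it
            // against the read-only dataset and returns JSON rows.
            let resp = client
                .post("http://localhost:8080/api/sql")
                .body("SELECT city, price FROM listings ORDER BY price DESC LIMIT 5")
                .send()?
                .text()?;

            println!("{resp}");
            Ok(())
        }
        ```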

      2. 2
        • The data is centralised by default.
        • The data is read-only.
        • You can edit the underlying file at any time and your users will see the new data without needing to re-download anything.
        • Serve via HTTPS so the reader can verify the source of the data, know it was transmitted privately and wasn’t tampered with during transit.
        • You only need to expose a network connection, not the file system. Keeps the attack surface smaller.
        • Whatever is consuming it can query it using SQL.
        • Get all the speedups that Parquet offers over CSV.
        • Single command on the CLI, very little to mess up.
        1. 2

          If you are ok with distributing the same dataset to all nodes that need access to the data, then you don’t need the extra network layer. In fact, roapi comes with the columnq library and CLI to help you perform the same type of analytical queries over local datasets, see https://github.com/roapi/roapi/tree/main/columnq-cli and https://github.com/roapi/roapi/tree/main/columnq.
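
          As a rough illustration of that local, no-network path, here is a sketch that runs the same kind of analytical query with DataFusion, the embedded query engine ROAPI is built on; columnq’s own API differs, and the file path and column names are placeholders. It assumes the `datafusion` and `tokio` crates:

          ```rust
          // Query a local Parquet file in place -- no server, no network layer,
          // just an embedded engine over the local filesystem. DataFusion is
          // used here for illustration; columnq wraps this kind of engine.
          use datafusion::prelude::*;

          #[tokio::main]
          async fn main() -> datafusion::error::Result<()> {
              let ctx = SessionContext::new();

              // Register the local file as a queryable table (path is made up).
              ctx.register_parquet(
                  "listings",
                  "data/listings.parquet",
                  ParquetReadOptions::default(),
              )
              .await?;

              ctx.sql("SELECT city, avg(price) FROM listings GROUP BY city")
                  .await?
                  .show()
                  .await?;

              Ok(())
          }
          ```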

          But if you have a 10GB dataset you want to serve and you have 1000 clients, you probably don’t want to copy and duplicate that 10GB of data to all the clients. This is where ROAPI will meet your needs. Or if your clients are all written in different languages and you want to provide a consistent query interface and capabilities for all of them, wrapping the data behind an API is a good idea even if the dataset is small.
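
          Since any language with an HTTP client gets the same interface, each consumer stays small. Here is a hypothetical Rust example against the REST frontend (`GET /api/tables/{table_name}` per the ROAPI README); the host, table, and query parameters are invented for illustration:

          ```rust
          // Fetch a filtered slice of a served table over plain HTTP.
          // Table name and query parameters are placeholders.
          fn main() -> Result<(), Box<dyn std::error::Error>> {
              let url = "http://localhost:8080/api/tables/listings?columns=city,price&limit=10";
              let body = reqwest::blocking::get(url)?.text()?;
              println!("{body}");
              Ok(())
          }
          ```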

        2. 2

          Cool project! It looks like a great little tool for improving quality of life for anything that’s still just some hosted file.

          It would be pretty fun to make a little static HTML file for executing the basic commands and such from the web browser (sort of like an OpenAPI spec browser).

          1. 2

            yep, that’s a pretty cool idea, we were discussing this a couple of weeks ago in https://github.com/roapi/roapi/issues/80#issuecomment-923160321.

          2. 1

            ROAPI is made up of 4K lines of Rust. This line count is low due to the intense use of 3rd party libraries

            I’m not saying it could or should have been smaller, but I don’t think 4K can be called “low”, especially when you’re using so many libraries you had to call it “intense”.

            1. 1

              There are very few OLAP systems out there with less than 100K lines of code.

              4K lines means you can review the code in minutes or hours, rather than taking days or never fully reviewing it at all.