1. 17

  2. 6

    Since it seems like horizontal scalability is important to you and you are mainly looking for support for key-based GETs similar to S3/GCS on your data, you may also want to look at HBase, Cassandra, or ScyllaDB. These three are built specifically for horizontal scalability and for storing lots of files, especially small files, with fast concurrent reads, high availability, and low cost. I personally wrote about our experience with Cassandra for storing hundreds of billions of 4-10 KB records here.

    1. 1

      It took me a while to get them working, but I tried out both Cassandra and ScyllaDB. They are not the easiest to install on Debian 10 at the moment. Cassandra is stupidly fast. On my dedi it’s operating at nearly the sequential MB/s of the HDD with writes. That’s an 8-core machine, single 7200rpm disk, with 8 Python clients on the same machine, each using execute_async to repeatedly load { uuid : 6KB } payloads (similar to my use case) for 60 seconds. With a single disk I could very nearly saturate a 1Gbps connection. Makes me giddy to be honest. I tried BatchStatements too but async was much faster.
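      The load test described above can be sketched roughly as follows. This is a hedged reconstruction, not the actual test code: the keyspace, table name, schema, and in-flight limit are all assumptions; `execute_async` and prepared statements are the real cassandra-driver API.

```python
# Sketch of an async write load test: N of these clients run in parallel,
# each firing execute_async() inserts of { uuid : ~6 KB blob } rows for a
# fixed duration. Assumed schema (not from the post):
#   CREATE TABLE loadtest.blobs (id uuid PRIMARY KEY, payload blob);
import os
import time
import uuid

def make_payload(size=6 * 1024):
    """Return a (uuid, blob) pair roughly matching the 6 KB records."""
    return uuid.uuid4(), os.urandom(size)

def run_client(duration_s=60, inflight=128):
    # Non-stdlib dependency: pip install cassandra-driver
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("loadtest")  # keyspace assumed to exist
    insert = session.prepare("INSERT INTO blobs (id, payload) VALUES (?, ?)")

    futures, written = [], 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        key, blob = make_payload()
        futures.append(session.execute_async(insert, (key, blob)))
        if len(futures) >= inflight:  # crude backpressure: drain the window
            for f in futures:
                f.result()
            written += len(futures)
            futures = []
    for f in futures:  # drain whatever is still in flight
        f.result()
    cluster.shutdown()
    return written + len(futures)
```

      Capping the number of in-flight futures matters: fire-and-forget `execute_async` with no backpressure will eventually overwhelm the client or the coordinator.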

      The weird thing is that I tried the same with ScyllaDB and it kept dropping connections, having weird semaphore timeout issues, and in general just not loading all of the data during my simple localhost load test. I’m not sure what I’m doing wrong, but I don’t really feel like trying to debug it anymore when I have such an acceptable choice with Cassandra or CockroachDB.

      Your write up is very helpful btw. I’m going to be stealing that xsv formatting trick.

      1. 2

        That’s great to hear. Yea, we found Cassandra to be stupidly fast as well, as long as you keep it simple and ensure the client library isn’t doing too much client-side work. Cassandra was pretty much designed for speed of ingest and speed (esp. end-to-end latency) of concurrent reads, and it is especially fast on the read side when the reads can be disk-aligned (which it tries hard to do).

        I only mentioned ScyllaDB because I have a mental model of, “it’s a clean-house rewrite of Cassandra from Java to C++, in order to remove the ‘JVM tax’ and improve performance further.” I had never actually tried ScyllaDB concretely before. But there’s no need to adopt it up front since the whole point is that it is 100% compatible with the Cassandra wire protocol.

        If you’re happy with Cass, you can probably skip HBase. It’s more complex to set up than Cassandra, and I found that its data model was also more difficult to understand.

        Also, it may not be totally up-to-date, but since you were using Python to saturate Cassandra, you might be interested in a blog post I wrote about the Python Cassandra async event loops and their performance impact on Cassandra ingest. Here it is.
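        The knob being discussed is the driver’s I/O reactor. A minimal sketch of switching it, assuming the libev extension is installed (`connection_class` and `LibevConnection` are the real cassandra-driver API; the host list is an assumption):

```python
# Hedged sketch: the Python cassandra-driver lets you swap its event loop
# via connection_class. Which reactor is default varies by driver version
# and installed extensions, and the choice can noticeably change
# execute_async throughput.
def connect_with_libev(hosts=("127.0.0.1",)):
    # Non-stdlib dependency: pip install cassandra-driver
    # (libev headers must be present at install time for this reactor)
    from cassandra.cluster import Cluster
    from cassandra.io.libevreactor import LibevConnection

    cluster = Cluster(list(hosts), connection_class=LibevConnection)
    return cluster.connect()
```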

        Happy hacking!

    2. 3

      Please use the smallest supporting set of tags for this, e.g., databases and devops. It’ll help more people see it.

      1. 1


      2. 2

        These notes are what I’ve compiled over the past week+ of research, load testing and checking what would work on VPS and dedicated servers with the best $/capability ratios that I could find. If anyone has any input I would take it wholeheartedly.

        1. 1

          It was quick and entertaining to read (I meant to skim, but ended up reading most of it). Very tight, light summaries… A lot of these write-ups tend to run too long, but yours is great. Also, great title. ETA: One question, out of curiosity but probably useful for the article: what were your own driving use cases? (e.g. storage for smaller projects, startup, day job, personal files, etc).

          1. 1

            The purpose is to be able to store a bunch of encrypted files and messages without VC funding or needing ads to do it. So it’s a project or bootstrapped startup, I guess?

        2. 1

          Writing suggestions:

          It’s a bit unclear what the categories “Big boi storage” and “Lil’ boi storage” mean. Maybe just substitute “file” for “boi”? Or find some other funny way to express what you mean a bit more clearly. It’s probably innocuous, but I’d probably degender or invert the gender on it, for extra fun and inclusivity.

          I think you’ve buried the lede a bit, too, by hiding your recommended systems down in the lists and in the conclusion. I would feature them first in each list and in the introduction.

          Content suggestions:

          Did you check out any of the distributed sqlite things?

          1. 6

            Thank you for giving it a read! I sometimes feel like I do too much research trying to find ways to minimize the cost of running things, so it’s nice that the post got some eyes. With the help of your comment I’ve cleaned up the post.

            I did check out a couple of them, but I did not load test either. I decided against them primarily due to scaling issues (rqlite) or ease-of-use issues (actordb). If there are others, I don’t know about them and didn’t look into them. The post has been updated to note that.

            1. 4

              Your article is a great summary of (and introduction to) a lot of storage systems I’ve only heard of. I just wanted to chime in and say, it’s really refreshing to read your avid enthusiasm towards criticism!

          2. 1

            Can I ask what you think about seaweedfs? S3 compatible and optimised for clustering and small files.

            1. 2

              I just gave it a few test runs using the included s3 api server command weed server -s3. The default filer was backed by Leveldb. With small files (6KB) it’s slow compared to Cassandra. It seems to be able to upload around half of what CockroachDB can do. This might be because there’s no batching in the s3 api, but I don’t think it was the lack of asynchronous clients; I used 15 clients simultaneously and didn’t max the CPU, so it’s something else, since requests per second still dropped.
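              For reference, a small-file PUT test like the one above can be pointed at the S3 endpoint that `weed server -s3` exposes with a stock S3 client. A hedged sketch (the bucket name is an assumption; 8333 is SeaweedFS’s default S3 port; SeaweedFS accepts arbitrary credentials unless auth is configured):

```python
# Sketch: time sequential 6 KB PUTs against a SeaweedFS S3 endpoint.
import os
import time
import uuid

def bench_small_puts(n=1000, size=6 * 1024,
                     endpoint="http://127.0.0.1:8333", bucket="bench"):
    # Non-stdlib dependency: pip install boto3
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint,
        aws_access_key_id="any",      # placeholder creds; SeaweedFS ignores
        aws_secret_access_key="any",  # them unless auth is configured
    )
    s3.create_bucket(Bucket=bucket)
    start = time.monotonic()
    for _ in range(n):
        s3.put_object(Bucket=bucket, Key=str(uuid.uuid4()),
                      Body=os.urandom(size))
    return n / (time.monotonic() - start)  # requests per second
```

              Running several of these in parallel processes approximates the 15-client test; per-request HTTP overhead on tiny objects is one plausible reason the numbers trail Cassandra’s batched-free async inserts.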

              When it comes to larger files (6MB) it’s great, all files are uploaded and it’s really, really fast. Adding volumes for buckets seems fairly straightforward in comparison to minio where the sizes of clusters are fixed and the scaling comes from federations of clusters under etcd namespaces.

              Originally I was cursing you because you gave me something else that seemed worth investigating, but now I’m happy I’ve seen it. It should be relatively simple to shove this in place of everything I would’ve been doing with S3 or Wasabi, since it even supports presigned URLs. Thanks. :D. Gotta update the article with the new info and hope no one else has reasonable suggestions.