1. 35

  2. 12

    Another nice project with an S3-compatible API is seaweedfs (née weedfs): https://github.com/chrislusf/seaweedfs, inspired by the Haystack paper (from FB, when FB had around 1-2K photo uploads per second); we use it in production (albeit not in distributed mode). A lightning talk I gave a few months ago: https://github.com/miku/haystack

    1. 1

      If you do not mind, a question – did you find any solutions that are based on JDK 11+ (Java/Clojure/Scala, etc.)? I am looking for an open source file store lib, but I would like it to be compatible with the JVM ecosystem.

      1. 1

        Interesting, I’d assume a JVM ecosystem would permit non-JVM code. Is it a JVM client library you want?

        1. 1

          Not the OP, but I’ve heard that some banks will refuse to deploy any code that doesn’t run on the JVM.

          1. 1

            Wow, do you perhaps have an example, or the country of a possible example?

            I know Crux DB is on the JVM, and it can use, and even encourages, Kafka (famously JVM) as its object store.

            1. 1

              Unfortunately, no. This was just word-of-mouth from people in adjacent businesses so feel free to take it with a grain of salt.

              The general contour of reasoning was that with security being a top concern, they prefer to deploy code in ways that are familiar.

          2. 1

            Thank you for the follow-ups. I would like the whole service to be packageable and accessible as a JAR that I can incorporate into our ‘uber’ JAR.

            One of the ‘features’ of the backend I am working on is simple deployment. In translation, that means a single JAR + PostgreSQL.

            The single JAR contains about 20 ‘microservices’, essentially. So a user can start up just one JAR and have everything on ‘one host’, or start the JAR with config params telling it which microservices to start on which host. That configuration file acts like a ‘static’ service orchestrator: it is the same for all the hosts, but there are sections for each host in the deployment.
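
            A config-per-host scheme like the one described above can be sketched roughly as follows. This is a toy illustration under my own assumptions, not the actual backend's code; all host and service names (host-a, content-server, etc.) are invented, and in the real system the map would be parsed from the shared config file rather than hard-coded:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy sketch of a "static orchestrator": every host ships the same config,
// and each host starts only the services listed under its own name.
public class StaticOrchestrator {

    // Stand-in for the shared, per-host-sectioned configuration file.
    static final Map<String, List<String>> SERVICES_BY_HOST = Map.of(
            "host-a", List.of("api-gateway", "auth", "content-server"),
            "host-b", List.of("indexer", "mailer"));

    // Services this JVM should start: the host's own section if present,
    // otherwise everything (the single-host "one JAR does it all" mode).
    static List<String> servicesFor(String hostname) {
        List<String> own = SERVICES_BY_HOST.get(hostname);
        if (own != null) {
            return own;
        }
        TreeSet<String> all = new TreeSet<>();
        SERVICES_BY_HOST.values().forEach(all::addAll);
        return List.copyOf(all);
    }
}
```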

            One of the microservices (or, easier, just ‘services’) I am going to be enhancing is a ‘content server’. Today the content service basically needs a ‘backend-accessible directory’.

            That service does all the other things: administering the content, acting as an IAM Policy Enforcement Point, caching content in a memory-mapped db (if particular content is determined to be needed ‘often’), non-trivial hierarchical directory management to ensure that too many files do not end up in ‘one folder’, and so on.
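
            One common shape for that kind of hierarchical directory management is to derive the directory path from a hash of the content id, so files spread evenly over a fixed fan-out. A minimal sketch; the two-level 256 x 256 layout and the method name are my assumptions, not the service's actual scheme:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Spread files over 256 * 256 leaf directories by hashing the content id,
// so no single folder accumulates too many entries. Illustrative only.
public class ShardedPath {

    static Path pathFor(Path root, String contentId) throws NoSuchAlgorithmException {
        byte[] d = MessageDigest.getInstance("SHA-256")
                .digest(contentId.getBytes(StandardCharsets.UTF_8));
        // The first two hash bytes choose the two directory levels,
        // e.g. <root>/a3/7f/<contentId>.
        return root.resolve(String.format("%02x", d[0]))
                   .resolve(String.format("%02x", d[1]))
                   .resolve(contentId);
    }
}
```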

            I need to support features where files are in a ‘remote content server’ (rather than in a locally accessible directory), where the content server is S3 (or some other standard-compatible system). So I would like the ‘content server’ to be included as a 21st service in that single JAR.

            Which is why I am not just looking for a client, but for the actual server to be compatible with the JVM. Hope the explanation gives more color to the question I asked.

            With regards to the other comment where folks mention that some organizations, like banks, prefer JVM-only code: that’s true to a degree, though it is a preference, not an absolute requirement.

            That’s because some of these organizations built their own ‘pre-Docker’ deployment infrastructures, where it is easy to request ‘production capacity’ as long as the deployed backend is a JAR (because those deployment infrastructures are basically JVM clusters that support migrations, software-defined networks, load balancing, password vaults, monitoring, etc.).

            So when a vendor (or even an internal team) comes in and says “our solution runs on Docker”, it is OK, but the organization has invested millions… and now wants to continue to get the benefit (internal chargeback, basically) of its self-built infrastructure management tools… Which is why there is a preference for JVM-only solutions, and perhaps will be for some time.

            And to be honest, the JVM (and JVM-based languages) and their tools ecosystem continue to evolve (security, code analysis, performance, etc.); it seems that the decisions back then to invest in managed infrastructure around the JVM were good ones.

      2. 5

        If you want to be even closer to the metal, there’s Ceph’s BlueStore backend, which eschews the file system and writes blocks directly. I don’t have experience with Ceph, but that’s how S3 actually writes to disk.

        1. 1

          Those results are interesting! Has anyone gone a level deeper, though? I’d love to see someone write a driver for an SSD that bypasses not only the file system but also the Flash Translation Layer, exposing a low-level interface that deals in immutable NAND flash pages. The FTL is, after all, only there to serve the mutability needed by the file system sitting in the layer above it. I think many people realize at this point how mutability complicates everything, so by ditching it we could not only speed things up but also simplify things. For instance, LMDB is based on copy-on-write (i.e. immutable) memory pages because it’s the most reliable way. If it could deal in NAND pages directly instead, that would shed two layers of abstraction that slow down and complicate things, with no loss of functionality. The downside is of course being tied to a specific SSD firmware, but the interface could be generalized to other SSD manufacturers.
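
          To make the copy-on-write point concrete, here is a toy page store in the spirit of the LMDB comparison: pages are never modified in place, so an “update” appends a fresh copy, which matches the append-only write pattern NAND pages want. This is an illustration of the idea only, not LMDB’s or any SSD firmware’s actual design:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy copy-on-write page store: every write appends a new page, so old
// page numbers keep returning their original contents (stable reads),
// and nothing is ever rewritten in place.
public class CowPages {
    static final int PAGE_SIZE = 4096;
    private final List<byte[]> pages = new ArrayList<>();

    // Append a brand-new page (padded to PAGE_SIZE); returns its page number.
    int write(byte[] data) {
        pages.add(Arrays.copyOf(data, PAGE_SIZE));
        return pages.size() - 1;
    }

    // "Update" = copy the old page, apply the change, append the copy.
    // The old page number still reads the old data.
    int update(int pageNo, int offset, byte value) {
        byte[] copy = Arrays.copyOf(pages.get(pageNo), PAGE_SIZE);
        copy[offset] = value;
        pages.add(copy);
        return pages.size() - 1;
    }

    byte read(int pageNo, int offset) {
        return pages.get(pageNo)[offset];
    }
}
```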

        2. 3

          I’ve looked at MinIO for my home storage server, to expose files to other applications I run in my internal network for myself and my partner.

          This was a while ago, but skimming through their documentation, I could not find much related to ACLs. Basically what I found is “we don’t support the ACL API; we replaced it with a much weaker system of ‘policies’…” [1][2] And their server security page doesn’t mention ACLs even once, only server-side encryption. I was a little disappointed…

          It’s really sad, because I like a lot of things about this project (open source, written in Go, widely used and stable). OpenShift used to have an object storage API, but I can’t find it anymore :( .

          [1] See “List of amazon S3 API not supported by Minio” , sorry there is no anchor on this title/section.

          [2] https://docs.min.io/docs/minio-client-complete-guide#policy

          1. 1

            > It’s really sad, because I like a lot of things about this project (open source, written in go, widely used and stable). Openshift used to have an object storage API, but I can’t find it anymore :( .

            This thing? https://blog.oddbit.com/post/2021-02-10-object-storage-with-openshift/

            1. 1

              I think that’s it. It has evolved so much since the last time I looked into it. It used to have its own “open API”, which was not at all compatible with S3 and much more REST-y.

              It looks like the objective now is scalability. I was more looking for a replacement for NFS (i.e., a remote file system).

              But thanks for linking this!

              1. 1

                If you’re looking for NFS-like stuff, maybe GlusterFS will be up your alley?

                1. 1

                  I am seriously considering it. I need to look into whether one can do authorization on GlusterFS (remote user A can only mount this path read-only, etc…).

            2. 1

              What exactly do you want to control access to? I’m currently running MinIO in production and set up AWS-style policies in it with very little trouble.

              EDIT: in my case I did actually use it to replace an NFS server

              1. 2

                What I mean is: tokenA can only read in BucketA; tokenB can read/write in BucketA.

                1. 2

                  Yeah, this isn’t a problem with MinIO: you can create users with the mc admin tool, and set policies that are IAM-style JSON, like:

                    {
                      "Version": "2012-10-17",
                      "Statement": [
                        {
                          "Effect": "Allow",
                          "Action": ["s3:ListBucket"],
                          "Resource": ["arn:aws:s3:::*"]
                        },
                        {
                          "Effect": "Allow",
                          "Action": ["s3:*"],
                          "Resource": ["arn:aws:s3:::BucketA/*"]
                        }
                      ]
                    }

                  Then you apply this policy to UserA (who uses TokenA) and he can get to BucketA.

                  it’s an admittedly rough interface, but it totally works

            3. 2

              We host and use the Cloudian object store (and we resell it too), which is probably one of the best S3 clones. It is worth it for customers that get eaten by egress costs.

              Internally we use both.

              It still falls short of S3’s feature set; only recently did they add the ability to put object events on an SQS-like queue. But it has reasonable IAM and bucket policy support. I’ve meant to try it with Hadoop for a while, but I’ve not found the time yet.

              However, the deployment, monitoring and maintenance are absolutely non-trivial. If you plan to host your own, I’d truly recommend a good and comprehensive cost analysis.

              1. 1

                I use Minio at home to act as a blob-store endpoint on a NAS for DVC. It’s great!

                However, while MinIO is compatible with the basic S3 API, it is not a “drop-in” replacement for AWS S3’s features. This is part of the advantages and disadvantages of buying into a cloud provider’s solution: you get more features, but each cloud provider continually adds features to try to build a durable advantage over competitors.

                Also, spinning through the “Is this only useful for a handful of banks?” section: egress network costs are definitely an issue with AWS. However, some of the objections are handled by AWS and other cloud providers, e.g. “doesn’t allow data to be exposed to any internet-facing machine” and “object storage management as a core competency” (since S3 is much more than just object storage).

                This is all part of the swirling pros and cons of cloud providers.

                1. 1

                  I like this article, but I’m not sure why the title says “bare metal” when the tool is wrapping HDFS?

                  1. 12

                    It doesn’t need to use HDFS, and in fact that’s not the usual mode of operation. Normally it manages its own storage.

                    But in any case they’re using “bare metal” to mean that you can run it on actual computers, and not just as a component of some “cloud system” or other. Computing in the 21st century is weird.

                    1. 6

                      It will plunk its files wherever you want. We’re running it on top of ZFS at my work, but it was previously on ext4 (until we created too many files and discovered issues with truncated MD4 hash collision bugs in ext4).

                      1. 4

                        > until we created too many files and discovered issues with truncated MD4 hash collision bugs with EXT4

                        This, alone, is fascinating. Is there a bug report or write up I could look over to learn more? It might make a good Lobsters submission on its own.

                        1. 6

                          I should preface this by saying we had created ~~billions~~ hundreds of millions of files before this was a problem. I believe this blog post goes over the symptoms and cause pretty well.