1. 33

Hey guys, I’ve been recommended to post our new project here to get some feedback, hope it’s fine.

We’ve developed a decentralised peer-to-peer file system that lets you create flexible and controllable storage infrastructures in a few minutes.

We released it a few weeks ago and we’re eager to get some feedback on it. You can read more about it here: http://infinit.sh or see examples of infrastructures you could build with it here: https://infinit.sh/documentation/deployments

Don’t hesitate to ask if you have any questions! Happy to talk about the state of the peer-to-peer and storage world too :)

  2. 4
    • Will this be open sourced, and if so, when?
    • The documentation doesn’t make it immediately clear what the relationship is between the different types of storage (S3, etc.) and the DHT. Could you elaborate?
    • Are the storage layers “hot-swappable”? I.e. can I expand the room in my volume by adding an S3 thing here, a Google thing there, a RAID array later on, and if so is there a Drobo-like algo for re-replicating data?
    • See my question re IPFS.
    1. 3
      • Yes, we want to open source all the client code, especially everything related to encryption. We started with our build system but we will open source more parts in the coming weeks.
      • Storage is where the blocks are kept: the nodes that contribute storage to the DHT read and write blocks from/to it. The DHT is just a way of addressing the blocks so they can be found.
      • Yes, but we haven’t released the feature yet. It’s on our roadmap as “rebalancing” and should be ready around the end of the month.
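To make the split between storage and DHT described above more concrete, here is a minimal Python sketch. It is not Infinit's code: `MemoryStorage`, `Node` and `ToyDHT` are hypothetical names, and the addressing scheme is a plain consistent-hashing ring chosen only for illustration. The point is that the DHT only decides which node is responsible for a block address, while the node's storage backend is what actually holds the bytes.

```python
import hashlib
from bisect import bisect_right


class MemoryStorage:
    """Hypothetical backend; stands in for a local disk, an S3 bucket, etc."""

    def __init__(self):
        self._blocks = {}

    def put(self, address, data):
        self._blocks[address] = data

    def get(self, address):
        return self._blocks[address]


class Node:
    """A node contributes one storage backend to the network."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage
        # Position on the ring, derived from the node name (an assumption).
        self.position = int(hashlib.sha256(name.encode()).hexdigest(), 16)


class ToyDHT:
    """Addressing layer only: maps a block address to the responsible node."""

    def __init__(self, nodes):
        self._ring = sorted(nodes, key=lambda n: n.position)
        self._positions = [n.position for n in self._ring]

    def _owner(self, address):
        key = int(address, 16)
        # First node clockwise from the key, wrapping around the ring.
        index = bisect_right(self._positions, key) % len(self._ring)
        return self._ring[index]

    def store(self, data):
        address = hashlib.sha256(data).hexdigest()
        self._owner(address).storage.put(address, data)
        return address

    def fetch(self, address):
        return self._owner(address).storage.get(address)


# Usage: three nodes, each with its own backend.
dht = ToyDHT([Node(name, MemoryStorage()) for name in ("a", "b", "c")])
address = dht.store(b"some block")
block = dht.fetch(address)
```

Swapping `MemoryStorage` for another backend would not change how blocks are addressed, which is the separation the answer above describes.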
    2. 4

      Just wanted to mention that the documentation seems very thorough and easy to understand. Quite a pleasure to read through. :)

      1. 1

        Thanks, really appreciate it :)

      2. 4

        Love that it is C++. Don’t love that I can’t see the code yet. As a fellow C++ P2P programmer, I would love to see the code to learn, for example the DHT implementation.

        I’ll try this when the source is available and hopefully can provide some good feedback.

        Looking from the outside, the commands seem pretty simple. Does this also work across NAT? Also does it use relays?

        1. 2

          We punch through most firewalls using UDP. If that does not work, we fall back to TCP, which works if you have a public IP or if UPnP is available to map a port. Otherwise you can still connect to other nodes that have a public IP. In such a setup, you’d probably have to resort to manual port forwarding. This would probably only happen in a very strict corporate environment.
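The fallback order described above can be sketched as a simple "try each strategy in turn" loop. This is only an illustration in Python: the strategy names and the stub callables are assumptions, and real hole punching or UPnP port mapping would sit behind each callable.

```python
from typing import Callable, Optional, Sequence, Tuple

# A strategy is a label plus a callable that returns True on success.
Strategy = Tuple[str, Callable[[], bool]]


def connect(strategies: Sequence[Strategy]) -> Optional[str]:
    """Try each strategy in order and return the name of the first that works."""
    for name, attempt in strategies:
        try:
            if attempt():
                return name
        except OSError:
            # A network-level failure just means "try the next strategy".
            continue
    return None


# Usage with stub attempts (hypothetical results for one particular peer).
chosen = connect([
    ("udp-hole-punch", lambda: False),  # firewall blocks UDP
    ("tcp-public-ip", lambda: False),   # no public IP on this machine
    ("tcp-upnp", lambda: True),         # router agrees to map a port
])
```

If every strategy fails, `connect` returns `None`, which corresponds to the case where a port has to be opened by hand.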

        2. [Comment from banned user removed]

          1. 5

            Infinit has been designed to support heterogeneous environments, in particular to tolerate faults that are said to be Byzantine.

            In other words, our protocols for node communication make no assumption about the intentions of other nodes: a node may be trying to find a flaw in the protocols to exploit the system and gain access to some files, or a bug may have altered the way the program behaves, i.e. it no longer follows the protocols. In addition, whenever a file is stored, it is cut into chunks and every chunk is encrypted with a unique key. On top of that, a decentralized access control mechanism based on RSA key pairs allows a user to decide who else can access/modify his/her files/directories.

            We haven’t created dataflow diagrams, for instance, because for now we are focusing on developing the remaining key functionalities, after which we will work with the open source community to strengthen the security model and stability.
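As a rough illustration of "cut into chunks, every chunk encrypted with a unique key", here is a small Python sketch. It is not Infinit's implementation. The chunk size, function names and the cipher are all assumptions, and `toy_cipher` is a deliberately simple SHA-256 keystream XOR that stands in for a real cipher and must not be used to protect real data. The RSA-based access control layer is left out entirely.

```python
import hashlib
import secrets

CHUNK_SIZE = 4  # unrealistically small, just to keep the demo readable


def _keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from `key` (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with the keystream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def split_and_encrypt(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Cut `data` into chunks and encrypt each one under its own fresh key."""
    encrypted = []
    for offset in range(0, len(data), chunk_size):
        key = secrets.token_bytes(32)  # unique key per chunk
        chunk = data[offset:offset + chunk_size]
        encrypted.append((key, toy_cipher(key, chunk)))
    return encrypted


def decrypt_and_join(chunks) -> bytes:
    """Reverse of split_and_encrypt for someone holding every chunk key."""
    return b"".join(toy_cipher(key, ciphertext) for key, ciphertext in chunks)


# Usage: the storage side would only ever see the ciphertext of each chunk.
chunks = split_and_encrypt(b"hello world")
restored = decrypt_and_join(chunks)
```

In a design like the one described, the per-chunk keys would presumably be handed only to users who have been granted access, but how that is done is not covered by this sketch.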

          2. 3

            Shouldn’t the first user be called alice, not bob? ;-p

            1. 2

              Hmmm it should. Is it not the case?

              1. 2

                Ah, it is the case in the deployments doc but not in the getting-started ones.

                1. 2

                  Oh right. Will be changed as well! Thanks.

            2. 3

              I’d love to see a compare and contrast with IPFS https://ipfs.io/

              1. 3

                We wrote a small entry in our FAQ about that: https://infinit.sh/faq#how-does-infinit-differ-from-ipfs But we should maybe write a more complete comparison as people often ask. Do you have any specific questions maybe?

                1. 1

                  Why are you not integrating with IPFS? Seems like you should be.

                  1. 1

                    Yeah we could do it, it’s just a matter of priorities for the next weeks. Could you take 1 minute to upvote it here? Thanks! https://infinit-sh.uservoice.com/forums/318522-general/suggestions/11434446-support-ipfs-as-a-storage-medium-in-the-same-mann

                    1. 2

                      Upvoted! BTW, did you see my questions down below? ^_^

                  2. 1

                    Ah alright, didn’t see the FAQ there.

                    1. 1

                      Based on the FAQ response, I’m curious how you’d compare Infinit to git-annex.

                      1. 2

                        From what I see, it’s quite different since it’s just git dealing with large files. You cannot create hybrid infrastructures, and you don’t get user permissions, virtual disks to access your files, etc.

                  3. 2

                    This is a fascinating framework. I really like the idea of providing a playground that can rapidly deploy different types of overlay networks. There are a number of computational resources that one might wish to make available via overlay networks, and limiting it only to filesystems feels a bit restrictive, but I do understand that this has a lot of important use-cases, and that it needs enough care that focusing only on it makes sense.

                    This doesn’t do anything Tor-like; as far as I can tell from the tutorial, any given network always has a single owner - in that it just wants one set of AWS credentials, so there’s no way to build a federated system. So I would say that it’s designed for scenarios where the network owner is okay with being identifiable, and the users are okay with trusting the owner to manage their ACLs. The users, other than the owner, can remain anonymous in this scenario, to the extent that the underlying transport allows; their IP addresses do leak.

                    Is that an accurate summary of the privacy properties? I could easily have misunderstood; the docs really aren’t written from that perspective, I suppose because it’s not trying to be anything dramatically more private than Dropbox or S3.

                    1. 2

                      Actually we are doing something dramatically more private than Dropbox or S3. While you can use those services to store part of the data, it’s entirely opaque to them thanks to encryption. The only special power the network owners have is letting people in: they cannot in any way access or alter other users’ data unless specifically given permissions. In fact, the owner could have absolutely no access to any data at all. There can be as many S3 or other storage backends as you wish, and every node can potentially contribute storage. As far as reading/writing data goes, everyone is considered equal, so I’d call that a federated system.

                      As far as privacy goes, you need permissions to join a network, but the owner could sign passports from his secret batcave. You can also delegate invitation permission, so anyone in the network could invite you. For users, the only information that would leak is indeed the IP address; use your VPN if that’s an issue. All in all, the only thing you know for sure about other users and the owner is their public keys. As long as you don’t publicly associate yourself with yours, I’d say you’re good.

                      Finally, note that you can perfectly well store data in the filesystem that only you can read (enforced by encryption). In that scenario, you don’t really care about staying anonymous: the only thing the rest of the world knows is roughly the size of the data you’re storing.

                      1. 1

                        Thanks for your response.

                        The owner is the one who ultimately pays Amazon and the other backend providers; that certainly leaks their legal identity, to Amazon and to anyone who can socially engineer Amazon or request the information for legal purposes. It also makes their billing account a single point of failure for the whole thing. In a truly federated system, there wouldn’t just be one person financially responsible for the backends, and no one party could choose to stop running them all at once.

                        You are certainly providing properties which let the owner stay pseudonymous to other users, to whatever extent it may be difficult to track EC2 instances back to their billing account, but this isn’t an email-like system where gmail and hotmail can interoperate without having to be financially entangled, nor is it a Tor-like system where there are dozens of unrelated parties paying to run chunks of the infrastructure, none of whom individually have the power to stop it. I don’t mind what you choose to call it, but this is an important distinction.

                        I’m very interested in hearing how your ACLs are implemented, especially with regard to key management and revocation. If this is an important feature, there really needs to be a lot more detail on it. For all I know, you’re doing something novel and fascinating, but if you’ve documented that, I couldn’t find where. What I do understand, and please feel free to correct it:

                        1. Each user has an asymmetric key pair; the public half is registered with the network and all their interactions are encrypted and signed by it.
                        2. Each file and directory has an ACL. There’s nothing described about how ACLs are enforced; for all that your docs say, it could be a simple “is this user in the list? return true” check, with no cryptographic backing. But you’re suggesting that users can have their data be private from the network owner, so presumably it’s actually encrypted. By which keys? Which entity in the system generates those keys, and where are they held?
                        3. Is the “only you can read your file” property enforced by this same ACL mechanism, or a different one?

                        I’d note that “you don’t really care about staying anonymous, as long as nobody can read your data” is a choice of a specific privacy property to provide. It is not the only possibility. There are certainly realistic scenarios where somebody finds it important that nobody knows how much they’re storing.

                        All of these choices are perfectly reasonable, but really ought to be spelled out somewhere. It’s hard to know what your system is useful for otherwise.

                    2. 2

                      Cool - definitely trying this out.

                      Minor typo in the shell example at the bottom of the front page:
                      infinit-volume --mount --name company --ountpoint /mnt/company/

                      Guessing --ountpoint should read --mountpoint :)

                      1. 2

                        Whoops, will be fixed soon, thanks. Let us know how it goes when you have some time!