1. 0

Hi, I’ve been working on a self-hosted Git engine called “Sorcia”, which I launched 4 months ago: https://news.ycombinator.com/item?id=22685914

The project has since been renamed from “Sorcia” to “Cmdity”. I’m here to ask a few questions. So far, Sorcia/Cmdity has only a web frontend for Git repositories; there are no collaboration tools yet, which is what I’m currently working on.

I’m building a CLI utility that lets anyone send a patch to a repository hosted on your instance. In other words, a developer who wants to send a patch but is not a member of your repository can use this CLI utility to send it.

The process would look like this:

  • There will be a “Patches” section/page.
  • Every repository on Cmdity will have a special API endpoint to which a developer can send patches.
  • You create a patch using “git-format-patch” and send it using this CLI.
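As a rough illustration of the client side of the steps above, here is a minimal sketch of how a CLI might read `git format-patch` output and prepare a submission. The function name `build_submission`, the payload fields, and the endpoint path are all hypothetical; nothing here is Cmdity's actual API.

```python
from email import message_from_string
from email.utils import parseaddr

# A tiny sample in the mbox-style format that `git format-patch` emits.
SAMPLE_PATCH = """\
From 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b Mon Sep 17 00:00:00 2001
From: Jane Dev <jane@example.com>
Date: Mon, 3 Aug 2020 10:00:00 +0000
Subject: [PATCH] Fix typo in README

---
 README | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README b/README
index 0000001..0000002 100644
--- a/README
+++ b/README
@@ -1 +1 @@
-Helo
+Hello
"""


def build_submission(patch_text: str, repo: str) -> dict:
    """Extract sender and subject from a patch and build a request payload."""
    msg = message_from_string(patch_text)
    _name, sender_email = parseaddr(msg["From"] or "")
    return {
        # Hypothetical endpoint path, for illustration only.
        "endpoint": f"/api/repos/{repo}/patches",
        "sender_email": sender_email,
        "subject": msg["Subject"] or "",
        "patch": patch_text,
    }


if __name__ == "__main__":
    submission = build_submission(SAMPLE_PATCH, "cmdity")
    print(submission["endpoint"], submission["sender_email"], submission["subject"])
```

The sender address taken from the patch is only a claim by the submitter, so the server would still need to verify it in some way.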

But there will always be some bottlenecks. The first one I’ve considered:

  • What if someone sends a fake or irrelevant patch? For this, I can add a moderation feature where an admin or member has to accept the patch before it is published in the “Patches” section/page, where it can then be reviewed and merged.

Now, on to a problem that is a bit harder for me:

  • What if someone sends patches continuously in order to take the Cmdity instance down? How can I avoid this? One solution I’m thinking of: the developer has a Git email address, and every time they send a patch to a repository they have to confirm it through a confirmation link sent by the application.

But I’m not sure whether this is the right approach. Hence I’m here to ask for suggestions that would be more efficient than what I have in mind.

The pages that you need to check are.

Thanks a lot!

  1. 3

    I don’t have time to do a thorough overall security review. But here is a riff on your idea for rate limiting that you might want to consider:

    • In your moderation queue, where you accept a patch from a user, add an option to automatically move future patches from that user to the review queue.
    • When sending their first patch from a git email address, the sender must confirm the email.
    • They may then send up to N patches.
    • After N patches, they may not send more until one of them is accepted in the moderation queue.
    • They may then continue to submit patches until they have N in the moderation queue again.
    • The owner can choose to always move a user’s patches from the moderation queue into the review queue. That would lift the restriction on how many patches a user can send.
    • Maybe if an owner chooses to merge a contributor’s patchset, prominently include an option in the display where they do that to always move future contributions directly into the review queue.

    My rationale: when I’m contributing to a project, it’s often more than one patch, but seldom more than, say, a dozen. I generally try to break my contributions into small patches, then submit them in one request if the project is using a model like github or gitlab. If I had to verify my email for every patch, I’d give up somewhere around #3 or #4. Something like what I’m describing might effectively prevent malicious “contributions” while also keeping friction low for contributors.
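    The quota rules in the list above could be sketched roughly as follows. This is only an in-memory illustration; the class and method names, the return values, and the default of 12 for N are assumptions, not anything Cmdity implements.

```python
from dataclasses import dataclass


@dataclass
class Contributor:
    email: str
    verified: bool = False  # has confirmed their git email once
    trusted: bool = False   # owner chose "always move to the review queue"
    pending: int = 0        # patches currently waiting in moderation


class ModerationQueue:
    def __init__(self, max_pending: int = 12):
        self.max_pending = max_pending  # the "N" from the list above
        self.contributors = {}

    def _get(self, email: str) -> Contributor:
        return self.contributors.setdefault(email, Contributor(email))

    def confirm_email(self, email: str) -> None:
        self._get(email).verified = True

    def submit(self, email: str) -> str:
        """Decide where a newly submitted patch goes."""
        c = self._get(email)
        if not c.verified:
            return "needs_email_confirmation"
        if c.trusted:
            return "review_queue"  # no limit for trusted contributors
        if c.pending >= self.max_pending:
            return "rejected_quota"
        c.pending += 1
        return "moderation_queue"

    def accept(self, email: str, trust: bool = False) -> None:
        """Owner accepts one pending patch, optionally trusting the sender."""
        c = self._get(email)
        if c.pending > 0:
            c.pending -= 1
        if trust:
            c.trusted = True


if __name__ == "__main__":
    q = ModerationQueue(max_pending=2)
    q.confirm_email("dev@example.com")
    print([q.submit("dev@example.com") for _ in range(3)])
```

    Counting only the patches still pending means a contributor regains one slot each time the owner accepts a patch, which matches the "until they have N in the moderation queue again" rule.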

    1. 2

      I like this, but you will need to do some sort of identification/authorization. Otherwise if I see a patch from hoistbypetard@example.com, I can then change my git email to that address and fill up that queue for N patches, and continue doing that for all the verified emails.

      Just create accounts for them, maybe email link to login/verify a device until they setup a password, ssh keys, etc.

      Also I’d send an email notification for every patch submitted, so hoistbypetard@example.com knows the patch got received correctly, wasn’t “hacked”, etc.
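      One way to address the spoofing concern above, short of full accounts, is a per-address secret token that the server emails once and the CLI stores. The sketch below is an assumption about how that might look, not an existing Cmdity feature; `SERVER_SECRET`, `issue_token`, and `is_authentic` are hypothetical names.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; a real deployment would load this from
# persistent configuration rather than regenerate it on every start.
SERVER_SECRET = secrets.token_bytes(32)


def issue_token(email: str) -> str:
    """Token the server would mail to `email` after the address is confirmed."""
    normalized = email.strip().lower().encode()
    return hmac.new(SERVER_SECRET, normalized, hashlib.sha256).hexdigest()


def is_authentic(email: str, token: str) -> bool:
    """Check that a submission claiming `email` carries the matching token."""
    expected = issue_token(email)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), token.encode())


if __name__ == "__main__":
    token = issue_token("hoistbypetard@example.com")
    print(is_authentic("hoistbypetard@example.com", token))
    print(is_authentic("hoistbypetard@example.com", "forged"))
```

      Someone who merely sets their git email to another person's address would not have the token, so their submissions would not count against that person's quota.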

      1. 1

        Yes, this is a good point! I’ll think about a workaround for this when I implement the CLI. Thanks a lot!

      2. 1

        I’m sorry for the late reply. And thank you very much for taking the time to suggest this!

        If I had to verify my email for every patch, I’d give up somewhere around #3 or #4.

        I understand this. You are correct here.

        What you have said seems to be a good approach to what I’m trying to solve. I’ll take some more time to think about this and see if I have any more questions to ask you.

        Thanks again!

        1. 1

          And the value of N could be an option that the owner can adjust per contributor, apart from adding them as a member of the repository if the owner wishes to do so.

          Yes, I got what you are saying. Well, I don’t see any bottlenecks with your suggestion. That said, I’m curious to see more replies to my question as well :)

        2. 2

          I think https://web.archive.org/web/20130117043748/http://sheddingbikes.com/posts/1306816425.html will make for interesting reading. Note the trolling via any user-accessible text field (here, project membership), and the rather nasty denial-of-service attack in response.

          Unrelated to your question, but: the “license information” section on https://cmdity.org/r/cmdity isn’t encouraging. Even the AGPL doesn’t require open-sourcing any code that your code merely processes, and I can’t even find the LICENSE.AGPL file which apparently adds that additional restriction. (Or not? The AGPL may not let you add such restrictions: “If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term.”)

          That said, I wish you the best!

          1. 1

            Even the AGPL doesn’t require open-sourcing any code that your code merely processes,

            That’s fine. But I’m not sure what you are saying here. Could you explain more about this? I haven’t written the commercial license yet, and I haven’t implemented that feature either. There is still plenty of work to do before that.

            It’s self-hosted software; think of it as a platform. If all the projects/repositories in your Cmdity instance are FOSS, then you are free to use it and you don’t need to pay for a commercial license. Otherwise, if you are using Cmdity to host any proprietary projects, you need to purchase a commercial license.

            1. 1

              Not a lawyer, but as I understand it: the AGPL is, at least, not trying to do what you say you want; one should be able to run a hypothetical AGPL httpd, accessible to the public, without AGPL’ing the files served by the httpd. (One would need to share any changes to the AGPL httpd.)

              The problem isn’t really the license itself - what you describe is a custom commercial license with a free tier for open source, which is sensible enough (and yes, do please define which licenses count as “open source”, as in your sibling comment); the problem is the lack of clarity.

            2. 1

              I think it would be better if I listed which project licenses qualify for using Cmdity for free. I will do that.

            3. 1

              Here’s an idea. Ever heard of hashcash? Bitcoin draws some inspiration from this. The idea is that whenever someone wants to submit a patch, they have to solve a puzzle. Similar to a captcha, but it’s a computational puzzle, so the user doesn’t need to do anything, it just slows them down. An example would be:

              • User wants to submit something, asks server for puzzle.
              • Server gives the user some random string X and asks it to find a string Y such that hash(X | Y) starts with ten zeroes (this is the difficulty and can be adjusted).
              • User finds the string Y by brute force (trying out random ones) and submits it back to server, along with its patch.
              • The server verifies that hash(X | Y) does indeed start with 10 zeroes, and if so, accepts the patch.

              I’m not a cryptographer, so this breakdown might not be entirely correct. But the idea is basically to slow down submissions by making them computationally expensive, which isn’t a huge problem if you’re submitting just one patch (you can adjust it to take about 3s on a modern machine) but makes it expensive for spammers. You’d probably want to use scrypt as the hashing function here.
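              A minimal sketch of the puzzle exchange described above might look like this. It uses SHA-256 rather than scrypt to keep the example short, and it measures difficulty in leading zero bits rather than zero characters; both choices, and all function names, are assumptions made for illustration.

```python
import hashlib
import itertools
import secrets


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def make_challenge() -> str:
    """Server side: the random string X handed to the client."""
    return secrets.token_hex(16)


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash to check that hash(X | Y) has enough zero bits."""
    digest = hashlib.sha256(f"{challenge}|{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force search for a Y that satisfies the puzzle."""
    for nonce in itertools.count():
        if verify(challenge, nonce, difficulty_bits):
            return nonce


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, 16)  # about 65,000 hashes on average
    print(challenge, nonce, verify(challenge, nonce, 16))
```

              Each extra bit of difficulty doubles the expected work for the submitter, while verification stays a single hash. A server would also need to remember which challenges it issued and expire them, so that a solved puzzle cannot be reused.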

              1. 2

                The “economics” of hashcash-like systems typically do not work out today.

                In this case, you’re trying to prevent the Sorcia server from needing to process a git push; I’m pretty sure that I could construct a git push that takes, say, 1 second to process. (For convenience, assume the server is running on a single-core VM.)

                To prevent a somewhat motivated attacker from consuming 100% of your CPU - say an attacker who has a few beefy computers around the house or in the cloud - you may need to impose a 100:1 or 1000:1 cost difference between the attacker and the server (so you’d need to impose 100s resp. 1000s worth of effort on submitters, to ensure that an attacker with 100 resp. 1000 CPU cores can’t consume all resources on your VM.)

                But imposing 100s or 1000s worth of effort on legitimate users - who may really like their ancient laptop (Thinkpad?), and who may really need to wait e.g. 400s or 4000s for an answer - isn’t going to be well-received.
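                The arithmetic behind those figures can be written out as a small sketch. The function names are made up, and the 4x slowdown for an old laptop is only inferred from the 400s and 4000s numbers above.

```python
def required_puzzle_seconds(server_seconds_per_push: float, attacker_cores: int) -> float:
    """Puzzle cost (in seconds on one attacker core) needed so that an attacker
    with `attacker_cores` cores cannot keep a single-core server busy.

    With C cores the attacker finishes one puzzle every T / C seconds, so the
    server stays below 100% load only if T / C >= server cost, i.e. T >= C * cost.
    """
    return server_seconds_per_push * attacker_cores


def legit_wait_seconds(puzzle_seconds: float, slowdown: float) -> float:
    """Wait time for a legitimate user whose machine is `slowdown` times slower."""
    return puzzle_seconds * slowdown


if __name__ == "__main__":
    for cores in (100, 1000):
        puzzle = required_puzzle_seconds(1.0, cores)
        print(cores, puzzle, legit_wait_seconds(puzzle, 4.0))
```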

                … and this really isn’t the worst case yet; consider the difference in computational power between an attacker with a botnet or a GPU farm, and a legitimate user who’s running heavy crypto code through Javascript on an ancient smartphone with an aging battery…

                1. 1

                  My rebuttal to this (and I’m really no expert) is that it should be possible to just block large patches (typically, patches are only a few KiB anyways?). Plus, there could be additional checks, such as a per-IP rate limit, etc. Just using this as a layer to make it more expensive to attack. Wouldn’t that be a good idea? I think anything that takes more than 1 to 3s of processing time for a legitimate user would be a bad idea.
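                  The two extra checks mentioned above, a size cap and a per-IP rate limit, could be sketched with a simple token bucket. The 64 KiB cap, the bucket parameters, and the `PerIpLimiter` name are arbitrary assumptions for illustration.

```python
import time

MAX_PATCH_BYTES = 64 * 1024  # assumed cap; patches are typically a few KiB


class PerIpLimiter:
    """Token bucket per IP: `capacity` submissions in a burst, then refill."""

    def __init__(self, capacity: float = 5, refill_per_second: float = 1 / 60):
        self.capacity = capacity
        self.refill = refill_per_second
        self.buckets = {}  # ip -> (tokens, time of last update)

    def allow(self, ip: str, patch_size: int, now: float = None) -> bool:
        if patch_size > MAX_PATCH_BYTES:
            return False  # reject oversized patches before any other work
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(ip, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill)
        if tokens < 1:
            self.buckets[ip] = (tokens, now)
            return False
        self.buckets[ip] = (tokens - 1, now)
        return True


if __name__ == "__main__":
    limiter = PerIpLimiter(capacity=2, refill_per_second=1.0)
    print([limiter.allow("203.0.113.7", 2048, now=0.0) for _ in range(3)])
```

                  A per-IP limit alone does not stop an attacker who controls many addresses, so it would work as one layer among several rather than as a complete defense.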

                2. 1

                  I appreciate your response on this! But to be honest, I don’t know enough about cryptography to comment on it. I think going with a reasonably good approach like the one “hoistbypetard” suggested is the way to go for now: a limit of N patches, with the owner able to change the default N for a contributor based on their needs.

                  Also, one could always come back and look for a different solution if that approach causes problems, based on what the community says once they start using Cmdity and whether they find this workaround annoying.

                  Thanks again!