1. 35

  2. 59

    Update: been working on a better approach to these problems that leaves affected users feeling less put-out. I’ll be starting with a better email template in the future:

    https://paste.sr.ht/~sircmpwn/3d32eb7bbc564170c3d30f041e5e8dc71aa5a1c6

    In the future I’ll be working on better automated soft limits, so that users aren’t surprised by this.

    @sjl: after thinking it over more, I was unprofessional and sarcastic with you. I apologise.

    1. 41

      I think it would be beneficial for you to take on the mindset that your users’ use cases are always valid, by definition, as a premise. Whether or not your service can handle their use cases, maybe not, but this idea that you know better what your users should be doing is not going to accomplish your goals.

      As another example, I happen to need 2-3 more GiB RAM than the sr.ht freebsd build services offers at the moment, and have offered to up my monthly donation to account for the resource usage, and you’ve turned me down, on the grounds that I’m fundamentally abusing computer hardware in some moral way. As a result, Zig freebsd builds have many of the tests disabled, the ones where the bootstrapping compiler is a bit memory hungry. Zig’s FreeBSD users suffer because of this. And as a result, when someone else offers me a different FreeBSD CI service with more RAM, I’m ready to accept it, because my use case is valid.

      1. 6

        Could linking with the boehm conservative gc work as a stop gap? I think it won’t require any code changes.

        1. 4

          Something Andrew doesn’t mention here is why he needs 2-3 GiB more RAM: because, by design, his compiler never frees memory. Nearly all of that RAM is dead memory. In order to accommodate this use-case, I’d have to provision dedicated hardware just for Zig. Sometimes, use-cases are wrong, and you need to correct the problem at the source. Just because someone is willing to throw unspecified sums of money at you to get their “use-case” dealt with doesn’t mean it’s worth dealing with. I have finite time and resources and maybe I feel like my time is better spent implementing features which are on-topic for everyone else, even at the expense of losing some user with more money than sense.

          1. 44

            even at the expense of losing some user with more money than sense.

            I really hope you change your tune here. Insulting users is pretty much the worst thing you could do.

            Another thread recently talked about the fact that compilers don’t free memory, because the goal of a compiler is to be as fast as possible, so they treat the heap as an arena that the OS frees. Compilers have done this for 50+ years—zig isn’t special here.
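
For readers unfamiliar with the pattern under debate, here is a minimal sketch of a bump-pointer arena: allocation only advances an offset, nothing is freed individually, and the whole region is released at once. It illustrates the general technique only and is not Zig's (or any real compiler's) allocator; the `Arena` class and its method names are hypothetical.

```python
class Arena:
    """Bump-pointer arena: allocations are never freed one by one."""

    def __init__(self, capacity: int) -> None:
        self.buffer = bytearray(capacity)  # one up-front block of memory
        self.offset = 0                    # next free byte in the block

    def alloc(self, size: int) -> memoryview:
        """Hand out `size` bytes by advancing the offset."""
        if size < 0:
            raise ValueError("size must be non-negative")
        if self.offset + size > len(self.buffer):
            # A never-freeing allocator fails here once memory runs out.
            raise MemoryError("arena exhausted")
        start = self.offset
        self.offset += size
        return memoryview(self.buffer)[start:start + size]

    def reset(self) -> None:
        """Release everything at once, as the OS does at process exit."""
        self.offset = 0
```

The trade-off the thread argues about is visible here: `alloc` does almost no bookkeeping, but peak memory equals the total ever allocated, because dead allocations are not reclaimed until `reset`.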

            1. 4

              I didn’t mean to imply that Andrew doesn’t have sense, but that the hypothetical customer-that’s-always-right might not.

              As for compilers never freeing to be fast, utter bollocks. So the compiler should OOM if I have <8G of RAM to spare? Absolutely nuts. Freeing memory is not a performance bottleneck.

              1. 38

                Your reasoning is sound, your wording and phrasing choices are not. In what I’ve read you don’t come off as witty when you’re dealing with a paying customer and telling them they can’t do something which I also think is unreasonable, you come off as a dick. That’s how it appears. I don’t have any problems with you or your services and I think you working on this stuff is awesome… but I wouldn’t pay for insults in addition to whatever else you might provide.

                1. 5

                  As long as I’ve known him Drew has pretty consistently been like this. It’s not a bad thing. It’s quite refreshing actually.

                  1. 36

                    It’s refreshing to have a business make fun of you?

                    1. 9

                      It’s quite refreshing to see someone willing to say ‘no you’re wrong’ instead of the typical corporate ‘the customer is always right’ bullshit so many people here have obviously come to expect.

                      Sometimes the customer is wrong.

                      1. 34

                        It’s OK for both people to be right, and the customer to stop paying for the service and walk away. It’s then OK for the customer to go tell people about how they were treated. Hopefully that happens more.

                    2. 24

                      As a former moderator in prior communities, I politely disagree. Folks that are never not toxic are a serious liability and require special effort to handle well. I recall one memorable day when Drew dared me to ban him; I should have let the emotions flow through me and removed him from the community.

                      Also, as a former business owner, I politely disagree that this is good business practice.

                      1. 18

                        I agree it’s good, now I know to avoid this business!

                2. 15

                  CPU speed vs memory usage is a fundamental resource tradeoff that occurs all the time in computing. Just because you disagree with where on the spectrum someone has chosen to aim their design doesn’t mean they’re stupid. Especially when they too are a mostly-one-person project operating on limited resources.

                  It’s PERFECTLY VALID to say “I don’t have time to accommodate this one special case, sorry”. It is NOT perfectly valid to say “you are stupid for needing this special case, go away”. Money vs. person-time is another fundamental resource tradeoff where different people have different priorities.

                  1. 22

                    Regardless of the use case, I’d really rather not have my SCM platform making discretionary decisions about what I’m working on. The users aren’t paying for you to audit them, they’re paying for the services provided by the software. If you want your service to come with the exemption that you get to unilaterally decide whose content is allowed and whose content isn’t allowed, you’re free to do that. Just expect the community to nearly unanimously respond with “we’ll go elsewhere”

                    1. 7

                      He’s not making ‘discretionary decisions about what [you’re] working on’. I don’t see Drew saying ‘you can’t use this service because I don’t like the way your compiler is designed’. He’s saying ‘provisioning dedicated hardware for specific projects is a lot of time and effort that I don’t have, so I’d need to have a really really good reason to do it, no matter how much money you’re willing to throw at me, and you haven’t given me one’.

                      Every service out there gets to decide what is allowed and what isn’t. Look at the terms of service of any file or repository hosting service anywhere. GitHub, GitLab, Bitbucket, imgur, pastebin services… ALL of them make it clear in their terms of service that it’s entirely up to their whim whether they want to host your files or not.

                      1. 32

                        Drew is literally commenting on a particular user’s project, and how its design is a problem, so I have no idea what you’re talking about:

                        Something Andrew doesn’t mention here is why he needs 2-3 GiB more RAM: because, by design, his compiler never frees memory. Nearly all of that RAM is dead memory.

                        As for compilers never freeing to be fast, utter bollocks.

                        @andrewrk can hopefully clarify, but I thought his offer to up monthly donations was to improve sr.ht’s FreeBSD offering, in general, not necessarily to only improve Zig builds (Zig builds would improve as a byproduct of improving the FreeBSD infrastructure). If the donations were only to be used to improve Zig-specific experiences, then I understand the argument that Drew doesn’t want to commit to that.

                        1. [Comment removed by moderator pushcx: Removing slapfight.]

                          1. [Comment removed by moderator pushcx: Removing slapfight.]

                            1. [Comment removed by moderator pushcx: Removing slapfight.]

                    2. 13

                      It just seems weird to me that one of your criteria for whether or not to give a customer resources is based on a personal audit of their code. Are you going to do this for every customer?

                      1. 4

                        I completely understand the concern here, and take it very seriously. I usually don’t dig into the repo at all and just reach out to the user to clarify its purpose. In this case, though, the repo was someone’s personal website, and named as such, and connecting the dots did not require much.

                        1. 2

                          As explained downthread, it’s “Alert fires -> look for what’s caused the alert -> contact customer whose repo tripped the alert”.

                      2. [Comment from banned user removed]

                        1. 10

                          Nobody’s suggesting otherwise.

                      3. 19

                        You handled this very professionally and courteously, I plan to continue to use sh for many happy years to come.

                        1. 6

                          You are under no obligation to explain or justify what your business model is to anyone, or on a personal level what self sustainability, your own peace of mind, well being or definition of meaningful sustainable work is.

                          There is a particular mode of doing business these days which people inside that paradigm often do not understand that they are inside and therefore apply force to get others to conform.

                          You’re breaking old paradigms and inventing new ways of running organisations and that is brave, ground breaking and commendable and politically powerful.

                          I hope issues like this do not deter you one bit from blazing your own trail through the fucked up world that is tech organisations in late stage capitalism and I hope you share as much as you can about how you’re doing personally and in ‘business’.

                          1. 2

                            git-lfs implementations often don’t allow reclaiming unreachable blobs: once you push a binary blob, even on a branch that you deleted, it will take some space forever.

                            Maybe it is worth investigating git-annex while you’re on this topic.

                            1. 6

                              Yeah, git annex is also something I intend to study. I’m only just setting up large file servers for blob storage, figuring out how to apply them is the next step.

                          2. 46

                            Please, don’t just link to twitter. It takes at least several (~5 or more on a good day) tries for me to render. Twitter usually will not have a deep discussion, and even if it does, half the conversation is hidden and takes more tries to properly display. I don’t know if it’s just my mistake, or if it’s because I don’t have an account, but if you really want to talk about a conversation on twitter, write a summary, some context, some explanation, some thoughts and put it on some blog or whatever normal site (there are plenty of these, many easy to use).

                            1. 9

                              Check out nitter.net, it’s a static and hassle-free twitter frontend. You can get redirected automatically using a browser extension like Invidition. Conversations are broken on twitter, but that’s just the way the website works.

                              1. 7

                                but that’s just the way the website works.

                                optimized to generate maximum social discord ;)

                              2. 7

                                non logged in twitter is broken for me too, especially on mobile

                              3. 17

                                This is no fun to deal with for any sort of code hosting site. Around the time I worked at Bitbucket, we had issues with users writing scripts to split up large movie files into chunks and automatically create repos via the API to upload them. Because our official limit was 1GB this was technically allowed under the ToS. We had to come up with other ways to deal with abuse like this.

                                Another thing you may want to watch out for is public repositories being used as hosting. As far as I’m aware, there is still special cased aggressive caching on a number of repos because they serve kodi addons and you can just point kodi at the XML file at the “raw view file” link and it just works. That resulted in a ton of additional bandwidth and resource usage, which can be especially frustrating if those users are only using a free tier.

                                Best of luck.

                                1. 10

                                  @ddevault Would it be possible to get a clear “Terms of Service” clarifying these sorts of use cases? 1.1 Gb seems like an excessive file size, but having a crystal clear & mutually agreed upon set of rules for platform use is essential for trust (more so for a paid service), and right now users don’t know what does and does not constitute a reasonable use of the service.

                                  1. 37

                                    No, they’re intentionally vague so that we can exercise discretion. There are some large repositories which we overlook, such as Linux trees, pkgsrc, nixpkgs, even mozbase is overlooked despite being huge and expensive to host.

                                    In this guy’s case, he had uploaded gigabytes of high-resolution personal photos (>1.1 Gb - it takes up more space and CPU time on our server than on your workstation because we generate clonebundles for large repos). It was the second largest repository on all of SourceHut. SourceHut is a code forge, not Instagram.

                                    1. 40

                                      No, they’re intentionally vague so that we can exercise discretion.

                                      I like to call this “mystery meat TOS”. You never know what you’ll get until you take a bite!

                                      1. 24

                                        I mean, honestly, a small fraction of our users hit problems. I’ve had to talk to <10 people, and this guy is the only one who felt slighted. It’s an alpha-quality service, maybe it’ll be easier to publish objective limits once things settle down and the limitations are well defined. On the whole, I think more users benefit from having a human being making judgement calls in the process than not, because usually we err on the side of letting things slide.

                                        Generally we also are less strict on paid accounts, but the conversation with this guy got hostile quick so there wasn’t really an opportunity to exercise discretion in his case.

                                        1. 30

                                          the conversation with this guy got hostile quick

                                          Here’s the conversation, for folks who want to know what “the conversation got hostile” means to Source Hut: https://paste.stevelosh.com/18ddf23cb15679ac1ddca458b4f26c48b6a53f11

                                          1. 32

                                            i’m not a native speaker, but have the feeling that you got defensive quickly:

                                            Okay. I guess I assumed a single 1.1 gigabyte repository wouldn’t be an unreasonable use of a $100/year service. I certainly didn’t see any mention of a ban on large binary files during the sign up or billing process, but I admit I may have missed it. I’ve deleted the repository. Feel free to delete any backups you’ve made of it to reclaim the space, I’ve backed it up myself.

                                            it’s a pay-what-you-like alpha service, not backed by venture capital. you got a rather friendly mail, notifying you that you shouldn’t put large files into hg, not requesting that you delete it immediately.

                                            ddevault’s reply was explaining the reasoning, not knowing that you are a mercurial contributor:

                                            Hg was not designed to store large blobs, and it puts an unreasonable strain on our servers that most users don’t burden us with. I’m sorry, but hg is not suitable for large blobs. Neither is git. It’s just not the right place to put these kinds of files.

                                            i’m not sure i’d label this as condescending. again I’m no native speaker, so maybe i’m missing nuances.

                                            after that you’ve cancelled your account.

                                            1. 13

                                              As a native speaker, your analysis aligns with how I interpreted it.

                                              1. 9

                                                Native speaker here, I actually felt the conversation was fairly polite right up until the very end (Steve’s last message).

                                              2. 14
                                              3. 28

                                                On the whole, I think more users benefit from having a human being making judgement calls in the process than not, because usually we err on the side of letting things slide.

                                                Judgement calls are great if you have a documented soft limit (X GB max repo size / Y MB max inner repo file size) and say “contact me about limit increases”. Your customers can decide ahead of time if they will meet the criteria, and you get the wiggle room you are interested in.

                                                Judgement calls suck if they allow users to successfully use your platform until you decide it isn’t proper/valid.
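
A documented soft limit of this kind could be sketched roughly as follows. The thread leaves X and Y unspecified, so the limits are passed in as parameters; `SoftLimits` and `soft_limit_warnings` are hypothetical names, not part of any real forge. The function only produces warnings and blocks nothing, which leaves room for the "contact me about limit increases" conversation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SoftLimits:
    max_repo_bytes: int  # the "X GB max repo size" placeholder
    max_file_bytes: int  # the "Y MB max inner repo file size" placeholder


def soft_limit_warnings(limits: SoftLimits, file_sizes: Dict[str, int]) -> List[str]:
    """Return human-readable warnings; an empty list means within limits."""
    warnings: List[str] = []
    total = sum(file_sizes.values())
    if total > limits.max_repo_bytes:
        warnings.append(
            f"repository is {total} bytes (soft limit {limits.max_repo_bytes}); "
            "contact us about a limit increase"
        )
    # Report oversized files in a stable (alphabetical) order.
    for path, size in sorted(file_sizes.items()):
        if size > limits.max_file_bytes:
            warnings.append(
                f"{path} is {size} bytes (soft limit {limits.max_file_bytes})"
            )
    return warnings
```

Because the check is advisory, a user can see ahead of time whether a repository will meet the criteria, while the operator keeps discretion over exceptions.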

                                                1. 12

                                                  That’s a fair compromise, and I’ll eventually have something like this. But it’s important to remember that SourceHut is an alpha service. I don’t think these kinds of details are a reasonable expectation to place on the service at this point. Right now we just have to monitor things and try to preempt any issues that come up. This informal process also helps to identify good limits for formalizing later. But, even then, it’ll still be important that we have an escape hatch to deal with outliers - the following is already in our terms of use:

                                                  You must not deliberately use the services for the purpose of:

                                                  • impacting service availability for other users

                                                  It’s important that we make sure that any single user isn’t affecting service availability for everyone else.

                                                  Edit: did a brief survey of competitor’s terms of service. They’re all equally vague, presumably for the same reasons

                                                  GitHub:

                                                  [under no circumstances will you] use our servers for any form of excessive automated bulk activity (for example, spamming or cryptocurrency mining), to place undue burden on our servers through automated means, or to relay any form of unsolicited advertising or solicitation through our servers, such as get-rich-quick schemes;

                                                  The Service’s bandwidth limitations vary based on the features you use. If we determine your bandwidth usage to be significantly excessive in relation to other users of similar features, we reserve the right to suspend your Account, throttle your file hosting, or otherwise limit your activity until you can reduce your bandwidth consumption

                                                  GitLab:

                                                  [you agree not to use] your account in a way that is harmful to others [such as] taxing resources with activities such as cryptocurrency mining.

                                                  At best they give examples, but always leave it open-ended. It would be irresponsible not to.

                                                  1. 17

                                                    The terms of service pages don’t mention the limits, but the limits are documented elsewhere.

                                                    GitHub:

                                                    We recommend repositories be kept under 1GB each. Repositories have a hard limit of 100GB. If you reach 75GB you’ll receive a warning from Git in your terminal when you push. This limit is easy to stay within if large files are kept out of the repository. If your repository exceeds 1GB, you might receive a polite email from GitHub Support requesting that you reduce the size of the repository to bring it back down.

                                                    In addition, we place a strict limit of files exceeding 100 MB in size. For more information, see “Working with large files.”

                                                    GitLab (unfortunately all I can find is a blog post):

                                                    we’ve permanently raised our storage limit per repository on GitLab.com from 5GB to 10GB

                                                    Bitbucket:

                                                    The repository size limit is 2GB for all plans, Free, Standard, or Premium.

                                                    1. 9

                                                      I see. This would be a nice model for a future SourceHut to implement, but it requires engineering effort and prioritization like everything else. Right now the procedure is:

                                                      1. High disk use alarm goes off
                                                      2. Manually do an audit for large repos
                                                      3. Send emails to their owners if they seem to qualify as excessive use

                                                      Then discuss the matter with each affected user. If there are no repos which constitute excessive use, then more hardware is provisioned.
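
The procedure described above can be sketched as a small decision function. This is an illustration under stated assumptions, not SourceHut's tooling: the `Repo` type, `handle_disk_alarm`, the 80% alarm threshold and the 1 GB "large" cutoff are all hypothetical, and the human judgement call ("seem to qualify as excessive use") is modelled as a callback supplied by the operator.

```python
from typing import Callable, List, NamedTuple, Tuple


class Repo(NamedTuple):
    owner: str
    name: str
    size_bytes: int


def handle_disk_alarm(
    disk_used_fraction: float,
    repos: List[Repo],
    looks_excessive: Callable[[Repo], bool],  # stands in for the manual audit
    alarm_threshold: float = 0.8,             # assumed value
    large_repo_bytes: int = 10**9,            # assumed value
) -> Tuple[str, List[Repo]]:
    """Return the next action and the repos it applies to."""
    # Step 1: nothing happens until the high-disk-use alarm fires.
    if disk_used_fraction < alarm_threshold:
        return ("none", [])
    # Step 2: audit large repos, biggest first.
    large = sorted(
        (r for r in repos if r.size_bytes >= large_repo_bytes),
        key=lambda r: r.size_bytes,
        reverse=True,
    )
    # Step 3: email owners of repos judged to be excessive use.
    flagged = [r for r in large if looks_excessive(r)]
    if flagged:
        return ("email_owners", flagged)
    # No excessive use found: provision more hardware instead.
    return ("provision_hardware", [])
```

Keeping the judgement step as a callback mirrors the informal process in the comment: size alone never triggers an email, only size plus a human decision.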

                                                      1. 11

                                                        Maybe this is something you should put on your TOS/FAQ somewhere.

                                                    2. 8

                                                      This informal process also helps to identify good limits for formalizing later.

                                                      Sounds like you have some already:

                                                      • Gigabyte-scale repos get special attention
                                                      • Giant collections of source code, such as personal forks of large projects (Linux source, nix pkgtree) are usually okay
                                                      • Giant collections of non-source-code are usually not okay, especially binary/media files
                                                      • These guidelines are subject to judgement calls
                                                      • These guidelines may be changed or refined in the future

                                                      All you have to do is say this, then next time someone tries to do this (because there WILL be a next time) you can just point at the docs instead of having to take the time to explain the policy. That’s what the terms of service is for.

                                                  2. 8

                                                    Regardless of what this specific user was trying to do, I would exercise caution. There are valid use cases for large files in a code repository. For example: Game development, where you might have large textures, audio files, or 3D models. Or a repository for a static website that contains high-res images, audio, and perhaps video. The use of things like git-lfs as a way to solve these problems is common but not universal.

                                                    To say something like, “SourceHut is a code forge, not Instagram” is to pretend these use cases are invalid, or don’t exist, or that they’re not “code”, or something.

                                                    I’ve personally used competing services like GitHub for both the examples above and this whole discussion has completely put me off ever using Sourcehut despite my preference for Mercurial over Git.

                                                    1. 4

                                                      I agree that some use-cases like that are valid, but they require special consideration and engineering work that hg.sr.ht hasn’t received yet (namely largefiles, and in git’s case annex or git-lfs). For an alpha-quality service, sometimes we just can’t support those use-cases yet.

                                                      The Instagram comparison doesn’t generalize: in this case, this specific repo was just full of a bunch of personal photos, not assets necessary for some software to work. Our systems aren’t well equipped to handle game assets either, but the analogy doesn’t carry over.

                                                2. 4

                                                  I don’t think the way you’re working is impossible to describe, I think it’s just hard and I think most people don’t understand the way you’re doing and building business. This means your clients may have an expectation that you will give a ToS or customer service level that you can not or will not provide

                                                  To strive towards a fair description that honours how you are actually defining things for yourself and tries to make that more transparent without having to have specific use cases, perhaps there is a direction with wording such as:

                                                  • To make a sustainable system we expect the distribution of computing resource usage and human work to follow a normal distribution. To preserve quality of service for all clients and to honour the sustainability of the business and wellbeing of our staff and to attempt to provide a reasonably uniform and understandable pricing model, we reserve the right to remove outliers who use an unusually large amount of any computing and/or human resource. If a client is identified as using a disproportionate amount of service, we will follow this process: (Describe fair process with notification, opportunity for communication/negotiation, fair time for resolution, clear actions if resolution is met or not).
                                                  • This system is provided for the purposes of XYZ and in order to be able to design/optimise/support this system well we expect all users to use it predominantly for this purpose. It may be the case that using our system for other things is possible, however in the case we detect this we reserve the right to (cancel service) to ensure that we do not arrive at a situation where an established client is using our service for another purpose which may perform poorly for them in the future because it is not supported, or may become disproportionately hard for us to provide computing resource or human time for because it is not part of XYZ. This will be decided at our discretion and the process we will follow if we identify a case like this is (1,2,3)
                                                  1. 2

                                                    Would it be possible to get a clear “Terms of Service” clarifying these sorts of use cases?

                                                    No, they’re intentionally vague so that we can exercise discretion. There

                                                    May I suggest, perhaps: “ToS: regular repositories have a maximum file size X and repository size Y. We provide extra space to some projects that we consider important.”

                                                    1. 1

                                                      No, they’re intentionally vague so that we can exercise discretion.

                                                      Funny way to say “so I can do whatever I want without having to explain myself”

                                                      1. 15

                                                        I think that’s unfair. He did in fact explain himself to the customer and it was the customer who decided to cancel the service. I’d agree if the data was deleted without sufficient warning, but that is not the case here.

                                                  2. 4

                                                    i’d always like to hear what provoked the “condescending replies”. sounds like a fun guy :)

                                                      1. 27

                                                        I don’t read anything condescending, he does not know anything about you, he has to explain things clearly and can’t assume anything about what the end user does or does not understand.

                                                        1. 12

                                                          Yeah, me neither.

                                                          “I’m a contributor to Mercurial, but thanks for explaining how it’s designed to me.” is a pointlessly aggressive response unless you expect him to somehow know (and remember) that.

                                                          1. 12

                                                            On the other hand, stating that “hg is not suitable for your use case” strikes me as rather patronizing. It’s demonstrably false as evidenced by the fact that this repo exists, and has been working like this for a while on BitBucket. So clearly it works and Drew’s un-nuanced assertion is false.

                                                            Drew’s case would have been much better if he had just stated “sorry, we don’t support this particular use case” instead of saying that “you’re doing it wrong”.

                                                            I’m not trying to defend Steve here, but no one is exactly smelling like roses in this conversation. Both parties could have done better.

                                                            1. 2

                                                              That’s fair.

                                                          2. 7

                                                            Ditto, the only one being condescending as far as I can tell was @sjl. @ddevault acted in a professional manner.

                                                            1. 4

                                                              Professional in terms of tone. Not sure I would describe his decisions as professional, but that’s clearly more subject to debate, based on the number of comments here.

                                                              1. 2

                                                                My interpretation of the interaction was that @ddevault’s initial email was giving @sjl a heads up and wasn’t specifically ordering him to take down his files. @sjl took down his files voluntarily, but I think @ddevault’s initial email left open the possibility of a discussion / negotiation, which seems courteous and professional to me. I.e. there was no explicit decision made on @ddevault’s part, outside of the initial warning. Maybe I’m wrong in my interpretation

                                                      2. 5

                                                        This may be foreign to Americans where the letter of the law is important and you’re used to looking for loopholes. I think in Germany, most judges will look at intent and try to interpret the law for what it means. One of the reasons most GPL violation rulings have happened there.

                                                        Why not contact them and ask them for a ruling before deciding to do something like this? GitHub has for sure taken people offline who do stuff like this (a 9gag mirror is the most ready example).

                                                        1. 7

                                                          This may be foreign to Americans where the letter of the law is important and you’re used to looking for loopholes.

                                                          This is actually not true of US law, either. The only difference is that, as a common law country, our judges try to explain their intent in a repeatable way so it can be used by other courts.

                                                        2. 3

                                                          If you’re putting binary files into git you’re doing it wrong. One could argue about small files, but compiled code/executables, photos or “gifs for the readme” are definitely misplaced in a git repository.

                                                          1. 12

                                                            I do find that having image files in a resources/ directory for something like a website is often simpler than separating the two. Even then making sure that images are compressed and generally not bloating repo size / git history is essential.

                                                            1. 18

                                                              I do find that having image files in a resources/ directory for something like a website is often simpler than separating the two.

                                                              Yeah, that is exactly the use case here. Mercurial (and git) aren’t designed for handling large binary files, but if you’re checking in static assets/resources that rarely change it still tends to work fine. This repo was fine on Bitbucket for many years, and is working fine on an hgweb instance I’ve spun up in the meantime.

                                                              I specifically asked about limits because if it’s just the size of the repo being a technical problem for their infrastructure, I can understand. They would not specify any limits, though, and just reiterated several times that Mercurial wasn’t designed for this. So I don’t know which of these was the actual problem:

                                                              1. The repo is so taxing on their infrastructure it’s causing issues for other users.
                                                              2. The repo is so large it’s costing more to store than some portion of the $100/year account price can cover.
                                                              3. They are morally opposed to me using Mercurial in a way that it wasn’t designed for (but which still works fine in practice).

                                                              Cases 1 and 2 are understandable. Setting some kind of limit would prevent those problems (you can still choose to “look the other way” for certain repos, or if it’s only code that’s being stored). Case 3 is something no limit would solve.

                                                              1. 3

                                                                If you want to store large files and you want to pay an amount proportional to the file sizes, perhaps AWS S3 or Backblaze B2 would be more appropriate than a code hosting website? I don’t mean to be obtuse, but the site is literally called source hut. Playing rules lawyer on it read like saying “Am I under arrest? So I’m free to go? Am I under arrest? So I’m free to go?” to a police officer.

                                                                1. 5

                                                                  B2 or S3 would make things more complicated than necessary for this simple repo. I’ve spun up a $5/month Linode to run hgweb and it’s been working great. I’m all set.

                                                            2. 6

                                                              This case was hg, but the same limitations are present. Hg has a special extension for supporting this:

                                                              https://www.mercurial-scm.org/wiki/LargefilesExtension

                                                              And it’s considered “a feature of last resort”. It’s not designed to deal with these use-cases.

                                                              LFS support requires dedicated engineering and operations efforts, which SourceHut has planned, but is not ready yet.

                                                              1. 5

                                                                I have a repository with mostly PNG files. Each PNG file is source code; a chunk of data inside each PNG file is machine-readable code for the graph visually encoded in that PNG’s pixels. What would you have me do?

                                                                I suspect that you would rather see my repository as a tree of text files. While this would be just as machine-readable, it would be less person-readable, and a motivating goal for this project is to have source files be visually readable in the way that they currently are, if not more so.

                                                                git would not support binary files if its authors did not think that binary-file support was useful; that is the kind of people that they are and the kind of attitude that they have towards software design.

                                                                With all that said, I know how git works, and I deliberately attempt to avoid checking in PNGs which I think that I will have to change in a later revision. It would be quite nice if git were able to bridge this gap itself, and allow me to check in plaintext files which are automatically presented as PNGs, but this is not what git was designed to do, and we all can imagine the Makefile which I’d end up writing instead.

                                                                1. 1

                                                                  I like the project, but pardon my ignorance - aren’t the PNG files still binary assets produced by the “real” source code, which is the textual expression parsed to generate both the embedded bitstring and the dot graph? If they’re machine readable, that places them in the same category as compiled object files.

                                                                  1. 3

                                                                    The real source code is non-textual; it is the diagram (WP, nLab) which is being given as a poset (WP, nLab). To achieve optimal space usage, each poset is stored as a single integer which codes for the adjacency matrix. However, this compressed format is completely unreadable. There are several layers around it, but each layer is meant to do one thing and add a minimum of overhead; JSON (in the future, BSON or Capn) for versioning and framing, and PNG for display and transport. There isn’t really source code; there’s just a couple Python and Monte scripts that I use to do data entry, and I want them eventually automated away in favor of API-driven development.

                                                                    For example, the raw integer for this “big” poset is (at the time of writing) 119057104012801988044616452068625828640327332805380025526437835877423434638755429828266326799795317811303459626900558691401745578051640794516644938301199082495464489003936003625363750982368265025274722872195025876418664463440271896393960084356141213421725952572801003498502627104606075520827813791168916410299669062572699417822031483474354463194521106501504378198881835689538017105566685179272698190498260697546396352180015191217900800702991246813813910739056632149188342283771705138656813357180390720149429257347634471776957047265055082326775652079078088473610885335191906287685039351014500784364400788835706676136213773991906159901386417898678256327382329933065244744756867312630459766408921728411124922368378265249369912734931744932522777947191947246247888008545404251579656784921799582935924435024819217182937595986486278238491170260078527481455363019695413290105595765561673457932741464647437073776230526145064116103036735384415008570280823270942528385252833616941077475010604520832967790713291089520969819323291548086581344613528369629656807825470271116760342123814630015321080350242676173777880409314306946695543051504162699356992509452966494979102888561608129775777824208753496551108243674673823382226373443092848812619363504796601599746698273000033356523403042206994500564110680250622093680140809627702210046262001690736151235584584803501166681150186803724802869491481294888174760186200258663044091042775501067909307398258431295572809316405817425806572436591973207743524817393103373004533348327662946836180324593153772066560693844746264887941238158302982303492502613084844224768029517993922819593979027614562737598067131576661087926758866343971413288880983057473544651036992439376085474045204803058313934057187051819429632221234635602680317901551091261152138660486933915169592190005608783372193246222301462269603464697693715253381276043079537861
12516810509019551617885907067412613823285538493443834790453576561810785102306389953804151473860800342221969666874213156376831068606096772785272984102609049257833898258081466729520326827598704376424140779421965233471588921765110820238036094910936640446304632443760482611408445010230964335747094869968021425396439555206085281953007985784739643408074475440039274314217788647485602069097474262381690379456154426900896918268563062231294937080146199930562645748389040251871291840481739518244706752426504146889097315360662429293711705265772337748378759001582638301784557163848933046038798381667545043026975297902178839764134784634179453671000024868722179355800776002690855305662785522771116635997791339179517016284742206819482196944663461005128697584753594559406283638837841370287286682993990297923202976404261911087739188860505577427942276773287168600954693735964671046522557013031834557159173262849132567983767216098382093390056878765856939614383049277441.
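
                                                                    A minimal sketch of what "a single integer which codes for the adjacency matrix" could look like. The row-major bit order and the function names here are assumptions for illustration, not necessarily the layout the project uses; note that the element count has to be carried separately (for example in the framing layer), since the integer alone does not record it.

```python
def encode_relation(matrix):
    """Pack a square 0/1 adjacency matrix into one integer.

    Assumed layout: bit (i * n + j) is set when element i is related to
    element j, i.e. row-major order with the first entry in the lowest bit.
    """
    n = len(matrix)
    value = 0
    for i, row in enumerate(matrix):
        if len(row) != n:
            raise ValueError("matrix must be square")
        for j, bit in enumerate(row):
            if bit:
                value |= 1 << (i * n + j)
    return value


def decode_relation(value, n):
    """Rebuild the n x n adjacency matrix from the packed integer."""
    return [[(value >> (i * n + j)) & 1 for j in range(n)] for i in range(n)]


# A three-element chain a <= b <= c, written out reflexively and transitively.
chain = [
    [1, 1, 1],
    [0, 1, 1],
    [0, 0, 1],
]
code = encode_relation(chain)
restored = decode_relation(code, 3)
```

                                                                    Python integers are arbitrary precision, so the same two functions work unchanged for a poset large enough to produce a number like the one above.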

                                                                    1. 1

                                                                      Ah, okay, I see. Makes sense, thank you for explaining!

                                                                2. 4

                                                                  I’ve seen this argument quite a number of times, and almost always without a coherent explanation of why that is wrong. What’s the rationale behind this argument?

                                                                  1. 4

                                                                    Shameless plug, I contributed heavily to this help topic back when I was the PM for Microsoft’s Git server: https://docs.microsoft.com/en-us/azure/devops/repos/git/manage-large-files?view=azure-devops

                                                                    FWIW I disagree with the comment up-thread which says that GIFs for READMEs don’t belong. If you’re going to check in a logo or meme or whatever, that’s perfectly fine. Just don’t do 1000 of them and don’t churn it every week.

                                                                    1. 2

                                                                      I think a big part is also “are my tools there for me or am I a slave to my tools?”

                                                                      If I have a website and most content is under version control, it’s annoying and complicated to have (big) assets outside. Most people simply want one repo with everything inside, and it’s mostly additive, often once per week - it simply doesn’t matter if it’s the wrong tool.

                                                                3. [Comment from banned user removed]

                                                                  1. 17

                                                                    Yes, a tool should provide extra restrictions on the user instead of allowing them to work freely, what a great idea. Maybe we should close the source so people can’t alter the filesize limitations, too…

                                                                    1. 4

                                                                      The ability to add large binary files to git repositories is a big gun pointing at the user’s foot, begging them to pull the trigger. Then users go and pull the trigger and are surprised that their foot gets blown off.

                                                                      Git can’t handle large binary files. Git should at least warn users when they add the large binary files and force them to do add --force to bypass the limit. Pretending to handle something then silently breaking later is much worse than being upfront from the outset that large binary files just don’t work in git.
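
                                                                      Git does not ship such a warning itself, but a rough approximation can be sketched as a pre-commit hook. The 10 MiB threshold and the function names below are hypothetical, and the check looks at the working-tree size of each staged path rather than the staged blob, which is an assumption made to keep the sketch short.

```python
import os
import subprocess

# Hypothetical threshold chosen for illustration; Git has no such default.
LIMIT_BYTES = 10 * 1024 * 1024


def oversized(sizes, limit=LIMIT_BYTES):
    """Return the paths whose size exceeds the limit, sorted by name."""
    return sorted(path for path, size in sizes.items() if size > limit)


def staged_file_sizes():
    """Map each staged (added, copied or modified) path to its on-disk size."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM", "-z"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    paths = [p for p in out.split("\0") if p]
    return {p: os.path.getsize(p) for p in paths if os.path.isfile(p)}


def main():
    """Print a warning per oversized file; return 1 to block the commit."""
    big = oversized(staged_file_sizes())
    for path in big:
        print(f"refusing to commit large file: {path}")
    return 1 if big else 0
```

                                                                      Saved as .git/hooks/pre-commit with a final line calling sys.exit(main()), a non-zero return aborts the commit, and git commit --no-verify plays the role of the "force" escape hatch.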

                                                                      1. 7

                                                                        Depending on the use case though, there might be nothing wrong at all with large binary files. For example, Git will probably work completely fine if you put the assets for your game or the gif for your readme in the repo. Maybe your git hosting provider won’t like it - or maybe you don’t have a hosting provider, or maybe you self-host so it doesn’t matter if your occasional push to your remote requires a lot of CPU on the server and wastes some disk space.

                                                                        How is Git supposed to know whether adding this particular binary file to this particular repository is an issue?

                                                                        1. 2

                                                                          Git doesn’t work fine if you put the assets for your game in the repository, because it doesn’t use binary diffs. Every copy of that asset through the history of development will be stored in every copy of the repository.

                                                                          It doesn’t waste ‘some’ disk space, it wastes enormous amounts of disk space.

                                                                          1. 4

                                                                            The amount of disk space it uses really depends a lot on what you’re doing (if the textures only occasionally change, and they’re not too many gigabytes, storing every version of every texture might be what you want), and how much disk space is “an enormous amount” depends on your circumstances (you may have a self-hosted gitlab instance on a server with a bunch of terabytes of storage and use shallow clones locally).

                                                                            You can also start out with keeping the assets in source control for simplicity, and then move on to some other way of managing your assets if it turns into an issue.

                                                                            1. 1

                                                                              Yeah, if things change only very infrequently, that is a case where large binary files in the repository are okay. But in that case, why version control them at all? Just throw them onto an FTP server IMO.

                                                                              1. 6

                                                                                Because keeping track of versions can be useful even for files which don’t change that often? And because it’s simpler to keep the entire source for the game in one system instead of having some parts of the game in git and other parts on some ftp server?

                                                                                1. 0

                                                                                  See to me this is just… strange. I would never put the assets for a game in the same place as the code. They’re just fundamentally different types of things in my mind. They’re created separately, distributed differently, managed differently, changed differently. They use different storage formats and they “obviously” should be separated.

                                                                                  I say “obviously” because it’s obvious to me and I always assumed that it was obviously true to everyone else, but clearly I must be in the minority on this one, which is really interesting. I am surprised that people would want to put assets and code in the same place. They just feel so different and so separate to me that it’d be weird having them in the same place.

                                                                                  As an analogy, putting big binary files and source code in the same repo is to me like putting Javascript and CSS inline into your HTML. Yeah if you have a tiny bit of CSS and JavaScript you’re probably going to just chuck it inline in the <head>. But any big stylesheets and scripts you keep separate in their own files. Putting your website’s logo and icon in the repo is fine, but big media files? Not fine.

                                                                                  1. 3

                                                                                    Maybe this helps? To me, putting the implementation of my entities in git and their artwork in ftp would feel like putting the HTML and JS in git but then pulling in the CSS from ftp. Sure, they’re different formats, and one part is how it works and the other part is how it looks, but they’re so tightly coupled. One doesn’t really work without the other, and a change to how the entity works might very well mean you need to add, remove or change textures. For similar reasons, it also doesn’t really make sense to go back to an earlier commit of the source code without also going back to the assets which existed at that time, which gets difficult when the assets are in FTP and they aren’t in some form of version control.