1. 7

    Using a notebook, and writing out what I’ve tried so far by hand.

    The number of circles I can drive myself in without self-reflection far outweighs any amazing gdb tricks I can paste here.

    1. 1

      +1 to that. I treat it as some sort of scientific notebook and write down my hypotheses, experiments, results, etc.

    1. 1

      This was really interesting, and well put together. I too have thought for a while that we still have very rudimentary tools as software engineers.

      1. 1

        Thank you for sharing this. It works perfectly for me.

        1. 2

          Is there a concept of jails in Nix(OS)? Or is it somehow not as relevant?

          1. 3

            You can declare a list of docker containers to run: https://nixos.wiki/wiki/NixOS_Containers

            If you want tighter integration, NixOS does have native support for running containers: https://nixos.org/nixos/manual/#ch-containers

          1. 1

            Are there some release notes or a changelog available?

            1. 23

              So accurate, and I’m glad Lea had the patience to put it into writing. I’m much more inclined to write off the entire ecosystem as some expletive.

              This is generally why I push juniors towards learning Elm instead of JavaScript these days. There are so many edge cases in today’s JavaScript. So many inconsistencies. So much magic one just has to know. So much conflicting information and constant churn. I feel this pain every time a junior asks me for help with something that won’t work as they expect, which these days is effectively every day.

              Juniors are surprisingly good at finding internally inconsistent logic rules in programming languages.

              There’s been a push lately for people to write “simple Haskell”, and to be honest I think we more desperately need a push towards simple JavaScript. That, or just abstract it away entirely and allow it to become the bytecode of the web.

              1. 13

                So many inconsistencies. So much magic one just has to know.

                This sounds like English.

                JS has two clear strong points: everybody knows it and it’s the lingua franca of the web. Just like how everybody knows English and it’s the lingua franca of international commerce.

                The way it is going, we will use JavaScript forever.

                Yes you could learn Elm. But when you quit your company 2 years from now, it will likely be better to have +2 years of JS than 2 years of Elm.

                1. 16

                  everybody knows it

                  I would argue against that.

                  I think it’s no coincidence that one of the most popular technical book series for this language is called You Don’t Know JS.

                  1. 9

                    Well, they know it the same way most people know English: incompletely and error-prone, but most of the time still good enough.

                    1. 4

                      I think “incomplete and error-prone” is what causes user experiences like the one described in the article. For an experienced programmer, that might mean giving up on some library or language. For a novice, that might mean reconsidering their interest in programming.

                  2. 5

                    It sounds a bit like C, in fact: A standardized cross-platform (ha!) language with odd corner cases everyone seems to augment with libraries to smooth over the fact it was never fully portable from the beginning. It keeps being used, so it accretes more libraries around it, making it harder to switch away from, as even if you’re writing in a different language, you need to accept that the rest of the world speaks an API designed around That Language the moment you talk to something from the outside world.

                    1. 5

                      “everybody knows it and it’s the lingua franca of the web” … “everybody knows English and it’s the lingua franca of international commerce”

                      Sure, if by that you mean “everyone claims to know it, but when they need to use it, many fall flat on their face, or resort to some half-assed translation system that just complicates matters”.

                    2. 24

                      You push juniors towards learning Elm, a little-known language with a smaller community, fewer learning and documentation resources, and no proven longevity (just google “leaving Elm” for crying out loud)? Speaking as someone who had to pick up JS over the past year and uses it at their job: any newbie picking up JavaScript ES6 and following good programming practices should have little problem. The ecosystem is a different story, but most “edge cases” and “gotchas” come from inexperience with the language. Recommending they learn some random language won’t help with a PEBCAK problem like that.

                      1. 4

                        Despite being a Haskell fan, I don’t find Elm to be an attractive target. Sure, it’s simple, but the JS interop is really messy, some aspects of setting up elements just don’t scale well, and I’ve seen far too many complaints about the core team’s behavior to assume it’s just a fluke.

                        Yes, JavaScript has a ton of weird stuff going on. However, it’s still the standard, and learning it is beneficial—even if you don’t like to use it personally—because at minimum you’ll be seeing a lot of it. The edge cases in tooling are a mess but there are improvements lying around if you scan a bit (e.g. Parcel 2 is looking far more promising than Webpack’s config soup), and most of the type system weirdness is “solved” by using TypeScript (which makes me sad since it’s unsound, but it also has some incredibly powerful type system features that many other languages don’t).

                        1. 4

                          I want to be clear that the point of my comment was not to fetishise Elm.

                          The point is that all of JavaScript’s inconsistencies make learning how to write programs with confidence immensely more challenging.

                          JS interop is not part of learning how to write programs, and some people’s reaction to how the language designers responded to language feature requests (which usually added a greater surface area for potential runtime failure) is also not part of learning how to write programs.

                          Minor aside: I don’t see how Elm’s FFI is “messy”. The port system works the way I would expect it to. It might feel more cumbersome than running some IO unsafely, but effects are hard, and this is the price we pay for a smaller surface area of potential runtime failures.

                        2. 2

                          With 8+ years of JS (and TS) experience, and a smattering of Perl, Java, C, C++, C#, Elm, Ruby (Rails), Haskell, and more, I’d rather write all of those other languages combined every day than write more JS. The community always feels like it reverts to whataboutisms and “it’s the language of the web!”. When it comes to writing a server, it’s perfectly acceptable to use any of a variety of languages. When it comes to writing for the browser it’s JS or nothing. Suggesting Elm is akin to suggesting cannibalism. I’d suggest Svelte, but most people think it’s JS. Why can’t the web front end be as diverse and accepting as the back end?

                        1. 2

                          I use newsboat. It’s simple and works well. It’s also interesting to watch it slowly be rewritten in Rust.

                          1. 4

                            I recently used Clojure(Script) for the first time ever along with re-frame and reagent for a simple frontend for a side-project. I liked it all overall! I have a lot of Elm experience so it wasn’t entirely foreign to me.

                            One thing that I haven’t found is docs on all the available functions—e.g. I was wondering how to handle on-mouse-enter and on-mouse-leave events. I guess I missed it, but it looks like you can basically just use the event names in kebab-case.

                            1. 2

                              I’m using Clojure(Script) for the first time in a side-project for a simple website to interact with an API. I’m a Vim user but I’m sort of lost in between:

                              • SLIME and all the variations, incl. these 2 plugins (though one seems to be for Common Lisp)
                              • liquidz/vim-iced
                              • tpope/vim-fireplace
                              • CIDER (and nREPL?) and all the variations
                              • snoe/clojure-lsp
                              • Parens plugins like eraserhd/parinfer-rust and guns/vim-sexp

                              I’m used to just setting up a language-server for a language and more or less going on from there. So far I really like eraserhd/parinfer-rust and snoe/clojure-lsp, but it seems like I’m missing out on other things? I realise that some of the above overlap/call each other. I suppose some of my uncertainty comes from being new to Lisp, and from the development flow working a little differently than in other languages?

                              1. 7

                                It’s nice that more people are leaning into deterministic simulation while building correctness-critical distributed systems.

                                It’s not clear to me what they mean by strong consistency for a system that replicates multiple concurrent uncoordinated modifications on different devices; it would be nice if they went into that claim a bit more.

                                1. 7

                                  yeah, the deterministic simulation is my favorite tech in the whole project. it’s caught all types of bugs, from simple logic errors to complicated race conditions that we would have never thought to test. I think there’s some interesting work out there to bring more of this “test case generation” style of testing to a larger audience…

                                  It’s not clear to me what they mean by strong consistency for a system that replicates multiple concurrent uncoordinated modifications on different devices; it would be nice if they went into that claim a bit more.

                                  ah, sorry this wasn’t worded too clearly. we broke the sync protocol down into two subproblems: 1) syncing a view of the remote filesystem to the clients and 2) allowing clients to propose new changes to the remote filesystem. then, the idea is that we’d solve these two problems with strong consistency guarantees, and then we’d use these protocols for building a more operational transform flavored protocol on top.

                                  we took this approach since protocol-level inconsistencies were very common with sync engine classic’s protocol. we spent a ton of time debugging how a client’s view of the remote filesystem got into a bizarre state or why they sent up a nonsensical filesystem modification. so, it’d be possible to build a serializable system on our core protocol, even though we don’t, and that strength at the lowest layer is still really useful.

                                  1. 2

                                    deterministic simulation is my favorite tech in the whole project

                                    Any tips on where to get started on this?

                                    1. 2

                                      Any tips on where to get started on this?

                                      the threads on this post are a good place to start: https://lobste.rs/s/ob6a8z/rewriting_heart_our_sync_engine#c_8zixa2 and https://lobste.rs/s/ob6a8z/rewriting_heart_our_sync_engine#c_ab2ysi. we also have a next blog post on testing currently in review :)

                                      1. 1

                                        Thank you! I’m looking forward to the next blog post too :)

                                    2. 2

                                      yeah, the deterministic simulation is my favorite tech in the whole project. it’s caught all types of bugs, from simple logic errors to complicated race conditions that we would have never thought to test. I think there’s some interesting work out there to bring more of this “test case generation” style of testing to a larger audience…

                                      I’ve been digging into a whole bunch of approaches as to how people do deterministic simulation. I’m really curious—how does your approach work? Can you provide some sort of gist/code example as to how those components are structured?

                                      1. 7

                                        ah, I don’t have a good code sample handy (but we’ll prepare one for our testing blog post). but here’s the main idea –

                                        1. we write almost all of our logic on a single thread, using futures to multiplex concurrent operations on a single thread. then, we make sure all of the code on that thread is deterministic with fixed inputs. there’s lots of ways code can sneak in a dependency on a global random number generator or time.
                                        2. have traits for the interfaces between the control thread and other threads. we also mock out external time behind a trait too.
                                        3. then, wrap each real component in a mock component that pauses all requests and puts them into a wait queue.

                                        now, instead of just calling .wait on the control thread future, poll it until it blocks (i.e. returns Async::NotReady). this means that the control thread can’t make any progress until some future it’s depending on completes. then, we can look at the wait queues and pseudorandomly unpause some subset of them and then poll the control thread again. we repeat this process until the test completes.

                                        all of these scheduling decisions are made pseudorandomly from a fixed RNG seed that’s determined at the beginning of the test run. we can also use this seed for injecting errors, generating initial conditions, and “agitating” the system by simulating other concurrent events. the best part is that once we find a failure, we’re guaranteed that we can reproduce it given its original seed.

                                        in fact, we actually don’t even log in CI at all. we run millions of seeds every day and then if CI finds a failure, it just prints the seed and we then run it locally to debug.
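
                                        here’s a toy sketch of that loop; it’s not our real code (ControlThread, PausedRequest, and the workload are invented for illustration):

                                          use rand::{rngs::StdRng, Rng, SeedableRng};
                                          use std::collections::VecDeque;

                                          // stand-in for a request captured by a mock component instead of being
                                          // executed; a real one would hold a waker/oneshot to resume the future
                                          // that issued it.
                                          struct PausedRequest { id: u64 }

                                          // stand-in for the single control thread: polled until it can make no
                                          // more progress without one of its pending requests completing.
                                          struct ControlThread { issued: u64, completed: u64 }

                                          enum PollResult {
                                              Done,
                                              NotReady(Vec<PausedRequest>), // newly issued, now-waiting requests
                                          }

                                          impl ControlThread {
                                              fn poll(&mut self) -> PollResult {
                                                  if self.completed >= 10 {
                                                      return PollResult::Done;
                                                  }
                                                  // keep up to two requests in flight, then block on them
                                                  let mut new = Vec::new();
                                                  while self.issued - self.completed < 2 {
                                                      self.issued += 1;
                                                      new.push(PausedRequest { id: self.issued });
                                                  }
                                                  PollResult::NotReady(new)
                                              }

                                              fn complete(&mut self, _req: PausedRequest) {
                                                  self.completed += 1;
                                              }
                                          }

                                          fn run_seed(seed: u64) {
                                              // every scheduling decision flows from this one seed
                                              let mut rng = StdRng::seed_from_u64(seed);
                                              let mut control = ControlThread { issued: 0, completed: 0 };
                                              let mut wait_queue: VecDeque<PausedRequest> = VecDeque::new();
                                              loop {
                                                  match control.poll() {
                                                      PollResult::Done => break,
                                                      PollResult::NotReady(new) => wait_queue.extend(new),
                                                  }
                                                  // pseudorandomly unpause one waiting request, then poll again;
                                                  // the same seed always yields the same schedule, so any failure
                                                  // reproduces exactly
                                                  let i = rng.gen_range(0..wait_queue.len());
                                                  let req = wait_queue.remove(i).unwrap();
                                                  control.complete(req);
                                              }
                                              println!("seed {} ran to completion deterministically", seed);
                                          }

                                          fn main() {
                                              run_seed(42); // a CI failure report is just this number
                                          }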

                                        1. 4

                                          There are so many beautiful properties of a system that is amenable to discrete event simulation.

                                          • You can use the seed to generate a schedule of events that happen in the system. When invariants are violated, you can shrink this generated history to the minimal set that reproduces the violation, like how quickcheck does its shrinking (I usually just use quickcheck for generating histories though). This produces minimized histories that are usually a lot simpler to debug, as causality is less blurred by having hundreds of irrelevant concurrent events in-flight. Importantly, separating the RNG from the generated schedule allows you to improve your schedule generators while keeping around the actual histories that previously found bugs, and reusing them for regression tests. Otherwise, every time you improve your generator you destroy all of your regression tests, because the seeds no longer generate the same things. (A sketch of this follows the list.)
                                          • Instead of approaching it from a brute-force exploration of the interleaving+fault space, it’s often much more bug-per-instruction efficient to start with what has to go right for a desired invariant-respecting workload, and then perturb this history to a desired tree depth (fault tolerance degree). Lineage-Driven Fault Injection can be trivially applied to systems that are simulator-friendly, allowing bugs to be sussed out several orders of magnitude more cheaply than via brute-force exploration.
                                          • This approach can be millions of times faster than black-box approaches like jepsen, allowing engineers to run the tests on their laptops in a second or two that would have taken jepsen days or weeks, usually with far less thorough coverage.
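
                                          A toy sketch of that separation, using a hypothetical Event type and invariant; the minimized history itself, not the seed, is what gets kept as a regression test:

                                            use rand::{rngs::StdRng, Rng, SeedableRng};

                                            #[derive(Debug, Clone)]
                                            enum Event { Write(u8), CrashRestart }

                                            // the generator may evolve over time; seeds are only a way to produce
                                            // candidate histories, never the stored artifact
                                            fn generate(seed: u64, len: usize) -> Vec<Event> {
                                                let mut rng = StdRng::seed_from_u64(seed);
                                                (0..len)
                                                    .map(|_| {
                                                        if rng.gen_bool(0.2) {
                                                            Event::CrashRestart
                                                        } else {
                                                            Event::Write(rng.gen())
                                                        }
                                                    })
                                                    .collect()
                                            }

                                            // hypothetical invariant over a history; false means violation
                                            fn invariant_holds(history: &[Event]) -> bool {
                                                // stand-in bug: two crash/restarts in a row violate the invariant
                                                !history
                                                    .windows(2)
                                                    .any(|w| matches!(w, [Event::CrashRestart, Event::CrashRestart]))
                                            }

                                            // greedy quickcheck-style shrink: drop events one at a time while the
                                            // violation still reproduces
                                            fn shrink(mut history: Vec<Event>) -> Vec<Event> {
                                                let mut i = 0;
                                                while i < history.len() {
                                                    let mut candidate = history.clone();
                                                    candidate.remove(i);
                                                    if !invariant_holds(&candidate) {
                                                        history = candidate; // still fails without this event
                                                    } else {
                                                        i += 1;
                                                    }
                                                }
                                                history
                                            }

                                            fn main() {
                                                for seed in 0u64.. {
                                                    let history = generate(seed, 32);
                                                    if !invariant_holds(&history) {
                                                        // persist the minimized history (serialized) as a regression
                                                        // test; it stays valid even after the generator changes
                                                        println!("seed {} minimized to {:?}", seed, shrink(history));
                                                        break;
                                                    }
                                                }
                                            }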

                                          Simulation is the only way to build distributed systems that work. I wrote another possible simulation recipe here but there are many possible ways of doing it, and different systems will benefit from more or less complexity in this layer.

                                          1. 3

                                            thanks for the pointer to the molly paper, looks really interesting.

                                            here’s another idea I was playing with a few months ago: instead of viewing the input to the test as a random seed, think of it as an infinite tape of random bits. then, the path taken through the program is pretty closely tied to particular bits in the input. for example, sampling whether to inject an error for a request is directly controlled by a bit somewhere in the input tape.

                                            this is then amenable to symbolic execution based fuzzing, where the fuzzer watches the program execution and tries to synthesize random tapes that lead to interesting program states. we actually got this up and working, and it found some interesting bugs really quickly. for example, when populating our initial condition, we’d sample two random u64s and insert them into a hashmap, asserting that there wasn’t a collision. the symbolic executor was able to reverse the hash function and generate a tape with two duplicate integers in the right places within minutes!

                                            but, we didn’t actually find any real bugs with that approach during my limited experiments. I think the heuristics involved are just too expensive, and running more black box random search in the same time is just as effective. however, we do spend time tuning our distributions to get good program coverage, and perhaps with a more white box approach that wouldn’t be necessary.
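
                                            to make the tape framing concrete, here’s a toy sketch (the Tape type and its uses are invented for illustration):

                                              // every decision consumes bits at a known offset, so a fuzzer that
                                              // synthesizes the bytes can flip a specific bit to steer a specific
                                              // decision
                                              struct Tape { bytes: Vec<u8>, pos: usize }

                                              impl Tape {
                                                  fn bit(&mut self) -> bool {
                                                      let b = (self.bytes[self.pos / 8] >> (self.pos % 8)) & 1;
                                                      self.pos += 1;
                                                      b == 1
                                                  }

                                                  fn byte(&mut self) -> u8 {
                                                      (0..8).fold(0u8, |acc, i| acc | (u8::from(self.bit()) << i))
                                                  }
                                              }

                                              fn main() {
                                                  let mut tape = Tape { bytes: vec![0b0000_0101, 0xFF], pos: 0 };
                                                  // e.g. "inject an error for this request?" is literally one bit
                                                  let inject_error = tape.bit();
                                                  let payload = tape.byte();
                                                  println!("inject_error={} payload={}", inject_error, payload);
                                              }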

                                            1. 3

                                              I’ve also had disappointing results when combining fuzzing techniques with discrete event simulation of complex systems. My approach has been to have libfuzzer (via cargo fuzz) generate a byte sequence, and have every 2-4 bytes serve as a seed for an RNG that generates a scheduled event in a history of external inputs. This approach actually works extremely well for relatively small codebases; a burst of Rust projects saw a lot of nice bug discovery when this crate was later released. But the approach never really took off for my use in sled, where it dropped the throughput so much that the coverage wasn’t able to stumble on introduced bugs anywhere near as fast as just running quickcheck uninstrumented.

                                              I’ve been meaning to dive into Andreas Zeller’s Fuzzing Book to gain some insights into how I might be able to better apply this technique, because I share your belief that it feels like it SHOULD be an amazing approach.
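
                                              A sketch of that chunking, with a hypothetical Input type; in the real harness this body sits inside libfuzzer’s fuzz_target! rather than a main:

                                                use rand::{rngs::StdRng, Rng, SeedableRng};

                                                #[derive(Debug)]
                                                enum Input { Get(u8), Set(u8, u8) }

                                                // each 4-byte chunk of fuzzer-controlled data seeds an RNG that
                                                // generates one scheduled external event
                                                fn bytes_to_schedule(data: &[u8]) -> Vec<Input> {
                                                    data.chunks_exact(4)
                                                        .map(|c| {
                                                            let seed = u32::from_le_bytes([c[0], c[1], c[2], c[3]]) as u64;
                                                            let mut rng = StdRng::seed_from_u64(seed);
                                                            if rng.gen_bool(0.5) {
                                                                Input::Get(rng.gen())
                                                            } else {
                                                                Input::Set(rng.gen(), rng.gen())
                                                            }
                                                        })
                                                        .collect()
                                                }

                                                fn main() {
                                                    // the fuzzer supplies `data`; fixed bytes here for illustration
                                                    let data = [0xDE, 0xAD, 0xBE, 0xEF, 0x01, 0x02, 0x03, 0x04];
                                                    for event in bytes_to_schedule(&data) {
                                                        println!("{:?}", event); // would be fed to the system under test
                                                    }
                                                }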

                                              1. 4

                                                here’s another pointer for interesting papers if you’re continuing down this path: https://people.eecs.berkeley.edu/~ksen/cs29415fall16.html?rnd=1584484104661#!ks-body=home.md

                                                I’ve kind of put it on the backburner for now, but it’s good to hear that you’ve reached similar conclusions :)

                                        2. 4

                                          I don’t know anything about this project, but I do work on a system that has this property, I guess. There are lots of approaches but for me it’s just an exercise in designing the components carefully.

                                          First, you want to draw strict borders between the core protocol or domain or business logic and the way the world interacts with that core. This is a tenet of the Clean Architecture or the Hexagonal Architecture or whatever: the core stuff should be pure and only know about its own domain objects; it shouldn’t know anything about HTTP or databases or even the concept of physical time. Modeling time as a dependency in this way takes practice, it’s as much art as it is science, and it depends a lot on your language.

                                          Second, you want to make it so that if the core is just sitting there with no input, it doesn’t change state. That means no timers or autonomous action. Everything should be driven by external input. This can be synchronous function calls — IMO this is the best model — but it can also work with an actor-style message passing paradigm. There are tricks to this. For example, if your protocol needs X to happen every Y, then you can model that as a function X that you require your callers to call every Y.

                                          Once you have the first step, you can implement in-memory versions of all of your dependencies, and therefore build totally in-memory topologies however you like. Once you have the second step, you have determinism that you can verify, and, if you’re able to model time abstractly rather than concretely, you can run your domain components as fast as your computer can go. With the two combined you can simulate whatever condition you want.

                                          I hope this makes sense. I’m sure /u/spacejam has a slightly? majorly? different approach to the challenge.
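
                                          Here’s a minimal sketch of both tenets, with invented names: time arrives as an explicit input, and the core only changes state when called:

                                            use std::time::Duration;

                                            // core logic: a pure state machine with no threads, timers, or IO
                                            struct Heartbeat {
                                                interval: Duration,
                                                elapsed: Duration,
                                                beats: u64,
                                            }

                                            impl Heartbeat {
                                                // "if your protocol needs X to happen every Y, require callers to
                                                // call X every Y": the caller reports the passage of time
                                                fn advance(&mut self, dt: Duration) {
                                                    self.elapsed += dt;
                                                    while self.elapsed >= self.interval {
                                                        self.elapsed -= self.interval;
                                                        self.beats += 1; // the "autonomous" action, now synchronous
                                                    }
                                                }
                                            }

                                            fn main() {
                                                let mut hb = Heartbeat {
                                                    interval: Duration::from_secs(5),
                                                    elapsed: Duration::ZERO,
                                                    beats: 0,
                                                };
                                                // production drives this from a real timer; a test can jump through
                                                // simulated hours as fast as the CPU allows
                                                hb.advance(Duration::from_secs(60 * 60));
                                                assert_eq!(hb.beats, 720);
                                            }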

                                          1. 3

                                            this is spot on! we still have periodic timers in our system, but we hook them into our simulation of external time. there’s some special casing to avoid scheduling a timer wakeup when there are other queued futures, but it more-or-less just works.

                                            1. 2

                                              this is spot on!

                                              Nice to hear I’m not totally off base :)

                                            2. 2

                                              I totally agree that if you can make your core business logic a pure function, it dramatically improves your ability to understand your system. What you said about time is also possible for most other things that turn nondeterministic in production:

                                              • random number generators can be deterministic when run in testing
                                              • threads/goroutines can be executed in deterministic interleavings under test with the use of linux realtime priorities / instrumented scheduling / ptrace / gdb scripts etc…

                                              Determinism can be imposed in test on things that need a bit of nondeterminism in production to better take advantage of hardware parallelism. You lose some generality: code compiled with instrumented scheduler yields, running under a deterministic schedule for finding bugs, will trigger different cache coherency traffic and possibly mask some bugs if you’re relying on interesting atomic barriers for correctness, as the scheduler yield will basically shove sequentially consistent barriers at every yield and cause some reordering bugs to disappear. But that’s sort of out of scope.

                                              There are some fun techniques that allow you to test things with single-threaded discrete event simulation but get performance via parallelism without introducing behavior that differs from the single threaded version:

                                              • software transactional memory
                                              • embedded databases that support serializable transactions
                                              • commutative data structures
                                      1. 3

                                        To extend the language control structures (loops, match), to embed a different computing paradigm other than recursion theory (hello FSA), to make a DSL (maybe not that good of an idea), to generate code using the same programming language.
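
                                        For the FSA case, here is a toy sketch of what such an embedding can look like, as a Rust macro_rules! macro (all names invented; in the spirit of the automata-via-macros idea):

                                          // a tiny DSL: declare states and transitions, get back a recognizer
                                          macro_rules! automaton {
                                              (
                                                  start: $start:ident;
                                                  accept: $accept:ident;
                                                  $( $state:ident { $( $ch:literal => $next:ident ),* } )*
                                              ) => {{
                                                  #[derive(PartialEq, Clone, Copy)]
                                                  enum State { $( $state ),* }

                                                  // the transition table compiles down to nested matches
                                                  fn step(s: State, c: char) -> Option<State> {
                                                      match s {
                                                          $( State::$state => match c {
                                                              $( $ch => Some(State::$next), )*
                                                              _ => None,
                                                          } ),*
                                                      }
                                                  }

                                                  move |input: &str| -> bool {
                                                      let mut s = State::$start;
                                                      for c in input.chars() {
                                                          match step(s, c) {
                                                              Some(n) => s = n,
                                                              None => return false, // no transition: reject
                                                          }
                                                      }
                                                      s == State::$accept
                                                  }
                                              }};
                                          }

                                          fn main() {
                                              // recognizes strings of the form a b* a
                                              let m = automaton! {
                                                  start: S0;
                                                  accept: S2;
                                                  S0 { 'a' => S1 }
                                                  S1 { 'b' => S1, 'a' => S2 }
                                                  S2 { }
                                              };
                                              assert!(m("aba"));
                                              assert!(!m("ab"));
                                          }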

                                        1. 1

                                          to embed a different computing paradigm other than recursion theory (hello FSA)

                                          Do you know of any available examples of this?

                                          1. 2

                                            I read it here and it was very interesting; you can do it at the type level, but that requires a lot of thinking, while this is very streamlined. Another interesting take I haven’t read yet is the virtual machine section of SICP.

                                            1. 1

                                              Thank you. For anyone wanting to read the paper, I found it available on the author’s site. [1]

                                              [1] https://cs.brown.edu/~sk/Publications/Papers/Published/sk-automata-macros/paper.pdf

                                        1. 4

                                          It’s amazing how often rewrites for fun happen in our industry. If this were any industry other than programming, people would be shocked.

                                          1. 30

                                            Reading the article, it does look like they hit some performance issue, due to how the Go GC works, and they tried Rust because it was already seeing adoption at Discord. I would not call that “just for fun”.

                                            On the other hand, a lot of things change (improve?) in our industry because we hack stuff just for fun. :)

                                            1. 6

                                              This happens in other industries too. Youtube is full of metalworkers/MechE’s, EE’s, chemists, woodworkers, and so on hacking their tools, twiddling with silly ideas just for fun, breaking stuff, trying out old ideas in new ways or vice versa, and so on. The only thing special about software is you can do it sitting in front of a computer instead of needing a whole workshop or lab.

                                              1. 4

                                                It’s even codified: the car industry has time frames in which a whole car is essentially completely replaced.

                                              2. 3

                                              Even though it’s rather expensive, doing a rewrite is usually not as bad in our industry as it would be in many others. Imagine replacing a bridge with a new one just because you got slightly better steel now than last year.

                                                1. 11

                                                  Imagine building a new bridge because there was more traffic than would fit on the bridge you already have, and deciding you might as well use the new steel while you’re at it.

                                                2. 1

                                                  What’s the equivalent of rewrites in other industries?

                                                  1. 5

                                                    Remodeling the kitchen.

                                                    1. 1

                                                  My own perspective: if software engineering is like writing, then it’s a rewrite or a reimagining. If it’s like math, then it’s something like string theory, or a trail that leads nowhere so you go back and start over from a new fundamental. I think software is mostly writing with a bit of the math in my analogies. I don’t think it’s like physical construction, but sometimes I use those analogies.

                                                  1. 1

                                                    ../log/2020/20-01-04.txt — a file for every day, with a shell script to make/edit the file. I also used Day One for a couple of years, and still like it.

                                                    1. 2

                                                      care to share the shell script?

                                                      1. 2

                                                        Sure thing:

                                                        #!/bin/sh

                                                        # Create this month's log directory and append a separator to today's file.
                                                        dir=~/Documents/log/$(date +"%Y")/$(date +"%m")
                                                        mkdir -p "$dir"
                                                        # printf handles \n portably (plain sh echo may not), and >> appends
                                                        # instead of overwriting anything already written today.
                                                        printf '\n\n---\n\n' >> "$dir/$(date +"%y-%m-%d").txt"
                                                        
                                                    1. 1

                                                      VPS (FreeBSD, with everything in a separate jail):

                                                      • ZNC
                                                      • git
                                                      • portfolio
                                                      • side-projects

                                                      Raspberry Pi:

                                                      • Pi-hole
                                                        1. 2

                                                          Please do!

                                                        1. 10

                                                            @ddevault Would it be possible to get a clear “Terms of Service” clarifying these sorts of use cases? 1.1 Gb seems like an excessive file size, but having a crystal clear & mutually agreed upon set of rules for platform use is essential for trust (more so for a paid service), and right now users don’t know what does and does not constitute a reasonable use of the service.

                                                          1. 37

                                                            No, they’re intentionally vague so that we can exercise discretion. There are some large repositories which we overlook, such as Linux trees, pkgsrc, nixpkgs, even mozbase is overlooked despite being huge and expensive to host.

                                                            In this guy’s case, he had uploaded gigabytes of high-resolution personal photos (>1.1 Gb - it takes up more space and CPU time on our server than on your workstation because we generate clonebundles for large repos). It was the second largest repository on all of SourceHut. SourceHut is a code forge, not Instagram.

                                                            1. 40

                                                              No, they’re intentionally vague so that we can exercise discretion.

                                                              I like to call this “mystery meat TOS”. You never know what you’ll get until you take a bite!

                                                              1. 24

                                                                  I mean, honestly, a small fraction of our users hit problems. I’ve had to talk to <10 people, and this guy is the only one who felt slighted. It’s an alpha-quality service; maybe it’ll be easier to publish objective limits once things settle down and the limitations are well defined. On the whole, I think more users benefit from having a human being making judgement calls in the process than not, because usually we err on the side of letting things slide.

                                                                Generally we also are less strict on paid accounts, but the conversation with this guy got hostile quick so there wasn’t really an opportunity to exercise discretion in his case.

                                                                1. 30

                                                                  the conversation with this guy got hostile quick

                                                                  Here’s the conversation, for folks who want to know what “the conversation got hostile” means to Source Hut: https://paste.stevelosh.com/18ddf23cb15679ac1ddca458b4f26c48b6a53f11

                                                                  1. 32

                                                                    i’m not a native speaker, but have the feeling that you got defensive quickly:

                                                                    Okay. I guess I assumed a single 1.1 gigabyte repository wouldn’t be an unreasonable use of a $100/year service. I certainly didn’t see any mention of a ban on large binary files during the sign up or billing process, but I admit I may have missed it. I’ve deleted the repository. Feel free to delete any backups you’ve made of it to reclaim the space, I’ve backed it up myself.

                                                                      it’s a pay-what-you-like alpha service, not backed by venture capital. you got a rather friendly mail notifying you that you shouldn’t put large files into hg, not requesting that you delete them immediately.

                                                                      ddevault’s reply explained the reasoning, not knowing that you are a mercurial contributor:

                                                                    Hg was not designed to store large blobs, and it puts an unreasonable strain on our servers that most users don’t burden us with. I’m sorry, but hg is not suitable for large blobs. Neither is git. It’s just not the right place to put these kinds of files.

                                                                    i’m not sure i’d label this as condescending. again I’m no native speaker, so maybe i’m missing nuances.

                                                                      after that you cancelled your account.

                                                                    1. 13

                                                                        As a native speaker, I’d say your analysis aligns with how I interpreted it.

                                                                      1. 9

                                                                        Native speaker here, I actually felt the conversation was fairly polite right up until the very end (Steve’s last message).

                                                                      3. 28

                                                                        On the whole, I think more users benefit from having a human being making judgement calls in the process than not, because usually we err on the side of letting things slide.

                                                                        Judgement calls are great if you have a documented soft limit (X GB max repo size / Y MB max inner repo file size) and say “contact me about limit increases”. Your customers can decide ahead of time if they will meet the criteria, and you get the wiggle room you are interested in.

                                                                        Judgement calls suck if they allow users to successfully use your platform until you decide it isn’t proper/valid.

                                                                        1. 12

                                                                          That’s a fair compromise, and I’ll eventually have something like this. But it’s important to remember that SourceHut is an alpha service. I don’t think these kinds of details are a reasonable expectation to place on the service at this point. Right now we just have to monitor things and try to preempt any issues that come up. This informal process also helps to identify good limits for formalizing later. But, even then, it’ll still be important that we have an escape hatch to deal with outliers - the following is already in our terms of use:

                                                                          You must not deliberately use the services for the purpose of:

                                                                          • impacting service availability for other users

                                                                          It’s important that we make sure that any single user isn’t affecting service availability for everyone else.

                                                                          Edit: did a brief survey of competitors’ terms of service. They’re all equally vague, presumably for the same reasons.

                                                                          GitHub:

                                                                          [under no circumstances will you] use our servers for any form of excessive automated bulk activity (for example, spamming or cryptocurrency mining), to place undue burden on our servers through automated means, or to relay any form of unsolicited advertising or solicitation through our servers, such as get-rich-quick schemes;

                                                                          The Service’s bandwidth limitations vary based on the features you use. If we determine your bandwidth usage to be significantly excessive in relation to other users of similar features, we reserve the right to suspend your Account, throttle your file hosting, or otherwise limit your activity until you can reduce your bandwidth consumption

                                                                          GitLab:

                                                                          [you agree not to use] your account in a way that is harmful to others [such as] taxing resources with activities such as cryptocurrency mining.

                                                                          At best they give examples, but always leave it open-ended. It would be irresponsible not to.

                                                                          1. 17

                                                                            The terms of service pages don’t mention the limits, but the limits are documented elsewhere.

                                                                            GitHub:

                                                                            We recommend repositories be kept under 1GB each. Repositories have a hard limit of 100GB. If you reach 75GB you’ll receive a warning from Git in your terminal when you push. This limit is easy to stay within if large files are kept out of the repository. If your repository exceeds 1GB, you might receive a polite email from GitHub Support requesting that you reduce the size of the repository to bring it back down.

                                                                            In addition, we place a strict limit of files exceeding 100 MB in size. For more information, see “Working with large files.”

                                                                            GitLab (unfortunately all I can find is a blog post):

                                                                            we’ve permanently raised our storage limit per repository on GitLab.com from 5GB to 10GB

                                                                            Bitbucket:

                                                                            The repository size limit is 2GB for all plans, Free, Standard, or Premium.

                                                                            1. 9

                                                                              I see. This would be a nice model for a future SourceHut to implement, but it requires engineering effort and prioritization like everything else. Right now the procedure is:

                                                                              1. High disk use alarm goes off
                                                                              2. Manually do an audit for large repos
                                                                              3. Send emails to their owners if they seem to qualify as excessive use

                                                                              Then discuss the matter with each affected user. If there are no repos which constitute excessive use, then more hardware is provisioned.

                                                                              1. 11

                                                                                Maybe this is something you should put on your TOS/FAQ somewhere.

                                                                            2. 8

                                                                              This informal process also helps to identify good limits for formalizing later.

                                                                              Sounds like you have some already:

                                                                              • Gigabyte-scale repos get special attention
                                                                              • Giant collections of source code, such as personal forks of large projects (Linux source, nix pkgtree) are usually okay
                                                                              • Giant collections of non-source-code are usually not okay, especially binary/media files
                                                                              • These guidelines are subject to judgement calls
                                                                              • These guidelines may be changed or refined in the future

                                                                              All you have to do is say this, then next time someone tries to do this (because there WILL be a next time) you can just point at the docs instead of having to take the time to explain the policy. That’s what the terms of service is for.

                                                                          2. 8

                                                                            Regardless of what this specific user was trying to do, I would exercise caution. There are valid use cases for large files in a code repository. For example: Game development, where you might have large textures, audio files, or 3D models. Or a repository for a static website that contains high-res images, audio, and perhaps video. The use of things like git-lfs as a way to solve these problems is common but not universal.

                                                                            To say something like, “SourceHut is a code forge, not Instagram” is to pretend these use cases are invalid, or don’t exist, or that they’re not “code”, or something.

                                                                            I’ve personally used competing services like GitHub for both of the examples above, and this whole discussion has completely put me off ever using SourceHut despite my preference for Mercurial over Git.

                                                                            1. 4

                                                                              I agree that some use-cases like that are valid, but they require special consideration and engineering work that hg.sr.ht hasn’t received yet (namely largefiles, and in git’s case annex or git-lfs). For an alpha-quality service, sometimes we just can’t support those use-cases yet.

                                                                              The Instagram comparison doesn’t generalize; in this case this specific repo was just full of a bunch of personal photos, not assets necessary for some software to work. Our systems aren’t well equipped to handle game assets either, but the analogy doesn’t carry over.

                                                                        2. 4

                                                                          I don’t think the way you’re working is impossible to describe; I think it’s just hard, and most people don’t understand the way you’re doing and building business. This means your clients may expect a ToS or customer service level that you cannot or will not provide.

                                                                          To strive towards a fair description that honours how you are actually defining things for yourself and tries to make that more transparent without having to have specific use cases, perhaps there is a direction with wording such as:

                                                                          • To make a sustainable system we expect the distribution of computing resource usage and human work to follow a normal distribution. To preserve quality of service for all clients, to honour the sustainability of the business and the wellbeing of our staff, and to attempt to provide a reasonably uniform and understandable pricing model, we reserve the right to remove outliers who use an unusually large amount of any computing and/or human resource. If a client is identified as using a disproportionate amount of service, we will follow this process: (Describe fair process with notification, opportunity for communication/negotiation, fair time for resolution, clear actions if resolution is met or not.)
                                                                          • This system is provided for the purposes of XYZ, and in order to be able to design/optimise/support this system well we expect all users to use it predominantly for this purpose. It may be possible to use our system for other things; however, if we detect this we reserve the right to (cancel service) to ensure that we do not arrive at a situation where an established client is using our service for another purpose which may perform poorly for them in the future because it is not supported, or may become disproportionately hard for us to provide computing resources or human time for because it is not part of XYZ. This will be decided at our discretion, and the process we will follow if we identify a case like this is (1,2,3)
                                                                          1. 2

                                                                            Would it be possible to get a clear “Terms of Service” clarifying these sorts of use cases?

                                                                            No, they’re intentionally vague so that we can exercise discretion. There

                                                                            May I suggest, perhaps: “ToS: regular repositories have a maximum file size X and repository size Y. We provide extra space to some projects that we consider important.”

                                                                            1. 1

                                                                              No, they’re intentionally vague so that we can exercise discretion.

                                                                              Funny way to say “so I can do whatever I want without having to explain myself”

                                                                              1. 15

                                                                                I think that’s unfair. He did in fact explain himself to the customer and it was the customer who decided to cancel the service. I’d agree if the data was deleted without sufficient warning, but that is not the case here.

                                                                          1. 7

                                                                             I really liked this article. Having gotten into functional programming with Elm over the last 2 years, I think the NonEmpty type is brilliant, and I’m going to re-implement it in Elm. The tie-in of language-theoretic security was a nice touch. I’ve been promoting that at work for a while.

                                                                            it’s extremely easy to forget.

                                                                            Anything that can be forgotten by a developer, will be forgotten.

                                                                            A better solution is to choose a data structure that disallows duplicate keys by construction

                                                                            “Making Impossible States Impossible” by Richard Feldman is a talk on effectively the same concept.
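
                                                                             A rough Rust rendition of the idea, with invented names (Haskell’s version is Data.List.NonEmpty): a list that is non-empty by construction, so the empty case can’t be forgotten:

                                                                               // the type has no way to represent an empty list
                                                                               struct NonEmpty<T> {
                                                                                   head: T,
                                                                                   tail: Vec<T>,
                                                                               }

                                                                               impl<T> NonEmpty<T> {
                                                                                   fn new(head: T, tail: Vec<T>) -> Self {
                                                                                       NonEmpty { head, tail }
                                                                                   }

                                                                                   // total function: no Option needed, unlike Vec::first
                                                                                   fn first(&self) -> &T {
                                                                                       &self.head
                                                                                   }

                                                                                   // converting from a possibly-empty Vec is where validation
                                                                                   // happens, exactly once
                                                                                   fn from_vec(mut v: Vec<T>) -> Option<Self> {
                                                                                       if v.is_empty() {
                                                                                           None
                                                                                       } else {
                                                                                           let head = v.remove(0);
                                                                                           Some(NonEmpty { head, tail: v })
                                                                                       }
                                                                                   }
                                                                               }

                                                                               fn main() {
                                                                                   let xs = NonEmpty::new(1, vec![2, 3]);
                                                                                   println!("first: {}", xs.first()); // always succeeds
                                                                                   assert!(NonEmpty::<i32>::from_vec(vec![]).is_none());
                                                                               }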

                                                                            1. 8

                                                                              However, sometimes it is quite annoying to conflate the type and structure of the list with the fact that it is nonempty. For example the functions from Data.List don’t work with the NonEmpty type. I think the paper on “departed proofs” linked near the bottom points to a different approach where various claims about a value are represented as separate proof objects.

                                                                              1. 3

                                                                                In this instance, the fact that both [] (lists) and NonEmpty are instances of Foldable will help: http://hackage.haskell.org/package/base-4.12.0.0/docs/Data-Foldable.html

                                                                              2. 2

                                                                                I ran across a case of the nonempty approach in File.Select.files the other day:

                                                                                Notice that the function that turns the resulting files into a message takes two arguments: the first file selected and then a list of the other selected files. This guarantees that one file (or more) is available. This way you do not have to handle “no files loaded” in your code. That can never happen!

                                                                                1. 1

                                                                                  Nice find! I’ve actually used that before but hadn’t made the connection, haha.

                                                                              1. 3

                                                                                zsh on macOS with zplugin for managing plugins (it’s by far the fastest in my experience, and others have done benchmarks).

                                                                                Why zsh? It’s popular and isn’t bash. I’ve thought about trying fish but haven’t gotten around to it.

                                                                                1. 2

                                                                                    I don’t think I’d be placated until they address the comment made by the CFO, and explain why they speak of collaboration when there are so many existing issues on their public-facing projects.

                                                                                  1. 3

                                                                                      I’ve used Elm professionally for about a year and a half, maybe longer, and we’ve had more or less the exact same experience. I’ve also recently lived in Norway, and have used Vy (before its name change).

                                                                                    It has made me more curious about more ‘advanced’ functional languages like PureScript, and I wish there was a good comparison of Elm to it (and also other languages such as ReasonML).

                                                                                    1. 3

                                                                                        I don’t use Elm very much but I have used a good amount of PureScript (and TypeScript), and having simple JS interop is such a game changer. I really wish that it could stick around. Elm works well for a lot of UI stuff but it’s just annoying to have to “do stuff” when I have some existing JS.

                                                                                      Though kinda ironically I think Purescript is a really good backend language. Effects systems are super valuable on server-side code but don’t actually tend to be that helpful in frontend code (beyond how they’re used for dictionaries).

                                                                                      1. 2

                                                                                          Are effect systems not useful for error management in frontend code?

                                                                                        1. 1

                                                                                          Effects systems are super valuable on server-side code but don’t actually tend to be that helpful in frontend code (beyond how they’re used for dictionaries).

                                                                                          Mind elaborating on this?

                                                                                          1. 1

                                                                                             I wrote a thing about this a couple of years ago. Basically, the granular effects system of PureScript lets you track DB reading and DB writing separately, letting you establish stronger guarantees about what kind of IO is happening in a function:

                                                                                            http://rtpg.co/2016/07/20/supercharged-types.html

                                                                                            Some other examples over the years that I would find useful:

                                                                                            • an effect like “DB access of unknown result size”. For example a raw SQL query on a table without any sort of LIMIT could potentially return a lot of data at once, whereas in web requests you want to have consistent, fast replies (so you should opt for pagination instead)

                                                                                            • an effect like “accesses multi-tenant data”. This will let you determine what parts of your system are scoped down to a single tenant and which parts are scoped to multi-tenant data

                                                                                            • An effect like “makes IO requests to an external service”. You could use this to qualify your SLAs in certain parts of your system (your own system probably should be built for higher expected uptime than some random third party)

                                                                                             • An effect like “locks this resource”. You can use this to make sure you unlock the resource at a later date. Normally this is accomplished through a simple bracket pattern, but with effects systems you can opt for different styles.

                                                                                            Because the row polymorphism doesn’t force you to stack monads you avoid the mess of writing up a bunch of transformers and unwrapping/rewrapping items to get at the innards.
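
                                                                                             A loose Rust approximation of this kind of tracking (invented capability-token types rather than PureScript’s row-polymorphic effects), where a function’s signature declares which kinds of IO it may perform:

                                                                                               // hypothetical capability tokens: holding one is permission
                                                                                               struct DbRead;  // may read from the DB
                                                                                               struct DbWrite; // may write to the DB

                                                                                               // the signature documents and enforces "read-only"
                                                                                               fn fetch_user(_cap: &DbRead, id: u64) -> String {
                                                                                                   format!("user-{}", id) // stand-in for a real query
                                                                                               }

                                                                                               // needs both capabilities, and says so in its type
                                                                                               fn rename_user(read: &DbRead, _write: &DbWrite, id: u64, name: &str) -> String {
                                                                                                   let old = fetch_user(read, id);
                                                                                                   format!("renamed {} to {}", old, name) // stand-in for a real update
                                                                                               }

                                                                                               fn main() {
                                                                                                   let (read, write) = (DbRead, DbWrite);
                                                                                                   // a read-only code path simply never receives a &DbWrite,
                                                                                                   // so writes there are ruled out at compile time
                                                                                                   println!("{}", fetch_user(&read, 1));
                                                                                                   println!("{}", rename_user(&read, &write, 1, "ada"));
                                                                                               }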

                                                                                          2. 1

                                                                                             Since we’re a startup we had effectively 0 existing code to worry about, which I guess makes it easier in that regard for us.

                                                                                            Do you use the more ‘complex’ functional concepts like type classes?

                                                                                            I wonder if anyone would pick Purescript over Haskell for the backend.

                                                                                          3. 2

                                                                                            For us, at least, the relative simplicity of Elm was a big part of being able to move existing developers over to it.

                                                                                             There were definitely times I missed metaprogramming or other fancy features, but I think having a lot of hidden ‘magic’ is intimidating (even though they’re coming from a JavaScript background, where cheaty magic is just how things get done).

                                                                                            Our experience also aligned with this post. We didn’t yet fall into the trap of trying to use elm-git-install for sub-dependencies (maintaining first-level dependencies is time-consuming enough) but it’ll probably happen sooner or later.

                                                                                            1. 2

                                                                                              You’re right about the simplicity making it easy for new developers to get started and feel confident with the language.

                                                                                              I personally feel like I’m now wanting more complex concepts (metaprogramming, type classes, ..).

                                                                                              We too haven’t reached that point, but I could see that in a year.

                                                                                              1. 2

                                                                                                 I really did miss type classes. Partially specified types are an ‘ok’ workaround in some cases, but they still felt incomplete. Especially not being able to extend Num.

                                                                                                1. 1

                                                                                                  What do you think about nikita-volkov/typeclasses? Useful or not really?

                                                                                                  1. 2

                                                                                                    Wasn’t aware of that package at the time (loved the book, BTW) - but it looks like it might clean up some of the ugliness we had around non-primitive comparable. I’ll have to see if that works out when I’m on that project next. Thanks!