1. 110

    I tell anyone asking me for career advice the same two things.

    The first: the deeper in the world’s dependency tree you are, the less frequently things will churn, and the longer your skills will last. TCP doesn’t change very often. Kernels don’t change very often (but more than TCP). Databases don’t change very often (but more than kernels). Theoretical skills may be applicable for your entire career, and human skills are more durable than any technical skill. There is a spectrum of skill durability, and you will burn out faster if you find that all of your skills become worthless after a very short time. Being depended upon forces things to keep their interfaces stable, which shifts the work toward performance, reliability, and other things that some people find far more rewarding over time.

    The second: the more people who do what you do, the worse you will be treated, the more BS you will have to put up with, the worse your pay will be, the faster you will be fired, the harder it will be to find a job that values you, etc… etc… Supply and demand applies to our labor market, and if you want to be happier, you should exploit this dynamic as heavily as possible. Avoid competition like the plague. But don’t avoid funding. How do you avoid competition without going off into the wilderness where there is no money to be made? Hype drives funding, but it also drives a lot of competition. However, using rule #1 above, the hyped things depend on other things. Many of these dependencies are viewed as “too hard” for one reason or another. That’s the best place to be. Go where other people are afraid, but nevertheless have a lot of money depending on.

    All hyped things rely on things that for one reason or another are not commonly understood, and tend not to change quickly. That’s a good place to find work involving durable skills that tend to have lower competition. Go where the dependency is high but the competition is low, and you have a better chance of being happy than people who go where the competition is high or the dependency is low. Bonus points if it’s actually “hard” because then you won’t get bored as quickly.

    There are areas of front-end that are high-dependency, durable, slow-changing, and low-competition. That’s where engineers are likely to be happiest. But these two principles apply to every field, and, zooming out, to any business generally. I’m pretty happy working on new distributed systems and database storage engines for the time being. But I’m always looking for the things that are viewed as hard while also receiving significant investment, as these are the things that will ultimately give me more opportunities to live life on my own terms.

    1. 10

      Go where other people are afraid, but nevertheless have a lot of money depending on.

      There is an old Yorkshire saying: “where there’s muck, there’s brass”.

      1. 7

        This is so true, I’ve gone to my car to fetch my laptop just to upvote and comment. It’s an exceptionally important piece of advice I wish I had understood as early as possible in life, but I didn’t.

        CS (pure) and Math degrees are so good because they teach you really basic theories that are mostly timeless. Whenever I’ve gravitated towards more trendy or applied skills, either in coursework or in jobs, there’s always been a really poor and transient ROI.

        […] using rule #1 above, the hyped things depend on other things. Many of these dependencies are viewed as “too hard” for one reason or another. That’s the best place to be.

        What are some examples of these dependencies right now, or in the near future?

        1. 6

          Thank you very much for this post. Great distillation of essential career advice, especially the part about the durability of human skills. So many developers would derive far more value from a single public speaking or improv class than from learning yet another new programming language.

          1. 3

            Oh man, the number of times I’ve echoed this exact same message to others almost verbatim in the first two paragraphs. Thanks for posting this. Thinking about my career in this way a few years ago was probably the most valuable change I made.

            1. 4

              Thank you for the kind words :)

              Large-scale ML, blockchain, IoT, serverless, k8s, etc… are all areas recently flooded by newly minted experts in the topical skills, but like the Australian poop diver who claims to have never worked a day in his life, there are great opportunities for high-respect jobs in the dirty internals of the systems :) Particularly with this set of hyped tech, there are very few people who seem to specialize in getting bugs to jump out in distributed systems. Everybody is still writing almost all of them in a way that assumes the happy path. But there are techniques for building all of these systems in ways that encourage the race conditions to jump out. The few people who take pride in building correct distributed systems will have their plates full for a while!

              Another reason why this kind of bug hunting and prevention is not all that popular may be that the sorta-similar yet way-cooler-seeming field of systems security tends to absorb a lot of the people who may have otherwise been predisposed to this type of work.

            2. 1

              love this comment. like the other commenter, do you have any examples of “hard, but trendy” in frontend or elsewhere?

              1. 7

                Spitballing, I’m going to google “hyped programming field” and see what comes up, then I’ll try to break it into areas for investigation. Ok, my results on the first page seemed to be about AI, python, golang. I mentioned high-scale ML depending on distributed systems skills and correctness skills above, so I’ll think about the others. Python feels harder for me to answer so let’s dig into that.

                Python is immensely popular, and I think that becoming an expert in alleviating any friction point that people often hit will be lucrative. But what about things that SOUND really boring? When I think about things that suck about Python, the most dreadful thing that comes to my mind is migrating large applications from Python 2 to Python 3. Python 2 stuff is still everywhere, and it’s increasingly hazardous over time because its support has run out. But people don’t want to touch things that are still ticking along. Legacy systems, like geriatric medical care, become more important every day.

                But despite being extremely important, and an excellent place to apply interesting mechanical (yet supervised) translation and correctness analysis during modernization, so many people have been burned by managers forcing them to work on legacy systems they didn’t want to touch. So much of the overall programming population has been burned by non-consensual legacy engineering that almost everyone wants to stay as far away as possible. The only question is how to find the companies who realize that modernizing their legacy systems is actually something you want to pay a specialist to do instead of forcing the task on junior devs with no negotiating power. Banks are well-known for paying consultants for this kind of work, but to be honest I’m not sure.

                Maybe Python 2 modernization isn’t exactly a golden ticket due to the difficulty in finding companies who are willing to hire specialists to perform this task. Maybe it’s too easy. But maybe it’s not. I’m not sure. In any case, this demonstrates the general search technique, and can be used to illuminate further areas for research. If general Python 2 modernization is “too easy” then you can slap on more filters. Python 2 modernization for automotive codebases. That sounds much more correctness-critical and likely to pay a specialist to accomplish.
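                To make the “supervised mechanical translation” point concrete, here’s a minimal sketch (mine, purely illustrative) of the kind of semantic hazard that syntax-level 2-to-3 tools can’t resolve on their own: integer division changed meaning between Python 2 and 3, so a line-for-line port can silently change behavior.

```python
# Hypothetical example of a semantic (not syntactic) migration hazard.
# Under Python 2, dividing two ints truncated; under Python 3 it doesn't,
# so a faithful port has to decide which behavior was actually intended.

def ported_average(values):
    # Direct translation: Python 3 true division yields a float.
    return sum(values) / len(values)

def faithful_average(values):
    # If the old truncating behavior was load-bearing, the port must
    # switch to explicit floor division to preserve Python 2 semantics.
    return sum(values) // len(values)

print(ported_average([1, 2]))    # 1.5 under Python 3
print(faithful_average([1, 2]))  # 1, matching Python 2's int division
```

                This is why the work rewards a specialist: a tool can rewrite the syntax, but only a human can decide which behavior the original program actually relied on.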

                Anyway, the general pattern is: what feels repulsive and hard? Maybe that sense of repulsion creates a dynamic where the few people who do the thing are extremely well treated due to their scarcity. If that scarcity of talent aligns with an extremely popular overall field, there’s a good chance that there are folks who require a specialist to address this niche.

                1. 6

                  Anyway, the general pattern is: what feels repulsive and hard? Maybe that sense of repulsion creates a dynamic where the few people who do the thing are extremely well treated due to their scarcity. If that scarcity of talent aligns with an extremely popular overall field, there’s a good chance that there are folks who require a specialist to address this niche.

                  Here’s an example of scarcity working out well for an expert. Another company in my industry was converting to a different accounting system. These projects are incredibly complex and can last years. Part of the process required recreating hundreds of templates for the new system. One contractor handled the template conversion; she retired at a young age after the project was complete.

                  1. 3

                    non-consensual legacy engineering

                    Have you written about this anywhere? I’d be curious to hear more about it. It immediately struck a chord with me, and probably does with most software engineers.

              1. 3

                At my previous job I did everything (full-stack) and it progressively led to being primarily frontend (in Elm, at least). There wasn’t a path to what I wanted to work on (both tech and domain), which was one of the reasons I left.

                When interviewing for my next job, I told everyone that I did not want to work on anything frontend related. I was asked whether I’d be willing to accept a minority frontend split, but I knew that that promise wouldn’t be kept. I’ve also had to stop myself from thinking at my new place that I could quickly knock out a PR or two.

                Anyway, just saying, you’re not alone.

                1. 7

                  I just wanted to say that this article was well-written. I had been meaning to learn about Pijul and this was a good starting point.

                  Pedantic: I think there is a small typo where there should be a line-break before the pijul record command to separate it from pijul ls.

                  1. 4

                    Thx very much! It was very fun to learn about Pijul and work on the article, so glad it was useful. The typo should be fixed as well ;)

                  1. 28

                    The font looks really good, and has a lot of great features! But why the ligatures…

                    When I see the current state of coding fonts, and ligatures, I don’t know whether I’m the last of the Mohicans, or whether I’m just becoming an old man…

                    1. 12

                      You are not alone, but the zip file they have has a “No ligatures” directory. At least for ttf.

                      1. 10

                        Why not? I really like them: they’re visually appealing and they improve readability, at least for me. I use JB Mono most of the time on my system, so maybe I’m more used to it, but ligatures were an important point when choosing a font for me.

                        1. 11

                          I’m a vim user. Seeing a different representation than what I’m actually editing trips me up. I’ll often want to go to the equals sign in the != operator. When the ligature version of that operator is visible, I have to remember which characters it contains, and when I type “f=” to go to the character, the cursor is visually over the entire ligatured operator and I don’t trust that it’s actually over the second (invisible) character.

                          It’s not that big of a deal and I’m sure I could work past it, but it was enough for me to switch back.

                          1. 5

                            I am also a (n)vim user and this seemed weird to me, so I tried it with the JetBrains Mono font and ligatures. In the mini-buffer for search, /!= produces the ligature, and when I search for the ligature, I land on the first character of it, e.g. when I pressed x it removed the ! and kept the = in the text. So the cursor is never on the second (or later) character of the ligature. The ligature contains the exact characters you have typed; it is just a visual interpretation and will not appear with another font/software. And typing f= to go to the character works (at least in my experience). For me a ligature is mostly eye-candy (with better readability too) for the set of characters that compose it. Mentally, the character set is my representation when searching/editing, etc.

                            I understand how it could break your mental model and flow, though, by creating a disconnect between the character set behind the ligature and the ligature rendered as if it were a single character. That tension aside, I like ligatures.

                            1. 5

                              the cursor is visually over the entire ligatured operator and I don’t trust that it’s actually over the second (invisible) character.

                              This sounds weird. I haven’t tried this font yet but I normally use Hasklig, which has similar ligatures (Think it’s Source Code Pro with added ligatures? Primary focus is on Haskell, but works well with other languages too). When the cursor is over either of the characters in the ligature, it doesn’t draw the ligature. I’m also using (n)vim, although it’s the terminal (usually konsole in my case) that’s actually drawing the ligatures. I don’t know whether the ligature breaking when under the cursor is an intentional thing or just a side effect of the cursor being there. From looking at the JetBrains mono samples, I think Hasklig’s ligatures are a lot more conservative: for example, it will render ‘/=’ (haskell’s not-equals operator) as a double-width ≠, but it won’t change != to anything, so I think Hasklig’s ligatures always look like a tidier combination of the individual characters, not a completely different character.

                              1. 1

                                the cursor is visually over the entire ligatured operator and I don’t trust that it’s actually over the second (invisible) character

                                I use vim with Fira Code with ligatures enabled, and this does not match my experience. When the cursor is over the ligatured characters, on my systems, they are displayed as separate characters. Maybe JetBrains Mono is different though?

                                I do still have to remember what characters make up the ligature in order to navigate directly to one of them, but 30 years of muscle memory take care of that for me.

                                1. 1

                                  A few people mentioned that this doesn’t happen with their setups. I don’t know whether it’s vim or the terminal that renders the ligatures. I don’t have a ligature font installed anymore. I tried it on a previous computer. It sounds like with an updated vim/font/terminal/something then the problem is avoidable. I still think I’d run into similar versions of the issue: having to mentally remove the visual abstraction to reference the underlying text. In the end I don’t think I’ll go back to it since it doesn’t feel more readable to me regardless.

                                  I just wanted to offer a more concrete reason for why I stopped since the OP was asking.

                            2. 4

                              The difference between =, == and === is like trying to identify the differences between a hyphen, an en dash and an em dash. Not a fan.

                              1. 3

                                === generally has three lines and looks like ≡≡≡. = and == are easier to tell apart because of their length.

                              2. 4

                                There’s a ligature free version thankfully. Can’t wait for windows terminal to give you a no ligatures switch. I don’t like the way they look and I don’t want my font to lie to me.

                                1. 3

                                  I’m with you. Personally, I do not see the point of not seeing the real characters that make up your code. But that must be the way my brain works.

                                  1. 4

                                    I am Chingachgook, and I support this message. As we say in my language, “ligatures are hella wack, yo.”

                                    1. 2

                                      One of the reasons I use this font is because of the ligatures. For me it increases readability and also just looks nice.

                                      1. 2

                                        I use it as my main font (terminal, including nvim; and as the mono font in firefox), but without ligatures. It’s just not a problem if you don’t like them :-) Otherwise it looks just great.

                                        1. 1

                                          I’m with you, here. But then I also tend to find “fi”-ligatures in English text jarring on occasion. Worst is when those show up in preformatted text.

                                        1. 8

                                          Using a notebook, and writing out what I’ve tried so far by hand.

                                          The number of circles I can drive myself in without self-reflection far outweighs any amazing gdb tricks I can paste here.

                                          1. 1

                                            +1 to that. I treat it as some sort of scientific notebook and write down my hypotheses, experiments, results, etc.

                                          1. 1

                                            This was really interesting, and well put together. I’ve thought too for a while that we still have very rudimentary tools as software engineers.

                                            1. 1

                                              Thank you for sharing this. It works perfectly for me.

                                              1. 2

                                                Is there a concept of jails in Nix(OS)? Or is it somehow not as relevant?

                                                1. 3

                                                  You can declare a list of docker containers to run: https://nixos.wiki/wiki/NixOS_Containers

                                                  If you want the tight integrations, NixOS does have native support for running containers: https://nixos.org/nixos/manual/#ch-containers

                                                1. 1

                                                  Are there some release notes or a changelog available?

                                                  1. 23

                                                    So accurate, and I’m glad Lea had the patience to put it into writing. I’m much more inclined to write off the entire ecosystem as some expletive.

                                                    This is generally why I push juniors towards learning Elm instead of JavaScript these days. There are so many edge cases in today’s JavaScript. So many inconsistencies. So much magic one just has to know. So much conflicting information and constant churn. I feel this pain every time a junior asks me for help with something that won’t work as they expect, which these days is effectively every day.

                                                    Juniors are surprisingly good at finding logic rules that are internally inconsistent in programming languages.

                                                    There’s been a push lately for people to write “simple Haskell”, and to be honest I think we more desperately need a push towards simple JavaScript. That, or just abstract it away entirely and allow it to become the bytecode of the web.

                                                    1. 13

                                                      So many inconsistencies. So much magic one just has to know.

                                                      This sounds like English.

                                                      JS has two clear strong points: everybody knows it and it’s the lingua franca of the web. Just like how everybody knows English and it’s the lingua franca of international commerce.

                                                      The way it is going, we will use JavaScript forever.

                                                      Yes you could learn Elm. But when you quit your company 2 years from now, it will likely be better to have +2 years of JS than 2 years of Elm.

                                                      1. 16

                                                        everybody knows it

                                                        I would argue against that.

                                                        I think it’s no coincidence that one of the most popular technical book series for this language is called You Don’t Know JS.

                                                        1. 9

                                                          Well they know it the same way most people know English. Incomplete and error-prone but most of the time still good enough.

                                                          1. 4

                                                            I think “incomplete and error-prone” is what causes user experiences like the ones described in the article. For an experienced programmer, that might mean giving up on some library or language. For a novice, that might mean reconsidering their interest in programming.

                                                        2. 5

                                                          “everybody knows it and it’s the lingua franca of the web” / “everybody knows English and it’s the lingua franca of international commerce”

                                                          Sure, if by that you mean “everyone claims to know it, but when they need to use it, many fall flat on their face, or resort to some half-assed translation system that just complicates matters”.

                                                          1. 5

                                                            It sounds a bit like C, in fact: A standardized cross-platform (ha!) language with odd corner cases everyone seems to augment with libraries to smooth over the fact it was never fully portable from the beginning. It keeps being used, so it accretes more libraries around it, making it harder to switch away from, as even if you’re writing in a different language, you need to accept that the rest of the world speaks an API designed around That Language the moment you talk to something from the outside world.

                                                          2. 24

                                                            You push juniors towards learning Elm, a little known language, with smaller community and fewer learning and documentation resources, and no proven longevity (just google “leaving Elm” for crying out loud)? As someone who had to pick up JS over the past year and uses it at their job, any newbie picking up Javascript ES6 and following good programming practices should have little problem. The ecosystem is a different story, but most “edge cases” and “gotchas” come from inexperience with the language. Recommending they learn some random language won’t help with a PEBCAK problem like that.

                                                            1. 4

                                                              Despite being a Haskell fan, I don’t find Elm to be an attractive target. Sure, it’s simple, but the JS interop is really messy, some aspects of setting up elements just don’t scale well, and I’ve seen far too many complaints about the core team’s behavior to assume it’s just a fluke.

                                                              Yes, JavaScript has a ton of weird stuff going on. However, it’s still the standard, and learning it is beneficial—even if you don’t like to use it personally—because at minimum you’ll be seeing a lot of it. The edge cases in tooling are a mess but there are improvements lying around if you scan a bit (e.g. Parcel 2 is looking far more promising over Webpack’s config soup), and most of the type system weirdness is “solved” by using TypeScript (which makes me sad since it’s unsound, but it also has some incredibly powerful type system features that many other languages don’t).

                                                              1. 4

                                                                I want to be clear that the point of my comment was not to fetishise Elm.

                                                                The point is that all of JavaScript’s inconsistencies make learning how to write programs with confidence immensely more challenging.

                                                                JS interop is not part of learning how to write programs, and some people’s reaction to how the language designers responded to language feature requests (which usually added a greater surface area for potential runtime failure) is also not part of learning how to write programs.

                                                                Minor aside: I don’t see how Elm’s FFI is “messy”. The port system works the way I would expect it to. It might feel more cumbersome than running some IO unsafely, but effects are hard, and this is the price we pay for a smaller surface area of potential runtime failures.

                                                              2. 2

                                                                With 8+ years of JS (and TS) experience, and a smattering of Perl, Java, C, C++, C#, Elm, Ruby (Rails), Haskell, and more, I’d rather write all of those other languages combined every day than write more JS. The community always feels like it reverts to whataboutisms and “it’s the language of the web!”. When it comes to writing a server, it’s perfectly acceptable to use any of a variety of languages. When it comes to writing for the browser, it’s JS or nothing. Suggesting Elm is akin to suggesting cannibalism. I’d suggest Svelte, but most people think it’s JS. Why can’t the web front end be as diverse and accepting as the back end?

                                                              1. 2

                                                                I use newsboat. It’s simple and works well. It’s also interesting to watch it slowly be rewritten into Rust.

                                                                1. 4

                                                                  I recently used Clojure(Script) for the first time ever along with re-frame and reagent for a simple frontend for a side-project. I liked it all overall! I have a lot of Elm experience so it wasn’t entirely foreign to me.

                                                                  One thing that I haven’t found is docs listing all the available functions, e.g. I was wondering how to handle on mouse enter and on mouse leave events. I guess I missed it, but it looks like you can basically just use the event names in kebab-case.

                                                                  1. 2

                                                                    I’m using Clojure(Script) for the first time in a side-project for a simple website to interact with an API. I’m a Vim user but I’m sort of lost in between:

                                                                    • SLIME and all the variations, incl. these 2 plugins (though one seems for Common Lisp)
                                                                    • liquidz/vim-iced
                                                                    • tpope/vim-fireplace
                                                                    • CIDER (and nREPL?) and all the variations
                                                                    • snoe/clojure-lsp
                                                                    • Parens plugins like eraserhd/parinfer-rust and guns/vim-sexp

                                                                    I’m used to just setting up a language-server for a language and more or less going on from there. So far I really like eraserhd/parinfer-rust and snoe/clojure-lsp, but it seems like I’m missing out on other things? I realise that some of the above overlap/call each other. I suppose that some of my uncertainty comes from being new to Lisp and the development flow working a little differently than in other languages?

                                                                    1. 7

                                                                      It’s nice that more people are leaning into deterministic simulation while building correctness-critical distributed systems.

                                                                      It’s not clear to me what they mean by strong consistency for a system that replicates multiple concurrent uncoordinated modifications on different devices, it would be nice if they went into that claim a bit more.

                                                                      1. 7

                                                                        yeah, the deterministic simulation is my favorite tech in the whole project. it’s caught all types of bugs, from simple logic errors to complicated race conditions that we would have never thought to test. I think there’s some interesting work out there to bring more of this “test case generation” style of testing to a larger audience…

                                                                        It’s not clear to me what they mean by strong consistency for a system that replicates multiple concurrent uncoordinated modifications on different devices, it would be nice if they went into that claim a bit more.

                                                                        ah, sorry this wasn’t worded too clearly. we broke the sync protocol down into two subproblems: 1) syncing a view of the remote filesystem to the clients and 2) allowing clients to propose new changes to the remote filesystem. then, the idea is that we’d solve these two problems with strong consistency guarantees, and then we’d use these protocols for building a more operational transform flavored protocol on top.

                                                                        we took this approach since protocol-level inconsistencies were very common with sync engine classic’s protocol. we spent a ton of time debugging how a client’s view of the remote filesystem got into a bizarre state or why they sent up a nonsensical filesystem modification. so, it’d be possible to build a serializable system on our core protocol, even though we don’t, and that strength at the lowest layer is still really useful.

                                                                        1. 2

                                                                          yeah, the deterministic simulation is my favorite tech in the whole project. it’s caught all types of bugs, from simple logic errors to complicated race conditions that we would have never thought to test. I think there’s some interesting work out there to bring more of this “test case generation” style of testing to a larger audience…

                                                                          I’ve been digging into a whole bunch of approaches as to how people do deterministic simulation. I’m really curious—how does your approach work? Can you provide some sort of gist/code example as to how those components are structured?

                                                                          1. 7

                                                                            ah, I don’t have a good code sample handy (but we’ll prepare one for our testing blog post). but here’s the main idea –

                                                                            1. we write almost all of our logic on a single thread, using futures to multiplex concurrent operations on a single thread. then, we make sure all of the code on that thread is deterministic with fixed inputs. there’s lots of ways code can sneak in a dependency on a global random number generator or time.
                                                                            2. have traits for the interfaces between the control thread and other threads. we also mock out external time behind a trait too.
                                                                            3. then, wrap each real component in a mock component that pauses all requests and puts them into a wait queue.

                                                                            now, instead of just calling .wait on the control thread future, poll it until it blocks (i.e. returns Async::NotReady). this means that the control thread can’t make any progress until some future it’s depending on completes. then, we can look at the wait queues and pseudorandomly unpause some subset of them and then poll the control thread again. we repeat this process until the test completes.

                                                                            all of these scheduling decisions are made pseudorandomly from a fixed RNG seed that’s determined at the beginning of the test run. we can also use this seed for injecting errors, generating initial conditions, and “agitating” the system by simulating other concurrent events. the best part is that once we find a failure, we’re guaranteed that we can reproduce it given its original seed.

                                                                            in fact, we don’t even collect logs in CI at all. we run millions of seeds every day, and if CI finds a failure, it just prints the seed and we run it locally to debug.
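                                                                            here’s a rough sketch of that loop in toy form (hypothetical types standing in for the real futures machinery, not our actual code):

```rust
use std::collections::VecDeque;

// Mirrors futures 0.1's Async::Ready / Async::NotReady.
enum Poll { Ready, NotReady }

// Mock component wrapper: requests are parked in a wait queue until the
// scheduler pseudorandomly unpauses them.
struct Mock { wait_queue: VecDeque<u64> }

// Tiny deterministic RNG (an LCG) seeded once at the start of the run.
struct Rng(u64);
impl Rng {
    fn next(&mut self) -> u64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        self.0
    }
}

// Poll the control "future" until it blocks, then pseudorandomly unpause one
// parked request, and repeat until the test completes. Returns the step count.
fn simulate(seed: u64, mut poll_control: impl FnMut(&mut Mock) -> Poll) -> u64 {
    let mut rng = Rng(seed);
    let mut mock = Mock { wait_queue: VecDeque::new() };
    let mut steps = 0u64;
    loop {
        steps += 1;
        match poll_control(&mut mock) {
            Poll::Ready => return steps,
            Poll::NotReady => {
                let n = mock.wait_queue.len();
                if n > 0 {
                    // the scheduling decision comes from the seeded RNG
                    let idx = (rng.next() % n as u64) as usize;
                    let _ = mock.wait_queue.remove(idx);
                }
            }
        }
    }
}

// A toy control thread: issue three requests one at a time, and finish once
// all of them have been unpaused by the scheduler.
fn run_test(seed: u64) -> u64 {
    let mut issued: u64 = 0;
    simulate(seed, move |mock| {
        if mock.wait_queue.is_empty() {
            if issued < 3 {
                mock.wait_queue.push_back(issued);
                issued += 1;
                Poll::NotReady
            } else {
                Poll::Ready
            }
        } else {
            Poll::NotReady
        }
    })
}

fn main() {
    // Same seed, same schedule: failures reproduce exactly.
    assert_eq!(run_test(42), run_test(42));
    println!("completed after {} scheduling steps", run_test(42));
}
```

                                                                            the key property is that the whole schedule is a pure function of the seed, so any failure found this way replays exactly.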

                                                                            1. 4

                                                                              There are so many beautiful properties of a system that is amenable to discrete event simulation.

                                                                              • You can use the seed to generate a schedule of events that happen in the system. When invariants are violated, you can shrink this generated history to the minimal set that reproduces the violation, like how quickcheck does its shrinking (I usually just use quickcheck for generating histories though). This produces minimized histories that are usually a lot simpler to debug, as causality is less blurred by having hundreds of irrelevant concurrent events in-flight. Importantly, separating the RNG from the generated schedule allows you to improve your schedule generators while keeping the actual histories around that previously found bugs and reusing them for regression tests. Otherwise every time you improve your generator you destroy all of your regression tests because the seeds no longer generate the same things.
                                                                              • Instead of approaching it from a brute force exploration of the interleaving+fault space, it’s often much more bug:instruction efficient to start with what has to go right for a desired invariant-respecting workload, and then perturb this history to a desired tree depth (fault tolerance degree). Lineage Driven Fault Injection can be trivially applied to systems that are simulator friendly, allowing bugs to be sussed out several orders of magnitude more cheaply than via brute force exploration.
                                                                              • This approach can be millions of times faster than black-box approaches like jepsen, allowing engineers to run the tests on their laptops in a second or two that would have taken jepsen days or weeks, usually with far less thorough coverage.

                                                                              Simulation is the only way to build distributed systems that work. I wrote another possible simulation recipe here but there are many possible ways of doing it, and different systems will benefit from more or less complexity in this layer.
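                                                                              A sketch of the first point, with hypothetical event types: the generator maps a seed to a concrete history, shrinking operates on the history itself, and only the history needs to be stored for regression tests.

```rust
// A concrete, generator-independent event history. Storing these (rather than
// RNG seeds) means improving the generator later doesn't invalidate old repros.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Write { key: u8, val: u8 },
    CrashRestart,
    NetworkPartition,
}

// Tiny LCG so generation is deterministic for a given seed.
fn lcg(state: &mut u64) -> u64 {
    *state = state.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
    *state
}

// The generator: seed -> concrete history. Only the output is kept around.
fn generate_history(seed: u64, len: usize) -> Vec<Event> {
    let mut s = seed;
    (0..len)
        .map(|_| match lcg(&mut s) % 10 {
            0 => Event::CrashRestart,
            1 => Event::NetworkPartition,
            _ => Event::Write { key: (lcg(&mut s) % 16) as u8, val: (lcg(&mut s) % 256) as u8 },
        })
        .collect()
}

// Quickcheck-style shrinking: repeatedly drop single events, keeping any
// shorter history on which the failure still reproduces.
fn shrink(history: Vec<Event>, still_fails: impl Fn(&[Event]) -> bool) -> Vec<Event> {
    let mut current = history;
    loop {
        let mut shrunk = false;
        for i in 0..current.len() {
            let mut candidate = current.clone();
            candidate.remove(i);
            if still_fails(&candidate) {
                current = candidate;
                shrunk = true;
                break;
            }
        }
        if !shrunk {
            return current;
        }
    }
}

fn main() {
    let mut history = generate_history(7, 20);
    history.push(Event::CrashRestart); // ensure the "bug trigger" is present
    // Pretend the invariant violation needs a CrashRestart somewhere in-flight.
    let minimal = shrink(history, |h| h.iter().any(|e| matches!(e, Event::CrashRestart)));
    assert_eq!(minimal, vec![Event::CrashRestart]);
    println!("minimal repro: {:?}", minimal);
}
```

                                                                              The minimized history drops all the irrelevant concurrent events, which is what makes causality easy to see in the repro.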

                                                                              1. 3

                                                                                thanks for the pointer to the molly paper, looks really interesting.

                                                                                here’s another idea I was playing with a few months ago: instead of viewing the input to the test as a random seed, think of it as an infinite tape of random bits. then, the path taken through the program is pretty closely related to different bits in the input. for example, sampling whether to inject an error for a request is directly controlled by a bit somewhere in the input tape.

                                                                                this is then amenable to symbolic execution based fuzzing, where the fuzzer watches the program execution and tries to synthesize random tapes that lead to interesting program states. we actually got this up and working, and it found some interesting bugs really quickly. for example, when populating our initial condition, we’d sample two random u64s and insert them into a hashmap, asserting that there wasn’t a collision. the symbolic executor was able to reverse the hash function and generate a tape with two duplicate integers in the right places within minutes!

                                                                                but, we didn’t actually find any real bugs with that approach during my limited experiments. I think the heuristics involved are just too expensive, and running more black box random search in the same time is just as effective. however, we do spend time tuning our distributions to get good program coverage, and perhaps with a more white box approach that wouldn’t be necessary.
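                                                                                a minimal sketch of the tape idea (hypothetical helpers, not the real implementation): each decision reads specific positions on the tape, so a fuzzer or symbolic executor can steer individual decisions, e.g. handing back two identical u64s where the initial conditions expect distinct ones:

```rust
// The test input viewed as a tape of random bytes rather than an RNG seed:
// every decision reads specific positions, so mutating the tape directly
// steers specific decisions in the program.
struct Tape {
    bytes: Vec<u8>,
    pos: usize,
}

impl Tape {
    fn new(bytes: Vec<u8>) -> Self {
        Tape { bytes, pos: 0 }
    }

    // Read one byte; past the end of the tape, behave as all-zero bits.
    fn byte(&mut self) -> u8 {
        let b = self.bytes.get(self.pos).copied().unwrap_or(0);
        self.pos += 1;
        b
    }

    // Decision: inject an error for this request? Driven by one tape byte.
    fn inject_error(&mut self, one_in: u8) -> bool {
        self.byte() % one_in == 0
    }

    // Sample a u64 from the next eight tape bytes (big-endian).
    fn u64(&mut self) -> u64 {
        (0..8).fold(0u64, |acc, _| (acc << 8) | self.byte() as u64)
    }
}

fn main() {
    // A tape a symbolic executor could synthesize: the first byte forces
    // error injection, and the next sixteen bytes force a u64 collision.
    let mut tape = Tape::new(vec![0, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8]);
    assert!(tape.inject_error(4)); // 0 % 4 == 0, so the error fires
    let a = tape.u64();
    let b = tape.u64();
    assert_eq!(a, b); // the "collision" an executor can aim for directly
}
```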

                                                                                1. 3

                                                                                  I’ve also had disappointing results when combining fuzzing techniques with discrete event simulation of complex systems. My approach has been to have libfuzzer (via cargo fuzz) generate a byte sequence, with every 2-4 bytes serving as a seed for an RNG that generates a scheduled event in a history of external inputs. This approach actually works extremely well for relatively small codebases; a burst of Rust projects saw a lot of nice bug discovery when this crate was later released. But it never took off for my use in sled, where it dropped throughput so much that the coverage couldn’t stumble on introduced bugs anywhere close to as fast as just running quickcheck uninstrumented.

                                                                                  I’ve been meaning to dive into Andreas Zeller’s Fuzzing Book to gain some insights into how I might be able to better apply this technique, because I share your belief that it feels like it SHOULD be an amazing approach.

                                                                                  1. 4

                                                                                    here’s another pointer for interesting papers if you’re continuing down this path: https://people.eecs.berkeley.edu/~ksen/cs29415fall16.html?rnd=1584484104661#!ks-body=home.md

                                                                                    I’ve kind of put it on the backburner for now, but it’s good to hear that you’ve reached similar conclusions :)

                                                                            2. 4

                                                                              I don’t know anything about this project, but I do work on a system that has this property, I guess. There are lots of approaches but for me it’s just an exercise in designing the components carefully.

                                                                              First, you want to draw strict borders between the core protocol or domain or business logic, and the way the world interacts with that core. This is a tenet of the Clean Architecture or the Hexagonal Architecture or whatever: the core stuff should be pure and only know about its own domain objects. It shouldn’t know anything about HTTP or databases or even the concept of physical time. Modeling time as a dependency in this way takes practice, it’s as much art as it is science, and it depends a lot on your language.

                                                                              Second, you want to make it so that if the core is just sitting there with no input, it doesn’t change state. That means no timers or autonomous action. Everything should be driven by external input. This can be synchronous function calls — IMO this is the best model — but it can also work with an actor-style message passing paradigm. There are tricks to this. For example, if your protocol needs X to happen every Y, then you can model that as a function X that you require your callers to call every Y.

                                                                              Once you have the first step, you can implement in-memory versions of all of your dependencies, and therefore build totally in-memory topologies however you like. Once you have the second step, you have determinism that you can verify, and, if you’re able to model time abstractly rather than concretely, you can run your domain components as fast as your computer can go. With the two combined you can simulate whatever condition you want.
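                                                                              A minimal sketch of both steps with a hypothetical protocol core: time arrives only as input, and the core never changes state unless it is called.

```rust
// A core that knows nothing about wall-clock time; time is just an input.
// "If your protocol needs X every Y, make X a function callers call every Y."
struct HeartbeatProtocol {
    last_beat_ms: u64,
    peer_timed_out: bool,
}

impl HeartbeatProtocol {
    fn new() -> Self {
        HeartbeatProtocol { last_beat_ms: 0, peer_timed_out: false }
    }

    // External input: a heartbeat arrived at (abstract) time now_ms.
    fn on_heartbeat(&mut self, now_ms: u64) {
        self.last_beat_ms = now_ms;
        self.peer_timed_out = false;
    }

    // The "every Y" function: the caller drives it. In production a timer
    // calls it; in simulation the test calls it with whatever abstract
    // timestamps it wants, as fast as the computer can go.
    fn on_tick(&mut self, now_ms: u64) {
        if now_ms.saturating_sub(self.last_beat_ms) > 1000 {
            self.peer_timed_out = true;
        }
    }
}

fn main() {
    let mut p = HeartbeatProtocol::new();
    p.on_heartbeat(10);
    p.on_tick(500);
    assert!(!p.peer_timed_out); // no input, no autonomous state change
    p.on_tick(2000); // simulated time jump: no real waiting required
    assert!(p.peer_timed_out);
}
```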

                                                                              I hope this makes sense. I’m sure /u/spacejam has a slightly? majorly? different approach to the challenge.

                                                                              1. 3

                                                                                this is spot on! we still have periodic timers in our system, but we hook them into our simulation of external time. there’s some special casing to avoid scheduling a timer wakeup when there are other queued futures, but it more-or-less just works.

                                                                                1. 2

                                                                                  this is spot on!

                                                                                  Nice to hear I’m not totally off base :)

                                                                                2. 2

                                                                                  I totally agree that if you can make your core business logic a pure function, it dramatically improves your ability to understand your system. What you did with time is also possible for most other things that flip into nondeterminism in production:

                                                                                  • random number generators can be deterministic when run in testing
                                                                                  • threads/goroutines can be executed in deterministic interleavings under test with the use of linux realtime priorities / instrumented scheduling / ptrace / gdb scripts etc…

                                                                                  Determinism can be imposed in test on things that need a bit of nondeterminism in production to better take advantage of hardware parallelism. You lose some generality: code compiled with instrumented scheduler yields, run in a deterministic schedule for bug finding, will trigger different cache coherency traffic and can mask some bugs if you’re relying on interesting atomic barriers for correctness, since the instrumentation basically shoves a sequentially consistent barrier at every yield and makes some reordering bugs disappear. But that’s sort of out of scope.

                                                                                  There are some fun techniques that allow you to test things with single-threaded discrete event simulation but get performance via parallelism without introducing behavior that differs from the single threaded version:

                                                                                  • software transactional memory
                                                                                  • embedded databases that support serializable transactions
                                                                                  • commutative data structures
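                                                                                  As a sketch of the RNG bullet (hypothetical names): make the RNG an injected dependency, so production wires in real entropy and tests wire in a fixed sequence.

```rust
// Core logic takes the RNG as a dependency instead of reaching for a global;
// everything downstream of it becomes reproducible under test.
trait Random {
    fn next_u64(&mut self) -> u64;
}

// Deterministic test implementation: a fixed LCG sequence from a seed.
// (Production would implement the same trait over the OS entropy source.)
struct TestRng(u64);
impl Random for TestRng {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        self.0
    }
}

// A hypothetical decision the system makes with randomness.
fn pick_backoff_ms(rng: &mut dyn Random) -> u64 {
    50 + rng.next_u64() % 200
}

fn main() {
    // Same seed, same decisions, run after run.
    let (mut a, mut b) = (TestRng(9), TestRng(9));
    assert_eq!(pick_backoff_ms(&mut a), pick_backoff_ms(&mut b));
}
```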
                                                                              2. 2

                                                                                deterministic simulation is my favorite tech in the whole project

                                                                                Any tips on where to get started on this?

                                                                                1. 2

                                                                                  Any tips on where to get started on this?

                                                                                  the threads on this post are a good place to start: https://lobste.rs/s/ob6a8z/rewriting_heart_our_sync_engine#c_8zixa2 and https://lobste.rs/s/ob6a8z/rewriting_heart_our_sync_engine#c_ab2ysi. we also have a next blog post on testing currently in review :)

                                                                                  1. 1

                                                                                    Thank you! I’m looking forward to the next blog post too :)

                                                                            1. 3

                                                                              To extend the language’s control structures (loops, match), to embed a computing paradigm other than recursion theory (hello FSA), to make a DSL (maybe not that good an idea), or to generate code using the same programming language.

                                                                              1. 1

                                                                                to embed a different computing paradigm other than recursion theory (hello FSA)

                                                                                Do you know of any available examples of this?

                                                                                1. 2

                                                                                  I read about it here, and it was very interesting. You can do it at the type level, but that requires a lot of thinking, whereas this approach is very streamlined. Another interesting take I haven’t read yet is the virtual machine section of SICP.

                                                                                  1. 1

                                                                                    Thank you. For anyone wanting to read the paper, I found it available on the author’s site. [1]

                                                                                    [1] https://cs.brown.edu/~sk/Publications/Papers/Published/sk-automata-macros/paper.pdf

                                                                              1. 4

                                                                                It’s amazing how often rewrites happen just for fun in our industry. If this were any industry other than programming, people would be shocked.

                                                                                1. 30

                                                                                  Reading the article, it does look like they hit some performance issue, due to how the Go GC works, and they tried Rust because it was already seeing adoption at Discord. I would not call that “just for fun”.

                                                                                  On the other hand, a lot of things change (improve?) in our industry because we hack stuff just for fun. :)

                                                                                  1. 7

                                                                                    This happens in other industries too. Youtube is full of metalworkers/MechE’s, EE’s, chemists, woodworkers, and so on hacking their tools, twiddling with silly ideas just for fun, breaking stuff, trying out old ideas in new ways or vice versa, and so on. The only thing special about software is you can do it sitting in front of a computer instead of needing a whole workshop or lab.

                                                                                    1. 4

                                                                                      It’s even codified: the car industry has time frames in which a whole car design is essentially completely replaced.

                                                                                    2. 3

                                                                                      Though rather expensive, a rewrite is usually not as bad in our industry as it would be in many others. Imagine replacing a bridge with a new one just because you have slightly better steel now than last year.

                                                                                      1. 11

                                                                                        Imagine building a new bridge because there was more traffic than would fit on the bridge you already have, and deciding you might as well use the new steel while you’re at it.

                                                                                      2. 1

                                                                                        What’s the equivalent of rewrites in other industries?

                                                                                        1. 5

                                                                                          Remodeling the kitchen.

                                                                                          1. 1

                                                                                            My own perspective: if software engineering is like writing, then it’s a rewrite or a reimagining. If it’s math, then it’s something like string theory, or a case where a trail leads nowhere so you go back and start over from new fundamentals. I think software is mostly writing with a bit of the math in my analogies. I don’t think it’s like physical construction, but sometimes I use those analogies.

                                                                                        1. 1

                                                                                          ../log/2020/20-01-04.txt — a file for every day, with a shell script to make/edit the file. I also used Day One for a couple of years, and still like it.

                                                                                          1. 2

                                                                                            care to share the shell script?

                                                                                            1. 2

                                                                                              Sure thing:

                                                                                              # create the year/month directory, then append a dated separator
                                                                                              # (printf + >> instead of echo + >, so \n is interpreted portably
                                                                                              # and re-running the script doesn't truncate the day's file)
                                                                                              mkdir -p ~/Documents/log/$(date +"%Y")/$(date +"%m")
                                                                                              printf '\n\n---\n\n' >> ~/Documents/log/$(date +"%Y")/$(date +"%m")/$(date +"%y-%m-%d").txt
                                                                                          1. 1

                                                                                            VPS (FreeBSD, with everything in a separate jail):

                                                                                            • ZNC
                                                                                            • git
                                                                                            • portfolio
                                                                                            • side-projects

                                                                                            Raspberry Pi:

                                                                                            • Pi-hole
                                                                                              1. 2

                                                                                                Please do!

                                                                                              1. 10

                                                                                                @ddevault Would it be possible to get a clear “Terms of Service” clarifying these sorts of use cases? 1.1 GB seems like an excessive file size, but having a crystal-clear, mutually agreed-upon set of rules for platform use is essential for trust (more so for a paid service), and right now users don’t know what does and does not constitute reasonable use of the service.

                                                                                                1. 37

                                                                                                  No, they’re intentionally vague so that we can exercise discretion. There are some large repositories which we overlook, such as Linux trees, pkgsrc, and nixpkgs; even mozbase is overlooked despite being huge and expensive to host.

                                                                                                  In this guy’s case, he had uploaded gigabytes of high-resolution personal photos (>1.1 GB - it takes up more space and CPU time on our server than on your workstation, because we generate clonebundles for large repos). It was the second largest repository on all of SourceHut. SourceHut is a code forge, not Instagram.

                                                                                                  1. 40

                                                                                                    No, they’re intentionally vague so that we can exercise discretion.

                                                                                                    I like to call this “mystery meat TOS”. You never know what you’ll get until you take a bite!

                                                                                                    1. 24

                                                                                                      I mean, honestly, a small fraction of our users hit problems. I’ve had to talk to <10 people, and this guy is the only one who felt slighted. It’s an alpha-quality service; maybe it’ll be easier to publish objective limits once things settle down and the limitations are well defined. On the whole, I think more users benefit from having a human being making judgement calls in the process than not, because usually we err on the side of letting things slide.

                                                                                                      Generally we also are less strict on paid accounts, but the conversation with this guy got hostile quick so there wasn’t really an opportunity to exercise discretion in his case.

                                                                                                      1. 30

                                                                                                        the conversation with this guy got hostile quick

                                                                                                        Here’s the conversation, for folks who want to know what “the conversation got hostile” means to Source Hut: https://paste.stevelosh.com/18ddf23cb15679ac1ddca458b4f26c48b6a53f11

                                                                                                        1. 32

                                                                                                          i’m not a native speaker, but have the feeling that you got defensive quickly:

                                                                                                          Okay. I guess I assumed a single 1.1 gigabyte repository wouldn’t be an unreasonable use of a $100/year service. I certainly didn’t see any mention of a ban on large binary files during the sign up or billing process, but I admit I may have missed it. I’ve deleted the repository. Feel free to delete any backups you’ve made of it to reclaim the space, I’ve backed it up myself.

                                                                                                          it’s a pay-what-you-like alpha service, not backed by venture capital. you got a rather friendly mail notifying you that you shouldn’t put large files into hg, not requesting that you delete them immediately.

                                                                                                          ddevault’s reply explained the reasoning, not knowing that you are a mercurial contributor:

                                                                                                          Hg was not designed to store large blobs, and it puts an unreasonable strain on our servers that most users don’t burden us with. I’m sorry, but hg is not suitable for large blobs. Neither is git. It’s just not the right place to put these kinds of files.

                                                                                                          i’m not sure i’d label this as condescending. again I’m no native speaker, so maybe i’m missing nuances.

                                                                                                          after that you’ve cancelled your account.

                                                                                                          1. 13

                                                                                                            As a native speaker, your analysis aligns with how I interpreted it.

                                                                                                            1. 9

                                                                                                              Native speaker here, I actually felt the conversation was fairly polite right up until the very end (Steve’s last message).

                                                                                                            2. 28

                                                                                                              On the whole, I think more users benefit from having a human being making judgement calls in the process than not, because usually we err on the side of letting things slide.

                                                                                                              Judgement calls are great if you have a documented soft limit (X GB max repo size / Y MB max inner repo file size) and say “contact me about limit increases”. Your customers can decide ahead of time if they will meet the criteria, and you get the wiggle room you are interested in.

                                                                                                              Judgement calls suck if they allow users to successfully use your platform until you decide it isn’t proper/valid.

                                                                                                              1. 12

                                                                                                                That’s a fair compromise, and I’ll eventually have something like this. But it’s important to remember that SourceHut is an alpha service. I don’t think these kinds of details are a reasonable expectation to place on the service at this point. Right now we just have to monitor things and try to preempt any issues that come up. This informal process also helps to identify good limits for formalizing later. But, even then, it’ll still be important that we have an escape hatch to deal with outliers - the following is already in our terms of use:

                                                                                                                You must not deliberately use the services for the purpose of:

                                                                                                                • impacting service availability for other users

                                                                                                                It’s important that we make sure that any single user isn’t affecting service availability for everyone else.

                                                                                                                Edit: did a brief survey of competitors’ terms of service. They’re all equally vague, presumably for the same reasons


                                                                                                                [under no circumstances will you] use our servers for any form of excessive automated bulk activity (for example, spamming or cryptocurrency mining), to place undue burden on our servers through automated means, or to relay any form of unsolicited advertising or solicitation through our servers, such as get-rich-quick schemes;

                                                                                                                The Service’s bandwidth limitations vary based on the features you use. If we determine your bandwidth usage to be significantly excessive in relation to other users of similar features, we reserve the right to suspend your Account, throttle your file hosting, or otherwise limit your activity until you can reduce your bandwidth consumption


                                                                                                                [you agree not to use] your account in a way that is harmful to others [such as] taxing resources with activities such as cryptocurrency mining.

                                                                                                                At best they give examples, but always leave it open-ended. It would be irresponsible not to.

                                                                                                                1. 17

                                                                                                                  The terms of service pages don’t mention the limits, but the limits are documented elsewhere.


                                                                                                                  We recommend repositories be kept under 1GB each. Repositories have a hard limit of 100GB. If you reach 75GB you’ll receive a warning from Git in your terminal when you push. This limit is easy to stay within if large files are kept out of the repository. If your repository exceeds 1GB, you might receive a polite email from GitHub Support requesting that you reduce the size of the repository to bring it back down.

                                                                                                                  In addition, we place a strict limit of files exceeding 100 MB in size. For more information, see “Working with large files.”

                                                                                                                  GitLab (unfortunately all I can find is a blog post):

                                                                                                                  we’ve permanently raised our storage limit per repository on GitLab.com from 5GB to 10GB


                                                                                                                  The repository size limit is 2GB for all plans, Free, Standard, or Premium.
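A quick way to see where a repository stands against limits like these is `git count-objects`, which reports the size of the object store (the number these hosts measure). This is a minimal sketch assuming a local clone and the standard git CLI; the `/tmp/size-demo` path and file names are made up for illustration:

```shell
# Sketch: measure a repository's object-store size locally.
# /tmp/size-demo is a throwaway example repo, not a real hosted path.
git init -q /tmp/size-demo
cd /tmp/size-demo
head -c 4096 /dev/zero > asset.bin        # stand-in for a large binary file
git add asset.bin
git -c user.name=demo -c user.email=demo@example.com commit -qm "add asset"
git count-objects -vH                     # "size" / "size-pack" show disk usage
```

Running this against a real clone shows at a glance whether you are anywhere near a 1GB soft limit or a 100GB hard limit.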

                                                                                                                  1. 9

                                                                                                                    I see. This would be a nice model for a future SourceHut to implement, but it requires engineering effort and prioritization like everything else. Right now the procedure is:

                                                                                                                    1. High disk use alarm goes off
                                                                                                                    2. Manually do an audit for large repos
                                                                                                                    3. Send emails to their owners if they seem to qualify as excessive use

                                                                                                                    Then discuss the matter with each affected user. If there are no repos which constitute excessive use, then more hardware is provisioned.
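Step 2 of that procedure (the manual audit) can be approximated with a `du`-based one-liner. This is only a sketch: `/tmp/repo-audit` and the repo names below are invented stand-ins for wherever repositories actually live on disk, not SourceHut's real layout:

```shell
# Sketch of a manual large-repo audit (step 2 above).
# Create two fake "repos" so the listing has something to rank.
mkdir -p /tmp/repo-audit/small.git /tmp/repo-audit/big.git
head -c 1024     /dev/zero > /tmp/repo-audit/small.git/objects
head -c 10485760 /dev/zero > /tmp/repo-audit/big.git/objects

# List repos by disk usage, largest first -- candidates for a polite email.
du -sk /tmp/repo-audit/*.git | sort -rn | head -20
```

The largest entries at the top are the ones worth a closer look before deciding whether they constitute excessive use.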

                                                                                                                    1. 11

                                                                                                                      Maybe this is something you should put on your TOS/FAQ somewhere.

                                                                                                                  2. 8

                                                                                                                    This informal process also helps to identify good limits for formalizing later.

                                                                                                                    Sounds like you have some already:

                                                                                                                    • Gigabyte-scale repos get special attention
                                                                                                                    • Giant collections of source code, such as personal forks of large projects (Linux source, nix pkgtree) are usually okay
                                                                                                                    • Giant collections of non-source-code are usually not okay, especially binary/media files
                                                                                                                    • These guidelines are subject to judgement calls
                                                                                                                    • These guidelines may be changed or refined in the future

                                                                                                                    All you have to do is say this; then the next time someone tries to do this (because there WILL be a next time), you can just point at the docs instead of having to take the time to explain the policy. That’s what terms of service are for.

                                                                                                                2. 8

                                                                                                                  Regardless of what this specific user was trying to do, I would exercise caution. There are valid use cases for large files in a code repository. For example: Game development, where you might have large textures, audio files, or 3D models. Or a repository for a static website that contains high-res images, audio, and perhaps video. The use of things like git-lfs as a way to solve these problems is common but not universal.

                                                                                                                  To say something like, “SourceHut is a code forge, not Instagram” is to pretend these use cases are invalid, or don’t exist, or that they’re not “code”, or something.

                                                                                                                  I’ve personally used competing services like GitHub for both of the examples above, and this whole discussion has completely put me off ever using SourceHut despite my preference for Mercurial over Git.

                                                                                                                  1. 4

                                                                                                                    I agree that some use-cases like that are valid, but they require special consideration and engineering work that hg.sr.ht hasn’t received yet (namely largefiles, and in git’s case annex or git-lfs). For an alpha-quality service, sometimes we just can’t support those use-cases yet.

                                                                                                                    The Instagram comparison doesn’t generalize: in this case, this specific repo was just full of a bunch of personal photos, not assets necessary for some software to work. Our systems aren’t well equipped to handle game assets either, but the analogy doesn’t carry over.

                                                                                                              2. 4

                                                                                                                I don’t think the way you’re working is impossible to describe; I think it’s just hard, and most people don’t understand the way you’re running and building the business. This means your clients may have an expectation of a ToS or customer service level that you cannot or will not provide.

                                                                                                                To strive towards a fair description that honours how you are actually defining things for yourself and tries to make that more transparent without having to have specific use cases, perhaps there is a direction with wording such as:

                                                                                                                • To make a sustainable system we expect the distribution of computing resource usage and human work to follow a normal distribution. To preserve quality of service for all clients, to honour the sustainability of the business and the wellbeing of our staff, and to attempt to provide a reasonably uniform and understandable pricing model, we reserve the right to remove outliers who use an unusually large amount of any computing and/or human resource. If a client is identified as using a disproportionate amount of service, we will follow this process: (Describe fair process with notification, opportunity for communication/negotiation, fair time for resolution, clear actions if resolution is met or not).
                                                                                                                • This system is provided for the purposes of XYZ, and in order to be able to design/optimise/support this system well we expect all users to use it predominantly for this purpose. It may be the case that using our system for other things is possible; however, if we detect this we reserve the right to (cancel service) to ensure that we do not arrive at a situation where an established client is using our service for another purpose which may perform poorly for them in the future because it is not supported, or may become disproportionately hard for us to provide computing resource or human time for because it is not part of XYZ. This will be decided at our discretion, and the process we will follow if we identify a case like this is (1, 2, 3).
                                                                                                                1. 2

                                                                                                                  Would it be possible to get a clear “Terms of Service” clarifying these sorts of use cases?

                                                                                                                  No, they’re intentionally vague so that we can exercise discretion.

                                                                                                                  May I suggest, perhaps: “ToS: regular repositories have a maximum file size X and repository size Y. We provide extra space to some projects that we consider important.”

                                                                                                                  1. 1

                                                                                                                    No, they’re intentionally vague so that we can exercise discretion.

                                                                                                                    Funny way to say “so I can do whatever I want without having to explain myself”

                                                                                                                    1. 15

                                                                                                                      I think that’s unfair. He did in fact explain himself to the customer and it was the customer who decided to cancel the service. I’d agree if the data was deleted without sufficient warning, but that is not the case here.