Threads for ajdecon

  1. 4

    This feels right to me. Most of the software I write outside work has a very small audience: either just me, or people close to me. In the latter case, to borrow a phrase from Robin Sloan, the software is a home-cooked meal.

    When I occasionally choose to open-source those projects, it’s more for the sake of sharing something I think is wonderful, rather than looking for help or contributions. “This is fun, I shouldn’t keep it to myself.” Bug fixes are welcome, I suppose, but I’m unlikely to accept contributions that I don’t personally find useful.

    1. 6

      The “batch jobs vs services” tension is one that I’ve run into a lot over the past several years, as I’ve spent quite a bit of time working with organizations building on-premise compute clusters. Many of these orgs have both large batch jobs and a constellation of microservices to run, and quite reasonably wanted to use the same tooling for both use cases. Often the preferred tooling was Kubernetes, and we ran into many of the issues in this post.

      It’s worth noting, though, that there are a lot of pre-existing cluster schedulers designed around the batch job use case, whether it’s a “big data” framework like Hadoop or an HPC scheduler like Slurm. The HPC schedulers in particular often have gang scheduling capabilities for multi-node jobs, as well as fine-grained tooling for managing resource allocations and policies around batch jobs. The downside is that they usually lack any mechanism for scheduling services.
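
      A minimal sketch, in Go, of the all-or-nothing placement that gang scheduling implies for a multi-node job. The `placeGang` helper, the node names, and the one-task-per-node rule are illustrative assumptions, not the behavior of Slurm or any other real scheduler.

```go
package main

import "fmt"

// placeGang is a hypothetical helper: it assigns one free node to each task
// of a multi-node job. It returns either a complete assignment or nil, so a
// job never starts with only some of its tasks placed.
func placeGang(freeNodes []string, tasks int) []string {
	if tasks <= 0 || tasks > len(freeNodes) {
		return nil // all-or-nothing: do not hold nodes for a partial job
	}
	// Copy so the returned assignment does not alias the caller's slice.
	return append([]string(nil), freeNodes[:tasks]...)
}

func main() {
	free := []string{"node1", "node2", "node3"}
	fmt.Println(placeGang(free, 4)) // job needs 4 nodes, only 3 are free
	fmt.Println(placeGang(free, 2)) // job needs 2 nodes, so it can start
}
```

      A scheduler that places tasks one at a time could instead pin three nodes to the four-node job and leave it waiting, which is the situation this kind of check is meant to avoid.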

      Occasionally I’ve seen Kubernetes bent to fit well enough to run both use cases, but more often I’ve seen organizations split their cluster in two — one running services in Kubernetes, the other running some other batch job scheduler. It’s a deeply unsatisfying solution, and increases operational load for the team that needs to run two clusters… but it has also generally been the most effective way to make each type of workload run well.

      TBH, I’d love to see Kubernetes get as good at batch jobs as the batch-oriented tools in this space, so that you don’t need different schedulers for services and jobs. But the use cases do have pretty different requirements, so having one tool for both might just be really hard.

      (Edited to fix typo)

      1. 1

        As far as I know, Oak Ridge Labs runs two clusters. There are some ideas from DKube here with pros and cons. I think there are two problems and two systems but maybe things will change in the future. I’ve never solo-grok’d this entire stack.

      1. 4

        I live in Colorado, and we’re expecting to be somewhat snowed in this weekend. So right now I’m planning for a weekend with the fireplace on, lots of tea and baked treats, and an interesting book.

        Depending on whether I feel like reading something vaguely work-related, current candidates include:

        1. 3

          I’ve been recently promoted to staff eng, and I need to start reading some of this staff engineer literature. I don’t quite have imposter syndrome (although maybe I should 🙃), but it would be good to have a clearer idea about what “growth” looks like. Would love to hear your review on the book after you’ve read it.

          1. 2

            Here in Thailand the rainy season is in swing so if I’m lucky I can turn off the AC and let the misty breeze chill my place.

          1. 1

            FWIW, AWS provides a huge variety of services, and even the “lower-level” services like EC2 and S3 provide a lot of configurable knobs to turn. I don’t know a lot of folks who’ve “learned AWS” at a broad level, but I know many who understand the particular corners they needed to do their work.

            The training and certification options from AWS are structured around either what you want to work with (ML, data analytics, databases, etc) or the role you want to play on a team (developer, ops, architect, etc). I think their materials are only ok (not bad, not amazing) but they do at least provide a number of different structured learning paths depending what you want to work on.

            1. 3

              Yes, I think OP is asking the wrong question; he should think about what his end goal is. It would be a fool’s errand to try to learn AWS broadly without a plan.

            1. 2

              In my experience, and as you seem to have picked up on: the best way to get into research is to collaborate with a team at your organization that’s already doing it!

              The mechanics will, of course, depend a lot on your organization. For example, I used to work at a US national lab where the whole organization was pretty research-friendly. Despite working on a production computing team, we had regular contact with a lot of research teams, and it was often easy to get involved in their efforts. This extended pretty easily to folks with no degrees, but who could still contribute to a project.

              I’ve also worked at large Silicon Valley tech companies. In those settings, I’ve seen less casual contact between regular developers and research teams, and therefore fewer obvious opportunities to get involved. Now, in my experience, industry research teams are often pretty happy to let others help out, assuming there’s a clear way for them to do so! But the lack of regular contact makes it a bit harder to get a foot in the door.

              However, that organizational association matters. Getting published in academia depends a lot on establishing credibility. I know several folks who, without degrees, have co-authored papers with established research groups that have a good track record. I don’t personally know anyone without a PhD who has published outside such a group.

              1. 5

                One of the cases where I appreciate a large standard library, is for projects I work on that are “mostly stable”. These are projects where I have to add a new feature or fix a bug once or twice a year, but otherwise don’t get much active attention.

                When I do touch one of these codebases, I do usually update to the latest language version and update any dependencies. If I’m using a language with a fairly complete stdlib, this is normally pretty easy. Indeed, I’ll often try to avoid any external dependencies, even if there’s a better third-party library, so that I can just follow language updates and generally stay in good shape.

                OTOH, my low-maintenance projects built on languages with small standard libraries are a lot more annoying. If I only need to change a codebase every six months, I don’t want to have to pay attention to thirty dependencies! The lack of a complete standard library “forces” you to stay engaged with the language community and library ecosystem, which can be a real pain when the project itself may not be a daily priority.

                1. 2

                  Really broadly:

                  • Read more
                  • Write more
                  • Plan more, react less
                  • Focus more on things I think are important, not on others’ priorities

                  2021 was really chaotic and stressful for me, both on the work and personal fronts. Some of that is frankly unavoidable (pandemic, unexpected events at work, family crises, etc). But a lot of the stress was due to my own reactions to what was going on. Where possible, I want to be more thoughtful and avoid letting external chaos throw my life into disarray.

                  1. 6

                    There’s one place where a few years ago ISOs were required and they may still be. HP’s iLO could only use ISOs for remote installation from virtual media. I haven’t used it in a while, so maybe they improved it. If not, it’s either the virtual CD or PXE boot if you want to install a server.

                    1. 3

                      I think my servers have a previous major version (or two?) of iLO, but that only takes ISO images, yes. I usually boot then install from their livecd choices rather than load the ISOs directly. I guess I wouldn’t be able to boot puppy via netboot using the usb image though.

                      1. 1

                        This has also been the case for most BMCs I know of that support virtual media. And while in most cases I prefer to use PXE to image machines, I do occasionally need to do a “completely fresh” install of Linux on some machine in a datacenter…

                      1. 4

                        Mostly looking after our 14-week old puppy! My partner is visiting with friends most of the weekend, so I’m on exclusive puppy duty. He’s an adorable ball of fluff, but he has So Much Energy and can only be left alone for a few minutes at a time without getting into mischief.

                        1. 1

                          Off work this week, and the major project is getting our new puppy used to our home and integrated with the rest of the family.

                          I also have a bunch of reading I had planned to do this week, but I’m not sure the puppy will allow that…

                          1. 2

                            Might be a good time for audiobooks! Good luck with the pup :D

                          1. 6

                            My favorite interview ever was a systems interview that didn’t go as planned. This was for an SRE position, and while I expected the interview to be a distributed systems discussion, the interviewer instead wanted to talk kernel internals.

                            I was not at all prepared for this, and admitted it up front. The interviewer said something along the lines of, “well, why don’t we see how it goes anyway?” He then proceeded to teach me a ton about how filesystem drivers work in Linux, in the form of leading me carefully through the interview question he was “asking” me. The interviewer was incredibly encouraging throughout, and we had a good discussion about why certain design decisions worked the way they did.

                            I ended the interview (a) convinced I had bombed it, but (b) having had an excellent time anyway and having learned a bunch of new things. I later learned the interviewer had recommended to hire me based on how our conversation had gone, though I didn’t end up taking the job for unrelated reasons having to do with relocation.

                            I’m unsure what actionable advice this can provide, except maybe:

                            • Make sure ahead of time that you and the candidate agree what the interview is going to be about!
                            • Have a plan for what happens when the candidate is totally unprepared, that isn’t just “you fail”
                            1. 4

                              When I’ve set up interview processes, one of the aims is that it should be a good experience for the person being interviewed even if ultimately you decide not to hire, or they decide not to work with you.

                              For one thing, it’s just nicer, but it also means that they leave with a good impression of your company, and may recommend you to others or join in future years.

                              It’s not trivial to do that, but it sounds like that company did a great job of it.

                              1. 2

                                Reminds me of an engineering VP that interviewed me this year that opened it up with a simple “you are logged into an unknown system over SSH” hypothetical situation and then asked me how I’d discover certain parts of the environment: network addresses, running processes, etc. 30 minutes later we were geeking out on Linux linkers and code disassembly. The guy was very knowledgeable, curious about learning new things (of course, I shilled NixOS to him) and super methodical - he led the interview all the time and did not move his eyes off the goal - figuring out if I was a good fit for their team and product.

                                One of the best technical interviews I’ve had so far. Too bad they followed it with a low offer.

                              1. 3

                                Getting my second vaccine shot tomorrow afternoon (yay!) so fully expecting to be worthless on Sunday.

                                So tomorrow morning is chores and errands, and the Sunday plan is a relatively undemanding novel in a big comfy chair, or catching up on Critical Role if my brain has fully gone to mush.

                                Alternatively if I actually have energy and focus on Sunday, I need to refresh myself on building TCP services in Go so I might write some toy services. I’ve been thinking of implementing some of the older, lesser-used services that you still might find in an /etc/services file, e.g. daytime, qotd, finger, whois, maybe tftp…
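
                                A minimal sketch of what the simplest of those toy services, daytime (RFC 867), might look like in Go. The helper names `startDaytime` and `queryDaytime` are made up for this example, and the time format is an arbitrary choice since the RFC does not mandate one.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net"
	"time"
)

// startDaytime listens on addr and answers every connection with one line
// holding the current time, then closes it. It returns the bound address
// and a function that stops the listener.
func startDaytime(addr string) (string, func(), error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return "", nil, err
	}
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener was closed
			}
			go func(c net.Conn) {
				defer c.Close()
				// RFC 867 does not fix a format; RFC 1123 is an arbitrary choice.
				fmt.Fprintf(c, "%s\r\n", time.Now().Format(time.RFC1123))
			}(conn)
		}
	}()
	return ln.Addr().String(), func() { ln.Close() }, nil
}

// queryDaytime connects to addr and returns the single line the server sends.
func queryDaytime(addr string) (string, error) {
	conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	return bufio.NewReader(conn).ReadString('\n')
}

func main() {
	// The registered daytime port is 13, which usually needs privileges to
	// bind, so this sketch asks for an ephemeral loopback port instead.
	addr, stop, err := startDaytime("127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}
	defer stop()

	line, err := queryDaytime(addr)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(line)
}
```

                                The same accept-then-handle-in-a-goroutine shape should carry over to qotd or finger, with only the per-connection handler changing.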

                                1. 6

                                  I would definitely appreciate this.

                                  Right now, the best archive of past incidents I’m aware of is in the SRE Weekly newsletter. Which is lovely and I’m a happy subscriber! But being a newsletter, it doesn’t allow for discussion, so I would love to see Lobsters make more room for this.

                                  1. 5

                                    I appreciate that the author described CI platforms as “remote code execution as a service”, and I generally feel like CI and build systems both fall into the bucket of tasks that reduce to a job scheduling system. I’ve seen at least one company abuse Jenkins in this fashion, building web UIs for build, CI, cron, and many other tasks that just submitted Jenkins jobs. And I expect to eventually find some CI system that’s just a UI on top of the Kubernetes Job API.

                                    A corollary to this is that any sufficiently generic job scheduling service will eventually be used (or abused) for CI.

                                    In my case, working in scientific computing, I’ve frequently observed developers running their CI jobs on actual supercomputers simply because they had an easy-to-use API for submitting parallel jobs. Completely failing to use the expensive high-speed low-latency network or the many-petabyte storage system, and ignoring all the scientific simulation jobs getting queued because they wanted to trigger a new 10-minute CI run on every commit… 😉

                                    1. 14

                                      I may have missed something, but what’s the problem with just having multiple accounts? I get that accounts tied to phone numbers are annoying, but that’s a problem with or without phone numbers (I for example would like to have no phone at all). The solution seems to be making accounts cheap (eg. via cryptography) that can be mapped on to complex identities, instead of basing everything on a complex system.

                                      That being said, in most cases identities/accounts are superfluous, and shouldn’t exist in the first place.

                                      1. 10

                                        Not the author, but my own biggest challenge is that most services make multiple accounts difficult and/or expensive to use in a lightweight fashion. In short, they’re a pain.

                                        I.e., the clients for most services don’t support being signed into more than one account at once; it’s not supported to run more than one active client at once; and if I’m running multiple clients, they’re really resource heavy. When I’m using multiple accounts, I usually want to use them more or less concurrently. I’d like to have just one client open, and toggle between accounts at will.

                                        Also, even when a phone number isn’t required, many services also require that each account be associated with some external resource like an email address, which means I need to manage those as well. (Some services even collapse plus-addresses, so I have to actually create separate email accounts, or do weird forwarding stuff, or whatever.)
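
                                        What “collapsing plus-addresses” amounts to can be sketched in a few lines of Go. The `collapsePlus` name and the lowercasing step are assumptions for illustration; real services differ in how, and whether, they normalize addresses.

```go
package main

import (
	"fmt"
	"strings"
)

// collapsePlus is a hypothetical normalizer: it drops any "+tag" suffix from
// the local part of an address and lowercases the result, so that several
// plus-addresses map onto the same underlying mailbox.
func collapsePlus(addr string) string {
	at := strings.LastIndex(addr, "@")
	if at < 0 {
		return addr // not something we recognize as an address; leave it alone
	}
	local, domain := addr[:at], addr[at+1:]
	if plus := strings.Index(local, "+"); plus >= 0 {
		local = local[:plus]
	}
	return strings.ToLower(local + "@" + domain)
}

func main() {
	for _, a := range []string{
		"alice+shop@example.com",
		"alice+news@example.com",
		"bob@example.com",
	} {
		fmt.Println(a, "->", collapsePlus(a))
	}
}
```

                                        A service that keys accounts on the collapsed form would treat the first two addresses as the same account, which is why plus-addressing stops working as a way to create separate identities.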

                                        1. 7

                                          Some services even collapse plus-addresses

                                          Ugh, that’s downright evil.

                                          At least with your own domain you don’t have to use the plus scheme, you can make whatever scheme you want :)

                                          1. 4

                                            But then you have to use your own domain :D

                                          2. 3

                                            I think it’s obvious to everyone that the system of accounts is broken, because it has been constructed in such a round-about way, on top of systems that weren’t meant for it but are expressive enough to implement some kinds of identities. Email, for example, has been reduced to a (questionable) foundational identity that most people don’t even bother to use as email, because all they get are “click-here-to-confirm” links at best, and outdated notifications at worst. But for most clients, switching between identities is pretty easy, as it’s just changing what’s in the From field (eg. I automatically detect if a To, Cc or From field is from a university address, and let Emacs replace the To field for me).

                                            But most “platforms” have to re-implement everything from the ground up, accounts, passwords, backup systems, systems to reset passwords, usernames, etc. Just consider how a problem as simple as sending a text message has resulted in so many mutually incompatible solutions. As such, it’s not a surprise that systems of identification don’t cover all use-cases.

                                            Generally: The more I think about it, the more I believe that most of these problems should be solvable on a “lower” level, instead of making the towers we have built on top more and more complicated. Communication and identities should be primitive concepts in networked systems, that then ideally should be cheap and interchangeable. But all of these are just unfinished ideas.

                                          3. 14

                                            Here is a screenshot showing how many instances of Discord I have open right now: Discord’s UI doesn’t easily allow for creating and using multiple accounts. I have to run things like Rambox that make me end up dedicating about 5 GB of RAM to that whole mess in addition to my primary account (yes, Discord really uses that many resources). This is even worse on my iPad because I can’t just have multiple accounts active at once.

                                            I currently have to keep track of over 12 email addresses between the various separations involved.

                                            in most cases identities/accounts are superfluous, and shouldn’t exist in the first place

                                            They are superfluous to you because you haven’t experienced the kind of realities that would demand you to have thought about this. This is as useful to parents that want to have a social life while having children as it is to others like plural systems.

                                            1. 18

                                              I mean, I agree that discord is terrible, and it’s a tragedy that it has become as pervasive as it is. But I’m not sure what that has to do with my point? Maybe my other comment in response to @ajdecon clarifies what I am talking about.

                                              They are superfluous to you because you haven’t experienced the kind of realities that would demand you to have thought about this.

                                              I’m not sure how you came to this conclusion, but all I can say is that it is not true, and that my experiences with the problems you have enumerated have led me to the opposite conclusion. Please do not assume that everyone who has a different stance is ignorant or inexperienced; it comes off as quite arrogant (this applies to your post and your comment).

                                              Thank you.

                                              1. 2

                                                Discord’s terribleness is far from unique. The Google account switcher is basically the current best-of-industry on this problem, and it’s not exactly amazing.

                                                1. 1

                                                  Not sure what makes you think that, but I found the ones in Reddit and Twitter far better.

                                              2. 4

                                                Interestingly, official Telegram clients have a very easy account switcher.

                                                Even though Telegram is one of those phone-number-based messengers.

                                                1. 2

                                                  Isn’t that a problem with Discord that is largely orthogonal to the identity problem? That is, why would you still need an identity-management system if Discord were super-low-resource-usage, allowed you to run multiple instances signed into different accounts, and let you use a single phone number/email address for multiple accounts?

                                                2. 2

                                                  in most cases identities/accounts are superfluous, and shouldn’t exist in the first place.

                                                  Why? In what way? I can’t tell if you’re saying “people shouldn’t have multiple identities” or just “accounts on computers and websites are often not really needed”.

                                                  1. 2

                                                    The latter.

                                                    1. 2

                                                      Um, really? “Extraordinary claims demand extraordinary evidence.” I can think of some valid nuanced points about identity and security you might be making, but on face value your statements seem obviously wrong. If for no other reason, shared systems need identity/accounts just to distinguish between people’s content in a situation without total implicit trust.

                                                      1. 2

                                                        Looking through my password manager a solid third of them are for random websites that I had to create an account with just to access some content or make a one-off comment or purchase. I think there is a good claim that a lot of times identity/accounts are unnecessary.

                                                  2. 2

                                                    The solution seems to be making accounts cheap (eg. via cryptography) that can be mapped on to complex identities, instead of basing everything on a complex system.

                                                    This just shifts the domain modeling problem from the system to the user. Users would end up having to keep a private dictionary of identities which they map to cryptographic identities. Managing a single cryptographic identity is difficult as it is, and shifting the burden of management to the user makes this even more difficult and error prone.

                                                    TBL’s Solid project envisions a system of identity providers whose sole job it is to offer a UX for managing user identities, and I think this is an interesting solution. You pay/host (or let them mine your data) an Identity Provider that lets you manage multiple identities, and use that identity provider across the web.

                                                  1. 4

                                                    I think this is a good idea. I often get a lot of value out of side discussions, but I also get frustrated when they crowd out on-topic comments.

                                                    The first idea, giving these an explicit place to live under the regular comment section, seems like a great way to explicitly promote side discussions and separate them from on-topic discussion.

                                                    I’d also suggest that the box for posting top-level comments have an optional checkbox “tangential” so that you could send your thread to this section directly.

                                                    1. 3

                                                      Hurray! I’m very happy that you’re publishing these. The history and general practices of science and engineering are fields that interest me deeply, and I think this work on making the comparison between software and other engineering fields is really fascinating. The occasional teases on your Twitter account have been fun to follow.

                                                      Anecdotally, even as a student I noticed that the people who were least familiar with each other’s fields exaggerated the differences the most. For example, I did some grad work at a big state school where people mostly socialized in their own departments, and you sometimes heard people talk about other departments very dismissively, or with an air of mystery. (“What do you think those people do all day, anyway?”)

                                                      By contrast, I got my undergrad degree at a smaller, physically isolated engineering university, where we all more or less had to get to know each other due to sheer proximity. ;-) In that setting, the similarities tended to stand out more than the differences, and there seemed to be little question that software folks “did engineering”.

                                                      1. 3

                                                        Painting! We’re moving into a new (to us) house next week, and we’re going to paint a few rooms before we make the move.

                                                        1. 6

                                                          (Disclaimer, this got longer than I meant it to…)

                                                          The “operating system” analogy has always fallen a bit flat for me, to be honest. When I’ve tried to use it, the analogy obscures more than it actually reveals.

                                                          Kubernetes provides a mechanism to take a big collection of machines and map some set of workloads onto it, without having to make individual decisions about which workloads go on which machines, or how they’ll communicate.

                                                          You can analogize it to an OS, by pointing out that an OS allows you to run programs without specifying individual CPUs and memory. But the analogy leaks a lot when you start talking about the details, such as the individual machines still running their own OS, and the networking and distributed systems details k8s exposes that a “traditional” OS doesn’t.

                                                          In my experience, debugging issues with an application on k8s also quickly devolves into regular Linux debugging with all the standard tools. It’s hard to argue that “Kubernetes is your OS now!” when you need to start thinking about iptables again as soon as things break.

                                                          At the end of the day, Kubernetes is just another cluster job scheduler. It works in the same space as tools like Nomad and Google Borg, and older tools like batch schedulers such as PBS and Slurm. It solves some problems the other tools don’t, like abstracting over cloud provider load balancer implementations, but adds some complexity to do so. It also solves some problems less well, such as backfill scheduling for batch jobs.
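
                                                          For readers who haven’t met backfill scheduling, here is a rough Go sketch of the idea: reserve a start time for the blocked job at the head of the queue, then let later jobs start now only if they cannot delay that reservation. The types, the integer time units (with “now” as 0), and the example workload are all assumptions for illustration; this is a simplification and not the algorithm of Slurm, PBS, or any other specific scheduler.

```go
package main

import (
	"fmt"
	"sort"
)

// running is a job already on the cluster; it releases `nodes` at time `end`.
type running struct {
	nodes int
	end   int
}

// queued is a waiting job with a node count and a user-supplied runtime limit.
type queued struct {
	name    string
	nodes   int
	runtime int
}

// backfill returns the names of the queued jobs that may start at time 0.
func backfill(free int, run []running, queue []queued) []string {
	run = append([]running(nil), run...) // work on a copy of the caller's slice
	var started []string

	// Start jobs in strict queue order for as long as they fit.
	i := 0
	for ; i < len(queue) && queue[i].nodes <= free; i++ {
		free -= queue[i].nodes
		run = append(run, running{nodes: queue[i].nodes, end: queue[i].runtime})
		started = append(started, queue[i].name)
	}
	if i == len(queue) {
		return started
	}

	// queue[i] is blocked. Find the earliest time at which enough nodes will
	// have been released for it, and reserve that start time.
	head := queue[i]
	sort.Slice(run, func(a, b int) bool { return run[a].end < run[b].end })
	avail, reserved := free, -1
	for _, r := range run {
		avail += r.nodes
		if avail >= head.nodes {
			reserved = r.end
			break
		}
	}
	if reserved < 0 {
		return started // head never fits on this cluster; be conservative
	}
	spare := avail - head.nodes // nodes left over once the head job starts

	// Later jobs may jump ahead only if they do not delay the reservation.
	for _, q := range queue[i+1:] {
		switch {
		case q.nodes > free:
			continue // does not fit right now
		case q.runtime <= reserved:
			free -= q.nodes // finishes before the reserved start time
		case q.nodes <= spare:
			free -= q.nodes // runs past it, but only on nodes the head job won't need
			spare -= q.nodes
		default:
			continue
		}
		started = append(started, q.name)
	}
	return started
}

func main() {
	// Hypothetical 10-node cluster: 6 nodes busy until t=100, 4 free now.
	run := []running{{nodes: 6, end: 100}}
	queue := []queued{
		{name: "big", nodes: 8, runtime: 50},
		{name: "small", nodes: 2, runtime: 60},
		{name: "long", nodes: 3, runtime: 200},
		{name: "tiny", nodes: 1, runtime: 500},
	}
	fmt.Println(backfill(4, run, queue))
}
```

                                                          The sketch leans on users supplying honest runtime limits, since the reservation is only as good as those estimates.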

                                                          If you want to explain “why Kubernetes”, I’ve had the best luck talking about the operational problems it solves. If you have a hundred machines, it makes it easier to handle problems like what happens if a node fails? Or you need to do maintenance on one of them? How about if the process crashes, or you discover it needs to move because it needs more memory? And how do you make sure that when your service moves to a different machine, all its friends can still find it?

                                                          Cluster schedulers give you tools to solve those problems. Which are totally real problems, but also typically only show up when you have 12+ machines to manage. If you don’t have those problems, you probably don’t need a cluster. 🤷‍♂️ And that’s ok!

                                                          (My day job is basically helping customers build and operate compute clusters. But usually my first question is “are you sure you need to do this?” Because they typically introduce some complexity and pain while helping you abstract over other issues.)