1. 35
  1. 16

    I’m kind of wondering if the right way to think about this is not so much an issue of the number or size of packages that are dependencies, but the number of maintainers who are dependencies. Ultimately, whether two independent functions are part of the same package or two different ones maintained by the same person is a fairly shallow question. The micropackage approach is bad mainly in that it makes maintainership harder to understand.

    One thing I think both Elm and Go do right is that they don’t hide the maintainer’s name in the dependency; Go just does import by repository path, so you can tell by looking at your dependency list that e.g. all six of those packages are maintained by the same person. Elm denotes packages as user/repo; I’m not a fan of the fact that they tie their package manager to GitHub, but it at least doesn’t hide this.

    Almost every other language package manager does this wrong; when you do e.g. pip install foo, there is no indication whatsoever about who that package is coming from.

    With distro package managers like apt, it’s okay for these names to be unqualified since the whole repository is curated by the distro maintainers. But in the absence of curation maintainership should be explicit.

    1. 3

      With distro package managers like apt, it’s okay for these names to be unqualified since the whole repository is curated by the distro maintainers.

      I would say this is a problem even for distro package managers, at least for “universe”-like repositories. It’s pretty common for a package to disappear from one version of Ubuntu / Debian to the next because the maintainer disappeared and no one else picked it up. That being said, I agree with you in general.

      1. 3

        One thing about Go, you can use any Git host, not just GitHub, and it even works with Mercurial, SVN, etc.

        1. 2

          [maybe it’s] not so much an issue of the number or size of packages that are dependencies, but the number of maintainers who are dependencies.

          I really like this idea. And it seems like something that would be very easy to add to existing package managers (e.g., changing the final output from installed X packages in Y seconds to installed X packages from Y authors in Z seconds).
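
          A rough sketch of what that summary line could look like, assuming Go-style import paths where the first two segments (host/owner) identify the maintainer. The dependency list, `ownerOf`, and `installSummary` are all made up for illustration; no existing package manager is being described here.

          ```javascript
          // Hypothetical dependency list using Go-style import paths (host/owner/repo).
          const deps = [
            "github.com/alice/leftpad",
            "github.com/alice/rightpad",
            "github.com/bob/yaml",
            "gitlab.com/acme/logging",
          ];

          // Assumption: "host/owner" (the first two path segments) names the maintainer.
          function ownerOf(importPath) {
            return importPath.split("/").slice(0, 2).join("/");
          }

          // Build the proposed summary line, counting distinct owners with a Set.
          function installSummary(paths, seconds) {
            const owners = new Set(paths.map(ownerOf));
            return "installed " + paths.length + " packages from " +
              owners.size + " authors in " + seconds + " seconds";
          }

          console.log(installSummary(deps, 3));
          // installed 4 packages from 3 authors in 3 seconds
          ```

          Note that this only works when the package name carries the owner, as it does in Go and Elm; an unqualified name like the one in pip install foo gives it nothing to count.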

          But I have a question: do you think the relevant number is the number of organizations that are maintainers or the number of (natural) persons who are maintainers? Your comment seemed to treat these as always being the same, but they are often (very) different. I can see arguments for either, so I’m interested in which you meant.

          1. 1

            I think it makes sense to treat organizations as a single maintainer.

        2. 14

          The problem with left-pad had nothing to do with the number or size of dependencies. It had only to do with the practice of depending on an external, unarchived, mutable code store not under your control for production builds.

          If you have a local copy of left-pad, everything works fine.

          1. 13

            The left-pad fiasco served to illuminate two problems endemic to modern popular programming culture (besides the issue that packages could be retracted, which was a technical issue and has been fixed):

            • A mind-boggling number of packages (eventually) depend on what’s essentially a very trivial feature that would normally be better placed in the standard library, or defined in each package as a custom helper method. A library’s size doesn’t increase meaningfully when its authors define the helper themselves (and perhaps even inline it).
            • Nobody knew that they even had that dependency (transitively). i.e., people were relying on code without being aware. That’s not a good thing, however you slice it.
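
            To give a sense of how small such a custom helper is, here is a minimal sketch in JavaScript. It is a hypothetical in-project function written for illustration, not the code of the published left-pad package.

            ```javascript
            // Hypothetical in-project helper; not the code of the published left-pad package.
            function leftPad(value, length, fill = " ") {
              const str = String(value);
              // Nothing to do if the string is already long enough or the fill is empty.
              if (fill === "" || str.length >= length) return str;
              const missing = length - str.length;
              // Repeat the fill enough times, then trim so multi-character fills fit exactly.
              return fill.repeat(Math.ceil(missing / fill.length)).slice(0, missing) + str;
            }

            console.log(leftPad("hello", 10, "$")); // $$$$$hello
            console.log(leftPad(42, 5, "0"));       // 00042
            ```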
            1. 5

              I do not agree with either of these points.

              No code is better off duplicated everywhere rather than being shared. Being in a “standard library” vs a “package” is splitting hairs.

              My programs all depend on lots of things I’m not fully aware of. Common OS components, CA stores, firmwares, and yes the dependencies of my dependencies. Not having to put every piece of the computer into my head in order to get some work done is the whole point of abstraction.

              1. 5

                No code is better off duplicated everywhere rather than being shared.

                Code needs to be maintained. Code you rely on can be updated in incompatible ways, can break due to external factors, can have security problems etc. And it can fall unmaintained. If you rely on a piece of code you don’t understand and it breaks, you’ll still have to fix it. Overly generic libraries are often bloated and offer more than you need, and this can get in the way. You might be relying heavily on a fringe feature of the library which nobody else is using but which is important to you.

                Besides, within a project you’re likely to be re-using the same code more than once anyway, so it’s not like there’s no sharing going on.

                Being in a “standard library” vs a “package” is splitting hairs.

                Fair enough, and in fact I would argue it’s often better to have things external rather than in the standard library (see the comment in the OP about batteries leaking acid). But packaging up trivial features is frivolous and generates unnecessary churn (i.e. more stuff to download, more licenses and versions to keep track of etc). And ironically, if there are more libraries doing the same thing in an ecosystem, you end up not sharing the code, as your direct dependencies each pull in different packages to achieve the same purpose.

                Not having to put every piece of the computer into my head in order to get some work done is the whole point of abstraction.

                Abstraction isn’t about never having to care about anything. Every component you add has a cost. Yes, abstractions allow you to momentarily pretend the underlying things aren’t there, but they are still there. The art and science of programming is about knowing when you need to look below the abstraction. And I would argue you should always keep the edges of your abstraction boundaries in your “peripheral vision”, so to speak.

                My main gripe with overly theoretical education is that everything below the abstraction is typically swept completely under the rug. I’ve seen this so often with colleagues: “we don’t have to look into that because it’s a black box” and then you open up the black box to find a huge can of worms that was totally avoidable, but now you have a pile of code that relies on the specific API that “abstraction” offers with no way to switch to something else. And no, adding another abstraction (i.e., even more code) to hide the specific API is not the answer (though predictably that’s the first thing you’ll hear from the same people who got you into this mess).

                Of course I’m not advocating to avoid all dependencies (you wouldn’t get anything done!), but adding another dependency shouldn’t be the go-to solution for all your problems.

            2. 2

              And after this incident the specific problem was fixed: authors aren’t allowed to delete packages on a whim any more.

              1. 2

                Yeah, that’s what I was referring to when I said

                the developer removed the package (in a way that couldn’t happen anymore for reasons not relevant here)

                But the post could certainly have been more explicit on that point.

            3. 10

              My quibble with all discussions I read about the dependency problem—this one included—is how big a “thing” is in the “do one thing, and do it well” mantra. It’s probably so highly variable and problem-dependent that you should, at best, take the Unix philosophy as a guiding principle and don’t get too attached to it.

              Here’s a pattern I’ve experienced seemingly countless times: break down a problem into smaller parts, put them together to get a solution, notice that it’s kind of hard to follow or slow. I then put it together into a “monolith” and it’s actually better in terms of comprehensibility and performance. (The breaking down of the problem is quite a good exercise, though, for actually understanding the problem.)

              This might manifest itself as a solution S made up of some combination of libraries A, B, and C. But, it turns out, A, B, and C are not used anywhere else. Rewriting S to get rid of A, B, C means I don’t have to manage the connections between them and now S’s implementation is easier to both understand and build. And it performs better.

              What happens sometimes (and seemingly a lot to me…) is that S is smaller than the sum of its parts. So when the article says,

              This means that a library with only a few lines is much more likely to be correct – and thus can be said to better follow the Unix philosophy of doing just one thing.

              you have to be really careful interpreting that, because where does the boundary of a library begin and end?

              1. 17

                I don’t find the whole “Unix philosophy” thing to be useful in any real sense. It’s either tautologically true or collapses into definitional pedantry.

                1. 3

                  I think that’s why it’s called “Unix philosophy” and not “Unix dogma” or “Unix commandments”.

                  It’s also why I really don’t like the term “Best practices”. It’s usually stuff I’d recommend as well, but as something to keep in mind, not as a rule to follow blindly. It’s shocking how much nonsense sometimes comes out of that, because people, often with good intentions, blindly follow some “Best practice” they came across, sometimes in a really specific context that doesn’t apply at all where it is being used. Even worse is when a “Best practice” is believed to be in use but is actually misunderstood, and pretty much the opposite is done.

                  One could say Best Practices are the best practice, unless they are not.

                  A classical example is “You never want to use this flag” unless you do, which is why it’s there.

                  1. 2

                    My quibble with all discussions I read about the dependency problem—this one included—is how big a “thing” is in the “do one thing, and do it well” mantra.

                    I agree. In fact, I said pretty much the same thing in the OP:

                    But what this ignores is that “one thing” is not well defined. Consider the output from the ls command.

                    I also agree that we should “take the Unix philosophy as a guiding principle [without getting] too attached to it”. But even as a guiding principle, it’s worth (imo) putting some thought into how to balance the do-one-thing principle with other design goals – or else it risks becoming a “guiding principle” that’s too fuzzy to actually provide guidance.

                  2. 9

                    Avoiding external dependencies when possible and reasonable doesn’t preclude modularity in your own code. “External dependencies for everything XOR spaghetti-code monolith” is a false choice.

                    1. 1

                      Yeah, 100% agree. I hope my article didn’t make it sound like I think that’s an XOR choice, because I don’t.

                      I think that those are two ends of a spectrum. Every project needs to decide where to fall on that spectrum, which could be at one extreme or the other but is more often somewhere in the middle.

                    2. 5

                      I’m all for left-pad-sized packages.

                      • Small packages benefit from isolation and clear public interfaces. In a large framework two different features could interact with each other behind the scenes. OTOH two features from two different packages won’t have a hidden shared state.

                      • Small dependencies are easier to code review. When a package does one thing, I can read it, check if it really does the thing. Difficulty of understanding all code in dependencies grows linearly with small packages, but superlinearly within large packages. Bigger packages that do more usually have more layers of abstraction internally, and more places for different features to interact with each other.

                      1. 2

                        I’m all for left-pad-sized packages

                        Me too. What I’m not for is thousands of dependencies.

                        In a lot of ways, the OP was my attempt to figure out how we can get more of the first without also making the second more likely.

                      2. 3

                        have a great standard library

                        Yes. As I understand it, the lack of one is why the left-pad package existed at all.

                        1. 1

                          Eh, I don’t know why it existed but probably not. Left-pad already existed in JS stdlib. It’s padStart. You can open a blank browser tab and get to a JS console to see this work with no libs.

                          "hello".padStart(10, "$")
                          "$$$$$hello"
                          

                          The author is implying that there is some enforcement but there’s not. You could publish left-pad on Raku and there it would be.

                          As a concrete example: no one would ever write a left-pad package in Raku because the standard library already has sprintf and ‘%5s’.sprintf($str) already does the job of left-pad($str, 5).

                          Someone already did write left-pad and the JS standard library already had the equivalent of sprintf. The closest enforcement mechanism I know of is Elm which has language specific features around API breakage. I’m not sure how pulling an Elm package would impact things exactly. It still wouldn’t prevent someone from writing redundant code. It would however (strongly) prevent you from API breakage on your app when you bump dependencies.

                          I liked the rest of the article, good take. TIL about Raku. I did Perl a long time ago.

                          1. 2

                            padStart was added to the library after the leftpad debacle.

                            1. 2

                              Ah. Whelp.

                              update: Hmm well it was proposed before but accepted after I guess?

                              • Commit in 2015: https://github.com/tc39/proposal-string-pad-start-end
                              • Incident was in 2016. Which I guess is beside the point, it didn’t exist. Someone had to write it at that moment in time. And maybe the incident caused the proposal to be accepted. I see. TIL