The word “micro” is doing a lot of work here, and I think it exists on a spectrum.
The classic example, isEven, is a single-function JavaScript library. Is that too small? I personally think so.
What about medley in Clojure? It’s a collection of ~40 functions that provide minimal and meaningful additional functionality to Clojure’s existing robust core library and can be read in a short sitting, with each function definition acting as its own documentation.
However, compare medley to another Clojure library, encore, which has a significantly larger API and covers much more ground. This too is not a “framework”, but is it a micro library? I don’t think so, even though the functionality is fairly limited in scope (do some extra general-purpose stuff clojure.core doesn’t).
I guess there’s a temptation to interpret micro absolutely, but it seems like there’s probably a distinct scale for every problem/domain? Flask has long called itself a microframework. I guess there are probably more-micro web frameworks, but it’s certainly out on the minimal end.
Is a library with 1 function for validating that a user-supplied string looks like an email address micro? (Because it doesn’t bundle a bunch of other user-input validators? Because it doesn’t include batteries for generating confirmation emails?)
I’m inclined to say no, if it bites off a whole problem at a unit-of-use that is actually in demand? I guess if I am consistent that implies something is “micro” when it is a building block that doesn’t solve a whole problem on its own? (I am skeptical of these… just thinking aloud.)
I propose a middle opinion: micro libraries can be as good as a good library when they are well maintained, but they degrade no further than cut-and-paste because you can pin the version. So it’s like the cut-and-paste you see in older ecosystems, but with the benefit of a possible (but optional) upgrade.
IMO, micro libraries don’t seem like the solution, given how easy it is for supply-chain attacks to happen. It sounds more like a tooling problem than a need for micro libraries. And these days, compilers are pretty good at optimizing away library code that isn’t needed.
What I hate about big libraries is the cognitive overhead, not the compiler overhead. With a library that does one or two things, I know when to use it. With a big library I have to go look and see whether it has what I need before I go grab a smaller solution. Sometimes I can’t use my normal small-library choice because the big library has an incompatible approach baked in, etc.
Yep, this is exactly the key problem to solve in order to make them useful.
This is an interesting question because it largely depends on the ecosystem. I know there are a lot of complaints about micro libraries that do every little thing and add to the npm dependency hell, but coming at it from another language ecosystem (Scala), I yearn for more micro libraries. On the JVM there seem to be tons of massive libraries that often bring in so much you don’t want or need. You didn’t hit on this in the post, but another huge benefit of micro libraries, IMO, is the ability for someone not as familiar with the code to contribute. Contributing to a giant project with a potentially nasty build definition may deter people from even trying, or from being able to find where they should make a change. A lot of the time these micro libraries are much easier on that front. Not really an answer to your question, just a thought.
My biggest complaint with huge libs (beyond micro libs themselves, which sometimes end up being “the package metadata is bigger than the code for the lib”…) has been situations where maintainers disappear and setting up the next generation of maintainers is next to impossible. I do not hold any malice towards maintainers who do this (not only is nobody obliged to do anything, it’s extremely difficult to transfer things over).
BUT if there were a decent consensus-built way for a group of people to say “OK, we believe this maintained fork is now the canonical version of this lib”, and for that to be communicated to existing users of the lib, that would help a lot with package blockage.
Another thing is that package releasing in general has some friction, even when the package ends up being a small script. Things like changelog management and version-number bumping are all important to get right for libraries, but they are manual, error-prone, and ultimately lead to people releasing less often (even when a set of fixes is already in place!).
There are workarounds like installing via git commit, of course, but I think any effort to make documentation, releases, and the like easy to do (read: without having to make local checkouts of the code) would make maintenance easier in many situations.
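For illustration, much of that friction is scriptable; here is a toy sketch of such a release helper, assuming an invented single-line VERSION file and a CHANGELOG.md (not any particular ecosystem’s convention):

```python
# Toy release helper (invented layout): bump the patch version in a
# single-line VERSION file and prepend a dated entry to CHANGELOG.md,
# so cutting a release becomes one command instead of manual editing.
import datetime
from pathlib import Path

version_file = Path("VERSION")  # assumed to contain e.g. "1.2.3"
major, minor, patch = map(int, version_file.read_text().strip().split("."))
new_version = f"{major}.{minor}.{patch + 1}"
version_file.write_text(new_version + "\n")

changelog = Path("CHANGELOG.md")
entry = f"## {new_version} ({datetime.date.today().isoformat()})\n\n- describe your fixes here\n\n"
changelog.write_text(entry + (changelog.read_text() if changelog.exists() else ""))

print(f"Bumped to {new_version}; now tag and push: git tag v{new_version}")
```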
Could be nice to have a micro package manager that quickly vendors one-file packages. Python micro-library management could look like this:
Keep the file tracked in the source repository; metadata is as simple as a comment. The micro-package could be a good old setup.py. With some smarter parsing, it could be possible to allow edits to vendored functions and use merging techniques to apply upgrades. It could also be possible to keep multiple micro-packages in the same file. This would fix many of the arguments against micro libraries: keep them in the VCS and reviewable while keeping them easy to upgrade and test.
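A minimal sketch of what such a vendored micro-package could look like, where the metadata is just a structured comment that a hypothetical updater tool could parse (the package name, URL, and fields below are all made up):

```python
# vendor/slugify_micro.py
#
# micro-package: slugify-micro                   (hypothetical name)
# version: 1.2.0
# source: https://example.com/slugify_micro.py   (illustrative URL)
# sha256: <digest recorded by the updater at vendoring time>
#
# An updater could re-fetch the source URL, diff it against this file, and
# three-way merge local edits instead of blindly overwriting them.
import re
import unicodedata


def slugify(text: str) -> str:
    """Lowercase, strip accents, and collapse runs of non-alphanumerics to '-'."""
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-zA-Z0-9]+", "-", text).strip("-").lower()
```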
I really like this idea. Vendor + easy upgrades. From a usage perspective, it would force you to review changes and be lighter weight in your repo than a full package. From a mini-package creator perspective, it would be great if creating a package were as simple as filling out a gist, or something equally lightweight.
How is this any different from version pinning? Seems equivalent, just takes up more storage space.
This is:
Sometimes safer: unless your version pin is a cryptographic hash, you still can’t prove that a version didn’t change. Vendoring prevents attacks where an attacker takes over a dependency.
More reliable: the dependency can be removed from the package registry, and you still have a vendored copy.
Faster: when installing your package, it comes with all its vendored dependencies, so no extra fetches are needed.
More reviewable: it’s much more likely that people actually review the dependency code if it’s checked into your project.
More efficient: it actually takes less storage space, since you don’t need to fetch all the metadata related to package management, just the actual code, which you would fetch and store with a regular package manager anyway.
Looking at two of my random projects, one using yarn and one using Cargo: both include a checksum of the contents in their lockfiles. I’m not sure whether the package repo stores the checksum or whether it’s computed on download, but I don’t think it matters for the security concerns either way. requirements.txt files don’t generally have this, but you can specify a --hash option, which pip will enforce. Your other points are still valid, of course.
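For reference, a requirements.txt pin with hash checking looks like “somepkg==1.2.3 --hash=sha256:<hex digest>”. The check itself boils down to something like the following toy sketch (not how pip or Cargo actually implement it):

```python
# Toy sketch of hash-pinned installs: compare the downloaded artifact
# against the digest recorded in the lockfile / requirements file.
import hashlib


def sha256_of(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, pinned: str) -> None:
    actual = sha256_of(path)
    if actual != pinned:
        raise RuntimeError(f"hash mismatch for {path}: got {actual}, pinned {pinned}")
```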
Setting aside the evergreen debate over whether URL-based imports are the best thing ever or a giant security nightmare, I think that Deno’s module loading does offer a pretty good model for how to make publishing code as easy as publishing text: just put it on the web, make sure your Let’s Encrypt certificate is up to date, and move on. If you’re going to release multiple versions, make the version (or commit sha) part of the URL. Done!
Does that protect you from upstream authors arbitrarily swapping out incompatible versions, or sabotaging their packages? Nope. (Neither do most package managers, unless they hard pin against commits/hashes rather than semver.)
Do you still need some sort of repository/registry/search index to find 3rd party packages? Yep.
Is publishing anywhere from 5-500 lines of code really, really simple in the “modules-are-URLs” scheme? Absolutely.
IMHO the problem you get to quickly isn’t making publishing easier; there are hundreds of helpers for NPM, Go, Rust, etc. modules that streamline the default GitHub -> registry -> package manager workflows. The hard part is teasing apart the implicit dependencies, version conflicts, and runtime interactions of a deep dependency tree.
Centralized, metadata-rich registries give you another place to hook in analysis, scanning, and testing infrastructure to help with that problem, but so can “smart” local tooling.
When libraries get sufficiently small, are they still copyrightable? If not, then we might be able to find more efficient ways to distribute them.
I think they would have to get way smaller than the common examples. isEven, isObject, left-pad: these are all nontrivial from a copyright point of view. Though IANAL, of course.
“Pick your vocabulary” is my highly opinionated answer.
Your complaints revolve around different expectations of library maturity and value. A solid, well-maintained library responds to bug reports, keeps up with changes in the systems it depends on, follows semantic versioning, and includes the tests, documentation, sample code, and discussion forums we have come to expect. This may not always be possible: the more specialized the library, the lower the odds of competent people providing free engineering.
You would do well to describe solid, well-maintained libraries differently from the hunks of code someone tossed out into the public with a BSD license. That is, make up words like “vetted”, “stable”, and “unsupported”, backed by some criteria. Some options include:
Evaluation-function score. For example, two points for a README.md file, two points for starting a year ago, one point for activity in the last month, five points for example code, etc. “Stable” might mean thirty points or more. There was an automated version of this for Python at one point. (A toy sketch appears after this list.)
Test your usage. Write enough tests of your own code and your code’s usage of the library to (hopefully) catch innocent mistakes. Good luck.
Vet or audit the code. Pay expensive humans to review all the changes. In theory, you could find a service that sold you lists of vetted versions. In practice, you find open-source amalgamation platforms like Meteor.js, where providing a stable open-source platform is much of the value.
Start by naming what you want.
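As a toy sketch of the evaluation-function idea above (the weights and threshold are invented, and the repo facts would come from whatever registry or forge API you have access to):

```python
# Toy "maturity score": invented weights over a dict of repo facts you
# would gather yourself (README present, age, recent activity, examples).
from datetime import datetime, timedelta, timezone


def maturity_score(repo: dict) -> int:
    now = datetime.now(timezone.utc)
    score = 0
    if repo.get("has_readme"):
        score += 2
    if repo.get("created_at") and now - repo["created_at"] > timedelta(days=365):
        score += 2
    if repo.get("last_commit_at") and now - repo["last_commit_at"] < timedelta(days=30):
        score += 1
    if repo.get("has_example_code"):
        score += 5
    return score


# Anything above some threshold (8 here, arbitrarily) might earn a "stable" label.
repo = {"has_readme": True, "has_example_code": True,
        "created_at": datetime(2020, 1, 1, tzinfo=timezone.utc)}
label = "stable" if maturity_score(repo) >= 8 else "unsupported"
```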
If a non-systemic approach (like “how can humans do this on demand”) is acceptable, I think things like the superfluous-C-compiler problem in the article should already be tractable in Nix and presumably other systems. In this case, you’d write an expression that overrides the immediate dependency, drops its C-compiler dependency, skips (or replaces) the configure/compile steps that require it, and outputs only the part you want to depend on.
My thoughts on ~systemic approaches got a bit long, so I’ll stuff them in a self-reply…
Not sure if there’s a silver-bullet systemic approach since it seems like there are a lot of tradeoffs.
I guess it’s good to tease out the main tensions? (spitballing; almost certainly incomplete…)
There’s some intrinsic logistical overhead with running each ~project, and it can be a meaningful share of total work for very small ones, so it’s also a headwind at the individual level for maintaining many small projects. Especially if they’re all ~related and often end up getting bumped together.
Tradeoffs in what kind of overhead/heartburn different ecosystems create for their consumers?
Fine-grained ecosystems: It may be easier to evaluate each project, but I imagine a greater number of small projects makes discovery harder on net? Maybe greater overall effort to stay up to date? Likely to depend on a higher number of humans? Collectively these may be more of a headwind/trap for ~noobs?
Coarse-grained ecosystems: Harder to find narrow utilities (may have to code search for them instead of project search). Consuming narrow parts of a large project likely entails waste/overhead (though exactly what it looks like probably differs between ecosystems and toolchains). If the narrow part isn’t a core activity, it may not be as well tested and the maintainers may not want to spend time triaging PRs to add features or support new use-cases?
Maintainer side
Publishing multiple packages from a ~monorepo is probably pretty tractable with generic tooling/scripting? There’s maintainer overhead, but probably less than separate repos. Probably more work than something built into the ecosystem, but I don’t think I’d want to see every ecosystem independently throwing thousands of dev-hours at this problem? Unfortunately it depends on the maintainer’s ability to envision what units of re-use others will find interesting.
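For what it’s worth, the generic-scripting version of this can be quite small; a hedged sketch, assuming a made-up packages/<name>/pyproject.toml layout and the standard build and twine tools:

```python
# Sketch: build and publish every subpackage in a monorepo that has a
# pyproject.toml. The packages/<name>/ layout is an assumption, not a standard.
import subprocess
from pathlib import Path

for pkg_dir in sorted(p.parent for p in Path("packages").glob("*/pyproject.toml")):
    print(f"building {pkg_dir.name}")
    subprocess.run(["python", "-m", "build", str(pkg_dir)], check=True)
    dist_files = [str(f) for f in (pkg_dir / "dist").glob("*")]
    subprocess.run(["twine", "upload", "--skip-existing", *dist_files], check=True)
```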
I’ve never intentionally looked, but I have noticed that gitoxide does this in the Rust ecosystem: https://crates.io/search?q=gitoxide https://github.com/Byron/gitoxide
I’m not sure. I think it should be carefully recommended as something to ignore until there’s demand? I would be a little sad if this became dogma and I started noticing maintainers investing energy slicing a dozen sub-100-download packages out of their projects.
Consumer side
I imagine it’s addressable per ecosystem with some combination of compiler or source-analysis toolchain sophisticated enough to selectively pull in just what it needs (whether that’s at build/package time, or via ~vendoring)? This strikes me as pretty ripe geekbait, so I’d imagine there’s some combination of ecosystems/toolchains already doing this, a graveyard of failed/inactive attempts, or people sawing away at the problem in obscurity.
It wouldn’t have as much maintainer overhead, but smaller units of reuse wouldn’t be very discoverable. If it’s analysis-based and isn’t driven by the same tools as the core of the language, I imagine it’ll come with false positives/negatives.
I’m not sure how many implementations could actually get around the superfluous-C-compiler type of problem, especially in languages without namespacing. I have been picking at an idea that could (eventually…) make ~systemic approaches a bit more tractable in the Nix ecosystem.
I’m generating a directory of deterministic analysis artifacts (calling these lore) created by running an analysis on the “built” version of a package. For now, this analysis is crude. It’s looking for signs of exec, and signs that a shell script is just an exec wrapper. It’s only run as-needed by the Nix API for the project that needs the lore (and only generating the lore that project needs).
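As a toy illustration of that kind of crude check (not the actual analysis described above; the heuristic here is invented):

```python
# Toy heuristic: does a shell script look like a plain exec wrapper, i.e.
# nothing but comments, exports, and an exec of some other program?
import re
from pathlib import Path

EXEC_RE = re.compile(r"^\s*exec\s+\S+")


def looks_like_exec_wrapper(path: Path) -> bool:
    lines = [
        line for line in path.read_text(errors="replace").splitlines()
        if line.strip() and not line.lstrip().startswith("#")  # drop blanks, comments, shebang
    ]
    return bool(lines) and all(
        EXEC_RE.match(line) or line.lstrip().startswith("export ") for line in lines
    )
```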
That said, in the longer run I hope to make the case that having standard builds in Nixpkgs generate a separate output containing lore could make it easier to build some interesting levers. Since this lore could be built and stored in the public cache, it’d be possible for packages and tools to introspect the lore. If the lore included things like a list of functions and symbol dependency graphs generated at compile-time, for example, it’d be possible to build tools that can reason about software that the local system doesn’t have (or couldn’t even build/run).
This used to be called snippets. I think it’s basically what templating systems are for. Hopefully the thing I am making can become the de facto solution for this kind of “copy-paste programming” (i.e. a general-purpose metaprogramming “language”), but it’s still early days.