This kind of sensationalized write-up, referring to typosquatting in scary terms as a “supply chain attack”, etc., is not particularly helpful, and is doubly unhelpful for being unwilling to actually suggest any solution.
My go-to example is the HIBP Pwned Passwords API; there are I don’t know how many packages on PyPI for interacting with it, including a bunch that are designed to hook into popular web frameworks like Django. I maintain one of them. But it’s inevitable that the names of those packages are all pretty similar, and if you’re not taking care to check that you really are getting the one you thought you were getting, you might get a different one! pwned-passwords-django, django-pwned-passwords, django-pwned-validator, and django-pwnedpasswords-validator all exist and are different legitimate packages, for example.
How would you suggest PyPI pick a “winner” there? First to upload? That rewards people who just land-rush every time there’s some new site or service to interact with, even if their packages are crap. Run a trial period of allowing multiple, then pick the most popular? That rewards people who can influence downloads, which again is not a measure of trustworthiness or quality. Try to impose some sort of quality metric and pick the “best” package? How would you even do that in a generic and neutral way?
But without doing something like the above, you can’t pre-empt typosquatting in community package repositories like RubyGems, PyPI, npm, etc. What would the author of this post want to see happen to fix this?
(also, I object strongly to the phrase “supply chain attack” as applied to this type of issue, because I feel it should be reserved for the case where someone manages to take over or otherwise corrupt a legitimate package)
I guess it’s possible to prevent typosquatting by modifying the namespace – requiring the Levenshtein distance between names to be >2 or whatever. That helps with one class of errors but not with the package name examples you brought up. It also doesn’t help with the case of a patient attacker where the package performs some useful function at first but has malicious code added later.
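To make the distance idea concrete, here is a minimal sketch of what such a registration check might look like. The function names and the threshold of 3 (that is, "distance > 2") are assumptions for illustration; this is not something PyPI or any other registry is known to implement.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        previous = current
    return previous[-1]


def conflicting_names(candidate: str, existing: list[str], min_distance: int = 3) -> list[str]:
    """Hypothetical registry check: existing names fewer than min_distance edits away."""
    return [name for name in existing if levenshtein(candidate, name) < min_distance]
```

With this sketch, `conflicting_names("requestz", ["requests"])` flags `requests`, while `conflicting_names("django-pwned-passwords", ["pwned-passwords-django"])` comes back empty, which is exactly the gap: reordered-word names are many edits apart even though a human could easily confuse them.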
A radical solution Elm went for is only allowing pure functions in packages. That way, you have to wire in all the effects explicitly, so there’s no way to sneak in cryptocurrency mining or anything else.
This is the first I’ve heard of Elm only allowing pure functions. That’s really neat, and makes more sense the more I think about it.
I mean, you can still ultimately trick people into adding malicious code into the application, but I think it makes the barrier higher. At least you can’t have things like a third party date picker mining cryptocurrencies and making HTTP requests completely unbeknown to you.
The phrase “supply chain attack” applies generally to any attempt to use insecure parts of an infrastructure to damage or harm users of that infrastructure; it applies perfectly here, and isn’t at all sensational. It has also been an accepted term for these sorts of attacks for some time, and using typosquatting to attempt to steal Bitcoin is certainly an attack.
There’s also no need to “pick a winner”… if you were trying to design a more robust system you’d require cryptographic signing of all packages with a verified public identity, ultimately pointing back to either an individual human or a legal entity, and you’d eventually require some sort of escrow stake / proof of work as a natural and desirable barrier to entry. There’s a reason why Microsoft and Apple both make their distribution mechanisms pay-for-play.
A progressive approach would be to create a walled garden within PyPI and only allow packages into it that were signed in this way, that the community had voted to include, and that had gone through some level of verification of trust… so requests would get in, but requestz wouldn’t unless the community thought it was a good idea for two such similar names to co-exist within the garden (hint: it’s not a good idea). Corporate and other users could then whitelist to allow only the packages in that garden, if they chose, by just configuring pip. Arguably the garden should in the end be the default configuration and someone should have to do a bit of work to allow packages from outside of it.
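As a rough illustration of the whitelist side of that idea, here is a sketch of a pre-install check against a curated set. Everything in it is hypothetical: the `TRUSTED_GARDEN` contents and the `outside_garden` helper are made up for the example, and a real garden would more likely be a separate index that pip is pointed at (for instance through its `--index-url` option) than a hard-coded set. The name normalization follows the rule given in PEP 503.

```python
import re

# Hypothetical curated set; in the scheme described above this would come from
# the signed, community-vetted garden rather than a constant in your code.
TRUSTED_GARDEN = frozenset({"requests", "django", "numpy"})


def normalize(name: str) -> str:
    """Normalize a project name as PEP 503 describes (case and -, _, . runs)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def outside_garden(requested: list[str], garden: frozenset[str] = TRUSTED_GARDEN) -> list[str]:
    """Return the requested project names that are not in the trusted set."""
    return [name for name in requested if normalize(name) not in garden]
```

A CI step could run something like `outside_garden(["Requests", "requestz"])` over the project's requirements and fail the build if the result is non-empty; here it would return `["requestz"]`.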
This isn’t really a new problem… ultimately every package ecosystem will need to evolve a means to distinguish between “official/trusted” packages and “community/untrusted” ones via some mechanism. OS distributions know this off the bat; the problem is that so far programming language communities have chosen to punt on it, because it only becomes an issue once a language achieves a certain level of ubiquity. Clearly it’s a problem for Ruby and Python.