Deno is an impressive project. But importing URLs rather than some abstraction (package names, “@vendorname/packagename”, reverse DNS notation, anything) is a no-go for me. I 100% do not want a strict association between the logical identifier of a package and a concrete piece of internet infrastructure with a DNS resolver and an HTTP server. No thank you. I hate that this seems to be a trend now, with both Deno and Go doing it.
Package names, “@vendorname/packagename”, and reverse DNS notation, in systems like Maven or npm, are just abstractions over DNS resolvers and HTTP servers, but with extra roundtrips and complexity. The idea is to get rid of all those abstractions and provide a simple convention: whatever the URL is, it should never change its contents, so the toolchain can cache it.
Any HTTP server with static content can act as an origin for your dependencies. It could be deno.land/x, raw.githubusercontent.com, esm.sh, your own nginx instance, anything else, or all of those options combined.
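For instance, a minimal sketch of what that looks like in practice (the pinned std version below is just illustrative):

```ts
// main.ts: the URL itself is the module identifier; Deno downloads it once
// and caches it locally. The std@0.120.0 version is illustrative.
import { assertEquals } from "https://deno.land/std@0.120.0/testing/asserts.ts";

assertEquals(1 + 1, 2);
console.log("dependency fetched straight from a URL, no registry involved");
```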
Package identifiers are useful abstractions. With the abstraction in place, the package can be provided by a system package manager, or the author can change their web host and no source code (only the name -> URL mapping) needs to be changed. As an author I don’t want to promise that a piece of software will always, forevermore, be hosted on some obscure git host, or that I will always keep a particular web server alive at a particular domain with a particular directory structure. I want the freedom to move to a different code hosting solution in the future, but if every user has the URL in every one of their source files I can’t do that. As a result, nobody wants to take the risk of using anything other than GitHub as their code host.
With a system which uses reverse DNS notation, I can start using a library com.randomcorp.SomePackage, and later, when the vendor stops providing the package (under that name or at all) for some reason, the code will keep working as long as I have the packages with the identifier com.randomcorp.SomePackage stored somewhere. With a system which uses URLs, my software will fail to build as soon as randomcorp goes out of business, changes anything about their infrastructure which affects paths, stops supporting the library, or does anything else which happens to affect the physical infrastructure my code has a dependency on.
The abstraction does add “complexity” (all abstractions do), but it’s an extremely useful abstraction which we should absolutely not abandon. Source code shouldn’t unnecessarily contain build-time dependencies on random pieces of physical Internet infrastructure.
That’s my view of things anyways.
The same applies to Maven and npm: repositories are coded into the project (or the default repository is defined by the package manager itself). If a host dies and you need to use a new one, you’ll need to change something.
What happens if npm or jcenter.bintray.com stops responding? Everyone will have to change their projects to point at the new repository to get their packages.
In Deno you can use an import map (and I encourage everyone to do so): https://deno.land/manual/linking_to_external_code/import_maps so all the hosts are in a single place, just one file to look at when a host dies, just like npm’s .npmrc.
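A minimal sketch of how that looks (the file names and std version are illustrative):

```ts
// import_map.json (shown here as a comment):
// {
//   "imports": {
//     "fmt/": "https://deno.land/std@0.120.0/fmt/"
//   }
// }

// main.ts: the bare "fmt/" specifier is resolved through the import map,
// so when a host dies or moves, only import_map.json has to change.
import { red } from "fmt/colors.ts";

console.log(red("resolved via the import map"));

// Run with:  deno run --import-map=import_map.json main.ts
```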
There are lockfiles, too: https://deno.land/manual@v1.18.0/linking_to_external_code/integrity_checking#caching-and-lock-files.
And another note: It’s fairly typical for companies to have an internal repository that works as a proxy for npm/Maven/etc. and caches all the packages in case some random host dies, so the company release pipeline isn’t affected. Depending on the package manager and ecosystem, you’ll need very specific software to implement this (Verdaccio for npm, for example). But with Deno, literally any off-the-shelf HTTP caching proxy will work, something much more familiar to systems people.
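As a sketch, assuming a hypothetical internal mirror host artifacts.example.internal sitting in front of the upstream origins:

```ts
// deps.ts: all imports go through the company's caching proxy/mirror.
// artifacts.example.internal is hypothetical; because Deno modules are just
// static files over HTTP, any generic caching proxy (nginx, Varnish, Squid…)
// can serve them, with no registry-specific software required.
export { assertEquals } from "https://artifacts.example.internal/deno.land/std@0.120.0/testing/asserts.ts";
```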
That’s right, but there are only two ways to make builds from source code without needing random pieces of physical internet infrastructure, and these apply to all package management solutions:
You have no dependencies at all
All your dependencies are included in the repository
The rest of the solutions are just variations of dependency caching.
Although this design is simpler, it has a security vulnerability which seems unsolvable.
The scenario:
A domain that was hosting a popular package expires
A malicious actor buys the domain and hosts a malicious version of the package on it
People who have never downloaded the package before, and therefore can’t possibly have a hash/checksum of it, read blog posts/tutorials/StackOverflow answers telling them to install the popular package; they do, and get compromised.
It’s possible to prevent this with an extra layer (e.g. an index which stores hashes/checksums), but I can’t see how the “URL only” approach could even theoretically prevent this.
I think the weak link there is people blindly copy-pasting code from StackOverflow. That opens the door to a myriad of security issues, not only for Deno’s approach.
There are plenty of packages on npm with names very similar to legitimate, popular packages, differing by just a letter, an underscore, or a number. That’s enough for many people to install the wrong, malicious package and get compromised.
The same applies to domain names. Maybe someone buys den0.land and writes it as DEN0.LAND in a forum online, since domains are case-insensitive anyway and the zero hides better that way.
Someone could copy some random Maven host from StackOverflow and get the backdoored version of all their existing packages in their next gradle build.
Sure, in that sense, Deno is more vulnerable because of the decentralisation of package-hosting domains. It’s easier for everyone to know that the “good” domain is github.com or deno.land. If any host could be a good host, any host could mislead and become a bad one, too.
For npm, we depend entirely on the fact that the domain won’t get taken over by a malicious actor without anyone noticing. I think people will end up organically doing the same thing and getting their dependencies mostly from well-known domains such as github.com or deno.land, but I think it’s important to have the option not to follow this rule strictly and have some freedom.
EDIT:
Apart from the “depend on very well-known and trusted centralised services” strategy, something more could be done to address the issue. Maybe there will be something about that in the Deno 2 roadmap when it gets published. But fighting against blind StackOverflow copy-pastes is hard.
What about people checking out an old codebase on a brand new computer where that package had never been installed before?
I dunno, this just feels wrong in so many ways, and there are lots of subtle issues with it. Why not stick to something that’s been proven to work, for many language ecosystems?
That’s easily solved with a lockfile, just like npm does: https://deno.land/manual/linking_to_external_code/integrity_checking
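A rough sketch of that workflow, assuming a deps.ts entry point and lock.json as the lockfile name (both names are just conventions):

```ts
// deps.ts: re-export remote dependencies from one place so the lockfile
// covers all of them.
export { assertEquals } from "https://deno.land/std@0.120.0/testing/asserts.ts";

// Write hashes of every remote module into the lockfile:
//   deno cache --lock=lock.json --lock-write deps.ts
//
// On a brand new machine, re-download and verify against those hashes;
// this fails if any remote file's contents have changed:
//   deno cache --reload --lock=lock.json deps.ts
```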
Well, the centralized model has many issues. Currently, every piece of software that runs on Node, in order to be distributed, has to be hosted by a private company owned by Microsoft. That’s a single point of failure, and a whole open source ecosystem relying on a private company and a private implementation of the registry.
Also, do you remember all the issues with youtube_dl on GitHub? Imagine something similar in npm.
Related to the topic: https://www.youtube.com/watch?v=MO8hZlgK5zc
Good points those. I hadn’t considered that!
The single point of failure is not necessarily inherent to the centralized model though. In CHICKEN, we support multiple locations for downloading eggs. In Emacs, the package repository also allows for multiple sources. And of course APT, yum and Nix allow for multiple package sources as well. If a source goes rogue, all you have to do is remove it from your sources list and switch to a trustworthy source which mirrored the packages you’re using.
Seems like you might want to adopt a “good practice” of only using URL imports from somehow-trusted sources, e.g. npmjs.com or unpkg or whatever.
Could have a lint rule for this with sensible, community-selected defaults as well.
If the code, alongside the URL, also includes a hash of the content, then, assuming the hash isn’t broken, it avoids this problem.
i.e.:
You get the URL and the hash, problem solved. You either get the code as proved by the hash or you don’t get the code.
The downside is, you then lose auto-upgrades to some new latest version, but that is usually a bad idea anyway.
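As a purely hypothetical sketch of the idea (this is not a built-in Deno feature; the URL and the pinned hash below are made up):

```ts
// Fetch the module source, verify it against a pinned SHA-256, and only
// treat it as usable if the hash matches. Run with --allow-net.
const url = "https://example.com/libs/left-pad.ts"; // made-up URL
const expectedSha256 =
  "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"; // made-up hash

const source = await (await fetch(url)).text();
const digest = await crypto.subtle.digest(
  "SHA-256",
  new TextEncoder().encode(source),
);
const actualSha256 = Array.from(new Uint8Array(digest))
  .map((b) => b.toString(16).padStart(2, "0"))
  .join("");

if (actualSha256 !== expectedSha256) {
  // Either you get the code proven by the hash, or you don't get the code.
  throw new Error(`hash mismatch for ${url}`);
}
console.log("module contents match the pinned hash");
```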
Regardless, I’m in favour of 1 level of indirection, so in X years when GitHub goes out of business (because we all moved on to new source control tech), people can still run code without having to hack up DNS and websites and everything else just to deliver some code to the runtime.
This is a cool idea, although I’ve never heard of a package management design that does this!
Agreed, I don’t know of anyone that does this either. In HTTPS land, we have SRI, which is in the same ballpark, though I imagine the number of sites that use SRI can be counted on one hand :(
[Comment removed by author]
It’s sort of unlikely to happen often in Go because most users use github.com as their URL. It could affect people using their own custom domain, though. In that case, it would only affect new users. Existing users would continue to get the hash-locked versions they saved before from the Go module server, but new users might be affected. The Go module server does, I think, have some security checks built into it, so if someone noticed the squatting, they could protect new users by blacklisting the URL in the Go module server. (https://sum.golang.org says to email security@golang.org if it comes up.)
So, it could happen, but people lose their usernames/passwords on NPM and PyPI too, so it doesn’t strike me as a qualitatively more dangerous situation.
Does Deno do anything to help with that or enforce it?
In HTML you can apparently use subresource integrity:
https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity
It would make sense for Deno to have something like that, if it doesn’t already.
Deno does have something like that. Here’s the docs for it: https://deno.land/manual@v1.18.0/linking_to_external_code/integrity_checking.
OK nice, I think that mitigates most of the downsides … including the problem where someone takes over the domain. If they publish the same file it’s OK :)
While I agree with your concern in theory, in the Go community this problem is greatly mitigated by two factors:
Most personal projects tend to just use their GitHub/GitLab/… URLs.
The Go module proxy’s cache is immutable, meaning that published versions cannot be changed retrospectively.
These two factors combined achieve the same level of security as a centralized registry. It is possible that Deno’s community will evolve in the same direction.
change your hosts file
Glad to hear they are taking npm compatibility seriously; that’s one of the hardest hurdles to overcome, imo… we all love to trash the npm ecosystem, but there is an enormous amount of trusted, robust code in there, with all the x-pads.
Node.js compatibility sounds like a terrible idea. Why develop for Deno at all in that case?
Consider Python 2 and 3: compatibility tools like “six” are useful in the interim as the community transitions between incompatible languages/runtimes, even if in the final state they aren’t particularly desirable.
Codemods I get. But maintaining source compatibility with Node requires all of the terrible decisions in Node to be supported forever as well. And it also makes it much less likely there’ll be a “killer app” package that runs on Deno only.