I’m genuinely open to discussion about this question, and I really do want inputs from experts across the field of software engineering. I do think that there’s an obvious answer, “no,” but I don’t think that it is especially well-justified. Are packages a sociological phenomenon, or an engineering phenomenon? I’m not sure!
The answer on the linked page closely mirrors my opinion. Modules are a unit of composition, packages are a unit of distribution. They solve distinct problems. A package may provide multiple modules but updating the set is atomic: you never have mismatched versions of modules within a package.
This also means that a package is a more useful unit of visibility than a module: a set of closely related modules may see each other’s implementation details, because anyone who can break the internal interfaces is guaranteed to have access to all of the code that needs to be fixed. (This is the problem that visibility qualifiers are trying to solve: demarcating the world into things where changes break your code and things where changes break other people’s code.)
Or, to put it another way, packages are an atomic unit from the perspective of the producer, modules are an atomic unit from the perspective of the consumer.
If you conflate packages and modules, then you end up making modules that are large (and therefore don’t have a clear separation of concerns), because that’s the only way of guaranteeing that consumers treat your project as atomic.
I’m definitely of the opinion that distinguishing between packages and modules has value. Separating versioning and external visibility from the module boundary makes software maintenance easier, and allowing cyclic references across modules within the same package removes the busy work of moving code around to avoid import cycles.
And if it’s a compiled language, allowing multiple modules with different visibility barriers in the same compilation unit allows the developer to optimize compile times.
More generally, my takeaway is that semantically different things (e.g. compilation units, versions, visibility) should be distinct and not unnecessarily coupled.
How shall we classify Deno, for example? You can import a single function directly from a URL, and it looks like importing a module: you reference the module by URL and import from it as if it were any other module.
Kenogo’s answer is pretty good, but I’d like to critique it. It says:
Packages provide a critical source of namespacing. Modules aren’t guaranteed unique names, and even with the right combination of imports and exports, at some point you’d need disambiguation tools if you are drawing from two modules with the same name. If module names are made universally unique, you can begin to blur these two concepts.
Packages provide control of versioning. This becomes especially critical when multiple modules each depend on different versions of the same module (e.g. module A depends on C-0.8, B on C-1.2). A language’s package management system must disambiguate and link modules to their intended targets. If modules carry optional version information in their names, you can begin to blur these two concepts.
Packages provide control of sourcing. The location and method of requisition of a package is all tied up at the package level: do you pull it from GitHub? From a public package repository? From your company’s private artifact store? This part of the problem could theoretically be dealt with at the module level, but it’s increasingly noisy to do so. However, if there are tools to abstract away package requisition, you can indeed begin to blur these two concepts.
This sets out the important properties of packages as having a global namespace, supporting versions and version IDs, and “control of sourcing”.
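To make the versioning point concrete, here is a minimal sketch of the "A depends on C-0.8, B on C-1.2" case. Everything in it (the `PackageId` type, the `MANIFESTS` table, the version strings for A and B) is made up for illustration and is not how any particular package manager works. The idea is only that if the importer is part of the lookup key, each importer can be linked to its own version of C, which a single flat module namespace can't express.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageId:
    """A package identity: a name plus a version (both hypothetical)."""
    name: str
    version: str


# Hypothetical manifests: each package lists the exact versions it depends on.
MANIFESTS = {
    PackageId("A", "1.0"): [PackageId("C", "0.8")],
    PackageId("B", "1.0"): [PackageId("C", "1.2")],
    PackageId("C", "0.8"): [],
    PackageId("C", "1.2"): [],
}


def link_table(roots):
    """Map each (importer, imported name) pair to the version it should link to."""
    table = {}
    pending = list(roots)
    seen = set()
    while pending:
        pkg = pending.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        for dep in MANIFESTS[pkg]:
            # The importer is part of the key, so A and B can each get their own C.
            table[(pkg, dep.name)] = dep
            pending.append(dep)
    return table


def conflicts(table):
    """Names requested at more than one version across the whole graph."""
    versions = {}
    for (_, name), dep in table.items():
        versions.setdefault(name, set()).add(dep.version)
    return {name: sorted(vs) for name, vs in versions.items() if len(vs) > 1}


table = link_table([PackageId("A", "1.0"), PackageId("B", "1.0")])
print(table[(PackageId("A", "1.0"), "C")].version)  # 0.8
print(conflicts(table))  # {'C': ['0.8', '1.2']}
```

Whether a language can actually load both versions of C side by side is a separate question; the sketch only shows the bookkeeping that has to live somewhere.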
To this, I would add: package metadata (author, licence, etc). For a language I designed, there can’t be recursive dependencies between different packages, whereas this is okay for modules. The technical constraints that motivated “no recursive dependencies between packages” might not apply to all languages?
None of these properties place requirements on the structure of the code inside a package. To meet these requirements, a package doesn’t have to be a collection of named modules. (In a language I designed, the code part of a package is an expression.) The actual choice of package structure will depend on the specific programming language.
Edit: WHOOPS. I saw the similarities in the comment from the SO page, and assumed you were the author and copied it here, too. I think this response is still relevant, but it’s not a direct response to you, rather, additional commentary on the post which you’re commenting on. Sorry for the confusion!
Packages provide a critical source of namespacing.
How? The definition given for “packages” doesn’t seem to say anything other than “a named collection of modules.” Based on that loose definition, the only namespacing really provided is by the module’s hierarchy / names. And, hierarchy can be collapsed into longer module names…
Packages provide control of versioning.
Put the version in the module name: module A vs. module Av1.
Packages provide control of sourcing.
My response here is that vendoring dependencies is a very reasonable approach and, depending on the author / project, even preferable to reference-based sourcing. And, if you’re vendoring, you don’t necessarily need packages, just the modules within the “package” that you’re utilizing.
–
I do think that most people would prefer to operate in a language with packages the way you describe them. But, I see this as “convenience” not “necessity.”
Yes! Joe’s musing here was one of my big motivations in asking this question; if a language denotes functions, then surely we can just distribute the functions directly.
Other people have said this better than I have, but it’s the same situation as with git; the data itself has a True Name, and then separately humans attach layers of convenience names on top of that, because otherwise there would be no hope of ever finding anything. Ramsey Nasser’s talk on this is also very insightful: https://www.deconstructconf.com/2019/ramsey-nasser-a-personal-computer-for-children-of-all-cultures
Right. So, in git, my branch names don’t have to match anybody else’s branch names. If I pull from somebody else’s repository, then I get their branch names, isolated into their own namespace, not overwriting my names.
We might want something similar for functions. My names for functions don’t have to match anybody else’s, and pulling their collection should give me isolated access to their namespace.
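A rough sketch of what that could look like, by analogy with how git keeps fetched branches under a remote prefix. The `pull` helper, the `alice` label and the `fn-...` identifiers are all hypothetical; the point is only that pulling someone's names adds them under their own prefix instead of overwriting mine.

```python
def pull(local_names, remote_label, remote_names):
    """Return a new name table with the remote's names added under a prefix.

    Mirrors how git stores fetched branches as <remote>/<branch> rather than
    overwriting local branches. All names here are hypothetical.
    """
    merged = dict(local_names)
    for name, target in remote_names.items():
        merged[f"{remote_label}/{name}"] = target
    return merged


mine = {"parse": "fn-1111", "render": "fn-2222"}
theirs = {"parse": "fn-9999", "lex": "fn-3333"}

names = pull(mine, "alice", theirs)
print(names["parse"])        # fn-1111, my name is untouched
print(names["alice/parse"])  # fn-9999, their name lives in its own namespace
```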
This reminds me a lot of Unison’s idea of having content-addressed functions, where “content” in this case refers to ASTs. This doesn’t completely solve the problem of naming. In some ways it makes it more complicated. But it does make solving other problems like eliminating code duplication and optimizing build performance easier.
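For anyone who hasn't seen the idea, here is a toy version in Python, using the standard `ast` and `hashlib` modules. This is much cruder than what Unison does (parameter names and any recursive self-reference still affect the hash here, and `content_hash` is just a name I made up), but it shows the basic move: the identity comes from the syntax tree, and the human-readable name is a label on top.

```python
import ast
import hashlib


def content_hash(source: str) -> str:
    """Hash a single function definition by its AST, ignoring its own name."""
    tree = ast.parse(source)
    func = tree.body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a single function definition")
    func.name = "_"  # the name is a label layered on top, not part of the content
    # ast.dump omits line/column info by default, so formatting and comments
    # do not affect the hash.
    return hashlib.sha256(ast.dump(func).encode("utf-8")).hexdigest()


a = content_hash("def double(x):\n    return x * 2\n")
b = content_hash("def twice(x):  # same body, different name\n    return x*2\n")
c = content_hash("def triple(x):\n    return x * 3\n")

print(a == b)  # True
print(a == c)  # False
```

Two definitions with the same body collapse to one hash, which is where the deduplication and build caching benefits come from.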
You definitely want this for functions! The ability to define and publish your own dictionaries is really valuable for people who speak different spoken languages, but also across different programming languages that run on the same runtime. For instance, Lua has strict limitations on what constitutes a valid identifier and thus usually sticks to snake_case and camelCase, but Fennel (which runs on the same runtime) doesn’t have these limitations and will let you use kebab-case and put question marks, exclamation points, and non-Latin characters in your identifiers.
I will also critique the answer given in the link.
Packages provide a critical source of namespacing. Modules aren’t guaranteed unique names,
Neither are packages. What prevents two packages from having the same name? Is it that a package index where the packages are uploaded must check for uniqueness? In that case isn’t it the package index software that provides that guarantee, not the package itself? After all, if modules were the top-level objects uploaded to a ‘module index’, the index software could just as easily check for module name uniqueness.
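As a sketch of that last point, assuming a hypothetical `ModuleIndex` where modules (not packages) are what you upload: the uniqueness check is a few lines in the index and doesn't care what kind of thing is being named.

```python
class ModuleIndex:
    """A hypothetical registry in which modules are the top-level uploaded unit."""

    def __init__(self):
        self._owners = {}

    def publish(self, name: str, owner: str) -> None:
        """Register a module name, rejecting names owned by someone else."""
        current = self._owners.get(name)
        if current is not None and current != owner:
            raise ValueError(f"module name {name!r} is already taken by {current!r}")
        self._owners[name] = owner

    def owner_of(self, name: str) -> str:
        return self._owners[name]


index = ModuleIndex()
index.publish("json_parser", "alice")

try:
    index.publish("json_parser", "bob")
except ValueError as err:
    print(err)
```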
Packages provide control of versioning. This becomes especially critical when multiple modules each depend on different versions of the same module (e.g. module A depends on C-0.8, B on C-1.2).
Assuming we are talking about OCaml, this is not possible. OCaml does not allow linking together different versions of the same package. So the question is moot.
Packages provide control of sourcing. The location and method of requisition of a package
Can be provided by a little-known standard called the ‘Uniform Resource Locator’. E.g. https://example.com/MyModule
D does not have packages, only modules. (It says it has “packages”, but it just uses that as a term for ‘all but the last part of the module path’.) As a near-direct consequence, I can define a project A that contains a module ‘foo’ and that depends (in the build tool) on a project B - which imports ‘foo’!
First-class packages would have prevented this.
So in general, we need packages because packages represent a part of the domain - they’re the unit of code distribution, versioning and dependency. In my language Neat, modules in a package can only import modules from a dependency package, and its own modules always have priority. This prevents package name collisions and order sensitivity.
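Roughly, the lookup rule can be sketched like this. This is a toy model in Python, not Neat's actual implementation: the `PACKAGES` table and `resolve` function are made up, and treating a module found in two dependencies as an error is my own assumption about one way to avoid order sensitivity.

```python
# Hypothetical projects: each has its own modules and a list of direct dependencies.
PACKAGES = {
    "A": {"modules": {"foo", "main"}, "deps": ["B"]},
    "B": {"modules": {"bar"}, "deps": []},
}


def resolve(importing_package: str, module: str) -> str:
    """Return the package that supplies `module` for code inside `importing_package`."""
    pkg = PACKAGES[importing_package]
    # Rule 1: a package's own modules always win.
    if module in pkg["modules"]:
        return importing_package
    # Rule 2: otherwise only direct dependencies are searched.
    providers = [dep for dep in pkg["deps"] if module in PACKAGES[dep]["modules"]]
    if len(providers) == 1:
        return providers[0]
    if not providers:
        raise ImportError(f"{module!r} is not visible from package {importing_package!r}")
    raise ImportError(f"{module!r} is ambiguous between {providers}")


print(resolve("A", "foo"))  # A
print(resolve("A", "bar"))  # B

try:
    # B does not depend on A, so A's 'foo' cannot leak into B.
    resolve("B", "foo")
except ImportError as err:
    print(err)
```

With this rule, the D situation above (project B accidentally importing project A's 'foo') is rejected, because the search is keyed on the importing package rather than on one global module path.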
This reads like you’re looking for a particular answer, and if you have strong opinions, you should write an answer to your own question.
I don’t have an answer here but Joe Armstrong’s take is insightful: https://erlang.org/pipermail/erlang-questions/2011-May/058768.html