For the example, it’s actually really useful to allow a program to listen on either a unix socket or a tcp socket just by a command-line flag. The string arg is useful when passed through to users.
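For instance, a minimal Go sketch of that pattern (the flag names and defaults here are hypothetical):

```go
package main

import (
	"flag"
	"log"
	"net"
)

func main() {
	// One flag picks the transport; the address itself stays an opaque string.
	network := flag.String("network", "tcp", `"tcp" or "unix"`)
	addr := flag.String("addr", "127.0.0.1:8080", "host:port or socket path")
	flag.Parse()

	// The same net.Listen call handles both cases, precisely because
	// both parameters are plain strings.
	ln, err := net.Listen(*network, *addr)
	if err != nil {
		log.Fatal(err)
	}
	defer ln.Close()
	log.Printf("listening on %s %s", ln.Addr().Network(), ln.Addr())
}
```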
Yes. The fact that an address is a mostly-opaque string to programs allowed plan 9 to add support for ipv6, without touching userspace networking APIs at all.
Often, the best thing your program can do is treat data the user passes it as an opaque blob, for as much of its travel through your code as possible.
Yeah I would call that a form of polymorphism, just like file descriptors.
Unix could have had different data types for disk files, pipes, and sockets, instead of using integers, but then you would lose the polymorphism. You wouldn’t be able to write select(), and shell couldn’t have redirects.
It also has the same caveats, because there are some things you can do on disk files that you can’t do on the others (seek()).
Still, given the constraints of C, the untyped design is better.
This relates to my comment about textual protocols in Unix: https://lobste.rs/s/vl9o4z/case_against_text_protocols#c_wsdhsm (which will hopefully appear on my blog in the near future).
Basically text is “untyped” and therefore you get generic operations / polymorphism (copy, diff, merge, etc.).
Types can inhibit composition.
(I know somebody is going to bring up more advanced type systems. I still want to see someone write a useful operating system that way. I’m sure it can be done, but it hasn’t been so far AFAIK. Probably in part because of the expression problem – i.e., you want extensibility in both data types (files, sockets) and operations (read, write, seek) across a stable API boundary.)
“Sir, this is an Arby’s.”
(Go already has this problem. I do not see your point, nor your point about the expression problem, as the number of address types is enumerable, and set in stone.)
The “challenge” is to show me an operating system WITH fine-grained types that does NOT have the O(M*N) explosion of code. That is, show me how it solves the polymorphism problem. Here’s a canonical situation of M data types and N operations, and you can generalize it with more in each dimension (M and N both get bigger):
The data types: the operating system has persistent disk files, IPC like pipes, and networking to remote machines.
Now for the “operations”:
How do you simultaneously wait on events from files, IPC and networks? Like select() or inotify().
Ditto with signals and process exits – waitfd(), signalfd(), etc. (Unix fails here so Linux invented a few more mechanisms).
How do you do redirects? A shell can redirect from a file or a pipe. (Both Rob Pike and DJB complained about the lack of compositionality for sockets: https://cr.yp.to/tcpip/twofd.html. Pike doesn’t like the Berkeley socket API because it’s non-compositional. That’s why both Plan 9 and Go have something different AFAIK)
How do you copy from a disk to IPC, disk to network, IPC to network, etc. In Unix, cat can work in place of cp. netcat also sort of works, but I think Plan 9 does better because networking is more unified.
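On that last operation, a minimal Go sketch of why one copy routine suffices (the file path and endpoint are arbitrary placeholders): because disk files, pipes, and sockets all satisfy io.Reader/io.Writer, the M×N source/destination pairings collapse into a single generic call.

```go
package main

import (
	"io"
	"log"
	"net"
	"os"
)

func main() {
	// Pick any source: a disk file here, but stdin (a pipe) works too.
	src, err := os.Open("/etc/hosts")
	if err != nil {
		log.Fatal(err)
	}
	defer src.Close()

	// Pick any destination: swap this socket for os.Stdout and
	// nothing else changes.
	dst, err := net.Dial("tcp", "127.0.0.1:9000") // hypothetical endpoint
	if err != nil {
		log.Fatal(err)
	}
	defer dst.Close()

	// One generic operation covers every M x N pairing, because each
	// variant exposes only the untyped Read/Write interface.
	if _, err := io.Copy(dst, src); err != nil {
		log.Fatal(err)
	}
}
```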
The connection is:
When you have fine-grained types, you get O(M * N) problems, and the expression problem arises out of that. You have M data types and N operations.
If you have a single type like a file descriptor or a string, then you don’t have an O(M * N) problem. So you don’t have a composition problem, and the resulting explosion of code.
In other words, types can inhibit composition. Again, not saying it can’t be done, but just that I haven’t seen an OS that addresses this.
Plan 9 is more compositional precisely because it has FEWER types, not more. As mentioned, opaque strings are used as addresses.
(Rich Hickey also has a great talk on this about the Java HTTP Request interface. It’s a type in the Java standard library that inhibits composition and generic operations. Link appreciated from anyone who knows what I’m talking about.)
Kubernetes has a similar issue as far as I can tell:
https://twitter.com/n3wscott/status/1355550715519885314
I’m not familiar with the details, but I used the “predecessor” Borg for many years and it definitely has some composition problems. The Kubernetes ecosystem has a severe O(M * N) code explosion problem. Unix has much less of that, and Plan 9 probably has even less.
Widely mocked diagram: https://twitter.com/QuinnyPig/status/1328689009275535360
Claim: This is an O(M * N) code explosion due to lack of compositionality in Kubernetes.
Also read Ken Thompson’s “sermonette” in his paper on the design of the Unix shell:
https://lobste.rs/s/asr9ud/unix_command_language_ken_thompson_1976#c_1phbzz
A program is generally exponentially complicated by the number of notions that it invents for itself. To reduce this complication to a minimum, you have to make the number of notions zero or one, which are two numbers that can be raised to any power without disturbing this concept. Since you cannot achieve much with zero notions, it is my belief that you should base systems on a single notion.
“Single notion” means ONE TYPE. Now this is taken to an extreme – Unix obviously does have both file descriptors and strings. But it uses those concepts for lots of things that would be modelled as separate types in more naive systems.
Many familiar computing ‘concepts’ are missing from UNIX. Files have no records. There are no access methods. User programs contain no system buffers. There are no file types. These concepts fill a much-needed gap.
Records are types. Unix lacks records. That’s a feature and not a bug. Records should be and ARE layered on top.
In distributed systems, types should be layered on top of untyped byte streams.
(This is going to be on the Oil blog. Many people have problems seeing this because it’s an issue of architecture and not code. It’s a systems design issue.)
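As a small Go illustration of that layering (the struct and address below are made up): the wire carries untyped bytes, and the record type exists only at the endpoints.

```go
package main

import (
	"encoding/json"
	"log"
	"net"
)

// Event is a typed record, but it only exists at the endpoints.
type Event struct {
	Name string `json:"name"`
	Seq  int    `json:"seq"`
}

func main() {
	conn, err := net.Dial("tcp", "127.0.0.1:9000") // hypothetical peer
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// The connection itself is an untyped byte stream; the record
	// structure is layered on top by the encoder here, and recovered
	// by a matching decoder on the other side.
	if err := json.NewEncoder(conn).Encode(Event{Name: "boot", Seq: 1}); err != nil {
		log.Fatal(err)
	}
}
```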
The problem is that different instances support different functionality. Seek works on some file descriptors, but not others. Ioctl is a horrific mess. Even COM was a better solution.
I think Clojure, Made Simple is the talk you’re referring to.
without touching userspace networking APIs at all.
This kind of statement sounds amazing, but has about 10 *’s next to it pointing out all the caveats. User space code still needed to change to account for new address inputs, otherwise who’s to say that “window.alert()” isn’t a valid address? You’re gonna pass arbitrary strings to that syscall?
No. Create a constraint on the interface that ensures the caller isn’t an idiot.
“window.alert()” IS a valid address for a unix domain socket. Maybe other things too.
Unfortunate random string choice. :)
How much risk does that actually mitigate in practice, and how much toil does it create? It is not true that stronger type systems, or stronger constraints, are strictly better in all circumstances.
In the example, the senior programmer spent nearly a day on this. It could have been a type error. I am not a static types apologist, and prefer to write stuff in Lispy languages, which are often quite dynamic, but I will never concede that two parameters that have a dependency on each other, as in a pair such as (socket type, address), are best represented by two arbitrary strings instead of an enumeration carrying a constrained value. That’s an absurd thing to argue. You’ve created an infinitely large set of potential inputs and told the programmer “Don’t worry! We’ll tell you if you’re wrong when you run the program.” How silly, especially when that state space can be reduced significantly, and the usage checked by the compiler we’re already paying for.
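A sketch of what that enumeration might look like in Go (package and names hypothetical; note that a defined string type is weaker than a true closed enum, since a conversion like Network(“tpc”) still compiles):

```go
package dialer // hypothetical wrapper, for illustration only

import "net"

// Network enumerates the transports we accept, instead of a bare string.
type Network string

const (
	TCP  Network = "tcp"
	UDP  Network = "udp"
	Unix Network = "unix"
)

// Dial constrains the first parameter: a typo like dialer.Dial("tpc", addr)
// is a compile error for callers who stick to the constants above.
func Dial(network Network, address string) (net.Conn, error) {
	return net.Dial(string(network), address)
}
```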
In the example, the senior programmer spent nearly a day on this. It could have been a type error.
Yep, that’s definitely toil, and it definitely could have been prevented by the compiler. But what about the toil that those type system features would bring to every other aspect of programming in the language? How do you measure that, and how do you weigh it? And not in the abstract, either, but in the specific context of Go as it exists today. Honest questions. I don’t know! But I do know it’s important to do, if you want a true reckoning of the calculus.
Of course, this could also have been caught and prevented by a unit test. Or an integration test. Or during code review. Or by using an architectural model that was a better fit to the problem. There are many tools available to software engineering teams to mitigate risks, and each tool carries its own costs and benefits as it is applied in different contexts. If the game is mitigating risk — which it is — then not everything needs to be solved at the language level.
—
I will never concede that two parameters that have a dependency on each other, as in a pair such as (socket type, address), are best represented by two arbitrary strings instead of an enumeration carrying a constrained value.
Sure, that sounds good to me, but it’s much more abstract than the specific claim made in the article, which is that net.Dial’s stringly-typed (network, address) parameters should become typed ones.
This would certainly convert a class of bugs that are currently runtime errors into compile time errors. But it would also make every use of net.Dial substantially more laborious. Is the benefit worth the cost? I don’t think the answer is obviously yes or no.
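For a rough sense of that cost, compare the one-line stringly-typed call with Go’s existing typed dial variants (net.ResolveTCPAddr and net.DialTCP are real standard-library APIs; the target address is arbitrary):

```go
package main

import (
	"log"
	"net"
)

func main() {
	// Stringly-typed: one call.
	c1, err := net.Dial("tcp", "golang.org:80")
	if err != nil {
		log.Fatal(err)
	}
	c1.Close()

	// Typed: resolve to a *net.TCPAddr first, then dial. The network
	// name is still a string, but the address and the returned
	// *net.TCPConn are typed.
	raddr, err := net.ResolveTCPAddr("tcp", "golang.org:80")
	if err != nil {
		log.Fatal(err)
	}
	c2, err := net.DialTCP("tcp", nil, raddr)
	if err != nil {
		log.Fatal(err)
	}
	c2.Close()
}
```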
I don’t think it’s possible to have a clear answer to this—but FWIW, my experience is that in ecosystems where unioned string literal types are deployed widely (notably in TypeScript), there is effectively no meaningful dissent about whether that solution is strictly better than naked-string-only typing, in every practical measurable dimension. It makes docs better, it makes tooling and autocomplete better, it seems to prevent bugs, people generally just seem to like using it, and there are no appreciable downsides in practice (compile time impact is negligible, no implications for generics, etc.).
I understand that in Go this does not quite pass muster, because Go’s methodology for evaluating language features implicitly (and significantly) discounts what most other languages would call “ergonomic” improvements. Other ecosystems are willing to rely on intuition for deciding what sorts of day-to-day activities are worth improving, and in those ecosystems, it is harder to argue that we should not make better type amenities around the most important data type in the field of CS (i.e., the string) for things that people do all the time (e.g., pass in “stringly-typed” arguments to functions), especially when there are no significant downsides (e.g., compilation speed, design subtlety, etc.).
You’re gonna pass arbitrary strings to that syscall?
To the (userspace) connection server, which decides how to best reach the address specified, and returns an appropriate error if the caller provides garbage – yes. Why not? Having one place that does all the work shared across the system makes it easy to have a single, consistent interface with one location to search for bugs.
There are, of course, a few asterisks: for example, the DNS resolver needs to be taught how to handle AAAA records, 6in4 tunnels need to know how to encapsulate, etc. – but the programs that need this knowledge are generally the programs that provide the userspace APIs.
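A Go sketch of that shape (Plan 9’s real dial strings look like tcp!host!port; the parser below is a simplified imitation, not the actual connection server):

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// dial is the single shared chokepoint: it interprets an opaque
// Plan 9-style address ("tcp!host!port", "unix!/path") and returns
// an error for garbage, so callers never parse addresses themselves.
func dial(addr string) (net.Conn, error) {
	parts := strings.Split(addr, "!")
	switch {
	case len(parts) == 3 && parts[0] == "tcp":
		return net.Dial("tcp", net.JoinHostPort(parts[1], parts[2]))
	case len(parts) == 2 && parts[0] == "unix":
		return net.Dial("unix", parts[1])
	default:
		return nil, fmt.Errorf("dial %q: unrecognized address", addr)
	}
}

func main() {
	if _, err := dial("window.alert()"); err != nil {
		fmt.Println(err) // garbage in, a clear error out
	}
}
```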
No. Create a constraint on the interface that ensures the caller isn’t an idiot.
Opacity is the strictest possible constraint on an interface. You may do nothing with the data, other than pass it on to something else. If the caller may not do anything, then the caller will not do anything stupid.
Ok? But it’s not at all opaque in these examples. The two parameters actually relate to each other…
Yes. That’s a poor choice in the Go API – it should have been a single opaque blob, instead of two dependent ones. Plan 9 does not make this mistake.
TL;DR: Don’t take API design advice from people who have ignored any kind of progress in language design since the ’60s.