C matched the programming metaphors in UNIX because both arose contemporaneously. That is why it was such a useful systems programming language, just as PL/I was for Multics.
If we evaluate systems programming languages against current needs inside the kernel/OS/utilities/libraries, won’t we come up short because the metaphors don’t match well anymore? Sure, we’ve stretched things with extensions to C, but that just makes C suffice for the need, and no one will attempt to match metaphors to any new language because that means shooting at TWO moving targets (the language and what it’s used for).
Perhaps the success of C/UNIX derivatives forestalls their necessary concurrent replacement?
After watching all the language wars play out, I started pushing the idea that a C replacement should keep C’s style and compatibility as much as possible: the same data types, calling conventions, automatic C FFI, extraction to C for its compilers, whatever. Maybe stay close on the syntax, as some competitors did. Then, on top of that, build something better without C’s shortcomings: perhaps a better module system, concurrency, macros, live coding a la Lisp/Smalltalk, and so on. It should leverage the experience of those converting, plug right into the ecosystem to get all its benefits, and allow incremental rewrites of legacy codebases.
Clay and Cyclone at least attempted something like this on the safety side. I found another one that did it on the productivity side, which I’m submitting Sunday. I don’t see many attempts, though. Most take people far away from the C model, only to then try to do something within a C-model system.
I’m keeping an eye on Zig.
D.
That’s a quite reasonable concern. I share it as well, because minimalism has a fundamental advantage in and of itself in composition.
I think what replaces all of C (and all of UNIX) will in like manner be such “composable minimalism”. But I’m not convinced that it will be “C like” or “UNIX like” at all, because those metaphors are an incomplete fit for this modern environment.
I greatly enjoyed working in Python for its clarity and focus on writing concise and understandable code, but with PEP 572 compelling van Rossum to step down as Python BDFL, one can see the limits of how far that can be taken. (He’s very emotional about what I regard as an “overreach”.)
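For context on that flashpoint: PEP 572 added assignment expressions (the `:=` “walrus” operator), which let you bind a name in the middle of an expression. A minimal before/after sketch:

```python
# PEP 572 assignment expressions: bind and use a value in one step.
import re

line = "error: code=42"

# Pre-PEP-572 style: a separate statement for the binding.
match = re.search(r"code=(\d+)", line)
if match:
    code = int(match.group(1))

# PEP 572 style: the binding happens inside the condition itself.
if (m := re.search(r"code=(\d+)", line)) is not None:
    code = int(m.group(1))
```

Opinions split on whether that kind of density clarifies or obscures, which is roughly the overreach debate in question.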
I’ve been wrestling with Rust, attempting to rewrite my earlier kernel work in it in place of C, and it does have definite advantages. However, unlike C and Python, too much is “lost in translation” - the code becomes obscure. It gets back to the famous “you are not expected to understand this” comment in UNIX V6, on the backwards coroutine context switch.
So again we are at a crossroads - we might need a new metaphor, but not have it in sight yet.
You want more low-level hardware “involvement”, but wish to have the logic become more densely abstract to deal with complexity. You want greater “stop on a dime” debugging, but also “obviousness” in exposition to avoid much need for awkward comments. I’ve been thinking about AR/ML means of doing augmented development as a way to bridge these, but we’ll see.
As far as low-level Rust goes, you might find this work interesting, given it’s about composing abstractions to deal with low-level stuff in embedded. The Tock people are also publishing interesting stuff. For now, I’m not sure if you were having problems due to the language itself, the abstractions you were using, or some combination. Rust programmers are still in the early exploration stage for that stuff.
There’s also the possibility, which I encourage, of using safe/proven C or assembly for unsafe stuff with safe Rust for the rest that can support better abstraction. By safe/proven, I’m talking types like Cyclone/Clay or automated solvers like Frama-C/SPARK. Even if manual, there’s probably only a small amount of code we’d need specialists for. If doing generic components with reuse, then even that might be reduced. To be clear, I’m just brainstorming here based on all the reuse I’m seeing in Why3-based, CompCert-based and Myreen et al-based work.
re AR/ML. I’ve been thinking of them for verification more than debugging. I just don’t have the depth required to know more past “attempt to apply the generic methods that work on everything to see what each accomplishes.” Monkey try thing, monkey see it work, monkey do more of that. That suggestion isn’t worth a Lobsters or IEEE submission, though. ;)
It always amazes me the amount of blithe ignorance that allows pirate businesses to take off. Spam started because the early IETF/IAC didn’t care about policy, viruses started because of uncritical Mac OS/Windows installation/isolation, and key leakage / weak passwords / uncritical AP & DSL NAT boxes allowed mass security attacks as well.
Compiled languages definitely get an advantage out of strong typing and concrete API statements, because planning resource utilization allows many execution strategies, data layouts, and even caching and code hoisting to be exploited. The more the desire to maximally “use” the hardware architecture, the more these grow to “fit” the hardware.
At the same time dynamic languages/JITs are getting better to fit the abstract expression of the programmer - functional programming can express compactly/accurately/clearly very elaborate programs, irrespective of the intermediate data types/APIs used in constructing them. The idea is to “fit” the nature of abstractions being manipulated rather than the nature of how they are executed.
I’m currently debugging a symbolic configuration mechanism that was prototyped in a week in a dynamic language, but is meant to function in an embedded OS with a very low-level language, as part of a bootstrap. It is taking months to finish, mostly due to adapting the code to work in such a programming environment - you alter the assemblage of primitives to build enough of a virtual machine to handle the semantics of the necessary symbol processing. An oddball case, but it’s an example of the two. (The virtue of this is that it allows enough “adaptability” at the low level that you don’t need to drag along the entire dynamic programming environment to serve huge amounts of low-level code that otherwise fits the compiled model perfectly.)
Compiled/interpreted and strongly/weakly typed have little to do with each other. Ditto for low/high level: Swift compiles to machine code but good luck maintaining any cache locality with its collections.
Depends on application. And yes we don’t have a good model for cache locality. How much can we get vs complexity to code/maintain?
Can you elaborate on the strengths of the dynamic language that allowed you to prototype it so quickly? The difference in development time stated here is really striking.
Sure. First, about the problem: “how do you configure unordered modules while discovering the graph of how they are connected?”. The problem requires multilevel introspection of constructed objects with “temporary” graph assignments in a multilevel discovery phase, then a successive top-down constructor phase with exception feedback.
The symbolic “middle layer” to support this was trivial to write in a language like Python using coroutines/iterators, and one could quickly refactor the topological exception-handling mechanism to deal with the corner cases, using annotation methods to handle them. So the problem didn’t “fight” the implementation.
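As a rough sketch of why this shape of problem is comfortable in Python (the module names and “constructor” below are hypothetical stand-ins, not the actual system), generators make the discovery-then-topological-construction pattern nearly trivial:

```python
# Hypothetical sketch: modules declare dependencies by name; a discovery
# pass has already produced the graph; construction then proceeds
# bottom-up, with per-module exception feedback.

def topo_order(deps):
    """Yield module names so dependencies always precede dependents."""
    seen = set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in deps.get(name, ()):
            yield from visit(dep)
        yield name

    for name in deps:
        yield from visit(name)

# Discovered graph (hypothetical): net needs log; app needs net and log.
deps = {"log": [], "net": ["log"], "app": ["net", "log"]}

constructed = []
for mod in topo_order(deps):
    try:
        constructed.append(mod)      # stand-in for the real constructor
    except Exception as exc:         # per-module exception feedback
        print(f"configuring {mod} failed: {exc}")
```

The point is that the graph walk, the lazy iteration, and the per-node error handling each map onto a single language feature, so refactoring corner cases touches very little code.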
Whereas with the lower-level compiled language, too much needed to be rewritten each time to deal with an artifact, so in effect the data types and internal API changed to compensate, to fit the low-level model. Also, it was too easy to introduce “new” boundary-condition errors each time, while the former’s more compact representation, which didn’t thrash so much, didn’t have this problem.
Sometimes with low level code, you almost need an expert system to maintain it.
Repetitive, irrelevant, and … pointless.
Much of the man page corpus is just plain wrong. Many changed the code and never bothered to change the documentation. One can easily get misled.
UNIX/POSIX … is getting massively “bit-rotted” in its old age. Time for different metaphors, possibly maintained by ML to keep them effective and relevant?
I run across examples semi-regularly, and try to report upstream when I find them (some upstreams are easier to report to than others). Mostly I’m pretty happy with manual pages, though.
Just recently, I noticed that pngcrush(1) on Debian is missing the newish -ow option. Upstream doesn’t ship a manpage at all, so Debian wrote one, but it doesn’t stay in sync automatically. Therefore I should probably report this to Debian. Upstream does ship a fairly complete pngcrush -v help text though, so I wonder if the manpage could be auto-synced with that.
I’m pretty sure I’ve reported a bunch of others in the past, but the only example that comes to mind at the moment is that privileges(5) on illumos used to be years out of date, missing some newer privileges, but it was fixed fairly quickly after I reported a bug.
I really want to see documentation generated via machine learning systems. I wouldn’t want to use that documentation for anything, but I’d like to see it.
This is an example of “low level” vs “high level” abstractions. One can climb around the path tree picking through files, or aggregate into directory-path “bushels”, or poke away at a project’s components via an IDL.
These choices seem to be set by how the particular activity was set up. Perhaps there are better ways to set up activities than optimizing the default ways of working them?
Maybe the effort should be spent in asking / answering the question - is there a better way to do this? Rather than just “doing this”?
Maybe. With the slowdown that KPTI incurs, it makes EPYC even more attractive.
Now whether AMD can fab enough to keep up with demand is another question.
Unfortunately AMD historically hasn’t had the management and the stockholder returns to take on Fortress Intel. So the Intel board hires weasel CEOs to exploit the situation. Ironically, the tech is more than good enough.
It already is. An across-the-board 30% hit is fairly common on cloud services. So the hit is worse than, say, Apple and its battery/clock-down issue, but clearly the Intel weasels think they can outlast it - what are you going to do, not buy more Intel?
Nicely written - it turns the resolution of a bug into a victory of both the discovery of the flaw and a deeper appreciation of the design of the language and its advantages for writing solid code.
Having written/debugged much C, a very dangerous language for memory structures, examples like this reach to the heart of how to step beyond the theory of using a language to the practice of using it, illustrating both the risks and rewards of a language that improves upon C.
Some people want easy access to the benefits of containerization such as: resource limits, network isolation, privsep, capabilities, etc. Docker is one system that makes that all relatively easy to configure, and utilize.
Docker is one system that makes me wish Solaris Zones had taken off; Zones had all of that, but without the VM.
Docker hasn’t used LXC on Linux in a while. It uses its own libcontainer which sets up the Linux namespaces and cgroups.
This is the correct answer. It’s a silly question. Docker has nothing to do with fat binaries. It’s all about creating containers for security purposes. That’s it. It’s about security. You can’t have security with a bunch of fat binaries unless you use a custom jail, and jails are complicated to configure. You have to do it manually for each one. Containers just work.
security
That is definitely not why I use it. I use it for managing many projects (go, python, php, rails, emberjs, etc) with many different dependencies. Docker makes managing all this in development very easy and organized.
I don’t use it thinking I’m getting any added security.
I don’t use it thinking I’m getting any added security.
The question was “Why would anyone choose Docker over fat binaries?”
You could use fat binaries of the AppImage variety to get the same, and probably better organization.
Maybe if AppImages could be automatically restricted with firejail-type stuff they would be equivalent. I just haven’t seen many developers making their apps that way. Containers let you deal with apps that don’t create AppImages.
Interesting. So in effect you wish to “scope” portions for “protected” or “limited” use in a “fat binary”, as opposed to the wide-open scope implicit in static linking?
So we have symbol resolution by simply satisfying an external reference, resolution by explicit dynamic binding (a dynamic-load call), or chains of these connected together? These are all the cases, right?
We’d get the static cases handled via the linker, and the dynamic cases through either the dynamic loading functions or possibly wrapping the mmap calls they use.
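The explicit dynamic-binding case can be sketched with Python’s ctypes, which wraps dlopen()/dlsym() on Unix-like systems; here libm’s sqrt stands in for any symbol resolved at run time rather than at static link time (a sketch, assuming a Unix-like host with libm available):

```python
# Explicit dynamic binding: resolve a library and a symbol at run time,
# rather than satisfying the external reference at static link time.
import ctypes
import ctypes.util

libm_path = ctypes.util.find_library("m")  # e.g. "libm.so.6" on Linux
libm = ctypes.CDLL(libm_path)              # dlopen() under the hood

libm.sqrt.restype = ctypes.c_double        # declare the C signature
libm.sqrt.argtypes = [ctypes.c_double]

root = libm.sqrt(9.0)                      # dlsym() lookup plus the call
```

The static case is the same call with the resolution work done once by the linker instead of at this point in the program.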
That sounds genuine.
So I get that it’s one place, already working, to put all the parts. I buy that.
So in this case, it’s not so much Docker for Docker’s sake as it is a means to an end. This answers my question well, thank you. Any arguments to the contrary? Please?
This answers my question well, thank you. Any arguments to the contrary with this? Please?
While I think @adamrt is genuine, I’m interested in seeing how it pans out over the long run. My (limited) experience with Docker has been:
I suspect the last point is going to lead to many “we have this thing that runs but we don’t know how to make it again, so just don’t touch it and let’s invest in not touching it” situations. People who are thoughtful and make conscious decisions will love containers. People inheriting someone’s lack of thoughtfulness are going to be miserable. But time will tell.
Well, these aren’t arguments to the contrary, but they are accurate issues with Docker that I can confirm as well. Thank you for detailing them.
I think there’s something more to it than that. On Solaris and SmartOS, you can have security/isolation with either approach. Individual binaries have privileges, or you can use Zones (a container technology). Isolating a fat binary using ppriv is if anything less complicated to configure than Zones. Yet people still use Zones…
I thought it was about better managing infrastructure. Docker itself runs on binary blobs of privileged or kernel code IIRC (I don’t use it). When I pointed out its TCB, most people talking about it on HN told me they really used it for the management and deployment benefits. There was also a slideshow a year or two ago showing security issues in lots of deployments.
What’s the current state in security versus VM’s on something like Xen or a separation kernel like LynxSecure or INTEGRITY-178B?
Correct. It is unclear how the compartmentalization aspect of containers specifically contributes to security.
I’ve implemented TCSEC Orange Book Class B2/B3 systems with labelling, and worked with Class A hardware systems that had provable security at the memory-cycle level. Even these had intrusion evaluations that didn’t close, but at least the models showed the bright line of where the actual value of security was delivered, as opposed to the loose, vague concept of security presented as a defense here.
FWIW, the actual objective of the framers of that security model was a program-verifiable, object-oriented programming model to limit information leakage - preventing programming environments from letting programs “leak” trusted information to untrusted channels.
You can embed crypto objects inside an executable container, and that would deliver a better security model without additional containers, because then you deal with the key-distribution issues without the additional leakage from the intervening intra-container references that would otherwise be necessary.
So again, I’m looking for where’s the beef, instead of the existing marketing buzz that makes people feel good/secure because they use the stuff that’s cool at the moment. I’m all ears for a good argument for all these things, I really am … but I’m not hearing it yet.
Thanks to Lobsters, I already met people that worked in capability companies such as that behind KeyKOS and E. Then, heard from one from SecureWare who had eye opening information. Now, someone that worked on the MLS systems I’ve been studying a long time. I wonder if it was SCOMP/STOP, GEMSOS, or LOCK since your memory cycle statement is ambiguous. I’m thinking STOP at least once since you said B3. Do send me an email to address in my profile as I rarely meet folks knowledgeable about high-assurance security period much less that worked on systems I’ve studied for a long time at a distance. I stay overloaded but I’ll try to squeeze some time in my schedule for those discussions esp on old versus current.
thought it was about better managing infrastructure.
I mean, yes, it does that as well, and you’re right, a lot of people use it just for that purpose.
However, you can also manage infrastructure quite well without containers by using something like Ansible to manage and deploy your services without overhead.
So what’s the benefit of Docker over that approach? Well… I think it’s security through isolation, and not much else.
Docker itself runs on binary blobs of privileged or kernel code IIRC (I don’t use it).
Yes, but that’s where capabilities kick in. In Docker you can run a process as root and still restrict its abilities.
Edit: if you’re referring to the dockerd daemon which runs as root, well, yes, that is a concern, and some people, like Jessie Frazelle, hack together stuff to get “rootless container” setups.
When I pointed out its TCB, most people talking about it on HN told me they really used it for management and deployment benefits. There was also a slideshow a year or two ago showing security issues in lots of deployments.
Like any security tool, there’s ways of misusing it / doing it wrong, I’m sure.
According to Jessie Frazelle, Linux containers are not designed to be secure: https://blog.jessfraz.com/post/containers-zones-jails-vms/
Secure container solutions existed long before Linux containers, such as Solaris Zones and FreeBSD Jails yet there wasn’t a container revolution.
If you believe @bcantrill, he claims that the container revolution is driven by developers being faster, not necessarily more secure.
According to Jessie Frazelle, Linux containers are not designed to be secure:
Out of context it sounds to me like you’re saying “containers are not secure”, which is not what Jessie was saying.
In context, to someone who read the entire post, it was more like, “Linux containers are not all-in-one solutions like FreeBSD jails, and because they consist of components that must be properly put together, it is possible that they can be put together incorrectly in an insecure manner.”
Oh sure, I agree with that.
Secure container solutions existed long before Linux containers, such as Solaris Zones and FreeBSD Jails yet there wasn’t a container revolution.
That has exactly nothing (?) to do with the conversation? Ask FreeBSD why people aren’t using it as much as linux, but leave that convo for a different thread.
That has exactly nothing (?) to do with the conversation?
I’m not sure how the security part has nothing to do with the conversation, since the comment this is responding to is you saying that security is the reason people use containers/Docker on Linux. I understood that as you implying that security was the game changer. My experience is that it has nothing to do with security; it’s about developer experience. I pointed to FreeBSD and Solaris as examples of technologies that had secure containers long ago, but they did not have a great developer story. So I think your belief that security is the driver for adoption is incorrect.
Yes. Agreed not to discuss more on this thread, … but … jails are both too powerful and not powerful enough at the same time.
Generally when you add complexity to any system, you decrease its scope of security, because you’ve increased the footprint that can be attacked.
I’m a big fan of having data in files, where you can look at their contents or replace or alter them in a pinch to solve a problem. Where the operating system can be at least somewhat aware of the structure of the data, so that we can observe and instrument an application with OS-level tools without needing to trust whatever debugging facilities are able to run inside the process itself. There are lots of occasions where I want to make use of facilities provided by another process alongside, e.g., cron or rsyslogd. Once you have at least two thoroughly different programs in the mix, the whole “fat binary” approach doesn’t really help anyway.
I really don’t buy the “everything must be in the one binary!” approach at all, regardless of how trendy it is in Go and Rust at the moment. If it works for you, I suppose that’s great – but some of us will continue along the less windswept and interesting path of incrementally improving the systems that already work very well for us and countless others.
I could build a FUSE-like filesystem into a “fat binary” and you could integrate it the same way, so I’d like to understand your objection better. Is it convenience, taste, … or operational need?
A long time ago I had to move a comprehensive system from OS/360 under MVS, and a variant running under Multics, … to an early UNIX system. There were tons of dependencies on different OS features/capabilities not then present on UNIX. Eventually I found that all of them were distractions, some quite costly, and I remedied that with a “fat binary”, because that was the only thing possible at the time.
The experience left me wary of arbitrary OS abstractions that in the end do not pass muster. I intentionally left shared libraries out of an OS I did, because they did not provide enough benefit for the complexity they added.
Why would I want to reinvent the file system that I already have, which works just fine, inside a program?
I understand that shared libraries are a minefield of nuance and are difficult to get right, which is why they often get left out of new projects (e.g., Go) in the early years. Even Go seems to be getting some sort of nascent shared library support, though: see plugins.
On an established system where we’re already able to use them to great effect, I really see no reason to stop. As with the file system: we’ve built it already, and it works, and we’ll keep improving it. I’m really not at all worried about some hypothetical future in which we’re all suddenly forced to throw out the baby, the bath water, and the bath itself.
Not to mention the security implications. If there’s a security problem in a library, you can update that library. For Rust/Go apps, you need to update the dependency and recompile and redistribute that application.
There was a researcher at Ruxcon years ago who was studying embedded libraries in C/C++ projects. An excellent example is Firefox, which doesn’t link to a lot of system libraries, but has its own embedded JPEG and PNG decoders. FF is kind of insane because it’s pretty much its own little operating system in a lot of ways.
It’s a tough balance to strike sometimes. If you’re trying to ship software that will run on lots of systems, but you need to depend on things which don’t really promise a stable interface, sometimes you have no choice but to bundle a private copy.
You misunderstand. No reinvention is required; one can redirect the kernel to perform the filesystem within an embedded object that is part of an existing container, namely the executable. And this was a “for example” to deal with your “data in files” call-out. Please don’t obsess on the leaves instead of the forest being discussed.
The direction being argued here is: why do we have so much crap in the OS/kernel in the first place? I understand that many just make use of what happens to be there; that’s fine, we all need to get shit done.
But shared objects create contention on multi-threaded, multi-core systems - they add complexity and reduce the benefits of parallelism and fault tolerance. So if one aspires to 100x cores/threads/parallelism … we don’t want to spend resources uselessly on abstractions that subtract for little or no gain.
So back to what I asked you: what are the objections to “everything in the existing container” other than “that’s not what I do right now”? I don’t find all the overhead of additional containers justified by that.
I don’t “misunderstand”, I just think you’re talking out of your hat.
What do you mean shared objects create contention? Once the initial relocation is performed, subsequent calls to functions are generally just calls. It’s not like there’s a mutex in the path of every call to a shared library.
Also, the idea that you can just “redirect the kernel” to store all your files inside an ELF object seems like a stretch at best. What if you need to append to or modify one of those files? Why take on the complexity of something like FUSE to paper over the deficiency of requiring everything to be jammed into the executable, when you could have otherwise just called fopen()?
It’s true that operating systems are larger and more complicated than they used to be, but that’s because they used to be bloody awful. They used to panic instead of failing a system call. They used to need to operate on machines that had a single processor and I/O bus, and a flat memory architecture. Machines now are hugely more complicated themselves, and modern operating systems reflect our best efforts to provide a substrate for a wide variety of applications.
I’d rather continue to improve on all of the work done already, where we already have some amazing tools and a lot of accumulated experience, rather than pretend that reinventing everything is a good engineering decision.
At work, I wrote a program using Lua. To simplify the installation of the program, I embedded all the Lua modules (both those written in C and those in Lua) into the executable. All I had to do then was extend the normal Lua method of loading modules to include looking inside the executable for them (not hard to do - Lua has hooks for doing just that). That way, we only have one file to install in production, instead of a dozen or so. I don’t need the ability to modify those files, so in this case, it works.
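The same single-file trick carries over to other dynamic languages. A sketch of the Python analogue - a custom import hook that resolves modules from data carried inside the program itself (here a plain dict stands in for the blob embedded in the executable):

```python
# Resolve imports from data embedded in the program: a meta-path finder
# that serves module source out of an in-memory table.
import importlib.abc
import importlib.util
import sys

EMBEDDED = {
    "greeting": "def hello():\n    return 'hi from embedded module'\n",
}

class EmbeddedFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, name, path=None, target=None):
        if name in EMBEDDED:
            return importlib.util.spec_from_loader(name, self)
        return None          # not ours; fall through to the normal path

    def create_module(self, spec):
        return None          # use the default module object

    def exec_module(self, module):
        exec(EMBEDDED[module.__name__], module.__dict__)

sys.meta_path.insert(0, EmbeddedFinder())

import greeting              # resolved from EMBEDDED, not the filesystem
message = greeting.hello()
```

Lua’s package.loaders hooks and this meta-path machinery are the same idea: intercept module lookup before it hits the filesystem.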
You got me about the shared libraries though.
Yup, do this with statically linked modules and an interpreter (not Lua but similar). Works great, for all the same reasons.
Do you understand the term “shared”? That means multiple processes/threads “share” it. As opposed to a “shared nothing” environment, which is entirely impermeable.
In a true shared-library system, the same library file is mmap’ed content in the MMUs of all using processes/threads. Sure, the data is copy-on-write, but all the support for the shared abstraction involves hardware/software to maintain it, which isn’t free. And yes, you can have re-entrant situations with libraries, like I/O and event handling, where you do need the code to anticipate these issues.
Many of the early multicore systems had tons of esoteric bugs of this sort, which is why we had global locks to “safe” the programming environment.
The kernel has virtual filesystem interfaces that express the semantics of a filesystem implementation, where you can redirect the functionality elsewhere (other hosts via RPC, user processes via library reflection/exceptions). With it, one can embed the content of a filesystem inside an executable container in various ways. And I didn’t say ELF, either. (Note that one can shortcut content transfer in various ways so it’s even faster than going through the kernel.)
Your argument presumes that something is being “papered over” - it’s actually more optimal, because there is less code for the common case of the subsystem being within the address space for the references needed by the microservice it is implementing. In net, far simpler than what is being done today.
Operating systems have a much larger scope of what they have to contend with; that’s why they all tend to “bit rot” - everyone wants to keep their bit of crap in them. Human nature.
I didn’t ask what you liked; I asked you to defend the need for something. You act as if I’m stealing your dog or something. I doubt you have a clue about anything I’m talking about, and just want to keep obfuscating the discussion so as to hide your lack of understanding of why you use containers, because … you just use them.
Fine. Use them. But you don’t know why, and you don’t want to do anything but distract from this fact by focusing on a red herring that was served up as a means to start a discussion. Die on that hill if you must, but you still aren’t responsive to my inquiry.
Hello, person relatively new to lobste.rs, just a comment on how you communicate:
In this thread you seem to make vague and poorly specified claims, like putting a FUSE filesystem in your binary, or the claim that because shared objects are named “shared” they must be shared, so things have to be more complicated and error-prone. I believe that your vague comments do not help the discussion. While I do not think @jclulow is doing himself a service by responding to you, I too assumed you meant an ELF binary with some crazy FUSE setup.
It’s really hard for me to tell if you have a good idea or are just being contrarian, given that you aren’t being more specific in your proposal. We’re left making assumptions about what you mean (which is admittedly our fault), but rather than clarify how our assumptions are incorrect, you seem to use them as a way to be smug. In particular:
Do you understand the term “shared”? That means multiple processes/threads “share” it. As opposed to a “shared nothing” environment, which is entirely impermeable.
and
Please don’t obsess on the leaves instead of the forest being discussed.
I think this discussion would be much more productive if you could specify what you are proposing. I’m quite interested in finding out.
Feel free to completely disregard this comment if you think it’s full of shit.
Even if there is less code running using this method, that does not make the code less potentially buggy. Any OS you’ll likely deploy on will have more tested and proven code than what you or your small team can produce.
Less dependence on third parties can be good, but only up to a certain point.
I believe, by the way, that Docker at least shares the filesystem layers between similar containers. I’m not sure how well that works, or whether the binaries are then also still shared in memory.
Why would I want to reinvent the file system that I already have, which works just fine, inside a program?
Why don’t we put sqlite into the kernel?
Invention is not the issue. You would use well known libraries in your program instead of developing stuff yourself.
I picked sqlite as an example because it considers itself a competitor to the open syscall and is thus related to file systems. Similarly, compression and encryption are file-system related. Why can’t my kernel treat zip files as directories? Instead, some GUIs reinvented that, while bash/zsh cannot do it.
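For what it’s worth, userland already gets partway there even where the kernel doesn’t: Python’s standard zipfile module treats an archive much like a small directory tree, with list and read operations standing in for ls and cat.

```python
# Build a tiny in-memory zip, then browse it like a directory tree.
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docs/readme.txt", "hello from inside the zip\n")
    zf.writestr("docs/notes.txt", "more notes\n")

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()                       # roughly: ls -R archive
    text = zf.read("docs/readme.txt").decode()  # roughly: cat a member
```

(The standard library’s zipimport goes further and lets Python import code straight out of an archive - but none of this helps bash or find, which is the point about kernel-level support.)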
Could you elaborate on what your response means? The proposed solution is to build a fat binary that includes a file system in it and to use FUSE to interact with this fat binary. What does that have to do with sqlite in the kernel? To me, the FUSE suggestion seems like a huge hack to get around using kernel primitives for no particularly good reason other than one can, as far as I can tell. I’m not even really sure what it would mean to start editing the binary given one’s usual expectations on binaries.
It is a balancing act, what should be put where. Sometimes it makes sense to put the file system into the binary. Sometimes to put the file system into the kernel. Sometimes to put the file system into a daemon (microkernels). For example I have heard that some databases run directly on block devices because using the file system is slower.
Jclulow is “a big fan of having data in files” because the default tools of the operating system can then be used to inspect and change the data. Pursuing that approach means we should extend the capabilities of the OS, for example by putting sqlite into the kernel. Then the default tools of the operating system can be used to inspect and change the data. I now think that zip file support is actually the better example. You could use ls, grep, find, and awk on zip file contents. It seems to be available via FUSE. It does not seem to be a popular option though. Why? Genuine question, but I guess there are good reasons.
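For what it’s worth, the zip-as-directory idea already exists in user space: Python’s stdlib `zipfile.Path` (3.8+) gives archive members a pathlib-like interface, roughly what a fuse-zip style mount exposes to ls/grep/find at the shell level. A minimal sketch:

```python
import io
import zipfile

# User-space "zip as a directory": zipfile.Path navigates an archive
# with a pathlib-like interface, without any kernel or FUSE support.

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docs/readme.txt", "hello from inside the zip")

root = zipfile.Path(zipfile.ZipFile(buf))   # reopen the archive for reading
docs = root / "docs"
names = [entry.name for entry in docs.iterdir()]
print(names)                                # ['readme.txt']
print((docs / "readme.txt").read_text())    # hello from inside the zip
```

The library-level version only helps programs written to use it, though; the shell tools still see one opaque file, which is the gap a FUSE mount would close.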
I do not consider the Unix philosophy that great, and it seems to be underlying this discussion (“Just use simple files!”). Unix likes to compose via files, pipes, and processes (leading to containers). Others prefer to compose via data types, functions, and libraries (leading to fat binaries). I do not see inherent advantages in either approach.
I don’t see the connection to the Unix philosophy here. In Windows it’s common for executables and their configurations to be separate as well. I’m trying to understand exactly what is being proposed but struggling. Are you saying that when I build nginx, the nginx executable has all of the resources it will use including config, and any code for dynamic content it will generate?
I think for microservices and small programs, fat binaries are arguably better (or at least no worse) than using a container with shared libraries, extra files, etc. But like @jclulow said, once you move past small programs/microservices, having it all shoved into a single binary will only cause pain. Especially if you have/need multiple programs for some reason - say syslog, linkerd, a watcher process to restart, or some other helper app - then suddenly containers start to be arguably better than fat binaries. Getting multiple programs into a binary is doable - things like busybox do it - but it definitely complicates things unnecessarily when there is little to no need. And if your program has lots of data files, a database, or other static data, shoving that into a binary starts to seem less than wise. We have a perfectly good OS and filesystem that works reliably.
Is Docker all that and a box of chocolates? Definitely not, but it has its use cases. Shoving Go (or another fat binary) into Docker makes little sense, of that I agree.
How specifically “does it cause pain”? How specifically are “containers arguably better”? Just because someone says it’s so? What happens when they say something different/conflicting?
Feels vs reals?
Using Python as a specific example, but other VM based languages tend to have similar pains in my experience.
Pain: Things like Python, for example, tend to not do well when shoved into fat binaries; other VM-based languages are the same. PyInstaller (the tool that shoves things into fat binaries), for example, still doesn’t support Python 3.6. Plus, my other examples are, I think, fairly specific - what part would you like more explanation on?
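As an aside, for pure-Python code (no C extensions) the stdlib’s `zipapp` module is a rough approximation of the fat-binary idea without PyInstaller; it still needs an interpreter on the target, so it sidesteps rather than solves the packaging pain described above. A sketch:

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp

# A stdlib approximation of a Python "fat binary": zipapp packs a
# source tree into one runnable .pyz archive. No C extensions, and the
# target still needs a Python interpreter - which is exactly where
# tools like PyInstaller take over (and where the pain starts).

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp, "app")
    src.mkdir()
    (src / "__main__.py").write_text('print("hello from one file")')
    target = pathlib.Path(tmp, "app.pyz")
    zipapp.create_archive(src, target)
    result = subprocess.run([sys.executable, str(target)],
                            capture_output=True, text=True)

print(result.stdout.strip())  # hello from one file
```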
Containers are arguably better because they are a lot easier to reason about and to get all dependencies together in. Yes, things like Virtualenv exist, but building C libs into virtualenvs is not the easiest; it’s much easier to use system package managers/libraries for C libraries and venvs for Python code. Or just use a container and shove all the various dependencies into it, so you contain everything the app needs while also being lazy and using system libraries, Python libs, etc. When using a container you don’t need to worry about the venv/C-library-building headaches - you can shove that responsibility off to your system package manager and still isolate one application from another. The alternative is something like Nix/Guix, but last I played with them they were not really ready for prime time.
I started my entire thing with ‘I think’, so clearly it’s opinion, not fact. If you decided to take it as fact, I worry about your reading comprehension. I’d also worry if you decided to take the linked article as fact.
As for different/conflicting views, I welcome them! It helps me learn and re-think my attitudes and opinions. How about you?
How specifically are containers arguably worse (which I assume is the position you are taking)?
Excellent response, thank you. (Python is also important to me, and I understand the difficulties in doing a “pip install” into a static binary without needing PyInstaller - I already have a nested filesystem. Consider this not to be a problem for this discussion.) Please explain further any of your other examples you decide needs greater scope than what I’ve described here, as I’d like to hear them.
(I’m beginning to think that it’s just poor support for doing useful things with static binaries that might be at the heart of creating new containers - one adds to entropy because it’s easier to do a “clean sheet” that way, without regard for messing with people’s dependence on the past.)
I’ve used virtual environments to encompass multiple development environments with limited effect. You’re right, C’s too messy to fit that model, although for pure Python development it’s good enough. Package managers always seem to be “work in progress”: things mostly work, but then you trip across something undone, underdone, or flat-out wrong, and you spend too much time having to debug someone else’s poorly documented code. Yes, I didn’t care much for Nix either. I guess the problem with all of these is that you have to rely on others to maintain what you’ll depend on, so if it isn’t closely related to your own tool base / shell utils, it’s just too much pain for too little gain. Is that about right?
Opinions aren’t bad; they just help more if there’s some collateral to justify them. I realize that takes effort, but I do appreciate it when you take the effort. (Also, it helps when it doesn’t bruise egos, as my remarks sometimes seem to do - that’s not what I’m after in contributing to this community by challenging opinions.)
Haven’t taken anything as fact from the linked article. Like many articles, it’s a bit conclusory and absurd, but it does “edge onto” an interesting area. (BTW I am no fan of Go and I think Rob Pike should have his head examined.) My “agenda” is more compact, less fragile, more obvious, deployable distributed Python applications where N > 10,000, and where I can change the OS/kernel to do this the best way without any involvement of anyone else. I like my stuff.
Thank you for your inclusive mindset; I share that aim, and I’d like to encourage your trust in your genuine expression. If what I’m speaking to doesn’t work for you, I’d just like to understand it better, because I’m sure what you’re after is what I’m after too, once I understand it. Don’t want to waste anyone’s time with noise.
Not taking the position that containers are arguably worse. Just pushing back with “wait, is this really doing what I want, what baggage is it bringing along, and why do I beat my head on this thing when I didn’t before?”. So just some casual skepticism, where I’m willing to explore other approaches industriously to check out a hypothesis. (Like building a filesystem into a static executable container just to see that I can do a pip install inside it.)
So when I make Docker containers, I find that they are difficult to bound with the content required by various packages. You’ll end up with things that mostly work, but the exceptions/omissions are often hard to find. If one builds a regression framework to prove a container’s scope of use/function/capability, one seems to spend as much time maintaining the regression framework as one does the container itself. (With the static binary, the issue becomes more the scope of path names; for that you can chroot/jail, catch the exception, and do a fixup.)
Docker containers thus get rebuilt a lot, which is overly complex compared to a static executable. Also, it’s easier to trace/profile a static executable to get a fine-grained map of where time/memory is used - with containers it’s much more hit/miss, and most of the containers I’ve inherited from others seem to contain many unused portions as well as obscure additions that are left in “just because they seem to be needed somehow”. These may be insignificant… but how do you then know the scope of what the container will do?
Then there are the sudden spikes in memory/storage usage that exceed the container’s size/resources. With a static binary, one can more easily backtrack memory allocations to find the determinism of “why?”.
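The “backtrack memory allocations” point can be illustrated inside a single Python process with the stdlib’s `tracemalloc`, which attributes each allocation to a file and line - the kind of fine-grained map that is harder to get across a container boundary. A sketch:

```python
import tracemalloc

# Attributing a memory spike to its source line: tracemalloc records
# the allocation site of every block, so a snapshot can answer "why?"
# with a file and line number instead of just a total.

tracemalloc.start()

spike = [bytes(1000) for _ in range(1000)]  # deliberate ~1 MB "spike"

snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics("lineno")[0]      # biggest allocator first
print(top)  # names this file and the line of the comprehension above
```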
Finally, when you want to change code to dynamically shift relocation addresses to foil injection attacks, it’s really simple to do so by relinking the static binary as a single, atomic operation. Doing such with Docker is fraught with surprises, as sometimes you discover dependencies within the libraries that are in part set off by the artifacts of how the libraries are dynamically linked i.e. ordering/assignment. Not to mention debugging this to find these surprises.
Hope this isn’t TL;DR.
Containers(i.e. docker) are very well defined, so I don’t see these issues you speak of. Perhaps they come hither and yonder when doing funky things like requiring external disks, etc.
Why try to push a static filesystem into a static binary when chroot and/or containers do that for you? That’s sort of the whole point.
As for memory usage of Docker, Docker does have a horrible case of no resource limits by default, but you can definitely force them on. Hashicorp Nomad for example does this by default with Docker containers. If Kubernetes/Marathon/etc don’t do this, that’s kind of sad.
There are of course surprises when dependency management hits you on the head, but if you limit yourself as much as possible to the OS level dependencies(i.e. dpkg/yum/rpm), especially for C level code, then you shoot yourself in the foot a lot less when dealing with these problems.
We haven’t really covered security here, and Docker IN THEORY gives you better security, but it’s sort of laughable now to claim that it actually does give you better security, especially with Docker set to default values. Jails and Zones definitely give you better security, and I’d like to think Docker will get there.. eventually, but I’m not holding my breath. It’s hard to bolt security on after the fact.
Shoving Go(or other fat binary) into Docker makes little sense, of that I agree.
There are use cases for this. There is a base Go docker image that you can pull into your CI for building/distributing your application. If you use some type of scheduling system (DC/OS with Marathon or Kubernetes), you can then easily cluster that Go app.
There are lots of different use cases for containers and they can be used to solve a lot of problems … and introduce new ones, like not having security update checks for libraries within the containers.
Using Docker to build Go apps makes some sense - you want your build environments to be well defined, and that’s something containers give you. If you are of the mindset to deploy everything via Marathon/Kubernetes, I could see a use case for hiding it in a Docker image just to make deployment easier. I’d argue that’s one of the good parts of Hashicorp Nomad: it supports exec as well as docker, so you can run anything, not only Docker containers.
You put your “fat binary” in Docker to control the resources that the process is allowed to make use of. Docker isn’t a replacement for binaries. The question here is akin to asking, “Why would anyone choose Docker over chroot?”
Because I can build a durable object that can be singularly managed instead of a collection that doesn’t have a well defined boundary.
That’s why I build something in a container: it is one thing to contend with. But I’ve also done this with static binaries, on systems w/o cgroups and the like. Is it a lack of cgroups/tools that currently causes static binaries to fail at the same thing?
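One pre-cgroups answer to bounding a single binary is plain POSIX rlimits. A hedged sketch (Linux-specific; `RLIMIT_AS` enforcement varies by platform, and `preexec_fn` is POSIX-only):

```python
import resource
import subprocess
import sys

# Pre-cgroups resource bounding with POSIX rlimits: the child's address
# space is capped before exec, so a runaway allocation fails inside the
# child rather than eating the host.

LIMIT = 256 * 1024 * 1024  # 256 MiB cap on the child's address space

def cap_memory():
    resource.setrlimit(resource.RLIMIT_AS, (LIMIT, LIMIT))

child = subprocess.run(
    [sys.executable, "-c", "x = bytearray(512 * 1024 * 1024)"],
    preexec_fn=cap_memory, capture_output=True, text=True)
print(child.returncode != 0)  # True on Linux: the allocation is refused
```

Rlimits bound one process tree but give none of the namespace isolation containers add, which is roughly the gap the question above is pointing at.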
It’s because for 10+ years we were told that dynamic linking is the only sensible way. But we need static linking because everything is so much easier that way. Docker is just a way to appease this dissonance, plus some features (you gotta have features!) on top.
Don’t buy this. Sun (among others) did shared libraries to allow the growth of libraries/APIs at a time of extreme memory constraints - the idea being that fewer redundant portions of memory would give more in-memory (not swapped) processes access to the same content - kind of a Hamburger Helper for VM.
At Tandem Computers, with shared-nothing clusters adapted to an SMP UNIX variant, the fragility of this arrangement was huge, as the rest of the industry found. It’s easy to end up with a mess of conflicting versions and nested dependencies that are hard to sort out, but that are immediately surfaced with static binaries - e.g. dynamic linking postpones too much of the semantics of all the bindings; at least the scope of the problem is smaller with static binaries. This didn’t help fault tolerance, but could be worked around.
In using Docker, one creates a “static world” for dynamic things to live within, so it helpfully bounds the problem. However, when you try to track down all the dependencies/library names/paths/configuration files … trying to make the smallest Docker containers with the most resilience, you end up with a larger NP-complete issue than before.
Now I know everyone just likes to make things work that they are comfortable with, so one accepts this as a norm. That’s how we get into “normalization of deviance” - it becomes the new normal as accepted.
Having been down this path with dozens of excellent-for-their-time OSes, they’re like tribbles eating grain: the excess bulk builds up until it kills them. Entropy, of a sort, enters, grows… and never leaves.
When I saw UNIX the first time, I couldn’t understand it at first because it was so dense - so few shibboleths, where I’d gotten used to them. It was too concentrated/useful, because they didn’t have any “space” to tolerate things that didn’t have a definite role. Where Multics’ many paths tried to be “all things to all people”, UNIX could only be “one thing to one person”. So I get the fear of taking away someone’s toy.
But if we are “staticfying” a container again, to bound it … why not simplify the contents, where we might more easily prove the deterministic scope/size/security of a smaller, simpler thing to begin with?
And also, to be fair … perhaps these additional mechanisms I’m challenging still have a benefit above the reductionism I’ve described. I’m open to that. If so, what is it, other than “I’m used to it, don’t want my world to change in any way, stop scaring me and making me feel small about the tools I already love”?
Sorry if I piss you off.
Docker (and common practices of its uses) mix several concerns: virtualization (including virtual network), dynamic linking, dependency management, versioning, build system, init-like process supervising. This results in complexity and bloat.
Many people really need only dependency management for dynamically linked libraries. Another alternative to static linking and fat binaries is Nix, which only manages dependencies (including external binaries like sed and awk mentioned in the post) and is like bundler and virtualenv (also mentioned in the post), but system-level. Unfortunately, it’s somewhat hard to use and has a strange configuration language.
Yes, I’ve noticed this as well. And it’s sensible to use a less cryptic tool. Mostly I’m interested not in use cases but in the underlying architecture, so I’m not a critic of Docker but an iconoclast over unnecessary complexity in an already too-conflicted computer systems architecture.
System library bindings or dependency bindings aren’t the only kind of application dependencies. Many (most?) applications also have system level dependencies, e.g. sendmail, specific versions of a language runtime, specific versions of imagemagick, organisation specific custom applications, legacy configuration details etc. It’s not clear to me what the author’s answer to this problem is.
He also glosses over what I consider to be one of Docker’s biggest advantages: simplifying development environment dependencies. Docker as a simpler and faster development environment tool is unmatched. After all, at some point you have to build that static binary from its constituent parts, most often in different environments (CI, development machines).
Specifically what dependencies? If I go and build say separate images of each of these as isolated intermediate files, then link them (sort of like busybox), what is the net benefit to be “simpler and faster” with Docker create?
Original UNIX was built around printing terminals. Bell Labs Columbus, later Bellcore, did a virtual terminal interface; Berkeley did the termcap library, TERM, and process suspend, because Western Electric postponed indefinitely.
It was a total screw-up that was never supposed to happen. One could detect the common cases and reflect this through libraries. Fully agree with andyc on the need to capture all of it in one go.
The problem is in proving all the interpreters - as a character based emulation of terminals mapped onto various existing terminal programs, they’re actually very small. Things like xterm grow a lot more due to extensive functionality beyond that of just the terminal. Such functionality might not map into all apps too well, so where to put it becomes an issue.
The architecture here reminds of datakit and the IBM 3270 terminal emulator - rather extensive. I’ve been thinking about a form of annotating apps/utilities that would enclose things like this as a general case.
In that case, the container for the process would match the terminal/other protocols and translate them into interactions with the container, mapping a library dynamically and modifying the window through IPC API calls instead of through the pty. That puts the burden on the terminal emulators of needing to be rewritten, and abstracts the terminal/protocol specifics into discrete libraries - which might themselves be derived from the termcap, which is also where you’d get (in the reverse direction) the way to autodetect which one it was …
The comparison to Datakit is indeed quite reasonable, and much more so for the state of the main project a few years ago (ok, 2007/2008 or so), as everything did run over pipes. I tried pretty much every form of IPC possible before going with shared memory due to all the possible pitfalls, but all in all, finding something that worked on OSX, Windows, and Linux with the least number of copies, little OS-specific code, few unique syscalls, and GPU-friendliness didn’t leave many options (isolating the ffmpeg libraries was my biggest priority at that stage).
Termcap was a total screw-up? Or the whole terminal thing in Unix? And what does Western Electric have to do with any of this?
A total screw-up in that it took place before open source, when the communities didn’t interact intentionally. So the partially completed abstractions didn’t “mix” - the virtual terminal didn’t adapt to termcap/terminfo at a time when the two could have reinforced each other, so you didn’t get a proper layering in the OS/kernel; instead termcap/terminfo got embellished to serve ad-hoc needs.
This isn’t about debugging or UNIX for that matter. It’s about “normalization of deviance”, that one gets used to an incomplete, unfinished tool/feature/system, come to rely on it, and then something doesn’t work and isn’t transparent about “why”.
Reminds of Ken Thompson’s “ed” editor’s sole error message - “?”. Because you should be able to figure out why on your own.
So he’s right that UNIX is filled with the issue from the beginning, but he’s a bit obscure about the connection to the debugger. gdb is no different from tons of similar debuggers I’ve used for decades on systems unrelated to UNIX as well, so don’t blame it.
If something refuses to function, there needs to be a means to say “hey, this part over here needs attention”. In this case, it’s like an uncaught exception, which is shorthand for “figure it out yourself”, i.e. back to Ken Thompson’s world as above.
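The contrast between ed’s “?” and “this part over here needs attention” can be made concrete: re-raise with context so the failure names the failing part. A toy sketch (the function and its message are illustrative, not from any real tool):

```python
# Toy contrast with ed's "?": wrap a low-level failure so the error
# names the failing part and suggests a next step, keeping the original
# cause chained for anyone who does want to dig.

def load_config(path):
    try:
        with open(path) as f:
            return f.read()
    except OSError as exc:
        # The "?" version would be a bare traceback. Instead, point at
        # the part that needs attention.
        raise RuntimeError(
            f"config unreadable: {path} (was it written by the deploy step?)"
        ) from exc

try:
    load_config("/no/such/config")
except RuntimeError as err:
    print(err)  # names the path and the likely fix, not just "?"
```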
These things happen because someone doesn’t have the time or interest to finish the job. All kinds of software are left similarly uncompleted, because it works well enough to get the immediate job done, and the rest is quickly forgotten. Hardware examples abound - the Pentium floating-point errors and similar variants.
I’m reminded of this presentation, aimed at language (in a very broad sense) designers about how to make better error messages.
Consider Python as a means to write good code. Perhaps it is difficult to decide what “writing good code” means in, say, a Dropbox environment versus other environments, and since one has obvious human limits it may be impossible to step back from that.
More to the point with PEP 572: perhaps it’s a threshold issue of how far to take a concept, and of the manner of “selling” it to a much broader audience of Python programmers given the pattern suggested. Reminds of those I’ve seen who cross to another discipline and then return later - they aren’t even aware of the perception changes that others notice, and the nuance others observe in their scope of behavior.