1. 6

    You might not need Docker, either? NixOS!

    1. 4

      On Atlassian Marketplace, we use both!

      1. 3

        I would like to read more about creating your own derivations. Preferably with some good examples of nontrivial stuff, such as adding static assets and configuration files.

        1. 4

          Not sure why static assets and configuration files are nontrivial? But sure, I really want to write about NixOS, tbh.

      1. 3

        I’ve gotta wonder how much more effective their NN is than the dirt simple idea of using the lower half of the binarized image and correcting to keep the white on either side roughly balanced…

        1. 1

          Agreed, but probably educational.

        1. 2

          Are you using boar, @ac? How does it compare against git-annex? I like the idea of it, but doesn’t look like the project has been worked on much recently.

          The git-annex pages do mention boar, but only briefly: https://git-annex.branchable.com/not/

          “git-annex is also not boar, although it shares many of its goals and characteristics. Boar implements its own version control system, rather than simply embracing and extending git. And while boar supports distributed clones of a repository, it does not support keeping different files in different clones of the same repository, which git-annex does, and is an important feature for large-scale archiving.”

          1. 2

            I was deciding between boar and git-annex. Boar feels much simpler, but git-annex has more features. My sense is that if you know git, git-annex might be better, but if you don’t know git, boar is simple and powerful.

          1. 3

            I really don’t understand the logic behind reimplementing bash. It seems like 21 months of effort to re-implement bash, and the real work of designing a better shell language isn’t started?

            Devil’s advocate, why is migrating old scripts to a hypothetical new language important? Why not have a way to compile a new shell into bash instead for deployment?

            1. 6

              Short answer: fish and zsh already exist, but they are generally used “on top of” bash. Linux distros and modern cloud systems all use bash:

              http://www.oilshell.org/blog/2018/01/28.html

              The goal is to replace bash, not just make a new shell.

              Another way to put it: a big motivation for Oil was digging around the guts of Debian and being horrified… :)


              Also, there’s a big list of other shells here: https://github.com/oilshell/oil/wiki/ExternalResources

              1. 2

                I fully agree with the mission, but I still don’t understand why replacing bash involves reimplementing it. I suppose it is like the C++ situation? bash++?

                1. 5

                  How else would you replace bash? Why didn’t any of the other shells I linked replace bash?

                  Compatibility is really important – nobody has time to rewrite thousands of lines of bash scripts (and not just time, but expertise). If Oil isn’t compatible, they’ll keep using bash out of inertia.

                  I myself have written thousands of lines of bash scripts, and I wouldn’t even port them into a new language wholesale! I would port them incrementally if there is enough benefit.

                  The Python 2 -> 3 transition should give an idea how much inertia a programming language has, even to minor changes. That is, it was probably 10x more inertia than the Python dev team thought.

                  1. 3

                    To me the point is that it enables people with pre-existing bash scripts to move to a better shell system. This completely removes any barriers for people to migrate from bash to Oil.

                2. 3

                  Although it wouldn’t lead to Oil, I considered new implementations for verifying correctness of configs and secure operation:

                  Formally-Verified Interpreter for a Shell-like Language with interesting comments by AndyC

                  SHILL: A Secure, Shell-Scripting Language

                  1. 1

                    Also, good question about compiling Oil to bash, answered here:

                    https://www.reddit.com/r/ProgrammingLanguages/comments/9i3mnd/does_anybody_know_of_a_language_that_compiles_to/e6l6z80/

                    It’s not out of the question, but I don’t think it’s as straightforward as it seems, or a clear win.

                  1. 1

                    Can you elaborate a bit more on what the impetus was for writing this tool?

                    Just curious as to why you needed to sort json out of core.

                    1. 2

                      I just pipe a lot of JSON data around for an analytics project I am working on. coreutils does sorting really well with multiple cores and temporary files, but the standard sort tool doesn’t play well with JSON.

                      My main use case right now is just sorting a huge dump of records by timestamp.
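
                      Without a dedicated tool, the usual workaround is a decorate/sort/undecorate pipeline. A rough sketch (records.json and the “timestamp” field are hypothetical placeholders, and it assumes ISO-8601 timestamps so lexical order is chronological):

                        # Prefix each record with its sort key, let coreutils sort do the
                        # out-of-core heavy lifting, then strip the key again.
                        jq -r '"\(.timestamp)\t\(tojson)"' records.json |
                          LC_ALL=C sort -k1,1 |
                          cut -f2- > sorted.json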

                      1. 2

                        Ah yep, that makes sense. I have used coreutils sort for similar reasons, to partition large datasets by key out-of-core.

                        A lot of the unix tools kick ass at data processing once you get things into the right (i.e. textual, whitespace-separated, newline-delimited) format.

                        Just watch out for collation issues (use LC_ALL=C), and other weird locale stuff, if you are depending on a particular order.
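
                        To make the collation point concrete: the same input can sort differently depending on locale, while forcing the C locale gives plain byte order.

                          printf 'a.txt\nB.txt\n' | sort            # order depends on $LANG / $LC_ALL
                          printf 'a.txt\nB.txt\n' | LC_ALL=C sort   # byte order: B.txt before a.txt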

                    1. 4

                        A Turin turambar turún’ ambartanen. Another shell that isn’t shell; shells that aren’t shells aren’t worth using, because shell’s value is its ubiquity. Still, interesting ideas.

                      This brought to you with no small apology to Tolkien.

                      1. 13

                        I’ve used the Fish shell daily for 3-4 years and find it very much worth using, even though it isn’t POSIX compatible. I think there’s great value in alternative shells, even if you’re limited in copy/pasting shell snippets.

                        1. 12

                          So it really depends on the nature of your work. If you’re an individual contributor, NEVER have to do devops type work or actually operate a production service, you can absolutely roll this way and enjoy your highly customized awesomely powerful alternative shell experience.

                          However, if you’re like me, and work in environments where being able to execute standardized runbooks is absolutely critical to getting the job done, running anything but bash is buying yourself a fairly steady diet of thankless, grinding, and ultimately pointless pain.

                          I’ve thought about running an alternative shell at home on my systems that are totally unconnected with work, but the cognitive dissonance of using anything other than bash keeps me from going that way even though I’d love to be using Xonsh by the amazing Anthony Scopatz :)

                          1. 5

                            I’d definitely say so – I’d probably use something else if I were an IC – and ICs should! ICs should be in the habit of trying lots of things, even stuff they don’t necessarily like.

                            I’m a big proponent of Design for Manufacturing, an idea I borrow from the widgety world of making actual things. The idea, as defined by an MFE I know, is that one should build things such that: “The design lends itself to being easily reproduced identically in a reliable, cost-effective manner.”

                              For a delivery-ops guy like me, working in the tightly regulated, safety-critical world of Healthcare, having reproducible, reliable architecture that’s cheap to replace and repair is critical. Adding a new shell doesn’t move the needle towards reproducibility, so its value has to come from reliability or cheapness, and once you add the fact that most architectures are not totally homogeneous, the cost goes up even more.

                            That’s the hill new shells have to climb, they have to get over ‘sh is just easier to use, it’s already there.’ That’s a very big hill.

                            1. 2

                              “The design lends itself to being easily reproduced identically in a reliable, cost-effective manner.” “That’s the hill new shells have to climb,”

                              Or, like with the similar problem posed by C compilers, they just provide a method to extract to whatever the legacy shell is for widespread, standard usage.

                              EDIT: Just read comment by @ac which suggested same thing. He beat me to it. :)

                              1. 2

                                  I’ve pondered about transpilers a bit before; for me personally, I’ve learned enough shell that it doesn’t really provide much benefit, but I like that idea a lot more than a distinct, non-compatible shell.

                                I very much prefer a two-way transpiler. Let me make my old code into new code, so I can run the new code everywhere and convert my existing stuff to the new thing, and let me go back to old code for the machines where I can’t afford to figure out how to get new thing working. That’s a really big ask though.

                                The way we solve this at $work is basically by writing lots of very small amounts of shell, orchestrated by another tool (ansible and Ansible Tower, in our case). This covers about 90% of the infrastructure, with the remaining bits being so old and crufty (and so resource-poor from an organization perspective) that bugs are often tolerated rather than fixed.

                            2. 4

                              The counter to alternative shells sounds more like a reason to develop and use alternative shells that coexist with a standard shell. Maybe even with some state synchronized so your playbooks don’t cause effects the preferred shell can’t see and vice versa. I think a shell like newlisp supporting a powerful language with metaprogramming sounds way better than bash. Likewise, one that supports automated checking that it’s working correctly in isolation and/or how it uses the environment. Also maybe something on isolation for security, high availability, or extraction to C for optimization.

                              There’s lots of possibilities. Needing to use stuff in a standard shell shouldn’t stop them. So, they should replace the standard shell somehow in a way that still lets it be used. I’m a GUI guy who’s been away from shell scripting for a long time. So, I can’t say if people can do this easily, already are, or whatever. I’m sure experts here can weigh in on that.

                            3. 7

                              I work primarily in devops/application architecture – having alternative shells is just a big ol’ no – tbh I’m trying to wean myself off bash 4 and onto pure sh because I have to deal with some pretty old machines for some of our legacy products. Alternative shells are cool, but don’t scale well. They also present increased attack surface for potential hackers to privesc through.

                              I’m also an odd case, I think shell is a pretty okay language, wart-y, sure, but not as bad as people make it out to be. It’s nice having a tool that I can rely on being everywhere.

                              1. 13

                                I work primarily in devops/application architecture

                                Alternative shells are cool, but don’t scale well.

                                Non-ubiquitous shells are a little harder to scale, but the cost should be controllable. It depends on what kind of devops you are doing:

                                • If you are dealing with a limited number of machines (machines whose names you probably pick yourself), you can simply install Elvish on each of those machines. The website offers static binaries ready to download, and Elvish is packaged in a lot of Linux distributions. It is going to be a very small part of the process of provisioning a new machine.

                                • If you are managing some kind of cluster, then you should already be doing most devops work via some kind of cluster management system (e.g. Kubernetes), instead of ssh’ing directly into the cluster nodes. Most of your job involves calling into some API of the cluster manager, from your local workstation. In this case, the number of Elvish instances you need to install is one: that on your workstation.

                                • If you are running some script in a cluster, then again, your cluster management system should already have a way of pulling in external dependencies - for instance, a Python installation to run Python apps. Elvish has static binaries, which is the easiest kind of external dependency to deal with.

                                Of course, these are ideal scenarios - maybe you are managing a cluster but it is painful to teach whatever cluster management system to pull in just a single static binary, or you are managing some old machines with an obscure CPU architecture that Elvish doesn’t even cross-compile to. However, those difficulties are by no means absolute, and when the benefit of using Elvish (or any other alternative shell) far outweighs the overheads, large-scale adoption is possible.

                                Remember that bash – or every shell other than the original Bourne shell – also started out as an “alternative shell”, and it still hasn’t reached 100% adoption, but that doesn’t prevent people from using it on their workstation, servers, or whatever computer they work with.

                                1. 4

                                  All good points. I operate on a couple different architectures at various scales (all relatively small, Xe3 or so). Most of the shell I write is traditional, POSIX-only Bourne shell, and that’s simply because it’s everywhere without any issue. I could certainly install fish or whatever, or even a standardized version of bash, but it’s an added dependency that only provides moderate convenience at the cost of another ansible script to maintain, and increased attack surface.

                                  The other issue is that ~1000 servers or so have very little in common with each other. About 300 of them support one application; that’s the biggest chunk: 4 environments of ~75 machines each, all more or less identical.

                                  The other 700 are a mish-mash of versions of different distros, different OSes, different everything; that’s where /bin/sh comes in handy. These are all legacy applications, none of them get any money for new work, they’re all total maintenance mode, and any time I spend on them is basically time lost from the business perspective. I definitely don’t want to knock alternative shells as a tool for an individual contributor, but it’s ultimately a much simpler problem for me to say “I’m just going to write sh” than “I’m going to install elvish across a gagillion arches and hope I don’t break anything”.

                                  We drive most cross-cutting work with ansible (that Xe3 is all vms, basically – not quite all, but like 98%); bash really comes in as a tool for debugging more than managing/maintaining. If there is an issue across the infra – say like meltdown/spectre – and I want to see what hosts are vulnerable, it’s really fast for me (and I have to emphasize – for me – I’ve been writing shell for a lot of years, so that tweaks things a lot) to whip up a shell script that’ll send a ping to Prometheus with a 1 or 0 as to whether it’s vulnerable, deploy that across the infra with ansible, and set a cronjob to run it. If I wanted to do that with elvish or w/e, I’d need to get it installed on that heterogeneous architecture, most of which my boss looks at as ‘why isn’t Joe working on something that makes us money.’
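
                                  Roughly the kind of thing I mean – a sketch, not the actual script (the Pushgateway address and metric name are hypothetical, and it assumes a kernel that exposes /sys/devices/system/cpu/vulnerabilities/):

                                    #!/bin/sh
                                    # Report Meltdown status as a 0/1 gauge; run from cron on each host.
                                    vuln=1
                                    if grep -Eq 'Not affected|Mitigation' /sys/devices/system/cpu/vulnerabilities/meltdown 2>/dev/null; then
                                      vuln=0
                                    fi
                                    printf 'meltdown_vulnerable %s\n' "$vuln" |
                                      curl -s --data-binary @- \
                                        "http://pushgateway.example.com:9091/metrics/job/vuln_check/instance/$(hostname)"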

                                  I definitely wouldn’t mind a better sh becoming the norm, and I don’t want to knock elvish, but from my perspective that ship has sailed till it ports: sh is ubiquitous, bash is functionally ubiquitous, and trying to get other stuff working is just a time sink. In 10 years, if elvish or fish or whatever is the most common thing, I’ll probably use that.

                                  1. 1

                                    The other 700 are a mish mash of versions of different distros, different OSes, different everything, that’s where /bin/sh comes in handy.

                                    So, essentially, whatever alternative is built needs to use cross-platform design or techniques to run on about anything. Maybe using cross-platform libraries that facilitate that. That or extraction in my other comment should address this problem, eh?

                                    Far as debugging, alternative shells would bring both a cost and potential benefits. The cost is unfamiliarity might make you less productive since it doesn’t leverage your long experience with existing shell. The potential benefits are features that make debugging a lot easier. They could even outweigh cost depending on how much time they save you. Learning cost might also be minimized if the new shell is based on a language you already know. Maybe actually uses it or a subset of it that’s still better than bash.

                                2. 6

                                  My only real beef with bash is its array syntax. Other than that, it’s pretty amazing actually, especially as compared with pre bash Bourne Shells.

                                  1. 4

                                    Would you use a better language that compiles to sh?

                                    1. 1

                                    Eh, maybe? Depends on your definition of ‘better.’ I don’t think bash or pure sh are all that bad, but I’ve also been using them for a very long time as a daily driver (I write more shell scripts than virtually anything else; ansible is maybe a close second), so I’m definitely not the target audience.

                                    I could see that if I wanted to do a bunch of math, I might need to use something else, but if I’m going to use something else, I’m probably jumping to a whole other language. Shell is in a weird place: if the complexity is high enough to need a transpiler, it’s probably high enough to warrant writing something else and installing dependencies.

                                      I could see a transpiler being interesting for raising that ceiling, but I don’t know how much value it’d bring.

                                3. 10

                                  Could not disagree more. POSIX shell is unpleasant to work with and crufty; my shell scripting went through the roof when I realized that: nearly every script I write is designed to be launched by myself; shebangs are a thing; therefore, the specific language that an executable file is written in is very, very often immaterial. I write all my shell scripts in es and I use them everywhere. Almost nothing in my system cares because they’re executable files with the path to their interpreter baked in.

                                  I am really pleased to see alternative non-POSIX shells popping up. In my experience and I suspect the experience of many, the bulk of the sort of scripting that can make someone’s everyday usage smoother need not look anything like bash.

                                  1. 5

                                    Truth; limiting yourself to POSIX sh is a sure way to write terribly verbose and slow scripts. I’d rather put everything into a “POSIX awk” that generates shell code for eval when necessary than ever be forced to write semi-complex pure sh scripts.

                                  bash is a godsend for so many reasons, one of the biggest being its process substitution feature.
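
                                  For anyone who hasn’t used it, process substitution lets a command’s output stand in for a file (file names below are just placeholders):

                                    # Diff two commands' outputs without temp files:
                                    diff <(sort a.txt) <(sort b.txt)

                                    # Keep a while-read loop in the current shell so its variables survive:
                                    count=0
                                    while read -r line; do
                                      count=$((count + 1))
                                    done < <(grep -v '^#' config.txt)
                                    echo "$count lines"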

                                    1. 1

                                      For my part, I agree – I try to generally write “Mostly sh compatible bash” – defaulting to sh-compatible stuff until performance or maintainability warrant using the other thing. Most of the time this works.

                                      The other mitigation is that I write lots of very small scripts and really push the worse-is-better / lots of small tools approach. Lots of the scripting pain can be mitigated by progressively combining small scripts that abstract over all the details and just do a simple, logical thing.

                                      One of the other things we do to mitigate the slowness problem is to design for asynchrony – almost all of the scripts I write are not time-sensitive and run as crons or ats or whatever. We kick ‘em out to the servers and wait the X hours/days/whatever for them to all phone home w/ data about what they did, work on other stuff in the meantime. It really makes it more comfortable to be sh compatible if you can just build things in a way such that you don’t care if it takes a long time.

                                      All that said, most of my job has been “How do we get rid of the pile of ancient servers over there and get our asses to a disposable infrastructure?”, where I can just expect bash 4+ to be available and not have to worry about sh compatibility.

                                    2. 1

                                      A fair cop. I work on a pretty heterogeneous group of machines, and /bin/sh works consistently on all of them – AIX, IRIX, BSD, Linux, all basically the same.

                                      Despite our (perfectly reasonable) disagreement, I am also generally happy to see new shells pop up. I think they have a nearly impossible task of ousting sh and bash, but it’s still nice to see people playing in my backyard.

                                    3. 6

                                      I don’t think you can disqualify a shell just because it’s not POSIX (or “the same”, or whatever your definition of “shell” is). The shell is a tool, and like all tools, its value depends on the nature of your work and how you decide to use it.

                                      I’ve been using Elvish for more than a year now. I don’t directly manage large numbers of systems by logging into them, but I do interact quite a bit with services through their APIs. Elvish’s native support for complex data structures, and the built-in ability to convert to/from JSON, makes it extremely easy to interact with them, and has allowed me to build very powerful toolkits for doing my work. Having a proper programming language in the shell is very handy for me.

                                      Also, Elvish’s interactive experience is very customizable and friendly. Not much that you cannot do with bash or zsh, but much cleaner/easier to set up.

                                      1. 4

                                        I’ve replied a bunch elsewhere; I don’t mean to necessarily disqualify the work – it definitely looks interesting for an individual contributor somewhere who doesn’t have to manage tools at scale. Once you have to interact with tools that don’t speak the JSON-y API it offers, etc., that’s where it starts to get tricky.

                                        I said elsewhere in thread, “That’s [the ubiquity of sh-alikes] the hill new shells have to climb, they have to get over ‘sh is just easier to use, it’s already there.’ That’s a very big hill.”

                                        I’d be much more interested if elvish were a superset of sh or bash. I think that part of the reason bash managed to work was that sh was embedded underneath; it was a drop-in replacement. If you’re a guy who, like me, uses a lot of shell to interact with systems, adding new features to that set is valuable; removing old ones is devastating. I’m really disqualifying (as much as I am) on that ground: not just that it’s not POSIX, but that it is less-than-POSIX with the same functionality. That keeps it out of my realm.

                                        Now this may be biased, but I think I’m the target audience in terms of adoption – you convince a guy like me that your shell is worth it, and I’m going to go drop it on my big pile of servers wherever I’m working. Convincing ICs who deal with their one machine gets you enough adoption to be a curiosity; convince a DevOps/Delivery guy and you get shoved out to every new machine I make, and suddenly you’ve got a lot of footprint that someone is going to have to deal with long after I’m gone and onto Johnny Appleshelling the thing at whatever poor schmuck hires me next.

                                        Here’s what I’d really like to see: a shell that offers some of these JSON features as an alternative pipe (maybe ||| is the operator, IDK), adds some better numbercrunching support, and maybe some OO features, all while remaining a superset of POSIX. That’d make the cost of using it very low, which would make it easy to justify adding to my VM building scripts. It’d make the value very high (not having to dip out to another tool to do some basic math would be fucking sweet), and having OO features so I could operate on real ‘shell objects’ and JSON to do easier IO would be really nice as well. Ultimately though you’re fighting uphill against a lot of adoption and a lot of known solutions to these problems (there are patterns for writing shell to be OOish, there’s awk for output processing; these are things which are unpleasant to learn, but once you do, the problem JSON solves drops to a pretty low priority).

                                        I’m really not trying to dismiss the work. Fixing POSIX shell is good work, it’s just not likely to be successful by replacing. Improving (like bash did) is a much better route, IMO.

                                      2. 2

                                        I’d say you’re half right. You’ll always need to use sh, or maybe bash, they’re unlikely to disappear anytime soon. However, why limit yourself to just sh when you’re working on your local machine? You could even take it a step further and ask why are you using curl locally when you could use something like HTTPie instead? Or any of the other “alternative tools” that make things easier, but are hard to justify installing everywhere. Just because a tool is ubiquitous does not mean it’s actually good, it just means that it’s good enough.

                                      I personally enjoy using Elvish on my local machines; it makes me faster and more efficient at getting things done. When I have to log into a remote system I’m forced to use bash, which is fine and totally functional, but there are a lot of stupid things that I hate. For the most ridiculous and trivial example, bash doesn’t actually save its history until the user logs out, unlike Elvish (or even IPython) which saves it after each input. While it’s a really minor thing, it’s really, really, really useful when you’re testing low-level hardware things that might force an unexpected reboot or power cycle on a server.
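
                                      (For what it’s worth, bash can be told to flush after every command – it just isn’t the default – with the usual snippet:)

                                        # In ~/.bashrc: append to the history file after each command
                                        shopt -s histappend
                                        PROMPT_COMMAND='history -a'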

                                        I can’t fault you if you want to stay POSIX, that’s a personal choice, but I don’t think it’s fair to write off something new just because there’s something old that works. With that mindset we’d still be smashing two rocks together and painting on cave walls.

                                      1. 12

                                        I don’t like tools that make using awk or cut harder.

                                        The output could be improved by not using the parentheses around the bytes, or by having a flag for that.

                                        1. 5

                                          Tools should have optional JSON output so I can use jq queries instead of awk or cut :P
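
                                          For example, iproute2 already does this with its -j flag (the jq field names below are from memory, so double-check against your system’s output):

                                            # Positional parsing of human-oriented output is brittle:
                                            ip addr show | awk '/inet /{print $2}'

                                            # With JSON output, the query names fields instead of counting columns:
                                            ip -j addr show | jq -r '.[].addr_info[] | select(.family == "inet") | .local'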

                                          1. 4

                                            I really like jq too :)

                                            1. 2

                                              https://github.com/Juniper/libxo

                                              This is integrated in most FreeBSD utilities.
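
                                              For example (flag spelling from memory, so check the man pages):

                                                # libxo-enabled FreeBSD utilities can emit JSON (or XML/HTML) directly:
                                                ps --libxo=json | jq .
                                                df --libxo=json,pretty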

                                              1. 3

                                                We should all switch to PowerShell, where values have types and structure, instead of switching to a different text format.

                                            1. 0

                                              Last time I checked, CompCert did not do any formal analysis comparing its behavior and the semantics of the C language. The formal verification stuff is all back end, i.e., code generation.

                                              More CompCert soap powder advertising.

                                              Using the CompCert proof ‘methodology’, I could translate Visual Basic to the CompCert low level code (from which machine code is generated) and claim to have a formally verified Visual Basic compiler (i.e., no connection to the formal semantics of any language is required).

                                              1. 2

                                                  I thought I told you or you told me that CompCert uses its own dialect of C that they formalized. It’s a subset with specific interpretations of anything with wiggle room. Most of their work centers on Clight as a starting point, and most efforts targeting CompCert target it. One could check the MISRA-C to Clight translation by eye and equivalence tests if concerned.

                                                I do think they should be more clear on exactly what they support at various times.

                                                1. 1

                                                    Incorrect: a subset is not a dialect. CompCert implements a very large subset…

                                                  1. 1

                                                    It’s possible my wording is wrong. I’m not sure how different something has to be from the original before it qualifies as its own version or dialect. Usually, once a non-standard change is introduced, people often refer to it as something different sometimes using “dialect.”

                                                    CompCert C has never been C itself. It was always its own easier-to-verify thing. For example, its memory model has evolved over time. These verifiable subsets of C not only have a subset of functionality: they expect it to be used in specific ways/semantics that facilitate their analyses and algorithms.

                                                    Don’t these subsets that have specific, unusual interpretations and usages become new dialects? Or is my own interpretation of what a dialect is off?

                                                      Note: that was the first link I could find about this on DDG. The original was on Lobsters, but its search isn’t giving it up.

                                                    1. 2

                                                        The CompCert memory model is shared with C, from what you linked. Also, from the CompCert front page:

                                                        The main result of the project is the CompCert C verified compiler, a high-assurance compiler for almost all of the C language (ISO C99), generating efficient code for the PowerPC, ARM, RISC-V and x86 processors.

                                                        A dialect is different; a subset is a subset. Also see the words ‘almost all’.

                                                      1. 2

                                                        There is a bit more information available here.

                                                        CompCert C supports all of ISO C 99, with the following exceptions:

                                                          - switch statements must be structured as in MISRA-C; unstructured “switch”, as in Duff’s device, is not supported.

                                                          - longjmp and setjmp are not guaranteed to work.

                                                          - Variable-length array types are not supported.

                                                        There is a specific mention that they cover all of MISRA C 2004, which is already quite good considering that this tool might be especially interesting for industry application in safety-critical context, like in the linked paper.

                                                        1. 1

                                                          I could be recalling incorrectly what made it a dialect or something about C which I’m far from an expert on. I’ll say subset for now unless I can cite a difference deserving the word dialect.

                                                    2. 1

                                                      Clight looks like it might be a very small subset of C (I have not checked). It might also qualify as a very small subset of Java and perhaps other languages.

                                                        Since Clight is defined by its creators, whatever it does is, by default, correct. So what exactly does a proof of the correctness of Clight mean?

                                                      1. 2

                                                        The proofs in CompCert typically mean:

                                                        1. The specs possess a specific property.

                                                        2. The implementation does what the spec says on all inputs.

                                                          In CompCert’s case, they mainly want to specify the meaning of the languages, the operations (esp. transformations) they perform on them, and prove they have equal meaning after those operations. The Csmith testing empirically confirmed high quality for at least those types of tests. It found a few spec errors, while non-verified compilers had piles of failures.

                                                          Personally, I’d like to see a testing suite of every feature, common combinations (esp. w/ nesting and aliasing) and random combos. Compare how each compiler handles programs generated from those templates.

                                                    3. 1

                                                        You can formally verify C code with things like Frama-C or VeriFast… But once you have verified code, you need a verified compiler to convert it to assembly; that is where CompCert comes in. If you compile verified code with some buggy compiler, it might do you little good.

                                                      1. 1

                                                          Frama-C and VeriFast are essentially static checkers of source code. They have no means of formally verifying C source against the requirements contained in the C Standard.

                                                        Where is the proof showing that CompCert implements the semantics specified in the C Standard? There could be a fault in CompCert’s handling of C semantics (just like in other compilers) which causes it to generate incorrect code.

                                                        1. 1

                                                            The Frama-C WP plugin absolutely allows you to write formal specifications of your code, and checks that the C code matches that specification in relation to the C standard. https://frama-c.com/wp.html

                                                          The whole point of compcert is that it is a formally verified translator of the C standard to assembly… Obviously there could be bugs in the compcert proofs, but they are probably less likely than many bugs in other compilers.

                                                          Where is the proof showing that CompCert implements the semantics specified in the C Standard?

                                                          The proofs are in the git repository…

                                                          1. 1

                                                              I don’t see anything in the Frama-C link that makes any claim about “relation to the C Standard”. In fact, the word ‘standard’ does not appear in the link.

                                                              In Frama-C, you write your own specifications and the tool checks the source code against what you have written. No connection to the C standard provided or required.

                                                            Please provide a link to the files involving proofs against the C Standard, I cannot find them.

                                                            1. 1

                                                                What language do you think you are checking against? ACSL (what WP checks the C code against) stands for ANSI/ISO C Specification Language.

                                                              What language do you think it checks against if not standard C? Here is the wiki page - https://en.wikipedia.org/wiki/ANSI/ISO_C_Specification_Language

                                                              Are you claiming they invented their own language that is not C to check the spec against?

                                                              1. 1

                                                                  The ANSI C part just comes from the fact that it targets ANSI C. ACSL itself is just a language for formally specifying behavioral properties of code written in ANSI C. The book is here with tons of examples.

                                                                  The only one I know of that can check C code against its specification is KCC, which RV-Match is built on. That’s because it’s pretty much the only complete, open specification of C that can catch undefined behavior and be used as a push-button tool (if their docs are honest). It also passed the parts of the GCC torture test that their semantics support, and RV-Match is getting good results on benchmarks. So, you could say it’s the only formal semantics with significant, empirical evidence on top of the claims in its formal specs/proofs.

                                                    1. 1

                                                      Sucks that I can only watch this in 480p, the server seems to be limited to streaming at 275kB/s…

                                                      1. 4

                                                        Yeah, if only there were some sort of way to download from more than one server… some sort of peer to peer system. :)

                                                        1. 3

                                                          The problem with PeerTube is that it’s hard to permanently seed. You have to remember to manually find the torrent link and open it in webtorrent. As such most videos on PeerTube that I’ve seen only stream from the server.

                                                          1. 2

                                                              The WebRTC-TCP barrier seems to be a problem for a truly decentralized Web.

                                                      1. 13

                                                        Quoting the author from elsewhere (one year ago):

                                                        “ I’m the original author of pixie, and yeah, I’m a bit surprised to see this hit HN today. It should be mentioned that I put about a year of work into this language, and then moved on about a year or so ago. One of the biggest reasons for my doing so is that I accomplished what I was looking for: a fast lisp that favored immutability and was built on the RPython toolchain (same as PyPy). But in the end the lack of supporting libraries and ecosystem became a battle I no longer wanted to fight.

                                                        Another goal I had was to see how far I could push immutability into the JIT. I learned a lot along the way, but it turns out that the RPython JITs aren’t really that happy with VMs that are 99.99% pure. At one point I had a almost 100% immutable VM running for Pixie…as in each instruction executed created a new instance of the VM. It worked, but the JIT generated by RPython wasn’t exactly happy with that execution model. There was so much noise in the maintenance of the immutable structures that the JIT couldn’t figure out how to remove them all, and even when it could the JIT pauses were too high.

                                                        So anyways, after pouring 4 hours a day of my spare time into Pixie for a full year, I needed to move on.

                                                        Some other developers have commit rights and have pushed it along a bit, but I think it’s somewhat a language looking for usecase.

                                                        And these days ClojureScript on Node.js could probably be made to handle most peoples needs. “

                                                        1. 3

                                                            This deserves more traction. I believe there is room for another Lisp, other than the various Scheme(s) and Common Lisp. I do like Clojure, but I don’t like being locked in to the Oracle platform. I would also like to see something that is more community-driven, like Python. But enough of my wishes.

                                                            I think the idea is very nice: implementing a Clojure dialect on top of RPython. Part of me wonders why this does not happen more often. PyPy/RPython seems like a mature framework for JITs for dynamic languages.

                                                          1. 3

                                                            I know you mentioned liking to see another Lisp other than various Schemes, but Pycket is also an implementation of Racket on RPython/PyPy. I agree with you, would love to see more happening in this area.

                                                            1. 2

                                                                I followed pixie when it started. The author is the reason it didn’t grow, IMO; he just stopped caring and never dogfooded it.

                                                              1. 3

                                                                “Latest commit by heyLu about 1 year ago” says the GitHub page. However I can see recent commits in the log. Using RPython is certainly smart, but I would really like to see some benchmarks, especially when Clojure can run on GraalVM.

                                                            1. 1

                                                              Neat, I guess the next step is running the whole of plan9?

                                                              1. 1

                                                                I think HarveyOS is working on this, no?

                                                                1. 4

                                                                    In my opinion, and with limited experience, HarveyOS’ team is attempting to port gcc and(?) llvm/clang to Plan 9. I don’t see the value in doing that other than importing lots of bloat into a relatively clean base system.

                                                                  1. 1

                                                                    Ah, good point. I forgot they were doing that.

                                                                    1. 1

                                                                      One man’s bloat is another man’s livelihood.

                                                                1. 5

                                                                    Oh, I really want to use Sway and Wayland on NixOS, but keep having issues :( I try every release.

                                                                  1. 3

                                                                    By the way, a Sway 1.0 beta derivation has been added to nixpkgs two days ago:

                                                                    https://github.com/NixOS/nixpkgs/commit/d8d8a0ba14a7be1aa36bcbdf66b7f2cd9df9d3ff

                                                                    Tried it yesterday, works with amdgpu by just running sway from a tty.

                                                                    1. 3

                                                                      Yikes.

                                                                      Package maintainers: please do not package this release.

                                                                      1. 3

                                                                        The person who made this derivation asked @SirCmpwn for permission. He was ok with it, as long as it was clear that it packages a beta version of Sway. The derivation is called sway-beta.

                                                                        You also need unstable nixpkgs. So it says ‘expect problems’ on the tin twice ;).

                                                                        1. 1

                                                                          How else you gonna try it out on NixOS? We have to package things.

                                                                          1. 1

                                                                            Same way as anyone else? ./configure && make && make install? Or is that not supported with nix?

                                                                            1. 3

                                                                              Not the make install part.

                                                                              1. 2

                                                                                When installing an application is a simple ./configure && make && make install, writing a Nix derivation is trivial and has a lot of benefits:

                                                                                • You don’t have to install the dependencies in your system or user environment.
                                                                                • You don’t have to install the program in your system or user environment.
                                                                                • The build is sandboxed (when sandboxing is enabled).
                                                                                • You can manage the resulting package with the usual nix tools.
                                                                                • Easy to reproduce the build across systems.

                                                                                The sway-beta derivation mostly consists of metadata, package dependencies, and the GitHub username/repo/revision/SHA. But you’d have to figure out the package names of dependencies in your distribution anyway, even when doing the configure/make dance.

                                                                          2. 1

                                                                            Thanks, I use the amdgpu driver; I will give it a go.

                                                                          3. 2

                                                                            I really want to use sway and wayland on nixos, but keep having issues :( I try every release.

                                                                            I’m also interested. Did you ask for help on the nix forum?

                                                                            1. 2

                                                                              I was having issues with my graphics drivers and Wayland; can’t remember the exact problem off the top of my head. Don’t think I asked anybody other than Google for help last time.

                                                                            2. 1

                                                                              I used NixOS to build a VM with Wayland and Sway a long time ago. I was able to use it! Hopefully you can solve your graphics problems.

                                                                            1. 2

                                                                              I dunno, I dislike this idea. It seems you are taking something like Rust, which has a strength of being ‘close to the metal’, then putting so many layers in front of it that you probably lose all the benefits anyway (small start time, low footprint, performance…).

                                                                              1. 2

                                                                                Various things throughout the blog posts linked make me nervous. Using HMAC as a random oracle isn’t ideal, but probably not immediately problematic. HMAC then Encrypt makes me very nervous.

                                                                                Taking a step back: you have to assume the encrypting party is not compromised at time of backup, or else confidentiality is trivially lost. Why not then symmetrically encrypt with authenticated encryption, public-key encrypt the data encryption key and send it along, then locally delete the symmetric key?

                                                                                Deduplication leads to potential side-channel attacks on backup confidentiality in a model which assumes compromised backup hosts.

                                                                                Randomized encryption is critical for confidentiality: don’t use textbook RSA.

                                                                                Obviously don’t put anything important into a homebrewed cryptosystem.

                                                                                1. 1

                                                                                  HMAC then Encrypt makes me very nervous.

                                                                                  If there is evidence for that, explain or cite. HMAC is designed specifically for cases like this.

                                                                                  Doing no encryption, or having no access controls, or plain SHA deduplication all make me far more nervous than HMAC then encrypt. Let me know what you use so that you are not nervous? I don’t claim this is perfect, just that I think it is an improvement over what I have seen.

                                                                                  Taking a step back: you have to assume the encrypting party is not compromised at time of backup, or else confidentiality is trivially lost. Why not then symmetrically encrypt with authenticated encryption, public-key encrypt the data encryption key and send it along, then locally delete the symmetric key?

                                                                                  This is essentially what nacl secret box does, which is what packnback uses. Not sure where the criticism is here.

                                                                                  Obviously don’t put anything important into a homebrewed cryptosystem.

                                                                                  NaCl crypto_box is not homebrewed. It was designed by a respected cryptographer. I also didn’t design HMAC or SHA-256. Explain to me which part you consider homebrewed?

                                                                                  Deduplication leads to potential side-channel attacks on backup confidentiality in a model which assumes compromised backup hosts.

                                                                                  By this logic we should just do nothing. This design follows a tiered approach to security, and potential side channels AFTER the storage host is compromised are still better than most systems we have today.

                                                                                  1. 1

                                                                                    MAC then encrypt is one method of combining authenticity and confidentiality, but while it isn’t as broken as MAC-and-encrypt (which doesn’t necessarily satisfy IND-CPA) it is potentially problematic for attacks which are mitigated by IND-CCA2 security such as padding oracles.

                                                                                    Basically, I’m nervous because they’re using an HMAC sort of as a random oracle and not for message authenticity, but they might be using it for authenticity, in which case encrypt-then-MAC would be preferable.

                                                                                    NaCl is very likely good software. I’m glad they’re using it- all I meant was that even good parts can be put together in potentially vulnerable ways, and that’s particularly easy to do with crypto.

                                                                                    To your last point, I strongly disagree that my statement implies inaction. I simply meant to point out a class of attacks that need to be carefully avoided when implementing a deduplicating backup system.

                                                                                    Reasonable explanations for anyone interested: https://en.wikipedia.org/wiki/Authenticated_encryption https://en.wikipedia.org/wiki/Chosen-plaintext_attack https://en.wikipedia.org/wiki/Adaptive_chosen-ciphertext_attack

                                                                                    1. 1

                                                                                      Reasonable explanations for any interested https://en.wikipedia.org/wiki/Authenticated_encryption https://en.wikipedia.org/wiki/Chosen-plaintext_attack https://en.wikipedia.org/wiki/Adaptive_chosen-ciphertext_attack

                                                                                      Can you explain to me how a chosen-plaintext attack would work in this scenario? As was mentioned, it is essentially the same idea as GPG-encrypted email. Are you suggesting the ability to send encrypted emails to someone is the same as a chosen-plaintext attack? Because that seems very wrong to me.

                                                                                      Sending arbitrary encrypted backups does not give you access to the time and place the administrator accesses your poisoned backups, so I’m not sure what you are suggesting. Obviously it is up to the administrator to detect the time of compromise and discard backups after that point in time, but no system can avoid that issue.

                                                                                      Basically, nervous because they’re using an HMAC sort of as a random oracle, and not for message authenticity, but might be using it for authenticity, in which case encrypt-then-mac would be preferable.

                                                                                      So in encrypt-then-MAC, how do you ensure a stable nonce so that deduplication can function? The only approach I have seen is to derive the nonce from the plaintext, in which case the nonce is now what leaks bits. HMAC seems far preferable to that.

                                                                                      I strongly disagree that my statement implies inaction. I simply meant to point out a class of attacks that need to be carefully avoided when implementing a deduplicating backup system.

                                                                                      You implied it was a ‘home brewed’ and untrustworthy set of ideas… then you offered no alternative.

                                                                                      1. 1

Deterministic encryption (such as textbook RSA) is vulnerable to chosen-plaintext attacks which compromise indistinguishability from randomness, and with potentially known plaintext file contents might compromise message confidentiality. A backup server might be able to run this kind of attack; it can be thought of as an active MitM adversary between the encrypting machine and the decrypting machine.

Now, using NaCl or similar means that it’s unlikely that the encryption is deterministic, which is great. The next class of attacks I’d be worried about are of the Vaudenay/Bleichenbacher style, which are mitigated through authenticated encryption, or something like OAEP; hopefully this is used in the implementation. The argument “we don’t need CCA2 security because there isn’t a decryption oracle” (not that you’re making that argument) falls through in too many historical cases where a decryption oracle is later found, and thus I’d strongly lean toward AE just as a best practice.

Regarding encrypt-then-MAC vs MAC-then-encrypt, and HMAC: I’m not sure what the best option for a stable nonce is. My comment was simply to point out that MAC-then-encrypt doesn’t have the property of ciphertext integrity, which the implementation may want/rely on, and to be careful to avoid relying on it for anything the construction does not necessarily provide. Perhaps something involving the hashing of authenticated-encryption ciphertexts would solve this problem, or something similar (intuitively I think there’s probably something simpler than HMACing the plaintext), but I don’t want to claim anything without in-depth analysis.

Regarding offering an alternative: raising a potential concern comes with no obligation to offer a solution. Again: in-depth analysis that I don’t necessarily consider myself qualified to perform would be required to ensure that this system is secure, and I wouldn’t want to suggest an alternative or definitively claim security or insecurity without such analysis.

What concerns me is that implementation seems to have started without thorough exploration of these threat models… unless this project is simply for educational purposes, in which case, hooray, go for it, but make that clear in your documentation. But if this will ever be recommended to anyone as an actual tool for securing important backups, it would need thorough consideration of the above, and likely many more, subtler potential issues.

                                                                                        1. 2

                                                                                          Vaudenay/Bleichenbacher style … which are mitigated through authenticated encryption.

Yes, there is another layer of authenticated encryption used outside of the dedup key. This is provided by NaCl. The encryption is a combination of Salsa20 and Poly1305 with random nonces. My current implementation follows all guidelines from NaCl.

As far as I can see, the main complaint you have is that there is an HMAC of the plaintext alongside the ciphertext. This is an open question that warrants more investigation, but as things currently stand, HMAC is not reversible, and NaCl with a random nonce is secure even if the plaintext is known. In packnback’s case the plaintext may theoretically be hinted at by sophisticated attacks that don’t exist yet, which may require breaking HMAC-SHA256 or Salsa20 or the interactions between the two. The day that happens is the day the project becomes packnback2.
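
To make the layering concrete, here is a rough Go sketch using golang.org/x/crypto/nacl/secretbox. This is illustrative only, not the actual packnback format; the key handling and names are made up.

package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"fmt"

	"golang.org/x/crypto/nacl/secretbox"
)

// dedupKey derives a deterministic identifier for a chunk as an HMAC of the
// plaintext under a secret key, so identical chunks can be recognised without
// revealing their contents to anyone who lacks that key.
func dedupKey(macKey, chunk []byte) []byte {
	m := hmac.New(sha256.New, macKey)
	m.Write(chunk)
	return m.Sum(nil)
}

// sealChunk encrypts the chunk with NaCl secretbox (XSalsa20-Poly1305) under a
// fresh random nonce; the nonce is prepended to the returned ciphertext.
func sealChunk(encKey *[32]byte, chunk []byte) ([]byte, error) {
	var nonce [24]byte
	if _, err := rand.Read(nonce[:]); err != nil {
		return nil, err
	}
	return secretbox.Seal(nonce[:], chunk, &nonce, encKey), nil
}

func main() {
	// Illustrative keys only; real keys would come from a keyfile/KDF.
	macKey := make([]byte, 32)
	var encKey [32]byte
	if _, err := rand.Read(macKey); err != nil {
		panic(err)
	}
	if _, err := rand.Read(encKey[:]); err != nil {
		panic(err)
	}

	chunk := []byte("backup chunk contents")
	fmt.Printf("dedup key:  %x\n", dedupKey(macKey, chunk))
	ct, err := sealChunk(&encKey, chunk)
	if err != nil {
		panic(err)
	}
	fmt.Printf("ciphertext: %d bytes (nonce + box)\n", len(ct))
}

The dedup key stays stable across runs because it depends only on the plaintext and the secret MAC key, while the ciphertext changes every time thanks to the random nonce.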

There are no chosen-ciphertext or adaptive chosen-ciphertext attacks that I can see either, because encrypted data is written one way by design; there is no feedback to an attacker.

The storage provider itself is in a semi-privileged position: if a fundamental weakness is found, the system would revert to being signed plaintext backups. But again, the design is layered such that even this worst-case failure mode is still better than many providers today. You still get strict append-only semantics protecting historic data from a compromised client, just enforced via access controls (HTTPS auth or SSH forced commands).

                                                                                          But if this will ever be recommended to anyone as an actual tool for securing important backups, it would need thorough consideration of the above, and likely many more, subtler potential issues.

Of course; it is the reason this is public to begin with. I have considered this and other issues more than is written down, but more specialized help is of course wanted. It does say “work in progress” after all. I don’t see why you are so sure subtle issues cannot be resolved.

If a real fundamental flaw is found I am totally willing to scrap the idea completely; I just think it is an improvement over what exists currently.

                                                                                          1. 2

You misread my comments as complaints :) I’m all for implementation. Given what you’re saying in your last comment, it sounds like detailed architecture documentation would be useful; I didn’t gather that any of that was going on short of looking at the code. I think that such documentation, along with reasonable threat modeling and analysis, will likely be the best way to encourage people to give you that more specialized help, and to avoid people like me asking the really basic questions.

                                                                                            Best of luck!

                                                                                            1. 2

                                                                                              I tend to get overly defensive - $dayjob is slowing things down, but I would love to have a detailed threat model laid out and all formats and protocols fully documented.

Thanks for taking the time to give feedback. It is valuable because people will definitely have questions like these.

                                                                                1. 20

The “lacks” of Go in the article are highly opinionated and given without any context of what you’re trying to solve with the language.

                                                                                  Garbage collection is something bad? Can’t disagree harder.

The article ends with a bunch of extreme opinions like “Rust will be better than Go in every possible task”.

There are use cases for Go, use cases for Rust, for both, and for neither. Just pick the right tool for your job and stop bragging about yours.

                                                                                  You love Rust, we get it.

                                                                                  1. 2

                                                                                    Yes, I would argue GC is something that’s inherently bad in this context. Actually, I’d go as far as to say that a GC is bad for any statically typed language. And Go is, essentially, statically typed.

It’s inherently bad since GC dictates the lack of destruction mechanisms that can be reliably used when no references to the resource are left. In other words, you can’t have basic features like the C++ file streams that “close themselves” at the end of the scope, when they are destroyed.

That’s why Go has the “defer” statement; it’s there because of the GC. Otherwise, destructors could be used to run cleanup tasks at the end of a scope.
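
To make that concrete, a minimal Go sketch (the file name is just an example):

package main

import (
	"fmt"
	"os"
)

func main() {
	// Without destructors, cleanup has to be attached to the scope explicitly.
	f, err := os.Open("example.txt") // hypothetical file
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// defer runs when the surrounding function returns, not when the last
	// reference to f goes away; it is the closest Go gets to RAII-style cleanup.
	defer f.Close()

	// ... use f ...
}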

                                                                                    So that’s what makes a GC inherently bad.

                                                                                    A GC, however, is also bad because it “implies” the language doesn’t have good resource management mechanisms.

There was an article posted here about how Rust essentially has a “static GC”, since manual deallocation is almost never needed. The same goes for well-written C++: it behaves just like a garbage-collected language, with no manual deallocation required; all of it is figured out at compile time based on your code.

So, essentially, a GC does what languages like C++ and Rust do at compile time… but it does it at runtime. Isn’t this inherently bad? Doing something at runtime that could be done at compile time? It’s bad from a performance perspective and also bad from a code-validation perspective. And it has essentially no upsides, as far as I’ve been able to tell.

As far as I can tell, the main “support” for GCs is that they’ve always been used. But that doesn’t automatically make them good. GCs seem to be closer to a hack that makes a language easier to implement than a feature for a user of the language.

                                                                                    Feel free to convince me otherwise.

                                                                                    1. 11

It’s inherently bad since GC dictates the lack of destruction mechanisms that can be reliably used when no references to the resource are left.

                                                                                      Why do you think this would be the case? A language with GC can also have linear or affine types for enforcing that resources are always freed and not used after they’re freed. Most languages don’t go this route because they prefer to spend their complexity budgets elsewhere and defer/try-with-resources work well in practice, but it’s certainly possible. See ATS for an example. You can also use rank-N types to a similar effect, although you are limited to a stack discipline which is not the case with linear/affine types.

So, essentially, a GC does what languages like C++ and Rust do at compile time… but it does it at runtime. Isn’t this inherently bad?

                                                                                      No, not necessarily. Garbage collectors can move and compact data for better cache locality and elimination of fragmentation concerns. They also allow for much faster allocation than in a language where you’re calling the equivalent of malloc under the hood for anything that doesn’t follow a clean stack discipline. Reclamation of short-lived data is also essentially free with a generational collector. There are also garbage collectors with hard bounds on pause times which is not the case in C++ where a chain of frees can take an arbitrary amount of time.

                                                                                      Beyond all of this, garbage collection allows for a language that is both simpler and more expressive. Certain idioms that can be awkward to express in Rust are quite easy in a language with garbage collection precisely because you do not need to explain to the compiler how memory will be managed. Pervasive use of persistent data structures also becomes a viable option when you have a GC that allows for effortless and efficient sharing.

                                                                                      In short, garbage collection is more flexible than Rust-style memory management, can have great performance (especially for functional languages that perform a lot of small allocations), and does not preclude use of linear or affine types for managing resources. GC is hardly a hack, and its popularity is the result of a number of advantages over the alternatives for common use cases.

                                                                                      1. 1

What idioms are unavailable in Rust or in modern C++, because of their lack of GC, but are available in a statically typed GC language?

I perfectly agree with GC allowing for more flexibility and more concise code as far as dynamic languages go, but that’s neither here nor there.

As for the theoretical performance benefits and real-time capabilities of a GCed language… I think the word theoretical is what I’d focus my counterargument on, because they don’t actually exist. The GC overhead is too big, in practice, for those benefits to outshine languages without runtime memory-management logic.

                                                                                        1. 9

                                                                                          I’m not sure about C++, but there are functions you can write in OCaml and Haskell (both statically typed) that cannot be written in Rust because they abstract over what is captured by the closure, and Rust makes these things explicit.

                                                                                          The idea that all memory should be explicitly tracked and accounted for in the semantics of the language is perhaps important for a systems language, but to say that it should be true for all statically typed languages is preposterous. Languages should have the semantics that make sense for the language. Saying a priori that all languages must account for some particular feature just seems like a failure of the imagination. If it makes sense for the semantics to include explicit control over memory, then include it. If it makes sense for this not to be part of the semantics (and for a GC to be used so that the implementation of the language does not consume infinite memory), this is also a perfectly sensible decision.

                                                                                          1. 2

                                                                                            there are functions you can write in OCaml and Haskell (both statically typed) that cannot be written in Rust because they abstract over what is captured by the closure

Could you give me an example of this?

                                                                                            1. 8

                                                                                              As far as I understand and have been told by people who understand Rust quite a bit better than me, it’s not possible to re-implement this code in Rust (if it is, I would be curious to see the implementation!)

                                                                                              https://gist.github.com/dbp/0c92ca0b4a235cae2f7e26abc14e29fe

                                                                                              Note that the polymorphic variables (a, b, c) get instantiated with different closures in different ways, depending on what the format string is, so giving a type to them is problematic because Rust is explicit about typing closures (they have to talk about lifetimes, etc).

                                                                                              1. 2

                                                                                                My God, that is some of the most opaque code I’ve ever seen. If it’s true Rust can’t express the same thing, then maybe it’s for the best.

                                                                                                1. 2

                                                                                                  If you want to understand it (not sure if you do!), the approach is described in this paper: http://www.brics.dk/RS/98/12/BRICS-RS-98-12.pdf

                                                                                                  And probably the reason why it seems so complex is because CPS (continuation-passing style) is, in general, quite hard to wrap your head around.

I do think that the restrictions present in this example will show up in simpler examples (anywhere you are trying to quantify over functions with sufficiently different memory usage, but the same type, in a GC’d functional language); this is just a particular thing I have on hand because I thought it would work in Rust but doesn’t seem to.

                                                                                                  1. 2

                                                                                                    FWIW, I spent ~10 minutes trying to convert your example to Rust. I ultimately failed, but I’m not sure if it’s an actual language limitation or not. In particular, you can write closure types in Rust with 'static bounds which will ensure that the closure’s environment never borrows anything that has a lifetime shorter than the lifetime of the program. For example, Box<FnOnce(String) + 'static> is one such type.

                                                                                                    So what I mean to say is that I failed, but I’m not sure if it’s because I couldn’t wrap my head around your code in a few minutes or if there is some limitation of Rust that prevents it. I don’t think I buy your explanation, because you should technically be able to work around that by simply forbidding borrows in your closure’s environment. The actual thing where I got really hung up on was the automatic currying that Haskell has. In theory, that shouldn’t be a blocker because you can just introduce new closures, but I couldn’t make everything line up.

                                                                                                    N.B. I attempted to get any Rust program working. There is probably the separate question of whether it’s a roughly equivalent program in terms of performance characteristics. It’s been a long time since I wrote Haskell in anger, so it’s hard for me to predict what kind of copying and/or heap allocations are present in the Haskell program. The Rust program I started to write did require heap allocating some of the closures.

                                                                                      2. 5

It’s inherently bad since GC dictates the lack of destruction mechanisms that can be reliably used when no references to the resource are left. In other words, you can’t have basic features like the C++ file streams that “close themselves” at the end of the scope, when they are destroyed.

                                                                                        Deterministic freeing of resources is not mutually exclusive with all forms of garbage collection. In fact, this is shown by Rust, where reference counting (Rc) does not exclude Drop. Of course, Drop may never be called when you create cycles.

                                                                                        (Unless you do not count reference counting as a form of garbage collection.)

                                                                                        1. 2

                                                                                          Well… I don’t count shared pointers (or RC pointers or w/e you wish to call them) as garbage collected.

                                                                                          If, in your vocabulary, that is garbage collection then I guess my argument would be against the “JVM style” GC where the moment of destruction can’t be determined at compile time.

                                                                                          1. 8

                                                                                            If, in your vocabulary, that is garbage collection

                                                                                            Reference counting is generally agreed to be a form of garbage collection.

                                                                                            I guess my argument would be against the “JVM style” GC where the moment of destruction can’t be determined at compile time.

In Rc or shared_ptr, the moment of the object’s destruction can also not be determined at compile time. Only the destruction of the Rc itself can be; put differently, the reference-count decrement can be determined at compile time.

                                                                                            I think your argument is against tracing garbage collectors. I agree that the lack of deterministic destruction is a large shortcoming of languages with tracing GCs. It effectively brings back a parallel to manual memory management through the backdoor — it requires manual resource management. You don’t have to convince me :). I once wrote a binding to Tensorflow for Go. Since Tensorflow wants memory aligned on 32-byte boundaries on amd64 and Go allocates (IIRC) on 16-byte boundaries, you have to allocate memory in C-land. However, since finalizers are not guaranteed to run, you end up managing memory objects with Close() functions. This was one of the reasons I rewrote some fairly large Tensorflow projects in Rust.

                                                                                            1. 2

                                                                                              However, since finalizers are not guaranteed to run, you end up managing memory objects with Close() functions.

                                                                                              Hmm. This seems a bit odd to me. As I understand it, Go code that binds to C libraries tend to use finalizers to free memory allocated by C. Despite the lack of a guarantee around finalizers, I think this has worked well enough in practice. What caused it to not work well in the Tensorflow environment?

                                                                                              1. 3

                                                                                                When doing prediction, you typically allocate large tensors relatively rapidly in succession. Since the wrapping Go objects are very small, the garbage collector kicks in relatively infrequently, while you are filling memory in C-land. There are definitely workarounds to put bounds on memory use, e.g. by using an object pool. But I realized that what I really want is just deterministic destruction ;). But that may be my C++ background.

I rewrote all that code around the Go 1.6-1.7 time frame, so maybe things have improved since. Ideally, you’d be able to hint to the Go GC about the actual object sizes, including C-allocated objects. Some runtimes provide support for tracking C objects. E.g. SICStus Prolog has its own malloc that counts allocations in C-land towards the SICStus heap (SICStus Prolog can raise a recoverable exception when you use up your heap).
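
Roughly the pattern I mean, as a sketch; the Tensor wrapper and sizes are made up, not the actual Tensorflow binding:

package main

/*
#include <stdlib.h>
*/
import "C"

import (
	"runtime"
	"unsafe"
)

// Tensor wraps a buffer allocated in C-land; the Go GC only sees the small
// wrapper struct, not the (potentially huge) C allocation behind it.
type Tensor struct {
	ptr unsafe.Pointer
}

func NewTensor(bytes int) *Tensor {
	t := &Tensor{ptr: C.malloc(C.size_t(bytes))}
	// Backstop only: finalizers are not guaranteed to run promptly (or at all),
	// so callers still need to call Close explicitly.
	runtime.SetFinalizer(t, func(t *Tensor) { t.Close() })
	return t
}

// Close frees the C allocation deterministically.
func (t *Tensor) Close() {
	if t.ptr != nil {
		C.free(t.ptr)
		t.ptr = nil
	}
}

func main() {
	t := NewTensor(64 << 20) // 64 MiB that the Go collector cannot see
	defer t.Close()          // deterministic cleanup, by hand
}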

                                                                                                1. 1

                                                                                                  Interesting! Thanks for elaborating on that.

                                                                                            2. 3

                                                                                              So Python, Swift, Nim, and others all have RC memory management … according to you these are not GC languages?

                                                                                          2. 5

                                                                                            One benefit of GC is that the language can be way simpler than a language with manual memory management (either explicitly like in C/C++ or implicitly like in Rust).

                                                                                            This simplicity then can either be preserved, keeping the language simple, or spent on other worthwhile things that require complexity.

                                                                                            I agree that Go is bad, Rust is good, but let’s be honest, Rust is approaching a C++-level of complexity very rapidly as it keeps adding features with almost every release.

                                                                                            1. 1

you can’t have basic features like the C++ file streams that “close themselves” at the end of the scope, when they are destroyed.

                                                                                              That is a terrible point. The result of closing the file stream should always be checked and reported or you will have buggy code that can’t handle edge cases.
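
For example, in Go the Close error has to be propagated by hand; a small sketch (writeReport and the file name are made up):

package main

import (
	"log"
	"os"
)

func writeReport(path string, data []byte) (err error) {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	// Close can fail (e.g. when buffered data is flushed to a full disk),
	// so its error must be surfaced rather than silently dropped.
	defer func() {
		if cerr := f.Close(); cerr != nil && err == nil {
			err = cerr
		}
	}()

	_, err = f.Write(data)
	return err
}

func main() {
	if err := writeReport("report.txt", []byte("hello\n")); err != nil {
		log.Fatal(err)
	}
}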

                                                                                              1. 0

                                                                                                You can turn off garbage collection in Go and manage memory manually, if you want.

                                                                                                It’s impractical, but possible.
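
To be concrete: turning the collector off is a one-liner (runtime/debug.SetGCPercent(-1), equivalent to running with GOGC=off); the manual-management part is what makes it impractical. A small sketch:

package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

func main() {
	// Disable the collector entirely (equivalent to GOGC=off).
	debug.SetGCPercent(-1)

	// The heap now only shrinks if you trigger collections yourself...
	big := make([]byte, 64<<20)
	_ = big

	// ...for example by calling runtime.GC() at points you choose.
	big = nil
	runtime.GC()

	var s runtime.MemStats
	runtime.ReadMemStats(&s)
	fmt.Printf("heap in use: %d MiB\n", s.HeapInuse>>20)
}

Truly GC-less Go would still need something like off-heap allocation on top of this, which is why it stays impractical in practice.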

                                                                                                1. 2

Is this actually used in any production code? To my knowledge it was meant more as a feature for debugging and language developers, rather than a true GC-less option like the one a language such as D provides.

                                                                                                  1. 1

                                                                                                    Here is a shocking fact: For those of us who write programs in Go, the garbage collector is actually a wanted feature.

                                                                                                    If you work on something where having a GC is a real problem, use another language.

                                                                                            1. 1

Now do Rust (sort of kidding). Amazing effort putting in 37k commits!

                                                                                              1. 2

A lot of those commits appear to be from Go itself; the Go source tree is part of BiscuitOS’s repo.

                                                                                              1. 22

                                                                                                These days I would almost prefer if services didn’t daemonize themselves. Daemonization can be solved in the service manager if needed.

                                                                                                1. 10

                                                                                                  A thousand times, yes.

                                                                                                  Run in the foreground, log to stdout; the service manager will deal with log rotation & alerting/restarting on exit, etc.

                                                                                                  1. 1

                                                                                                    The only problem I have with this is structured logging. Lots of languages dump a stacktrace of several lines when the app crashes. But each of those lines is then a separate log entry.

                                                                                                    1. 1

A simple approach would be to ensure that stdout has one entry per line, while stderr is used for stack traces & co.

                                                                                                      1. 1

                                                                                                        That’s why I add the process ID and line number when I log a crash. That way, I can collate the output from a crash and don’t have to worry about line lengths when logging via syslog().
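
Roughly like this (a Go sketch; the exact prefix fields are just whatever is convenient to collate):

package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"strings"
)

// logCrash writes a stack trace one line at a time, prefixing each line with
// the PID and a line counter so the entries can be reassembled downstream even
// if the log pipeline (e.g. syslog) splits or interleaves them.
func logCrash(reason interface{}) {
	pid := os.Getpid()
	trace := strings.Split(strings.TrimRight(string(debug.Stack()), "\n"), "\n")
	fmt.Fprintf(os.Stderr, "pid=%d crash=%v lines=%d\n", pid, reason, len(trace))
	for i, line := range trace {
		fmt.Fprintf(os.Stderr, "pid=%d trace=%d/%d %s\n", pid, i+1, len(trace), line)
	}
}

func main() {
	defer func() {
		if r := recover(); r != nil {
			logCrash(r)
		}
	}()
	panic("something went wrong")
}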

                                                                                                    2. 2

                                                                                                      almost? I have never wanted this… I just use daemontools or systemd.

                                                                                                    1. 3

                                                                                                      I would like nix so much more if I didn’t end up with 6 copies of gcc and a texlive install every time I installed something nontrivial.

                                                                                                      1. 1

                                                                                                        Yep. I wondered “where did that texlive install come from” recently too.

                                                                                                        1. 2

There are a few tools for this. For example, you can ask which GC roots (profile generations) are keeping it installed:

                                                                                                          $ nix-store -q --roots $(which ghc)
                                                                                                          /nix/var/nix/profiles/per-user/brian/profile-377-link
                                                                                                          /nix/var/nix/profiles/per-user/brian/profile-378-link
                                                                                                          /nix/var/nix/profiles/per-user/brian/profile-379-link
                                                                                                          /nix/var/nix/profiles/per-user/brian/profile-380-link
                                                                                                          

                                                                                                          Or if it’s just alive from a shell or something, you can ask what referred to it:

                                                                                                          $ nix-store -q --referrers $(which ghc)
                                                                                                          /nix/store/kjxr0bm87jljb91qd7k47cqiknnxdknx-ghc-8.2.2-with-packages
                                                                                                          /nix/store/1djxkifaxmz940x571ab4ndfmvmb1lh5-env-manifest.nix
                                                                                                          /nix/store/crxhjrgkngwk8bv77sy6bxl19k6id2f3-env-manifest.nix
                                                                                                          /nix/store/50rbrla0m0819wr788gs588qi7yl6vxy-user-environment
                                                                                                          /nix/store/7wqjzgci6v9fnhgs0zsvxhhc6hzjczfv-env-manifest.nix
                                                                                                          /nix/store/i3gz4d3jkqkh4v9vscx7s2fa0q7wll3q-env-manifest.nix
                                                                                                          /nix/store/f53kz6isz60xr83yiwvl7xw0b2v5k5pg-user-environment
                                                                                                          /nix/store/wc037in1jmkdwxj1hhjhbih5arwazrcw-user-environment
                                                                                                          /nix/store/zismyry5a61xwpvnkvdidiqgvcrqwg04-user-environment
                                                                                                          
                                                                                                          1. 1

                                                                                                            Thanks, I’ll look into that. I’m curious why things even need texlive in the first place - seems like it’s unnecessary to have for a running system.

                                                                                                            Personally, I’d love to see NixOS be able to meet requirements like this - the Nix store seems like it would facilitate reduction of attack surface if derivations could either be written in a more minimalist way or if containers (or the system in general) could include only the smallest set of software they need to run.

                                                                                                            (edit)

                                                                                                            OK, I tracked it down:

                                                                                                            $ nix-store -q --referrers /nix/store/h610l0v3b3d6mqfwapgd8jb0jxilbd47-texlive-combined-2018.drv
                                                                                                            /nix/store/f0ir4m0ww8qjbialvjxrv7zzlmhsmrjc-dwarf-therapist-41.0.2.drv
                                                                                                            

                                                                                                            A package I maintain apparently used it as a build input… sigh

                                                                                                            Does this mean dwarf-therapist is still referencing texlive as far as the GC is concerned, even though it’s already built? A nix-collect-garbage -d didn’t clean it up.

                                                                                                            1. 2

Seems relatively pointless; if an attacker can write source code to disk, they can also just write precompiled binaries to disk (or just keep shellcode in memory and never touch disk).

                                                                                                              1. 1

                                                                                                                Yeah. Requirements sometimes are that way. :-(

                                                                                                                More to the point though - it’s the same “if your users don’t need it, don’t include it” approach hank mentioned. You probably won’t need to compile programs on a production system, so why include a compiler, even one that a user would have to manually spelunk through the Nix store to use?

                                                                                                                1. 1

Nix detects dependencies automatically, and sometimes package maintainers don’t notice they accidentally did something to include one. It could be considered a bug worth a ticket if the dependency isn’t necessary for the program to function.

                                                                                                                  1. 1

                                                                                                                    I’m seeing a lot of things that use nativeBuildInputs sticking around in the Nix store on my setup.

                                                                                                                    From the manual:

                                                                                                                    since these packages only are guaranteed to be able to run then, they shouldn’t persist as run-time dependencies. This isn’t currently enforced, but could be in the future.

                                                                                                                    And, indeed, I’m seeing them ref’ed by certain packages that needed to be compiled locally:

                                                                                                                    $ nix-store -q --referrers /nix/store/imfm3gk3qchmyv7684pjpm8irvkdrrkk-gcc-7.3.0/bin/gcc
                                                                                                                    /nix/store/imfm3gk3qchmyv7684pjpm8irvkdrrkk-gcc-7.3.0
                                                                                                                    /nix/store/8xfm1g4vbv0mv0mn7zlny85dhmw3djvm-dfhack-base-0.44.12-r1
                                                                                                                    /nix/store/fj0qxv4lqrgnva1yhn09f641373rbjvy-dwarf_fortress_unfuck-0.44.12
                                                                                                                    

                                                                                                                    So, it sounds like nativeBuildInputs should be GC’ed, but currently aren’t for some reason. I just assumed they would be. I’d be curious to see what’s producing extra copies of texlive for hank as well.

                                                                                                              2. 1

You’re asking which derivations mention your TexLive derivation (you queried the .drv path), not what references the TexLive output in the store. I wouldn’t worry about the disk usage of these small derivations.

                                                                                                        1. 13

I’ve heard some people raving about TiddlyWiki for this, which seems to have a lot of flexible interfaces. I don’t share my notes publicly as often, but I take tons of notes, and I almost always have this TUI mind map that I wrote open in a tmux pane.

                                                                                                          1. 4

                                                                                                            Why don’t you share your notes publicly? I think there is so much value to be had if everyone shared their notes in public.

                                                                                                            1. 14

I have a lot of personal info in there that I don’t need to share. But I am often thinking “I should be writing more for public consumption”, and I also started making a GitBook, with a long-term plan of turning it into a sort of “Designing Data-Intensive Applications” but with an eye toward actually building the underlying infrastructure that supports data-intensive applications. Sort of a missing guide for modern database and distributed-systems implementation. But it’s been hard for me to overcome the friction of really building a durable habit of writing for others.

                                                                                                              1. 5

                                                                                                                That’s the cool thing about a wiki. You write the notes for yourself. Just share it with others.

                                                                                                                Would love to read your book. I’d release it gradually too as the topic of building the infrastructure for data intensive apps is vast.

                                                                                                                1. 2

Not to discourage you from writing, because I do believe that there’s always room for new takes on a topic, but Martin Kleppmann’s Designing Data-Intensive Applications may be of interest to you, if you haven’t seen it. I’m not sure that it goes into “underlying infrastructure”, as you’re thinking, though.

                                                                                                                  1. 1

                                                                                                                    I love that book! I want to do something similar but targeting people building infrastructure instead of the applications on top.

                                                                                                                    1. 1

                                                                                                                      Gotcha. That sounds useful, though I’d think less evergreen because the technology on which the infrastructure is built will keep changing.

I’ve seen a lot about the virtues of event sourcing, but a lot less about how one implements event sourcing at different scales. Am I correct that’s the kind of thing you’d dig into?

                                                                                                                2. 1

                                                                                                                  No software I tried supported tagging things as public or private. So I am forced to make everything private.

                                                                                                                  1. 1

                                                                                                                    Cowyo is a wiki with public and private pages. Fairly straightforward codebase.

                                                                                                                3. 3

@icefall This void reminds me a lot of MaxThink, an old DOS personal organizer (http://maxthink.com). I’ll take a deep look, thanks for sharing.

@nikivi, your gitbook is prime. Very well crafted. I also try to keep notes in a similar structure, but yours are way more structured.

                                                                                                                  1. 1

                                                                                                                    The simplicity that was TiddlyWiki has unfortunately been mostly lost in the name of security. It used to be a single file that you could open and edit, and everything would save automatically and transparently.

                                                                                                                    Now you have to install a browser extension for it to save correctly, which makes TiddlyWiki much harder to transfer between browsers.

Edit: I’m not taking sides on the security-functionality tradeoff, to forestall any off-topic discussion. However, TiddlyWiki has been around a looooong time (at least 10 years) now, and I assume that this would mean the wiki portion of the software is superb.