1. 2

    I don’t think left-to-right matters in stack languages really, because it’s just a series of operations happening in order? Just reading 1 2 + makes more sense when you’re thinking about the stack as a data structure than + 2 1

    1. 2

      Reverse Polish is the native syntax for stack languages because it reflects the order of evaluation. A left-to-right stack language would need a layer that reorders the operations, since in your example the 1 and 2 have to be evaluated (pushed) first, then the “+”.

      (Unless you tried running the parser backwards over the input, which I guess is possible but weird. And begs the question of how you deal with interactive multi-line input.)

      Even an infix language that compiles to a stack machine ends up generating byte code in RPN order.
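      To make the evaluation-order point concrete, here is a minimal sketch of an RPN evaluator. The operator table and the function name `eval_rpn` are illustrative assumptions, not taken from any particular stack language.

```python
import operator

# Binary operators only; this table is an illustrative assumption.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(source):
    """Evaluate a whitespace-separated RPN string such as "1 2 +"."""
    stack = []
    for token in source.split():
        if token in OPS:
            # Operands were pushed first, so they come off in reverse order.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]
```

      Each token is handled the moment it is read, with no reordering layer: that is the sense in which RPN mirrors the order of evaluation.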

      1. 1

        (so called “normal”) Polish Notation is used in Math & Logic quite a bit; Wikipedia even has an article on evaluation. For fixed arity functions, it’s basically Shunting Yard really, nothing too complex.
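        For fixed-arity operators, one simple way to evaluate prefix notation is plain recursive descent rather than Shunting Yard proper: the arity tells the evaluator how many operands to read after each operator. A minimal sketch, assuming binary operators only and a hypothetical `eval_prefix` helper:

```python
import operator

# Binary operators only; this table is an illustrative assumption.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_prefix(source):
    """Evaluate a whitespace-separated prefix string such as "+ 1 2"."""
    tokens = iter(source.split())

    def parse():
        token = next(tokens)
        if token in OPS:
            # Fixed arity: a binary operator is followed by exactly two operands.
            left = parse()
            right = parse()
            return OPS[token](left, right)
        return float(token)

    value = parse()
    if next(tokens, None) is not None:
        raise ValueError("trailing tokens after a complete expression")
    return value
```

        Unlike the RPN case, the operator has to wait (here, on the call stack) until both operands have been evaluated.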

    1. 9

      I never use code completion. If the editor provides it, I disable it.

      Having worked on large code bases (500k+ loc) with a lot of models (20-30) I can’t imagine life without a good quality language server…

      I’m not sure if I envy the author or if I’m afraid of them.

      1. 5

        It’s most likely they use tools like grep and find which easily work with any language, but yes, LSPs are significantly changing the usefulness of autocompletion.

        1. 2

          This is very true, and we built tooling around it. My largest codebase was 10 million lines of code written in a mainframe language that couldn’t leave the client’s hardware, so we wrote some simple tooling to help with finding things, and basically built a map around things.

          Whilst I still don’t use an editor that provides those sorts of things, I do use tools like ssadump or go guru to help give me the lay of the land

        2. 1

          Having worked on large code bases

          Having worked on small code bases I can’t imagine how much I’d have to be offered to agree to work on a codebase over 100kloc.

        1. 15

          I think the key insight here is that container images (the article confuses images and containers, a common mistake that pedants like me will rush to point out) are very similar to statically linked binaries. So why Docker/container images and why not ELF or other statically linked formats?

          I think the main answer is that container images have a native notion of a filesystem, so it’s “trivial” (relatively speaking) to put the whole user space into a single image, which means that we can package virtually the entire universe of Linux user space software with a single static format whereas that is much harder (impossible?) with ELF.

          1. 4

            And we were able to do that with virtualization for at least 5 - 10 years prior to Docker. Or do you think that packaging the kernel as well is too much?

            Anyways, I do not think that a container having the notion of a filesystem is the killer feature of Docker. I think that moving the deployment code (installing a library for example) close to compilation of the code helped many people and organizations who did not have the right tooling prior to that. For larger companies that had systems engineers, cgroups mostly provided the security part, because packaging was solved decades prior to Docker.

            1. 1

              IMO it’s not the kernel but all of the supporting software that needs to be configured for VMs but which comes for ~free with container orchestration (process management, log exfiltration, monitoring, sshd, infrastructure-as-code, etc).

              Anyways, I do not think that a container having the notion of a filesystem is the killer feature of Docker. I think that moving the deployment code (installing a library for example) close to compilation of the code helped many people and organizations who did not have the right tooling prior to that.

              How do you get that property without filesystem semantics? You can do that with toolchains that produce statically linked binaries, but many toolchains don’t support that and of those that do, many important projects don’t take advantage.

              Filesystem semantics enable almost any application to be packaged relatively easily in the same format which means orchestration tools like Kubernetes become more tenable for one’s entire stack.

            2. 4

              I can fit a jvm in a container! And then not worry about installing the right jvm in prod.

              I used to be a skeptic. I’ve been sold.

              1. 2

                Slightly off topic - but JVM inside a container becomes really interesting with resource limits. Who should be in charge of limits, JVM runtime or container runtime?

                1. 7

                  Gotta be the container runtime (or the kernel or hypervisor above it) because the JVM heap size limit is best-effort. Bugs in memory accounting could cause the process to use memory beyond the heap limit. Absent that, native APIs (JNI) can directly call malloc and allocate off-heap.

                  Would still make sense for the container runtime to tell the JVM & application what the limits on it currently are so it can tailor its own behaviour to try to fit inside them.
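                  As a rough sketch of how an application could learn the limit imposed on it, the following reads the cgroup v2 memory limit and derives a heap budget from it. The path is the standard cgroup v2 location; the 75% headroom fraction and the function names are illustrative assumptions, and cgroup v1 hosts use a different file.

```python
from pathlib import Path

# cgroup v2 exposes the memory limit here; cgroup v1 uses a different path.
CGROUP_V2_MEMORY_MAX = Path("/sys/fs/cgroup/memory.max")


def parse_memory_max(text):
    """Return the limit in bytes, or None when the cgroup is unlimited ("max")."""
    value = text.strip()
    return None if value == "max" else int(value)


def container_memory_limit(path=CGROUP_V2_MEMORY_MAX):
    """Best-effort read of the limit; None if it is absent or unreadable."""
    try:
        return parse_memory_max(path.read_text())
    except (OSError, ValueError):
        return None


def suggested_heap_bytes(limit, fraction=0.75):
    """Leave headroom for off-heap allocations; the fraction is illustrative."""
    return None if limit is None else int(limit * fraction)
```

                  The enclosing layer still enforces the limit; the runtime only uses the number to size itself so that it is less likely to hit it.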

                  1. 4

                    It’s easy: the enclosing layer gets the limits. Who should set the resource limits? ext4 or the iron platter it’s on?

                    1. 2

                      What’s the enclosing layer? What happens when you have heterogeneous infrastructure? Legacy applications moving to cloud? Maybe in theory it’s easy, but in practice it’s much tougher.

                    2. 2

                      Increasingly the JVM is setting its own constraints to match the operating environment when “inside a container”.

                  2. 4

                    Yes, layers as filesystem snapshots enable a more expressive packaging solution than statically linked alternatives. But it’s not just filesystems; runtime configuration (variables through ENV, invocation through CMD) also makes the format even more expressive.

                    p.s. I have also updated the post to say “container images”

                    1. 3

                      I think the abstraction on images is a bit leaky. With docker you’re basically forced to give it a name into a system registry, so that you can then run the image as a container.

                      I would love to be able to say like… “build this image as this file, then spin up a container using this image” without the intermediate steps of tagging (why? because it allows for building workflows that don’t care about your current Docker state). I know you can just kinda namespace stuff but it really bugs me!

                      1. 3

                        Good practice is addressing images by their digest instead of a tag using the @ syntax. But I agree - registry has always been a weird part of the workflow.

                        1. 1

                          addressing images by their digest instead of a tag using the @ syntax.

                          Be careful about that. The digest of images can change as you push/pull them between different registries. The problem may have settled out, but we were bitten by changes across different releases of software in Docker’s registry image and across the Docker registry and Artifactory’s.

                          I’m not sure if there’s a formal standard for how that digest is calculated, but it certainly used to be (~2 years back) very unreliable.
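                          One plausible way a digest can change in transit, offered as an assumption rather than a diagnosis of what happened here: a content digest is a hash over the exact bytes of the manifest, so any component that re-serializes or converts the manifest produces a different digest even when the JSON is semantically identical. A minimal sketch with a toy manifest (not a complete image manifest):

```python
import hashlib
import json


def content_digest(raw_bytes):
    """Digest in the "sha256:<hex>" form, computed over the exact bytes given."""
    return "sha256:" + hashlib.sha256(raw_bytes).hexdigest()


# Toy stand-in for a manifest; real manifests carry more fields.
manifest = {"schemaVersion": 2, "layers": []}

# Two serializations of the same JSON value.
compact = json.dumps(manifest, separators=(",", ":")).encode("utf-8")
pretty = json.dumps(manifest, indent=2).encode("utf-8")

same_meaning = json.loads(compact) == json.loads(pretty)
same_digest = content_digest(compact) == content_digest(pretty)
```

                          Here `same_meaning` is true while `same_digest` is false, which is why pinning by digest only works if every hop preserves the manifest bytes.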

                          1. 1

                            Oh I wasn’t aware of that! That could let me at least get most of the way to what I want to do, thanks for the pointer!

                        2. 3

                          I noticed Go now has support for, in its essentially static binary, including a virtual filesystem instantiated from a filesystem tree specified during compilation. In that scenario, it further occurs to me that containerization isn’t perhaps necessary, thereby exposing read only shared memory pages to the OS across multiple processes running the same binary.

                          I don’t know in the containerization model if the underlying/orchestrating OS can identify identical read only memory pages and exploit sharing.

                          1. 2

                            I think in the long term containers won’t be necessary, but today there’s a whole lot of software and language ecosystems that don’t support static binaries (and especially not virtual filesystems) at all and there’s a lot of value in having a common package type that all kinds of tooling can work with.

                            1. 2

                              As a packaging mechanism, embedded files in Go work OK in theory (they follow the single-process pattern). In practice, most Go binary container images are empty (FROM scratch + certs) anyway. Lots of the environment-dependent files that you would want at runtime (secrets, environment variables, networking) are much easier to add declaratively to a container image than to recompile into the binary.

                            2. 2

                              So why Docker/container images and why not ELF or other statically linked formats?

                              There are things like gVisor and binctr that work this way, as do some things like Emscripten (for JS/WASM).

                              1. 2

                                I really hope for WASI to pick up here. I used to be a big fan of CloudABI, which now links to WASI.

                                It would be nice if we could get rid of all the container (well actually mostly Docker) cruft.

                            1. 4

                              I’m familiar with tagged pointers, as so many interpreters use them (particularly Ruby, which I’ve hacked on quite a bit). NaN boxing is new to me, though.

                              My first thought after reading about the technique is whether signaling NaNs could be used to make arithmetic more efficient. Since using a signaling NaN would generate an exception, generated code could just use the number as-is and catch the exception, falling back to a slow path in that case.

                              My second thought is that we’ve had tagged pointers for decades, so why aren’t there hardware instructions for working with them?

                              1. 4

                                What I do to make arithmetic more efficient in my nan-boxed interpreter is something like this:

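                                Value add_op(Value a, Value b) {
                                    // Fast path: add optimistically as if both operands were doubles.
                                    double result = a.number + b.number;
                                    if (result != result) goto slow_path;  // only NaN compares unequal to itself
                                    return Value{result};
                                slow_path:
                                    ...
                                }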

                                I optimistically add the two nan-boxed values using float arithmetic. If the result isn’t a nan, I return it. The overhead for the fast path is a nan test and a conditional branch that isn’t taken. The slow path takes care of adding things that aren’t doubles, and throwing an exception if the arguments have the wrong type.

                                Using signaling nans and traps would make the code significantly more complicated. And I don’t see why it would be faster. Setting up and tearing down the trap handler for the add case would probably be at least as expensive as the above code, and probably more so.
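                                The control flow of that fast path can be modelled in a few lines of Python. This is only a sketch of the idea, not the interpreter's actual code: the payload layout, `box_payload`, and the behaviour of `add_slow_path` are all assumptions made for illustration.

```python
import math
import struct

QNAN_BITS = 0x7FF8000000000000     # exponent all ones plus the quiet bit
PAYLOAD_MASK = 0x0007FFFFFFFFFFFF  # low 51 bits, free to hold a tag or pointer


def box_payload(payload):
    """Hide a non-number inside a quiet NaN (the layout is an assumption)."""
    bits = QNAN_BITS | (payload & PAYLOAD_MASK)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def add_slow_path(a, b):
    """Hypothetical slow path: reject boxed operands, keep genuine NaN results."""
    if math.isnan(a) or math.isnan(b):
        raise TypeError("operand is not a number")
    return math.nan  # both operands were real doubles, e.g. inf + (-inf)


def add_op(a, b):
    result = a + b        # optimistic float add
    if result != result:  # only NaN compares unequal to itself
        return add_slow_path(a, b)
    return result
```

                                For example, `add_op(1.5, 2.0)` stays on the fast path, while `add_op(box_payload(1), 2.0)` produces a NaN and falls through to the slow path, which raises `TypeError`.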

                                1. 2

                                  That’s interesting; I very much like that technique.

                                  I wouldn’t expect the trap handler to be set up and torn down for every evaluated opcode, just once at startup. The exception would certainly be slower than a failed branch prediction. The fast path would save a comparison, which may or may not be faster on a modern processor.

                                  1. 3

                                    The tail-threaded interpreter passes around its state in registers. In the case of “tails”, there are two registers containing sp and pc. How does a generic trap handler determine what opcode is being executed, and how does it access the interpreter state? If this information is stored into global variables before executing the add instruction, then that setup code is more expensive than the nan test in my code. Is there a way to use a trap that is cheaper than the nan-test code? Without using assembly language or other CPU and ABI specific code, I mean.

                                    1. 1

                                      Hmm, that’s a good point. I had imagined it walking the stack like a debugger would, but that would definitely be CPU/ABI specific.

                                  2. 2

                                    That’s essentially what I do. The constructor that takes a double checks whether it’s already a NaN, which would of course be misinterpreted, and substitutes the value that means “null”. A handy side effect is that arithmetic on non-numbers produces null, which is similar to SQL semantics.

                                  3. 2

                                    My second thought is that we’ve had tagged pointers for decades, so why aren’t there hardware instructions for working with them?

                                    There are some, these are generally called “tagged architectures”. Things like Burroughs Large Systems and LispMs did this. ARM, PowerPC, and a few others also have support if you opt into it, but these are usually exposed to compiler writers, so your compiler has to know about & use them. It’s definitely an interesting area.

                                    1. 3

                                      CHERI is a tagged architecture, with a single one-bit non-addressable tag to differentiate between memory capabilities (which the C/C++ compiler uses to represent pointers and have strong integrity properties) and other data. Arm is providing us with an implementation early next year.

                                  1. 1

                                    This is really cool, but I’ve only dabbled with Atari STs before; I wonder how you would test this?

                                    1. 10

                                      OpenSSL is used by billions, so why can’t they even afford a single developer to work on it?

                                      1. 3

                                        Wait, really? Even before Heartbleed they had two full-time devs and were pulling in non-trivial amounts of money from support contracts IIRC; has this changed?

                                        1. 9

                                          This is part of why the Core Infrastructure Initiative was created, because they were relatively underfunded:

                                          Prior to the CII funding, only one person, Stephen Henson, worked full-time on OpenSSL; Henson approved well over half of the updates to more than 450,000 lines of the OpenSSL’s source code.[10] Besides Henson, there are three core volunteer programmers. The OpenSSL Project existed on a budget of $2,000 per year in donations, which was enough to cover the electrical bill, and Steve Henson was earning around $20,000 per year.[7] To gather more revenue for the project, Steve Marquess, a consultant for the Defense Department, created the OpenSSL Software Foundation. This allowed programmers to make some money by consulting for organizations that used the code. However, the foundation brought in less than $1 million per year,[5] and the contract work tended to focus on adding new features rather than maintaining the old ones.[7]

                                          They did make non-trivial amounts of money from support, but $20k is hardly much to go on for a project as important as OpenSSL.

                                          1. 1

                                            I wasn’t aware of that. I remember reading plenty of articles bemoaning the lack of funding for OpenSSL. Maybe the situation isn’t as dire as they made it out to be.

                                        1. 2

                                          The initial prototype was done in Haskell and is available here: https://github.com/fortran-lang/fpm-haskell. It was rewritten to (what else?) Fortran.

                                          1. 1

                                            I only quickly skimmed the page, so I apologize if I missed it, but does this only support Fortran 90+? That’s not a problem, just curious; a lot of the code I’ve dealt with in the past was still Fortran 77 and maybe some Fortran 90 code, at least in Physics.

                                            1. 3

                                              Youʼll need a Fortran 2008 compiler to build fpm, but fpm itself handles FORTRAN 77 as well.

                                              1. 1

                                                that’s perfect, thank you! I’d love to see an experience report in translating from Haskell to Fortran 2008 as well, that would be interesting…

                                          1. 6

                                            IMO a honking great mistake to turn on Strict-Transport-Security by default with a max-age of 2 years. People will drop this in and break their systems, and that makes this a footgun. No one should deploy HSTS like that; that seems to me to be the consensus among HSTS advocates.

                                            And why is HTTP caching getting disabled? What does that have to do with security for a website?
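                                            On the HSTS point, a commonly recommended alternative to a two-year default is to start with a short max-age and raise it only after the site is known to work over HTTPS everywhere. A minimal sketch of building the header; the helper name and the five-minute starting value are illustrative assumptions:

```python
def hsts_header(max_age_seconds, include_subdomains=False):
    """Build a Strict-Transport-Security header as a (name, value) pair."""
    if max_age_seconds < 0:
        raise ValueError("max-age must be non-negative")
    value = f"max-age={max_age_seconds}"
    if include_subdomains:
        value += "; includeSubDomains"
    return ("Strict-Transport-Security", value)


ROLLOUT_MAX_AGE = 300              # five minutes; an illustrative starting point
TWO_YEARS = 2 * 365 * 24 * 60 * 60  # the default criticised above
```

                                            With `ROLLOUT_MAX_AGE`, a misconfiguration locks browsers out for minutes rather than years.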

                                            1. 1

                                              And why is HTTP caching getting disabled? What does that have to do with security for a website?

                                              It shouldn’t be disabled for all routes, but disabling it minimizes data exposure (1, 2). Basically, if you have anything cached by proxies or in the browser, an attacker with something like Local access can steal that data (3).

                                              1. 2

                                                X-Frame-Options: SAMEORIGIN is a sane default. Mucking around with cache settings in the abstract with no particular threat model in mind just feels like random, uninformed changes to me.

                                                How many sites/apis have you personally worked on where there has been serious consideration about how to defend a user’s local HTTP cache from an attacker on the same machine? I think for me the number is 1, in 10 years, and we weren’t outsourcing it to a third party library.

                                                1. 1

                                                  How many sites/apis have you personally worked on where there has been serious consideration about how to defend a user’s local HTTP cache from an attacker on the same machine?

                                                  There are two things to consider really:

                                                  1. that caching isn’t just about the local machine, since depending on the cache control instructions and the path to the end user, it could be cached by other caches, such as proxies
                                                  2. that you only consider the risk of the data being exposed, not that we are defending users from all attacks with Local position

                                                  However, I would say quite a few, esp in the mobile space; lots of mobile applications used to cache all sorts of sensitive data, which could then be exposed by backups or the like. Is this a huge concern for security teams? Generally no, nor should it be; an attacker would need sufficient position, &c. to actually impact incorrect caching, so the Likelihood is low or even very low. That still has worked out to a few hundred reports in my career.

                                                  On the flip side, the Impact could be High or Very High based on the data cached; I’ve seen everything from PANs to user’s PII or other sensitive data to passwords. Again, it depends on the sensitivity of the data being cached, and should be applied selectively. It’s not something you can just scan and hand over, but rather something you work with a system’s actual data to determine.
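                                                  Applying it selectively could look something like the sketch below. The route prefixes and the one-hour lifetime are hypothetical; which paths count as sensitive has to come from looking at the system's actual data.

```python
# Hypothetical route prefixes that serve sensitive data.
SENSITIVE_PREFIXES = ("/account", "/messages")


def cache_headers(path):
    """Pick Cache-Control per route instead of disabling caching everywhere."""
    if path.startswith(SENSITIVE_PREFIXES):
        # no-store: neither the browser nor intermediary caches may keep a copy.
        return {"Cache-Control": "no-store"}
    # Illustrative default for non-sensitive content.
    return {"Cache-Control": "public, max-age=3600"}
```

                                                  Everything outside the sensitive prefixes keeps the performance benefit of caching.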

                                                  Lastly, remember that browser Cache Control nuance led Twitter to have to disclose a DM privacy exposure bug just last year.

                                            1. 5

                                              For Flask, flask-talisman already exists, and Django has quite a few security headers which can be easily configured through standard settings.

                                              1. 1

                                                I agree, however, if you need to secure multiple frameworks within an enterprise, it is nice to have one solution that can be applied across frameworks in a standardized way. For example, at my previous company, we had a client with ~350 python web applications across frameworks, versions, and even versions of python. So it would be nice to have one security framework to be applied regardless of if it were Django, Flask, Bottle, CherryPy, &c.

                                              1. 13

                                                I laughed pretty hard at someone suggesting SHA2 as a remotely reasonable password hashing choice.

                                                1. 4

                                                  SHA2 with PBKDF2? Sure. SHA2 alone? Please no.

                                                  Otherwise the article does make a good point. Personally I think the biggest benefit of the modern HTTP everywhere culture is the widespread proliferation of HTTPS rather than tons of custom protocols.

                                                  1. 8

                                                    SHA2 with PBKDF2?

                                                    PBKDF2-HMAC has some interesting failure modes. Unless you absolutely need FIPS-140 support (because you support USG customers), you probably want to use Argon2id.

                                                    What I took away from this post is that we need more libraries like libsodium, that offer the top-level primitives that programmers actually use, rather than “safe” versions of things they shouldn’t be using. Tink is sorta there, but we really need to move away from all the terrible failure modes that happen when using crypto wrong and towards things that help you make the right choices. Another good example is Fernet from pyca, which makes some opinionated choices about how to use several primitives, but saves programmers from doing the wrong thing.

                                                    1. 1

                                                      PBKDF2 is “if you have to, then ok” type of choice. Use Argon2 or bcrypt if you aren’t forced to use PBKDF2.

                                                    2. 1

                                                      It’s a good hash… for a checksum.

                                                      1. 1

                                                        Why? I mean, it doesn’t have specific hardening to slow down brute-force attacks but is that all?

                                                        1. 1

                                                          Right, it’s not a KDF at all and so trivial to brute force that you might as well be in the clear. It could be a building block in a KDF, but sibling comments have covered that.
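                                                          To illustrate SHA-2 as a building block inside a KDF, here is a minimal sketch using PBKDF2-HMAC-SHA256 from Python's standard library. It uses PBKDF2 only because the standard library has no Argon2; as the thread says, prefer Argon2id or bcrypt when you are free to choose. The iteration count and helper names are illustrative assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # illustrative work factor; tune for your hardware


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Derive a key with PBKDF2-HMAC-SHA256; returns (salt, iterations, key)."""
    salt = os.urandom(16) if salt is None else salt
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, key


def verify_password(password, salt, iterations, expected_key):
    """Recompute the key and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected_key)
```

                                                          The per-password salt and the iteration count are what a bare SHA-256 lacks: each guess costs the attacker the full number of iterations, and precomputed tables do not carry over between users.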

                                                      1. 13

                                                        Copper is the language used to implement that same author’s excellent Code Browser folding text editor.

                                                        Copper’s predecessor language, Zinc has the unusual property of allowing whitespace in identifier names, which makes for a very readable language once you get the hang of it.

                                                        1. 7

                                                          Yes, I found out about it in this thread: https://news.ycombinator.com/item?id=25561688

                                                          It’s pretty hidden! Big piece of work with apparently no source repo, just tarballs.

                                                          Reminds me a little of Virgil, although this language has a source repo and paper. It’s a very significant self-hosted compiler.

                                                          https://github.com/titzer/virgil

                                                          1. 3

                                                            has the unusual property of allowing whitespace in identifier names

                                                            This was big in certain Algol dialects. One of the interesting properties of Algol (and of language experiments like ISWIM) was that the languages were separated from their representations; the ISWIM paper, for example, defines “physical,” “abstract,” and “logical” representations, plus an intermediary representation in what Landin called “applicative expressions.”

                                                            It’s neat because Algol dialects had all sorts of tricks to delineate keywords from identifiers to allow spaces in names or more free-form naming. From some of the ones I’ve seen, there have been dots before keywords (like .INT and .MODE), just capitalizing, special character sets… it was a super fascinating time. It’s also neat to see what languages like PL/I and COBOL allow with just - within reserved names, it’s pretty nice

                                                            1. 3

                                                              I don’t know if this is a SQL standard thing, or a MS SQL Server/T-SQL extension, but in that dialect one can have table- and field names with whitespace as long as they’re enclosed in square brackets.

                                                          1. 1

                                                            This is really interesting; Clint just spoke at a conference I ran (direct link to his time slot, we’ll slice his video out this week too), and it’s interesting to see what the r2c folks are doing

                                                            1. 3

                                                              I have two computer setups for what I call “ascetic computing,” which are minimal interaction machines; they:

                                                              • don’t have slack
                                                              • aren’t logged into lobste.rs or twitter generally
                                                              • aren’t used for banking or other things I need to do
                                                              • generally aren’t logged into anything

                                                              I just use git, whatever compilers I need, and the man pages for the most part. Getting a good workflow with my editor & version control system was pretty big, and being disciplined enough to make sure I pull before starting to code and push after I come back is key, but otherwise it’s pretty natural for me.

                                                              It really reminds me of when I was young, and would code just by man pages and local version control…

                                                              1. 5

                                                                I don’t believe the core point is language-specific at all.

                                                                I disagree with this thesis. In particular, I think that there are some properties of Rust and Haskell which are not universal among languages:

                                                                • I/O can block
                                                                • Exceptions can be inspected by whoever catches them
                                                                • Exceptions can be masked
                                                                • Actions cannot be deferred to a later time

                                                                I’d like to reinspect these claims with a language which forces I/O to be distinct from computation, hides exception information from unauthorized catchers, and has syntax for deferring actions until some later time. Dart, Deno, or Go might be reasonable mainstream languages; I’m going to use Monte, my flavor of E.

                                                                First, since I/O cannot block, how do we read from files? Monte has an unsafe object, makeFileResource, which constructs objects which have a method, .getContents(), which returns a promise for a bytestring. So, putting this together, we have:

                                                                ⛰  def bs := makeFileResource("yes.txt")<-getContents()
                                                                Result: <unsafely-printed promise>
                                                                

                                                                (These examples are at the REPL, which has access to unsafe objects and can print promises too quickly for the runtime. Error handling is not just hard, but fractally hard.) If we wait a moment, and then ask the REPL:

                                                                ⛰  bs
                                                                Result: b`test$\n`
                                                                ⛰  bs.size()
                                                                Result: 5
                                                                

                                                                Then it seems that there is a test file which has some bytes in it. (And a bit of Monte’s philosophy about syntax; that hard-to-see newline is decorated to make it obvious, but it is a plain old \n newline.)

                                                                If you’re Rust-fluent, that .unwrap() may stick out to you like a sore thumb. You know it means “convert any error that occurred into a panic.” And panics are a Bad Thing. It’s not correct error handling.

                                                                Similarly, if one is E-fluent, then the direct usage of this promise sticks out quite badly. We know that it is like, “convert any error which occurred into a broken promise.” And broken promises are a Bad Thing if we try to use them directly; errors aren’t handled. Suppose we open a non-existent file:

                                                                ⛰  def err := makeFileResource("no.txt")<-getContents()
                                                                Result: <unsafely-printed promise>
                                                                ⛰  err
                                                                Result: <ref broken by Couldn't open file fount for /home/simpson/typhon/no.txt: no such file or directory (ENOENT)>
                                                                ⛰  err.size()
                                                                Exception: Couldn't open file fount for /home/simpson/typhon/no.txt: no such file or directory (ENOENT)
                                                                File '/nix/store/8s05yq4h1cx6w9nf20yvigcsbir57hv9-mast-full/prelude/m.mast' 41:30::42:80, in object _$eval:
                                                                  <eval>.evalToPair(m`err.size()`, ["&&null" => <binding <FinalSlot(null)>…)
                                                                …
                                                                

                                                                The REPL is saving us from ourselves on a direct access to a broken promise, mostly for legibility. But when we try to invoke a behavior on the promise by calling a method, then an exception is thrown. (I’ve omitted most of the traceback; you can see that the most recent frame is where the REPL calls eval.)

                                                                Let’s see the beautiful error message generated by my program: … Huh… that’s thoroughly unhelpful.

                                                                Yes. This is the worst part of error handling: Errors must be legible to the computer, so that the computer can recover algorithmically and automatically; but also legible to the programmer, so that the programmer can imagine possible failures and remediations in the mechanisms of the code. And it’s never good enough.

                                                                Finally, since I implicitly claim that Monte’s error handling is better than Rust’s or Haskell’s, let’s see how to implement graceful high-level control-flow for promises, along the lines of the examples with the error-context libraries. These sorts of libraries aren’t possible in Monte due to how modules are isolated, so let’s hope that what’s built-in is sufficient!

                                                                ⛰  def finish(p) { return when (p) -> { `Opened file, got ${p.size()} bytes` }
                                                                …                   catch problem { `Couldn't open file: $problem` } }
                                                                Result: <finish>
                                                                ⛰  def p1 := finish(bs)
                                                                Result: <unsafely-printed promise>
                                                                ⛰  def p2 := finish(err)
                                                                Result: <unsafely-printed promise>
                                                                ⛰  p1
                                                                Result: "Opened file, got 5 bytes"
                                                                ⛰  p2
                                                                Result: "Couldn't open file: Couldn't open file fount for /home/simpson/typhon/no.txt: no such file or directory (ENOENT)"
                                                                

                                                                when blocks are the powerhouse of E. In some later period of computation (some later “turn”), once the given promises are resolved enough, then the first clause runs; however, if the promises break, then the second clause is run. The block evaluates to a promise itself, allowing for promise chaining and promise pipelining.

                                                                I should point out that there are libraries which bring this power to other languages. I have personally used Haskell’s async, and I believe that similar crates are available for Rust. However, their usage is not automatically hygienic, and care is still required to handle the caveats I listed at the beginning.
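
                                                                For readers who don't know E, here is a rough analogue of the finish example above using Python's asyncio. It is only a sketch: awaitables stand in for promises, an exception stands in for a broken promise, and the helpers resolved and broken are hypothetical stand-ins for the two file reads. It does not reproduce Monte's turn or vat semantics.

```python
import asyncio


async def finish(p):
    """Rough analogue of `when (p) -> {...} catch problem {...}`."""
    try:
        data = await p  # runs on once the "promise" has resolved
    except OSError as problem:  # the "broken promise" branch
        return f"Couldn't open file: {problem}"
    return f"Opened file, got {len(data)} bytes"


async def resolved():
    # Hypothetical stand-in for a successful getContents().
    return b"test\n"


async def broken():
    # Hypothetical stand-in for opening a non-existent file.
    raise FileNotFoundError("no.txt: no such file or directory")


async def main():
    # finish() itself returns an awaitable, so results can be chained.
    return await asyncio.gather(finish(resolved()), finish(broken()))


results = asyncio.run(main())
print(results)
```

                                                                As in the Monte version, the error is handled where the value is consumed rather than where it was produced.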

                                                                I’ll continue with my general advice of using your language’s preferred mechanisms for error handling.

                                                                Excellent conclusion. In Monte, use ejectors and catch blocks, try to avoid unsealException or other unsafe exception handlers, and use broken promises to propagate asynchronous errors, and ABCDEF: Always Be Calling Deferred Ejector “FAIL”.

                                                                1. 3

                                                                  I’m going to use Monte, my flavor of E.

                                                                  this is a great post, but excuse me? WHAT? You have a dialect of E floating around? that’s amazing! I’ve experimented with E and such, as well as W7, in my Scheme dialect. It’s really great to see someone experimenting! If you’re curious like me, I think it’s this page

                                                                  1. 3

                                                                    Thanks! Our living documentation is at monte.rtfd.io, and we have #erights and #monte on Freenode. Our progress is very slow; we are hesitant to make wrong steps and add too much authority.

                                                                    Recently there is something of a confluent renaissance in CapTP, the capability transport protocol. Goblins, Agoric, and Monte have all recently demonstrated various flavors of working CapTP, and we are all slowly coalescing around using Cap’n Proto RPC as a common lingua franca.

                                                                    1. 2

                                                                      This is super interesting; I’ve seen CapTP from some client work, as well as Cap’n Proto, but it’s very interesting to see it coalesce as a lingua franca!

                                                                      If someone (e.g. me) wanted to help, where would they get started?

                                                                      1. 2

                                                                        Right now it would be good to both get a sense of the technical portions of CapTP and the current politics, like here and here. Idling in #erights or reading cap-talk will be good for talking to folks.

                                                                        Your main goal should be to find a capability-aware community which fits your needs. There are many different groups, and you shouldn’t feel pressure to find a flavor of E which you like.

                                                                        If you want to contribute to Monte specifically, then read our docs and idle in #monte. There is much to do.

                                                                  2. 3

                                                                    I/O can block

                                                                    GHC Haskell can only have blocking IO when using the FFI. Not saying this assuming you don’t know, but in case the direct comparison to Rust confuses another reader without this clarification. Rust I/O is blocking by default. Haskell I/O is non-blocking by default.

                                                                    1. 2

                                                                      This is an excellent point and helps expose how deep the language-design differences can go. In particular, Monte doesn’t have FFI, and one reason is that FFI would have to be run in a very special way in order to not block anything else. (And, of course, FFI can still break various other runtime invariants at will. It’s simply too powerful to load Somebody Else’s Code into an address space with C linkage and then transfer control to it.)

                                                                      We are also likely nuanced on what blocking means. Distilling an example from Coroutines Reduce Readability, let us consider a Python class which has a transactional behavior; its method meth wants to first enter a critical section, and then call its argument. In the following straight-line Python code, it is possible for our object self to fail its internal invariants precisely when c is a coroutine:

                                                                      def meth(self, c):
                                                                          self._beginCriticalSection()
                                                                          self.status = c()
                                                                          self._endCriticalSection()
                                                                      

                                                                      This same problem can occur in Haskell and Rust. It might not be obvious, but in an extended example in Java, we can see the problem. We call this class of bugs plan interference, and it is possibly the most important thing addressed by E. In Monte, it is not possible for c() to block indefinitely but also for meth to be called again, unless c itself is re-entrant. (Another flavor of E, Joule, works around this caveat by requiring most method invocations to create promises and delay their work, as if everything were wrapped in when-blocks. Joule’s core language is strange and elegant.)
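
                                                                      To make the interleaving concrete, here is a minimal sketch in Python, assuming asyncio as the scheduler and an explicit await where the original snippet has a plain call. The class name Transactional, the violations counter, and the helper slow are all hypothetical; the point is only that a second call to meth can observe the object while the first call is still inside its critical section.

```python
import asyncio


class Transactional:
    """Hypothetical object whose meth assumes it runs without interleaving."""

    def __init__(self):
        self.in_critical_section = False
        self.violations = 0
        self.status = None

    async def meth(self, c):
        if self.in_critical_section:
            # Another plan is already mid-transaction on this object.
            self.violations += 1
        self.in_critical_section = True
        self.status = await c()  # suspension point: other tasks may run here
        self.in_critical_section = False


async def slow():
    await asyncio.sleep(0)  # yield to the event loop exactly once
    return "done"


async def main():
    obj = Transactional()
    # Two plans call meth on the same object concurrently.
    await asyncio.gather(obj.meth(slow), obj.meth(slow))
    return obj.violations


violations = asyncio.run(main())
print(violations)
```

                                                                      With implicit coroutines the suspension point is invisible at the call site, which is what makes this class of bug hard to spot by reading meth alone.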

                                                                  1. 41

                                                                    Something other than “everything is bytes”, for starters. The operating system should provide applications with a standard way of inputting and outputting structured data, be it via pipes, to files, …

                                                                    Also, a standard mechanism for applications to send messages to each other, preferably using the above structured format when passing data around. Seriously, IPC is one of the worst parts of modern OSes today.

                                                                    If we’re going utopic, then the operating system should only run managed code in an abstract VM via the scheduler, which can provide safety beyond what the hardware can. So basically it would be like if your entire operating system is Java and the kernel runs everything inside the JVM. (Just an example, I do not condone writing an operating system in Java).

                                                                    I’m also liking what SerenityOS is doing with the LibCore/LibGfx/LibGui stuff. A “standard” set of stuff seems really cool because you know it will work as long as you’re on SerenityOS. While I’m all for freedom of choice, having a default set of stuff is nice.

                                                                    1. 21

                                                                      The operating system should provide applications with a standard way of inputting and outputting structured data, be it via pipes, to files

                                                                      I’d go so far as to say that processes should be able to share not only data structures, but closures.

                                                                      1. 4

                                                                        This has been tried a few times, it was super interesting. What comes to mind is Obliq, (to some extent) Modula-3, and things like Kali Scheme. Super fascinating work.

                                                                        1. 3

                                                                          Neat! Do you have a use-case in mind for interprocess closures?

                                                                          1. 4

                                                                            To me that sounds like the ultimate way to implement capabilities: a capability is just a procedure which can do certain things, which you can send to another process.

                                                                            1. 5

                                                                              This is one of the main things I had in mind too. In a language like Lua where closure environments are first-class, it’s a lot easier to build that kind of thing from scratch. I did this in a recent game I made where the in-game UI has access to a repl that lets you reconfigure the controls/HUD and stuff but doesn’t let you rewrite core game data: https://git.sr.ht/~technomancy/tremendous-quest-iv
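
                                                                              As a small illustration of “a capability is just a procedure”, here is a sketch in Python, where make_counter, increment, and read are hypothetical names. Handing someone only read grants them read access and nothing else. Note that Python does not actually enforce this (closures can be inspected through attributes such as __closure__), so this shows the shape of the idea rather than a secure implementation.

```python
def make_counter():
    """Return two separate capabilities over one hidden piece of state."""
    count = 0

    def increment():
        # Capability to mutate the counter.
        nonlocal count
        count += 1
        return count

    def read():
        # Capability to observe the counter, with no way to change it.
        return count

    return increment, read


increment, read = make_counter()
increment()
increment()

# An untrusted component would be given only `read`.
observed = read()
print(observed)
```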

                                                                          2. 1

                                                                            I would be interested in seeing how the problem with CPU time stealing and DoS attacks that would arise from that could be solved.

                                                                          3. 17

                                                                            Digging into IPC a bit, I feel like Windows actually had some good stuff to say on the matter.

                                                                            I think the design space looks something like:

                                                                            • Messages vs streams (here is a cat picture vs here is a continuing generated sequence of cat pictures)
                                                                            • Broadcast messages vs narrowcast messages (notify another app vs notify all apps)
                                                                            • Known format vs unknown pile of bytes (the blob i’m giving you is an image/png versus lol i dunno here’s the size of the bytes and the blob, good luck!)
                                                                            • Cancellable/TTL vs not (if this message is not handled by this time, don’t deliver it)
                                                                            • Small messages versus big messages (here is a thumbnail of a cat versus the digitized CAT scan of a cat)

                                                                            I’m sure there are other axes, but that’s maybe a starting point. Also, fuck POSIX signals. Not in my OS.

                                                                            1. 5

                                                                              Is a video of cats playing a message or a stream? Does it matter whether it’s 2mb or 2gb (or whether the goal is to display one frame at a time vs to copy the file somewhere)?

                                                                              1. 2

                                                                                It would likely depend on the reason the data is being transferred. Video pretty much always fits into the ‘streaming’ category if it’s going to be decoded and played, as the encoding allows for parts of a file to be decoded independent of the other parts. Messages are for atomic chunks of data that only make sense when they’re complete. Transferring whole files over a message bus is probably a bad idea though; you’d likely want to instead pass a message that says “here’s a path to a file and some metadata, do what you want with it” and have the permissions model plug into the message bus so that applications can have temporary r/rw access to the file in question. Optionally, if you have a filesystem that supports COW and deduplication, you can efficiently and transparently copy the file for the other application’s use and it can do whatever it wants with it without affecting the “original”.

                                                                                1. 5

                                                                                  Which is why copy&paste is implemented the way it is!

                                                                                  Many people don’t realize but it’s not actually just some storage buffer. As long as the program is running when you try to paste something the two programs can talk to each other and negotiate the format they want.

                                                                                  That is why people sometimes have odd bugs on Linux where the clipboard disappears when a program ends, or why PowerPoint sometimes asks you if you want to keep your large clipboard content when you try to exit.
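
                                                                                  The negotiation described above can be sketched roughly as follows. This is not any real clipboard API: ClipboardOwner, offered_formats, render, and paste are hypothetical names, and the format strings are just illustrative MIME types. The copying program keeps the data and only renders it once the pasting program has picked a format both sides understand.

```python
import html


class ClipboardOwner:
    """Hypothetical source application: keeps the data, renders on request."""

    def __init__(self, text):
        self._text = text

    def offered_formats(self):
        return ["text/html", "text/plain"]

    def render(self, fmt):
        if fmt == "text/plain":
            return self._text
        if fmt == "text/html":
            return f"<p>{html.escape(self._text)}</p>"
        raise ValueError(f"unsupported format: {fmt}")


def paste(owner, accepted):
    """Pick the first format the target accepts that the owner offers."""
    for fmt in accepted:
        if fmt in owner.offered_formats():
            return fmt, owner.render(fmt)
    raise LookupError("no common format")


owner = ClipboardOwner("cats & dogs")
pasted = paste(owner, ["image/png", "text/html"])
print(pasted)
```

                                                                                  If the owner exits before paste is called, there is nobody left to call render on, which matches the disappearing-clipboard behavior.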

                                                                            2. 13

                                                                              Something other than “everything is bytes”, for starters. The operating system should provide applications with a standard way of inputting and outputting structured data, be it via pipes, to files, …

                                                                              It’s a shame I can agree only once.

                                                                              Things like Records Management Services, ARexx, Messages and Ports on Amiga or OpenVMS’ Mailboxes (to say nothing of QIO), and the data structures of shared libraries on Amiga…

                                                                              Also, the fact that things like Poplog (which is an operating environment for a few different languages but allows cross-language calls), OpenVMS’s common language environment, or even UCSD p-System aren’t more popular is sad to me.

                                                                              Honestly, I’ve thought about this a few times, and I’d love something that is:

                                                                              • an information utility like Multics
                                                                              • secure like seL4 and Multics
                                                                              • specified like seL4
                                                                              • distributed like Plan9/CLive
                                                                              • with rich libraries, ports, and plumbing rules
                                                                              • and separated like Qubes
                                                                              • with a virtual machine that is easy to inspect like LispM’s OSes, but easy to lock down like Bitfrost on One Laptop per Child…

                                                                              a man can dream.

                                                                              1. 7

                                                                                Something other than “everything is bytes”, for starters. The operating system should provide applications with a standard way of inputting and outputting structured data

                                                                                have you tried powershell

                                                                                1. 4

                                                                                  or https://www.nushell.sh/ for that matter

                                                                                2. 4

                                                                                  In many ways you can’t even remove the *shells from current OSes; their IPC is so b0rked.

                                                                                  How can a shell communicate with a program it’s trying to invoke? Array of strings for options and a global key value dictionary of strings for environment variables.

                                                                                  Awful.

                                                                                  It should be able to introspect to find out the schema for the options (what options are available, what types they are…)

                                                                                  Environment variables are a reliability nightmare. Essentially hidden globals everywhere.

                                                                                  Pipes? The data is structured, but what is the schema? I can pipe this to that, does it fit? Does it make sense….? Can I b0rk your adhoc parser of input, sure I can, you scratched it together in half a day assuming only friendly inputs.

                                                                                  In many ways IPC is step zero to figure out, with all the ad hoc option parsers and ad hoc stdin/stdout parsers and formatters replaced by ones that are secure, robust and part of the OS.

                                                                                  1. 3

                                                                                    I agree wholeheartedly with the first part of your comment. But then there is this:

                                                                                    If we’re going utopic, then the operating system should only run managed code in an abstract VM via the scheduler, which can provide safety beyond what the hardware can.

                                                                                    What sort of safety can a managed language provide from the point of view of an operating system compared to the usual abstraction of processes (virtual memory and preemptive scheduling) combined with thoughtful design of how you give programs access to resources? When something goes wrong in Java, the program may either get into a state that violates preconditions assumed by the authors or an exception will terminate some superset of erroneous computation. When something goes wrong in a process in a system with virtual memory, again the program may reach a state violating preconditions assumed by the authors, or it may trigger a hardware exception, handled by the OS, which may terminate the program or inform it about the fault. Generally, it all gets contained within the process. The key difference is, with a managed language you seem to be sacrificing performance for an illusory feeling of safety.

                                                                                    There are of course other ways programs may violate safety, but that has more to do with how you give them access to resources such as special hardware components, filesystem, operating system services, etc. Nothing that can be fixed by going away from native code.

                                                                                    No-brakes programming languages like C may be a pain for the author of the program and there is a good reason to switch away from them to something safer, in order to write more reliable software. But a language runtime can’t protect an operating system any more than the abstractions that make up a process, which are a lot more efficient. There are of course things like Spectre and Meltdown, but those are hardware bugs. Those bugs should be fixed, not papered over by another layer, lurking at the bottom.

                                                                                    Software and hardware need to be considered together, as they together form a system. Ironically, I may conclude this comment with an Alan Kay quote:

                                                                                    People who are really serious about software should make their own hardware.

                                                                                  1. 5

                                                                                    It’s impressive how much StandardML has resisted extensions over the years, especially in a world where people believe that adding a feature improves a language.

                                                                                    I’ll always be fond of ML and ML-like languages (except Ocaml and Reason).

                                                                                    1. 3

                                                                                      SML/NJ (probably the most used, most popular) does have a bunch of extensions and MLton has adopted some of them.

                                                                                      But there’s nothing like camlp4.

                                                                                      1. 2

                                                                                        how much StandardML has resisted extensions over the years

                                                                                        This is discussed in section 9.3 of the HOPL IV paper from this year; Milner and Tofte expressly stated in a mailing list post from 2001 that there would be no further revisions to the Definition, and that any future work would not be “Standard ML”.

                                                                                        An amusing (but entirely relatable!) quote from an interview with Milner in 2010 suggests that even from the early days, there was a desire to prevent too much meddling:

                                                                                        By the way we called it `Standard’ ML to prevent some organisation stepping in to standardise it.

                                                                                        1. 1

                                                                                          Look up the Successor ML group on GitHub. Many existing SML compiler authors are definitely interested in a next version or direct descendant.

                                                                                        2. 2

                                                                                          I’ll always be fond of ML and ML-like languages (except Ocaml and Reason).

                                                                                          Why the exclusion?

                                                                                          1. 1

                                                                                            Lack of taste.

                                                                                            1. 3

                                                                                              You could also call that “pragmatism”. Look at the respective size of communities :-)

                                                                                              I find OCaml mostly tasteful. Compared to SML it has better pattern matching and applicative functors, for example.

                                                                                              1. 3

                                                                                                I have to agree; as much as I like SML, it’s not always the easiest to work with. iirc, Okasaki even mentions this in some commentary about Purely Functional Data Structures, that he had to flub some of the SML code because it wasn’t working the way he needed it to, and Haskell was easier to work with.

                                                                                                Personally, I prefer F# to OCaml proper, but either is probably fine. There’s definitely a siren’s song for the simplicity that SML provides, and ML for the Working Programmer was definitely a foundational book in my thinking.

                                                                                        1. 2

                                                                                          Interesting post, and really neat to see so much F#!

                                                                                          I’d probably recommend adding the tags +ml and +dotnet as well.

                                                                                          Wrt the crypto, I’d probably recommend Argon2id over BCrypt, as it is more modern and more resistant to attack than BCrypt itself, and I’d definitely avoid RSA as much as possible. Also, many voting systems follow NIST 800 series, NIST 1800 series, and things like NIST IR 7711 and the various UOCAVA studies. Because of this, you may be limited to things like PBKDF2 and such as well.
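
                                                                                          For reference, here is a minimal sketch of the PBKDF2 fallback using only Python’s standard library (the post itself is in F#, and Argon2id would need a third-party package, so it is not shown). The function names, the salt size, and the iteration count are illustrative assumptions, not recommendations; check current guidance before picking parameters.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes, iterations: int) -> bytes:
    """Derive a 32-byte key from the password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def verify_password(password: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    """Recompute the hash and compare in constant time."""
    candidate = hash_password(password, salt, iterations)
    return hmac.compare_digest(candidate, expected)


# Illustrative parameters only (assumptions, not guidance).
ITERATIONS = 100_000
salt = os.urandom(16)

stored = hash_password("correct horse", salt, ITERATIONS)
ok = verify_password("correct horse", salt, ITERATIONS, stored)
bad = verify_password("wrong guess", salt, ITERATIONS, stored)
print(ok, bad)
```

                                                                                          The salt and iteration count have to be stored next to the hash so that verification can repeat the same derivation.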

                                                                                          Additionally, these are things we discussed in the Voatz Security Review and Threat Model. It’s definitely an interesting space, I wish we could see more innovation in it.

                                                                                          Also, if you’re interested in this space too, looking at how the Swiss Post voting system was broken is fascinating, as well as things like the Moscow Parliamentary elections flub with ElGamal

                                                                                          1. 2

                                                                                            Neat. Great stuff here.

                                                                                            15-20 years ago, I coded a secure voting system for the Board of Governors of the Federal Reserve. You know, the guys that set interest rates. Back then it was all bubblegum and baling wire. But hell, it was better than Lotus Notes, which it replaced.

                                                                                            I appreciate this update. If I continue to play with the idea I’ll look into this information.

                                                                                            I’m a conceptual/intuitive thinker. That works great for some things and not others. For stuff like crypto, it pays to be the other way, both detail-oriented and methodical. There are a lot of really cool technical folks out there that are working in this space, mostly because of crypto-currencies. Things like information leakage or SIGINT can come at you from so many dimensions that it is impossible to track it all, even conceptually. The old joke is why go to all of the trouble to hack a guy’s system when you can just stick a cam in his office?

                                                                                            Frankly I love this space. I also love the AI/ML/GAN stuff I’m seeing. So many areas of cool tech and so little time!

                                                                                            1. 2

                                                                                              It’s an extremely interesting space! The thing that gets me is that most of it is focused on blockchain, which I don’t think is the correct direction; it’s more the fundamentals of cryptography, provenance, authenticity, confidentiality, integrity, &c. that are the real killer features needed.

                                                                                              There are a lot of really cool technical folks out there that are working in this space, mostly because of crypto-currencies.

                                                                                              Spot on, and it’s more than just cryptography too! As much as I dislike the cryptocurrency space, there have been huge inroads in Program Analysis, Cryptography, &c that have been paid for by the likes of Bitcoin, Ethereum, and so on.

                                                                                              Things like information leakage or SIGINT can come at you from so many dimensions that it is impossible to track it all, even conceptually. The old joke is why go to all of the trouble to hack a guy’s system when you can just stick a cam in his office?

                                                                                              exactly right; it’s like that old XKCD about what cypherpunks want to happen, versus what will happen.

                                                                                              Frankly I love this space. I also love the AI/ML/GAN stuff I’m seeing. So many areas of cool tech and so little time!

                                                                                              There’s also some interesting inroads that tie AML/KYC to ML and what not. There’s a lot of fascinating things going on here…

                                                                                              1. 2

                                                                                                I completely agree, and I agree that there’s some crossover here.

                                                                                                When I think of the things I would like to do in this space, I end up thinking about stuff like code-to-hardware, where the code and hardware are both digitally signed and only do the one thing the coder programmed them to do. To me, this guarantee of provenance and behavior is what’s missing in modern computing. I have no idea what the hell my phone or PC are doing, and that’s way f*cked up. We’ve somehow gotten upside-down in relationship to our technology. Crypto has a big place at the table in fixing that.

                                                                                                But I also wanted to be an astronaut, master Kung Fu, and become a secret agent. Conceptually I can understand many things, and I can even roughly spec it out and work on accomplishing them, but there’s no way around investing the money and doing the hard work, even if it’s possible to get there from here. (And it might not be. Most things like this you don’t understand until you do them. As we marines say, we learn by engaging with the enemy, not planning)

                                                                                                I’ve seen a lot of money coming from various folks to make progress here, but in the end it all kind of boils down to “Here’s yet another app to identify hot dogs, only it uses our tech, it’s a dapp and it works zero trust!”

                                                                                                I ain’t got no time for that stuff. Life’s too short.

                                                                                          1. 23

                                                                                            the probability that changeme is actually valid base64 encoding must be very low

                                                                                            On the contrary, any 8 char alphanumeric string is valid base64. All such strings with a length divisible by 4, in fact, as those strings are guaranteed not to need trailing padding.
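                                                                                            A quick way to check this, as a minimal sketch using Python's standard `base64` module (the helper name `is_unpadded_alnum_base64` is made up for illustration):

```python
import base64
import string

# Letters and digits are all part of the standard base64 alphabet.
BASE64_LETTERS_AND_DIGITS = set(string.ascii_letters + string.digits)


def is_unpadded_alnum_base64(s: str) -> bool:
    """True when s is alphanumeric and its length is a multiple of 4,
    i.e. it decodes as base64 without needing any '=' padding."""
    return len(s) % 4 == 0 and set(s) <= BASE64_LETTERS_AND_DIGITS


placeholder = "changeme"

# validate=True makes b64decode raise binascii.Error on characters
# outside the alphabet instead of silently skipping them.
decoded = base64.b64decode(placeholder, validate=True)

# 8 base64 characters carry exactly 6 bytes, so re-encoding gives the
# original string back.
roundtrip = base64.b64encode(decoded).decode("ascii")
```

                                                                                            So a "does it decode?" check cannot tell a placeholder like changeme apart from a real secret.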

                                                                                            1. 3

                                                                                              I remember with distinct horror the first time I realized that rot13(base64(data)) is also valid base64, and that there’s probably a developer out there that was using this…
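                                                                                              A small sketch of why that holds, using only Python's standard library (the function name `rot13_of_base64` is made up here): rot13 maps letters to letters and leaves digits, `+`, `/` and `=` alone, so the output never leaves the base64 alphabet.

```python
import base64
import codecs


def rot13_of_base64(data: bytes) -> str:
    """Base64-encode data, then apply rot13 to the resulting text."""
    encoded = base64.b64encode(data).decode("ascii")
    return codecs.encode(encoded, "rot_13")


original = b"secret"  # 6 bytes, so the encoding has no '=' padding
scrambled = rot13_of_base64(original)

# The scrambled text still passes strict base64 validation; it simply
# decodes to different bytes than the original.
still_decodes = base64.b64decode(scrambled, validate=True)

# rot13 is its own inverse, so "unscrambling" is one more call.
recovered = base64.b64decode(codecs.encode(scrambled, "rot_13"))
```

                                                                                              Nothing about the scrambled string signals that a second layer was applied, which is what makes it tempting and useless as obfuscation.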

                                                                                            1. 3

                                                                                              I wonder what was used to visualize the SQL statements in there, that’s pretty sweet!

                                                                                              edit Thinking about it further, I’ve never written a malicious application for a tester to run. I have done the following during CTFs and such tho:

                                                                                              • wrote a middleware for Flask that exploited the API Shutdown function in ZAP
                                                                                              • given teams vulnerable routers and then exploited them one by one
                                                                                              • hosted back doors & other unknown vulns in applications that I could exploit as part of either a white or black team
                                                                                              • run while true; do killall -9 $server; sleep 3; done on the host via various mechanisms
                                                                                              • written a programming language and then inserted a backdoor/bug/key into its prelude
                                                                                              1. 8

                                                                                                If your DB exists inside one org - sure. If your DB is shared between multiple orgs and you want to concentrate on tech, not on legal… Maybe time-series database won’t cut it.

                                                                                                1. 10

                                                                                                  What attack, exactly, do you fear that is allowed by giving your partner orgs append-only access to a DB, but is not possible by giving them append access to your blockchain?

                                                                                                  I note that the article explicitly addresses the ‘partners who don’t trust each other’ use case.

                                                                                                  For example, a private ledger allows data to be shared and seen by different parties who do not need to trust each other because of the software’s rules – this could be a bank and a regulator sharing data access to customer trades data. I would argue that such data access could be done via the business logic layer sitting on top of the database layer to provide data to outside parties.

                                                                                                  1. 7

                                                                                                    This

                                                                                                    ‘partners who don’t trust each other’

                                                                                                    is not compatible with this:

                                                                                                    giving your partner orgs append-only access to a DB

                                                                                                    Because it requires your partner to trust that you do not do funny stuff on DB yourself. Simplest attack - place a buy before appending a large incoming buy order, and place a sell just after it. Free money. Happens on public blockchains all the time.
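                                                                                                    To put rough numbers on that buy/sell sandwich, here is a toy sketch. The linear price-impact model and every parameter name in it are assumptions made for illustration, not a model of any real exchange or chain.

```python
def buy_cost(price: float, impact: float, qty: float) -> float:
    """Cost of buying qty units when each unit bought pushes the
    price up by `impact` (toy linear model, an assumption)."""
    return qty * price + impact * qty * qty / 2


def sell_proceeds(price: float, impact: float, qty: float) -> float:
    """Proceeds of selling qty units when each unit sold pushes the
    price down by `impact`."""
    return qty * price - impact * qty * qty / 2


def sandwich_profit(start_price: float, impact: float,
                    own_qty: float, victim_qty: float) -> float:
    """Operator buys first, lets the victim's large buy move the
    price, then sells the same quantity straight after it."""
    cost = buy_cost(start_price, impact, own_qty)
    price_after_victim = start_price + impact * (own_qty + victim_qty)
    return sell_proceeds(price_after_victim, impact, own_qty) - cost


# Operator controls ordering and wraps a 50-unit incoming buy.
profit = sandwich_profit(start_price=100.0, impact=1.0,
                         own_qty=10.0, victim_qty=50.0)

# With no victim order in between, the same round trip earns nothing.
baseline = sandwich_profit(start_price=100.0, impact=1.0,
                           own_qty=10.0, victim_qty=0.0)
```

                                                                                                    In this toy model the profit works out to impact × own_qty × victim_qty, so it exists only because whoever orders the writes can see the large order before it lands.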

                                                                                                    BTW, this article is a dumpster fire. It is full of false claims, half-truths and just irrelevant bullshit. The guy who wrote it knows his time-series databases and knows almost nothing about blockchain design space.

                                                                                                    Blocks are added to the blockchain at regular time intervals. For each block of data, there is an associated timestamp.

                                                                                                    Yes. Trains also arrive at different time points. Is Amtrak a blockchain? How is this even relevant?

                                                                                                    Data replication: Each node in the blockchain holds the entire history of transactions. If one node is compromised, we rely on the others to provide the full history. Again, this concept has been in effect for decades with traditional databases: if one database fails, we may want another as a backup.

                                                                                                    False. There are multiple types of nodes. There are chains where history can be truncated - chains with snapshots. Reason for nodes to have the full state is an ability to validate every state transition. Data availability is an important concern, but it is secondary since the need to share some data can be removed with the help of zksnarks / bulletproofs.

                                                                                                    full history of all individual transactions ordered by time; this is how blockchain nodes work

                                                                                                    No, this is not how blockchains work. They implement total global ordering of events. But the order is logical, not time based. E.g. both Bitcoin and Ethereum include transactions in the order defined by the fee paid - from txes paying high fees, to txes paying low fees. Total global order of transactions plus deterministic execution equals ability to arrive at the same final state. It has very little to do with time.
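                                                                                                    A minimal sketch of that last point, with a made-up `Tx` type and a toy balance map. Real chains leave the final order to the block producer and to consensus; this only illustrates that an agreed logical order plus deterministic execution gives every node the same state, whatever order the transactions arrived in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    sender: str
    receiver: str
    amount: int
    fee: int


def order_block(txs):
    """Logical order: highest fee first, ties broken on the
    transaction's own fields. Arrival time plays no part."""
    return sorted(txs, key=lambda t: (-t.fee, t.sender, t.receiver, t.amount))


def apply_block(balances, ordered_txs):
    """Deterministic execution: skip any tx the sender cannot afford.
    Fees are simply burned in this toy model."""
    state = dict(balances)
    for tx in ordered_txs:
        total = tx.amount + tx.fee
        if state.get(tx.sender, 0) >= total:
            state[tx.sender] -= total
            state[tx.receiver] = state.get(tx.receiver, 0) + tx.amount
    return state


genesis = {"alice": 10, "bob": 0}
to_bob = Tx("alice", "bob", amount=8, fee=1)
to_carol = Tx("alice", "carol", amount=8, fee=2)  # conflicts with to_bob

# The two nodes hear about the transactions in different orders.
node_a_mempool = [to_bob, to_carol]
node_b_mempool = [to_carol, to_bob]

# Same logical order + same execution rules = same final state.
state_a = apply_block(genesis, order_block(node_a_mempool))
state_b = apply_block(genesis, order_block(node_b_mempool))

# Applying in local arrival order instead makes the nodes disagree.
arrival_a = apply_block(genesis, node_a_mempool)
arrival_b = apply_block(genesis, node_b_mempool)
```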

                                                                                                    Blockchains would have multiple parties (i.e., nodes) to agree for a specific transaction. There are consensus algorithms such as Raft and Paxos in traditional databases akin to a voting mechanism.

                                                                                                    This bit is just lazy. Consensus determines the order of inclusion of transactions. Nothing more.

                                                                                                    Long 256 format: This is the format of crypto public addresses. At QuestDB we have built a data type that is better than a string to efficiently write and read Long 256 blockchain addresses.

                                                                                                    Irrelevant. This is a wrong layer. See Tendermint. It provides a total ordering on a set of binary blobs. Binary blobs are untyped, and Tendermint knows nothing about details of business logic of the application it is running.

                                                                                                  2. 8

                                                                                                    You are not avoiding any “legal”.

                                                                                                    This reminds me of how a bunch of blockchain peddlers made their way into a meeting of a municipal IT steering committee, trying to sell their “block-chain based authentication mechanism”. They had not read eIDAS and GDPR (nor the related local laws), had not read the public service information systems law or the cybersecurity law, and showed that they had zero understanding of the fact that if the public administration screws up proving someone’s identity and that someone is harmed by it, the administration is held liable.

                                                                                                    It is several times easier to write a contract between a couple of parties detailing how you share data and who is responsible for what, adjust your respective privacy policies and use a couple of regular databases than “putting it all onto a blockchain”, potentially being liable for publishing personal information that cannot ever be deleted.

                                                                                                    And frankly, I am pretty confident writing small-scale contracts myself, without being a lawyer. Continental Law tends to provide mostly sane defaults (to a point it is safe to sign most of the common contracts without any additional clauses apart from the mandatory ones) and since overly unfair contracts are invalid and courts are expected to read into what the signatories meant, you only need lawyers in high-risk situations or once something blows up.

                                                                                                    And if you need it to scale, just use a contract of adhesion (terms of service) with one organization acting as a steward. If you need it to be neutral, an association is both the simplest of corporations to fund and also the simplest to administer (almost no accounting), providing democratic (one member = one vote) decision-making by default.

                                                                                                    1. 6

                                                                                                      How do you realistically prevent a 51% attack though…?

                                                                                                      1. 8

                                                                                                        What you decide to do will depend a lot on the specifics of your use-case. You might decide to run your own proof of authority blockchain, or to run some other BFT protocol such as HotStuff. You also might communicate using a public blockchain. However, what you must not do is run your own proof of work blockchain. Your comment correctly identifies one of the reasons why doing so is a bad idea.

                                                                                                        1. 3

                                                                                                          WoT?

                                                                                                          1. 4

                                                                                                            Web of Trust style, ala Proof of Authority, doesn’t actually handle adversarial consensus, only who can publish to the blockchain, so often you’ll see another consensus algorithm underneath that layer, like IBFT, Raft, what-have-you, if you’re concerned with adversaries within the trusted nodes

                                                                                                            1. 2

                                                                                                              Thank you, that’s helpful.

                                                                                                              1. 2

                                                                                                                of course! It’s a really interesting space, and there’s lots of nuances to it all. We obviously deal with it a lot at work, more than other folks may in their day to day

                                                                                                          2. 2

                                                                                                            Enterprise organisations are also vulnerable to a 51% attack. You can buy 51% of the shares, then burn the whole thing to the ground.

                                                                                                          3. 4

                                                                                                            Yeah this is my understanding of the value of enterprise blockchains. It’s more about getting diverse organisations to run a database together in an interoperable manner without one of them becoming the “owner” of the database or similar. I’ve never actually worked in such a large organisation so I have no idea if this rings true.

                                                                                                            1. 4

                                                                                                              This is generally the case; my company reviews blockchains quite frequently, as well as their installations, and we’ve seen this comment often. Having said that, I haven’t seen as much success from that sort of thing; very often it’s a pilot, and doesn’t go much further than that.