Threads for lojikil

  1. 12

    PSA: before proposing your favorite language as a better alternative, make sure to read the previous story to avoid rehashing the same arguments. Thanks :)

    1. 2

      hard agree with this comment, and also it would be great if we could apply the ML tag

      1. 2

        you can use the suggest link under a submission to change which tags are applied to it

        1. 2

          I never provide feedback to posts with enough frequency to remember this, so thank you! will comply

          edit: actually I don’t have a suggest link for this one…

    1. 2

      FWIW Python 2.7 is pretty much frozen in time and is a very capable language …

      SML is nice but I haven’t seen many (any?) big or common programs written in it. One reason may be that a language isn’t a sufficient foundation for useful programs (100 year or otherwise) – you also need a definition of an operating system.

      On the other hand, I’m sure there are many >100K and >1M line codebases still running on Python 2.7 (mostly inside companies or big organizations).


      Also it feels like this 2003 essay deserves a mention: http://www.paulgraham.com/hundred.html

      In fact I remember Larry Wall saying it was partly an inspiration for Perl 6 (now Raku).

      And the two languages from its author: Arc and Bel

      http://arclanguage.org/

      http://www.paulgraham.com/bel.html (apparently defined in a single big text file: https://sep.yimg.com/ty/cdn/paulgraham/bellanguage.txt?t=1595850613&)

      I’ve never used them but the design motivation is related. They’re trying to “define away” most of the implementation details of Scheme, i.e. bootstrap the language from a smaller set of axioms. So presumably they won’t change in 100 years.

      But I think the real reason that something won’t change is because a lot of working software depends on it. There are large portions of Unix and the web that will never change.

      This is mostly food for thought… If you are having fun with SML, then that is what is important :-) But it seems like there is a lot of personal taste involved. Invoking the “Lindy Effect”, I’d choose Unix / C / Shell / Fortran / Lisp / SQL, since those languages are extremely old and still widely used.

      So it seems like there are 2 directions here:

      1. Finding a language IMPLEMENTATION that has not changed very much and is likely to exist in the future. That’s why I pointed to Python 2.7

      2. Building your own language (like Arc or Bel, though of course it doesn’t have to be a Lisp). I feel like if you take the 100 year goal very seriously it will eventually lead to this :-)

      1. 1

        SML is nice but I haven’t seen many (any?) big or common programs written in it.

        Several large provers are written in it (like HOL4, Twelf, and Isabelle, which admittedly I am most familiar with), and it’s used in that space in industry (as a prover implementation). When MLWorks (related to LispWorks and other Harlequin products) was new, it was used in that space as well as in financial modeling, where nowadays we see Haskell, OCaml, Scala, & F# used. There are definitely many large SML code bases that will be floating around for quite some time to come.

      1. 3

        I always found the chronicle of this company as told in Hackers: Heroes of the Computer Revolution to be riveting and insightful.

        1. 3

          I actually just bought “Sierra Adventure” and “Not all fairy tales have a happy ending,” both about Sierra On-line; everything I’ve heard & seen about the company is fascinating

        1. 3

          Thank you for writing this up, there’s an excellent amount of detail here. The proliferation of siloed chat protocols has been one of my pet peeves and has definitely (thus far) been heading in the wrong direction; as bad as the proprietary protocols were back in the aughts, they were at least neutrally interoperable in their heyday – these days most companies are downright hostile in how they enforce their ToS when it comes to third-party clients etc.

          I’m hoping the relatively smaller ecosystems (e.g. Discord) take note and at least loosen their ToS to allow for calling user APIs without fear of a perma-ban.

          A couple of additional questions: is it known if companies might attempt to limit the exposure of their APIs to EU markets only, or does the DMA cover that explicitly? Is the DMA a pre-requisite for fully scaling out use of Matrix Bridging Services – i.e. does the interoperability climate pre-DMA preclude you from offering bridging as a commercial service?

          1. 3

            Thanks for the positive feedback :)

            is it known if companies might attempt to limit the exposure of their APIs to EU markets only, or does the DMA cover that explicitly?

            I don’t believe the DMA covers that explicitly, but IANAL. Much like some sites decided to cut off EU traffic rather than implement GDPR, I guess it’s possible that the gatekeepers might only offer open APIs to EU IP addresses - but it feels like the negative PR of doing so (and the theatre of doing so, given how easy it is to get an EU IP) would not be worth it.

            Is the DMA a pre-requisite for fully scaling out use of Matrix Bridging Services – i.e. does the interoperability climate pre-DMA preclude you from offering bridging as a commercial service?

            Any kind of bridging to a closed service from Matrix (or XMPP) is pretty miserable today, given you have to do adversarial interoperability, which massively reduces the interest in building bridges or relying on them. So yes, DMA would be transformative for bridging and interop in general :)

            1. 1

              So yes, DMA would be transformative for bridging and interop in general :)

              How much of this do you suspect will be bridges for alternative open protocols vs alternative clients? Also, how do you foresee abuse/spam issues being handled?

          1. 4

            This is a fine explanation of baby-free-monads and how they are analogous to Haskell IO. I do think it should be pointed out though that this is only one way to accomplish IO from a purely functional system. Albeit a very popular and useful one

            1. 4

              A baby-free-monad sounds like a monad that doesn’t have any babies inside :)

              1. 1

                (avoid success) at all costs vs avoid (success at all costs) => (baby-free)-monad vs baby-(free-monad)

              2. 3

                What are some examples of other ways to accomplish IO in a purely functional system?

                1. 4

                  I think the three big ones that I’m familiar with are:

                  I don’t know of too many examples of the second two, but the first one has gained quite a bit of popularity in the last few years.

              1. 5

                Ah ok I guess the title says what it literally is but I was hoping it compiled to Go or something. It’s just written in Go and generates code with LLVM. No Go interop at all. Slightly disappointing (only because I’m predicting the next hit language to be an ML on the Go ecosystem a la F# and Scala) but an awesome compiler project nonetheless.

                1. 2

                  I would also love to see an ML derivative that compiles to and interoperates with go. I feel like that would be the perfect language for me.

                  1. 2

                    only because I’m predicting the next hit language to be an ML on the Go ecosystem a la F# and Scala

                    I agree with this, I think Go is a useful target; I may be influenced by the fact that I’m not much of a fan of writing Go myself, but I do still think it’s a useful target. I spent a fair amount of time getting carML to output Go nicely, and it’ll be something I target in coastML for that reason as well.

                    1. 1

                      I’ve had this idea for quite some time, but not a ton of motivation. I feel like this is so close that I should be jumping up and down to do it… but, time. Probably won’t be me.

                    1. 2

                      This is also the blog post I should have written when I switched from working on Digamma (and its many implementations) to carML itself as well.

                      I really wanted to not get bogged down in little one-off lexer issues and just focus on what I was parsing; generally I hand roll my lexers and spend a ton of time fixing minor issues that result in major mis-parses. So with coastML, I just wanted to use the simplest things possible, and keep the code relatively approachable whilst still fixing issues that cropped up right away (I probably should add all of this to that blog post).

                      I also didn’t want to get bogged down in lots of parsing things and cool language features; honestly, I want to work on other stuff besides just the compiler (such as analysis tools). I think I have the right balance here, esp once I can generate more compiler code from coastML than from Python (or Go, C, Java, what-have-you).

                      1. 35

                        Why did GitHub remove his account/projects?

                        1. 44

                          That’s the part that bothers me.

                          I understand it wasn’t a nice thing to do, and that people are upset, but it’s his own code in his own repos. He even announced ahead of time he would do something like this, so “buyer” beware.

                          I realize GitHub TOS covers them to remove accounts and repos at their discretion, but it’s a little unsettling that they’ll actually do so arbitrarily without a clear TOS violation. It might be time I move everything to Sourcehut and treat GitHub as a mirror…

                          1. 24

                            It might be time I move everything to Sourcehut…

                            The Sourcehut guy has always seemed a little unstable to me (didn’t he get banned from this site, in fact?) So, why would I trust him any more than I trust GitHub?

                            1. 33

                              I banned him and I would not call him unstable. Not just because that kind of insult is inappropriate here, but because it obviously doesn’t apply. He writes inflammatory hyperbole that’s not a good fit for this site, but he’s a skilled, accomplished professional who looks like he’s seeing a lot of success in his life.

                              1. 11

                                I didn’t mean to insult him. Maybe “erratic” would have been a better word without any mental health connotations (which I absolutely didn’t intend)? Poor word choice on my part, I’m sorry for that.

                                …but he’s a skilled, accomplished professional who looks like he’s seeing a lot of success in his life.

                                Sure, same goes for the GitHub guys. A person who can’t tone it down enough to keep a Lobsters account just isn’t someone I feel I can trust to host my code, particularly given that he’s in charge of the whole operation. Obviously everyone is free to decide who to trust and for what reasons.

                                1. 9

                                  A person who can’t tone it down enough to keep a Lobsters account just isn’t someone I feel I can trust to host my code

                                  Bear in mind, Linus Torvalds would also probably have been banned from here multiple times in the past.

                                  I’d be perfectly happy to trust someone that volatile a lot (and I guess I do, running Linux since 1996 :) ). But I would be careful which groups and forums I invited them to :)

                                  1. 2

                                    …I guess I do, running Linux since 1996

                                    Very different, at least to me. If Linux was a service, control would have been taken away from Linus a long time ago (I mean, as it is they made him step back for a while to work on his issues). But it’s not, it’s just code that other people then build into something, often applying patches in the process. If Linus had a meltdown there is already sufficient infrastructure in place that the vast majority of us wouldn’t even notice.

                                    I wouldn’t trust a code hosting service Linus ran by himself either.

                                    1. 1

                                      Nobody made Linus step back. He recognized that he had issues and took a sabbatical to deal with them himself. Are you saying you wouldn’t trust a service by a guy who has been diligently working on the same project for 30 years? Not to mention the guy who invented the base of all of the services discussed in this thread.

                                      Why do people assume that “Bigger is better” when it comes to web services? The two most reliable services I use are Pinboard, run by an insanely opinionated and outspoken developer, and NewsBlur, which was, and may still be, a one-man shop whose developer just quietly does his own thing. In the same time that those services have been faithfully up and running, Google has shut down more services than I can count, because “It didn’t fit with their corporate vision”

                                      It’s misguided, and harmful.

                                      1. 1

                                        Nobody made Linus step back.

                                        We’ll probably never know for sure, but the subtext (well, and the text) of his announcement email sure makes it sound like his hand was forced, at least to me.

                                        Are you saying you wouldn’t trust a service by a guy who has been diligently working on the same project for 30 years?

                                        No, I’m saying I wouldn’t trust a service run by a guy who randomly goes off on people in totally inappropriate ways (his admission). Or, as is the case here, a guy who can’t even behave himself well enough to keep a Lobsters account.

                                        Not to mention the guy who invented the base of all of the services discussed in this thread.

                                        That has literally nothing to do with anything. A person can be productive or brilliant and also have other, undesirable, qualities.

                                        Why do people assume that “Bigger is better” when it comes to web services?

                                        I don’t, so I can’t answer that.

                                        Google has shut down more services than I can count…

                                        Agree with you there! I don’t trust Google for anything but search (I don’t even use Gmail), because that’s the one thing I don’t think they’ll ever kill (or break). I don’t think GitHub is going anywhere either, the worst case scenario is that Microsoft sells it.

                                        It’s misguided, and harmful.

                                        If there was a person who had the views you seem to ascribe to me, then I might agree!

                              2. 30

                                That’s unfair to Drew. He’s passionate, and rude, and opinionated, and submissions here from his site generally stirred up giant flamewars. But I do believe he’s got what it takes to keep sourcehut running.

                                1. 18

                                  GitHub will keep running, too. I’m not sure we’ve answered the question of

                                  why would I trust him any more than I trust GitHub?

                                  1. 8

                                    Not only is the sourcehut software available under the AGPL, the issue trackers and such give you export and import functions to pull your data into another instance easily. The software itself is not trivial to host, but it’s not prohibitively hard either. If I needed to eject because Drew became untrustworthy, I am very comfortable that I could do that.

                                    Even though that’s a non-zero amount of work, GitHub gives me no comparable ability. That’s a good reason to trust him more than I trust GitHub, in my opinion.

                                    1. 3

                                      GitHub gives me no comparable ability.

                                      The GitHub command line client provides this functionality, as does the API. Obviously, the data formats are specific to the way GH works, but there are ways to extract most if not all of the relevant data (I use this heavily with my team to script up our findings workflow, for example).

                                      1. 5

                                        Interesting. Unless I’m missing something, you can’t stand up your own self-hosted instance of github, and import that, can you? The ability to stand up my own instance of the forge and import my data, to use on a self-hosted site, is what I meant by “comparable”. (That’s the angle I was coming from… if Drew won’t let me use his hosted service, I can just set up my own copy on any host I want since upstream is AGPL, then import my data from the sr.ht export since sr.ht exposes those functions.)

                                        1. 2

                                          GitLab supports importing to a self-hosted instance from GitHub [1], although I’m sure it’s not perfect, so it may or may not be useful. It also isn’t clear to me based on a 15 second review whether you can import from some kind of local data dump or raw GitHub API responses or if your GitHub account needs to be currently active.

                                          [1] https://docs.gitlab.com/ee/user/project/import/github.html

                                          1. 2

                                            That looks much better than I thought, particularly if it turns out to work off saved data/responses. And it’s nice that GitLab enables that for all their tiers.

                                          2. 1

                                            Unless I’m missing something, you can’t stand up your own self-hosted instance of github, and import that, can you?

                                            GitHub Enterprise can be bought as a GitHub-hosted or self-hosted thing. These support (most of) the same APIs as the public GitHub, so you can run your own instance if you are willing to pay.

                                            1. 2

                                              It would be an interesting experiment to see whether they would sell an enterprise installation to someone whose account they forcibly closed. I was sort of assuming that if they won’t let you be a customer of their public service, they won’t sell you the private one, but that is uninformed speculation.

                                      2. 3

                                      Because sourcehut is open source, nothing is lost when I leave. More than that, chances are that if sourcehut goes down a bad route, others would likely jump in.

                                      3. 2

                                        Not that you exactly claim otherwise, but Drew also makes some nice things and has created a business structure that enables at least one other developer to make some nice things.

                                        Quite apart from that, though, and similarly quite apart from whether he has what it takes to keep sourcehut running, he’s given me an out so that I don’t, strictly speaking, need him to. He’s released the software that runs the forge under the AGPL, here. And it exposes ways for me to export the hosted stuff and import it into a self-hosted instance.

                                        So regardless of whether I trust Drew personally, he’s made it so I don’t need to for this purpose.

                                        If Drew got angry and decided I couldn’t be his customer anymore, I could stand up my own instance or pay someone to do that for me and import my data. My repos wouldn’t be down at all, my tickets, docs, etc. would be down for a day or so, and my mailing lists might see a bit more disruption than that. If github decided that I shouldn’t be their customer anymore, I’d have my repos. For the rest, I’d kind of be out of luck. (I think this last paragraph is more responsive to @glesica ‘s comment than the one I’m replying to, and I’m too lazy to split it to another reply.)

                                      4. 17

                                        Because “more than I trust Microsoft” is a damn low bar.

                                        1. 7

                                          It’s like a little devil hovering over my right shoulder, and a slightly less predictable devil hovering over the other.

                                      5. 6

                                        Among other options there’s also the fediverse approach with Gitea, and a p2p approach will soon be available with Radicle.

                                        1. 11

                                          It might be time I move everything to Sourcehut and treat GitHub as a mirror…

                                          That time was years ago, but hey, better late than never.

                                          1. 5

                                            Consider hosting your own, instead. I published a blog post with a list of defunct code hosting sites which I update occasionally. Maybe that list is a good reminder. Remember, it’s not just code that goes away with such sites, it’s also issue queues and in some cases, wikis and mailing lists too.

                                            1. 4

                                              Will you also start hosting a list of defunct private websites that used to host Git repos, which are gone forever and whose disappearance came completely unexpectedly? I would trust Github more with staying online since that’s their job than a developer running a Gitweb on some VPS with some domain name that requires regular payment to stay online.

                                              Kinda like how I registered callcc.org after it lapsed, to make sure the links to the CHICKEN website don’t break and it doesn’t get domain-squatted; I’m redirecting to the official website these days.

                                              1. 1

                                                Will you also start hosting a list of defunct private websites that used to host Git repos, which are gone forever and whose disappearance came completely unexpectedly?

                                                I can’t think of anything offhand where I’ve taken a dependency that’s done that. But when I do take a dependency on something, I generally mirror the SCM repo if there is one. And I am very reluctant to take dependencies on things I can’t have the source to. Since the things I depend on generally haven’t gone away, I haven’t bothered to publish my mirrors, but I would if the license permits it.

                                                1. 3

                                                  But when I do take a dependency on something, I generally mirror the SCM repo if there is one.

                                                  I learned that the hard way when Rubyforge went down, a few employers ago. We weren’t that active in the Ruby community anymore, so we missed the notice. When the site went away and I had to do some small maintenance tasks on a legacy project, all the third party svn subtrees from Rubyforge were no longer working (and, more painfully, another project of ours was completely gone too). Note that Rubyforge was huge in the Ruby community back in the day.

                                                2. 1

                                                  I would trust Github more with staying online since that’s their job than a developer running a Gitweb on some VPS with some domain name that requires regular payment to stay online.

                                                  Like I said, history has shown these hosting sites are not as trustworthy as people like to believe they are. The GitHub company can get sold to an untrustworthy partner (har har, like that’d ever happen… oh wait) or go out of business (what if MS decides to sell the company to, I dunno, Oracle or something because it’s not making a profit?), or there might be some other new VCS that comes out that completely blows git out of the water. I’m sure nobody saw coming what happened to Bitbucket - it started out as a Mercurial hosting site, then started offering git and finally dropped Mercurial after Atlassian took over. Its founders probably never would have let that happen if it were still in their power.

                                                  From my own perspective, I’ve personally run into at least five hosting sites that were hosting projects I started or heavily contributed to that are no longer available now (Rubyforge, Dutch govt OSOSS’ uitwisselplatform, Berlios, Bitbucket and Google Code). And then there’s Sourceforge, which at least still hosts some of my defunct projects, but had for a while been injecting malware into downloads. If I or my employers (as the case may be) had hosted our own projects from the start, this pain would’ve been completely avoided. These are projects in which I had a stake, and it was in my interest to not let them die.

                                                  Now, having said that, I agree that from a third party perspective (someone who is using the hosted code) that’s different. I understand your point saying you don’t want to rely on some random developer’s VPS being up, and neither would I. But then, people change repositories on code hosting sites all the time, too. They move to other hosting sites, or rename repositories etc. Links rot and die, which is unfortunate and something we all have to live with.

                                                  Case in point:

                                                  Kinda like how I registered callcc.org after it lapsed, to make sure the links to the CHICKEN website don’t break and it doesn’t get domain-squatted; I’m redirecting to the official website these days.

                                                  Thanks for doing that. AFAIK this domain was never communicated as being official, but I might be wrong.

                                            2. 8

                                              I don’t know what the GitHub rationale was, but the ‘limitation of liability’ bit in most open source licenses only goes so far. If I intentionally introduce malicious behaviour into one of my open source projects, knowing that it would damage downstream consumers, then I’d probably be liable under the Computer Misuse Act in the UK and similar legislation elsewhere. GitHub’s T&C’s don’t explicitly prohibit using their service for criminal purposes but that’s normally implicit: if GitHub didn’t act then they might end up being liable as an accessory (at least as an accessory after the fact). Their distribution channel (NPM) is being used by a malicious actor to attack other users.

                                              It’s normally difficult to prove malicious intent in this kind of thing (incompetence and malice look similar) but it seems pretty clear here from the author’s own statements.

                                              1. 12

                                                I don’t know what the GitHub rationale was, but the ‘limitation of liability’ bit in most open source licenses only goes so far.

                                                This is disturbing. Software is provided as is, with no liability whatsoever, but the author should still be liable for what happens when other people use it, because it broke things? What if the author decided to completely change the library’s API, or recycle it to just print squares of color, because they liked the name?

                                                  I find what the author did pretty stupid, but frankly, suggesting it falls into criminal behavior calls for some stepping back and putting things in perspective.

                                                1. 8

                                                  There is a difference, and it’s not subtle at all, between making a possibly unwanted change in software that is provided without any warranty, and deliberately making a crippling change with the express intent of breaking other people’s applications.

                                                  To put it another way: if you accidentally commit an integer overflow bug that causes batteries to blow up, that is, presumably, just bad engineering. But if you deliberately commit a clever hack that causes people’s batteries to blow up, with the express intent of getting people injured, or at least destroying their phones, I think it makes a little sense to not put it under “well, it did say no warranty of any kind on the box, didn’t it?”.

                                                  Obviously, this didn’t kill anyone, so I’m obviously not thinking it ought to be treated as murder. But “no warranty” is not a license to do anything.

                                                  It’s not like software is being given special treatment here, it’s how warranties work everywhere. If you sell boats with two years’ servicing warranty and they break down after three years, that’s one thing, but if you fit them with bombs that go off after two years and one day, with the express intent of killing anyone on them, that still falls under “murder”, not “what happens after two years isn’t our problem, it says so on the purchase contract”.

                                                  (Edit: obviously IANAL and this is not legal advice, either, I’m only parroting second-hand, non-lawyer advice about how insurance works for some high-stakes software projects)

                                                  1. 5

                                                    I guess that makes sense, when you put it that way :)

                                                  2. 3

                                                    I am not a lawyer, this is not legal advice:

                                                    My understanding is that it comes down to intent. If I upload a buggy piece of crap to GitHub with an open source license and you use it, then it sucks to be you. If I upload something to GitHub, wait for you to deploy it, and then intentionally introduce a vulnerability or other malicious behaviour into it, then it’s legally dubious.

                                                    Normally it’s very difficult to prove intent. If I submit a patch to the Linux kernel that introduces a vulnerability, then if you wanted to prosecute me you’d have to prove that I did so knowing that the bug was there and with the intent to cause harm. That’s very difficult to do in the general case (the NSA null-pointer dereference bugs are a great case in point here: people suspect that the NSA knew about that vulnerability class and introduced it deliberately, but no one can prove it and there’s enough reasonable doubt that it would never stick in court unless there was some corroborating evidence - it could easily have been accidental).

                                                    If, before I submit the patch, I post publicly about how I am going to intentionally break things for the people using my code, and then I push a new version out to public repositories, then it’s likely to be easy to prove malicious intent. The author of these packages did exactly that: posted saying that he was going to break things for people if they didn’t pay him and then, when they didn’t pay him, broke things. That may (again, not a lawyer) count as blackmail, as well as computer misuse.

                                                    1. 3
                                                      1. Code license != Github TOS.
                                                      2. Liability could only be disclaimed to the extent permitted by law. You cannot put a sign “free food, no liability whatsoever” and then put poison inside and expect that disclaimer to save you from prison. E.g., GPL states “THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.”
                                                  3. 7

                                                    I think until they make a statement about it, nobody knows but them. But my assumption is that this happened on a weekend, and whoever was on call figured that the easiest thing to do to minimize disruption till Monday was to suspend the account and hard revert the content until more people could be brought in. I’m also assuming suspending the account just automatically says that you violated the ToS.

                                                    1. 3

                                                      I could imagine that somebody identified this as a possible account hack and thus locked it.

                                                      1. 2

                                                        They didn’t, they suspended his account so he can’t log in. You are still free to troll him on GitHub without any recourse whatsoever.

                                                      1. 2

                                                        Ah, Standard ML. Ullman’s ‘Elements of ML Programming’ remains one of my favorite programming textbooks in any language. Sad to not see it on the list.

                                                        1. 1

                                                          I had never heard of this before catching up on this thread, but I ordered a copy today, thank you for the recommendation!

                                                          I had mainly used ML for the Working Programmer and Purely Functional Data Structures (which isn’t even exactly SML, but a close dialect thereof) previously, but this looks interesting.

                                                        1. 4

                                                          What to use for its backend, though? There aren’t actually that many general-purpose compiler backends out there; your choices are basically GCC or LLVM, and both of those are large hulking monsters I would really prefer not to have to deal with.

                                                          Would Zig’s self-hosted compiler theoretically be an option here?

                                                          1. 4

                                                            Possibly? I don’t know if it’s designed for the backend to be reusable in other projects.

                                                            1. 1

                                                              There’s also qbe, which is meant to provide a large amount of backend functionality in a small amount of code. It’s relatively approachable and has an LLVM-like language, without being a “large hulking monster.”

                                                              1. 2

                                                                Yeah, I’m aware of it. I’m not sure it’s mature enough I’d want to rely on it yet, but it is what first made me think “…okay, maybe I don’t need LLVM or a comparable-sized project”.

                                                          1. 3

                                                            Looks like custom query is a new attack surface

                                                            1. 3

                                                              How so? Isn’t this proposal just a way to standardize what we already do in some other ways?

                                                              1. 1

                                                                There are two major issues I can see:

                                                                1. developers will need to consider this continuously, and they already have issues ensuring that all verbs’ paths transit security controls
                                                                2. compensating/mitigating controls at the edge will need to consider this, and they’re notoriously bad at keeping up to date with standards.

                                                                The first is relatively straightforward: developers will now need to track a new verb and make sure that all authentication, authorization, &c controls are applied to it. For frameworks that tend to have a single object/method per URL/route (endpoint), this is simple enough: you can have a single entry point for the endpoint, and just detect if the new verb is being used to access it. For frameworks that tend to separate out endpoints, this means we need a new method, and it needs to have all controls applied, tested, &c. It’s not a huge deal, but it’s an edge case I often see when working with developers; still, it’s not too different from our current pain.

                                                                The second is more varied; generally, depending on your devices, servers, security software, &c &c, you can have all sorts of odd interactions between these devices and the applications behind them. For example, years ago many large vendors had very strict rules for SQLi and XSS. So how would we bypass them? Well, we’d just change GET or POST to Foo (or the like). At the time, several application servers would use heuristics on verbs to determine how requests should be interpreted: if it had a body, it was a POST; if not, it was a GET. But how did this bypass edge controls? Well, they also had heuristics: if they could determine what the verb was, they would apply rules, otherwise they would just pass the request to the downstream application server. If your application server was reliant on the edge controls to block SQLi or XSS, you were in trouble. We’ve gotten much better about these sorts of things, but they can still lag significantly (HTTP/2 controls come to mind, or SCTP as a TCP replacement for data exfiltration, because many systems simply pass those along).
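
                                                                To make that concrete, the verb-swapping trick was shaped roughly like this (a hypothetical request; host and path are made up):

                                                                    Foo /app/search HTTP/1.1
                                                                    Host: app.example
                                                                    Content-Type: application/x-www-form-urlencoded
                                                                    Content-Length: 17

                                                                    q=blocked-payload

                                                                The edge device can’t classify Foo, so it passes the request along, while the application server’s heuristic sees a body and treats it as a POST.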

                                                            1. 3

                                                              I think it would have happened no matter what really; consider DSSSL and Viola’s scripting framework

                                                              1. 3

                                                                Hey thanks for the interest. So far we have built the lexer and have had lots of conversations about the parser, and there’s a PoC for the output->C stage. Hopefully we can make something useful of this.

                                                                1. 2

                                                                  This is really cool! I asked in an issue on GitHub, but do you have even nascent sketches of what you think the syntax will look like?

                                                                  1. 2

                                                                    Thanks! I responded :) Basically it’ll be a subset of Swift.

                                                                    1. 2

                                                                      hahaha I saw that, and those were great answers! I would definitely be interested in what subset of Swift you’ll be picking, as that would be really interesting as well. I’m also happy to help if you need it, I’ve written a few systems that compile to human-readable C, Java, & Golang.

                                                                      1. 2

                                                                      Thanks! I hoped I had the right idea. As for ‘what subset’ - that’s a bit of an open question at this stage, sorry I can’t clarify yet. I’d be thrilled to have you involved, I’ll send a DM with details :)

                                                                1. 1

                                                                  BLUF: interesting comments on cryptography from the author(s), and I’d be interested in knowing more, although I’m always leery of novel cryptography constructions, since I’ve seen them fail all too often. Would be interesting to see what cryptanalysis the author(s) did.

                                                                  The “third party technology” section stuck out to me:

                                                                  • ed25519 from Dan Bernstein (in the donna version from floodyberry)
                                                                  • Keccak (original reference implementation)
                                                                  • Threefish as block cipher in ECB mode and in Threefish AEAD mode as backup for Keccak.
                                                                  • (wurstkessel from myself — now replaced by Keccak)

                                                                  I’m not a cryptographer, and I don’t know the author(‘s|s’) background, but those are some interesting choices.

                                                                  • ed25519 seems natural, tho I’d be curious what constructions they’re using it with
                                                                  • Keccak is… often overloaded. It’s actually a family of constructions & algorithms, but most often folks are just using the SHA-3 subset
                                                                  • Threefish isn’t bad, it’s just not much used, but I’m curious what choices lead to using it in ECB mode (there are valid uses like as a pseudo-random function for SIV mode and what not, but generally I’d shy away from it)
                                                                  • The author(s) also have a page on their Threefish AEAD, which is interesting; I’d have to look more into what they’re doing to feel totally comfortable with that
                                                                  • There is a page on wurstkessel as well, which seems to be original work from the author; they also have some criticism of the SHA-3 process on the page. Having said that, I’d be leery of relying on novel cryptographic constructions

                                                                  my interest is definitely piqued to read more here, esp since the author(s) seem to be willing to document this quite a bit

                                                                  1. 4

                                                                     I like the shell language that has been added; it reminds me a bit of es’ shell language without the rc-isms. I think my only complaint is that I use dune-the-build-system a lot more than I would run my shell, but I can probably link it as dunesh or the like…

                                                                    1. 2

                                                                      I don’t think left-to-right matters in stack languages really, because it’s just a series of operations happening in order? Just reading 1 2 + makes more sense when you’re thinking about the stack as a data structure than + 2 1

                                                                      1. 2

                                                                        Reverse Polish is the native syntax for stack languages because it reflects the order of evaluation. A left-to-right stack language would need a layer that reorders the operations, since in your example the 1 and 2 have to be evaluated (pushed) first, then the “+”.

                                                                         (Unless you tried running the parser backwards over the input, which I guess is possible but weird. And it raises the question of how you deal with interactive multi-line input.)

                                                                        Even an infix language that compiles to a stack machine ends up generating byte code in RPN order.

                                                                        1. 1

                                                                           (So-called “normal”) Polish notation is used in math & logic quite a bit; Wikipedia even has an article on evaluating it. For fixed-arity functions it’s basically Shunting Yard really, nothing too complex.
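
                                                                           As a toy sketch (my own code, nothing canonical): with fixed arity, one recursive pass over the token stream is the whole evaluator.

                                                                               #include <stdio.h>
                                                                               #include <stdlib.h>
                                                                               #include <string.h>

                                                                               /* Evaluate ("normal") Polish notation with fixed-arity operators:
                                                                                  each operator knows how many operands to recurse for, so no
                                                                                  precedence rules or reordering layer is needed. */
                                                                               static const char **tok;  /* cursor into the token stream */

                                                                               static int eval(void) {
                                                                                   const char *t = *tok++;
                                                                                   if (strcmp(t, "+") == 0) { int a = eval(); return a + eval(); }
                                                                                   if (strcmp(t, "*") == 0) { int a = eval(); return a * eval(); }
                                                                                   return atoi(t);  /* anything else is an integer literal */
                                                                               }

                                                                               int main(void) {
                                                                                   const char *expr[] = {"+", "*", "2", "3", "4"};  /* (2 * 3) + 4 */
                                                                                   tok = expr;
                                                                                   printf("%d\n", eval());  /* prints 10 */
                                                                                   return 0;
                                                                               }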

                                                                      1. 9

                                                                        I never use code completion. If the editor provides it, I disable it.

                                                                         Having worked on large code bases (500k+ loc) with a lot of models (20-30), I can’t imagine life without a good quality language server…

                                                                        I’m not sure if I envy the author or if I’m afraid of them.

                                                                        1. 5

                                                                           It’s most likely that they use tools like grep and find, which easily work with any language, but yes, LSPs are significantly changing the usefulness of autocompletion.

                                                                          1. 2

                                                                             This is very true, and we built tooling around it. My largest codebase was 10M lines of code written in a mainframe language that couldn’t leave the client’s hardware, so we wrote some simple tooling to help with finding things, and basically built a map around them.

                                                                            Whilst I still don’t use an editor that provides those sorts of things, I do use tools like ssadump or go guru to help give me the lay of the land

                                                                          2. 1

                                                                            Having worked on large code bases

                                                                            Having worked on small code bases I can’t imagine how much I’d have to be offered to agree to work on a codebase over 100kloc.

                                                                          1. 15

                                                                            I think the key insight here is that container images (the article confuses images and containers, a common mistake that pedants like me will rush to point out) are very similar to statically linked binaries. So why Docker/container images and why not ELF or other statically linked formats?

                                                                            I think the main answer is that container images have a native notion of a filesystem, so it’s “trivial” (relatively speaking) to put the whole user space into a single image, which means that we can package virtually the entire universe of Linux user space software with a single static format whereas that is much harder (impossible?) with ELF.

                                                                            1. 4

                                                                               And we were able to do that with virtualization for at least 5-10 years prior to Docker. Or do you think that also packaging the kernel is too much?

                                                                               Anyways, I do not think that a container having the notion of a filesystem is the killer feature of Docker. I think that moving the deployment code (installing a library, for example) close to the compilation of the code helped many people and organizations who did not have the right tooling prior to that. For larger companies who had systems engineers, cgroups mostly provided the security part, because packaging was solved decades prior to Docker.

                                                                              1. 1

                                                                                IMO it’s not the kernel but all of the supporting software that needs to be configured for VMs but which comes for ~free with container orchestration (process management, log exfiltration, monitoring, sshd, infrastructure-as-code, etc).

                                                                                 Anyways, I do not think that a container having the notion of a filesystem is the killer feature of Docker. I think that moving the deployment code (installing a library, for example) close to the compilation of the code helped many people and organizations who did not have the right tooling prior to that.

                                                                                How do you get that property without filesystem semantics? You can do that with toolchains that produce statically linked binaries, but many toolchains don’t support that and of those that do, many important projects don’t take advantage.

                                                                                Filesystem semantics enable almost any application to be packaged relatively easily in the same format which means orchestration tools like Kubernetes become more tenable for one’s entire stack.

                                                                              2. 4

                                                                                I can fit a jvm in a container! And then not worry about installing the right jvm in prod.

                                                                                I used to be a skeptic. I’ve been sold.

                                                                                1. 2

                                                                                  Slightly off topic - but JVM inside a container becomes really interesting with resource limits. Who should be in charge of limits, JVM runtime or container runtime?

                                                                                  1. 7

                                                                                    Gotta be the container runtime (or the kernel or hypervisor above it) because the JVM heap size limit is best-effort. Bugs in memory accounting could cause the process to use memory beyond the heap limit. Absent that, native APIs (JNI) can directly call malloc and allocate off-heap.

                                                                                    Would still make sense for the container runtime to tell the JVM & application what the limits on it currently are so it can tailor its own behaviour to try to fit inside them.

                                                                                    1. 4

                                                                                      It’s easy: the enclosing layer gets the limits. Who should set the resource limits? ext4 or the iron platter it’s on?

                                                                                      1. 2

                                                                                        What’s the enclosing layer? What happens when you have heterogenous infrastructure? Legacy applications moving to cloud? Maybe in theory it’s easy, but in practice much tougher.

                                                                                      2. 2

                                                                                        Increasingly the JVM is setting its own constraints to match the operating environment when “inside a container”.
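
                                                                                        For the curious, this behaviour is driven by standard HotSpot flags (available since roughly JDK 10; exact defaults vary by version), e.g.:

                                                                                            # size the heap from the container's cgroup memory limit
                                                                                            java -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -jar app.jar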

                                                                                    2. 4

                                                                                      Yes, layers as filesystem snapshots enable a more expressive packaging solution than statically linked alternatives. But it’s not just filesystems: runtime configuration (variables through ENV, invocation through CMD) also makes the format even more expressive.
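
                                                                                      e.g. a toy image definition carrying both kinds of configuration (all names here are made up for illustration):

                                                                                          FROM alpine:3.15
                                                                                          COPY app /usr/local/bin/app
                                                                                          ENV APP_MODE=production   # default environment, overridable with `docker run -e`
                                                                                          CMD ["app", "--serve"]    # default invocation, overridable at run time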

                                                                                      p.s. I have also updated the post to say “container images”

                                                                                      1. 3

                                                                                        I think the abstraction on images is a bit leaky. With Docker you’re basically forced to give the image a name in a system registry so that you can then run it as a container.

                                                                                        I would love to be able to say like… “build this image as this file, then spin up a container using this image” without the intermediate steps of tagging (why? because it allows for building workflows that don’t care about your current Docker state). I know you can just kinda namespace stuff but it really bugs me!

                                                                                        1. 3

                                                                                          Good practice is addressing images by their digest instead of a tag using the @ syntax. But I agree - registry has always been a weird part of the workflow.
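
                                                                                          e.g., pinning by digest instead of by tag (digest truncated, purely illustrative):

                                                                                              docker pull alpine@sha256:e7d8…   # immutable content-addressed reference
                                                                                              docker pull alpine:3.15           # mutable tag that can be repointed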

                                                                                          1. 1

                                                                                            addressing images by their digest instead of a tag using the @ syntax.

                                                                                            Be careful about that. The digest of images can change as you push/pull them between different registries. The problem may have settled out, but we were bitten by changes across different releases of software in Docker’s registry image and across the Docker registry and Artifactory’s.

                                                                                            I’m not sure if there’s a formal standard for how that digest is calculated, but it certainly used to be (~2 years back) very unreliable.

                                                                                            1. 1

                                                                                              Oh I wasn’t aware of that! That could let me at least get most of the way to what I want to do, thanks for the pointer!

                                                                                          2. 3

                                                                                            I noticed Go now has support for including, in its essentially static binary, a virtual filesystem instantiated from a filesystem tree specified at compile time. In that scenario, it occurs to me that containerization perhaps isn’t necessary, which would expose read-only shared memory pages to the OS across multiple processes running the same binary.

                                                                                            I don’t know, in the containerization model, if the underlying/orchestrating OS can identify identical read-only memory pages and exploit sharing.

                                                                                            1. 2

                                                                                              I think in the long term containers won’t be necessary, but today there’s a whole lot of software and language ecosystems that don’t support static binaries (and especially not virtual filesystems) at all and there’s a lot of value in having a common package type that all kinds of tooling can work with.

                                                                                              1. 2

                                                                                              As a packaging mechanism, in theory embedding files in Go works OK (it follows the single-process pattern). In practice, most Go binary container images are empty (FROM scratch + certs) anyways. And lots of environment-dependent files that you would want at runtime (secrets, environment variables, networking) are much easier to declaratively add to a container image than to bake in with a recompile.
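
                                                                                              For reference, the “FROM scratch + certs” shape mentioned above looks something like this (a typical multi-stage sketch; image names and paths are illustrative):

                                                                                                  FROM golang:1.17 AS builder
                                                                                                  WORKDIR /src
                                                                                                  COPY . .
                                                                                                  RUN CGO_ENABLED=0 go build -o /server .

                                                                                                  FROM scratch
                                                                                                  COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
                                                                                                  COPY --from=builder /server /server
                                                                                                  ENTRYPOINT ["/server"]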

                                                                                              2. 2

                                                                                                So why Docker/container images and why not ELF or other statically linked formats?

                                                                                                There are things like gVisor and binctr that work this way, as does Emscripten (for JS/WASM).

                                                                                                1. 2

                                                                                                  I really hope for WASI to pick up here. I used to be a big fan of CloudABI, whose website now points to WASI.

                                                                                                  It would be nice if we could get rid of all the container (well actually mostly Docker) cruft.

                                                                                              1. 4

                                                                                                I’m familiar with tagged pointers, as so many interpreters use them (particularly Ruby, which I’ve hacked on quite a bit). NaN boxing is new to me, though.

                                                                                                My first thought after reading about the technique is whether signaling NaNs could be used to make arithmetic more efficient. Since using a signaling NaN would generate an exception, generated code could just use the number as-is and catch the exception, falling back to a slow path in that case.

                                                                                                My second thought is that we’ve had tagged pointers for decades, so why aren’t there hardware instructions for working with them?

                                                                                                1. 4

                                                                                                  What I do to make arithmetic more efficient in my nan-boxed interpreter is something like this:

                                                                                                    Value add_op(Value a, Value b) {
                                                                                                        // Optimistically add as doubles. Non-number Values are
                                                                                                        // nan-boxed, so if either argument isn't a double, the sum is a nan.
                                                                                                        double result = a.number + b.number;
                                                                                                        if (result != result) goto slow_path; // only a nan compares unequal to itself
                                                                                                        return Value{result};
                                                                                                    slow_path:
                                                                                                        ...  // add things that aren't doubles; throw if the types are wrong
                                                                                                    }
                                                                                                  

                                                                                                  I optimistically add the two nan-boxed values using float arithmetic. If the result isn’t a nan, I return it. The overhead for the fast path is a nan test and a conditional branch that isn’t taken. The slow path takes care of adding things that aren’t doubles, and throwing an exception if the arguments have the wrong type.

                                                                                                    Using signaling nans and traps would make the code significantly more complicated, and I don’t see why it would be faster. Setting up and tearing down the trap handler for the add case would likely be at least as expensive as the above code, and probably more so.

                                                                                                  1. 2

                                                                                                    That’s interesting; I very much like that technique.

                                                                                                    I wouldn’t expect the trap handler to be set up and torn down for every evaluated opcode, just once at startup. The exception would certainly be slower than a failed branch prediction. The fast path would save a comparison, which may or may not be faster on a modern processor.

                                                                                                    1. 3

                                                                                                      The tail-threaded interpreter passes around its state in registers. In the case of “tails”, there are two registers containing sp and pc. How does a generic trap handler determine what opcode is being executed, and how does it access the interpreter state? If this information is stored into global variables before executing the add instruction, then that setup code is more expensive than the nan test in my code. Is there a way to use a trap that is cheaper than the nan-test code? Without using assembly language or other CPU and ABI specific code, I mean.
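
                                                                                                        Concretely, the shape is something like this (a sketch with hypothetical names, not my actual code; it assumes the compiler turns the final call into a tail call):

                                                                                                            #include <stdint.h>

                                                                                                            typedef union { double number; uint64_t bits; } Value;

                                                                                                            // each opcode handler receives the interpreter state as arguments
                                                                                                            typedef struct Op Op;
                                                                                                            typedef Value (*Handler)(const Op *pc, Value *sp);
                                                                                                            struct Op { Handler run; };

                                                                                                            Value slow_add(const Op *pc, Value *sp); // hypothetical out-of-line slow path

                                                                                                            Value op_add(const Op *pc, Value *sp) {
                                                                                                                double result = sp[-2].number + sp[-1].number;
                                                                                                                if (result != result)
                                                                                                                    return slow_add(pc, sp); // non-doubles handled out of line
                                                                                                                sp[-2].number = result;
                                                                                                                --sp;
                                                                                                                return pc[1].run(pc + 1, sp); // tail call: pc and sp stay in registers
                                                                                                            }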

                                                                                                      1. 1

                                                                                                        Hmm, that’s a good point. I had imagined it walking the stack like a debugger would, but that would definitely be CPU/ABI specific.

                                                                                                    2. 2

                                                                                                      That’s essentially what I do. The constructor that takes a double checks whether it’s already a NaN, which would of course be misinterpreted, and substitutes the value that means “null”. A handy side effect is that arithmetic on non-numbers produces null, which is similar to SQL semantics.
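
                                                                                                        In code, the idea is something like this (a sketch; the exact bit pattern used for “null” is implementation-specific):

                                                                                                            #include <math.h>
                                                                                                            #include <stdint.h>

                                                                                                            typedef union { double number; uint64_t bits; } Value;

                                                                                                            // hypothetical nan-boxed payload meaning "null"
                                                                                                            #define NULL_BITS 0x7FF8000000000001ULL

                                                                                                            Value value_from_double(double d) {
                                                                                                                Value v;
                                                                                                                if (isnan(d)) {
                                                                                                                    v.bits = NULL_BITS; // a genuine nan would be misread as a boxed value
                                                                                                                } else {
                                                                                                                    v.number = d;
                                                                                                                }
                                                                                                                return v;
                                                                                                            }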

                                                                                                    3. 2

                                                                                                      My second thought is that we’ve had tagged pointers for decades, so why aren’t there hardware instructions for working with them?

                                                                                                      There are some; these are generally called “tagged architectures”. Things like the Burroughs Large Systems and LispMs did this. ARM, PowerPC, and a few others also have support if you opt into it, but it’s usually exposed to compiler writers, so your compiler has to know about & use it. It’s definitely an interesting area.

                                                                                                      1. 3

                                                                                                        CHERI is a tagged architecture, with a single one-bit non-addressable tag that differentiates memory capabilities (which the C/C++ compiler uses to represent pointers, and which have strong integrity properties) from other data. Arm is providing us with an implementation early next year.

                                                                                                    1. 1

                                                                                                      This is really cool, but I’ve only dabbled with Atari STs before; I wonder how you would test this?