1. 0

    This is an illustration of why I find “type theory” so uncompelling. What is proved here? Why is it interesting? (Animal -> Greyhound) <: (Dog -> Dog) means what? If f: Animal -> Greyhound, then f(pig) is defined, but clearly pigs are not dogs so other than being able to misuse some terminology from category theory, what have we learned?

    Added: Suppose f: X -> Y and g: Z -> Q. If X is a subset of Z and Q is a subset of Y, then any expression involving G(f) can be replaced by G(g) with the properties that if G(f)(a) is defined, then G(g)(a) is defined (since any argument a that is defined for f must be defined for g and since any value in the image of g is a value in the image of f. Hooray. No category theory needed, but on the other hand, still nothing non-trivial learned. In fact, as decades of silly papers about the Liskov principle have shown, a program attempting to make use of this property of subtyping can fail easily because we are generally interested in properties more complex than set membership.

    1. 14

      It means that if you expect a function Dog -> Dog then possessing a function Animal -> Greyhound is good enough. It doesn’t mean what you implied about pigs—to do that is to invert the whole idea of substitutability/variance.

      So I guess the whole point of the post is to explain why what you suggest is wrong and it apparently didn’t succeed.

      1. 0

        I had two points: both poorly expressed, I’m sure. The first is that what is so laboriously expressed using terms copied from category theory is a trivial property of sets. If f: X -> Y and g: Z -> Q and X is a subset of Z and Q is a subset of Y then P(g(x) will be defined if P(f(x)) is defined. Amazing! I showed the same thing using concepts of elementary school algebra - David Parnas once described Milner agebras as a complex way of restating primary school sets and functions, i had the same thought here. And my second point was that this is not a particularly interesting fact for programming because , as shown in the endless literature on Liskov substypes, set membership is often conflicted and complicated and does not carry the semantic information that one might assume.

        1. 12

          Covariance and contravariance are mainstream CS terms, the kind of language you’d expect to read in a book about Java. Not sure why you are bringing up category theory, other than to trigger reflexive disgust in people who don’t know any better.

          It’s good that you find this easy. It would be a problem if someone who has read “endless literature” on the Liskov substitution principle didn’t find a 101-level description of subtyping to be beneath them.

      2. 11

        This is an illustration of why I find “type theory” so uncompelling.

        Do you also find type safety uncompelling? Or is it simply the case that you do not fancy a more elaborate type systems than what C offers?

        What is proved here? Why is it interesting?

        This article is not a proof. This article explains some terminology, with clear examples. The terminology might or might not be interesting, but the concept is interesting because it answers what type of functions are subtypes of another.

        (Animal -> Greyhound) <: (Dog -> Dog) means what?

        It means that (Animal -> Greyhound) is a subtype of (Dog -> Dog). Which means, if you need a function that takes a Dog and outputs a Dog, you can safely use a function that takes an Animal (it will accept dogs) and outputs a Greyhound (which is a dog).

        If f: Animal -> Greyhound, then f(pig) is defined, but clearly pigs are not dogs so

        It sounds like you misunderstood the example. f is of type (Dog -> Dog) -> T, so it wants some function that is a subtype of (Dog -> Dog). You can’t give it a Pig. g might be capable of accepting Pigs, but that is of no concern for f, which only requires that g can take Dogs and gives back Dogs.

        1. 1

          I think it’s a trivial problem. Why is it interesting to know that a function type is a subtype with that definition? imagesOnInternet -> Dogs is a subtype of Dogs->Dogs and so is Animals->Greyhound. So what?

          The point I was attempting to make is that modifying the pre-image may be more semantically interesting than subtype conformance. In many cases, in actual programming, it is an error for a program to accept certain inputs.

          1. 6

            (Animals->Greyhound) is not a subtype of (Dogs->Dogs). And neither is (imagesOnInternet->Dogs.)

            (Dogs->Dogs) is a subtype of (Animals->Greyhound) maybe that’s what you meant?

            But there is no subtype relationship in either direction between (Dogs->Dogs) and (imagesOnInternet->Dogs) so I have no idea what you were trying to say there.

            1. 1

              I agree with you that makes more sense, but in type theory the set of maps X->Y is a subtype of A->B when Y is a subset of B and X is a SUPERSET of A. This is explained in the OP.

              1. 1

                I got the exact opposite of the OP’s article. I’ll have to go read it again to see if miss read it.

      1. 8

        Upvoted for the Why Emacs section alone. I hadn’t heard of many of these back when people were pushing me to use Emacs. This is probably the best marketing I’ve seen for it. It also makes me want to consider trying it again.

        1. 10

          In practice, the “mutable environment” part isn’t so great. My experience with defadvice and redefining functions is that it’s a great way to have your development environment break on upgrade.

          Although it isn’t part of Emacs proper, magit is such a big part of my Emacs experience that I’d add it to any pitch. By far the best Git interface I’ve used.

          1. 3

            In practice, the “mutable environment” part isn’t so great.

            I’ve found the opposite to be true in my experience; using other programs that don’t work like this is endlessly frustrating.

            Of course as with any programming language it’s easy to write brittle code if you aren’t disciplined, but the introspective features make it easy to track down the point of incompatibility.

            1. 3

              Of course as with any programming language it’s easy to write brittle code if you aren’t disciplined

              What do you mean by discipline here? To me, this means that there exists some contract between the maintainer of a library and the user of a library, and as the user, I stay well within that contract so that my code does not rely on implementation details. I guess there are technically some uses of defadvice & redefining functions that satisfy that criteria but they generally aren’t what I think come to mind when someone says “nothing is off-limits” (which isn’t even true)

              I won’t lie and say I haven’t found the ability to patch an arbitrary function useful at times, but I try to limit my usage to temporary fixes until the maintainer accepts my patch upstream. If that goes nowhere I’d rather vendor the library w/ changes in my .emacs.d.

              1. 1

                I won’t lie and say I haven’t found the ability to patch an arbitrary function useful at times, but I try to limit my usage to temporary fixes until the maintainer accepts my patch upstream.

                I fully agree with this as far as it applies to defadvice. The “undisciplined” behavior I referred to above was more about “writing code and not taking responsibility for maintaining it going forward” which is what happens when you make a “fire and forget” defadvice.

                I would say that defadvice is only a tiny part of what makes internal reprogrammability so wonderful. Most of the time when I’m doing magical things in Emacs that wouldn’t be possible elsewhere, it’s writing new code, or reloading new definitions of functions I’ve written in the past, or adding hooks to explicit extension points which library authors have provided. I only resort to defadvice as a band-aid to be removed when the underlying lack of extensibility can be addressed.

            2. 2

              Thanks for the tip. Although I don’t use Git, the first picture is very organized and I understand quite a bit of it. The usability looks good at first glance at least.

          1. 6

            Today I learned that downvoting a comment requires you to pick a category. I guess I’ve never tried to do that here before.

            The comment provided as an example is the first comment I tried to downvote here on Lobste.rs…

            And indeed, none of the available categories fit.

            If there was a ‘destructive’ category, I’d pick that one. Meanwhile, I’ll use ‘troll’, which is clearly not correct. AFAIK.

            [EDIT: uh-oh, it looks as if I’ve committed a “me-too”!]

            1. 5

              I only learned that downvotes require a category today as well. And I was pleasantly surprised both because I realized that I haven’t needed to downvote a comment here before, and because I really like that lobste.rs requires a reason for a downvote. I feel like your comment had value beyond just a me-too, so you’re fine :)

              1. 6

                Interestingly, gave me an opportunity to upvote that comment. Don’t know why someone’s personal feelings, which is what they are describing there should be less valid because of the color of their skin. I thought that was what we were all striving for.

                1. 4

                  People reading that comment who missed the thread might not know it was very context-sensitive. Remember that the context (OP) is specific people pushing a specific set of political views on everyone asking that all disagreement be censored. They say they benefit minorities but wont allow them to have a say if they have different beliefs. Coraline et al are uncompromising in that the options are (a) agree with them pushing same things or (b) shut up and leave every public space they take over.

                  With that backdrop, I read the various Github articles and the OP. She constantly talked about extreme negative reactions she got as if it’s incidental to being a minority. She was a minority, did some work, and piles of hate emerged. She never mentions when doing so that she aggresively evangelizes, insults, and coerces FOSS projects usually with a pile of likeminded people behind her. I kept bringing that behavior up since I think her showing up at people’s doorsteps insulting them and telling them to do her bidding in their projects might be why people dont like her. That pisses all types of people off here in the Mid-South, including minorities. Consistently. I imagine some in other areas, too.

                  Anyway, in the thread you linked, my main comment on that article was judged by site as follows:

                  +73 yes -4 incorrect -1 off-topic -8 troll

                  It means the main post got overwhelming support esp considering how few upvotes I normally get. The others were peripheral supporting it as part of a larger debate. Anyone trying to judge the linked one should probably look at OP and first comment to get context:

                  https://lobste.rs/s/js3pbv/antisocial_coding_my_year_at_github#c_h8znxo

                  Im just a political moderate calling out hypocrisy/deceit of an article’s source (i.e. source integrity) and protecting right to dissent as usual. I do it on all topics. Even my favorites on occasion. On political ones, people tend to have strong emotional reactions that clouds judgment or just evokes strong reactions. Im not saying whose right or wrong so much as disagreement they take personally, get disgusted/angry, and will hit any button to make that person or post disappear.

                  I think I warned of that in either linked thread or Community Standards discussion. Both then and now, people started calling out others that should disappear with often opposite views of what should be allowed. There was no consensus except against comments that are blatantly harmful where there is a consensus by most peeple that it’s abusive. The same thing I see play out in person every day. So, I oppose comment deletions or bans in political situations without consensus so long as people keep it civil and about specific claims with supporting evidence. And if one side can speak, the other parties better be able to as well.

                  And a minimum of politics on Lobsters period! Keep it focused on tech and such. Somone had to post something by a decietful activist on politics pushing a mix of truth and propaganda. And that hit my mental button of calling them out staying as civil and factual as I could despite knowing with every word I might be censored for it. Might. The upvotes from my first comment were reason I kept taking the risk of more argument given there was a surge of dissent that needed to be represented. Not just me. I always help the underdogs. :)

                  Note: That was long as we were just talking about but I wanted context and intent clear given it’s about whether to filter or ban me. I also hold no grudges against anyone who did. It’s their deeply-held, personal beliefs about what’s right and wrong. People will do what you believe is necessary there.

                  Note 2: Lunch break is over. Darn. I was hoping for tech over politics. Ill do what’s necessary, though, since I value and respect this community. Gotta defend dissent as it’s critical.

                  1. 3

                    While I disagree with your positions on the topic of the OP, that’s not really what I wanted to bring attention to in this thread. And, as you correctly point out, the longer post you had there does contribute to the discussion. This is why I specifically linked only to that one comment, because that is the only one I feel is not contributing, constructive, or otherwise meaningful as a part of the larger thread. Under no circumstances do I think any of what happened in that thread is cause for banning or deletion; on that, we are in complete agreement. What I wanted to highlight in this topic is that we should have a way of discouraging comments that are solely inflammatory without carrying other value, and I believe that particular one was of that kind. I did not downvote your other posts despite disagreeing with them, because (as also mentioned elsewhere in this thread), I do not think disagreement should be a reason for downvoting. We can have a whole different discussion about how politics and tech mix, but that does not belong in this thread.

                    1. 3

                      This is why I specifically linked only to that one comment, because that is the only one I feel is not contributing, constructive, or otherwise meaningful as a part of the larger thread. Under no circumstances do I think any of what happened in that thread is cause for banning or deletion; on that, we are in complete agreement.

                      Well, my respect just went up for you quite a bit. Very reasonable position far as critiques go. The selected comment was lower info than the other one and maybe even unnecessary. Likely because it was part of a back and forth on politics where comment quality on all sides (including my own) tend to get lower as it goes on. One of reasons I don’t like political discussions in low-noise sites like Lobsters. They also can have less info since more of the specific points and context is already defined where the replies start just implying that stuff with less info content in general. That one was some combination of those.

                      In any case, I appreciate you clarifying your position. I at least get why you’d want to see less of that kind of comment than the main one.

                      1. 4

                        I’m glad we’ve found common ground. As I’ve said elsewhere (this thread is getting pretty large), I don’t want to see downvotes used as a way to signal disagreement, nor do I want them to be used to “punish” a particular user or otherwise label the user as bad. Downvotes to me are a way of signaling that a particular comment is unwanted, along with the reason why, nothing more, nothing less.

                        1. 1

                          I’m fine with that as long as there’s a consensus across majority of community’s users. That’s really all I ask with these sorts of things even though I’m biased toward free speech or low censorship. Your proposal isn’t a big risk to that esp given it’s mostly a tech-focused forum.

                  2. 1

                    alynpost’s suggestion meets my needs and I withdraw my support for a new category.

                  1. -1

                    Consequently, to overcome this restriction, the implementations of Rust’s standard libraries make widespread use of unsafe operations, such as “raw pointer” manipulations for which aliasing is not tracked. The developers of these libraries claim that their uses of unsafe code have been properly “encapsulated”, meaning that if programmers make use of the APIs exported by these libraries but otherwise avoid the use of unsafe operations themselves, then their programs should never exhibit any unsafe/undefined behaviors. In effect, these libraries extend the expressive power of Rust’s type system by loosening its ownership discipline on aliased mutable state in a modular, controlled fashion: Even though a shared reference of type &T may not be used to directly mutate the contents of the reference, it may nonetheless be used to indirectly mutate them by passing it to one of the observably “safe” (but internally unsafe) methods exported by the object’s API

                    C has too many unsafe operations. To solve this problem, our new super language rules out all unsafe operations except those which one precedes with the keyword “unsafe”. Ta da!

                    1. 16

                      There’s always an unsafe part. It’s like the trusted part in secure systems: it’s the TCB. You can’t get rid of it entirely. So, you make it as small and simple as possible. Then interface with it carefully. In the process, you avoid the severe damage (esp code injection) of common defects in vast majority of your code.

                      That they can do this up to temporal safety and race-free concurrency is a huge improvement over status quo.

                      1. 1

                        I don’t even know if that’s a good method of design for security. DJBs comments on design for security seem pretty insightful. To be fair, I don’t have a better solution, just some skepticism.

                      2. 11

                        What evidence would convince you that unsafe markings as manifest in Rust are an effective tool?

                        1. -2

                          I think the requirement for unsafe indicates that the basic system is not adequate. Either you solved the problem or you didn’t. I think that e.g. Java or Go or Lua using C libraries is a more coherent response than a system programming language with an elaborate safety mechanism that needs to be defeated in order to implement its own libraries. This is the same problem I have with the stupid C standard type aliasing rules: to impose “safety” restrictions that have to be escaped in order to implement basic functions seems like putting ones hands over ones eyes.

                          1. 14

                            Okay? I’ll note that you didn’t actually answer my question. Skepticism is good, but it’s a lot more productive when you can state more precisely the level at which your belief is falsifiable. Like, I don’t know what “you either solved the problem or you didn’t” actually means in this context.

                            elaborate safety mechanism

                            How is unsafe simultaneously just a keyword and also an elaborate safety mechanism? I found your initial comment overly reductive, but you jumped from that to “elaborate safety mechanism” in the blink of an eye! What gives?

                            seems like putting ones hands over ones eyes

                            How so? What examples of “safety” restrictions are you referring to? How are they like Rust’s unsafe keyword?

                            1. -2

                              The Rust system of memory management and pointer aliasing is elaborate. But to create necessary libraries, the pointer safety system needs to be escaped. To me, that’s a design failure. It’s the classic failure in security too. It’s not like you can average safety together: 1000000 lines of totally safe code and 10000 lines of unsafe does not make it 99% safe.

                              1. 13

                                … You still haven’t answered my question! Could you please address it?

                                The Rust system of memory management and pointer aliasing is elaborate.

                                This seems inconsistent with your initial comment.

                                But to create necessary libraries, the pointer safety system needs to be escaped. To me, that’s a design failure. It’s the classic failure in security too.

                                If it’s a design failure, then that implies there is either a better design that isn’t a failure or that there is no possible design that wouldn’t be a failure in your view. If it’s the former, could you elaborate on what the design is? (Or if that’s not possible to do in a comment, could you at least describe the properties of said design and what you think it would take to achieve them?) If it’s the latter, then we are back to square 1 and I’m forced to ask: are some design failures better than others? How would you measure such things?

                                It’s not like you can average safety together: 1000000 lines of totally safe code and 10000 lines of unsafe does not make it 99% safe.

                                Who is doing this, exactly? Do you think such a simplistic reduction is an appropriate way to judge the merit of a safety system? Can you think of a better way?

                                1. 0

                                  … You still haven’t answered my question! Could you please address it?

                                  I’m not sure why you are still confused by this but the “elaborate” in my initial comment was not referring to the escape.

                                  This is like: I’ve invented a perpetual motion machine, you just need to push it every now and then to keep it moving. I’ve invented a safe programming language, it just needs an unsafe escape mechanism or an FFI for implementing real applications.

                                  If it’s a design failure, then that implies there is either a better design that isn’t a failure or that there is no possible design that wouldn’t be a failure in your view

                                  I think there should be a better design, but don’t know what it is.

                                  This seems inconsistent with your initial comment.

                                  It is not even remotely inconsistent.

                                  1. 7

                                    This is the question I was referring to that I haven’t seen answered:

                                    What evidence would convince you that unsafe markings as manifest in Rust are an effective tool?

                                    1. 1

                                      My complaint is that the language requires an escape mechanism. So what would convince me is if it did not need to turn off its own safety mechanisms.

                                      1. 8

                                        If it didn’t need to turn off its own safety mechanisms, then the unsafe markings themselves would cease to exist. So, that doesn’t answer my question unfortunately. If you’d like some clarification on my question, then I’d be happy to give it, but I’m not sure where you’re confused.

                                        Here’s another way to think about this:

                                        I think there should be a better design, but don’t know what it is.

                                        What would it take to convince you that there is no better design? If you were convinced of such an outcome, would you still consider Rust’s memory safety mechanisms a design failure?

                                        Here’s yet another way: if a better design does exist, do you think it’s possible to improve our tools until the better design is known, even if you would consider said tools to be a design failure? Or are all design failures equal in your eyes?

                                2. 1

                                  If C had been defined with keywords to partition blocks with unsafe operations from safe ones, wouldn’t leveraging those be a best practice now? Or do you feel like we would see it now as a design failure of C?

                                  This concept seems very similar to inline assembly and/or linking against handwritten assembly implementations of popular functions. C-libraries generally have some critical sections implemented in assembly whereas it’s much less popular for normal applications to leverage this feature.

                              2. 13

                                I think the requirement for unsafe indicates that the basic system is not adequate.

                                I would like to write an operating system. I need to write a VGA driver. My platform is x86. To do this, the VGA device is memory mapped at the physical address 0xB8000. I have to write two bytes to this address, one with the colors, and one with the ASCII of the character I’d like to print.

                                How do I convince my language that writing to 0xB8000 is safe? In order to know that it’s a VGA driver, I’d need to encode the entire spec of that hardware into my programming languages’ spec. What about other drivers on other platforms? Furthermore, I’d need to know that I was in ring0, not ring3. Is that aspect of the hardware encoded into my language?

                                How would you propose getting around this?

                                I think that e.g. Java or Go or Lua using C libraries is a more coherent response

                                This is interesting, since many people refer to unsafe as a kind of FFI :).

                                Fundamentally, the difference here is “when you use FFI you don’t know because it’s not marked”, and unsafe is marked. Why is not marking it more coherent? They’re isomorphic otherwise.

                                1. 3

                                  How would you propose getting around this?

                                  I think the point is somewhat that you can’t without losing your ability to claim truly safe and secure status.

                                  A totally-safe systems language has to include the semantics of the hardware systems it runs on, otherwise it’s just wishful thinking.

                                  Now, it’s clearly a hard problem on how to do this, right, but maybe that’s informative in and of itself.

                                  1. 6

                                    Or it just needs a certifying compiler or translation validation of generated code. Certifying compilers exist for quite a bit of C, LISP 1.5, and Standard ML so far. Ensures resulting assembly will do exactly what source says. They also have intermediate languages in them that themselves can be useful.

                                    As TALC and CoqASM show, one can also add types and memory safety to assembly code to prove properties directly. One could replace the unsafe HLL code with provably-safe, assembly code. Then, you just need to take the interface specification of one then plug it into the others’ tooling. It’s one of the things Microsoft did for VerveOS: OS written in C# compiling to typed assembly interfacing with “Nucleus” separately verified.

                                    https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/pldi117-yang.pdf

                                    1. 4

                                      Actually, I think this misses the point. The point, IMO, are the trade offs between two choices:

                                      1. Encoding the safety in the language itself by encoding the semantics of the hardware systems it runs on.
                                      2. Give users the tools to encode safety themselves.

                                      The process for proving safety is the same and neither are “more safe” than the other. The only difference between them is that one is practical while the other is not (in a general programming language). That is, neither choice is “totally safe” (by your defintion). The choices just push safety around different levels of abstraction. The abstraction of safety in the first place is the important bit.

                                      1. 1

                                        Give users the tools to encode safety themselves.

                                        That’s what we did with C, and look how that turned out. :)

                                        The only way to prevent people from doing stupid things is to forbid them by construction–and sadly, this often limits the clever things too.

                                        1. 6

                                          That’s what we did with C, and look how that turned out.

                                          Uh, no? C has no way to encapsulate memory safety.

                                          The only way to prevent people from doing stupid things is to forbid them by construction–and sadly, this often limits the clever things too.

                                          This seems overly reductive to me. We don’t actually have to prevent people from doing stupid things. A measurable reduction would be an improvement.

                                          1. 0

                                            C has no way to encapsulate memory safety.

                                            Eh? Users can “embed” memory safety by using the correct library calls and compiler warnings and whatnot to catch mistakes (like, incorrect number of args to printf, using uninitialized pointers, etc. and so forth)–and can use libraries that provide APIs that do things like prevent incorrect allocation of memory and unsafe arithmetic.

                                            The problem with saying “Users can embed their own safety!” is that you then have to consider all of the legacy ways that users did that (e.g., in C, as explained above) and how that didn’t always work, because of user failings.

                                            And a “measurable reduction” instead of complete prevention makes Rust a lot less compelling than just using people’s existing knowledge of C and competent analysis tools and practices.

                                            Given the amazing marketing efforts by the Rust Evangelion Strike Force and friends, I’d rather hope you all would look to see just how good you could make it.

                                            1. 10

                                              Eh? Users can “embed” memory safety by using the correct library calls and compiler warnings and whatnot to catch mistakes (like, incorrect number of args to printf, using uninitialized pointers, etc. and so forth)–and can use libraries that provide APIs that do things like prevent incorrect allocation of memory and unsafe arithmetic.

                                              This equivalence you’re trying to establish seems incorrect to me. In Rust, I can provide an API and make the following guarantee that is enforced by the compiler: if my library is memory safe for all inputs to all public API items, then all uses of said library in safe Rust code are also memory safe. You can’t get that guarantee in C because it’s unsafe-everywhere. The obvious benefit of this implication is that unsafe becomes a marker for where to look when you find a memory safety bug in your program. Equivalently, unsafe becomes a marker for flagging certain aspects of your code for extra scrutiny. This implication is what makes encapsulation of safety possible at all.

                                              There are plenty of downsides with this scheme that make it imperfect:

                                              1. People could misuse unsafe. (e.g., By not marking something as unsafe that should be unsafe.)
                                              2. The safe subset of Rust could be so useless that unsafe winds up being a large proportion of one’s code.
                                              3. If unsafe is used infrequently, then folks will have less experience with it, and therefore might be more inclined to screw it up when they do need to use it.

                                              But they are tradeoffs. That’s my point. They must be evaluated in a context that compares your available choices. Not some idealized scheme of perfection. Everyone who has used Rust has formed their own opinions about how well it guards against memory safety bugs. If we’re lucky enough, we might even get to collect real data that supports a conclusion that using Rust leads to fewer memory safety bugs than C or C++ in the aggregate. (The answer seems obvious enough to me, and I have my own data to support it, but it’s just anecdotal.)

                                              And a “measurable reduction” instead of complete prevention makes Rust a lot less compelling than just using people’s existing knowledge of C and competent analysis tools and practices.

                                              “you aren’t perfect, so you aren’t worth my time” — That’s an amazing ideal to have. I don’t know how you possibly maintain it. Seems like a surefire way to never actually improve anything! Surely I must be mis-interpreting your standards here?

                                              For me personally, I’m more inclined to not let perfect be the enemy of good.

                                              Given the amazing marketing efforts by the Rust Evangelion Strike Force and friends, I’d rather hope you all would look to see just how good you could make it.

                                              <rolleyes> Go troll someone else.

                                              1. 2

                                                “you aren’t perfect, so you aren’t worth my time” — That’s an amazing ideal to have. I don’t know how you possibly maintain it. Seems like a surefire way to never actually improve anything!

                                                As you said, there are plenty of tradeoffs with Rust’s memory safety scheme, and established industry knowledge of C vs. Rust is just another kind of tradeoff. That seems to be his point.

                                                1. 3

                                                  If that was the point, then I’d be happy, but that’s not my interpretation of friendlysock’s comments at all. (They contain zero acknowledgment of tradeoffs, and instead attempt to judge Rust against a model of perfection.)

                                                  1. 1

                                                    And a “measurable reduction” instead of complete prevention makes Rust a lot less compelling than just using people’s existing knowledge of C and competent analysis tools and practices.

                                                    I interpreted this as “use existing knowledge and tooling to make C programs better, or switch to a new language with some embedded memory safety guarantees.” Sounds like there are tradeoffs in there to me, especially if you rely on unsafe code.

                                                  2. 1

                                                    You’ve grokked the essence of it–it’s not just industry knowledge, it’s also things like the vast amount of code which, while unsafe, has been tested and patched, and the operating system protections put into place to mitigate compromised programs.

                                                    Who cares if somebody can escape to a shell via an overflow if they end up as a neutered user account?

                                                    1. 5

                                                      Who cares if somebody can escape to a shell via an overflow if they end up as a neutered user account?

                                                      Step 1: Get barely-privileged account.

                                                      Step 2: Privilege escalation with another bug.

                                                      This works so much it’s standard in hacking guides. So, preventing that vulnerability bug is quite worthwhile. If you’re preventing step 2, then whatever they’re interacting with that’s privileged has to have no or few bugs. That hasn’t been true in mainstream software. So, these two steps are both worth putting extra effort into given the countless vulnerabilities that have happened with each.

                                                      1. 2

                                                        Fair enough!

                                                2. 4

                                                  It’s impossible to have a programming language that is simultaneously 1) good for systems-level programming and 2) has no mechanism for bypassing memory safety. On Linux you can simply read and write from /proc/self/mem. Windows and Mac OS X have similar mechanisms.

                                                  1. 1

                                                    Is that true?

                                                    Java ME Embedded, for example, was successfully used as a systems programming language without that escape hatch. If one reads the Oberon documentation, it looks like this language (used for real systems!) managed to support pointers without a lot of the pitfalls.

                                                    1. 3

                                                      So your definition of “good for systems-level programming” excludes being able to read and write from the filesystem on Linux?

                                                      1. 0

                                                        That’s not a reasonable criticism. The programming language is not asked to enforce safe use of the memory subsystem or the file system or to keep someone from jamming a paperclip in the processor fan. The question is purely a design decision for the language. Obviously there are many ways to split the difference and, right now, none of them are totally satisfactory.

                                                        1. 6

                                                          I believe my criticism is quite reasonable.

                                                          • friendlysock wants “complete prevention,” and is against “measurable reduction.”

                                                          • Rust is a systems programming language, and for a systems programming language, security is relative to the set of syscalls available.

                                                          • Popular operating systems have syscalls that violate memory safety.

                                                          • Therefore, “complete prevention” is out of the picture, and we must talk in terms of “measurable reduction.” This completes my argument.

                                                          It often isn’t worthwhile to model “paperclip in the processor fan”, although, sometimes it is. If you’re designing a programming language that can deliver correct results in the face of faulty hardware, then random bit flips need to be part of the model. If you’re designing a systems programming language, then the syscall interface needs to be part of the model. Once you acknowledge that, you find that fretting about the programmer using unsafe when they shouldn’t is pointless.

                                                          1. 1
                                                            friendlysock wants “complete prevention,” and is against “measurable reduction.”
                                                            

                                                            Not quite–I’m not “against” it, it’s just that the benefits of using something that only provides “measurable reduction” instead of “complete prevention” are not sufficiently large when I also take into account retraining and retooling.

                                                            Popular operating systems have syscalls that violate memory safety.

                                                            Ah, I think I see the angle you’re taking. I kind of assume that since we’re talking a systems programming language, we ignore ill-conceived syscalls since we could be writing an OS that doesn’t contain them.

                                                            1. 1

                                                              If we’re talking about a language for writing applications for an operating system that uses type safety to enforce process isolation, then I agree, Rust as-is is not suitable for the task. Maybe Rust combined with a different language for unsafe segments, as mentioned in this thread.

                                                            2. 1

                                                              If you’re designing a systems programming language, then the syscall interface needs to be part of the model.

                                                              Since it’s highly relevant to this topic, see Galois Group’s presentation on Ivory, a synthesis language for systems. Notably, they assume that an underlying system task scheduler exists.

                                                              1. 1

                                                                You might find immunity-aware programming interesting:

                                                                https://en.m.wikipedia.org/wiki/Immunity-aware_programming

                                            2. 2

                                              “I’d need to encode the entire spec of that hardware into my programming languages’ spec. “

                                              That’s pretty much what you do. You can do abstract, state machines with its key properties to avoid doing the whole thing in detail. You do one each for program and hardware function. Then you basically do an equivalence on the likely inputs.

                                              Alternatively, you assume the hardware works then specify and implement the unsafe stuff in something like SPARK Ada. Prove it safe. Then wrap the result in Rust with interface checks that ensure safe Rust code uses the “unsafe,” but verified, code it’s calling in a safe way. I think Rust’s would do well combined with tech like Frama-C or SPARK for unsafe parts.

                                              End result of either method is only the lowest-level or least safe stuff needs extra verification. Still reduces burden on developers a lot versus looking for all kinds of undefined behavior..

                                              1. 4

                                                What does this have to do with whether Rust’s safety guarantees are completely encoded in the language vs permitting users of the language to use unsafe markers?

                                                1. 2

                                                  Im only addressing how to handle code that’s actually unsafe that’s included in Rust apps and can maybe break the safety. I dont know the implications or uses of Rust’s unsafe keyword enough to do anything further. Just defaults I say for any safe, systems language with unsafe parts. :)

                                                  1. 2

                                                    The point is that encoding the hardware semantics is not sufficient to remove unsafe while retaining Rust’s “zero-overhead” guarantee. You would also need to add a logic that subsumes quantifier-free Peano arithmetic to your type system which will seriously gum up the little type inference that Rust does already. One alternative is to use a different (more proof-oriented) language and type system for the unsafe bits. The boundary between unsafe and safe then is well-defined as the point where the invariants of the unsafe code can be expressed in terms of Rust’s type system.

                                                2. -1

                                                  That’s my point. I want to write an OS. You propose a programming language that includes elaborate type safety mechanisms, particularly strong control of pointers and explain it is much better than C/C++ or Ada or assembler or some other unsafe language because of this mechanism and also when I need to write a VGA driver or parse a command string or do any of the other things that are where raw pointers are most problematic - I can escape the control! So, to me, you haven’t solved the actual problem, you just made coding more inconvenient.

                                                  The basic problem is very difficult, I agree.

                                                  Fundamentally, the difference here is “when you use FFI you don’t know because it’s not marked”, and unsafe is marked. Why is not marking it more coherent? They’re isomorphic otherwise.

                                                  How do I know when I call a library function that, 7 layers down, some knucklehead decided code would look “more clean” using the unsafe escape? you are right: it’s essentially a FFI. So what have you gained? It may well be that the general design of the language is great and has other virtues, but you have not really solved the problem of unsafe pointers, you’ve just swept them under the FFI. The FFI at least gives me a clean separation.

                                                  1. 5

                                                    you just made coding more inconvenient

                                                    In my experience, coding becomes more convenient. Is your experience to the contrary? Could you elaborate?

                                                    The basic problem is very difficult, I agree.

                                                    What do you think the basic problem is?

                                                    So what have you gained?

                                                    A tool to encapsulate unsafety.

                                                    1. 1

                                                      A tool to encapsulate unsafety.

                                                      Textually. But e.g. a buffer overlow in this “encapsulated” code can spill into your safe zone - no?

                                                      1. 6

                                                        Yes, but then you know precisely from where it spilled, unlike in a language that does not demarcate unsafe code. That’s what you gain. When things go wrong, it’s quite quick to hone in on the problem area. The surface area to explore is reduced potentially by orders of magnitude.

                                                        1. 3

                                                          Yes. If there’s a bug. What form of encapsulation doesn’t have the problem that it might contain bugs?

                                                        2. 1

                                                          A tool to encapsulate unsafety.

                                                          Textually. But e.g. a buffer overlow in this “encapsulated” code can spill into your safe zone - no?

                                              1. 2

                                                My suspicion is that computer scientists have gotten types all wrong - and have latched onto type theory from formal logic without understanding why formal logic failed at its larger goals.

                                                1. 10

                                                  Why do you say that? I’d say computer scientists have been developing types in an interesting way that directly accounts for the flaws in formalism and profits from them. CS forces the “open logical system extended through learning/communication” perspective which avoids the failings of Formalism.

                                                  1. 0

                                                    I don’t know what you are quoting, but types as sets or as some byzantine formalism (such as the Hindley-Milner mess) doesn’t solve any problem as far as I can tell. In actual programming, types and polymorphism are simple and come from the mapping between finite bit sequences and values. Why one would need to drag in a failed project from metamathematics is something I don’t get.

                                                    1. 11

                                                      You are making too big of a deal of the connection to set theory. Nobody is “dragging in a failed project”, it is mostly a historical curiosity at this point. Most research in type theory involves systems that are unsound when interpreted as a logic. Even the people today interested in using type theory to formalize mathematics are using it as a way to replace set theory, with massive implications for machine verification of proofs.

                                                      I don’t understand how you can call Hindley-Milner a “byzantine formalism.” I get the feeling that you are unfamiliar with the sequent notation commonly used to describe the precise behavior of a type system. Hindley-Milner is the simplest type system that can elide type annotations and has the principal type property (each expression has a “best type”). Is type inference esoteric? Is querying your IDE for the type of an expression esoteric?

                                                      types as sets … doesn’t solve any problem as far as I can tell.

                                                      In actual programming, types and polymorphism are simple and come from the mapping between finite bit sequences and values.

                                                      So I guess you could say that a type is a set of finite bit sequences?

                                                      1. 0

                                                        . Most research in type theory involves systems that are unsound when interpreted as a logic.

                                                        Really? It is definitely not my field, but I can’t imagine what value could be obtained by a logical system in which False is a theorem. Certainly Milner claims his scheme to be sound. Maybe I don’t know what you mean by “interpreted as a logic”.

                                                        I still don’t know what problem is being solved. I see what Hindley is trying to do, but he’s interested in combinatory logic which is something that does not interest me all that much (I’m not saying it is not an interesting topic, just not to me). What I don’t get is what utility comes from applying such a generic approach to programming languages.

                                                        I don’t see a type as a set of finite bit sequences, since the interpretation of the same bit sequences into mathematical objects can be different. For example, if we look at the set of K bit sequences, we can have a mapping into Z_{2^K} and a mapping of symbolic operations into operations e.g. so integer(“a*b”) => integer(a) times integer(b) mod 2^K or something. We could also map to a subset of the reals with a NaN value if we are using the same bit sequences as floating point. To me, the hard parts of type theory are figuring out how to bound errors, deal with overflows, work with limited representations, handle not-a-number results etc.

                                                        1. 7

                                                          “Research in type theory” comes in two camps: the PL-ish and the mathematical. There’s a lot of value in a type system that proves False on the PL-ish side as that is the type of a value for which you have no information and any program with a non-productive infinite loop must take that type. You can excise general recursion, but now we’re heading toward the mathematical side of type theory.

                                                          Even when False is a theorem you can still get a lot of value out of the fact that structural reflection on the rules leading to a type judgement can give you a lot of information about the dynamic semantics of the proof/program. In this case, False just confers the information I said above: a totally non-productive program which will not, even in principle, ever produce information.

                                                          I’m not the one suggesting that types are sets of finite bit sequences (I’d argue pretty heavily that they’re nothing close to that at all!) but I would suggest that the practical problems you’re taking aim at w.r.t. type theory are very interesting and totally a real part of the PL-ish community of studying type theory.

                                                          They’re a bit weird from a mathematical perspective because there isn’t as much of a foundational motivation for why one should spend time trying to model overflows, but even that’s pretty interesting. I’d love to see type theories which would let you talk about Reals (perhaps computational, intuitionistic Reals) and Doubles together and relate what properties can be “ported” from one to the other as a basis for discussing how finite approximations to algorithms designed over the Reals can succeed or fail.

                                                          Why type theories as opposed to something else? Because type theories essentially specialize in information that can be obtained statically. Oppose this with information that can only be obtained “dynamically” by running a program. In a world where non-termination is a real, ever-present concern (more generally, there is a real “cost of learning” even in the very most theoretical sense) what type theories present is a measure of what “free” information is available.

                                                          A PL designed with a type system begins to optimize its design to offer more “free” information all the time. This tendency doesn’t always help directly with concerns which are structured to be dynamic problems (“will this algorithm ever OOB this array?”) but it does offer an opportunity to stack cracking away at those problems by expressing more information about the system under question statically.

                                                          Abstract interpretation (which, caveat emptor, I don’t really understand that well) seems like another dimension in this whole picture. It offers information which isn’t “free” (completely static) but offers opportunity to start accessing information about a problem without paying the “entire cost” of running it.

                                                          1. 2

                                                            I’m not the one suggesting that types are sets of finite bit sequences (I’d argue pretty heavily that they’re nothing close to that at all!)

                                                            I was primarily pointing out what I believed to be a contradiction, but I’ll defend this viewpoint anyway. Let’s say we are compiling to X86-64 and our compiler is ignoring SSE registers. Then every type–including arrow types, which are often thought of as primitive–ultimately boils down to “int64”. Parametricity prevents us from inspecting these physical representations, which not only preserves the freedom of the compiler to choose the physical representation, but also memory safety. If we are compiling to a typed assembly language, this may very well be the actual way it works.

                                                            1. 1

                                                              Sorry, didn’t mean to come off critically. I totally understand an idea that types are sets of bit sequences—I just didn’t want to defend it right there!

                                                            2. 0

                                                              They’re a bit weird from a mathematical perspective because there isn’t as much of a foundational motivation for why one should spend time trying to model overflows, but even that’s pretty interesting.

                                                              So my suggestion is that not only is there little foundational motivation for modelling overflows, but that there is little engineering motivation for studying foundations.

                                                              I don’t get your claims about False as a theorem. If False=True, then we can prove that e.g. the type of a function is X and also Y and also anything else. Unsound deduction produces no information at all. Are you sure you don’t mean “incomplete” rather than “unsound” ?

                                                              1. 2

                                                                I can see your perspective, but I think there’s a lot of engineering motivation in studying foundations. I spend more time trying to model situations and build and test expectations about those models than I do working with algorithms. For me, having a system to introduce ideas and given them logical support is great and quite practical.

                                                                The heart of what makes False useful is that while it does implode your propositions, it cannot be practically realized. A generic coercion function forall a b. a -> b can be derived from False, but canonical inspection of the proof “takes forever”. So, your logic works for all “productive programs” and doesn’t require you to toss out general recursion.

                                                                Then, you examine proofs assuming that you don’t have False and see what you get. This is still meaningful information even though it can be in principle violated. It’s useful because that matches the logic we use when programming anyway: maybe there’s an infinite loop here, but I don’t think there is so let’s assume there’s not and see what we get. Then at runtime we find we’re wrong, oh no!

                                                                Logics exist that let you avoid this song and dance but they’re require a lot more of the programmer. It’s not always a practical tradeoff.

                                                        2. 2

                                                          Not quoting, relating a fuzzy idea from the Intuitionist’s school of logical foundations.

                                                          types and polymorphism are simple and come from the mapping between finite bit sequences and values

                                                          I don’t think I can agree with that statement. At the very least I’d like to understand what you mean by “value” here. I’d probably also be interested in better understanding how you’re defining “type” and “polymorphism”.

                                                          But maybe the better place to start is: I can agree with you that formalists failed to achieve some of their stated ends, but in what way do you feel that type theory as applied to PL is depending upon those successes or continuing along failed routes?

                                                          1. 0

                                                            They failed all of their stated goals. No? Will try your harder questions later.

                                                            1. 9

                                                              They “failed” their ultimate Program in that what Hilbert hoped to achieve was proven impossible. This is a bit different from their “goals” however. The goal was ultimately to achieve greater understanding of the challenge of formalizing mathematics and logic and to create common understanding and shared language around the most challenging fundamental aspects of it. Hilbert expressed a philosophical goal by suggesting how this would play out and was shown to be wrong in his guess. That doesn’t mean that the larger mathematical work was a failure at all.

                                                              The intense study of these problems led to things like types and type theory which embrace a different philosophical foundation than Hilbert (or, really, are quite foundationally agnostic) and have been widely successful as a language for discussing mathematics. Instead of showing, as they hoped, that there could be an ultimate final formalization of algorithms for constructing mathematical proofs of any variety they instead showed that intense study of formalization and foundation proves the need for continuous growth of mathematical thought. It forces the issue of an open and growing system and ends up being a Full Employment Theorem for mathematicians.

                                                              Type theory here is just the technology for letting Mathematical Foundations reach that stage of its evolution. Even if we as of yet hadn’t found a way past the sticky point Gödel left us in we’d still have to call type theory successful as a technology.

                                                              1. 0

                                                                I’m not in a position to argue that in terms of mathematics as a whole, but I don’t see any interesting results in CS that are not self-referential.

                                                      2. 2

                                                        Should computer scientists care why formal logic failed at the goals mathematicians gave it?

                                                        What matters to us is if it helps create more reliable software. If mathematicians set their goals higher and failed to meet them, that’s orthogonal.

                                                      1. 3

                                                        Thank you for this!!!!

                                                        After using Colemak (3+ years) and then attempting Workman (slightly better than Colemak at reducing discomfort with reduced horizontal index finger travel for me personally), I’m ready for a keyboard that’s optimized for reduced pinky usage (even on Windows/Linux machines, I’ve swapped Ctrl with Alt/Meta such that keyboard shortcuts primarily use my thumb like Mac OSX’s Cmd) while still reducing the horizontal finger motion that was so common with Colemak.

                                                        Time to roll up my sleeves and learn QGMLWY!

                                                        For anyone who suffers from typing discomfort, I can’t recommend alternative keyboard layouts enough. It’ll likely take a long while to get used to typing in a different keyboard layout, however (I believe Colemak took me well over 8+ months to get decently proficient at [80+ WPM; my QWERTY baseline is about 95WPM], and I never did get proficient to the level I would have liked with Workman…).

                                                        However, if you’re not willing to take the plunge to retrain your muscle memory (not a small undertaking!), there’s two small changes that really helped me out which I would recommend to anyone:

                                                        1. Swap Capslock with Backspace. No more reaching the top right side of the keyboard with your right pinky in an awkward motion! Some VIM users have told me they remapped this to Esc… but I’m much more of a Ctrl+C person (plus, after the second tip below, Ctrl+C no longer becomes a torture test on your left pinky!)
                                                        2. Swap Left Ctrl and Left Alt so that hotkeys only requires your thumb to hold onto the modifier instead of your pinky! (This is unnecessary if you’re on Mac OSX)
                                                        1. 2

                                                          I had pinky problems and have been using QFMLWY for 6 years. It’s one of the best investments I’ve made in my career. If you want a keyboard try the Kinesis Advantage.

                                                          I wrote a little more here last time this came up on lobste.rs

                                                          1. 2

                                                            Thanks for the testimonial! Btw, what made you choose QFMLWY over QGMLWY (the latter is the one with ZXCV unchanged)? Part of the reason I was attracted to Colemak/Workman was because I didn’t want to have to change my hot key muscle memory/bindings (one of the reason why I never gave Dvorak a try). I’m guessing you didn’t find that to be a problem?

                                                            I’ve demo’d the Kinesis Advantage in person, and wasn’t quite a fan of the bowl size (I have small hands. I’ve also used the Ergo Dox previously and had to sell it because my hands also too small to reach the keys and the thumb clusters comfortably)–I’m thinking of getting a TypeMatrix 2030 keyboard since I did enjoy the columnar non-staggered layout of the Ergo Dox.

                                                            1. 1

                                                              Oops, I actually use QGMLWB, I can never remember which and just copied what I (mistakenly) said last time. They’re similar enough that you can confuse them so I don’t think it matters what you pick :) I’d just go with your intuition.

                                                              However, your concern still applies. The answer is that I don’t use keyboard shortcuts outside of my custom Emacs setup in any significant capacity. But even if I did, it wouldn’t have been a consideration–I overhauled everything at once and just resigned to being useless for a few weeks.

                                                              The TypeMatrix looks good to me except for Ctrl under the pinky. I think if I had used this keyboard I would have kept with the foot pedals.

                                                              1. 1

                                                                Yeah, I’m definitely going to give the “most optimized” version a try… What do I have to lose ;)?

                                                                Re: TypeMatrix: Per my own “life pro tip #2” in my GP post, I would personally be swapping Left Ctrl and Left Alt, so that I’d be using my thumb instead of my pinky for Ctrl (I never ever use Right Ctrl anyways, so that’s not much of a big deal, and if I needed to use Alt, for say Alt + Tab, I just use a combination of my right thumb [on R-Alt] and my left ring finger [on Tab]).

                                                          2. 2

                                                            I took the hardware way to solve the ‘pinky’ problem, and bought a typematrix 2030. It brings the enter amd backspace in the middle so you use your index/thumb to press them. The shift/control keys are also taller to make them easier to access.

                                                            1. 1

                                                              I swapped CapsLock for Ctrl and its 1000% more comfortable for my hands to not have to reach for the Ctrl key. Having CapsLock on home row and then having it be such a rare keypress (does anyone use caps lock any more) is easy to change into a big win.

                                                              I’ve used Dvorak for a couple years and as a programmer, I would recommend Colemak to someone interested simply because they leave the symbol keys alone. Having dvorak’s home row vowels is a huge win but largely off setted by putting <>? up at QWE.

                                                              1. 1

                                                                However, if you’re not willing to take the plunge to retrain your muscle memory (not a small undertaking!)…

                                                                Still not a small undertaking, but you feel better even after an hour of fumbling as you learn it. Compared to Colemak and Workman where I still couldn’t get with O and I after weeks of practice…

                                                                ‘A’ being on the other hand entirely will take some getting used to.

                                                              1. 28
                                                                Mozilla may have won, but all of us have lost.

                                                                WebAssembly (for brevity, WASM) is a terrible idea being carried through with excellent vision and engineering effort. It has the following fundamental issues that nobody seems to care about:

                                                                It further erodes the transparency to end-users. A process that began with minification and concatenation will finish running once WebAssembly is in place. Without a full debugger and decompiler, end-users will lose the ability to easily look at what their browsers are loading and executing. The user browser moves further and further from being a document viewer into a beachhead of control on the user’s machine.

                                                                It further erodes the freedom of end-users. It will be no surprise in the next few years when DRM (enabled via code signing and other supporting technologies) is rolled out even further into the browsers. This will prevent users from running their own modified scripts on pages, from tweaking via proxy other scripts, and from being able to selectively replace things in their software stack that don’t work. It’s hard now, but it’s only going to get worse, because of “security concerns” (read: copyright enforcement).

                                                                It massively fragments the development ecosystem. At a time when anybody with half a brain realizes that the churn and balkanization of the Javsacript world is a pox, we are embracing the idea that somehow everything will be better once browser software can be written in Python, Clojure, Ruby, C, C++, D, Rust, Common Lisp, Elixir, Erlang, Forth, FORTRAN, COBOL, and who knows what else. Man-centuries of effort will be wasted bringing the tooling in each of those communities up to something resembling the current state of the frontend, all so people can avoid spending a couple of weeks learning the vulgar lingua franca of Javascript. This is not progress.

                                                                It reduces the transferability of front-end skill sets. For better or worse, most front-end skills transfer between gigs. A blob of jQuery and React and Angular looks roughly similar across jobs: arrays and strings work the same, functions and closures work the same, and so forth. WASM will remove this convenience. Ever had to relearn a codebase written in CoffeeScript instead of boring ES5? Or even learn the quirks of ES6 instead of ES5. Now, imagine you have to learn an entire new programming language just to do the same goddamn thing you were doing ten years ago. This makes switching jobs and even hiring devs harder if the stack doesn’t use boring JS.

                                                                It decreases the pressure to fix JS and DOM APIs. Every problem in CS is solvable with another layer of abstraction–the “benefit” of WASM in providing the “if you don’t like this API, use this other language to hack around it!” escape hatch. Except, that usually leads to stagnation in the base layer. JS’s own standard library is a poverty-ridden ball of shit precisely because TC39 and others have focused on interesting language features and support for compiler wankery at the expense of the needs of the working developer.

                                                                It solves problems that we don’t have. The majority of web traffic is boring CRUD operations, despite what the snake-oil salesmen at conferences and vendors want to tell you. The stuff that WASM would truly benefit? “low-level” coding like for games and whatnot? All of that is handled better in native code already. If performance is the important thing you pull on your big kid pants and write C/C++/D/Rust and if you’re unlucky some assembly or you busy out some CUDA or OpenCL. If performance isn’t important, you just write in JS. This magical market segment of “Man, I really want to run heavily compilers and numerical codes in the browser” doesn’t really exist.

                                                                It creates bad vendor incentives. Remember up above when I complained about transparency for users? So, imagine a future job where a contractor delivers a pile of WebAssembly to make a site work. Said vendor dies, or stops working on the project, and now the client needs to do maintenance on it. How’s that going to work again? What about when the vendor bakes in a killswitch to disable the software if they aren’t kept on a “maintenance” contract? Again, opaque binary, what do?

                                                                It is fundamentally incompatible with the licensing and learning norms of the Web. At some level in our gut, we all knew that people could see how our code worked, and that we could benefit from learning how their code worked. This sort of informality allowed us all to speak more freely and frankly about our code and our practices, and for two decades the web flourished. So, all the webshits have played fast and loose (BSD/Apache/etc.) with open source licensing because at the end of the day, it was trivial to look at the client code and see how it worked. It was also easy to spot when somebody was using code we’d written, or using a particular library–the “moral rights” of authorship were not so threatened. WASM kills this freedom.

                                                                ~

                                                                Of course, none of this matters, since FB and GOOG are going to use WASM to stream ads to people faster and watch them more closely, and every webshit will lap it up as progress and folks giving conference talks will sell more tickets on their new framework for language XYZ for doing web shit.

                                                                Seriously, fuck this myopic industry.

                                                                1. 28

                                                                  Can’t agree with most of your points.

                                                                  Have you tried decompiling Java classes? The code produced by the decompilers is very clear, and easy to read. Sometimes more clear than trying to beautify minified code. Unless the bytecode is obfuscated, but JS code can be obfuscated as well.

                                                                  DRM is only dangerous if you want to participate in it. You can always reject DRM content.

                                                                  It might fragment the ecosystem, but sticking to one solution is hardly progress as well. Also there are lots of people for which JavaScript is the biggest reason they stay away from web development.

                                                                  About reducing transferability – the same can be told about normal, ‘desktop’ world, and people manage to cope with it. Some classes of applications require one set of languages (java, scala), others require different languages (c, c++). I don’t really see many problems with this, languages are different tools for different jobs, one language for everything doesn’t exist (can’t exist).

                                                                  WASM fixes JS by replacing it. That’s the best way of fixing JS from my point of view! ;)

                                                                  And what do you mean we don’t have problems – there are lots of them. Downloading a minified javascript source that was transpiled from typescript and running it seems like a big hack. The speed of more complicated web applications is slow like hell on a new i7 CPU (google docs, google maps). We have so many problems it’s a disaster.

                                                                  Have you tried reading the source code of Google Docs? I bet you know that there are lots of scripts nowadays that start with “var fpa=function(){var a=_.ft.T”, finishing with the same set of trash. Compilation to bytecode alone isn’t denying the access to source, because bytecode still can be decompiled.

                                                                  1. 2

                                                                    The code produced by the decompilers is very clear, and easy to read.

                                                                    How many man-decades of work were put into that tooling, one wonders? Also, “easy to read” != “easy to maintain and modify”.

                                                                    DRM is only dangerous if you want to participate in it. You can always reject DRM content.

                                                                    Yeah, like the W3C did with–oh wait, no, they rolled over on their fucking belly. Well, at least Intel and Apple and Nvidia–wait, shit, they did too. And Elvis didn’t do no drugs!

                                                                    sticking to one solution is hardly progress as well

                                                                    Vanilla JS has worked decently for 20 years. SQL for over 40. Sometimes a working solution is enough.

                                                                    This constant neophilia is killing us, and observations like yours are predicated on the idea that somehow we must keep resolving solved problems or we aren’t making “progress”. Shovels haven’t changed significantly in two thousand years–does that mean progress with civil engineering stopped?

                                                                    the same can be told about normal, ‘desktop’ world, and people manage to cope with it.

                                                                    They coped with it by killing native apps and moving onto the Web. And are trying to move the Web back to the desktop/mobile/server with “cross-platform” JS debacles like Electron and NW.JS.

                                                                    The speed of more complicated web applications is slow like hell on a new i7 CPU (google docs, google maps).

                                                                    Works on my machine, sorry I guess? Most users don’t even know what slow is, compared to ten years ago (or God forbid 20!).

                                                                    And those big companies are going to write even more slow and bloated shit once they have WASM, because they can bring over their crufty codebases wholesale.

                                                                    1. 11

                                                                      This constant neophilia is killing us, and observations like yours are predicated on the idea that somehow we must keep resolving solved problems or we aren’t making “progress”.

                                                                      If you disregard the actual improvements a new technology makes, you can easily dismiss it as “change for the sake of change”, or as you say, “neophilia”. I see this comment every day on the Internet, like this fellow who claimed that someone used Rust for a project just to be “buzzword compilant”.

                                                                      People don’t agree that these problems are “solved” and I welcome their attempts to fix the status quo.

                                                                      1. 0

                                                                        Well, first, that fellow wasn’t necessarily wrong in pointing out that wrapping a perfectly functioning C/C++ glob in Rust didn’t immediately make sense.

                                                                        People don’t agree that these problems are “solved” and I welcome their attempts to fix the status quo.

                                                                        Do you do web shit? Do you do frontend web shit? Do you do application programming in an ecosystem that changes rapidly?

                                                                        If you answered “no” to any of the above questions then it is no surprise that you would have that (misguided and incorrect) opinion. In the abstract, sure, we can all talk about the magic pixie dust of progress and finding better tools, but in the concrete here and now it’s yakshaving on a grand scale.

                                                                      2. 4

                                                                        How many man-decades of work were put into that tooling, one wonders? Also, “easy to read” != “easy to maintain and modify”.

                                                                        I don’t think this issue should be thought in such category: “imagine how good JS would be if everyone would just focus on JS, instead of trying to invent different things”. We would be still in the stone age with very sharp rocks if that would be the case! ;) This is the same argument against GNU/Linux and its world with lots of distributions. Well, imagine how a GNU/Linux distro would be awesome if everyone would just focus on one distribution instead of forking new ones? It doesn’t work this way, everyone has different needs. One requires the distribution to be fully automatic, another one requires it to be mostly manual. Same with development ecosystem.

                                                                        Vanilla JS has worked decently for 20 years. SQL for over 40. Sometimes a working solution is enough.

                                                                        Should we switch back to Fortran? It worked well. Also I would argue about JS working decently in first years if its existence. Before jQuery and similar, writing a multibrowser JS code sometimes required to simply write multiple copies of the same code for different JS dialects in different browsers.

                                                                        They coped with it by killing native apps and moving onto the Web. And are trying to move the Web back to the desktop/mobile/server with “cross-platform” JS debacles like Electron and NW.JS.

                                                                        What apps are you talking about? My file manager? My virtualization platform? My terminal emulator? My Office suite? My music player? My video editor? My games? The browser itself is a native app as well. There are web alternatives for some apps, but they feel like demo versions of the “real deal”.

                                                                        Also, “native guys” aren’t trying to bring web apps to the desktop. The “web guys” are trying to push it; people who got into web development now want to create standalone desktop apps, so they’re bringing their environment with them. I don’t think this is a bad thing. The more people are in the development world, the better. Now Electron is a trend, but this trend will eventually finish and evolve, we’ll see how. Maybe the trend would be to rewrite Electron apps into native versions, who knows. The thing is that popularity of Electron isn’t something that ends some kind of an era. It’s a tool that helps web guys to reach over the desktop world.

                                                                        Works on my machine, sorry I guess? Most users don’t even know what slow is, compared to ten years ago (or God forbid 20!).

                                                                        Well it doesn’t work properly on my machines. I’m not satisfied and I think it should be better. Also I don’t really care if someone doesn’t realize web apps are slow. I realize this, and this is what matters to me.

                                                                        And those big companies are going to write even more slow and bloated shit once they have WASM, because they can bring over their crufty codebases wholesale.

                                                                        Agree, this will happen. But it’s already happening with JS, so I don’t really see any changes here.

                                                                        1. 1

                                                                          “imagine how good JS would be if everyone would just focus on JS, instead of trying to invent different things”

                                                                          See, that’s incorrect. You’ve overlooked the entire category of things you build with JS–the things we build with the tools. The tools should not be the primary, or even secondary, objective. A better statement would be “imagine how much more cool applications and stuff we could build if we didn’t keep retooling”.

                                                                          Should we switch back to Fortran? It worked well.

                                                                          For serious numerical work, that’s still the tool used very frequently. Because it works. BLAS/LAPACK are proof of the value of not fucking around greatly with tooling once you have something that works well enough.

                                                                          What apps are you talking about? My file manager? My virtualization platform? My terminal emulator? My Office suite? My music player? My video editor? My games?

                                                                          Google apps and Dropbox would be the obvious counterexample to you here.

                                                                    2. 12

                                                                      This magical market segment of “Man, I really want to run heavily compilers and numerical codes in the browser” doesn’t really exist.

                                                                      I want to make games and run them in the web, so that people don’t need to download them and they run on all devices.

                                                                      I also want to use the browser for genetic programming.

                                                                      For both of these endeavors I am bottlenecked by the JS memory model, so I would prefer to use something like Rust.

                                                                      Do I not exist?

                                                                      1. -1

                                                                        For both of these endeavors I am bottlenecked by the JS memory model, so I would prefer to use something like Rust.

                                                                        How? How exactly are you bottlenecked?

                                                                        1. 5

                                                                          If you want to make games that run at 60fps, uncontrollable stop the world garbage collection is far from ideal and introduces hiccups.

                                                                          Even if you avoid all allocation at runtime I would still like more control over how my memory is managed.

                                                                          As for genetic programming, the more cycles you can squeeze out of your code, the better the result you can produce. You can only optimize your JavaScript so far. I want more cycles.

                                                                          1. 0

                                                                            It is entirely possible to write arena allocators in JS, even going so far as to supporting malloc-style tricks using TypedArrays if you really want to go there.

                                                                            In the GP case, you probably should be using real iron once you’ve settled on a proof-of-concept.

                                                                            1. 9

                                                                              […] if you really want to go there.

                                                                              Who wants to actually go there? After some point, you should stop adding hacks to a system, and just create a new system. I see WebAssembly as a response to the overwhelming amount of things being forced onto/into JavaScript.

                                                                              1. 4

                                                                                In the GP case, you probably should be using real iron once you’ve settled on a proof-of-concept.

                                                                                I’d have said that a while back. These days I wouldn’t because of all the good demos of tech that are usable as a web application without installing anything. A GP demo that took no upfront investment might convince some people to dig deeper into it. It might need to be a web application for a lot of them these days. That means it also needs to be fast given more CPU = better results. Stuff like that might justify a faster web solution.

                                                                        2. 3

                                                                          Thanks for that. I have to admit I’d not seen this coming. I thought WASM would simply be a vehicle for other application types we otherwise wouldn’t have seen to come to the browser, not an excuse to throw out the entire web ecosystem and replace it with native compiled everything.

                                                                          Do you actually think people will throw out the baby with the bathwater like this? I’d bet there’d be some sincere resistance across the board if they tried.

                                                                          1. 2

                                                                            Do you actually think people will throw out the baby with the bathwater like this?

                                                                            Have you met the web developer community?

                                                                            It’s a constant struggle of wits–the half-wits and the nit-wits and the dim-wits.

                                                                            And I say this as somebody whose livelihood depends on that ever-increasing fuckery.

                                                                            1. 3

                                                                              Sympathies. Sincerely. I couldn’t do professional web dev nowadays, I think I’d go mad.

                                                                              I was thinking though - mobile. I don’t see WebAssembly taking over the mobile platform - the performance constraints should keep it away for a long time to come.

                                                                              1. 0

                                                                                Ah, that’s the insidiousness of it, though!

                                                                                The big plug will be this:

                                                                                “Use WASM and you’ll get better performance/battery-life/etc. since our phone/browser has a compiler/interpreter that magically knows how to optimize the bytecode for the platform it’s running on.”

                                                                                Or worse, you could expect that with code-signing and modular scripts, browser vendors (read: GOOG) would be enabled to offer specific fallbacks for modules matching the signature of jQuery or something. Magically, there is more vendor lock-in.

                                                                                We fell for AMP, why not this?

                                                                          2. 2

                                                                            If performance is the important thing you pull on your big kid pants and write C/C++/D/Rust and if you’re unlucky some assembly or you busy out some CUDA or OpenCL. If performance isn’t important, you just write in JS. This magical market segment of “Man, I really want to run heavily compilers and numerical codes in the browser” doesn’t really exist.

                                                                            While I agree with you in principle, I like to hold out the hope that in the future we might see AAA games released, not for Windows, or macOS, or Linux…but for “the Web”. And then anyone can play.

                                                                            1. 3

                                                                              And then anyone can play.

                                                                              Sure, and then if the server goes away, they never get to play again. Let’s put our playthings into the state of permanent transience!

                                                                              I have a copy of Half-Life that’s nearing 20 years old, and it still works. I can still bust it out at LAN parties.

                                                                              AAA games in the browser only serve to further indenture and impoverish users.

                                                                              1. 2

                                                                                This already happened to me with a game where I used my Xbox Live profile to play single player. I temporarily lost Live access. Then, I couldnt use any single player progress: restart with offline account or only use offline accounts to begin with. Ridiculous.

                                                                                1. 2

                                                                                  Well, that’s not necessarily true. It’s possible to make games that would be written for “the Web”, and would even work at LAN parties or merely locally. Of course, the question is then whether or not people/companies would make games in such a fashion.

                                                                                2. 2

                                                                                  And then anyone can play.

                                                                                  Anyone who has a browser which implements WASM correctly, completely, and with the needed efficiencies in graphics and processing, input devices, etc. Will that include TempleOS or Haiku? Probably not.

                                                                                  1. 5

                                                                                    True, but I’d say Haiku will get a correct WASM environment long before it can run native Windows binaries. Baby steps. :)

                                                                              1. 8

                                                                                I’d be super interested to hear other people’s accessibility setups!

                                                                                Was one of those guys that thought that simply mapping Caps Lock -> Ctrl would be enough to save my hands from Emacs. After 10 or so years of it my pinkies started to give out. Looked into other keyboard layouts, found that many of them (such as Dvorak) actually increased stress on the pinkies because they optimized solely for travel distance. Eventually went with carpalx’s QFMLWY layout, switched to the Kinesis Advantage which has all of the modifiers under the thumb, and completely re-did all of my Emacs keybindings. Also own Kinesis foot pedals but I don’t use them anymore.

                                                                                Took me about a month to adjust. Used to type 150 WPM in my prime, would say I type about 110 WPM now. Still going strong after 6 years.

                                                                                1. 3

                                                                                  That’s fascinating! How extensive are the emacs rebindings? Did you rebind them incrementally?

                                                                                  1. 2

                                                                                    Fairly extensive. One of my favorite ways of procrastinating is messing with my editor, or brainstorming ideas for new editors/editing paradigms. So before the pinky crisis happened I already had some helper functions for overriding bindings in my init.el and had done some experiments with my bindings.

                                                                                    The Kinesis has arrow keys conveniently located under my index and middle fingers so that freed up C-n, C-p, C-f, and C-b. I used keyboard-translate to free up C-i, C-m, C-[, and C-] which are usually unavailable because of terminal constraints. I only use Emacs in GUI mode so that wasn’t a big concern for me. The core of the remapping process involved me writing down all of my favorite commands, printing out the Kinesis layout with the QFMLWY keys, and manually mapping things such that they were convenient to reach w/o my pinkies using mnemonic sense as a tiebreaker.

                                                                                    Some examples: Instead of C-/ for undo which is painful for me, I use C-u. TAB (completion, snippets) is another painful pinky key so I use C-a. C-h (help) is remapped to C-i.

                                                                                    I also avoid binders that involve releasing modifiers, like C-x o (switch window). I don’t have that kind of precision when using the foot pedals. Anything like that gets remapped to be completely w/o modifiers or completely with. So C-x C-o is ok. I went with C-b. I don’t use the foot pedals anymore but I’ve grown fond of the rule and C-b in particular so I haven’t changed any of original remaps, but I’m no longer as vigilant about modifying new minor modes.

                                                                                  2. 2

                                                                                    Have you tried spacemacs or been at all inspired by using the spacebar as a leader? the symmetrical nature of the spacebar as leader is really nice. also, vim bindings are much, much more ergonomic than the default emacs bindings.

                                                                                  1. 13

                                                                                    Soo…

                                                                                    We are currently working on a new license for Kryptonite. For now, the code is released under All Rights Reserved.

                                                                                    That means that right now, I can do nothing with the code. I likely shouldn’t even look at it. Secondly, the bold part (emphasis mine): no, just no. Anything with a custom license is going to end up as a disaster.

                                                                                    Looking at the FAQ:

                                                                                    curl https://krypt.co/kr | sh

                                                                                    And this is software I should trust? Yeah, no.

                                                                                    1. 4

                                                                                      And this is software I should trust? Yeah, no.

                                                                                      curl | sh over HTTPS being insecure is FUD. It is just as secure as .deb/tarball over HTTPS, which for some reason I never hear people complaining about.

                                                                                      1. 10

                                                                                        As a release engineer, the thing that bugs me the most about ‘curl | bash’ and that I never see mentioned, is that this format of installer almost never has a version number on it. I could run their script to install the thing on my system and make sure everything works. The next day, I can use the same command to install on another system and I don’t know that I got the same script and that the same things will get installed.

                                                                                        Package Managers enforce versioning and checksumming so you know you got the same, expected software.

                                                                                        ‘curl | bash’ tells me, “We’re too lazy to package a release. You handle versioning, checksumming, and verification yourself”. And people accept that because they’re too lazy to do those things, too.

                                                                                        1. 4

                                                                                          This is a good point, curl|sh also doesn’t really have a single answer for “how do i upgrade”.

                                                                                          For some it’s ‘run the same curl|sh again’, for others this may wipe your current installation and data.

                                                                                          Honestly the most ridiculous ones (in terms of rube goldbergness) are the 200 line shell scripts that add a custom apt repo, and then install a heap of dependencies, before installing the main package.

                                                                                          Because declaring dependencies in the package is apparently as hard as giving people instructions about creating a .list file (or providing a release .deb)

                                                                                          1. 5

                                                                                            As someone who has had to do the reverse-engineering for those types of scripts, I couldn’t agree more.

                                                                                            They’re all symptoms of the same problem, right up there with the use of npm and docker. Not sure what to call it… Misguided inaptitude?

                                                                                            1. 6

                                                                                              “Cargo culting cool-kid bullshit” comes to mind, but I may just be angry.

                                                                                        2. 9

                                                                                          There is a huge difference between a deb and curl | sh though, in that I can extract the contents of the deb without running any of the scripts (dpkg-deb -x). It has a well defined format, too. I can easily uninstall it, and so on and so forth.

                                                                                          Figuring out what a random shell script does is not quite the same.

                                                                                          Also read this - https won’t protect you from malicious code. Precise installation steps, or packages one can inspect more easily, are far superior to curl|sh. If someone does not take the effort to explain how to install their software, why should I trust them that they’ll do anything else right?

                                                                                          1. 2

                                                                                            While someone could do malicious/stupid things in the maintainer scripts of a .deb, I’ve literally never seen that happen.

                                                                                            With a .deb, you can see what files will be installed where, you can extract the contents, and (barring the aforementioned stupidity in maintainer scripts) you can REMOVE the software, cleanly.

                                                                                            With a .sh script, well fuck you, read the whole god damn thing, and then go fetch whatever shit it downloads and examine that too.

                                                                                            1. 1

                                                                                              With a .deb, you can see what files will be installed where, you can extract the contents, and (barring the aforementioned stupidity in maintainer scripts) you can REMOVE the software, cleanly.

                                                                                              deb files have an embedded shell script that performs installation tasks, called debian/rules that, “fuck you”, you need to read the whole thing.

                                                                                              I’ve literally never seen that happen.

                                                                                              There’s also all those downgrade attacks that used to be popular.

                                                                                              1. 9

                                                                                                deb files have an embedded shell script that performs installation tasks, called debian/rules that, “fuck you”, you need to read the whole thing.

                                                                                                No, they do not. debian/rules is used for building a debian package, not when installing. They can have various scripts that run during installation, but you can extract the contents without running any of those (dpkg-deb -x). So no, you do not have to read the whole thing, if you want to have a look at what it installs and where. You can even list the files, without extracting them.

                                                                                                The same can’t be said about a curl|sh thing.

                                                                                                As for the two links: Both do their thing in the postinst. One can extract the files only, nullifying their malicious behaviour. You can look at the files without running code. With curl|sh, you can’t. You will have to download and read the shell script to figure out where to get the files from.

                                                                                                1. 0

                                                                                                  No, they do not. debian/rules is used for building a debian package, not when installing.

                                                                                                  Fine. Everything I said still applies to postinst though.

                                                                                                  The same can’t be said about a curl|sh thing.

                                                                                                  Sure you can. curl > a.sh read it, then run it. Same as with postinst.

                                                                                                  1. 3

                                                                                                    Point is, I can look at the files without running the postinst. I have the files without any code to run, or look at. Can’t say the same about the shell script. I still have to go through it and download the files separately.

                                                                                                    Not nearly the same, not by a long shot.

                                                                                                    1. -2

                                                                                                      Point is, you can look at a shell script without running it.

                                                                                                      Or maybe you can’t, and you should learn how to improve that?

                                                                                                      1. 6

                                                                                                        Your choice to insult someone who disagrees with you reflects on you poorly (not to mention it is extremely unlikely to be true).

                                                                                                        My bash is excellent, so I decided to audit a few curl | install scripts awhile back.

                                                                                                        The first one took over an hour to review - it pulled in quite a few different other scripts from several domains, none of which did anything obviously untowards (although given how many ways there are to hide a subtle backdoor in a shell script I wouldn’t be that confident).

                                                                                                        Anything that shrinks the amount of bash code I have to read to audit a package is a win, and in practice deb postinstall scripts are much shorter.

                                                                                                        1. 0

                                                                                                          Your choice to insult someone who disagrees with you reflects on you poorly (not to mention it is extremely unlikely to be true).

                                                                                                          I think it’s interesting that you find it insulting if someone observes you cannot do something.

                                                                                                          Anything that shrinks the amount of bash code I have to read to audit a package is a win, and in practice deb postinstall scripts are much shorter.

                                                                                                          You can’t say “well, I looked at less code so it must be more trustworthy” – the code actually has to be shorter. You still have to look at the binaries in the package, or what’s the point?

                                                                                                          My mind is boggled that someone thinks there is something about an archive file containing another archive file, with executable scripts at both layers, more trustworthy than an executable script, when they don’t bother reading either.

                                                                                                          1. 1

                                                                                                            observation

                                                                                                            You observed nothing of the sort; you insinuated, on no basis other than a disagreement, a lack of competence with one of the primary languages used by package maintainers in someone who is one.

                                                                                                            1. -1

                                                                                                              Nonsense. He said he can’t look at a script, and doesn’t know how to save files with curl.

                                                                                                        2. 5

                                                                                                          You are, again, missing the point. With a package, I have access to all the files, and the scripts. I can look at the files without reading the scripts.

                                                                                                          With a curl|sh, I do not have the files, and I have to read the script to get access to the files.

                                                                                                          Being able to is one thing, having to is another, and willing to is a third.

                                                                                                          1. -2

                                                                                                            You are, again, missing the point. With a package, I have access to all the files, and the scripts. I can look at the files without reading the scripts.

                                                                                                            The type slower.

                                                                                                            I actually don’t get your point.

                                                                                                            I think people trust .deb files more than they trust .sh files, and my point is that there’s no reason to.

                                                                                                            If you think there is, then you’re going to have to explain why, and you’re going to have to do better than dpkg-deb -x because that’s a crock of dogshit that tries to make up some cocked up definition for “secure” that has nothing to do with reality: It doesn’t keep more people safe from harm even hypothetically.

                                                                                                            With a curl|sh, I do not have the files, and I have to read the script to get access to the files. Being able to is one thing, having to is another, and willing to is a third.

                                                                                                            Don’t insult my intelligence: I haven’t done that to you.

                                                                                                            You don’t have the files when you do curl -O && dpkg -i either, which is why you don’t do that. Why would you do that, even if the website says to?

                                                                                                            Because we’re talking about trust. I trust esl-erlang even though they tell me (basically) to do that, because at the end of the day, I’m not going to audit the binaries in the archive. I’m going to trust their TLS certificate, just like I’ve trusted the one at debian.org.

                                                                                                            You don’t want to trust it, yes Virginia, you absolutely need to read the whole fucking thing. Not just the postinst, but also the things in the tarball that you are ignoring.

                                                                                                            1. 4

                                                                                                              I think people trust .deb files more than they trust .sh files, and my point is that there’s no reason to.

                                                                                                              There actually is. Because with a deb file, it is a lot easier to ignore the script parts, and have a look at the files. Obviously, if you blindly install one, that is in no way more secure than curl|sh - but that’s not what I’m talking about. I’m talking about a way to inspect what it wants to install. With a package (deb, rpm or even a tarball), I can do that. I can download it, and have the files.

                                                                                                              With curl|sh, I first have to read the whole script, and then inspect the files.

                                                                                                              I hope we can agree that having the files at hand is a better starting point than first having to figure out where and how to get them from. With a shell script, I’d have to do additional reading, which I don’t need to do when facing a package or a tarball.

                                                                                                              If you think there is, then you’re going to have to explain why, and you’re going to have to do better than dpkg-deb -x because that’s a crock of dogshit that tries to make up some cocked up definition for “secure” that has nothing to do with reality: It doesn’t keep more people safe from harm even hypothetically.

                                                                                                              Well, for one, dpkg-deb -x is not going to run random commands on my system (because it just extracts a tarball, and does not run maintainer scripts). Nor will it put files in random places (because it puts them in a directory I point it at). I’ll have more control over what it does. That is safer than running a random shell script.

                                                                                                              And yes, I could read a shell script. But I have better things to waste my time on than read a random script, because people thought that making a tarball is too hard.

                                                                                                              I trust deb packages not because they are inherently more secure - they are not. I trust them more, because I have tools to help me inspect their contents. Or, to put it another way: I distrust curl|sh because it has so many disadvantages, and so little gains over even a simple tarball, that I fail to see its appeal. And even worse, the suggestion to pipe a random script into your shell, coming from the developers of a tool aimed at improving security is just… insane. THAT is my problem. If they’d suggest blindly installing a debian package, I’d call that bad practice too. Now, if they provide a repo, or a tarball, that’s a whole different story.

                                                                                                              You don’t have the files when you do curl -O && dpkg -i either, which is why you don’t do that. Why would you do that, even if the website says to?

                                                                                                              Of course I won’t run dpkg -i. I’ll run dpkg-deb -x foo.deb /tmp/blah, and voila, I have the files without any of the random scripts run. Can you do the same with a curl|sh thing? No. You’d have to read the whole thing, download the files, and then inspect them. With a package, or even a tarball, this is a whole lot easier. There can still be maintainer scripts, but those are typically small. Not to mention, that I won’t have to read any scripts just to be able to look at the binaries.

                                                                                                              More secure, safer? Probably not. Easier? Less error prone? Yes. And that is my point. A directly downloadable package is easier to verify.

                                                                                                              If they’d suggest curl -O && dpkg -i, I’d call that bad practice too, by the way. Installing random stuff is bad. Having an easy way to inspect random stuff is, therefore, better than also having to read and understand random shell scripts on top of everything else.

                                                                                                              Not just the postinst, but also the things in the tarball that you are ignoring.

                                                                                                              No, I’m not ignoring. I’m extracting them with dpkg-deb -x precisely because I want to look at them. And with a debian package, I can do that without having to read a random shell script to even have access to the files.

                                                                                                              1. 0

                                                                                                                Obviously, if you blindly install one, that is in no way more secure than curl|sh - but that’s not what I’m talking about

                                                                                                                However it’s what I’m talking about.

                                                                                                                So thank you for conceding to my point and agreeing with me.

                                                                                                                I hope we can agree that having the files at hand is a better starting point than first having to figure out where and how to get them from. With a shell script, I’d have to do additional reading, which I don’t need to do when facing a package or a tarball.

                                                                                                                No, I can’t agree with that.

                                                                                                                The biggest problem with Debian and Linux is the fact that the tiniest hole can often be exploited into something bigger.

                                                                                                                If you’re relying on your ability to read, then you actually need to be able to read, and you actually need to read it. The underhanded C contest is a fantastic example of why you shouldn’t rely on the former, and the fact that you think you can “skip reading the files” is part of what’s wrong with the former.

                                                                                                                If you are trying to establish trust, you need to find another way.

                                                                                                                This is a more interesting conversation to have, but the tools are very bad.

                                                                                                                Can you do the same with a curl|sh thing? No.

                                                                                                                Wrong. curl | tee a.sh

                                                                                                                And with a debian package, I can do that without having to read a random shell script to even have access to the files.

                                                                                                                How about this: I’ll demonstrate I know what you’re saying by restating it, then you try to explain what you think I’m saying.

                                                                                                                You’re saying that unpacking a binary archive puts assets in a more convenient format for review than (mentally) unpacking them from a shell script.

                                                                                                                You’re admitting that if you skip the review step in either case, then there is no security difference between the two formats.

                                                                                                                You believe that people are more likely to review a debian package than a shell script, and that this contributes to security.

                                                                                                                Installing random stuff is bad.

                                                                                                                Do you think that I’m arguing that installing random stuff is good? Or are you just trying to repeat enough obviously correct things that everyone will assume everything you’re saying is correct?

                                                                                                                What’s your point here?

                                                                                                  2. 4

                                                                                                    Not only is debian/rules not run during package installation (as mentioned by @algernon, it’s used during package build), debian/rules isn’t even a shell script. It’s a makefile.

                                                                                                    Those two links are examples of how to do it, not examples of someone doing it.

                                                                                                    A shell script via curl | sh (or curl | bash if you’re particularly terrible and don’t write portable shell scripts) also can’t realistically be signed, to prevent tampering/replacement.

                                                                                                    Apparently creating a 1 line .list file and importing a GPG key is too much for people, so downloading the script, a signature and a GPG key is basically never going to happen.

                                                                                                    1. -1

                                                                                                      A shell script via curl | sh (or curl | bash if you’re particularly terrible and don’t write portable shell scripts) also can’t realistically be signed, to prevent tampering/replacement.

                                                                                                      If you use https: it is signed to prevent tampering/replacement.

                                                                                                      Apparently creating a 1 line .list file and importing a GPG key is too much for people, so downloading the script, a signature and a GPG key is basically never going to happen.

                                                                                                      The CA can revoke a SSL key. They can’t revoke a GPG key.

                                                                                                      1. 2

                                                                                                        If you use https: it is signed to prevent tampering/replacement

                                                                                                        How does https tell you that the file you downloaded is the same file the person/company you trust put on the server? The web server just gives you whatever file is on disk, and TLS encrypts it in transport.

                                                                                                        I’m talking about gpg signing so you know the file in question is the same file the author you trust placed there, and it hasn’t been modified by someone else.

                                                                                                        The CA can revoke a SSL key

                                                                                                        Which is fucking irrelevant, as we’re talking about signing the file, not transport layer encryption.

                                                                                                        1. -1

                                                                                                          I’m talking about gpg signing so you know the file in question is the same file the author you trust placed there, and it hasn’t been modified by someone else.

                                                                                                          Yes, by neatly sidestepping the whole point of establishing that the author is deserving of that trust in the first place.

                                                                                                          A GPG key downloaded from the website is no different, and in many ways may be less secure, since you seem to think a GPG key downloaded from the website is different.

                                                                                                          The CA can revoke a SSL key

                                                                                                          Which is fucking irrelevant, as we’re talking about signing the file, not transport layer encryption.

                                                                                                          It’s something that SSL can do better than GPG. Given that GPG is in the proposed use less secure than SSL, I think that’s pretty “fucking” relevant.

                                                                                                          1. 3

                                                                                                            Who said the gpg key had to be downloaded from the same server? Gpg key servers are a thing.

                                                                                                            Comparing ssl (by which I assume you mean tls) and gpg is stupid. You can access a gpg signed apt repo over tls, they’re not mutually exclusive and they’re not even used for the same task here.

                                                                                                            1. 1

                                                                                                              Who said the gpg key had to be downloaded from the same server? Gpg key servers are a thing.

                                                                                                              There’s no difference between a key I’ve uploaded to one keyserver and a key I’ve uploaded to another keyserver.

                                                                                                              You’re delusional if you think otherwise.

                                                                                                              Comparing ssl (by which I assume you mean tls) and gpg is stupid.

                                                                                                              Comparing the download of a key from an unencrypted keyserver with the DH key exchange is stupid. You have no idea what you’re talking about.

                                                                                                              You can access a gpg signed apt repo over tls, they’re not mutually exclusive and they’re not even used for the same task here.

                                                                                                              Nobody gives a fuck if you can get debian packages using carrier pigeons.

                                                                                                              The point is that a GPG signed package doesn’t offer any security over HTTPS if nobody is bothering to check the GPG keys, because at least someone can check the keys used for HTTPS, and at least Google is doing that.

                                                                                                              1. 0

                                                                                                                You’ve missed the point, multiple times now.

                                                                                                                TLS/HTTPS can only ever at most protect you from MITM network attacks (and that assumes someone like WoSign hasn’t issued a cert for your domain to someone else, which.. you know, they did. multiple times.)

                                                                                                                .Deb packages can be served over TLS, if you wish, to gain extra network level encryption, but they’re also GPG signed by default, so you can verify who authored them, regardless of where they have been served. Whether you choose to verify the GPG key is up to you.

                                                                                                                You can’t realistically do that with a curl|sh script, just like you don’t get realistic safe upgrades, you don’t get realistic clean uninstall, etc etc.

                                                                                            1. 7

                                                                                              It resonates with me, however being in Germany I think I am partially shielded from this bullshit. Sadly it seems that almost every city now wants to be in center of the local Silicon Valley.

                                                                                              But I find it kinda ironic that it is posted on Medium from all of things. I understand that that’s a way to get to the audience though.

                                                                                              2012, Medium: At last, a way to post text on the internet!

                                                                                              1. 5

                                                                                                A bit of background: the author is a fairly controversial writer (as is every anti-capitalist) and his blog got hacked. He put his stuff on Medium so he wouldn’t have to worry about that again. He isn’t in tech

                                                                                                1. 2

                                                                                                  Medium is actually a pretty great lightweight publishing platform. People who diss it tend to not appreciate good design.

                                                                                                  1. 5

                                                                                                    Not sure what you mean by lightweight in this context. What would be an example of a heavyweight alternative?

                                                                                                    1. 3

                                                                                                      Engraving your blog posts on stone tablets and carrying them on your back from the top of a mountain.

                                                                                                      1. 1

                                                                                                        Something that requires setting up your own server.

                                                                                                      2. 4

                                                                                                        I used to use HTML files I loaded into a directory of a free, web host. There were free and paid editors to automatically produce or load them. Local copies and standard tech meant I could switch vendors pretty easy. Funny how “lightweight” changes over time. ;)

                                                                                                        1. 1

                                                                                                          Perhaps you underestimate the hurdle of having to first learn HTML and how to upload files to a server.

                                                                                                          1. 1

                                                                                                            I did include “free and paid editors to automatically produce or load them.” All kinds of lay people used them. MySpace had kids everywhere doing HTML and filling in templates. A blog could basically just use templates. The local high schools also teach lay people HTML with end result they all make a web site at the end. It’s super-easy if you’re just using a subset of it. Even easier if it’s a WYSIWYG editor like all the site editors that hosts use.

                                                                                                    1. 4

                                                                                                      I’ve waffled on higher education for PL because of my [incorrect?] perception that academia is fixated on type theory as a way to solve everything. I generally like it but I don’t know if I like it enough to do a PhD in it. Is this true?

                                                                                                      1. 7

                                                                                                        By fixated, do you mean “academia has an obsession with type theory, and tries to use it where it is inappropriate”? I don’t get that vibe, but then again, its my primary field so I may be biased :)

                                                                                                        If you want to study PL theory you should expect to encounter type theory and the lambda calculus. Otherwise, it’s up to you. Unless you work in a big group you’re going to have some freedom in selecting the problem you want to solve and how you want to solve it, although it’ll need to be related to whatever your adviser has a grant for. Look at the faculty of schools you’re interested in and see what they’re publishing on.

                                                                                                        1. 4

                                                                                                          Depends on the school and its goals. Many are about whats practical or just looks like it for grant money. You see a lot of Java, C#, etc there. Quite a few on Scheme with DSL’s. Bunch love stuff like Haskell. Sone, esp with embedded focus, stay on C with a subset on Ada or embedded Java. The ones about oroving properties mostly do HOL, Coq, and ACL2. There’s also agent-oriented, term-rewriting, flow-based, logic-based… list goes on.

                                                                                                          You might think it’s all about types because you’re on forums whose users mostly submit that stuff or similarly next to Universities that focus on it.

                                                                                                        1. -5

                                                                                                          What counts as using formal methods?

                                                                                                          An image of a soap powder box would be very appropriate for this tag.

                                                                                                          Or perhaps that is too up market for the snake oil salesmen of computing.

                                                                                                          1. 8

                                                                                                            The examples quoted mostly involve application of mathematics, logic, or good modelling to software to prove properties of it. The track record of that field is really good. Especially in hardware. I don’t know why you’d use the term snake oil salesman in light of that. Best reserved for those peddling unproven bullshit.

                                                                                                            1. 12

                                                                                                              Given that you are calling researchers “snake oil salesman”, “intellectually dishonest”, “lazy”, etc, I don’t get the feeling that you are engaging in good faith. I know some of the CompCert team and they are honest people trying to make a difference in the world. I can’t respond directly to your posts because I can’t tell which of that sea of questions is sincere and which is rhetorical grandstanding. If you have a single, sincere question you would like to ask about proof assistants like Coq or formally verified applications like CompCert, I would be happy to answer it.

                                                                                                            1. 14

                                                                                                              Quite frankly, this is just a usability bug in GitHub. Clicking a line number should implicitly do the y thing before appending #L to the uri.

                                                                                                              1. 5

                                                                                                                I agree that it’s a usability bug but I don’t think linking to a specific commit (the ‘y’ behavior) is desirable. It should link to the line you originally linked to at the current version (it may have moved, but we have that info in the git history), or if the line was deleted, should display “that line no longer exists, it last appeared in commit ab352cf”

                                                                                                                1. 9

                                                                                                                  But the linker is not necessarily trying to link to a single line. Often they are trying to link to a larger context which may have disappeared by the current version even if the line they clicked on still exists. Or it could be the opposite. The line they linked may have disappeared by the overall aim of the link remains the same.

                                                                                                                  1. 6

                                                                                                                    Pretty sure you can’t identify a line that accurately through git history in any automated fashion. Git’s basic unit is the whole file object, the concept of lines being moved around or compared is all provided by higher-level tooling. That tooling still has problems with complicated scenarios.

                                                                                                                    1. 3

                                                                                                                      I think what cmm is trying to say is that the link should refer to the current latest commit. That way future viewers will see the content of the file in the context it was originally linked.

                                                                                                                      It might also be cool to think about a semantic linking scheme, following a piece of code through history. I have no idea how hard it would be to implement something like that.

                                                                                                                      1. 1

                                                                                                                        Here’s my idea:

                                                                                                                        Super hard. :-)

                                                                                                                        1. 5

                                                                                                                          Apologies for carrying this argument into multiple threads, but I think you are overstating this problem’s difficulty. Nobody seems to believe me that this is doable, so I quickly hacked up a solution.

                                                                                                                          Clone https://github.com/Droogans/.emacs.d/ (the repo from the linked example) somewhere. I snooped the history and the original commit they linked to was 6ecd48 from 2014. Run fwd_lines.py 6ecd48 init.el 135 | less and scroll through. It will display both files with the corresponding lines marked with >>> <<<. You will see that we have correctly identified line 177 as the corresponding line in HEAD, 3 years later.

                                                                                                                          In general, any time GitHub’s diff functionality would identify lines as the same, so will fwd_lines.py. Like I argued in my original thread, if this kind of identification wasn’t accurate enough for general usage, code reviews would be a lot more difficult than they currently are. And this is just using an off-the-shelf diff algorithm and only using the linked commit and HEAD. You can increase the precision by taking into account the entire history between those two commits, and you can add some error-tolerance by allowing lines to match w/ some noise. Finally, if there’s the occasional error, I would argue that’s better than the alternatives: linking arbitrary lines (currently) or linking stale results (‘y’).

                                                                                                                          I haven’t exhaustively tested this so if anyone finds an interesting case where this fails, let me know.

                                                                                                                          1. 1

                                                                                                                            Nice demo! It was cool to see it run. I still want to make a case though for preferring “linking stale results” as you put it to either the current or your proposed option. I’ll provide two justifications for my position:

                                                                                                                            a) You’re right that you can easily do as well as Github’s existing diff results. But there’s no reason to think that may be good enough. If I move a function twice in its history it’s not really acceptable for early links to inside it to forever error out with a message that it was deleted. Showing a function to have been deleted in one set of lines and added back in another is fine for a diff presentation, but not a good fit for taking lines forward.

                                                                                                                            b) Even in situations where you can take lines forward correctly, as @tome pointed out there’s perfectly valid use cases where you want to show a link as it was at the time you wrote it. In fact, in my experience I rarely encounter situations where I’m referring to a set of lines of code without wanting to talk about what’s there. In that situation explanations get very confusing if the code changes from under you. Even if you don’t agree with me that that’s the primary use case I hope it’s reasonable that it is a use case. So perhaps we need two different sets of links, one for the specific version and one with “smart take-forward”. But then that gets confusing. If I had to choose I’d rather create just permalinks that don’t try to be smart, just link to the code I want to link to. Going from a specific version of a file to head is pretty easy, so checking what the code looks like right now should, I argue, remain a manual process. That would be robust to the errors I described above, and it also gives me control to pick the precise tag or branch that I want to compare with (trunk vs release 1 vs release 2, etc.)

                                                                                                                            1. 1

                                                                                                                              If I move a function twice in its history it’s not really acceptable for early links to inside it to forever error out with a message that it was deleted.

                                                                                                                              Solvable by taking into account the entire history rather than just the linked commit and HEAD, and in my first message I acknowledged that we can just display the commit linked to if the line was deleted. You lose nothing over ‘y’

                                                                                                                              Even in situations where you can take lines forward correctly, as @tome pointed out there’s perfectly valid use cases where you want to show a link as it was at the time you wrote it.

                                                                                                                              I don’t remember advocating for GitHub to remove the ‘y’ functionality, I just don’t think it’s the best default. It’s certainly not what I want most of the time when I link people to a particular line. Outside of a code review the historical perspective doesn’t matter that much to me

                                                                                                                              So perhaps we need two different sets of links, one for the specific version and one with “smart take-forward”.

                                                                                                                              This is already the case, except replace “smart take-forward” with “arbitrary nonsense”. All of GitHub’s links are relative to HEAD unless you explicitly ask

                                                                                                                              1. 2

                                                                                                                                We can agree that the current default is “arbitrary nonsense”. Beyond that we’ll have to agree to disagree about what the best default is.

                                                                                                                                Falling back to ‘y’ on deletes is reasonable, yes. But here’s another example to show that this is a hard problem with tons of rare use cases that a normal codebase is going to run afoul of at some point (causing all older links to break each time). Say we have an intervening commit that replaces:

                                                                                                                                f(a, b, c);
                                                                                                                                

                                                                                                                                with

                                                                                                                                f(a, b, c, false);
                                                                                                                                f(a, b, c, true);
                                                                                                                                

                                                                                                                                What would you want your tool to do? I would just like to see the former. I would very much want my tools to not mislead me about what somebody meant when they sent me a link to the former.

                                                                                                                                Hmm, perhaps the best of both worlds would be to show the lines highlighted in the specific commit with a call-out link up top to perform the smart carry forward. I don’t object to smarts, but unless they’re utterly reliable I don’t want to be forcibly opted in to them.

                                                                                                                                1. 1

                                                                                                                                  What would you want your tool to do? I would just like to see the former.

                                                                                                                                  This indeed gets fuzzier if you allow lines to match with noise, but without that you would get what you want, because that line no longer exists in the source, so we fall back to the commit you linked at.

                                                                                                                                  Hmm, perhaps the best of both worlds would be to show the lines highlighted in the specific commit with a call-out link up top to perform the smart carry forward. I don’t object to smarts, but unless they’re utterly reliable I don’t want to be forcibly opted in to them.

                                                                                                                                  Probably a good compromise, I didn’t expect so much push-back on this idea. I don’t understand what you mean by “forcibly opted in” though. Do you think you are currently being “forcibly opted in” in the current relative-to-HEAD linking scheme? When you link other sites than GitHub, are you miffed that your browser doesn’t automatically replace your link with archive.org? I think relative-to-HEAD is a useful convenience and my suggestion is in the vein of making it better, not ruining all of your specific-commit links.

                                                                                                                                  1. 1

                                                                                                                                    My mental model is that when I make simple actions like taking a step or clicking on a link, I expect the response to them to be simple. If I see a link to a file on github I expect it to go to HEAD. If I try to go to a link and it doesn’t exist I want to know that so that I can decide whether I want to go to archive.org. I would be miffed if I was automatically taken to archive.org. I wouldn’t be miffed if that happened because I chose to install an extension or turn on some setting to automatically be taken to archive.org.

                                                                                                                                    As a second example, the autocomplete on my phone has lately gotten smart enough to enter some uncanny valley where it’s often doing bullshit like replacing ‘have’ with ‘gave’. It has also started replacing words two back, so even if I check each word after I type it I still end up sending out crap I didn’t type. This kind of shit gives me hives. I’d rather go back to making typos with an utterly dumb keyboard than put up with crappy ‘smarts’ like this.

                                                                                                                                    You’re right that the current default on github is “arbitrary nonsense”, but I’d actually prefer it to your smart carry-forward – unless I could choose when I want to try it at a time of my own choosing.

                                                                                                                                    1. 1

                                                                                                                                      If I see a link to a file on github I expect it to go to HEAD.

                                                                                                                                      Do you then also disagree with auto-‘y’ (auto archive.org), the perspective of the person who started this thread?

                                                                                                                                      You’re right that the current default on github is “arbitrary nonsense”, but I’d actually prefer it to your smart carry-forward – unless I could choose when I want to try it at a time of my own choosing.

                                                                                                                                      But, arbitrary nonsense doesn’t work, just like your crappy smartphone autocomplete doesn’t work. That’s the whole point of the linked post, and the perspective of virtually everyone I’ve talked to about this issue. You are using simplicity and predictability as measuring sticks here, but the behavior of a GitHub relative-to-HEAD link a year from now is completely unpredictable. Isn’t it worth investigating alternatives that might work, even if they might end up sucking like phone autocomplete? I mean, I do acknowledge that you are saying “I could choose when I want to try it”, but I can’t help but feel like this is a way of ducking engaging with the idea.

                                                                                                                                      1. 1

                                                                                                                                        Yes, I definitely don’t mean to rain on your parade. My initial comment was just to inject a note of caution and a competing viewpoint. Absolutely, this is all fascinating and worth trying. I think I’m just frustrated with my phone :)

                                                                                                                                        I didn’t find OP’s auto-y unpredictable because as the reader I can see that a shared URL refers to a particular tree state. Whereas I interpreted your proposal to be taking a URL to a specific tree state but then showing me something outside of that tree. Is that accurate?

                                                                                                                                        Coda: after I first submitted this comment I noticed that my phone had autocompleted ‘auto-y’ to ‘auto-insertion’. Sigh..

                                                                                                                                        1. 2

                                                                                                                                          Whereas I interpreted your proposal to be taking a URL to a specific tree state but then showing me something outside of that tree. Is that accurate?

                                                                                                                                          Technically accurate, but maybe missing intent. The intent of such a URL is to point to HEAD. To make this completely concrete, instead of links like

                                                                                                                                          https://github.com/Droogans/.emacs.d/blob/mac/init.el#L135

                                                                                                                                          we would have links like

                                                                                                                                          https://github.com/Droogans/.emacs.d/blob/mac/init.el#6ecd48/L135 (if GitHub supported this type of URL, clicking this would take you here)

                                                                                                                                          We use 6ecd48 here similar to how on the web we use redirects to ensure that URLs always point to the latest content, or in the case where the line was deleted, error 404 + “maybe check here?”

                                                                                                                                          I didn’t find OP’s auto-y unpredictable because as the reader I can see that a shared URL refers to a particular tree state.

                                                                                                                                          I think I would find this behavior more predictable if navigating to https://github.com/Droogans/.emacs.d/ started a new “session” by redirecting you to https://github.com/Droogans/.emacs.d/tree/aecc15138a407c69b8736325c0d12d1062f1f21b (the current commit HEAD points to). Otherwise, there is an inconsistency between your current browsing state (your URL is always pointing to HEAD) and the links you are sending to others.

                                                                                                                                          Coda: after I first submitted this comment I noticed that my phone had autocompleted ‘auto-y’ to ‘auto-insertion’. Sigh..

                                                                                                                                          Maybe if I owned a smartphone I would better understand everyone’s objections :)

                                                                                                                                          1. 2

                                                                                                                                            Indeed, this comment helps understand our relative positions. I see now that you’re right – what we find ‘predictable’ is subjective. I find OP’s auto-y more predictable partly because it fits my needs more often than a link to HEAD. Your hash/line proposal seems pretty reasonable!

                                                                                                                      2. 1

                                                                                                                        Git’s storage model and the functions provided in their CLI toolkit/libgit2/whatever GitHub uses are irrelevant. With the history and a modified diff algorithm you can identify lines across commits. Make an index of that data. GitHub already has to maintain a separate index for search, nothing new.

                                                                                                                        1. 3

                                                                                                                          With the history and a modified diff algorithm you can identify lines across commits.

                                                                                                                          What @jtdowney is saying is: No, you can’t. You can almost do it. You can often do it. But the only reliable way to know what code that link was talking about, is to show the original file where the line belongs.

                                                                                                                          1. 1

                                                                                                                            All of what you’re saying applies to diff algorithms in general. It’s necessarily a fuzzy, ill-defined task, and yet we have effective algorithms for solving it. Our whole software development industry is powered by diff, if it’s good enough for pull requests, it’s good enough for forwarding links to a current commit. We’re not talking about mission critical software here.

                                                                                                                            edit: also, that is not what the parent comment was saying. Their criticism was rooted in Git’s storage model and tooling, as if I was implying GitHub should shell out to a non-existent “git find-corresponding-line-in-commit” command. If someone wants to argue that what I am suggesting is not feasible it should not be grounded in Git implementation details.

                                                                                                                            1. 1

                                                                                                                              Yes.

                                                                                                                              Good points. I agree that a good-enough algorithm could show us the line in an updated context or detect that it can’t and then fall back to showing us the place where that line number was originally anchored.

                                                                                                                              And yes, the low-level tooling is irrelevant. No matter how you store the data, unless you have a human confirm that yes, this line has the same identity as that line, it’s still just a succession of changesets (as svn stores it), which is just the dual of a succession of trees (as git stores it).

                                                                                                                              Maybe the way bzr and svn handle file copies can help things along, but it’s not an essential piece of information (as git shows us), and there’s no tool out there that explicitly tracks line identities. Bzr does it in the low-level layer, but it’s still algorithmic, not human-verified, so that metadata has no more value than what git can infer after the fact. Basically it’s done to amortize some of the work of the diff/merge and for space efficiency.

                                                                                                                              Thank you. Worse is better. Don’t let the perfect get in the way of the good.

                                                                                                                  1. 11

                                                                                                                    Apologies to the author, but I find the comparisons to Haskell in this piece to be facile, and the practical examples of ways to solve the “nil problem” in Clojure to be glossing over the fact that nil remains a frustrating problem when nil is essentially a member of every single type (EDIT: although I should acknowledge that yes, the author does acknowledge this later in the piece–I just don’t find the rest compelling regardless). I don’t buy “nil-punning” just because it’s idiomatic (see the linked piece by Eric Normand); as far as I’m concerned talking about it as idiomatic is just putting lipstick on the pig. And fnil is a hack, the existence of which illuminates how Clojure didn’t get this right.

                                                                                                                    That said, I’m not a lisp guy in my heart, so maybe I’m missing the subtle elegance of nil-punning somehow–but I’d love to see a Clojure without nil in it, and I think it’s telling that these articles always end up defensively comparing Clojure to Haskell.

                                                                                                                    I believe that Clojure has some nice ad-hoc facilities for building composable abstractions, it’s got a lot of good stuff going for it and makes me want to shoot myself far less than any programming language I’ve used professionally to date. But as a user of the language I’ve found its type system, such as it is, to be incoherent, and I find the pervasiveness of nils to simply be a shitty part of the language that you have to suck up and deal with. Let’s not pretend that its way of handling nil is in any way comparable to Haskell (or other ML-family languages) other than being deficient.

                                                                                                                    P.S. Per your first if-let example: when-let is a more concise way to express if-let where the else term is always going to evaluate to nil.

                                                                                                                    1. 11

                                                                                                                      Glad I’m not the only Clojurian who had this knee jerk reaction. nil is not Maybe t. Maybe t I can fmap over, and pattern match out of. With f -> Maybe t at least I know that it’s a function which returns Maybe t because the type tells me so explicitly and the compiler barks at me when I get it wrong. This gives certainty and structure.

                                                                                                                      nil is the lurking uncertainty in the dark. The violation of the “predicates return booleans” contract. The creeping terror of the JVM. The unavoidable result of making everything an Object reference. I never know where nil could come from, or whether nil is the result of a masked type or key error somewhere or just missing data or… a thousand other cases. Forgot that vectors can be called as functions and passed it a keyword because your macro was wrong? Have a nil….

                                                                                                                      Even nil punning doesn’t actually make that sense because it mashes a while bunch of datastructures into a single ambiguous bottom/empty element. Is it an empty map? set? list? finger tree? Who knows, it’s nil!

                                                                                                                      Edit: wait wait but spec will save us! Well no… not unless you’re honest about the possibility space of your code. Which spec gives you no tools to infer or analyze.

                                                                                                                      Have a nil.

                                                                                                                      </rant>

                                                                                                                      1. 4

                                                                                                                        That said, I’m not a lisp guy in my heart, so maybe I’m missing the subtle elegance of nil-punning somehow

                                                                                                                        Minor nitpick–please don’t lump all lisps in with nil-punners! Racket and Scheme have managed to avoid that particular unpleasantness, as has LFE.

                                                                                                                        1. 3

                                                                                                                          Not to mention Shen with a type system that subsumes Haskell’s…

                                                                                                                          1. 3

                                                                                                                            Even in Common Lisp, nil isn’t an inhabitant of every type. If you add (optional) type declarations saying that a function takes or returns an integer, array, function, etc., it isn’t valid to pass/return nil in those places. I think this is more of a Java-ism than a Lisp-ism in Clojure’s case.

                                                                                                                            1. 1

                                                                                                                              Isn’t nil indistinguishable from the empty list though, in Common Lisp?

                                                                                                                            2. 2

                                                                                                                              Yeah, just take it as further evidence that I’m not “a lisp guy” in that I conflated all lisps together here–sorry!

                                                                                                                              1. 1

                                                                                                                                In that case I’ll also forgive being wrong in the ps about if vs when.

                                                                                                                                1. 1

                                                                                                                                  Oh–would you explain further then? I always use when/when-let in the way I described (EDIT: and, I should add, many Clojure developers I’ve worked with as well assume this interpretation)–am I misunderstanding that semantically, fundamentally (from a historical lisp perspective or otherwise)?

                                                                                                                                  1. 4

                                                                                                                                    The ancient lisp traditions dictate that when you have two macros and one of them has an implicit progn (aka do) then that one should be used to indicate side-effects.

                                                                                                                                    I know not all Clojure devs agree, but when you’re arguing that you should be less concerned than CL users about side-effects it’s probably time to reconsider your stance.

                                                                                                                                    1. 1

                                                                                                                                      Thanks!

                                                                                                                                      EDIT: I want to add a question/comment too–I’m a bit confused by your mention of side-effects here, as in my suggestion to use when vs. if in these cases, I’m simply taking when at face value in the sense of it returning nil when the predicate term evaluates to false. I can see an argument that having an explicit else path with a nil return value may be helpful, but I guess my take is that…well, that’s what when (and the when-let variation) is for in terms of its semantics. So I guess I’m still missing the point a bit here.

                                                                                                                                      …but nonetheless appreciate the historical note as I’m a reluctant lisper largely ignorant of most of this kind of ephemera, if you will forgive me perhaps rudely qualifying it as such.

                                                                                                                                      1. 4

                                                                                                                                        The typical Lisp convention is that you use when or unless only when the point is to execute some side-effecting code, not to return an expression, i.e. when means “if true, do-something” and unless means “if false, do-something”. This is partly because these two macros have implicit progn, meaning that you can include a sequence of statements in their body to execute, not just one expression, which only really makes sense in side-effectful code.

                                                                                                                                        So while possible to use them with just one non-side-effecting expression that’s returned, with nil implicitly returned in the other case, it’s unidiomatic to use them for that purpose, since they serve a kind of signalling purpose that the code is doing something for the effect, not for the expression’s return value.

                                                                                                                                        1. 3

                                                                                                                                          Thanks mjn, that helps me understand technomancy’s point better.

                                                                                                                                          Fully digressing now but: more generally I guess I’m not sure how to integrate these tidbits into my Clojure usage. I feel like I’ve ended up developing practices that are disconnected from any Lisp context, the community is a mix of folks who have more or less Lisp experience and the language itself is a mix of ML and Lisp and other influences.

                                                                                                                                          In any case, thank you and technomancy for this digression, I definitely learned something.

                                                                                                                            3. 3

                                                                                                                              Oh I don’t think this solves the nil problem completely, but it’s as good as you can get in Clojure. If you’re coming from Java and have no or little experience in an ML-ish language, then this conceptual jump from null to Nothing is a bit difficult, so this article was primarily written for those people (beginner/intermediate Clojurists).

                                                                                                                              Also you have to admit that, beyond Turing completeness, language features are an aesthetic. We can discuss the pros and cons of those features, but there is no One Right Way. A plurality of languages is a good thing - if there were a programming language monoculture we would have nothing to talk about at conferences and on internet forums :)

                                                                                                                              1. 6

                                                                                                                                Also you have to admit that, beyond Turing completeness, language features are an aesthetic. We can discuss the pros and cons of those features, but there is no One Right Way.

                                                                                                                                It sounds to me like you are saying that all ideas about PL are of equal worth, which I disagree with. One can have a technical discussion about PL without it ultimately boiling down to “that’s just, like, your opinion man”

                                                                                                                                1. 5

                                                                                                                                  That’s not at all what I’m saying. It’s like in art, you can say that the Renaissance period is more important than the Impressionist period in art history, but that doesn’t mean Renaissance art is the “correct” way to do art. We can also have a technical discussion of artworks, but we weigh the strengths and weaknesses of the artworks against the elements and principles of art and how skillfully they are composed, we don’t say one artwork is correct and the other is incorrect. This is the basics of any aesthetic analysis.

                                                                                                                                  Likewise in programming language design, we can discuss the technical merits of features, e.g. yeah Haskell has an awesome type system and I really enjoy writing Haskell, but that doesn’t mean Haskell is categorically better than Clojure. When you have a ton of legacy Java to integrate with (as I do), Clojure makes a lot of sense to use.

                                                                                                                                  1. 3

                                                                                                                                    PL criticism is done on utilitarian grounds, not aesthetic grounds. You acknowledge as much in your second paragraph when you give your reason for using Clojure. I guess you can look at PL as art if you want to but I don’t like that mode of analysis being conflated with what researchers in the field are doing.

                                                                                                                                    1. 2

                                                                                                                                      PL criticism is done on utilitarian grounds, not aesthetic grounds.

                                                                                                                                      Why not both?

                                                                                                                                      1. 1

                                                                                                                                        When you’re trying to figure out which language is better, it’s not a question of aesthetics.

                                                                                                                                        Though to be fair, “better” is really difficult to nail down, especially when people end up arguing from aesthetics.

                                                                                                                            1. 3

                                                                                                                              I disagree with the premise of “Undefined Behavior != Unsafe Programming”. Regehr argues:

                                                                                                                              Undefined behavior is the result of a design decision: the refusal to systematically trap program errors at one particular level of a system. The responsibility for avoiding these errors is delegated to a higher level of abstraction. For example, it is obvious that a safe programming language can be compiled to machine code, and it is also obvious that the unsafety of machine code in no way compromises the high-level guarantees made by the language implementation. Swift and Rust are compiled to LLVM IR; some of their safety guarantees are enforced by dynamic checks in the emitted code, other guarantees are made through type checking and have no representation at the LLVM level. Either way, UB at the LLVM level is not a problem for, and cannot be detected by, code in the safe subsets of Swift and Rust.

                                                                                                                              I agree with the problem presented: some operations have pre-conditions on safety, and there needs to be a mechanism to separate checking those pre-conditions (whether dynamically or statically) and executing the operation itself. I disagree that C-style UB is the solution here.

                                                                                                                              What would I propose? Extending LLVM’s simple type system to dependent types, and having compilers actually produce proof terms (which can be erased for no performance penalty) to justify use of unsafe operations. If the check needs to be done dynamically, then pipe the proof produced from the check to where the operation is being done. If the check is done statically, then at some point the compiler proved that this was safe–so modify the compiler to actually produce that proof! Don’t just throw it away!

                                                                                                                              The FP community went through a similar phase in the 90’s and mid 00’s. After an extremely complicated type inference step, frontends would throw away all evidence constructed that the program was type safe, and would operate on an untyped intermediate form (I think OCaml still works like this, but I’m not sure). It turns out this is a bad idea, you lose nothing by keeping the proof terms around, and you gain a lot of confidence in your compiler if it can preserve type-safety through N stages of optimization. A similar argument applies to LLVM: by keeping proof terms around, you can detect (more) ICEs, which would be quite beneficial for LLVM.

                                                                                                                              1. 2

                                                                                                                                and having compilers actually produce proof terms (which can be erased for no performance penalty) to justify use of unsafe operations.

                                                                                                                                Not quite sure how that would work, because often users of the language will write deliberately-unsafe code for doing low level things or whatever, and then the burden of providing a proof is on them, which needs language changes.

                                                                                                                                Besides, by adding dependent types here there’s a good chance it becomes even harder to transform the code. LLVM IR is an IR that is meant to be shuffled around and such. The limitations of the IR typesystem limit its ability to be shuffled around. This is both a good and bad thing, because this may prevent invalid optimizations, but it may also prevent valid ones. I’m not sure how easy such a dependently typed system is to transform; in such a case they may end up with a second-level IR which is easier to transform, which reintroduces the same problem.

                                                                                                                                But yeah, it’s an interesting solution, which could work (it’s just not immediately obvious to me that it will).

                                                                                                                                1. 1

                                                                                                                                  Not quite sure how that would work, because often users of the language will write deliberately-unsafe code for doing low level things or whatever, and then the burden of providing a proof is on them, which needs language changes.

                                                                                                                                  Generally dependently typed systems have an escape hatch, Coq has the “Admit” statement, Idris has the believe_me term. Unsafe languages can use these to bypass proof obligations, and safe languages can use them to incrementally move over to a safer intermediate representation.

                                                                                                                                  Besides, by adding dependent types here there’s a good chance it becomes even harder to transform the code. LLVM IR is an IR that is meant to be shuffled around and such. The limitations of the IR typesystem limit its ability to be shuffled around. This is both a good and bad thing, because this may prevent invalid optimizations, but it may also prevent valid ones. I’m not sure how easy such a dependently typed system is to transform; in such a case they may end up with a second-level IR which is easier to transform, which reintroduces the same problem.

                                                                                                                                  It does indeed make transformations more difficult, because your manipulations essentially have to encode their own proof of type safety. You’ll need some sort of logic/proof subsystem for manipulating open terms and yeah the limitations of that logic will limit valid optimizations. Although you can again fall back to “admit” and fill in the gaps of your logic/transformation later.

                                                                                                                                  1. 1

                                                                                                                                    Generally dependently typed systems have an escape hatch, Coq has the “Admit” statement, Idris has the believe_me term. Unsafe languages can use these to bypass proof obligations, and safe languages can use them to incrementally move over to a safer intermediate representation.

                                                                                                                                    This loops back into the original problem. Now you have a statement marked believe_me – what transformations are allowed on that? You would still need some kind of undefined behavior for this, though perhaps at a higher level.

                                                                                                                                    Then again, “some kind of UB at a higher level” is something we have already, so it’s still a strict improvement. Hmm.

                                                                                                                              1. 7

                                                                                                                                Interesting post, thanks.I think the Jazz Standard concept is useful to programming but that may just be because I play the trumpet. I don’t agree that Fizzbuzz is a standard, though.

                                                                                                                                The value of standards comes from establishing enough rules & structure that others can productively work with you towards a common goal. So I would say the closest analogy to programming is that a standard is a type signature. A type is an approximation of a program and likewise the chord changes of the song are its type signature. Types allow constraining inputs of functions and provide guarantees such as memory safety, and chord changes constrain the set of notes playable and provide guarantees such as song won’t be overly dissonant.

                                                                                                                                If you see it like I do, calling Fizzbuzz a standard is a category error. I would classify Fizzbuzz as a meme. When my friends talk about getting Fizzbuzz at an interview we laugh about it. Fizzbuzz is in Fizzbuzz Standard Edition because its funnier like that. A team of programmers isn’t gonna get together and say “alright lets riff on Fizzbuzz” unless its followed by “haha nah, but seriously”. However, they might start by establishing what interfaces they want their parts to satisfy so they can collaborate without stepping on each others toes.

                                                                                                                                1. 2

                                                                                                                                  Interesting distinction! I would like to respond; I am going to think on that for a while. Thanks for reading!

                                                                                                                                1. 5

                                                                                                                                  I would rather just rewrite the thousands of lines of code. You already have the architecture laid out so this is a well-defined and easy (albeit boring) task. Writing a Python interpreter, even a small one, will require some research. And when I read

                                                                                                                                  OPy will reflect this evolution. It could even end up a specialized language for writing languages — a meta-language — rather than a general-purpose language like Python.

                                                                                                                                  I wonder if you’re keeping your eyes on the prize, no offense but this sounds like the kind of dicking around I like to do when I’m bored of whatever I’m supposed to be working on. You’re trying to make a shell, not a meta-language, right?

                                                                                                                                  1. 4

                                                                                                                                    Especially since multiple meta-languages already exist, such as SML, Ocaml, and I’d include Haskell here. And for Python-like language, the PyPy people made a simple meta-language.

                                                                                                                                    The upside to these is that they come with optimizing compilers, debuggers, community, existing libraries, and semantics.

                                                                                                                                    If the goal is small, surely something like Lua makes more sense than writing ones own Python implementation.

                                                                                                                                    1. 3

                                                                                                                                      On top of that, they (esp Ocaml) come with libraries specifically for writing compilers faster.

                                                                                                                                      1. 3

                                                                                                                                        After using ASDL a lot, I definitely wondered if I should have written it in OCaml. In 2013, I did another parsing project in Python, and I actually resolved to write future projects like that in OCaml.

                                                                                                                                        But there are a couple problems with OCaml. I would say it’s a meta-language for semantics and not syntax. ANTLR/yacc are meta-languages for syntax. In particular:

                                                                                                                                        • OCaml’s Yacc is NOT good enough for bash. It misses by a mile. If you look at the first half of my blog, you will see I spent a lot of time developing a somewhat novel parsing algorithm for bash.
                                                                                                                                        • Moreover, the VERY first thing you hit when writing a language – the lexer, is heavily stateful. OCaml is worse than C for writing this kind of code. Parsers and lexers are slightly paradoxical, because from the outside they’re purely functional, but from the inside they’re heavily stateful. I don’t like writing imperative code in OCaml; there’s no real benefit.

                                                                                                                                        The second problem is that OCaml is not a good meta-language for itself. I talked about that in “Type Checking vs. Metaprogramming” [1].

                                                                                                                                        So yes I’m daydreaming about a meta-language that’s a cross between ML/Lisp/Yacc/ANTLR – some details here [2]. But it is toward a somewhat practical end. I actually have to implement multiple languages (OSH, Oil, awk/make). If it can implement those languages, I’ll call it done.

                                                                                                                                        Now, I’m not necessarily saying that OCaml would be worse than Python. I certainly worked around a lot of Python’s quirks, no doubt. But they both have flaws for this application, and I chose the one I was more familiar with and which had a bigger toolset.

                                                                                                                                        [1] http://www.oilshell.org/blog/2016/12/05.html

                                                                                                                                        [2] https://www.reddit.com/r/ProgrammingLanguages/comments/62shf7/domain_specific_languages_for_building_compilers/

                                                                                                                                        1. 3

                                                                                                                                          I am fairly familiar with some lexing/parsing tools for OCaml and I wonder if some of the limitations you mention have not been lifted by some advanced users encountering similar needs.

                                                                                                                                          I don’t know exactly what you mean by “novel parsing algorithm for bash” (would you have more precise pointers), but for example your blog mentions the idea of having several lexical states, and it happens that this question has been studied by people writing parsers for multi-languages (say HTML+CSS, etc.) using standard lex+yacc-inspired parsing tools. See Dario Texeira’s on-the-fly lexer switching with Menhir for example.

                                                                                                                                          Menhir recently gained an incremental parsing mode where, instead of the parser taking a (lexbuf -> token) lexer as argument, there is an explicit representation of the parser state as a persistent value (a state may be accepting, an error, expect further tokens, etc.), and a function that takes a state and a token and returns the next parsing state. This is advanced and not as convenient as the usual approach, but it gives you a lot of flexibility (you can inspect the state in the middle, and for example decide to switch to a different lexer if you want) that would allow to do lexer-switching in a nicer way than described above.

                                                                                                                                          Finally, while Menhir would be my goto-choice of parser generator technology for OCaml – and I haven’t understood yet whether bash parsing is fundamentally non-LR(1) – you may want to experiment with Dypgen, a GLR parser that also explicitly supports scannerless parsing.

                                                                                                                                          1. 2

                                                                                                                                            Off the top of my head, the difficulties with parsing shell are:

                                                                                                                                            • It has to interact with readline. I call this the $PS2 problem. When you hit enter after if true, it should print PS2 rather than executing the command. Most shells do something ad hoc for this and get it wrong in some cases, but my parser handles it without any special cases.
                                                                                                                                            • I reuse the parser for tab completion. This imposes some constraints on error handling (parsing incomplete commands). At least some shells do this with another half / ad hoc shell parser and get it wrong.

                                                                                                                                            Both of these make bottom-up algorithms unsuitable IMO. You want control over how much input you are reading. And you want to know which production you are looking at earlier rather than later.

                                                                                                                                            Some other issues:

                                                                                                                                            • Shell is best thought of as not one language, but four mutually recursive languages implemented by four parsers. Something like "dq"$(echo)'sq'${var:-default}unquoted is a single word, and the word language is fully recursive. But the word is used as token for something like for i in x y z. It is unusual structure and many parser generators don’t accomodate it (e.g. ANTLR).

                                                                                                                                            I should have really said that I don’t want to use a parser generator at all. It might be possible, but I don’t think it would save any work over a hand-written parser due to the work above. The blog post you linked seems to indicate that the approaches used are a little hacky.

                                                                                                                                            I have some daydreams about writing a custom parser generator to express it.

                                                                                                                                            But as mentioned, OCaml is close and I wondered if I should have used it. One of the thing that tips it is that not only don’t want a runtime dependency on a Python interpreter, I don’t want any build time dependencies besides a C compiler either. In the world of shell, configure && make && make install is still useful and I think an expected workflow.

                                                                                                                                            EDIT: I also have the usual complaints about custom error messages and parser generators. I don’t have any specific experience with Menhir, but there are definitely places in shell where the location of the parse error is not the one you want to point out to the user.

                                                                                                                                          2. 3

                                                                                                                                            Ocaml is only one of the languages I mentioned, there are other options. In case you are not aware, though, there are other parser libraries such as menhir and stream combinator parsers. I’d also wager it’s easier to make a specialized parser DSL in Ocaml than building a Python implementation. The types alone mean you can get a well typed AST out of it.

                                                                                                                                            I’m also very surprised that you claim Python is better than something like Ocaml for parsing, that has not been my experience at all. I still can’t really imagine that a home-brewed mini-Python is better in terms of all the other aspects that come with a language implementation such as optimizations and debugging.

                                                                                                                                            But it’s your project, so have fun!

                                                                                                                                            1. 2

                                                                                                                                              Menhir is still a LALR(1) generator as far as I know, and that algorithm doesn’t work with shell at all. I always see people recommending parser combinators, but I can literally think of zero production quality languages that use them. And I’ve looked at perhaps 30 “real” parsers.

                                                                                                                                              As far as I can tell, OCaml, SML, and Haskell all have the same thing going for them – algebraic data types and pattern matching. I don’t know of any non-toy or non-academic programs written in SML. OCaml seems to be what you use if you need to write something production quality. Between OCaml and Haskell, OCaml actually has a Unix tradition and has better tools as far as I can tell. Supposedly Haskell is not efficient with strings.

                                                                                                                                              I claimed that C is better than OCaml for writing hand-written lexers. I explicitly didn’t say that Python is better – I said the opposite. OCaml would probably be the #2 choice. The first post describes how I ended up with Python:

                                                                                                                                              http://www.oilshell.org/blog/2016/10/10.html

                                                                                                                                              I started with 3000 lines of C++ and realized I would never finish at that rate. The prototype became production… but I think that will end up being a good thing.

                                                                                                                                            2. 2

                                                                                                                                              “cross between ML/LISP/YACC/ANTLR”

                                                                                                                                              Closest I know is sklogic’s tool at:

                                                                                                                                              https://github.com/combinatorylogic/mbase

                                                                                                                                              He mixes and matches about every technique in the book in one system for building arbitrary applications in this space. The commercial use is a product for static analysis. He’s always DSL'ing problems. His main development is done in a mix of LISP, Standard ML, and Prolog. He threw SML in for stuff he wants done safely with easy analysis. Prolog is obviously for anything done easy with logic programming. He has LISP for everything else with the full information of the compiled app available to assist him. So, that’s the sort of thing you can do if you start with the right foundation in a language that can embed the rest on a use-case by use-case basis.

                                                                                                                                              Now, one like that outside .NET platform building on something like Racket Scheme might be quite interesting. Even better, something like PreScheme available at least selectively with added benefits of dynamic safety & concurrency of Rust. There’s a lot of options out there that build fundamentals on languages with strong ecosystems and prior work before one attempts to do fully-custom stuff.

                                                                                                                                              1. 2

                                                                                                                                                That’s exactly what I linked to in my post :)

                                                                                                                                                Yeah I don’t like the .NET platform and have some other reservations as you can see in the conversation there. I think doing something fully general purpose may be infeasible, but if you narrow the goals to expressing/ reimplementing the DSLs in Unix, of which there are many, it can probably be done.

                                                                                                                                                1. 1

                                                                                                                                                  Oh I missed that. I thought it was a Reddit elaboration of Oil or something. Ok, you looked at the toolkit, I agree on .NET, and there were disputes about what it could handle. Most of these I’d have said try and see. I see many people argue with him about whether his method could handle a job he’s already used it for. I think he said on HN he uses it to build static analyzers for C and Verilog that his company sells… written in a smooth combo of something like 4-8 DSL’s. I usually say, “Seriously? You think he doesn’t know how to parse weird stuff?” Your case is an exception where it’s not clear cut because Bash might be the pain you describe from my own research.

                                                                                                                                                  I remember looking up a formal semantics. Plus, an off-hand comment by David Wheeler in our verifiable builds discussion got me thinking about using it for a bootsrapping of untrusted compiler since bash is always there & already trusted. So, needed to see if people wrote compilers in it or even formalized it (aka precisely describe it). I found a partial formalization that talked about how difficult it was since it was like several languages at once with static and dynamic aspects. That’s exactly what you said on Reddit. I thought, “How we going to cleanly parse or verify it if they can’t friggin describe its semantics?” The other thing I found were compilers and such that the people semi-regretted writing. Reluctantly shared with caveats haha. Still, even if not sklogic’s method, such a thing made me think you might get best results using two or more fundamental techniques that are each fast instead of one-size-fits-all method. At least keep it in back of your mind as he does it regularly and high-assurance did too. They did sequential stuff in ASM’s and concurrency in SPIN. I could also definitely see Bash done in a rewriting logic with primitives or heuristics for corner cases since they can handle about anything if it’s parsing and transforms. Maude & K framework come to mind.

                                                                                                                                                  So, I guess I can’t offer any specific help on this one past remember using more than one tool might be better on occasion for parsing. Not always but sometimes. In any case, I’m glad you at least got to look into such hybrid tools. Bash is also hard enough to cleanly model that it might be worth some PhD’s throwing effort at to produce a tool like GOLD parsing engine, Semantic Design’s, or sklogic’s that can handle it cleanly w/ straight-forward grammar + embedded functions + synthesis. Just can’t let ourselves be stopped by the challenge of something as crude and old as Bash. ;)

                                                                                                                                                  1. 1

                                                                                                                                                    What would you want to accomplish with a formal semantics of shell?

                                                                                                                                                    I have looked at papers for formalizing Python [1] and problems with formalizing ANSI C. The Python one gives examples of using it to express securities contracts and this pox networkiing project [2]. However AFAICT they didn’t say what you could actually prove about those types of programs.

                                                                                                                                                    I’m all for formalization but I would want it to be in service of something practical. Lamport’s TLA+ seems like a good example. People used it to find real bugs.

                                                                                                                                                    I suppose you could do some sort of taint analysis on programs to prove that data isn’t used as code, as in shellshock. However I think it is easier not to write such a silly bug in the first place :) That just feels too obvious.

                                                                                                                                                    Also, one of the reasons I want to keep it in Python is precisely because it’s easier to reason about the semantics of the interpreter than it would be in C or C++. For one, you get the memory safety guarantee for free. What other properties would you want to prove about shell?

                                                                                                                                                    Shell is definitely foundational, in that you generally need it to configure and build C programs. However I’m not sure the work on C foundations has borne that much fruit yet. I feel like dynamic analysis has been more practical and widely used (valgrind, ASAN).

                                                                                                                                                    [1] https://cs.brown.edu/~sk/Publications/Papers/Published/pmmwplck-python-full-monty/

                                                                                                                                                    [2] https://github.com/noxrepo/pox

                                                                                                                                                    1. 1

                                                                                                                                                      “What would you want to accomplish with a formal semantics of shell?’

                                                                                                                                                      Old rule is how can one get a simple, elegant solution to something one can’t even formally express? Among other things like verification. The formal semantics problems were just evidence it might be hard to parse or reason about.

                                                                                                                                                      “I’m all for formalization but I would want it to be in service of something practical. “

                                                                                                                                                      I agree. I’m thinking on parser generators with efficient synthesis in this case. Gotta formally describe it somehow, though.

                                                                                                                                          3. 3

                                                                                                                                            If I were just trying to make one shell, that would be true. But I’m writing an old shell OSH, a new shell Oil. The new one is bigger because it will have the functionality of Awk and Make.

                                                                                                                                            So yeah there is some yak shaving, but so far my yak shaving has paid off, like writing ASDL and the test framework.

                                                                                                                                            Of course, OSH is open source. I will be happy if someone doubts my plan, and instead wants to rewrite it in C or C++ for me. :) It’s true it isn’t that much code – around 12K lines of Python. It will probably expand into 20K lines of C++ or 30K lines of C if you’re up for it.

                                                                                                                                            Also, the meta-language thing is speculating about evolution AFTER getting OSH working on a normal Python interpreter.

                                                                                                                                            1. 4

                                                                                                                                              The new one is bigger because it will have the functionality of Awk and Make.

                                                                                                                                              Huh, this sounds a lot like mash: http://www.vitanuova.com/inferno/man/1/mash.html , http://www.vitanuova.com/inferno/man/1/mash-make.html

                                                                                                                                              1. 3

                                                                                                                                                Woah are there THREE mash shells? That’s a record. I listed two here! https://github.com/oilshell/oil/wiki/ExternalResources

                                                                                                                                                If there are any lessons I should take from that mash shell, please let me know.

                                                                                                                                                It feels obvious that make and shell should be the same program. The point of both is to start processes, and 90% of the lines of a Makefile are literally shell, or they’re simple assignments that can be expressed in shell.

                                                                                                                                                In case you missed it I wrote about that here: http://www.oilshell.org/blog/2016/11/13.html

                                                                                                                                                This makes it obvious: http://www.oilshell.org/blog/2016/11/14.html

                                                                                                                                          1. 10

                                                                                                                                            “Probe their mental model” conjures bad memories of StackOverflow for me. The Rust community may not have this problem but I quit SO years ago because people will respond to clear technical questions with “do you actually need to do this?” or “yes, but what is the real problem you are trying to solve?” and it’s just not worth the trouble. I’d like to encourage “You’re Doing It Wrong” types to answer the question being asked first and foremost, even if that answer is “that’s not possible”. Sometimes I don’t have a problem I’m trying to solve, and am trying to explore the limits of my tools. Sometimes a “best practices” solution doesn’t apply and the amount of context necessary to establish this is overwhelming and I’d rather not waste everyone’s time typing it up. In every case, I’d like to know the answer to my original question, I’ll decide for myself whether or not I was doing it wrong.

                                                                                                                                            1. 10

                                                                                                                                              The “do you need to do this” is because they get a lot of XY problem folks. I don’t really address this in the post, but I think potential XY should be met with something like “Are you sure you need to do that? Is there an underlying problem here? Else, in short, the solution is Z (or there is no solution because the tool can’t do that), and I can expand on that if you’d like.”.

                                                                                                                                              Anyway, yeah, I intended to mean that the probing comes after the actual solving – I kind of hint at that in my post, but don’t explicitly mention it I guess. Not sure how to work that in now.

                                                                                                                                              1. 12

                                                                                                                                                I usually go “I don’t fully understand you problem, could you give me a little more context of why you are doing this?”, which doesn’t run the danger of sounding blaming as much.

                                                                                                                                              2. 3

                                                                                                                                                I notice this many times doing a search.

                                                                                                                                                I google a specific question and see Stack Overflow answer. The chosen answer often does not answer the question in title. At least many times the next answer has more points and is an answer to the question in title.

                                                                                                                                                Also many theoretical questions are met with more of the same. I’m thinking about asking them maybe on operating systems subreddit.

                                                                                                                                              1. 9

                                                                                                                                                It’s been a while since I’ve used Gnome 3 but I remember preferring Unity to it at the time. Not really a big deal for me since most of my “window management” is done inside Emacs and Firefox.

                                                                                                                                                Ubuntu’s entry into phones reminds me a lot of Mozilla’s. Who was asking for this, and why did it have to come at the expense of their desktop offering?

                                                                                                                                                1. 2

                                                                                                                                                  For me Unity seemed to worked better out of the box than Gnome and I’ve been using it daily for years now. If it’s solid and just works, I really just forget about it.

                                                                                                                                                  I wonder if they will change the Gnome shell to look like Unity ? I mean the menu bar always at the top, the window buttons on the left…

                                                                                                                                                  1. 1

                                                                                                                                                    I used Gnome as my daily driver for a few years, and I really liked it. I’d also imagine that Ubuntu will modify and/or extend Gnome-shell to make it more Unity-like.