1. 2

    An IR is primarily a paradigm for compiler engineers. You can port optimizations to different IRs, but the IR you work in shapes how you think as a developer.

    I have used FIRM a lot and always viewed it as a graph of operations and data dependencies. When I look at a C program, in the back of my mind its FIRM representation appears.

    1. 9

      I think “lisper” on the HN version of this article had a great summary:

      “What’s really going on is that, in Lisp, code is a particular kind of data, specifically, it’s a tree rather than a string. Therefore, some (but not all) of the program’s structure is represented directly in the data structure in which it is represented, and that makes certain kinds of manipulations on code easier, and it makes other kinds of manipulations harder or even impossible. But (and this is the key point) the kinds of manipulations that are easier are the kind you actually want to do in general, and the kind that are harder or impossible are less useful. The reason for this is that the tree organizes the program into pieces that are (mostly) semantically meaningful, whereas representing the program as a string doesn’t. It’s the exact same phenomenon that makes it easier to manipulate HTML correctly using a DOM rather than with regular expressions.”

      1. 3

        I don’t think tree vs. string is the difference. In every language, a program is a string of bytes on disk and a tree once parsed in memory. Lisp just has a closer correspondence between tree and string, which makes it cognitively easier. I don’t know where one would draw the line at which the two count as “homo”, i.e. the same.
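To make the "tree once parsed in memory" point concrete, here is a minimal sketch in Python rather than Lisp, using the standard `ast` module (`ast.unparse` needs Python 3.9+). The `RenameVar` class, the helper names, and the sample program are all made up for illustration. It renames a variable by walking the parse tree, next to a naive string replace on the same source:

```python
import ast

SOURCE = "a = 1\nbanana = a + 1"


class RenameVar(ast.NodeTransformer):
    """Rename every reference to one variable, leaving other names alone."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        # Only whole identifiers are touched, never substrings of other names.
        if node.id == self.old:
            node.id = self.new
        return node


def rename_in_tree(source, old, new):
    """Parse to a tree, transform the tree, print it back as source text."""
    tree = RenameVar(old, new).visit(ast.parse(source))
    return ast.unparse(tree)


def rename_in_string(source, old, new):
    """The naive version: treat the program as a plain string."""
    return source.replace(old, new)


tree_result = rename_in_tree(SOURCE, "a", "b")
string_result = rename_in_string(SOURCE, "a", "b")
```

The tree version leaves `banana` alone; the string version mangles it. The difference from Lisp is that here the tree and the text are two separate notations with a parser and a printer in between.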

        1. 6

          I think the point at which the tree-literal syntax is the same as the language-proper syntax is a pretty good place to consider them equal. You can’t express arbitrary JavaScript programs in JSON, or any other C-family language’s code in its data-literal syntax. The Lisp family, however, uses the same syntax for representing data as it does for representing programs.

          1. 5

            “Lisp just has a closer correspondence between tree and string, which makes it cognitively easier”

            Maybe not just cognitively but also in terms of programming. Many languages have complicated parsing and contexts where transformations aren’t as simple as an operation on a tree with regularity in its syntax and semantics.

            1. 2

              Right. Von Neumann machine code is homoiconic, but I don’t think it exhibits many of the purported advantages of Lisp?

              1. 2

                The ones I’ve seen are definitely harder to work with than LISP’s. Now, it might be different if we’re talking about a Scheme CPU, esp. if microcoded or PALcoded. With SHard, it can even be built in Scheme. :)

        1. 1

          Last step: use tools to benchmark. There are various speed-test websites, and they can give you additional tips. Maybe you can compress an image better, for example, or you forgot to enable gzip compression.
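For a rough feel of what gzip buys you on text assets, here is a small sketch using Python's standard `gzip` module. The markup is a made-up, deliberately repetitive page, so the ratio will look better than on real content:

```python
import gzip

# Hypothetical page body: repetitive markup, as list-heavy HTML tends to be.
html = ('<li class="item"><a href="/post">A post title</a></li>\n' * 200).encode("utf-8")

compressed = gzip.compress(html)

# Fraction of the original size that would go over the wire.
ratio = len(compressed) / len(html)

# Decompressing gives back exactly the original bytes.
restored = gzip.decompress(compressed)
```

Comparing `len(html)` with `len(compressed)` for your own pages is a quick way to see whether enabling compression on the server is worth the trouble.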

          1. 4

            Should have put a Spectre exploit on the site and counted how many vulnerable visitors he got, for extra lulz.

            1. 2

              Open the fake one to get hit by the real one. That’s a nice idea.

            1. 2

              In what other languages would it be possible?

              I guess everything with properties (functions disguised as fields) so D, C#, etc.

              Afaik not with C, C++, or Java.

              1. 26
                #define a (++i)
                int i = 0;
                if (a == 1 && a == 2 && a == 3) {
                    /* reached: every expansion of a increments i first */
                }
                1. 1

                  Isn’t that undefined behavior? Or is && a sequence point?

                  1. 3

                    && and || are sequence points. The right expression may never happen depending on the result of the left, so it would make things interesting if they weren’t.

                2. 10

                  This is very easy to do in C++.

                  1. 5

                    You can also do it with Haskell.

                    1. 3

                      Doable with Java (override the equals method), and as an extension, with Clojure too:

                      (deftype Anything []
                        Object
                        (equals [a b] true))
                      (let [a (Anything.)]
                        (when (and (= a 1) (= a 2) (= a 3))
                          (println "Hello world!")))

                      Try it!

                      Or, inspired by @zge above:

                      (let [== (fn [& _] true)
                            a 1]
                        (and (== a 1) (== a 2) (== a 3)))
                      1. 3

                        Sort of. In Java, == doesn’t call the equals method, it just does a comparison for identity. So

                         a.equals(1) && a.equals(2) && a.equals(3); 

                        can be true, but never

                         a == 1 && a == 2 && a == 3;
                      2. 3

                        perl can do it very simply

                        my $i = 0;
                        sub a {
                        	return ++$i;
                        }
                        if (a == 1 && a == 2 && a == 3) {
                        	# reached: each call to a returns the next integer
                        }
                        1. 2

                          Here is a C# version.

                          using System;
                          namespace ContrivedExample {
                              public sealed class Miscreant {
                                  public static implicit operator Miscreant(int i) => new Miscreant();
                                  public static bool operator ==(Miscreant left, Miscreant right) => true;
                                  public static bool operator !=(Miscreant left, Miscreant right) => false;
                              }
                              internal static class Program {
                                  private static void Main(string[] args) {
                                      var a = new Miscreant();
                                      bool broken = a == 1 && a == 2 && a == 3;
                                  }
                              }
                          }
                          1. 2

                            One of the ‘tricks’ where all a’s are different Unicode characters is possible with Python and Ruby. Probably in Golang too.

                            1. 7

                              In Python, you can simply create a class with an __eq__ method and do whatever you want.
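A minimal sketch of that, borrowing the counter trick from the Swift and Perl versions in this thread. The class name `Miscreant` is just reused from those examples:

```python
class Miscreant:
    """Compares equal to 1, then 2, then 3, ... on successive comparisons."""

    def __init__(self):
        self.value = 0

    def __eq__(self, other):
        # Side effect: every comparison bumps the counter before comparing.
        self.value += 1
        return self.value == other


a = Miscreant()

# `and` short-circuits left to right, so the comparisons run in order.
result = a == 1 and a == 2 and a == 3
```

Returning `True` unconditionally from `__eq__` would work as well; the counter just makes each comparison "honest" at the moment it runs.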

                              1. 4

                                Likewise in ruby, trivial to implement

                                a = Class.new do
                                  def ==(*)
                                    true
                                  end
                                end.new
                                a == 1 # => true
                                a == 2 # => true
                                a == 3 # => true
                            2. 2

                              In Scheme you could either take the lazy route and do (note the invariance of the order or amount of the operations):

                              (let ((= (lambda (a b) #t))
                                     (a 1))
                                (if (or (= 1 a) (= 2 a) (= 3 a))
                                    "take that Aristotle!"))

                              Or be more creative, and say

                              (let ((= (lambda (x _) (memv x '(1 2 3))))
                                      (a 1))
                                  (if (or (= 1 a) (= 2 a) (= 3 a))
                                      "take that Aristotle!"))

                              if you want = to mean only “is equal to one, two or three” instead of “everything is equal”, of course only within this let block. The same could also be done with eq?, obviously.

                              1. 1

                                Here is a Swift version that uses side effects in the definition of the == operator.

                                import Foundation
                                internal final class Miscreant {
                                    private var value = 0
                                    public static func ==(lhs: Miscreant, rhs: Int) -> Bool {
                                        lhs.value += 1
                                        return lhs.value == rhs
                                    }
                                }
                                let a = Miscreant()
                                print(a == 1 && a == 2 && a == 3)
                              1. 13

                                As someone whose users are almost exclusively in emerging markets and in remote and underserved locations, I can say this doesn’t even scratch the surface. I’ve been in some places with a proper broadband connection in an enterprise environment and been completely unable to load lots of common web sites.

                                Not only that, we’re constantly developing for whatever the latest versions of Android, iOS, and Windows are. A lot of the world is just getting away from Android 4.4, to be honest.

                                It’s just the same with laptops. There are still a lot of 1024x768 laptops roaming around out there, and users who skip big updates because they just plain take too long to download or are impossible to download on a flaky or expensive connection. When you see those charts showing the market breakdown of how many people are using a specific version of an OS, a certain chunk of the people still running old, insecure, feature-lacking versions that you give a chuckle or a tsk-tsk to literally can’t upgrade.

                                People are still climbing to the tops of hills near their villages to try to get enough signal to make a phone call. We have situations where someone has to travel weekly from town to town to sync data from users’ phones because the cell coverage and data have been too flaky.

                                Another thing to remember is that you might be costing these people real money. You will not find cheap 2 GB monthly data plans in some of these places. A lot of people are using the cheapest top-up cards. So when your web site eats through their data plan by serving umpteen JS files, making a billion calls to external services, serving bloated Ajax payloads, tracking calls, etc… just to load your home page, you’re actually using up a finite resource for a user, as they may have to travel to another town or wait ages to be able to afford another top-up.

                                This scales out, though, to all aspects of a digitally connected existence. Overly designed emails loading external image assets. That PNG image that you never compressed properly because hey, 1 MB images aren’t that bad on your fibre connection. Compression ratios for video streams, ads that stream audio on page load, ads themselves, persistent connections constantly passing data back and forth, etc…

                                Pages that fall apart when one of the assets times out downloading, even though the page is sitting there on screen.

                                I’m not saying that we should strip out all of the good stuff we’ve managed to create, or that we should go back to the GeoCities age, or even that we should always build for the least least least common denominator all the time. But if you’re a content producer, or you have a web app or desktop/mobile app that you want to be usable by everyone, everywhere, you need to exercise a bit of common sense. If you’ve got a successful app that’s failing to pick up in some key target locales that you wanted to serve, or you have no traffic from, for instance, countries on the African continent, then it might be worth investigating how your product operates in those regions.

                                The other half of this is that our innate ability to understand the internet and how it works isn’t the norm for everyone. In the west and developed countries, we have an almost innate immediate understanding of how a piece of software or a web site works from years of exposure to software and technology. We know how a button should look, what a select field does, how urls work.

                                For some populations, even email is still a foreign concept. I had a user who just assumed that his first and last name @gmail.com would belong to him and he could access it, because that’s how email works, right?

                                Or I’ve had users who fill out a form and show legitimate fear about pushing the submit or save button on screen, because it feels very final and they don’t know what will happen when they do it. It’s a scary door and you don’t know what’s on the other side. It’s all still very much magic.

                                We have so many people who don’t have email addresses themselves that we started just making up email addresses on our domain (not real accounts). A lot of users put it in as their login, having no idea why that funny @ symbol is in their username.

                                Sorry, rant over. It’s just that this is something that, over the years, has been a constant driver in the back of my head, and it informs a lot of my design and development patterns. And it’s more complicated than just needing to minify and gzip your stuff.

                                1. 3

                                  The problem is it’s really, really hard to develop software to solve problems you didn’t even know existed. I’d love to make software that works for everyone, but I don’t know how everyone uses their devices. I don’t know how a colorblind person sees or how a person in Africa understands how to use email. And I certainly don’t have the money to visit all these people and ask them. The best I can do is build it to cover all the use cases I understand, and people can send me an email or open a bug report if I missed something that’s really important to them.

                                    1. 3

                                      I’m not saying that you should build your software to meet every conceivable combination of personal IT experience, connectivity requirements, etc… Far from it, as that’s pretty much not feasible even for a lot of large teams. In my case, my target users are these groups specifically, and I’ve seen the frustration first-hand where people have been asked, for instance, to install an app on their phone, and because all the person has is an Android 4.4 phone, the app doesn’t exist for them. Or we’ve had a user call up because their data plan is depleted when the only thing they did was browse some news sites and blogs, so we have to top up their SIM cards constantly.

                                      I’m just saying, if you’ve got an app, application, web app, mobile app or whatever, and you look and see that you have a high rate of abandonment, or no installs whatsoever in a region where you’d like to have uptake, you need to think about how your product would operate under those circumstances. Even large-scale enterprises are guilty of this from what I’ve seen, so it’s not just a “young developer not knowing any better” situation. I’ve had big corps pitch their well-developed software solutions for some of our situations, and their applications fell flat on their face in the meeting because they just couldn’t deal with the network health in our locations.

                                      If you’re happy to never address that segment of the market, then don’t worry about it. But if you want a bigger piece of the pie, you don’t have to travel to Africa to identify potential problems. Turn on throttling in Chrome DevTools and throttle down to a 3G or slower connection. Use a connection throttler to test how your mobile or desktop app acts when you run it. Rename your largest JS dependency to something else and see what happens to your page when you try to load it (basically simulating a timeout on a resource). Look at Chrome’s network tab and tally up how much data actually comes into your app when you load the page. Simple stuff, basic common-sense stuff.
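For the "tally up how much data comes in" step, one option is to export a HAR file from the network tab and add it up with a few lines of Python. This is a rough sketch: the HAR content below is a tiny made-up sample, `page_weight` is a hypothetical helper, and it only looks at the standard `headersSize` and `bodySize` fields of each response:

```python
import json

# A tiny made-up HAR export; real ones come from the browser's network tab.
HAR_TEXT = json.dumps({
    "log": {"entries": [
        {"request": {"url": "https://example.com/"},
         "response": {"headersSize": 300, "bodySize": 12000}},
        {"request": {"url": "https://example.com/app.js"},
         "response": {"headersSize": 250, "bodySize": 480000}},
        {"request": {"url": "https://example.com/hero.png"},
         "response": {"headersSize": 250, "bodySize": -1}},  # body size unknown
    ]}
})


def page_weight(har_text):
    """Return (total_bytes, bytes_per_url) for the responses in a HAR document."""
    per_url = {}
    for entry in json.loads(har_text)["log"]["entries"]:
        resp = entry["response"]
        # HAR uses -1 for "not available"; count those as 0 bytes.
        size = max(resp.get("headersSize", -1), 0) + max(resp.get("bodySize", -1), 0)
        per_url[entry["request"]["url"]] = size
    return sum(per_url.values()), per_url


total, per_url = page_weight(HAR_TEXT)
heaviest = max(per_url, key=per_url.get)
```

On a real export, sorting `per_url` by size usually makes the one or two assets worth attacking first obvious.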

                                      There was another article on here a few days ago about someone who provides a service that allows people to use WhatsApp on lower-end phones through a browser and people went nuts for it. If you can work around it, you can achieve some great growth.

                                      And I don’t just mean the African continent either. There are plenty of people in rural UK, America, Canada, South America, Europe and the Pacific who are surviving on 3G modems, tethering their cell phones, etc… I can see a major town from my house, but I can’t get any better than dial-up speed on my broadband, and some days I have to resort to a 3G modem to get any kind of internet at all. I have to walk out of the house and up the road a bit to get a good cell phone signal. I can literally pull out of my driveway and find people who have fiber to their property. There are still swathes of developed countries’ populations who don’t have the access you would have in a metropolitan environment.

                                      If addressing these people isn’t a concern, then don’t worry about it. Well, maybe have a little common sense about using compression and clearing out deprecated and unnecessary code, style assets, etc… (which does go a long way), and carry on with your life. But if you want to grow globally outside of your current market, you’ll need to adjust your thinking a little bit: set some constraints on how big you’ll allow your application to be, don’t import an entire date library just to format dates, give yourself a delay before adopting newer things like flexbox, etc…

                                      1. 7

                                        I remember a story from YouTube: they saw bad loading times in their stats and went on to optimize that by making YouTube more lightweight. After the optimizations, the stats got even worse! However, usage went up. A lot more people, for whom YouTube had previously timed out, were now able to use it.

                                        Unfortunately, Google cannot find me the original story. I believe it was some Google blog.
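The effect is easy to reproduce with arithmetic. A sketch with entirely made-up numbers (none of them are from the original story): users whose page load times out never show up in the load-time stats, so letting them in drags the average up even though everyone is better off.

```python
def mean_load_time(groups):
    """groups: list of (user_count, seconds) for users whose page load completed."""
    users = sum(n for n, _ in groups)
    return sum(n * s for n, s in groups) / users, users


# Before: 1000 users on fast links load in 2 s. Another 500 on slow links
# time out, so they are missing from the load-time stats entirely.
before_mean, before_users = mean_load_time([(1000, 2.0)])

# After slimming the page: fast users take 1 s, and the slow users now
# finish in 20 s and are counted for the first time.
after_mean, after_users = mean_load_time([(1000, 1.0), (500, 20.0)])
```

With these numbers the mean goes from 2 s to roughly 7.3 s while the number of users served goes from 1000 to 1500.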

                                        1. 1

                                          Even a little bit like that goes a long way. Long wait times are basically standard when loading a page for some people, so when a page does actually load, it’s great.

                                          A weird instance we had once was a user who was using a 3G dongle for their data and requesting a long-running process in one of the earlier versions of our system. The process would run, and they would sit there waiting for it to complete. For this user (and a few similar cases, until we changed how the long-running process was called), their view was that it was still running hours later, even though the job had finished ages before. It took forever to figure out that, over their 3G dongle, the connection was getting swapped on the telecom side, so when the app responded with the result, the user wasn’t there anymore, even though they never lost their actual connection to the internet. The problem was that there was nothing in between the user and our servers to report to the user that the request had failed because of a weird timeout implemented on the telecom’s side. After swapping to a hardline or office wifi, the process completed and worked fine.
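One common way to avoid holding a single request open for the whole job is submit-then-poll. This is only a sketch of that general pattern, not necessarily what was changed in the system described above; `JobServer` and `poll` are made-up names and the "server" is an in-memory stand-in:

```python
import itertools


class JobServer:
    """In-memory stand-in for a server that runs long jobs."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}

    def submit(self, work):
        # Returns immediately with an id instead of holding the connection open.
        job_id = next(self._ids)
        self._jobs[job_id] = {"status": "running", "work": work, "result": None}
        return job_id

    def run_pending(self):
        # Stand-in for the background worker finishing its jobs.
        for job in self._jobs.values():
            if job["status"] == "running":
                job["result"] = job["work"]()
                job["status"] = "done"

    def status(self, job_id):
        job = self._jobs[job_id]
        return job["status"], job["result"]


def poll(server, job_id, attempts):
    """Ask for the result with short requests; a lost one can simply be retried."""
    for _ in range(attempts):
        state, result = server.status(job_id)
        if state == "done":
            return result
    return None


server = JobServer()
job_id = server.submit(lambda: sum(range(1000)))
first_look = poll(server, job_id, attempts=1)  # job has not run yet
server.run_pending()
final = poll(server, job_id, attempts=3)
```

The point of the design is that every request is short, so a connection that gets swapped mid-job costs one retry instead of the whole result.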

                                  1. 13

                                    SQLite is much more practical than people give it credit for. Your application probably doesn’t need PostgreSQL. (Although PostgreSQL does have a lot of features that make it a nicer choice.)

                                    1. 2

                                      I’m a fan as well. My colleagues are switching to PostgreSQL now, because SQLite has limitations with ALTER TABLE. The Play framework uses database evolutions written in SQL, so that is easier with PostgreSQL.
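For anyone who hasn't hit it: SQLite's ALTER TABLE can rename things and add or drop columns, but it cannot change a column's type or constraints, so a migration has to rebuild the table. A minimal sketch with Python's standard `sqlite3` module; the `users` table and its columns are made up for illustration:

```python
import sqlite3

# isolation_level=None: autocommit mode, so transactions are managed explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, age TEXT)")
conn.execute("INSERT INTO users (age) VALUES ('42')")

# Changing the type of `age` means: create a new table, copy the rows over,
# drop the old table, and rename the new one, all in one transaction.
conn.execute("BEGIN")
conn.execute("CREATE TABLE users_new (id INTEGER PRIMARY KEY, age INTEGER NOT NULL)")
conn.execute("INSERT INTO users_new (id, age) SELECT id, CAST(age AS INTEGER) FROM users")
conn.execute("DROP TABLE users")
conn.execute("ALTER TABLE users_new RENAME TO users")
conn.execute("COMMIT")

rows = conn.execute("SELECT id, age FROM users").fetchall()

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
column_types = {row[1]: row[2] for row in conn.execute("PRAGMA table_info(users)")}
```

In PostgreSQL the same change is a single ALTER TABLE statement, which is why hand-written SQL evolutions are less painful there.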

                                    1. 7

                                      First of all, I love love love how vibrant the Lobsters formal methods community is getting. I’m much more likely to find cool FM stuff here than any other aggregator, and it’s awesome.

                                      Second, maybe I’ve been spending too much time staring at specifications, but I’m not seeing how level 1 is different from level 2. Is level 1 “this is broken for probable inputs”, while level 2 is “this is broken for some inputs”? That is, do they differ only in degree of probability?

                                      1. 4

                                        Level 1 is statements about specific executions; level 2 is statements about the implementation. This is for all statements, not just correctness.
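A small sketch of that distinction, assuming a made-up `abs32` that mimics 32-bit two's-complement arithmetic (`wrap32` and `abs32` are illustrative names, not from the article):

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def wrap32(x):
    """Wrap an integer into the signed 32-bit two's-complement range."""
    return (x + 2**31) % 2**32 - 2**31


def abs32(x):
    """abs() as a 32-bit machine would compute it, wraparound included."""
    return wrap32(-x) if x < 0 else x


# Level 1: statements about specific executions. Each of these runs was fine.
observed = [abs32(x) for x in (-5, 0, 7)]

# Level 2: a statement about the implementation, e.g. "abs32 never returns a
# negative number". That is a claim about every input, and one input refutes it.
counterexample = abs32(INT_MIN)
```

The three observed runs are all correct, and the claim about the implementation is still false, because `abs32(INT_MIN)` wraps back to `INT_MIN`.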

                                        1. 1

                                          “First of all, I love love love how vibrant the Lobsters formal methods community is getting.”

                                          Me too.

                                          I believe crustaceans care more about high-quality code than the average programmer. Maybe it’s the pride of the craftsman? This is a difference from Hacker News, where monetary considerations get more attention. Formal methods are certainly one of the big topics for improving the quality of code.

                                          1. 1

                                            I also love that we get more formal methods discussions taking place here!

                                            I am however not sure how much “depth” is accepted; should I post any paper I find interesting here, with short summaries why and personal reflections?

                                            1. 2

                                              I usually just post the papers. One thing I do, though, is try to make sure they show some practical use along with what they could or couldn’t handle, especially in the abstract, so readers can assess at a glance. I’ll sometimes add summaries, reflections, brainstorming, etc. if I feel it’s useful. I will warn you that acceptance is hit or miss, with many of the PDFs getting 1-3 votes. Then, out of nowhere, they really appreciate one. Also, I only submit anything important Monday-Friday, since many Lobsters seem to be off-site on weekends.

                                              So, that’s been my MO. Hope that helps.

                                          1. 9

                                            Counterpoints: Correctness is Easily Defined (maybe), Levels Mix Up, and People Issues Over Tech Issues

                                            (or “Brace Yourself Peer Review is Coming!”)

                                            My typical model for describing correctness is Abstract State Machines, since ASMs are more like Turing machines for structures. They’re really easy for ordinary programmers to learn compared to most formal models. Moore and Mealy state machines are the basic versions of this on the finite side. If modeling as such, then correctness is simply that your system produces the intended behavior, in terms of external output and/or internal states/transitions, upon given input(s). Yours are subsets of that at best. Definition 1 is the subset observing one or more state machines giving incorrect output for an input. Definition 2 is the subset identifying specific inputs and transitions that led to incorrect output. Definition 3 kind of jumps sideways to focus on arguments made about specifications rather than the specifications themselves. Both might be incorrect. Correctness still reduces to the same thing, with which I could model each of your definitions. Even modifying it to say “correct under reasonable operating assumptions” is the same, where “CPU, RAM, or HD aren’t faulty” becomes another input that must be true when CurrentState or Output are correct.
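To show what I mean on the finite side, here is a toy sketch: a Mealy machine as the spec, a hand-written class as the implementation, and correctness as "same outputs and same final state on the given inputs". The turnstile, its outputs, and all names are made up for illustration:

```python
# Specification: a Mealy machine for a coin-operated turnstile.
# (state, input) -> (next_state, output)
SPEC = {
    ("locked", "coin"): ("unlocked", "unlock"),
    ("locked", "push"): ("locked", "alarm"),
    ("unlocked", "coin"): ("unlocked", "refund"),
    ("unlocked", "push"): ("locked", "lock"),
}


def run_spec(inputs, state="locked"):
    """Run the spec machine and return (final_state, outputs)."""
    outputs = []
    for symbol in inputs:
        state, out = SPEC[(state, symbol)]
        outputs.append(out)
    return state, outputs


class Turnstile:
    """Hypothetical implementation under test."""

    def __init__(self):
        self.locked = True

    def step(self, symbol):
        if symbol == "coin":
            if self.locked:
                self.locked = False
                return "unlock"
            return "refund"
        # Anything else is treated as "push".
        if self.locked:
            return "alarm"
        self.locked = True
        return "lock"


def conforms(inputs):
    """Correct on this input sequence: same outputs and final state as the spec."""
    impl = Turnstile()
    outputs = [impl.step(symbol) for symbol in inputs]
    final_state, expected = run_spec(inputs)
    return outputs == expected and impl.locked == (final_state == "locked")


ok = conforms(["push", "coin", "coin", "push", "push"])
```

Checking one input sequence like this is the "Definition 1" style of statement; showing `conforms` holds for every sequence is the stronger claim.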

                                            In the objective case, your argument that correctness is hard to define or has many definitions seems wrong. If applied to subjective definitions, it’s true, in that people come up with definitions narrow enough to miss important aspects of correctness. You do a good job illustrating several perspectives, such as runtime, code, and logical analysis, that can help or hurt depending on what aspect of correctness one wants to achieve. These levels might be too simplistic in the real world to teach correctness, though. Some examples follow.

                                            In Level 2: Code, we do consider some behaviors that cannot happen directly in code when writing reliable or secure programs. We might write code to get the algorithm correct first, then do an assessment of known issues outside of code that affect code, modify the code to include those solutions, and then we’re done. It still happens at code level even if originally the knowledge came from a different level. A good example is software caches: they aren’t really about the semantics of the code, but we do modify the code we write to reduce misses. It’s a requirement/design detail that, when applicable to code, causes a transformation of that code. Eventually, the coder might automatically write code that way as part of coding style (esp with caches). Other examples include running out of memory at random points, bit flips, or side channels in CPUs. These have code styles to mitigate them that people can use without even understanding the logic behind them, and that can be enforced by code analysis tools.

                                            In Level 3: Design/Logic, it assumes logic specs and implementation are different things. The most successful use of formal methods in imperative programs, by the numbers, is contracts and assertions. The most practical subset of formal methods is Design-by-Contract. These tightly integrate the logic dictating correctness with the source code, where they’re both created and considered at Level 2. In some implementations, the assertions become runtime checks that are activated in Level 1. So, the assertion-oriented methods operate at Levels 2-3 simultaneously when writing specs/code, with 1-3 happening together when testing specs/code.
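A bare-bones sketch of the idea in Python, assuming a hand-rolled `contract` decorator (hypothetical, not any particular library's API). The spec lives next to the code and turns into a runtime check whenever the function actually executes:

```python
import functools


def contract(pre=None, post=None):
    """Attach a precondition and a postcondition to a function as runtime checks."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if pre is not None and not pre(*args, **kwargs):
                raise AssertionError(f"precondition of {func.__name__} violated")
            result = func(*args, **kwargs)
            if post is not None and not post(result, *args, **kwargs):
                raise AssertionError(f"postcondition of {func.__name__} violated")
            return result
        return wrapper
    return decorate


@contract(
    pre=lambda xs: len(xs) > 0,
    post=lambda result, xs: result in xs and all(result >= x for x in xs),
)
def largest(xs):
    """Return the largest element of a non-empty list."""
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best
```

Calling `largest([])` fails the precondition before the body runs, and a buggy body would trip the postcondition on the first input that exposes it.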

                                            “That can only be seen when doing formal verification, where a program’s properties and assumptions are written just as concretely as its source code. “

                                            Or Design-by-Contract with a verification condition generator. The more specs and VCs they see, the more complex the behavior might be. These things could possibly be converted to English statements about the software, too. There’s ongoing work in natural-language provers. You’re right that the common case is the specs are implicit in code or in developers’ heads, “scattered” about in an inconsistent way. That’s a strong argument for explicit specs.

                                            “Level 3 is all about how modular are the interactions between the components of your software”

                                            They’re going to tell you OOP or FP plus dynamic languages solve that. It’s why Smalltalk and LISP systems are so malleable. They also supported live debugging and updates. Many long-running systems with an acceptable level of failures were achieved with English specs plus memory-safe, dynamic, maintainable code. Not a formal spec in sight. There are lots of half-assed versions of the aforementioned concept in mainstream languages with bolted-on abstractions, middleware, clusters, and so on. You might want to think on this more to determine how to describe why your solution is superior to that for practical reasons, if it even is, given that whether static, verified code or safe, easy-to-modify, dynamic code is better in the general case for “acceptable correctness” is an ongoing debate among developers in CompSci, industry, etc.

                                            “At Level 2, this was totally fine, since freed memory in DOS was valid until the next malloc, and so the program worked. At Level 3, this was a defect, “

                                            Another reason I don’t like the levels for these subjects is that we already have the SDLC models and other stuff like that, which have a way to specify requirements, design, and code with traceability. There are piles of methodologies and tools for doing this. What this section says is that, ignoring best practice, a developer only considers the code rather than the hardware or OS considerations that would be introduced in “requirements” or “design” documents saying the code has to fit within that model. Then, you illustrate the problem that ignoring everything but code creates. I know there are plenty of developers out there doing it, but it doesn’t mean we need new levels or other models for the basics. Just illustrate with good examples like yours why we have those basic concepts there for the software development lifecycle. Note that I’m not pushing the order of Waterfall so much as the categories of development.

                                            “The HTTP “referer” is forever misspelled, and that SimCity special-casing code is still there 30 years later.”

                                            You’re treating it all as one thing. The SimCity software might have been cheap to the people who built it. The company still makes games. So, I’m guessing SimCity made them a lot of money, too. That software was totally successful as a product at its goal “for that group at that point in time.” It’s true the software might become a problem for another group down the line needing it to do a different thing, due to technical concerns in the code. However, it was of acceptable quality and successful in its original context, even if not in another context later. What you’re seeing here is the effect of economic/human factors dominating technical ones when deciding if specific methods are good methods (esp good enough).

                                            We should always consider such things in our exhortations because those listening in FOSS or industry sure as hell will. Many groups’ model of operation means they don’t need to care about long-term quality. FOSS folks or CompSci students (esp w/ non-reproducible research) might not need to care about even current quality. Others don’t need high runtime quality so much as “it works often enough and is easy to fix.” Others add a fast rate of change to that. Others, esp bigger companies, will talk developer costs for specific lines of code on top of that. The goals and metrics vary across groups. If you want to convince them, you need to target your message to those goals and metrics. If you don’t want to relent, you’ll have to target groups that are using your goals and metrics, such as low defect rate, higher flexibility for change, increased predictability (aka lower liability), and so on. That’s a smaller segment.

                                            Like you, I spent a lot of time talking about objective metrics focused on the tech when the groups’ intended goals, failure tolerances, and subjective metrics are where the ability to influence is at. I wasted a lot of time. Now, I focus on teaching them how defects can be found faster with little extra work (esp at interfaces), how software can be changed faster, how it can be diagnosed faster, how it can be fixed faster, and how rollback can work well and/or fast. And with what cost, level of support, etc. This is their language.

                                            “I now have two years experience teaching engineers a better understanding of how to avoid complexity, improve encapsulation, and make code future-proof.”

                                            Glad you’ve been doing that. Keep at it. All I’m doing here is showing you how to improve the message.

                                            Part Two

                                            “The three levels deal with different views of a program: executions, code, and specifications. Each corresponds to its own kind of reasoning. Let’s bring in the math!”

                                            Now this is true, and it is where your use of levels shines with good examples like overflow. However, the analysis seems to miss that these are common problems at the code level where simple checks can take care of them. Even your malloc example might be handled in a library implementation where some things the spec tracks were tracked by compile-time analyses or a run-time library. Such non-formal methods are typical of memory-safe languages, embedded systems catching integer overflows in code, watchdog timers to dodge termination proofs, and so on. The overhead they have is acceptable to many or maybe most given the dominance of such methods in industry and FOSS. Others who want less overhead or more determinism might like the techniques you bring to the table for Level 3. I already refuted separating them too much with examples such as Design-by-Contract that can do all of it at once with lots of automation (eg property-based testing or fuzzing plus runtime checks). I agree prior work shows many tools work best focusing on just one level, though.
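
To make the "simple checks at the code level" point concrete, here is a minimal sketch of a contract-style runtime check for integer overflow, combined with a crude random-input search. It is an illustration only: Python integers do not overflow, so `wrap_int32` simulates 32-bit wraparound, and the names `wrap_int32`, `checked_add`, and `find_overflow` are hypothetical, not taken from the article or from any library.

```python
import random

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def wrap_int32(value):
    """Simulate two's-complement wraparound of a 32-bit signed integer."""
    return (value + 2**31) % 2**32 - 2**31


def checked_add(a, b):
    """Add two simulated int32 values, enforcing a contract at runtime."""
    # Preconditions: both operands must already be valid int32 values.
    assert INT32_MIN <= a <= INT32_MAX, "a is out of int32 range"
    assert INT32_MIN <= b <= INT32_MAX, "b is out of int32 range"

    result = wrap_int32(a + b)

    # Postcondition: the machine result must equal the mathematical sum.
    if result != a + b:
        raise OverflowError(f"int32 overflow adding {a} and {b}")
    return result


def find_overflow(trials=1000, seed=0):
    """Randomly search for an input pair that violates the postcondition."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = rng.randint(INT32_MIN, INT32_MAX)
        b = rng.randint(INT32_MIN, INT32_MAX)
        try:
            checked_add(a, b)
        except OverflowError:
            return (a, b)
    return None
```

The contract lives next to the code, is checked while the program runs, and the random search plays the role that property-based testing or fuzzing would play in a real setup.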

                                            I think you prefer clean, isolated, and static ways of analyzing or looking at system correctness when the real world is just messier in terms of solutions available for one or multiple levels. It’s a lot messier. Embrace the mess! Like a consultant, factor it into your analyses so your recommendations show the various options at each level for different classes of problem with the trade-offs they take: time/money saved now or later via less debugging or fixes earlier in SDLC, extra time/money for specific verifications (these two matter most where development pace is fastest), compile-time vs runtime analysis, whether runtime checks are needed and at what cost, and support in their tooling of choice. It’s more work for people like us to do this. However, as some see value and more uptake happens, network effects might kick in where they do some of the evangelizing and tool work for us. That’s on top of a potential niche market for specific solutions that work well like we see in DO-178B/C for static analyzers and so on.

                                            1. 11

                                              You should make a blog or something and keep all the longer form writing there. Then you could reference it easily on forums.

                                              1. 11

                                                I’m actually closer to having a blog than before. It’s probably going to be up this year after a decade of procrastination on that.

                                                1. 2

                                                  That would be a blog, where I would subscribe before you even publish something. Please do. :)

                                                  1. 1

                                                    What has been your primary hindrance, if you don’t mind my asking? There are (and have been) many free blog hosting platforms, that make it quite trivial to start writing. Also, if you desired to host your own, services like github pages and netlify offer very compelling free tiers.

                                                    1. 1

                                                      There are (and have been) many free blog hosting platforms

                                                      Is there a good one?

                                                      1. 1

                                                        I don’t mind you asking but would rather not say. You could just say my background, current circumstances, and procrastination habit all added up to hold off since I already had places to share ideas. On that latter point, I also found it easier to help others by going where crowds were already at than trying to do lots of self-promotion. I was drafting the heavy hitters instead of building a turbo-charged 18-wheeler of my own. Whether that was a good or bad idea, who knows.

                                                        I am switching gears on quite a few things this year. Gonna try a piece or just a few at a time to avoid overwhelming myself.

                                                  2. 3

                                                    Hi, OP here.

                                                    Thanks for the long response. The typography of this comments section is really not meant for text of this length. As such, I’m having a lot of trouble reading this, for multiple reasons. I can’t tell how you disagree with me. In fact, I can’t even figure out if you disagree with me.

                                                    1. 6

                                                      Okay, I’ve read it through again. Here’s my attempt at a summary of what you wrote:

                                                      • Automata are a good way of writing specifications. Levels 1 and 2 have analogues in the automata-based style of specification.
                                                      • Something about me saying that correctness has many definitions (which is only kind of true)
                                                      • Changes to code are informed by the design
                                                      • Partial specifications may be embedded in the code via assertions
                                                      • Something about OOP. Who is the “they” that’s telling me it’s going to solve modularity? Is this responding to something I said? (Sidenote: Objects are very misunderstood and underrated among the intelligentsia. I highly recommend reading William Cook.) Also, something about “my solution.” I don’t know what “my solution” is supposed to be; I’m just providing a nomenclature.
                                                      • Something about how developers hopefully don’t think about nothing but the code when programming.
                                                      • For some kinds of software, most of the value to the producer occurs in a short time horizon.
                                                      • Different kinds of software have different value/time curves.
                                                      • Bug catching tools exist.
                                                      • The real world is messy.

                                                      None of this seems at all opposed to anything I wrote. Am I missing a key point?

                                                      There is one minor point of difference though: I’m not sure that Level 3 has an analogue in some variations of automata-based specification. This is because an automata-based specification is a very whole-system view of the world. There’s not exactly a notion of “This implementation automaton is a correct refinement of that specification automaton, but for the wrong reasons” unless you have a hierarchical construction of both. Hence, if you try to think of your system as implementing some kind of transition system, you may not realize if you have poorly-defined component boundaries.

                                                      I expect this phenomenon is related to why LTL-based program synthesis performs so poorly.

                                                      1. 4

                                                        Sidenote: Objects are very misunderstood and underrated among the intelligentsia. I highly recommend reading William Cook.

                                                        Sidenote to your sidenote: while we’ve mostly accepted that there’s several divergent strands of Functional Programming (ML, Haskell, Lisp, APL), we haven’t really done the same with OOP. That’s why people commonly conflate OOP with either Java ObjectFactoryFactory or say “that’s really not OOP, look at Smalltalk.” While Java is a Frankenstein of different strands, Smalltalk isn’t the only one: I’d argue Simula/Eiffel had a very different take on OOP, and to my understanding so did CLU. But a lot of CLU’s ideas have gone mainstream and Eiffel’s ideas have gone niche, so we’re left with C++ and Smalltalk as the main ‘types’ of OOP.

                                                        1. 5

                                                          There are old programming concepts called objects, but they don’t necessarily correspond to meaningful constructs. Cook and Cardelli did discover something that does capture what’s unique about several strands of OOP, and also showed why CLU’s ideas are actually something else. I consider this a strong reason to consider their work to be the defining work on what is an “object.”

                                                          This is largely summarized in this essay, one of my all-time favorites: http://www.cs.utexas.edu/~wcook/Drafts/2009/essay.pdf

                                                          1. 1

                                                            By the numbers, I’d consider it whatever C++, Java, and C# did since they had the most OOP programmers in industry. Then, there’s the scripting languages adding their take. So, my hypothesis about mainstream definition is that what most people think OOP is will be in one of those camps or a blend of them.

                                                        2. 3

                                                          I’ll take the main comment paragraph by paragraph showing what I did for context:

                                                          1. I disagreed with your three models for correctness in favor of one that says “Do states, transitions, and outputs happen as intended on specific inputs?” All yours fit into that. There’s also formal models for it deployed in high-assurance CompSci and industry.

                                                          2. I agreed your levels represent different vantage points people might use. I also agreed some tools work best focused only on them.

                                                          3. I disagreed you need these to explain why code alone isn’t sufficient. The standard software development lifecycle (even Waterfall) shows places where problems show up. There are lots of articles online to pull examples from, with your DOS thing being one. Best to just use what’s widely-used and well-understood for that.

                                                          4. I disagreed that stuff in the three levels stayed separate during development. I gave specific examples of where stuff at one was solved at another without mentally leaving that other level. I also showed methods that do multiple ones simultaneously with strong, code focus.

                                                          5. I disagreed that strong correctness of software (a) mattered for many groups or (b) mattered in long term. You have to examine the priorities of people behind products or projects to know if it matters or how much. Then, tie your recommendations to their priorities instead of yours. Ours if it’s software quality improving. :)

                                                          6. I disagreed that strong, logical methods were needed for long-term when specific examples from Smalltalk/LISP to industry’s knock-offs regularly produce and maintain software that meet organization’s goals. Since they’re status quo, you must show both those kind of solutions and yours side-by-side arguing why your costlier or stranger (to them) methods are better.

                                                          7. I agreed some would have your view in niche segments of FOSS and industry that are willing to sacrifice some upfront-cost and time-to-market to increase quality for higher levels of (insert good qualities here). You will do best marketing to them again tying the message to their interests and metrics. I gave DO-178B (now DO-178C) market as example where they invest in better tooling or buy better-made software due to regulations. Ada/SPARK, Astree Analyzer, formal methods, and so on have more uptake in safety-critical embedded than anywhere else except maybe smartcards.

                                                          So there’s a summary that will hopefully be more clear by itself or if you read the original post with it in mind.
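
As a rough illustration of the single correctness question in point 1 ("Do states, transitions, and outputs happen as intended on specific inputs?"), the sketch below checks an implementation against a small transition table. The turnstile, the `SPEC` table, and the `conforms` helper are all hypothetical examples chosen for brevity; they are not from the article and do not represent any particular formal model.

```python
# Specification as a transition table:
# (state, input) -> (next_state, output)
SPEC = {
    ("locked", "coin"): ("unlocked", "unlock"),
    ("locked", "push"): ("locked", "alarm"),
    ("unlocked", "coin"): ("unlocked", "refund"),
    ("unlocked", "push"): ("locked", "lock"),
}


class Turnstile:
    """Hand-written implementation that is supposed to follow SPEC."""

    def __init__(self):
        self.state = "locked"

    def step(self, event):
        if event == "coin":
            output = "unlock" if self.state == "locked" else "refund"
            self.state = "unlocked"
        else:  # "push"
            output = "alarm" if self.state == "locked" else "lock"
            self.state = "locked"
        return output


def conforms(impl, spec, start, inputs):
    """Check states, transitions, and outputs of impl on one input sequence."""
    state = start
    for event in inputs:
        expected_state, expected_output = spec[(state, event)]
        output = impl.step(event)
        if output != expected_output or impl.state != expected_state:
            return False
        state = expected_state
    return True
```

For example, `conforms(Turnstile(), SPEC, "locked", ["coin", "push", "push"])` walks both the table and the implementation in lockstep and reports whether they agree on that specific input sequence. It says nothing about inputs that were not tried.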

                                                          1. 5

                                                            Thanks. That I can read.

                                                            First, please tell me where I say that strong correctness is important or that we should be using heavyweight formal methods in today’s industry. I don’t believe either of those statements except in a small number of high-assurance cases, but you’re not the first to think that I do. Also, what is the “your solution” that you speak of? Points 5-7 seem to all be arguing against something I don’t believe. (As an aside, where are you getting your claims about tool adoption? My understanding from talking to their competitors is that Astree has no market share.)

                                                            Point 4: Obviously, you need to think about code (level 3), write code (level 2), and run code (level 1) concurrently with each other. Thinking about these three levels of abstraction absolutely does happen concurrently. What should set off alarm bells is writing code that special-cases a certain input (level 1 influencing level 2), and shaping a design around a poorly-thought-out implementation (level 2 influencing level 3). Obviously, your value/time curve will determine when exceptions are worth making.

                                                            Point 3: Ummm…..I don’t know what to say. Obviously, there are a lot of people with no training in formal methods who are nonetheless very good software designers. I do believe that the concepts of PL and formal methods, in various incarnations, whether logic-based or automata-based, are the correct way to think about many software design concepts that were previously only vaguely-defined. These all provide a notion that there is a layer of reasoning which shapes the code, but is invisible when looking only at the code. This is what I call Level 3, or “the hidden layer of logic,” and closely related to what Don Batory calls “Dark Knowledge.” (Someone on HN just linked me to a Peter Naur article which might discuss the same concept.) Probably the clearest example of this concept is ghost code, which many static analysis tools rely on. I also believe that understanding the hidden layer of logic is a shortcut to the higher levels of software design skill, as I’m proving in my coaching practice.

                                                            1. 3

                                                              I could’ve misread the intent of your post. I thought you were promoting that people consider, maybe adopt, specific ideas. If you’re just brainstorming, then you could read me as liking or shooting down some of the brainstormed ideas. It all depends on what your goal is. I’ll answer some of this.

                                                              First, one thing about your post is that it looks like an evangelistic push of specific solutions, like applying your concepts of Level 1-3 or software engineering practices in general. Here are some examples of why I thought that:

                                                              “No, our goal is to be able to continue to deliver working software far into the future.”

                                                              “People sometimes tell me how software is easy and you can just… Hogwash… This is why it’s important to get programs right at Level 3, even if it passes all your testing and meets external requirements.”

                                                              “This is why it’s important to make your APIs conform as strictly to the spec as possible”

                                                              “the most important parts of our craft deal not with code, but with the logic underneath. When you learn to see the reasoning behind your system as plainly as the code, then you have achieved software enlightenment.”

                                                              “always think of the components of your… So much becomes clearer when you do, for logic is the language of software design.”

                                                              Your replies to me were talking like it’s idle categorization or brainstorming but those quotes sound like pushing specific practices to some audience of programmers to solve their problems. I assume industrial and FOSS programmers as default audience since they account for most working, practical software out there. If you’re (a) talking to real-world programmers and (b) pushing specific advice, then I responded to that: in terms of what their requirements are; what methods they use (eg design review, OOP, patches, runtime checks); told you to compare what you were pushing to good versions of what they did; and, if your methods still sound better for them, then modify your pushes to argue stuff more meaningful to them, esp project managers or FOSS volunteers. That’s what I meant by your “solutions” or “recommendations” with my counterpoints on suggested modifications.

                                                              I should’ve added that this quote was 100% in agreement with my recommendations:

                                                              “I always recommend trying to think in pure concepts, and then translate that into the programming language, in the same way that database designers write ER diagrams before translating them into tables. So whether you’re discussing coupling or security, always think of the components of your software in terms of the interface, its assumptions and guarantees”

                                                              Back to other points.

                                                              “Obviously, you need to think about code (level 3), write code (level 2), and run code (level 1) concurrently with each other.”

                                                              I’m glad you intended for them to know that. It would be obvious if your audience already knows the benefits of level 1-3 activities but then you wouldn’t have to explain the basics. Problem was your audience (as I perceived them!) was composed of people you were convincing to use more than coding and a debugger. Given Design-by-Contract reactions, I know it’s not going to be obvious for many of them that logic and code can work together in a way that’s also checked at runtime. Maybe a similar, future write-up, if you do one, should say that they can interleave or that some methods do several at once.

                                                              “These all provide a notion there is a layer of reasoning which shapes the code, but is invisible when looking only at the code.”

                                                              It’s a neat way of describing the other factors. The industrial developers do have methods for that, though, that you might leverage to help them understand. Big attention goes to modeling languages with UML’s probably being most widespread. In safety-critical, Esterel SCADE is good example. Anyway, the industrial side has modeling techniques and diagrams for various parts of the lifecycle that show relationships, constraints, and so on. Adoption varies considerably just like with strong design in general. Many will have seen that stuff, though. Starting with something like it might increase how fast they understand what you’re saying. You can then show how formal logic has benefits of simplicity, consistency, and tooling. And I tend to especially highlight the automated analyses that can happen since resource-constrained groups love solutions that involve a button push. :)

                                                              “I also believe that understanding the hidden layer of logic is a shortcut to the higher levels of software design skill, as I’m proving in my coaching practice.”

                                                              Definitely! In that case, you also have people who are trying to improve rather than randomly reading your blog. It’s a different audience willing to invest time into it. It’s definitely beneficial to teach them about the implicit stuff so they see just what kind of complexity they’re dealing with. I fully support that. An example I give to people learning formal verification is this report doing a landing system in Event-B. Specifically, I tell them to note how the small number of requirement and design details balloons into all the stuff in section C onward. I tell them, “Although some is due to the logic, much of that is just you explicitly seeing the huge pile of details you have to keep track of throughout your system to make it correct in all situations. It’s always there: you just don’t see it without the formal specs and explicit over implicit modeling.” That always gets a reaction of some kind. Heck, for me, it made me want to reconsider both not using formal methods (what am I ignoring?) and using formal methods (wow, that’s a lot of work!). :)

                                                              1. 3

                                                                Oh boy, this is getting unwieldy. Replying in separate comments.

                                                                1. 3

                                                                  I view this post as saying “Here’s a good way to think about software.” This does make it easier to come up with a number of design claims, such as “conforming strictly to an API reduces the complexity of future clients.” Obviously, everything has tradeoffs, and needs to be tailored to your situation. For the better engineers I’ve taught, the reaction to a good fraction of this material is “It clarified something I already had a vague intuition for,” and that is also the aim of this post.

                                                                  This lens does lead to a number of recommendations. In the follow-up post, currently #3 on Hacker News, I give a counterintuitive example of reducing coupling. This is, of course, only useful to those who desire to reduce coupling in their code.

                                                                  1. 3

                                                                    I’m compressing most of the second half of your post into “For people already familiar with certain design-capture methodologies, you can help explain your ideas by comparing them to what they’re familiar with.” Although I once came close to taking an internship with the SEI, I can’t say I’ve had the (mis)fortune of working anywhere that used one of these methodologies (startups gotta iterate, yo), and I’ve read 0 books on UML or any of them.

                                                                    So, if I’m going to go this route, do you have anything in mind that, say, more than 10% of Hacker News readers would likely be familiar with? Otherwise, referencing it won’t fill the desired purpose.

                                                                    Thanks for sharing the Event-B report, BTW.

                                                        1. 5

                                                          I never understood why people emphasize keywords.

                                                          Screenshot of my vi color scheme: I highlight comments and literals. Comments must be obviously distinguished from code, especially if there is code in a comment. Likewise, string literals with code in them must be obviously different. Other literals like magic numbers are an anti-pattern and should be spotted easily.

                                                          I like the idea of semantic colors. We could also highlight local variables differently from global variables or fields, for example.
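
A minimal sketch of what the local-versus-global distinction could look like, assuming the code being highlighted is Python and using the standard library `symtable` module to ask the compiler how it classifies each name. The sample `SOURCE` and the helper `classify_names` are hypothetical; a real editor plugin would map the resulting categories to colors rather than strings.

```python
import symtable

# Hypothetical snippet to classify; not taken from any real project.
SOURCE = '''
counter = 0

def bump(step):
    global counter
    total = counter + step
    counter = total
    return total
'''


def classify_names(source, func_name):
    """Map each name used in one top-level function to a semantic category."""
    module = symtable.symtable(source, "<example>", "exec")
    for child in module.get_children():
        if child.get_name() == func_name:
            kinds = {}
            for sym in child.get_symbols():
                # Parameters are also local, so test for them first.
                if sym.is_parameter():
                    kinds[sym.get_name()] = "parameter"
                elif sym.is_global():
                    kinds[sym.get_name()] = "global"
                elif sym.is_local():
                    kinds[sym.get_name()] = "local"
            return kinds
    raise KeyError(func_name)
```

Calling `classify_names(SOURCE, "bump")` yields one category per name, which is the information a semantic color scheme would need that a keyword-based one never sees.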

                                                          1. 6

                                                            I like the distinction of “tool” and “place”. It feels like a useful mental concept.

                                                            I can see a relation to the economics of software development. Companies want their products to be places, so they capture a slice of your attention and deepen awareness of their brand. Nvidia is an example: It is just a single part of the computer hardware, yet it comes with a GUI tool and often demands attention.

                                                            Free software can afford to become a tool. Imagine if you boot a Linux desktop and various involved projects show you a series of splash screens first: This desktop experience brought to you by systemd, dbus, dnsmasq, CUPS, NetworkManager, PulseAudio, Gnome, Mozilla Firefox, Gnome Keyring Daemon, gvfsd, and bash.

                                                            1. 4

                                                              It’s an interesting dichotomy, and drastically more interesting than the same old “minimalism” tirade I was expecting from reading the title of the post.

                                                            1. 2

                                                              I like to hear anecdotes about sexism to learn what people actually consider sexism. In my opinion the term is used overly broadly. Not every story in the article is sexism, but the article does not claim they are, so that is fine.

                                                              1. 3

                                                                Good read, very stirring, rather toxic.

                                                                EDIT: More than just toxic.

                                                                1. 3

                                                                  What is toxic about it?

                                                                  1. 12

                                                                    (To make clear: I think it was well written. I myself am a huge fan of writing in the second person. That said, I think this sort of thing is ultimately counterproductive as we find our way.)

                                                                    I think that articles like this, spreading agreeable fiction based on small kernels of truth in the name of cautionary tales, serve to further polarize workers and worsen relations and in general further retard the evolution and equalization of peoples in tech.

                                                                    I don’t particularly care for the portrayal of women (or men, for that matter) in this, I don’t particularly care for the normalization of backchanneling and backbiting and subterfuge, I don’t particularly care for the caricatures of different pathologies of our industry. I particularly do not care for the implication of common issues in engineering (student loans, unfriendly coworkers, the path from contributor to manager, office politics) as somehow being uniquely gendered.

                                                                    Further, I think that this article kinda reinforces the popular simple assumptions about our industry: software startups have infinite funding, that software orgs start at 50 people and go from there, that titles matter at all, that having sufficient diversity can be approximated by a process like farming pokemons, that successful businesses have to have a million daily users or they aren’t worth anything, that startups are somehow competing against each other instead of external apathy and internal incompetence, that women of color in tech must speak weird English and are necessarily steeped in progressive politics–those and a score of odd little throwaway lines here and there that serve to paint the One True Picture of the Valley and attempt to deconstruct/critique it and in so doing erase the lived experience of thousands of workers in healthier (or sicker!) companies in the Valley, in places like Europe or Southeast Asia or Latin America or Africa or even the American Midwest, in smaller companies that are just starting out or in software units of banks or insurance companies or oil & gas or other non-startup businesses.

                                                                    My concern is that things like this further cement a simplified and inaccurate view of ourselves in our own mythology. I say that this is toxic because the things we’re doing and going to do to exorcise this specter are probably going to be also terrible and costly to workers.

                                                                    1. 8

                                                                      If you’ve read the piece and haven’t seen any toxicity, then we must be living in two parallel universes that have touched each other right here in the comments section of this post. To me the post seems like the output from an ML exercise, where they’ve input samples of contempt, snark and narcissism to a text generation system.

                                                                      1. 2

                                                                        What’s not? The author follows up a glowing paragraph about how great X is as a boss with “so why don’t I have your job” It’s like she’s not even self-aware.

                                                                    1. 3

                                                                      By this logic, nothing ever would have had to have been invented. At least if you carry it through to the end, in the way stated, not the way it was intended.

                                                                      1. 12

                                                                        This particular line of refutation and critique is probably the most common refrain I hear when this sort of article or sentiment is brought up. It’s also wrong–note the “maybe” in the post title.

                                                                        Let’s not flatter ourselves: yet another “HTML DOM but with better syntax”, “jQuery but with cleaner syntax”, “HTML DOM but with databinding”, “Angular but with smarter data-binding this time”, “Angular but with version-breaking and typescript”, “HTML DOM but with better diffing”, “React but artisanal”, “React but artisanal but also angular”, is hardly invention in the sense you probably mean it.

                                                                        1. 10

                                                                          Our use of common tools has forced us into fixing the things that bother us about them, instead of developing truly new ways of solving our problems. The common solutions don’t make us think, and destroy our ability to think outside the box.

                                                                          What would software be like if the free software movement never happened? Instead of “buying” loose fitting uniforms, I bet we’d all be excellent fabric makers, and tailors of original clothes that fit just right.

                                                                          1. 3

                                                                            And worse, now that we have too many tools to ever fix any of them, there is actually an entire generation of “developers” who simply have no capacity to write quality, durable code.

                                                                            What would software be like if the free software movement never happened? Instead of “buying” loose fitting uniforms, I bet we’d all be excellent fabric makers, and tailors of original clothes that fit just right.

                                                                            Some of us anyway.

                                                                            But unlike good clothing, most people cannot “see” code, so very few people appraise its quality. A lot of people actually think they’re paying for code, that somehow more code is more valuable.


                                                                            I actually welcome legislation that puts programmers and business on the hook legally (with proper teeth, like the GDPR promises to have) for their work, because I would like to always do good work, but I know I can’t do that while being competitive.

                                                                            1. 3

                                                                              And worse, now that we have too many tools to ever fix any of them, there is actually an entire generation of “developers” who simply have no capacity to write quality, durable code.

                                                                              This isn’t any different from how it used to be. For as long as we’ve had computers we’ve had people worried about developers writing bad, brittle code. The usual solution? High quality, well tested components we know are good, so that developers have fewer places to screw up.

                                                                              Not having to roll our own crypto is, on the whole, a good thing.

                                                                              1. 1

                                                                                And worse, now that we have too many tools to ever fix any of them, there is actually an entire generation of “developers” who simply have no capacity to write quality, durable code.

                                                                                You sound old and grumpy; it’s gonna be alright. I’ve seen older and younger developers alike write shitty (and good) code. At least by reusing existing components, people might have an easier time building systems or complex programs that rely on widely used and tested patterns.

                                                                                I actually welcome legislation that puts programmers and business on the hook legally (with proper teeth, like the GDPR promises to have) for their work

                                                                                How would such legislation discourage individuals from taking risks and rewriting their own components instead of reusing existing, better tested and more widely used ones?

                                                                                because I would like to always do good work, but I know I can’t do that while being competitive.

                                                                                If you need legislation to be able to market your good work, “maybe it’s you”.

                                                                                1. 1

                                                                                  That probably results in more money for insurance companies but not better software.

                                                                                  1. 4

                                                                                    I’m confident if we are planning more, writing better specs, coding more carefully, focusing on reducing code size, and doing more user-testing, then software will be better.

                                                                                    And there may always be a gap: As we learn where it is, we can probably refine those fines…

                                                                                2. 3

                                                                                  What if I don’t want to be a tailor, though? I want to be a welder, but I can’t, because I spend all my time tailoring!

                                                                                  Component programming has, historically, been the hoped-for solution to the software crisis. Parnas made that a central advantage of his work on modules, high-correctness software is predicated on using verified components, etc etc. It might not have lived up to its standards, but it’s a lot better than where we used to be.

                                                                                  Consider the problems you want to think about, and then consider how hard it would be to solve them if you had to write your own compiler.

                                                                                  1. 2

                                                                                    It might not have lived up to its standards, but it’s a lot better than where we used to be.

                                                                                    Hmm. Can you elaborate on why it’s better? I feel that in a lot of ways it’s worse!

                                                                                    Consider the problems you want to think about, and then consider how hard it would be to solve them if you had to write your own compiler.

                                                                                    We’ve trained ourselves to make a base set of assumptions about what a computer is, and has to be. A C compiler is just a commodity tool, these days. But, obviously, people have invented their own languages, and their own compilers.

                                                                                    But consider a very basic computer, and Forth. Forth is simple enough that you can write very big functioning systems in a small amount of code. Consider the VPRI Steps project that’s been attempting to build an entire computing system in a fraction of the code modern systems take. What would things look like, then?

                                                                                    1. 1

                                                                                      Hmm. Can you elaborate on why it’s better? I feel that in a lot of ways it’s worse!

                                                                                      The most popular Python time library, Arrow, is 2000+ lines of core code and another 2000+ lines of localization code. If you try to roll your own timezone library, you absolutely will make mistakes that will bite you down the line, but Arrow is battle-tested and, to everybody’s knowledge, correct.

                                                                                      Consider the VPRI Steps project that’s been attempting to build an entire computing system in a fraction of the code modern systems take. What would things look like, then?

                                                                                      That report lists 17 personnel and was funded by a 5 million dollar grant. I don’t have that kind of resources.

                                                                                      1. 2

                                                                                        When was the last time you wrote code that required accurate timezones (UTC is almost always OK for what I do)? And, to be honest, 4,000 lines doesn’t seem like enough to be exhaustive here…

                                                                                        But, I don’t disagree that there are exceptional things that we should all share.

                                                                                        Just that, in the current state of things, relying on an external library responsibly requires a deep understanding of it. You can’t rely on documentation—it’s incomplete. You can’t rely on its tests—they don’t exhaustively prove it works. You can’t trust the names of functions—they lie, or at least have ambiguity. And, more often than not, you care about only a small percentage of the functionality, anyway.

                                                                                        That report lists 17 personnel and was funded by a 5 million dollar grant. I don’t have that kind of resources.

                                                                                        The point wasn’t “we should all go define 2,000 line systems that do everything.” It was, apparently poorly, attempting to point out that there may have been another way to “compute,” that would have made rolling everything yourself more appropriate. I think it’d be pretty hard to go back to a place where that’s true—the market has spoken, and it’s OK with bloated, completely broken software that forces them to upgrade their computers every 3 years just to share photos in a web browser and send plain text email to their families.

                                                                                        1. 1

                                                                                          When was the last time you wrote code that required accurate timezones (UTC is almost always OK for what I do)? And, to be honest, 4,000 lines doesn’t seem like enough to be exhaustive here…

                                                                                          Maybe not timezones, but definitely https, authentication libraries, web scrapers, crypto, unit testing frameworks, standard library stuff…

                                                                                          I think it’d be pretty hard to go back to a place where that’s true—the market has spoken, and it’s OK with bloated, completely broken software that forces them to upgrade their computers every 3 years just to share photos in a web browser and send plain text email to their families.

                                                                                          Right, but I’m asking historically if this was caused by the rise of component-based programming, as opposed to just being correlated with it, or even if it happened despite it! It’s really hard to prove a counterfactual.

                                                                                3. 0

                                                                                  So… do you not believe in evolution, then?

                                                                                  1. 1

                                                                                    Tbh, when I read “maybe it’s you”, I understand this as a stylistic device, and don’t read it literally. And I guess it depends on the situation: I totally agree with you that 99% of the “new” stuff invented for the web had no need to be created (which one could generalize to the whole economy if one wanted to). I just want to say that there are situations where being open to new ideas wouldn’t be bad, because sometimes bad ideas are kept just because of a network effect.

                                                                                    And if we’re already talking about what exactly was written (I should have clarified this, so it’s my fault), I was talking about the title. I know the text says something different; that’s why I said “not the way it was intended”.

                                                                                    1. 2

                                                                                      Author here. Thank you for your feedback! You’re right: the title may be construed as an accusation. For the record: it is not. I’ll take better care with such things going forward!

                                                                                1. 2

                                                                                  not really the point of the article, but the one thing i’ve never found a good tool for is deploying my application. it’s something that i absolutely don’t want to build, but keep on reinventing for every project i work on.

                                                                                  1. 2

                                                                                    I’d like to solve this problem, but I have very strong opinions about how it should be done.

                                                                                    1. 2

                                                                                      Write down the problem/requirements in a blog post and submit it here. I love to read about unsolved problems. Maybe someone even knows a solution.

                                                                                      1. 1

                                                                                        Having written these things for a few startups (Instacart, Airbnb) I can say, it’s tough to make a clean API for it and that makes generality, reusability and consistency (all requirements for a “tool” instead of a “solution”) very difficult.

                                                                                      1. 4

                                                                                        it’s much easier to add features slowly based on need than it is to determine all of the useful behaviors before starting development, and it’s much easier to test a small number of base features before building on top of them than to test a large complex language all at once.

                                                                                        I don’t agree that adding features one by one improves anything about the language design. One huge challenge is that each feature interacts with most of the other ones and you need to consider each combination with the full set anyways.

                                                                                        Let us assume you start with a small number of base features A, B, and C. You determine all useful behaviors, test the features and their interactions, and tune them to a sweet spot of language design. Now you add feature D. Assume it interacts with B and C. To find the new sweet spot, you have to retune B and C. This means all the time spent on tuning for the sweet spot A-B-C was wasted once you introduced D. OK, maybe not all of it, because you probably learned something on the way. Still, the overall process does not look efficient to me.
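
                                                                                        To put rough numbers on that, here is a small sketch that counts the feature pairs a designer may have to consider. It assumes the worst case, where any two features can interact (the comment above only assumes D touches B and C), and the function name is made up for illustration.

```python
from itertools import combinations


def interaction_pairs(features):
    """Return every unordered pair of features that might need joint tuning."""
    return list(combinations(features, 2))


before = interaction_pairs(["A", "B", "C"])
after = interaction_pairs(["A", "B", "C", "D"])

# Pairs that exist only because D was added to the language.
new_pairs = [pair for pair in after if "D" in pair]

print(len(before), len(after), new_pairs)
```

                                                                                        Under that assumption the number of pairs grows as n(n-1)/2, so each added feature brings one new potential interaction per existing feature.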

                                                                                        These thoughts are the reason why I think it is a bad idea that the Go designers avoid adding generics. I believe it is unavoidable to add them at some point, and then a lot of tuning is invalidated.

                                                                                        On the other hand, it seems unavoidable. I do not know of any language that did not grow further after 1.0. Simple languages (Lisp, SQL, SML) get extended when they come into contact with the real world.

                                                                                        1. 1

                                                                                          In my experience having a slightly larger language, and trimming down features after you have gained some insights into what has worked and what hasn’t is one of the best approaches to adopt.

                                                                                          This of course requires the ability to carefully manage deprecation and migration, and not having made crazy promises about compatibility like some languages of the ’90s did. Looking at many more recent languages, this has substantially changed though, with the stronger focus on adoption over existing users.

                                                                                        1. 5

                                                                                          It seems like all of these issues could be resolved if the processor’s microcode (or the OS’s kernel) had explicit control over cache invalidation. Any microarchitecture researchers here that can comment on the existence of or any research on explicit cache control?

                                                                                          I’ve always wondered why the cache can’t be controlled explicitly, even in user mode code. It seems like a lot of performance is probably left on the table by having it automated. It’s like how using garbage collection (GC) can simplify code, but at the cost of throwing away information about memory usage and then having the GC try to guess that information in real time.

                                                                                          1. 7

                                                                                            My understanding (from David May’s lectures and asking lots of questions) is that memory caches are far too much on the hot path (of everything, all the time) to be controlled by microcode.

                                                                                            I remember he mentioned some processor (research? not mainstream, I think) being made with a mechanism wherein you could set a constant in a special register that would be added to the low bits of every physical address before it hit the cache system, so that you could have some user-level control of which addresses alias each other. But I got the impression from that conversation that nobody had ever seriously considered putting more than one adder’s worth of gate delays into user control of a cache system, because the cache is so important to performance. Nobody could think of amazingly useful ways for running code to customise cache behaviour that can’t already be achieved well enough using CPU features like prefetches or by cleverly changing the layout of your data structures.

                                                                                            1. 1

                                                                                              I could imagine separate load/store instructions for uncached memory access, for example. ARM already has exclusive load/store in addition to normal ones.

                                                                                            2. 1

                                                                                              The architecture of snooping caches is ridiculously baroque.

                                                                                              1. 1

                                                                                                Why is that? Afaik the only alternative is directory-based which has higher latency but scales better.

                                                                                                1. 1

                                                                                                  You could design for non-coherent caches with software controlled cache line sync. Most data is not shared but the overhead for shared data is imposed on all transactions.

                                                                                                  1. 1

                                                                                                    Software control is probably even slower. One problem is that the compiler has to insert the management instructions without the dynamic information a hardware cache has, which means a lot of unnecessary cache flushing.

                                                                                                    If you go for software control, I would rather bet on fully software-managed scratchpad memory. There seems to be no consensus on how to use that well, though.

                                                                                                    1. 1

                                                                                                      Very few memory locations are shared - probably fewer should be shared. Snooping caches are designed to compensate for software with no structure to shared variables.

                                                                                              2. 1

                                                                                                Looking at the Spectre proof of concept code, it looks like there already actually is a way for user mode code to explicitly invalidate a cache line, and it’s used in the attack.

                                                                                                Perhaps a microcode patch could use this feature of the cache to invalidate any cache lines loaded by speculative execution?

                                                                                              1. 1

                                                                                                Could you please clarify whether you are considering just plain Make or GNU Make extensions in your document? If GNU Make is considered, a dependency on a directory is certainly possible with an order-only prerequisite (listed after |).

                                                                                                Also, could you please add Mk (from Plan9) to the list? It is supposed to be a better Make.

                                                                                                1. 1

                                                                                                  Mk is Make reduced to its essentials. That removes a lot of cruft but also some useful features. People probably debate which things are cruft or useful, though.

                                                                                                1. 2

                                                                                                  Someone needs to trash cargo, because it seems like a lot of people really like it. However, it’s a build system, so it must be bad.

                                                                                                  1. 1

                                                                                                    I don’t think they solved the Debian integration yet? Building without network access, using shared libraries, etc.

                                                                                                  1. 30

                                                                                                    All of them:

                                                                                                    The fact that they exist at all. The build spec should be part of the language, so you get a real programming language and anyone with a compiler can build any library.

                                                                                                    All of them:

                                                                                                    The fact that they waste so much effort on incremental builds when the compilers should really be so fast that you don’t need them. You should never have to make clean because it miscompiled, and the easiest way to achieve that is to build everything every time. But our compilers are way too slow for that.

                                                                                                    Virtually all of them:

                                                                                                    The build systems that do incremental builds almost universally get them wrong.

                                                                                                    If I start on branch A, check out branch B, then switch back to branch A, none of my files have changed, so none of them should be rebuilt. Most build systems look at file modified times and rebuild half the codebase at this point.

                                                                                                    Codebases easily fit in RAM and we have hash functions that can saturate memory bandwidth, so just hash everything and use that to figure out what needs rebuilding. Hash all the headers and source files, all the command line arguments, compiler binaries, everything. It takes less than 1 second.
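
                                                                                                    A minimal sketch of that idea, to make the branch-switching case concrete. It fingerprints file contents plus the command line and never looks at modification times. SHA-256 from the standard library stands in for a faster hash, the function and file names are invented for the example, and hashing the compiler binary would just mean adding its path to the list.

```python
import hashlib
import os
import tempfile


def build_fingerprint(paths, command_line):
    """Hash file contents plus the command line; mtimes are ignored on purpose."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(path.encode())
        digest.update(b"\0")
        with open(path, "rb") as f:
            digest.update(f.read())
        digest.update(b"\0")
    digest.update(" ".join(command_line).encode())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "main.c")

    def write(text):
        with open(src, "w") as f:
            f.write(text)

    write("int main(void) { return 0; }\n")  # contents on branch A
    on_a = build_fingerprint([src], ["cc", "-O2"])

    write("int main(void) { return 1; }\n")  # contents on branch B
    on_b = build_fingerprint([src], ["cc", "-O2"])

    write("int main(void) { return 0; }\n")  # back on branch A, newer mtime
    back_on_a = build_fingerprint([src], ["cc", "-O2"])

    # Same source, different flags: this should count as a different build.
    other_flags = build_fingerprint([src], ["cc", "-O0"])

print(on_a == back_on_a, on_a != on_b, on_a != other_flags)
```

                                                                                                    A build tool using this would store the fingerprint next to each output and rebuild only when it differs, so returning to branch A reuses the earlier result even though every file was rewritten.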

                                                                                                    Virtually all of them:

                                                                                                    Making me write a build spec in something that isn’t a normal good programming language. The build logic for my game looks like this:

                                                                                                    if we're on Windows, build the server and all the libraries it needs
                                                                                                    if we're on OpenBSD, don't build anything else
                                                                                                    build the game and all the libraries it needs
                                                                                                    if this is a release build, exit
                                                                                                    build experimental binaries and the asset compiler
                                                                                                    if this PC has the release signing key, build the sign tool

                                                                                                    with debug/asan/optdebug/release builds all going in separate folders. Most build systems need insane contortions to express something like that, if they can do it at all.

                                                                                                    My build system is a Lua script that outputs a Makefile (and could easily output a ninja/vcxproj/etc). The control flow looks exactly like what I just described.
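
                                                                                                    For illustration only, here is roughly what that control flow looks like in an ordinary language. This is Python rather than the Lua script described above, the target names and the build/<config> directory layout are invented, and it returns a list of targets instead of writing a Makefile.

```python
def plan_build(os_name, config, has_signing_key):
    """Return (output_dir, targets), following the steps listed above in order."""
    # debug/asan/optdebug/release outputs are kept in separate folders.
    out_dir = "build/" + config
    targets = []

    if os_name == "windows":
        targets.append("server")
    if os_name == "openbsd":
        return out_dir, targets  # don't build anything else

    targets.append("game")
    if config == "release":
        return out_dir, targets  # release builds stop here

    targets += ["experimental", "asset_compiler"]
    if has_signing_key:
        targets.append("sign_tool")
    return out_dir, targets


print(plan_build("windows", "debug", has_signing_key=False))
print(plan_build("linux", "release", has_signing_key=True))
```

                                                                                                    Early returns map directly onto the "exit" and "don't build anything else" steps, which is the part that tends to be awkward in declarative build files.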

                                                                                                    1. 15

                                                                                                      The fact that they exist at all. The build spec should be part of the language, so you get a real programming language and anyone with a compiler can build any library.

                                                                                                      I disagree. Making the build system part of the language takes away too much flexibility. Consider the build systems in Xcode, plain Makefiles, CMake, MSVC++, etc. Which one is the correct one to standardize on? None of them, because they’re all targeting different use cases.

                                                                                                      Keeping the build system separate also decouples it from the language, and allows projects using multiple languages to be built with a single build system. It also allows the build system to be swapped out for a better one.

                                                                                                      Codebases easily fit in RAM …

                                                                                                      Yours might, but many don’t and even if most do now, there’s a very good chance they didn’t when the projects started years and years ago.

                                                                                                      Making me write a build spec in something that isn’t a normal good programming language.

                                                                                                      It depends on what you mean by “normal good programming language”. SCons uses Python, and there’s nothing stopping you from using it. I personally don’t mind the syntax of Makefiles, but it really boils down to personal preference.

                                                                                                      1. 2

                                                                                                        A minor comment: the codebase doesn’t need to fit into RAM for you to hash it. You only need to store the current state of the hash function and can handle files X bytes at a time.
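
                                                                                                        As a quick sketch of what that looks like in practice (Python's hashlib here, with SHA-256 and the 64 KiB chunk size chosen arbitrarily for the example): the hash object keeps only its internal state while the data is fed in piece by piece.

```python
import hashlib
import io


def hash_stream(stream, chunk_size=64 * 1024):
    """Hash a binary stream chunk by chunk; only one chunk is in memory at a time."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


# An in-memory stream stands in for a large file opened with open(path, "rb").
data = b"x" * 1_000_000
chunked = hash_stream(io.BytesIO(data), chunk_size=4096)
whole = hashlib.sha256(data).hexdigest()

print(chunked == whole)
```

                                                                                                        The chunked digest matches the one computed over the whole buffer, so memory use is bounded by the chunk size rather than by the size of the codebase.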

                                                                                                      2. 14

                                                                                                        When I looked at this thread, I promised myself “don’t talk about Nix” but here I am, talking about Nix.

                                                                                                        Nix puts no effort into incremental builds. In fact, it doesn’t support them at all! Nix uses the hashing mechanism you described and a not terrible language to describe build steps.

                                                                                                        1. 11

                                                                                                          The build spec should be part of the language, so you get a real programming language and anyone with a compiler can build any library.

                                                                                                          I’m not sure if I would agree with this. Wouldn’t it just make compilers more complex, bigger and error prone (“anti-unix”, if one may)? I mean, in some cases I do appreciate it, like with go’s model of go build, go get, go fmt, … but I wouldn’t mind if I had to use a build system either. My main issue is the apparent nonstandard-ness between for example go’s build system and rust’s via cargo (it might be similar, I haven’t really ever used rust). I would want to be able to expect similar, if not the same structure, for the same commands, but this isn’t necessarily given if every compiler reimplements the same stuff all over again.

                                                                                                          Who knows, maybe you’re right and the actual goal should be to create a common compiler system that interfaces to particular language definitions (isn’t LLVM something like this?), so that one can type compile prog.go, compile prog.c and compile prog.rs and know to expect the same structure. Would certainly make it easier to create new languages…

                                                                                                          1. 2

                                                                                                            I can’t say what the parent meant, but my thought is that a blessed way to lay things out and build should ship with the primary tooling for the language, but should be implemented and designed with extensibility/reusability in mind, so that you can build new tools on top of it.

                                                                                                            The idea that compilation shouldn’t be a special snowflake process for each language is also good. It’s a big problem space, and there may well not be one solution that works for every language (compare javascript to just about anything else out there), but the amount of duplication is staggering.

                                                                                                            1. 1

                                                                                                              Considering how big compilers/stdlibs are already, adding a build system on top would not make that much of a difference.

                                                                                                              The big win is that you can download any piece of software and build it, or download a library and just add it to your codebase. Compare with C/C++, where adding a library is often more difficult than writing the code yourself, because you have to figure out their (often insane) build system and integrate it with your own, or figure it out, then ditch it and replace it with yours.

                                                                                                            2. 8

                                                                                                              +1 to all of these, but especially the point about the annoyance of having to learn and use another, usually ad-hoc programming language, to define the build system. That’s the thing I dislike the most about things like CMake: anything even mildly complex ends up becoming a disaster of having to deal with the messy, poorly-documented CMake language.

                                                                                                              1. 3

                                                                                                                Incremental build support goes hand in hand with things like caching type information, extremely useful for IDE support.

                                                                                                                I still think we can get way better at speeding up compilation times (even if there are always edge cases), but incremental builds are a decent target for making compilation a bit more durable, in my opinion.

                                                                                                                Function hashing is also just part of the story, since you have things like inlining in C and languages like Python allow for order-dependent behavior that goes beyond code equality. Though I really think we can do way better on this point.

                                                                                                                A bit ironically, a sort of unified incremental build protocol would let compilers avoid incremental builds and allow for build systems to handle it instead.

                                                                                                                1. 2

                                                                                                                  I have been compiling Chromium a lot lately. That’s 77000 mostly C++ (and a few C) files. I can’t imagine that going through all those files and hashing them would be fast. Recompiling everything any time anything changes would probably also be way too slow, even if Clang were fast and didn’t compile three files per second on average.

                                                                                                                  1. 4

                                                                                                                    Hashing file contents should be disk-io-bound; a couple of seconds, at most.

                                                                                                                    1. 3

                                                                                                                      You could always do a hybrid approach: do the hash check only for files that have a more-recent modified timestamp.
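                                                                                                                      A minimal Python sketch of that hybrid check, under some assumptions: the names `file_digest` and `changed_files` and the in-memory `cache` dict are made up for illustration, and the code re-hashes whenever the timestamp differs at all (not only when it is newer), which is slightly stricter than what the comment describes.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(paths, cache):
    """Return the paths whose contents changed since the last call.

    `cache` (hypothetical) maps path -> (mtime_ns, digest) and is
    updated in place so the next call can reuse it.
    """
    changed = []
    for path in paths:
        mtime = os.stat(path).st_mtime_ns
        entry = cache.get(path)
        if entry is not None and entry[0] == mtime:
            continue  # timestamp untouched: skip the expensive hash
        digest = file_digest(path)
        if entry is None or entry[1] != digest:
            changed.append(path)
        cache[path] = (mtime, digest)
    return changed
```

                                                                                                                      With this shape, a file that was only touched gets hashed once and is then reported as unchanged, while files with an untouched timestamp cost a single stat call.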

                                                                                                                    2. 1

                                                                                                                      Do you use xmake or something else? It definitely has a lot of these if cascades.

                                                                                                                      1. 1

                                                                                                                        It’s a plain Lua script that does host detection and converts lines like bin( "asdf", { "obj1", "obj2", ... }, { "lib1", "lib2", ... } ) into make rules.
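                                                                                                                        For readers who want a picture of the idea, here is a rough Python analogue, not the Lua script itself. The function name `bin_rule`, the `build/` object directory and the exact make rule emitted are all assumptions; the original script's output is not shown in the thread.

```python
def bin_rule(name, objs, libs, objdir="build"):
    """Return a make rule that links `name` from objects and libraries.

    The layout (objdir, $(CXX), $(LDFLAGS)) is invented for illustration.
    """
    obj_files = " ".join(f"{objdir}/{o}.o" for o in objs)
    lib_flags = " ".join(f"-l{lib}" for lib in libs)
    lines = [
        # Target line: the binary depends on its object files.
        f"{name}: {obj_files}",
        # Recipe line: make requires a leading tab here.
        f"\t$(CXX) -o $@ $^ $(LDFLAGS) {lib_flags}".rstrip(),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(bin_rule("asdf", ["obj1", "obj2"], ["lib1", "lib2"]), end="")
```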

                                                                                                                      2. 1

                                                                                                                        Codebases easily fit in RAM and we have hash functions that can saturate memory bandwidth, just hash everything and use that to figure out what needs rebuilding. Hash all the headers and source files, all the command line arguments, compiler binaries, everything. It takes less than 1 second.

                                                                                                                        Unless your build system is a daemon, it’d have to traverse the entire tree and hash every relevant file on every build. Coming back to a non-trivial codebase after the kernel stopped caching files in your codebase will waste a lot of file reads, which are typically slow on an HDD. Assuming everything is on an SSD is questionable.
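                                                                                                                        To make the quoted "hash everything" proposal concrete, here is a small Python sketch. The function name `build_fingerprint` and its parameters are hypothetical, and BLAKE2b is just one readily available hash; nothing here is tuned for speed. It reads every input on each call, which is exactly the cost at issue when the file cache is cold.

```python
import hashlib


def build_fingerprint(input_paths, argv, compiler_path):
    """Hash sources/headers, command-line arguments and the compiler
    binary into a single hex digest (illustrative sketch only)."""
    h = hashlib.blake2b()

    def feed(label, data):
        # Length-prefix each field so adjacent fields cannot blur together.
        h.update(label.encode() + b"\0")
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)

    for arg in argv:
        feed("arg", arg.encode())
    # Sort the inputs so the digest does not depend on traversal order.
    for path in sorted(input_paths) + [compiler_path]:
        with open(path, "rb") as f:
            feed(path, f.read())
    return h.hexdigest()
```

                                                                                                                        If the fingerprint matches the one stored from the previous build, nothing needs rebuilding; a per-target variant would hash only the inputs that target depends on.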