Threads for tokenrove

    1. 4

      I hate the typesetting in this book, and I was initially disappointed by its scope, but it’s a great book, most importantly because it’s something I can hand to someone whose eyes would glaze over with any of the usual literature on formal methods, which is a huge win in convincing a team to adopt them.

    2. 3

      Thanks for writing about this; the articles linked are very interesting. I didn’t realize this had a name, but it’s a technique I use frequently with property-based testing. Also, I wouldn’t say oracles aren’t used in PBT: one of my most common strategies for PBT is to compare results against an oracle (e.g. standard library implementation with the same interface as the SUT, or a dumb, naive, easy-to-verify implementation). I think part of the problem is that 90% of the accessible information about PBT is using the same handful of cliché property examples.
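
      A minimal sketch of the oracle strategy described above, assuming a hypothetical insertion_sort as the SUT and Python's built-in sorted as the oracle. It uses only the standard library's random module rather than a real PBT framework, so there is no shrinking; all names here are illustrative.

```python
import random


def insertion_sort(xs):
    """Hypothetical system under test: a hand-written sort."""
    out = []
    for x in xs:
        i = len(out)
        # Walk left past every element strictly greater than x.
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def check_against_oracle(trials=200, seed=0):
    """Generate random inputs and compare the SUT with the oracle."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        expected = sorted(xs)        # oracle: trusted standard-library sort
        actual = insertion_sort(xs)  # SUT
        if actual != expected:
            return xs                # counterexample found
    return None                      # no disagreement in any trial
```

      The single property is "the SUT agrees with the oracle on every generated input", which needs no cleverness about the problem domain.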

    3. 4

      This book also had a profound effect on me early on; even as I continued to recommend it to people, I often wondered if that influence was good or bad. Either way, it’s incredibly compelling. For anyone looking for more, Showstopper by G. Pascal Zachary, the book about Windows NT’s development, is not Great like Soul of a New Machine, but has many similarities and gave me some of the same feeling (plus, perhaps, more understanding of how NT turned out as it did).

    4. 2

      I love coding and have since I was very young, but I think loving it can be an impediment to doing it professionally. I often feel like I was suckered by “do what you love” talk, since being a professional programmer has little to do with loving programming, and can often kill one’s love of it. I’m not convinced it’s a good occupation but it seems to be the one I’m stuck with. I often envy people who got to choose a career in adulthood.

      The aspect of people being worse/better I think is more just that I have written a lot of code since I love doing it, and doing a lot of something will tend to make you better, as well as caring about how you do it. (This aspect can also make you worse in some professional settings, too; caring too much about craftsmanship or artistry can lead a person to ignore what’s important for the business.) One of the important revelations in my career was meeting people who didn’t grow up programming, weren’t passionate about it, and were still extremely competent at their job.

    5. 4

      I have been working on a blog post about this topic. The compiler in Warren’s “Logic programming and compiler writing” is about 200 lines IIRC. I think it could be interesting to do more in this vein.

      1. 4

        EDIT: Oops, I think you misunderstood the challenge like a lot of other people in this thread. It’s not about a compiler that’s 100 lines. It’s about reading a 100 line function that shows the structure of an 8,000 line (or 80K line, or 800K line) compiler.

        If you click through to the post I think that’s clear, although I can understand why the title was confusing.

        You can barely write a tokenizer for Python, JavaScript, or C in 100 lines. The “production” tokenizers are 500-2000 lines by themselves.

        100-line compilers are interesting from a pedagogical standpoint, but they don’t pass the test of “Somebody who doesn’t care about compilers built something useful with this.”


        OK awesome, I’m looking forward to it! I wanted to do a blog post about it too, but I likely won’t have time before my trip this summer [1]. To do a good job I would want to look at more existing compilers. I want to hear someone else’s take and see more examples.

        Part of the theme of my blog is “theory vs practice”, and in theory everyone thinks of their compiler as a series of transformations, and that’s how you draw it on the blackboard.

        But in practice, you rarely see the source code reflect this!

        The structure that this compiler had was almost unbelievable until I refactored it. Multiple inheritance was used instead of if statements!!! There were objects that you had to call methods on in a certain order, with hidden state, rather than the state being explicit on the stack.

        [1] http://www.oilshell.org/blog/2018/03/25.html
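
        A rough sketch of the "series of transformations" structure described above, for a toy arithmetic language. This is not the compiler discussed in this thread; tokenize, parse, generate, and compile_source are hypothetical names, and the point is only that the top-level function names each stage and passes its result on explicitly.

```python
import re


def tokenize(source):
    """Stage 1: source text -> tokens (integers and + * ( ))."""
    # Sketch only: characters outside this tiny grammar are silently skipped.
    tokens = re.findall(r"\d+|[+*()]", source)
    return [int(t) if t.isdigit() else t for t in tokens]


def parse(tokens):
    """Stage 2: tokens -> nested-tuple AST, with * binding tighter than +."""
    pos = 0

    def atom():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = expr()
            pos += 1  # skip the closing ")"
            return node
        return ("num", tok)

    def term():
        nonlocal pos
        node = atom()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = ("mul", node, atom())
        return node

    def expr():
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = ("add", node, term())
        return node

    return expr()


def generate(ast):
    """Stage 3: AST -> flat list of stack-machine instructions."""
    if ast[0] == "num":
        return [("push", ast[1])]
    op, left, right = ast
    return generate(left) + generate(right) + [(op,)]


def compile_source(source):
    """Top-level driver: one explicit transformation per line."""
    tokens = tokenize(source)
    ast = parse(tokens)
    code = generate(ast)
    return code
```

        In a real compiler compile_source would be much longer, but it would still read as a list of stages.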

        1. 2

          ? I am confused. Are you talking about a 100 line top level function of a compiler ? So you are looking for a sufficiently compact yet expressive piece of compiler poetry.

        2. 2

          100-line compilers are interesting from a pedagogical standpoint, but they don’t pass the test of “Somebody who doesn’t care about compilers built something useful with this.”

          Fennel’s compiler is exactly 100 lines, and I’d argue it’s genuinely useful: https://github.com/bakpakin/Fennel/blob/master/fennel.lua#L600

          It doesn’t do as much as most compilers, but that’s kind of the point. It does exactly one thing (bring a new more flexible syntax to an existing runtime without any new semantics) but it does it very well.

          1. 1

            A lot of people are interpreting this question as 100 lines TOTAL, which is what I was referring to. Fennel appears to be more than that.

            But this structure is not what I’m looking for. I’m looking for the stages of the compiler all in one function – see my example. This compile1() function looks like a single stage that takes the AST as input.

            1. 1

              This compile1() function looks like a single stage that takes the AST as input.

              In a lisp, the reader is exposed as its own independent first-class feature and isn’t considered part of the compiler. Many lisps have another representation between what the reader returns and the fully-annotated AST used by the compiler, but Fennel doesn’t.

    6. 8

      If you’re interested in that era, some other fun papers (that aren’t as dense / theory heavy as Girard’s original one introducing linear logic) are:

      • “The Linear Abstract Machine” by Yves Lafont (1988)
      • “Lilac, a functional language based on linear logic” by Ian Mackie (1991)
      • “Computational Interpretation of Linear Logic” by Samson Abramsky (date is a little complicated. The journal pub everyone cites is from 1993, but he was presenting the material in 1990, as publications as early as that or 1991 were referencing unpublished drafts of his, and he did conference versions in 1991/1992)

      The most famous, of course, is Wadler’s “Linear Types Can Change The World” (1990) but IMO it’s pretty overrated (all the rest I’ve mentioned, while less known, are more interesting…)

    7. 2

      I find this applies everywhere, not just Erlang: always prefer aggressive polling to sleeping in tests. It’s especially common to see this in code that waits for ports to be open/closed, or affects the filesystem. To the extent that it should be a common lint to detect and warn about any sleep in tests, since it’s so often the first remedy people try. If your test wastes a bunch of CPU polling in a loop, who cares? It’s much better than slowing down the test suite by an arbitrary amount.
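
      A minimal sketch of a polling helper for tests, in Python rather than Erlang. wait_until and its parameters are hypothetical names, not part of any test framework; the interval can be set to 0 for a pure busy loop.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.01):
    """Poll predicate() until it returns true or the timeout elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)  # brief pause between polls, not a fixed wait
```

      A test then waits only as long as the condition actually takes, for example wait_until(lambda: os.path.exists(path)), instead of sleeping for a guessed duration.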

    8. 1

      Just the other day I was mentioning Inozemtseva and Holmes, “Coverage is Not Strongly Correlated with Test Suite Effectiveness”, whose methodology is perhaps not perfect but I think still makes a great point.

    9. 6

      See gdb’s infcall.c for the gory details. (eval.c has the C repl stuff that leads to this being invoked)

    10. 1

      I honestly haven’t done C/C++ programming in a while, but I also found autotools / the traditional configure/make to be a lot of code generation code smell.

      Isn’t this why we have newer tools like cmake? I’ve noticed a lot of projects I use today use cmake or some other non-terrible build tool.

      I can see legacy projects keeping their makefiles and autotools, but if you’re green fielding, you should look at all the tools and try not to use ones that are decrepit and old or too new and shiny (unless you’re willing to invest a lot into helping the new and shiny tool be better and contribute to it as part of your work). You want something relatively new with a lot of use and support.

      I’ve found personally, don’t use Maven .. don’t even use SBT, use Gradle for JVM projects. Use cmake instead of autotools. Most modern languages have their build system built in (like cargo for rust or mix for Elixir), but if you’re doing node or electron, you probably want to use yarn instead of npm, and pip is pretty much the standard on Python instead of setuptools. Also don’t use virtualenvs for Python/Ruby; just use Docker containers and let them build out the requirements.txt or the gem bundle.

      I should really do a blog post on build systems …

      1. 4

        I don’t think cmake is a good example here; it generates a lot of Makefile and associated stuff, and the language itself is kind of awful. It mostly benefits from not having to deal with the assumptions autotools was built on, like supporting SysVr3 on m88k or Pyramid.

    11. 4

      Does anyone know why Ada is seeing a bit of a resurgence (at least, among the HN/Lobsters crowd)? I’m quite surprised by it, so I’m wondering if there are any interesting lessons that can be taken from it in terms of what causes languages to become popular.

      Also, what terms should I search for to find out more about Ada’s type system? It seems quite interesting - I’d love to learn more about what tradeoffs it’s making.

      1. 12

        Personally, after shunning Ada when I was younger because it felt cumbersome and ugly, I have seen enough failures where I’ve thought, “gee, those decisions Ada made more sense than I thought, for large projects”. I think some people are experiencing that; at the same time there’s this new wave of systems languages (often with stated goals like safety or being explicit about behavior) which is an opportunity for reflection on systems languages of the past; and SPARK is impressive and is part of the wave of new interest in more aggressive static verification.

        An earlier article posted on lobste.rs had some nice discussion of some interesting parts of Ada’s type system: http://www.electronicdesign.com/embedded-revolution/assessing-ada-language-audio-applications

        Also, the Ada concurrency model (tasks) is really interesting and feels, in retrospect, ahead of its time.

        1. 2

          I’m with you that it’s the new wave of systems languages that helped. The thing they were helping were people like me and pjmpl on HN that were dropping its name [among other old ones] on every one of those threads among others. There have been two, main memes demanding such a response: (a) the idea that Rust is the first, safe, systems language; (b) the idea that low-level, esp OS, software can’t be written in languages with a GC. There’s quite a few counters to the GC part but Burroughs MCP in ALGOL and Ada are main counters to first. To avoid dismissals, I was making sure to drop references such as Barnes Safe and Secure Ada book in every comment hoping people following along would try stuff or at least post other references.

          Many people were contributing tidbits about the obscure systems languages on threads on a similar topic that had momentum. The Ada threads might be ripple effects of that.

          1. 3

            Now that I think about it, your posts are probably why I automatically associate Ada with Safe Computing these days.

      2. 7

        I think it’s part of the general trend of interest in formal methods and correctness (guarantees or evaluation of). We’ve also seen a lot on TLA+ recently, for example.

        1. 10

          I think lobsters, at least, is really swingy. A couple people interested in something can really overrepresent it here. For example, I either found or wrote the majority of the TLA+ articles posted here.

          And things propagate. I researched Eiffel and Design by Contract because of @nickpsecurity’s comments, which led to me finding and writing a bunch of stuff on contract programming, which might end up interesting other people…

          One focused person can do a lot.

    12. 3

      The interesting thing was most of my searches kept giving me pre-2002 results. Two others that might have been more recent had broken links. I found one more for next week but it’s an algorithm rather than methodology. Makes me wonder if there’s actually little to no formal verification going on for Erlang programs. Given its use case and popularity, that would be pretty shocking. I’m going to do a more thorough search in the next few days.

      1. 3

        Disclaimer: I don’t know anything about this at all. My (very rudimentary) understanding of the situation was that more in-depth application of formal methods to Erlang was waiting on the development of session types, which have been yielding fruit for a few years, but perhaps Erlang is a slightly less popular target for this analysis now. I haven’t read the thesis you’ve linked to see how they treat abstract interpretation of message passing; session types might be irrelevant for this.

        I would love for someone actually familiar with the development of these things to comment and set the record straight. (In fact, perhaps I’m just commenting in this case so I can draw out someone better informed having a “someone is wrong on the Internet” reaction.)

    13. 5

      the last two chapters - on automatically proving programs correct, as well as deriving correct programs from definitions - were puzzling. Felt too academic and somewhat out of place in a book teaching a programming language.

      This is interesting, because I had the same experience with ML for the Working Programmer; the last chapter (a theorem prover) seems to come out of nowhere and seems to have little to do with the “working programmer” (although I have more appreciation for it now than when I first encountered this book and thought it a bizarre choice).

      1. 1

        For me it is not overly surprising that a book about a language which was built to make a theorem prover, written by an author who is doing theorem proving, would include a chapter about… theorem proving. It is similar to how SICP uses Scheme to implement Scheme.

        1. 2

          For sure. But the idea that “theorem proving” was for the “working programmer” seemed bizarre to me in the late 90s, and I think only added to ML’s reputation as an academic language not for serious work. Thankfully the latter idea is well refuted now, and the former is starting to have some currency.

    14. 3

      A common corollary to entity services is the idea of “stateless business process services.” I think this meme dates back to last century with the original introduction of Java EE, with entity beans and session beans. It came back with SOA and again with microservices.

      This is a key point, and one I don’t see made enough around the microservices hype. If people saw the historical connection here, maybe they’d stop thinking these approaches are so hip and novel. Systems have crumbled under these designs before; study history or repeat it. (And, of course, sometimes these designs had successes, too; what were the circumstances?)

      1. 5

        Systems have crumbled under these designs before; study history or repeat it.

        The amount of work expended to avoid thinking critically about your particular context in the face of whatever the current fad is, is enormous: “A week in the lab can save 15 minutes in the library.”

        1. 2

          My only regret is that I have but one upvote to give for my country.

      2. 2

        This is what happens when what’s hip and new comes from people who dismiss the projects and teams that have experience out of hand because they’re seen as dinosaurs / enterprise wankery / other euphemisms for “old therefore slow and irrelevant”.

    15. 1

      Samek’s book on this (statecharts) is also good. I’m a little sad that there isn’t much written about this.

      1. 6

        This is a pretty standard thing to do once you’ve protected the master branch on a project, which is also a pretty standard thing to do.

        1. 2

          You’re arguing against personal preference here.

        2. 1

          I take it you also don’t believe in continuous integration, et cetera?

          1. 3

            Ease up, man. This form of “my method is better – period” does nothing but alienate the argument and vilify the author (you). Imagine how this makes people feel.

            I doubt you’re here to stroke your ego, so if you want to get your point across effectively, consider using a different approach than this.

            ON TOPIC: There is no right answer here, as whatever works best for teams/groups/individuals will vary wildly.

            1. 6

              If you’re right, it’s enough to be right. Being rude to other people doesn’t make you more right.

              Please take a look down your recent threads. The downvotes you’re getting are not incorrect. You’ve sometimes had good technical points to make, but your attitude and communication style have overwhelmed your ability to share your knowledge and be part of this community. If you care about participating effectively, you need to change the way you communicate. And if you don’t care, why bother posting?

              1. 7

                Stop insulting people and assuming their opinions come from ignorance, incompetence, and malice. If you can’t treat others with common courtesy, let alone kindness, you should leave. If you can’t be at all polite or be quiet, I’ll ban you. This is normal for Lobsters and the vast majority of online and offline communities.

                1. 6

                  pushcx has my full confidence. I’ve been involved in running various online communities for something like fifteen years now, and he’s doing fine. If anything, he’s doing a better job of parsing out people’s concerns here than I am, lately.

                  I urge you to take the advice to examine your own recent words. I don’t think I have the ability to help, beyond giving that advice; this is really something you have to do yourself.

  • 3

    Meh. The advantages some programming languages bring to the table are sometimes very significant. They don’t just make the problem “slightly easier”. It’s likely that Go is popular because it makes concurrent networking programs significantly easier to write, compared to most mainstream languages; similarly, using OCaml (or something similar) to write a compiler or symbolic program is a huge improvement over doing it in C.

    1. 3

      It’s not about the language per se and more about how many primitives the language integrates, and how well chosen those primitives are.

      C has no automatic memory management nor concurrency primitives, and Go has both.

      Among languages that use async/await-style concurrency, their concurrency expressiveness is largely similar.

      All the “P languages” form a family based on the set of primitives they’re built on, in which they’re very close to each other, and so programs written in them tend to be structured broadly similarly, despite significant differences in some of the design choices of the languages. Sometimes those differences have practical impact too, but rarely from a zoomed-out perspective on the code structure.

      So the answer to “which language should I learn?” is fairly irrelevant if it’s to be taken as “which P language should I learn?” but is rather more meaningful if it implies “should I use Go, Haskell or Prolog?”. (Although even then it’s just one topic among the many you need an understanding of, as the article says.)

    2. 1

      On the other hand, none of these languages have improved the ways people use their databases, write their queries, set their indices, deploy their servers, configure their networks, ….

      Programming languages bring a lot to the table, but they are not the core of dealing with computers anymore. It’s a huge chunk, but not as central as people make them to be.

      1. 2

        Though not Go or OCaml, all the tasks you describe benefit from declarative languages, like SQL and Prolog. (Or Greenspunned versions thereof)

        1. 2

          Sure, if you cast the net wide enough, you could also call Elasticsearch query syntax (which is basically the AST of a simple Lucene search program) a programming language. This isn’t practical though and not what people mean by “I’ll learn another programming language”.

          SQL is a perfect example of that: it is rather worthless to know without at least having a hunch on how your specific database executes it. Plus, each product comes with extensions.

          1. 2

            it is rather worthless to know without at least having a hunch on how your specific database executes it

            I feel this is deeply true of any programming language — it is mostly useless divorced from an implementation. I feel that knowing how to program in C is inseparable from knowing compiler extensions and intrinsics. And with the exception of (seemingly increasingly rare) languages defined by standards, one may not have any choice.

            One difference between logic languages and imperative languages, here, is that most programmers have already deeply internalized a mental model of how imperative languages are executed (which still often fails to match the actual implementation… note the way one still finds people making performance assumptions that held perfectly well on the ZX Spectrum and not in the modern era).

            Maybe we actually agree on something here: I think something the OP is successfully pointing out is that most people’s definition of “I’ll learn another programming language” is so shallow that it yields little compared to the effort they could put into learning other things. But, for example, I think learning something like Prolog (well enough to write production software: i.e., understanding at least one implementation well enough to reason accurately about performance and so on) is an exercise that yields knowledge transferable to plenty of other areas of programming; I suspect one can make this argument for any language and implementation that differs significantly from what one already knows.

        2. 2

          Like SQL and Prolog = Datalog. Seems like a good example where a new language can help with database queries.

          https://en.wikipedia.org/wiki/Datalog

  • -1

    People use cat in the weirdest ways…

    1. 8

      I’m aware of useless uses of cat, but in this case I wanted to use it to ensure that wc -c wasn’t relying on the filesystem’s count of the number of bytes in the file - sending it through a pipe ensures that.

      1. 3

        wc -c < foo

        Also, POSIX specifies that wc shall read the file.

        1. 5

          If you check out the GNU coreutils wc source, if it’s only counting bytes, it will try to lseek in the fd to find the length. wc -c < foo is not the same as cat foo | wc -c in this case, because the seek will succeed in the first case and not in the second.

        2. 8

          I still prefer cat |. I actually prefer cat | in almost every case, because the syntactic flow matches the semantic flow precisely. With an infile, or even having the first command open the file, there’s this weird hiccup where the first syntactic element of the pipeline isn’t the initial source of the data, but the first transformation thereof.

          The main argument against it seems to be “but you’re wasting a process”, which, uh, with all due respect, I can’t see ever being a problem on a system you’d ever run a full multiprocessing Unix system on. If your system were constrained enough that that was an issue, a multiprocessing Unix would be too much overhead in and of itself, extra cats notwithstanding.

        3. 2

          < foo

          This does not guarantee that bytes are actually being read(); redirecting a file to stdin like that lets the process call fstat() on it if it wants. A naughty implementation of wc -c could call fstat(), check st_mode to verify that stdin is a regular file rather than a pipe or something, and then return the filesystem’s reported size from the st_size field without actually reading any bytes from stdin. Having some other process like cat or dd or something read the bytes onto a pipe does prevent wc -c from being able to see the original file & hence prevents it from being able to cheat and return st_size.

          Also, POSIX specifies that wc shall read the file.

          I guess this does. :)
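
          A minimal sketch of the fstat() shortcut discussed above, written in Python rather than taken from any real wc. count_bytes is a hypothetical name, and the sketch ignores the current file offset, which a real implementation would also need to account for.

```python
import os
import stat


def count_bytes(fd, chunk_size=65536):
    """Return (byte_count, method) for an open file descriptor.

    Trusts st_size for a regular file and falls back to read()
    for pipes and everything else.
    """
    st = os.fstat(fd)
    if stat.S_ISREG(st.st_mode):
        # Regular file: report the filesystem's size without reading.
        return st.st_size, "fstat"
    total = 0
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:  # empty read means end of input
            return total, "read"
        total += len(chunk)
```

          Given a redirected regular file this takes the "fstat" path; given the read end of a pipe, as in cat foo | wc -c, it has no choice but to read every byte.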

        4. 0

          … and this is, indeed, how I would have done it.

      2. 1

        Interesting. Thank you for the great response.

  • 6

    What’s wrong with just script(1) to record:

    script script.log

    which makes a script of everything typed and displayed. Which you can then play back via:

    while IFS= read -r line; do printf '%s\n' "$line"; sleep 1; done < script.log

    Tweak the parameter to sleep(1) for replay speed :)

    1. 4

      Indeed, script is great, and ubiquitous. script usually (depending on the version) also has options to record and playback everything, including timing. (scriptreplay under Linux, script -p under FreeBSD.)

      1. 1

        Didn’t know about -r option to script(1), at least from BSD, to record. And -p option, to play back.

        Very cool!

        Thanks :)