1.  

    Fantastic post, thanks for sharing. I’ve been meaning to write a Python bytecode evaluator or a compiler that emits Python bytecode for a while. Looks like a good intro.

    1.  

      I’ve given a couple talks at Python conferences going over the basics of Python bytecode and how it’s executed, and these two are the general resources I always recommend for deeper dives/examples:

      1.  

        or a compiler that emits Python bytecode

        Is that just a learning exercise, or is there some practical use for it that you have in mind?

        1.  

          Just for fun/learning.

      1. 2

        More details in their proposal here.

        1. 3

          Ah but do they have their own implementation of JavaScript?

          https://github.com/nginx/njs

          1. 4

            It’s interesting how the declarative style of OCaml fits in well for the purpose of describing the behavior of hardware, since hardware description languages like Verilog are also designed to be declarative.

            1. 3

              There are a number of functional frontends for Verilog today already: Bluespec (Haskell), Hardcaml (OCaml), Chisel (Scala), etc.

              1. 3

                Funny you should say that… https://github.com/janestreet/hardcaml

              1. 1

                I shared a first release of this on lobsters a few weeks ago. The big new thing since then is support for loading multiple files and joining them in SQL. Another user request was to add support for OpenOffice Sheets which I have done. And it now supports .tsv files too.

                1. 6

                  As an even more extreme example of this, look at the “urls” that are allowed in httpie compared to curl. I think these shorthands make perfect sense in interactive developer tools and are a terrible idea in libraries. (I’m disappointed curl can’t be more lax about urls but also I get Daniel’s position.)

                  https://httpie.io/docs/cli/url-shortcuts-for-localhost

                  1. 15

                    Sharing this from elsewhere: If you’re interested to see a comparison of parallel programming in a number of functional languages check this repo out. It includes multicore OCaml, parallel MLton (but not Poly/ML, which has been around and parallel longer), Haskell, Futhark, F#, Scala, and Rust. Credit to Sam Westrick for turning me on to this.

                    1. 2

                      I like that there is a comparison repo. I didn’t look at everything, but the Rust code seems fairly straightforward, with no obscure tricks or zaniness.

                    1. 2

                      Slightly off topic, I noticed recently that HeidiSQL (a semi-prominent SQL GUI) is written in Delphi (but ironically their code base says it’s too old to be compiled by FreePascal).

                      Are there any other big open source apps written in Delphi or Pascal? My impression is that Delphi/Pascal are most common in proprietary apps so I was just curious if there were other common apps like HeidiSQL you may know of that are open source.

                      1. 2

                        There is PyScripter which is a Python IDE.

                        1. 1

                          Neat! Thanks! Interesting that it also does not use FreePascal and instead uses Delphi Community to build.

                      1. 17

                        It’s kind of fun trolling around on Google’s certificate transparency search engine to see what software random companies run internally. Put in any company and you’ll probably find their jira and whatnot. Weird form of customer/prospect research.

                        https://transparencyreport.google.com/https/certificates?hl=en

                        1. 6

                          I saw crt.sh first, and the interface is more concise.

                        1. 1

                          Moving my original post-description-comment to an actual comment:

                          The combo of -ldflags="-w -s" and upx is pretty swanky. I just got one of my binaries down from 36MB to 6MB using both. And you don’t lose useful stacktraces.

                          1. 1

                            upx technically does make executables smaller, but it’s not as good as techniques that focus on removing junk/overhead from the executable in the first place. It’s going to have to uncompress before running.

                            Usually packaged programs are compressed externally (tarball, etc.) and then upx doesn’t save transfer size, but adds startup overhead and taunts anti-virus software.

                            1. 1

                              Yeah good point. After zip compression there’s only a 2MB difference in the resulting zip whether I use upx on the Go binary or not.

                            1. 3

                              36 MiB to 6 MiB is good, but I feel 6 MiB is still too big. I have the impression that the Go toolchain is bad at removing unused code at link time, which has recently also become known as tree shaking.

                              1. 3

                                This binary includes database drivers for Oracle, SQL Server, Postgres, MySQL, and SQLite (among many other things) so 6MB is pretty amazing. The Node.js version of this Go code was like 800MB (proprietary db drivers don’t have great dependency management practices).

                                But yeah it’s just about your perspective.

                                1. 2

                                  Oh wow, does Go have an independent database driver for Oracle? If so, that’s amazing. (That’s my guess because proprietary database drivers not having great dependency management would be exactly the same for both Node.js and Go.)

                                  1. 1

                                    Oh wow, does Go have an independent database driver for Oracle?

                                    Yeah it does, but it’s not the most mature library. Some of their development practices are a bit surprising. Like commenting out tests and making new releases. Tests are not run automatically as far as I can tell.

                                    Still, impressive work on its own.

                                    https://github.com/godror/godror

                                    (That’s my guess because proprietary database drivers not having great dependency management would be exactly the same for both Node.js and Go.)

                                    Snowflake brings in aws sdk on Node and on Go but the Node one is still an order of magnitude bigger IIRC.

                                    1. 2

                                      Huh, that’s not independent. It seems to use ODPI and ODPI is written by Oracle.

                                      1. 1
                              1. 4

                                Started reading Empire of Liberty about the first few decades of the US, super interesting. I am most interested in Reconstruction/the Gilded Age but figured I’d start back further for more context.

                                Launched the new website for my (extremely small) company: https://multiprocess.io.

                                And thinking about my next personal blog post on bootloader basics.

                                1. 3

                                  I guess there’s not a Forth tag, shame. :)

                                  1. 2

                                    Nice work switching to a lexer!

                                    1. 2

                                      Thanks! It was a good idea, I was just being lazy so I could focus on other parts. But properly lexing has helped me a bunch chasing down edge cases.

                                    1. 2

                                      Working on a series of Observable notebooks on parsing with regular expressions. I feel it’s time I put some effort into sharing what I’ve been tinkering with.

                                      1. 1

                                        Parsing with regular expressions :O

                                        1. 1

                                          I know, but it’s a bit of an exercise in finding a compact set of primitives to get from 0 (regexes) to 100 (AST), and I’m also curious about the speed.

                                      1. 1

                                        Working on various basic bootloader programs to discover that space a bit better. A text editor, maybe a clone of that snake-in-a-tweet bootloader program, etc.

                                        1. 2

                                          Just a superficial observation, but this seems a good example of a program that only deals with natural numbers (indexes, sizes, line numbers and so on). Unless there is a specific reason (which there could be), I think the pattern of representing all these natural numbers as i32 is often an antipattern. If anything, it generates a lot of casting:

                                          fp = data.len() as i32;
                                          

                                          If it was me, I would say that there is usually only one correct type for natural numbers, namely usize in Rust and size_t in C. Of course, there are cases where you subtract two numbers and the result could be negative – offsets in this case. This is an actual integer in the mathematical sense – a legit use of what programmers call signed integers. But even here (depending on what’s most correct and simplest), it may make even more sense to use a natural number representation if you know something is always positive or always negative, or even treat the two cases separately. This just depends on the invariants of the program. A sign of this is when the invalid range of values is artificially split in two by the signed representation, so that you have to check for both the lower and upper limit. I don’t see that here in its explicit form, but the many casts back to usize before using indexes, together with Rust’s implicit range checks, should be equivalent.
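
                                          To make the casting point concrete, here is a minimal Rust sketch. All function names (get_signed, get_natural, apply_offset) are hypothetical and not taken from the post; they only contrast an i32 index, a usize index, and a genuinely signed offset.

                                          ```rust
                                          // Hypothetical illustration; none of these names come from the post.

                                          /// Index kept as i32: two range checks and a cast before indexing.
                                          fn get_signed(data: &[i32], index: i32) -> Option<i32> {
                                              if index < 0 || index >= data.len() as i32 {
                                                  return None;
                                              }
                                              Some(data[index as usize])
                                          }

                                          /// Index kept as usize: only the upper bound can be violated,
                                          /// and `slice::get` performs that single check.
                                          fn get_natural(data: &[i32], index: usize) -> Option<i32> {
                                              data.get(index).copied()
                                          }

                                          /// A genuinely signed quantity: an offset applied to a natural-number base.
                                          /// `checked_add_signed` returns None if the result would leave usize's range.
                                          fn apply_offset(base: usize, offset: isize) -> Option<usize> {
                                              base.checked_add_signed(offset)
                                          }

                                          fn main() {
                                              let data = [10, 20, 30];
                                              assert_eq!(get_signed(&data, -1), None);
                                              assert_eq!(get_signed(&data, 1), Some(20));
                                              assert_eq!(get_natural(&data, 1), Some(20));
                                              assert_eq!(get_natural(&data, 3), None);
                                              assert_eq!(apply_offset(2, -1), Some(1));
                                              assert_eq!(apply_offset(0, -1), None);
                                          }
                                          ```

                                          The signed version has an invalid range on both sides of the valid one, which is the "split in two" situation described above; the usize version leaves only the upper limit to check.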

                                          1. 3

                                            Yeah I was a bit torn because this implementation only allows integers anyway and I’m unlikely to follow up on this post.

                                            BUT in a real implementation it’s going to be some byte sequence that stores all the datatypes and that won’t be usize either. So all the things that will always be usize I kept as usize but all the things that get stored as data I kept as i32 just kinda hinting that data is its own thing.

                                          1. 8

                                            This looks really well done! But I’m always compelled, in response to tutorials like these, to advocate for using parser/lexer generator tools instead of hand-writing them. In my experience writing your own parser is sort of a boiling-the-frog experience, where it seems pretty simple at first and then gets steadily hairier as you add more features and deal with bugs.

                                            Of course if the goal is to learn how parsers work, it’s great to write one from scratch. I wrote a nontrivial one in Pascal back in the day (and that’s part of why I don’t want to do it again!)

                                            Of the available types of parser generators, I find PEG ones the nicest to use. They tend to unify lexing and parsing, and the grammars are cleaner than the old yacc-type LALR grammars.

                                            1. 27

                                              This looks really well done! But I’m always compelled, in response to tutorials like these, to advocate for using parser/lexer generator tools instead of hand-writing them. In my experience writing your own parser is sort of a boiling-the-frog experience, where it seems pretty simple at first and then gets steadily hairier as you add more features and deal with bugs.

                                              That’s the opposite of my experience. Writing a parser with a parser generator is fine for prototyping, or when you don’t care about particularly good error reporting or about reusing the parser for things like LSP support, but beyond that it will hurt you. In contrast, a hand-written recursive descent parser is more effort to write at the start but is then easy to maintain and extend. I don’t think any of the production compilers that I’ve worked on has used a parser generator.

                                              1. 8

                                                Also once you know the “trick” to recursive descent (that you are using the function call stack and regular control flow statements to model the grammar) it is pretty straightforward to write a parser in that style, and it’s all code you control and understand, versus another tool to learn.

                                                1. 3

                                                  What’s the trick for left recursive grammars, like expressions with infix operators?

                                                  1. 4

                                                    Shunting yard?

                                                    1. 3

                                                      You can’t just blindly do left-recursion for obvious reasons, but infix is pretty easy to deal with with Pratt parsing (shunting yard, precedence climbing - all the same).
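
                                                      For anyone who hasn’t seen it, a minimal sketch of precedence climbing in Rust might look like the following. The Tok type and the precedence, expr, and eval functions are hypothetical names made up for this example, and it evaluates directly instead of building an AST to stay short.

                                                      ```rust
                                                      // Hypothetical sketch of precedence climbing over a tiny token type.
                                                      #[derive(Clone, Copy, Debug, PartialEq)]
                                                      enum Tok {
                                                          Num(i64),
                                                          Op(char),
                                                      }

                                                      /// Precedence of a binary operator; higher binds tighter.
                                                      fn precedence(op: char) -> Option<u8> {
                                                          match op {
                                                              '+' | '-' => Some(1),
                                                              '*' | '/' => Some(2),
                                                              _ => None,
                                                          }
                                                      }

                                                      /// Parses and evaluates an expression whose operators all have
                                                      /// precedence >= `min_prec`, advancing `pos` past what it consumed.
                                                      fn expr(toks: &[Tok], pos: &mut usize, min_prec: u8) -> Option<i64> {
                                                          // An operand comes first (only number literals in this sketch).
                                                          let mut lhs = match toks.get(*pos)? {
                                                              Tok::Num(n) => *n,
                                                              Tok::Op(_) => return None,
                                                          };
                                                          *pos += 1;

                                                          // Loop instead of left-recursing: keep folding operators into `lhs`.
                                                          while let Some(Tok::Op(op)) = toks.get(*pos).copied() {
                                                              let prec = match precedence(op) {
                                                                  Some(p) if p >= min_prec => p,
                                                                  _ => break,
                                                              };
                                                              *pos += 1;
                                                              // Passing `prec + 1` makes the operators left-associative.
                                                              let rhs = expr(toks, pos, prec + 1)?;
                                                              lhs = match op {
                                                                  '+' => lhs + rhs,
                                                                  '-' => lhs - rhs,
                                                                  '*' => lhs * rhs,
                                                                  '/' => lhs.checked_div(rhs)?,
                                                                  _ => return None,
                                                              };
                                                          }
                                                          Some(lhs)
                                                      }

                                                      /// Evaluates a whole token slice; fails if any tokens are left over.
                                                      fn eval(toks: &[Tok]) -> Option<i64> {
                                                          let mut pos = 0;
                                                          let value = expr(toks, &mut pos, 0)?;
                                                          if pos == toks.len() { Some(value) } else { None }
                                                      }

                                                      fn main() {
                                                          use Tok::{Num, Op};
                                                          // 1 + 2 * 3: the multiplication binds tighter.
                                                          assert_eq!(eval(&[Num(1), Op('+'), Num(2), Op('*'), Num(3)]), Some(7));
                                                          // 10 - 4 - 3 groups as (10 - 4) - 3.
                                                          assert_eq!(eval(&[Num(10), Op('-'), Num(4), Op('-'), Num(3)]), Some(3));
                                                      }
                                                      ```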

                                                      1. 2

                                                        The trick is the while loop. If you have something like A = A '.' ident | ident you code this as

                                                        loop {
                                                          ident()
                                                          if !eat('.') { break }
                                                        }
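
                                                        A runnable sketch of that loop, assuming a simple byte-based parser. The snippet above only names ident() and eat('.') without defining them, so the Parser struct, the bodies of eat and ident, and the path and parse_path functions here are hypothetical fill-ins for illustration.

                                                        ```rust
                                                        // Hypothetical, runnable version of the loop for: A = A '.' ident | ident
                                                        struct Parser<'a> {
                                                            src: &'a [u8],
                                                            pos: usize,
                                                        }

                                                        impl<'a> Parser<'a> {
                                                            fn new(src: &'a str) -> Self {
                                                                Parser { src: src.as_bytes(), pos: 0 }
                                                            }

                                                            /// Consumes `c` if it is the next byte and reports whether it did.
                                                            fn eat(&mut self, c: u8) -> bool {
                                                                if self.src.get(self.pos) == Some(&c) {
                                                                    self.pos += 1;
                                                                    true
                                                                } else {
                                                                    false
                                                                }
                                                            }

                                                            /// Parses one identifier made of ASCII letters, digits, and '_'.
                                                            fn ident(&mut self) -> Option<String> {
                                                                let start = self.pos;
                                                                while let Some(&b) = self.src.get(self.pos) {
                                                                    if b.is_ascii_alphanumeric() || b == b'_' {
                                                                        self.pos += 1;
                                                                    } else {
                                                                        break;
                                                                    }
                                                                }
                                                                if self.pos == start {
                                                                    return None; // no identifier at this position
                                                                }
                                                                Some(String::from_utf8_lossy(&self.src[start..self.pos]).into_owned())
                                                            }

                                                            /// A = A '.' ident | ident, with the left recursion turned into a loop.
                                                            fn path(&mut self) -> Option<Vec<String>> {
                                                                let mut parts = Vec::new();
                                                                loop {
                                                                    parts.push(self.ident()?);
                                                                    if !self.eat(b'.') {
                                                                        break;
                                                                    }
                                                                }
                                                                Some(parts)
                                                            }
                                                        }

                                                        /// Parses a full dotted path; fails on trailing or malformed input.
                                                        fn parse_path(src: &str) -> Option<Vec<String>> {
                                                            let mut p = Parser::new(src);
                                                            let parts = p.path()?;
                                                            if p.pos == p.src.len() { Some(parts) } else { None }
                                                        }

                                                        fn main() {
                                                            let parts = parse_path("a.b.c").expect("valid path");
                                                            assert_eq!(parts, vec!["a", "b", "c"]);
                                                            assert_eq!(parse_path("a."), None);
                                                        }
                                                        ```

                                                        A tree-building parser would fold each new ident into the node built so far inside the same loop, which gives the left-nested shape the left-recursive rule describes.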
                                                        
                                                    2. 1

                                                      Having used and written both parser generators and hand-written parsers, I agree: parser generators are nice for prototypes of very simple formats, but end up becoming a pain for larger formats.

                                                    3. 7

                                                      Thanks! My thought process is normally: handwritten is simple enough to write, easy to explain, and common in real world software, so why learn a new tool when the fun part is what’s after the parser? I just try to focus on the rest.

                                                      1. 4

                                                        Lua’s grammar is also pretty carefully designed to not put you into weird ambiguous situations, with only one exception I can think of. (The ambiguity of a colon-based method call in certain contexts.)

                                                      2. 6

                                                        In general that is true, but with Lua there are compelling reasons (which I won’t get into here) to handwrite a single step parser for real implementations.

                                                        That’s what the reference implementation does, despite its author being well-known for his research in PEGs (and authoring LPEG).

                                                        1. 2

                                                          Do you have suggestions for PEG parser generators that generate JS as well as Java/Kotlin?
                                                          I was searching for something that I can use in a frontend webapp as well as on a backend (which is in Java).

                                                          1. 2

                                                            No, sorry; the one I’ve used only generates C.

                                                        1. 5

                                                          Very nice! I would love to see this become a full Lua 5.4 impl, even if a slow one.

                                                          1. 8

                                                            For the most part, projects on my personal GitHub and blog are purely educational/for teaching rather than something I intend to support or grow on their own. Just minimal examples to help folks learn.

                                                            But it’s open source if anyone else wants to build on it!