1. 9

    I can’t agree with this more. While doing olin/wasmcloud stuff I have resorted to this slightly insane method of debugging:

    • debug printf the pointer of a value you want to look for
    • immediately crash the program so none of the heap gets “contaminated”
    • pray the value didn’t get overwritten somehow (in my subjective and wholly personal experience Rust does this sometimes, it’s kind of annoying)
    • dump the webassembly linear memory/heap to a binary file
    • search for that pointer using xxd(1) and less(1)
    • grab my little-endian cheat sheet
    • hopefully derive meaning from it

    WebAssembly debuggers really need to happen, but until they do we are basically stuck in the printf debugging, black magick and tarot cards stone age, but at least it’s slightly easier to make a core dump. After a few hours though you learn to read little endian integers without needing to break out graph paper.

    1. 2

      I’ve used instrumentation tools like Wasabi for dynamic testing during client assessments (yes, we have clients working in WASM already). Also, we’ve extended Manticore, our symbolic executor to support WASM, and we have some symbolic debugging-like scripts for it internally.

      In general tho, I whole-heartedly agree; we’re seeing more and more of this sort of thing, and the tooling definitely needs to catch up.

      1. 2

        To be fair, the changes needed on the DWARF/LLVM side are super recent; I believe I saw commits related to this in LLVM as recently as December. The debuggers will catch up, but it takes time for some of this stuff to percolate outwards, I haven’t done any testing myself in the last few months, but I suspect the browsers have experimental support on the way shortly, if not already available. It will take a bit longer for this stuff to make it into say, lldb, but not that much longer.

        The current situation does suck for those of us needing to build stuff with WebAssembly right now, but I keep myself sane by knowing that its just growing pains - we’ll get the necessary tooling sooner rather than later.

        1. 1

          Call me a pessimist, but existing debugging infrastructure might not be good enough for this. We might wanna start with something like AVR debuggers or other things made for Harvard architectures.

          1. 2

            Presumably there are different needs. The first need is for front end / in browser use cases. For that, foo.wasm.debug could effectively include all the debug symbols, and source map, or whatever and you use the in browser debugger. That seems fine.

            The server side is more problematic, I think, but that’s where your Harvard arch comes in. Provide a manhole cover that can be opened for the purposes of attaching a debugger to, in your execution environment, and poke and prod with step debuggers, perhaps? Still need the debug symbols and such… of course.

            1. 1

              Jtag into a wasm env, really this just needs to a handful of exported fn from the env itself. One could run wasm in wasm and implement it yourself in user space.

        2. 1

          What if your wasm env could visualize memory in real-time and set breakpoints in any fn call or memory write?

          1. 1

            Debuggers have two sides, symbol side and target side. Breaking/resuming and reading/writing memory is done in target side, and you can implement them in your WebAssembly environments. But that only lets you to do break 123 or print 456, not break function or print variable. The article is mostly about symbol side.

            1. 1

              That would solve a lot of problems, but it would also be a lot of work :)

              I’m gonna see what I can do though.

              1. 1

                Here is a veritable zoo of binary visualizers

                https://reverseengineering.stackexchange.com/questions/6003/visualizing-elf-binaries

                I think the past of least friction is run wasm3 within the browser so that you can have 100% over execution, it will be slow ~25x, but provide first class access to all facets of execution of the code under debug.

          1. 9

            I wonder how many projects requiring these trendier build systems like meson or ninja are using them as intended or to the capacity they allegedly facilitate. Meanwhile, make is unsexy but it’s everywhere.

            I sort of get cmake but even that boils down to a Makefile. Do front ends like that really reduce enough boilerplate to justify making them a build requirement? They haven’t for my own projects but I’m generally not building in the large.

            1. 6

              I sort of get cmake but even that boils down to a Makefile. Do front ends like that really reduce enough boilerplate to justify making them a build requirement? They haven’t for my own projects but I’m generally not building in the large.

              My experience with cmake is that it’s a worse wrapper than autotools/configure is, which is really saying something. I tried to get a i386 program to build on x86_64 and I had immense trouble just communicating to gcc what flags it should have, cmake actually lacked any architecture specific options that would have enabled me to do that.

              1. 9

                I’m not sure what you hit specifically, but cmake does provide those kind of options, namely CMAKE_SYSTEM_NAME, CMAKE_CROSSCOMPILING, and CMAKE_SYSTEM_PROCESSOR. I found this document really helpful in my own endeavours.

                My personal experience with CMake is that it has a bear of a learning curve, but once you’ve seem some example setups and played around with different options a bit, it starts to click. Once you are more comfortable with it, it is actually quite nice. It has its rough edges, but overall I’ve found it to work pretty smoothly in practice.

                1. 1

                  Ah, thanks! I’ll keep that in mind for next time I end up doing that

              2. 4

                CMake isn’t really an abstraction over Makefiles; in fact, there’s plenty you can probably do in Makefiles that would be cumbersome or perhaps impossible to do purely in CMake. It’s a cross platform build system that just uses Makefiles as one of it’s targets for doing the actual building step.

                Where CMake tends to get it’s use (and what it seems to be intended for) is:

                • Providing a build system for large, complex C++ (primarily) projects where lots of external dependencies exist and the project is likely to be distributed by a distribution or generally not controlled by the project itself
                • Cross platform projects that are largely maintained for platforms where Makefiles are not well supported in a way that is compatible with GNU or BSD make, or where supporting ‘traditional’ IDEs is considered a priority (i.e. Win32/MSVC).
                1. 2

                  Unfortunately, ninja is too simple for many tasks (e.g. no pattern rules) and building a wrapper is a more complex solution than Make.

                  Meson is too complex for simple tasks like LaTeX, data analysis, or plot generation. Make is great for these use cases but a few improvement are still possible:

                  • Hide output unless a command fails.
                  • Parallel by default.
                  • Do not use mtime alone, but in addition size, inode number, file mode, owner uid/gid. Apenwarr explained more.
                  • Automatic “clean” and “help” commands.
                  • Changes in Makefiles implicit trigger rebuilds where necessary.
                  • Continuous mode where it watches for file system changes.
                  • Proper solution for multi-file output.
                  1. 2

                    CMake can generate Makefiles, but I would hardly say its value simply boils down to what Make can provide. All of the value provided by CMake is what goes in to generating those files. It also benefits from being able to generate other build system files, i.e. using Ninja, or Visual Studio makefiles. A lot of complexity comes in when you need to adapt to locating dependencies in different places, detecting/auto-configuring compiler flags, generating code or other files based on configuration/target, conditionally linking things in, etc., and doing those things in plain Make is a huge pain in the ass, or not even possible in practice.

                    I wouldn’t say CMake is perfect, not by a long shot, but it works pretty well in my experience.

                  1. 5

                    It’s interesting, but it tells us nothing other than “we were able to establish a presence on the aircraft’s systems.” It doesn’t say which systems, nor does it say if it was simply information disclosure or they were able to take control of vital aircraft systems.

                    There’s a huge difference between being able to, say, see what channels are being watched on the in-flight satellite TVs, and being able to throttle down the engines.

                    (Not that we shouldn’t take this sort of thing extremely seriously regardless of what they were able to do. A hole in the dike and all that…)

                    1. 1

                      I would be blown away if the flight control avionics weren’t isolated from all other systems on the aircraft, let alone connected to a network which could be compromised remotely (i.e. via the in-flight wi-fi, or some other means accessible to a passenger).

                      1. 3

                        They’re supposed to be done like in this answer:

                        https://aviation.stackexchange.com/a/14823

                        It was also one of main reasons for development of partitioning kernels like those below that airplane and defense companies were buying.

                        http://www.aviationtoday.com/2006/03/01/mils-operating-systems-safety-and-security/

                        Now risk has gone way up in some ways with the consolidation of more stuff onto less boxes while attackers are doing hardware attacks that break the model. If those go to airplanes, it might get bad. The older model had its own risks like piles of wiring that sometimes shorted out with the plane going splat.

                        1. 1

                          I would be kind of amazed if they weren’t. For starters supposedly its already happened by tapping into the in-flight entertainment center..

                          Also production on these stopped in 2004. Even if there was no vulnerability in the meridade of transceivers and receivers, I would be kind of surprised if the after market retrofitting was basically akin to plugging something to the ODB2 connector.

                      1. 7

                        Seems like a pretty language. The whole anything divided by zero is zero thing means it’s probably not great for anything math heavy as it will pave over legitimate errors.

                        1. 7

                          That seems like such a bizarre design decision - I wonder what motivated it. It’s definitely something that would’ve caught me off guard.

                          1. 3

                            Division by zero should at least be inf if not going to be an error :P

                            1. 2

                              You could easily define a division function that checks if you’re dividing by zero and emits an “exception”. (I put it in quotes because exceptions must be handled in at some point in the stack.)

                              So I guess if you want to do heavy math you’d do that?

                              1. 2

                                The official tutorial has a “gotcha” page for division by zero that explains their thinking a bit. It originally used a partial function that would throw an exception if you divided by zero. They later changed it to a total function that returns zero in the case of division by zero.

                                From a mathematical standpoint, it is very much insane. From a practical standpoint, it is very much not.

                                From a practical perspective, having division as a partial function is awful. You end up with code littered with trys attempting to deal with the possibility of division by zero. Even if you had asserted that your denominator was not zero, you’d still need to protect against divide by zero because, at this time, the compiler can’t detect that value dependent typing.

                                Here’s a recent discussion on Reddit of this decision.

                                To me it seems like the language makes you do something with all possible errors but when it came to actually programming that way, the authors found it exhausting. There are a lot of people that find Go’s error handling or Swift’s optional handling similarly tiresome.

                                1. 2

                                  Oh geez, this really grinds my gears. I’m trying not to be nasty about this, but the only explanation of this is in a sentence that’s not finished:

                                  Even if you had asserted that your denominator was not zero, you’d still need to protect against divide by zero because, at this time, the compiler can’t detect that value dependent typing.

                                  It’s literally “blah blah dependent typing”!

                                  I’m sure that they can correct this sentence to at least be grammatically correct. However, the more serious error is a philosophical one.

                                  I call this “the map is not the territory”, which is an old idea in philosophy [1]. The type system is a MAP to the territory. The territory is your PROGRAM – i.e. its runtime behavior.

                                  The map is supposed to HELP you with your program. In this case, they’ve got it precisely backwards: the map is making your program wrong.

                                  You should not make your program wrong to satisfy the type system. That’s like demolishing a house because someone misprinted a map of your town. No – somewhere lives there and there are real world consequences to it.

                                  Also, somebody else mentioned stack overflow and heap allocation failures on Reddit, which are two other things that can happen everywhere (at least in C programs).

                                  [1] https://en.wikipedia.org/wiki/Map%E2%80%93territory_relation

                                  1. 1

                                    Couldn’t there be an option for the compiler inserting the checks automatically with it crashing and dumping the state into a log on a divide by zero? Anything that keeps it from running and running with potentially-bogus data.

                                  2. 1

                                    I think it’s an interesting idea. I wonder whether a separate coalescing division operator would be good? Or, have pony have its semantics for division by zero but supply a separate operator for traditional semantics.

                                  1. 5

                                    I really enjoy effects systems, though the verbosity is rough. Purescript ends up with some gnarly annotations if you go all in.

                                    One thing I want to really see is moving more towards tagging and working in partial information. Particularly: being able to “overlay” types onto things that are already built.

                                    For example, you might have an internal HTMLString tag/row, and some third party lib provides concatHtml, but doesn’t know about your stuff. But if you could offer type annotations over this without having to rewrite all your code (super useful when working on larger projects to avoid churn just to try out new typechecks), then you could tag things simply.

                                    We do a lot of stuff with wrapping/unwrapping nowadays, but then we hit the “monad stack” problem pretty quickly. Does HTMLString . EvenLengthString mean the same thing as EvenLengthString . HTMLString?

                                    We have a lot of tools out there for types and building packages/modules, and I think we can expand static analysis beyond just current types.

                                    1. 2

                                      I’m really intrigued by them as well, but mostly from the point of view of optimizations (for example, memory management) or interesting types of static analysis. I haven’t read anything that makes me think user-defined effects are very useful in practice, but I’d love to read some good examples. Custom effect-handlers can offer some interesting options, such as the concurrent scheduler often showed in papers, but when I compare that model to how much more intuitive working with processes in Erlang is, I can’t help but feel like it is a bit of a “square peg in a round hole” situation. To me the promise of effect systems (and effect inference), is a compiler which is able to statically determine a lot more interesting properties about a program, and thus able to optimize it better, or warn about things like conflicting effects. I’m really interested in this myself, because I know I could extend my type system with effects if I really wanted to go that direction, but beyond the memory management aspect, I’m not sure what I would be trying to accomplish with it.

                                      I think my issue with tagging/gradual typing is that you lose a lot of the guarantees/benefits you get with a system designed around static types (whether inferred or otherwise). You basically have a system where everything is effectively dynamically typed anyway, because some of the code may be working with typed values, but then passing them off to code which has no type information, and has to rely on tags. One of the benefits of statically typed languages (in my opinion) is that the type system gives you enough information to do some really nice optimizations (such as unboxing primitives), and also the guarantee that if your type system is sound, and your program compiles, you can be certain it will work.

                                      You mention using a value of type HTMLString with some third party lib providing a concatHtml function. How are you proposing to make it work with your types? Are you assuming it has no types and you can provide your own type signature for it that works with your HTMLString type, ala Haskell typeclasses/instances? Or are you suggesting that your HTMLString type could be made compatible with whatever type concatHtml is annotated with? I think a number of gradual typing systems do something like the former (i.e. decorate untyped functions with their missing annotations), but in practice I’ve heard mostly complaints about this approach - most developers don’t want to spend time doing that for an entire third-party library, let alone multiple, and while some industrious community members will certainly do so, it doesn’t prevent the feeling of “the type system doesn’t really protect me”. The latter seems unlikely to work, other than in the case of subtyping, at least without heavy restrictions (i.e. allowing types which are structurally the same, or types which are basically aliases for the same thing).

                                      I think a lot of the gradual typing approach is focused on initial productivity, right at the beginning of a project, where things are constantly shifting, sometimes in big ways - and I think that having a more dynamic system at that point is valuable, but it doesn’t take long for things to settle down to a point where a static type system is staying out of the way but doing a good job helping you make changes/preventing mistakes. Rather than gradual typing, it would be nice if you could defer typechecking until the point where it’s required and be able to tell the typechecker “I know this looks wrong, but let it fly for now”. I think this combination would allow quite a bit of the flexibility desired near the beginning of a project. Unfortunately I think it introduces a ton of complexity into a compiler, which I think is fine, but can also make it difficult to implement and balance with all of the other concerns (error reporting, soundness, codegen).

                                    1. 3

                                      Is there a video to go with the slides? A few of the bullet-points leave something to be desired in terms of context and clarity.

                                      1. 1

                                        I couldn’t find one, and I looked for awhile. I agree it’s lacking a larger point/context that must have been established in the actual talk, but didn’t make it into the slides. Hopefully somebody knows where it can be found.

                                      1. 14

                                        I find slide #12 odd.

                                        There is interest in proper engineering in some circles of academia, namely the one within industrial companies. Interestingly, the research groups within these companies are the anti-thesis to what the author writes about them: they are not bound by quarterly returns.

                                        Finally, there is a big hobbyist and enthusiast community that is interested in participating, they just can’t provide infrastructure.

                                        The slide makes it seem like these are three silos and a new programming language has to come out of one of these.

                                        A combination of these is very powerful though: Rust specifically is rooted in all three. It is a programming language built by the research arm of a company, initially as a research project, but geared towards solving problems that company has at scale. This interplay is very powerful, as each and every language feature of Rust can be traced to problems that Mozilla has.

                                        But, and that’s where it’s rather unique: it was built with community participation from 0.1.0 on (there’s history before that). Indeed, the language has gone through fundamental changes through hobbyist involvement and feedback. Even today, 75% of all contributions are from non-Mozillians.

                                        This all is made easier, though, without the huge amount of infrastructure Mozilla provides: CI, build tooling, hosting for the odd website you might have, hosting for the package manager.

                                        So, yes, individually, the next big language doesn’t have many chances to come out of one of these silos. But smashing the silos has a lot of potential. And Mozilla is a actually good at doing this.

                                        P.S.: For someone interested in some history, this is a talk by a community contributor that build many features that were subsequently removed for 1.0 (they are not bitter about it, they think Rust is a better laguage for it). https://www.youtube.com/watch?v=olbTX95hdbg&list=PL85XCvVPmGQic7kjHUfjHb6vz1H5UTEVG

                                        1. 6

                                          Yeah I found that strange too - I’ve worked at a few companies which had R&D departments, oftentimes focused on software, and it doesn’t seem far fetched that a programming language could grow out of one of those departments (and as you said, they already have).

                                          A lot of these slides seemed to take an “everything sucks” point of view - and I can’t tell if that was in the interest of stirring up conversation, or whether it’s actually their attitude about these things, but I can’t help but feel there are better ways to get people thinking about the issues involved in growing practical programming language improvements.

                                        1. 5

                                          On type checker ergonomics, I recommend Improving Type Error Messages in OCaml (2015), especially check “Related work” section.

                                          1. 1

                                            Thanks for the link!

                                          1. 7

                                            I was actually looking for such a tag recently, and ended up using the compilers tag, but found that not everything I was interested in was under that tag (some was under programming, some under compsci. A dedicated tag for programming language theory would have been nice.