Threads for qznc

  1. 10

    I’m not an expert in the field but to me it seems obvious that an LLM needs some sort of DB lookup for established facts in order to produce reliable factual output. Why is this not obvious to experts?

    For example, if you ask ChatGPT how to get from Dallas to Paris it will tell you a lot about how to get to France. It wouldn’t care to clarify which Paris you actually want to get to. Maybe it’s the one 100 miles away. All just because statistically Paris, France comes up more often in the training data set.

    Why would an LLM show any different behaviour in science? Pi stands for all sorts of things (nucleotide diversity in genetics, the pion particle in physics, population proportion in statistics, the prime counting function in maths, to name a few) but most often it’s the circle circumference to diameter ratio. Would we expect an LLM to reliably use the pion in a physics context? Would we expect an LLM to always properly pick either the pion or the ratio constant given that both have a place in the related maths?

    Statistically plausible text generation is maybe good for coming up with technobabble for your next sci-fi novel, but I don’t see why experts in the AI field thought it might produce good science.

    I wonder what I’m missing that made them confident enough to release this Galactica model.

    1. 6

      You’re saying this like humans don’t have the exact same issue. Ask me about pi and I’ll tell you the math answer. (Because it’s statistically likely) Start a conversation about physics and maybe I’ll expect you mean pion instead. Yet our science does just fine (usually).

      You can construct prompts with enough context to tell GPTs what kind of lookup is available and how to use it too. They just can’t guess what you’re thinking about.

      1. 10

        Concrete knowledge is the same issue for humans. One obvious difference, though, is that humans are pretty good at knowing when they don’t know something or have low confidence in their knowledge, especially in a scientific/research environment. That’s why every paper has a tonne of citations and why most papers have a whole section restating the previous findings that the paper builds on. LLMs, though, are way too happy to make stuff up to be useful for novel research.

      2. 4

        Two things are obvious to experts (Experts in what? Ontology?):

        For example, Bing’s chat product can query Wikipedia, but even if the LLM dutifully summarized Wikipedia, there are biases in both the LLM’s training data and in Wikipedia’s own text.

        1. 3

          Well, yes, but LLMs often hallucinate things completely divorced from reality as opposed to merely biased datasets of Wikipedia or whatever. The discourse would’ve been very different if LLMs were biased but factually correct on the level of Wikipedia. As of right now we’re very far from that and yet some people still think Galactica is a good idea.

          1. 2

            There is ongoing work to solve that, for instance: https://arxiv.org/abs/2305.03695

            It’s not as easy as “just throw a DB at it”. I expect this problem will eventually be solved. Companies like Google or Meta were once careful not to release early, untested models but the competition from OpenAI changed that. Things are just going so fast currently that we will see issues like this for a while.

        2. 4

          Part of the issue is that this is fundamentally impossible. LLMs as an approach cannot do anything even remotely like this. There are many other approaches that either already do something like this or could (with sufficient research) plausibly be made to do so, but the LLM approach fundamentally cannot. The closest thing I have seen plausibly demonstrated for LLMs is usually euphemistically called ‘fallback’ - basically, you either have:

          1. some form of (non-LLM) recognizer that scans the input for things that can actually be solved by some real AI (or just regular Comp Sci) technique; it replaces the thing with some kind of placeholder token, the external system solves the thing, and then the solution gets substituted in where the LLM has emitted a ‘placeholder answer’ token.

          or

          1. you have some (non-LLM) system to detect that the LLM has generated something with problems, and the prompt + answer (or some subset of it) gets sent to a human in a call center somewhere, who writes the actual response and does some data-entry followup.

          And neither of these is actually connecting the LLM to a db or giving it the ability to look up facts - they are both ways to hide the fact that the LLM can’t actually do any of the things that are advertised by using non-LLM systems as a patch over the top in an ad-hoc way.
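The first ‘fallback’ variant described above can be sketched in a few lines. Everything here is a toy with hypothetical names: a regex recognizer stands in for the real recognizer, simple arithmetic stands in for the externally-solvable sub-problem, and fake_llm stands in for the model.

```python
import re

PLACEHOLDER = "<ANSWER>"

def recognize_and_solve(prompt: str) -> tuple[str, list[str]]:
    """Non-LLM recognizer: find simple arithmetic, solve it externally,
    and replace it with a placeholder token."""
    solutions: list[str] = []

    def solve(match: re.Match) -> str:
        a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
        solutions.append(str(a + b if op == "+" else a * b))
        return PLACEHOLDER

    cleaned = re.sub(r"(\d+)\s*([+*])\s*(\d+)", solve, prompt)
    return cleaned, solutions

def fake_llm(prompt: str) -> str:
    # Stand-in for the LLM: it only parrots the placeholder it saw.
    return f"The answer is {PLACEHOLDER}." if PLACEHOLDER in prompt else "Dunno."

def answer(prompt: str) -> str:
    cleaned, solutions = recognize_and_solve(prompt)
    output = fake_llm(cleaned)
    for s in solutions:  # splice the externally-computed answers back in
        output = output.replace(PLACEHOLDER, s, 1)
    return output

print(answer("What is 2 + 3?"))  # -> The answer is 5.
```

The point of the sketch is that the “lookup” happens entirely outside the model; the LLM stand-in never sees the facts it appears to know.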

          The correct way to proceed is to mine LLM development for the one particularly interesting bit of math it produced, which could be of a lot of use to statistics if it actually turns out to be mathematically valid (basically a way to make high-dimensional paths with ~arbitrary waypoints that are differentiable, and/or a way to do sampling plus something like correlation very nonlocally, depending on how you look at it). Then throw the rest away as dangerous but also useless garbage, and pour the billions of dollars that have been mustered into any of the many actual areas of AI that are real and useful.

          1. 2

            What are you referring to when you say “this”? What is fundamentally impossible for LLMs?

            1. 3

              some sort of db lookup for established facts

              Edit: apologies for lack of clarity in my initial reply to you

              1. 1

                I wonder why it’s fundamentally impossible? At least on the surface it appears LLMs are capable of some form of reasoning, so why can’t they know they’re making an inference and need to look stuff up?

                1. 3

                  LLMs do not reason, and they do not ‘know’. You are misunderstanding what the technology is and how it works (unfortunately aided in your misunderstanding by the way the major promoters of the technology consistently lie about what it is and how it works). They are a technology that produces output with various different kinds of ‘vibe matching’ to their input, and where ‘fallback’ (a fundamentally non-LLM technology) is used in an ad-hoc way to patch over a lot of the most obvious nonsense that this approach produces.

                  Edit: That is, LLMs are fundamentally a different approach to the general problem of ‘knowledge stuff’ or ‘machine intelligence’ where, instead of doing all the difficult stuff around knowledge and belief and semantics and interiority and complicated tensorial or braided linear logic and solvers and query planners and knowledge+state representation and transfer to connect all of these different things plus a whole bunch of difficult metacognition stuff and etc etc that would mean that connecting to a knowledge db is something that could actually work, you just… don’t do any of that difficult stuff and then lie and say you did.

                  1. 1

                    I don’t disagree with your assessment but I wonder if there is a way of tweaking LLMs to do this without altering their fundamental architecture. If you add a load of ‘don’t know’ tokens to the training data then I would expect their predictions to be full of these in any output where the other things in the input set did not provide a stronger signal. It wouldn’t be completely reliable but you could probably count the proportion of ‘don’t know’ tokens that end up interleaved with your output to get a signal of how accurate the response is and then train another model to generate searches based on the other tokens and then have the tandem system loop feeding more lookup results from the second model’s output into the first until it is below a threshold ratio of ‘don’t know’ to other tokens.
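The scheme in the comment above can be made concrete with stubs. The model, the retrieval step, and the threshold are all hypothetical placeholders; the only real part is the loop structure: generate, measure the proportion of ‘don’t know’ tokens, fetch more context, retry.

```python
DONT_KNOW = "<dk>"

def uncertainty(tokens: list[str]) -> float:
    """Proportion of 'don't know' tokens interleaved in the output."""
    return tokens.count(DONT_KNOW) / max(len(tokens), 1)

def generate(prompt: str, context: list[str]) -> list[str]:
    # Stub model: the more context it has, the fewer <dk> tokens it emits.
    dk_count = max(0, 3 - len(context))
    return ["some", "answer", "tokens"] + [DONT_KNOW] * dk_count

def answer_with_lookup(prompt: str, threshold: float = 0.2,
                       max_rounds: int = 5) -> list[str]:
    context: list[str] = []
    for _ in range(max_rounds):
        tokens = generate(prompt, context)
        if uncertainty(tokens) < threshold:
            return tokens            # confident enough, stop looping
        context.append(f"lookup result {len(context)}")  # stub retrieval
    return tokens                    # give up after max_rounds
```

As the comment says, this wouldn’t be completely reliable; the ratio is only a noisy signal, and everything hinges on the training data actually containing such tokens.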

                    1. 1

                      Do you have a recommendation for a shortish explainer on LLMs that can clear up this apparent reasoning ability (and other wonderful capabilities) for me? Or materials on what an LLM actually does and why it might look like it has some capabilities that it actually doesn’t have?

                      1. 3

                        Part way through a long answer and there was a power cut. Bleh. So, a short answer it will have to be.

                        Basically, there are no good short explainers out there that I have been able to find. The short explainers are basically all lies, and the good stuff is all zoomed in critique that assumes a lot of background and only covers the specific bit under examination.

                        I am actually part way through writing my own paper about why programmers seem to get taken in by these things more than they ‘should’, but it is a real slog and I’m not sure I’ll be able to get it into a form where it is actually useful - I’ve had to start by working on a phenomenology of programming (which is itself a large and thankless undertaking) that satisfies me in order to even have the tools to talk about ‘why it looks like it has capabilities’.

                        The two papers I’d suggest looking at to get a bit of a starting point on the deceptiveness of the ‘LLM capabilities’ propaganda are “Language modelling’s generative model: is it rational?” by Karen Spärck Jones and “Are Emergent Abilities of Large Language Models a Mirage?” by Rylan Schaeffer, Brando Miranda, and Sanmi Koyejo.

                        1. 1

                          Thank you.

                          Potentially misguided advice: maybe aiming for something less formal than a paper would make it less of a slog.

                          1. 3

                            Alas, the issue is that I need to establish a bunch of pretty foundational stuff about what programming is to a programmer, what is the nature of the current technology stack, and a bunch of stuff around how we think about and talk about the current technology stack (the “‘secretly two layer’ nature of contention about whether some technology ‘works’” thing) before I can use that to show how it opens programmers up to being mindeaten.

                            But also, the paper “AI and the Everything in the Whole Wide World Benchmark” by Inioluwa Deborah Raji, Emily M. Bender, Amandalynne Paullada, Emily Denton, and Alex Hanna, has some good stuff about ‘construct validity’ that lays out some of the formal case behind why I say that LLM boosters are just lying when they make claims about how LLMs can do all kinds of things.

            2. 3

              OpenAI spent 6 months testing GPT-4 before releasing it… There might be a hint there of what should and shouldn’t be done…

              1. 13

                They trained it on the internet and were surprised it “mindlessly spat out biased and incorrect nonsense.” There might be a hint of what should and shouldn’t be done… 😉

              2. 3

                I think it is obvious to most people trying to build applications out of LLMs, but some people, like researchers, seem to have a harder time with this. Most practitioners are using models to produce embeddings, which are used in conjunction with vector databases to find similar information. It works fairly well.
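A minimal sketch of that practitioner pattern, with a toy character-frequency embed() standing in for a real embedding model and a plain list standing in for a vector database:

```python
import math

def embed(text: str) -> list[float]:
    # Toy "embedding": character-frequency vector over a-z.
    # A real system would call an embedding model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

documents = [
    "pi is the ratio of a circle's circumference to its diameter",
    "the pion is a subatomic particle studied in physics",
]
index = [(doc, embed(doc)) for doc in documents]  # the "vector database"

def retrieve(query: str) -> str:
    # Nearest-neighbour lookup by cosine similarity.
    qvec = embed(query)
    return max(index, key=lambda item: cosine(qvec, item[1]))[0]
```

The retrieved text is then pasted into the LLM’s prompt, so the model summarizes looked-up facts instead of relying on whatever its weights memorized.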

                1. 3

                  db lookup for established facts

                  https://www.wikidata.org/ maybe?

                  Also see http://lod-a-lot.lod.labs.vu.nl/

                  1. 4

                    There are plenty of options and it doesn’t take much effort to find them. The issue is that LLMs don’t use any of them.

                  2. 2

                    LLM needs some sort of db lookup for established facts in order to produce reliable factual output.

                    Wolfram Alpha and Wikidata have been earlier attempts at making such DBs, both built the hard way. Maybe the next killer application will be an LLM instruction-trained to use them?

                    1. 2

                      I wish. It seems to me that currently LLMs use neither, because they either have a very small input buffer (i.e. Wikipedia doesn’t fit into it) or don’t do multi-step inference (can’t look up missing data and put it into context for another try).

                      Things like AutoGPT might be a viable approach even with smaller context if they didn’t try to pursue the task from the get-go and instead did a few rounds of “what do I need to know before I can give an answer?” before replying to the initial prompt.

                      But there was that paper that promised very large/virtually unlimited inputs, so maybe that one’s going to work. I’m sceptical though, because it would probably take a lot of time for a GPT-4-sized LLM to chew through the whole of Wikipedia on every prompt.

                      1. 3

                        (can’t look up missing data and put it into context for another try).

                        ChatGPT can do this - it will perform searches online to find information to help it answer prompts. It requires access to the beta, though.

                  1. 6

                    If I understand correctly, this is about avoiding common pitfalls with dynamic types in Python, and making code more robust, by adding types everywhere. Where it differs from Rust is that even with all these types, the Python interpreter will still happily run whatever it is given (while the Rust compiler will complain loudly). So one still needs to rely heavily on mypy and pyright.

                    1. 10

                      It’s not only about type annotations. Even in a fully dynamically typed unannotated language, code might be more or less prone to runtime type errors. A good example here is null-safety.

                      Python is not null (None) safe — it occasionally happens that you try to call a method on something, and something turns out to be None.

                      Python is more null-safe than Java. Java APIs tend to happily return null, while Python in general prefers to throw. For example, looking up a non-existing element in a dictionary returns null in Java, and throws KeyError in Python. The Python behavior prevents silent propagation of nulls and makes the bugs jump out.

                      Erlang is null-safe. It is dynamically typed, but, e.g., map lookup functions return an {ok, Value} pair on success, and the atom error on failure. This forces the call-site to unpack an optional even in the happy case, signaling the possibility of nulls.
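That contrast is easy to see in Python itself (the config dict and key names here are just for illustration):

```python
# A missing key raises immediately in Python, so the bug surfaces at
# the lookup site instead of propagating silently as null would in Java.
config = {"host": "localhost"}

try:
    port = config["port"]   # KeyError: the problem is visible right here
except KeyError:
    port = 8080             # the call-site must handle absence explicitly

# The Java-like behaviour is opt-in via .get(), which reintroduces
# silent None propagation:
maybe_port = config.get("port")   # None; no error until first use

assert port == 8080
assert maybe_port is None
```

Erlang’s {ok, Value} / error convention pushes the same explicitness one step further by forcing the unpacking even on the happy path.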

                      1. 9

                        Python and Java are safe in a type-safety sense with respect to None/null. The behavior is well defined (raise/throw an exception). Your complaint is about API design, not language design.

                        For contrast, in C/C++ it is not defined what happens if you dereference a NULL pointer, and that makes it unsafe. The NULL constant might be something other than 0x0. Dereferencing it could return a value, kill the process with a segmentation fault, or do something else.

                        1. 6

                          Yes, this is mostly a question of API design, rather than a question of language design (though, to make Erlang API convenient and robust to use, you need to have symbols and pattern-matching in the language).

                          Whether null-unsafety leads to UB is an orthogonal question.

                          1. 1

                            The NULL constant might be something other than 0x0.

                            https://en.cppreference.com/w/c/types/NULL

                            The macro NULL is an implementation-defined null pointer constant, which may be

                            • an integer constant expression with the value ​0​
                            • an integer constant expression with the value 0 cast to the type void*
                            • predefined constant nullptr (since C23)

                            https://en.cppreference.com/w/c/types/nullptr_t

                            nullptr_t has only one valid value, i.e., nullptr. The object representation of nullptr is same as that of (void*)0.

                            1. 3

                              The integer literal 0 when used as a pointer is a valid way to specify the null pointer constant, but it doesn’t mean the representation of null pointers is all-zero bits.

                              https://en.cppreference.com/w/cpp/language/zero_initialization

                              A zero-initialized pointer is the null pointer value of its type, even if the value of the null pointer is not integral zero.

                              https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf#%5B%7B%22num%22%3A649%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C-27%2C816%2Cnull%5D

                              7.20.3.1 The calloc function

                              Synopsis

                              #include <stdlib.h>
                              void *calloc(size_t nmemb, size_t size);

                              Description

                              The calloc function allocates space for an array of nmemb objects, each of whose size is size. The space is initialized to all bits zero. (Footnote 261: Note that this need not be the same as the representation of floating-point zero or a null pointer constant.)

                              Returns

                              The calloc function returns either a null pointer or a pointer to the allocated space.
                              1. 1

                                Everything that you say is true with respect to the de jure standard. The de facto standard is somewhat different and a platform where null is not 0 will break a lot of assumptions. In particular, C’s default zero initialisation of globals makes it deeply uncomfortable for null to not be a zero bit pattern.

                                For CHERI C, we considered making the canonical representation a tagged zero value (I.e. a pointer that carries no rights). This has some nice properties, specifically the ability to differentiate null from zero in intptr_t and a cleaner separation of the integer and pointer spaces. We found that this was a far too invasive change to make in a C/C++ implementation that wanted to be able to handle other values.

                                Similarly, AMD GPUs have the problem that their stack starts at address 0, so 0 is a valid non-null pointer. I proposed that they lower integer to pointer and pointer to integer casts as adding and subtracting one. This would let their null pointer representation be -1. This broke a lot of assumptions even in a fairly constrained accelerator system.

                                In particular, C has functions like memset that write bytes. C programmers expect to be able to fill arrays with nulls by using memset. They expect to be able to read addresses by aliasing integers and pointers. They expect to be able to stick objects in BSS and find that they are full of nulls. They expect the return from calloc to be full of nulls, even if the standard does not require it.

                                My favourite bit of the standard’s description of null is that a constant expression 0 cast to a pointer may compare not equal to a dynamic value of zero converted to null. If a and b are integers and a is constant, a can compare equal to b, but a cast to a pointer would compare not equal to b cast to the same pointer type. That’s such a bizarre condition that I doubt anyone writing C has ever considered whether their code is correct if it can occur.

                                In general, I am happy that WG14 and WG21 try hard to avoid leaking implementation details into the language, but there are two places where I think the ship has sailed. Bytes (char units) are octets, and null is represented by zero. So much C/C++ code assumes these two things that any implementation that changes them may, technically, be C or C++, but it won’t run more than a tiny fraction of code written in either language.

                                1. 1

                                  Oh, I agree that de facto C code almost universally assumes the null pointer representation to be zero when initializing data with memset or calloc, for better or worse (my C code does too). And in practice assuming nulls are zero is rather convenient. Though machines with non-zero null pointers do (or did) exist (https://c-faq.com/null/machexamp.html)

                                  In particular, C’s default zero initialisation of globals makes it deeply uncomfortable for null to not be a zero bit pattern.

                                  Pointers are initialized to the null pattern in the default zero initialization, so I’m not sure what you mean. Are you saying that it is inconvenient for null representation to be non-zero because then the compiler couldn’t put the zero-initialized global in .bss? I agree that’s a good point in favor of having a zero null in practice, but it wouldn’t be so dire, an implementation could put the global data in .bss to be zero-initialized by the loader and then manually initialize the pointer fields, for example with a special pseudo-relocation. Though I don’t know if there’s any implementation that does anything like that, and that doesn’t solve the problem for calloc/memset.

                                  1. 1

                                    Pointers are initialized to the null pattern in the default zero initialization, so I’m not sure what you mean. Are you saying that it is inconvenient for null representation to be non-zero because then the compiler couldn’t put the zero-initialized global in .bss?

                                    Yes, and the mid-level optimisers in most compilers really, really like that assumption. If they need to fill in another bit pattern, that causes problems. More importantly, it’s surprisingly common to put huge data structures in BSS. Every platform except Windows was very happy for snmalloc to put a 256 GiB structure full of pointers in BSS and then have the OS lazily allocate pages as needed. If null were non-zero, then suddenly that would require a 256 GiB binary. We are quite an extreme example of this but I’ve seen variants of it elsewhere.

                                    implementation could put the global data in .bss to be zero-initialized by the loader and then manually initialize the pointer fields, for example with a special pseudo-relocation

                                    That’s basically what I did for early versions of CHERI/Clang and it has some really bad pathological cases in some common codebases.

                                    1. 1

                                      This reminded me of MM_BAD_POINTER. I wonder whether its marginal utility is practically zero, or even negative due to the .bss considerations, now that NTVDM is rather long in the tooth. That’s the main reason I’m aware of for having the null page be dereferencable (the VM86 IVT was situated there). Even then, who knows, with all the speculative execution mitigations, I’ve lost track of whether a 32-bit Windows driver running code on behalf of an NTVDM process would actually be null-dereference-exploitable.

                      1. 1

                        I totally agree, but how then should we track requirements? How have people seen that work successfully? I have my ideas in theory, but all of my professional life has been in a world where user stories are the status quo.

                        1. 4

                          “track requirements” does not make a lot of sense to me, but I will take a stab at, “How to set requirements without relying on user stories”:

                          1. Pick an achievable thing
                          2. Summarize the thing by a) starting your summary with a verb and b) not including any hints to implementation (“how” to do the thing)
                          3. Write statements describing what the environment will look like with the achievable thing, achieved: again without hinting at implementation or the “how” of achieving - using words like “should, should not, may, may not, must, must not”

                          Your statements in step 3 are requirements. Your story does not exist (or can be whatever marketing/your manager/your spin doctor wants it to be), and how you go about satisfying those requirements is fair game to anyone that wants to do the work.

                          1. 1

                            Sphinx Needs is the fancy new thing at my work. However, I’m in a regulated field so maybe something more lightweight is good enough for you already.

                            More important is to write them well (e.g. clear, testable, independent). I don’t know any tools for that though.

                          1. 9

                            I don’t see my provider mentioned yet: mailbox.org

                            My priorities are:

                            • Privacy focused
                            • Small enough to have a chance to actually talk to a human in case I need support, and so that I matter as a customer, but big enough that I can trust them to be stable.
                            • Not free, because I don’t trust that, but also not too expensive because… well.
                            • Multiple custom domains should not increase the price by a lot

                            My previous provider was Fastmail. The reason I switched is somewhat irrational: mailbox.org is German, and that is closer to my home. Also, I trust German laws and enforcement around privacy a bit more (without having solid evidence). And the most irrational one: for a long time I had a tradition of watching as many videos from the CCC congress as I could after Christmas and New Year, and that left me with a soft spot for German hackers.

                            Comparing the two, I’d say that the user experience of Fastmail is definitely better, but I have no complaints at all about mailbox.org. It is rock solid. Also you get more features with your subscription, like a hosted NextCloud, video conferencing, etc.

                            I have one Belarusian domain (.by TLD) for a fun domain name, but I don’t actually use it. Setting up DNS for SPF and DKIM was no harder than for any of my other domains, and test emails pass through without any hiccups.

                            1. 2

                              Me too. I’m on their 1€/month plan because I only need the basic features.

                              I guess the biggest reason to switch away from Gmail was the fear that Google might block my account for unknown reasons. So many accounts are connected to my email that I want to be a customer, not the product.

                            1. 20

                              “Is it possible for a peer-reviewed paper to have errors?”

                              My sweet, sweet summer child. You are vastly overestimating the competence of this species.

                              1. 12

                                The standard pre-publication peer review process is only intended to reject work that is not notable or is fairly obviously bullshit. It’s not intended to catch all errors.

                                Post-publication review (in the form of other scholars writing review papers, discussing the work, trying to replicate it, challenging it, etc) is where the academic process slowly sifts truth from fiction.

                                1. 1

                                  Expanding: reviewers certainly can and do catch minor errors — but it’s not their primary job, and they generally don’t have the time to be very thorough about it.

                                2. 6

                                  I am definitely a summer child; academically I wasn’t so fortunate. I have worked through a few papers, but this was the first time I encountered a publication error.

                                  When I shared this story with a few people, they were surprised that I didn’t know papers could have errors. I simply didn’t know because no one had told me until now!

                                  1. 4

                                    We’ve all been sweet summer children at different times in different walks of life.

                                    Before I’ve had any insight into the details through friends and colleagues, I, too, had illusions of academic publishing being this extremely rigorous process, triple- and quadruple-checked and reproduced before publishing. Discovering the fallibility of authors, peer reviewers and publishers was both mildly heartbreaking and extremely relieving. :)

                                    It really isn’t something they teach you in school — or at least they haven’t in any of the ones I’ve attended.

                                    1. 4

                                      Pre-publication peer review is a process where papers are filtered through a set of biases. Papers are as likely to be rejected for not being the right style for a venue as for technical content. At this stage, no one tries to reproduce results, and reviewers often will not check the maths (my favourite example here is the Marching Cubes algorithm, which was both patented and published at the top graphics conference, and is fairly obviously wrong if you spend half an hour working through it. The fix is simple, but the reviewers didn’t notice it).

                                      After publication, 90% of papers are then ignored. The remaining 10% will have people read them and use them as inspiration for something. An even smaller fraction will have people try to reproduce the results to build on them. Often, they discover that there was a bug. When we tried to use the lowFAT Pointers work, for example, we discovered that the compression scheme was described in prose, maths, and a circuit in the original paper, and these did not quite agree (I think two of them were correct). For a tiny subset of papers, a lot of things will be built on the result and you can have confidence in them.

                                      The key is to think of academic papers as pretentious blogs. They might have some good ideas but until multiple other people have reproduced them that’s the most that you can guarantee.

                                      It was sobering for me to attend ISCA back in 2015, when there was a panel on the role of simulation in computer architecture papers. I expected the panel to say ‘don’t rely on simulators, build a prototype’. The panel actually went the other way and said some abstract model was probably fine. This was supported by the industry representative (I think he was from NVIDIA), who said that they look at papers for ideas but skip the results section entirely because they don’t trust it: the error margins are often 50% for a 20% speed up, so unless they’ve internally reproduced the results on their own simulation infrastructure they assume the results are probably nonsense.

                                      1. 2

                                        the error margins are often 50% for a 20% speed up, so unless they’ve internally reproduced the results on their own simulation infrastructure they assume the results are probably nonsense.

                                        To be fair to simulators, I suspect industry has to ignore the results sections even on papers which do have implementations. So experimenting on a simulator is reasonable because it’s cheaper.

                                        Say I have an idea for making arithmetic faster. I implement an ALU with my idea (the experiment) and an ALU without it (the control) and compare performance. If my control has a mistake in it which tanks its performance, the experiment will look great by comparison.

                                        I have seen people ranting about precisely this problem with literature on data structures and algorithms here on lobsters. That’s largely solved by making sure you benchmark against an industrially relevant open source competitor, but those largely aren’t available in the hardware space?

                                    2. 2

                                      Indeed. This was just a typo, too. Wait until the author gets deep into the literature and starts finding stuff that is just outright wrong. By the end of grad school, I assumed that any biology paper in Nature, Science, or Cell was probably flawed to the point of being unusable until proven otherwise.

                                      1. 5

                                        At the end of high school, you believe you know everything.

                                        At the end of college, you believe you know nothing.

                                        At the end of a PhD, you believe nobody knows anything.

                                        (No clue, where I got it from)

                                        1. 2

                                          After five years in industry, you start assuming active malice…

                                          (edit: Software developer, for context. I’m not talking about papers so much as the garbage that makes up the modern ecosystem.)

                                          1. 3

                                            We each choose whether to be that kind of person.

                                            1. 2

                                              Or, just maybe, the system of peer review that was instituted when there were fewer researchers has been overwhelmed by the massive increase in researchers who also have a direct economic interest in publishing.

                                              Anyway, peer review is just a basic step to tell you a paper isn’t entirely worthless. It states conclusions and presents evidence that’s not totally unbelievable. The real science starts when (like in the linked post) someone tries to reproduce the findings.

                                          2. 2

                                            Or if not wrong then incomplete, vague, or low-key deceptive. Frankly, the format rather sucks and we should do better. The failures are often a lot more interesting and useful, but the successes are what get reported. So anyone who wants to actually reproduce the work has to re-tread all the failures from scratch, again.

                                            1. 6

                                              The researchers that I respect often have fascinating blogs. I am aware, for example, that @ltratt publishes some papers to keep the REF and his funding bodies happy. Occasionally I might even read one. I’ll read everything that he writes on his blog though. If the REF actually measured impact, they’d weigh the blog far more heavily than a lot of publications.

                                        1. 4

                                          It is interesting that people always consider Logo for teaching programming. In Mindstorms, Papert clearly writes the intention was to teach geometry. He also describes a different „Newton-turtle“ to teach physics.

                                          1. 1

                                            Is it still considered today? I believe Python has taken over that role. Maybe it depends on the age of the student.

                                            I do believe that producing pictures is inherently more fun than “sort this list of words”, even if the latter is what everyone who programs ends up doing eventually.

                                            1. 4

                                              As an anecdote, I have been taught a version of Logo in 5-6th grade. This was 10 years ago, but I believe it is still taught in schools here.

                                          1. 1

                                            Slightly off topic, but does anyone have a suggestion for a build system that doesn’t suck? For our CHERI microcontroller project we have a CMake build system, but we need to be able to build compartments and libraries, and then combine them into firmware images that also define a set of threads. Most of these concepts are new to CMake and its extension mechanism is pretty much nonexistent. I want to isolate users from all of the details of the build, so they just specify the set of source files for a library (and maybe some extra flags if they need them).

                                            I tried rewriting it in xmake. This was a bit better. We could pass thread descriptions through as Lua objects with a decent set of properties and add rules for building compartments and libraries. There were a few annoyances though:

                                            • 90% of what I wanted to do involved undocumented interfaces (the docs are really lacking for xmake)
                                            • xmake really likes building in the source directory, which is unacceptable in a secure build environment (source will be mounted read only, the only writeable FS is a separate build FS) and you have to fight it every step if you don’t want to do this.
                                            • I hit a lot of bugs. A clean rebuild always tries to link the firmware image before it has linked some of the compartments that it depends on and I have no idea why. Specifying the build directory usually works but then xmake sometimes forgets and starts scattering things in the source tree again.
                                            • Overall, it feels like a 0.2 release and I’m not sure I’d want to handle problems users will have with it.

                                            That sounds really negative, but I liked a lot about xmake and I’d probably be very happy with the project in a couple of years, it just isn’t there yet. For example, the build process in xmake is a map and fold sequence for every target (apply some transform to every input independently, then apply a transform to all of the sources). There is no doc with high level concepts explaining this, you need to figure it out.
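
                                            The map-and-fold structure mentioned above can be sketched as an xmake rule. The hook names follow xmake’s rule API as I understand it, but this is a simplified, hypothetical illustration rather than our actual build code:

                                            ```lua
                                            -- Sketch of xmake's per-target map/fold structure (hypothetical).
                                            rule("compartment")
                                                -- The "map" step: called once per source file, independently.
                                                on_build_file(function (target, sourcefile, opt)
                                                    -- compile sourcefile into an object file for this compartment
                                                end)
                                                -- The "fold" step: called once with all of the per-file outputs.
                                                on_link(function (target, opt)
                                                    -- combine the object files into a .compartment file
                                                end)
                                            ```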

                                            1. 2

                                              The concept of build systems is sucky. Have you tried Nix?

                                              1. 2

                                                Given that we want to build firmware images on Windows, Linux, FreeBSD and Mac hosts, it’s not clear how Nix would help.

                                              2. 1

                                                Have you looked at SCons? It’s written in Python, and so are the build scripts, so it’s quite extensible. It has also been around a while (well over 10 years).

                                                1. 1

                                                  I know you’ve tried build2 and ran into some rough edges but if you want to give it another try, I would be happy to sketch something out for you (our documentation story, especially when it comes to the lower-level parts, is similarly lacking). Based on your description, I would first start with the higher-level ad hoc recipes/rules (these are like improved make recipes/pattern rules) and see how far I can get with that before considering re-implementing things as a build system module. Here is some introductory material for that if you want to take a look:

                                                  https://build2.org/release/0.13.0.xhtml#adhoc-recipe

                                                  https://build2.org/release/0.14.0.xhtml#adhoc-rules

                                                  https://build2.org/release/0.15.0.xhtml#dyndep

                                                  1. 1

                                                    I was put off Build2 for this for two reasons:

                                                    First, we have a lot of custom things. We need to construct a linker script based on the set of things in the final image, we need to modify the build of our loader and scheduler based on the number of threads added, and we need a structured mechanism for passing thread metadata from a consumer to the build rule for our firmware. xmake’s use of Lua is really nice for all of these:

                                                    • We can use a rich string library to construct the things that go in the linker script.
                                                    • We can pass Lua objects from the consumer to our rule (the xmake UI for doing this could be improved).
                                                    • We can inspect one target from a dependent one and modify it.

                                                    I believe Build2 would require us to write C++ code for all of these. Lua is a much nicer choice.

                                                    Second, and this might be my misunderstanding, Build2 feels a lot more bottom-up in its approach. Part of this means that it’s hard for me to see how you do context-dependent builds. For example, when we compile a C or C++ source file, it needs to know the name of the library or compartment that it will end up in and so we don’t want a generic .c / .cc rule, we want to build compile rules for each target. This is exactly the structure that xmake exposes as a compositional abstraction: there are compile rules, but they expose extension points per target. We need to change how a file is compiled depending on the kind of target that it’s ending up in and I couldn’t find anything in Build2 that looked vaguely like this shape.
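
                                                    As a rough sketch of the kind of per-target extension point I mean (the rule and define names here are hypothetical, not our real ones):

                                                    ```lua
                                                    -- Hypothetical xmake rule: adjust compilation based on the owning target.
                                                    rule("compartment")
                                                        on_load(function (target)
                                                            -- Each object file learns which compartment it will be linked into.
                                                            target:add("defines", "COMPARTMENT_NAME=\"" .. target:name() .. "\"")
                                                        end)
                                                    ```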

                                                    My ideal system would give users the ability to write something like:

                                                    # Somehow we need to tell it to pick up our build infrastructure.
                                                    include("cherimcu")
                                                    
                                                    # Okay, now we want a library built from a couple of source files.
                                                    library(helpers, [ "foo.c", "bar.cc" ])
                                                    # And a couple of isolated compartments.  Real projects will have a load more of these.
                                                    compartment(example, [ "example.cc" ])
                                                    compartment(example2, [ "example2.cc" ])
                                                    
                                                    # And now we want to assemble it into a firmware image, with a load of other metadata.
                                                    firmware(myDeviceFirmware,
                                                             [ helpers, example, example2 ],
                                                             "threads" = [
                                                               {
                                                                 stackSize : 0x400,
                                                                 priority : 1,
                                                                 entryPoint : "entry",
                                                                 compartment : "example"
                                                               },
                                                               {
                                                                 stackSize : 0x200,
                                                                 priority : 32,
                                                                 entryPoint : "entry_point",
                                                                 compartment : "example2"
                                                               }
                                                             ])
                                                    

                                                    With xmake, I’m pretty close to this. I’m not sure that I could get Build2 to this state without a lot of custom C++ (and, even though C++ is normally my go-to language, I’d prefer a scripting language like Lua for this; I’d prefer a strongly typed scripting language where files and paths were first-class types even more).

                                                    1. 1

                                                      I believe Build2 would require us to write C++ code for all of these. Lua is a much nicer choice.

                                                      Not necessarily. While build2’s language is probably not as powerful as Lua (yet), we do provide quite a few string/path/regex functions. It also has types (bool, [u]int64, string, [dir_]path) and lists of those. Things like arithmetic are a bit ugly at the moment, but doable. There are also a lot of things inside that are not yet exposed to the buildfile language but could be should there be a need. For example, internally we have sets, key-value maps, etc (but all those things are available to build system modules written in C++, so you can define a variable of, say, type std::map<std::string, std::string> and users will be able to access/modify it from the buildfile).

                                                      Second, and this might be my misunderstanding, Build2 feels a lot more bottom-up in its approach. Part of this means that it’s hard for me to see how you do context-dependent builds. For example, when we compile a C or C++ source file, it needs to know the name of the library or compartment that it will end up in and so we don’t want a generic .c / .cc rule, we want to build compile rules for each target. This is exactly the structure that xmake exposes as a compositional abstraction: there are compile rules, but they expose extension points per target. We need to change how a file is compiled depending on the kind of target that it’s ending up in and I couldn’t find anything in Build2 that looked vaguely like this shape.

                                                      This would be pretty easy to do manually (just mark each source file with a target-specific variable that contains the compartment) but doing this sort of back-propagation automatically would require implementing your rules in C++ (where this is definitely doable).

                                                      Another idea that we are considering is a macro facility. This would likely allow you to approximate your desired language of “magic incantations” pretty closely. But I am still on the fence about this.

                                                      I’m not sure that I could get Build2 to this state without a lot of custom C++

                                                      I think it would be interesting to try to prototype something without C++ (that is, using ad hoc pattern rules) and see how close we can get. I would be happy to sketch something out, just need to get a better understanding of the build steps/outputs.

                                                      But I think in the long run, what you are trying to do (i.e., polished build system support for an unusual development environment) is a great fit for writing your rules in C++ and packaging them as the build system module. There will be few things that you won’t be able to handle optimally.

                                                      1. 1

                                                        Thanks for the details. Some of your understanding about build2 is correct but there is also quite a bit that is off (likely due to the lack of documentation). I am going to respond and clarify those points but before (or while) I do that, could you sketch the command lines that would be used to build the example you have shown if I were to do it manually from the shell? I want to see if I can sketch a prototype based on that.

                                                        1. 1

                                                          Compiling a C/C++ file is more or less the same as a normal build (in xmake and cmake, we use their existing infrastructure), but we need to add a flag that tells it which compartment it’s in. In xmake, we default this to the name of the target that it’s being compiled for. In CMake we have an add_compartment function that takes the name of the compartment as an argument and will then create a real target and set the properties on it.

                                                          We then do an initial linking step for each compartment. This uses a custom linker script and produces a .compartment / .library file, linked with an extra linker flag telling it that it’s linking a compartment (and so doesn’t do a complete link).

                                                          The final firmware link step combines a load of compartments / libraries and a couple of .o files that we’ve built separately. There are a few things that make this fun:

                                                          • When we define a firmware target, we specify the threads. These are used to create some -D arguments to the compiler for a couple of other targets. In CMake, we add those targets in the add_firmware function, in xmake we modify the targets in the after_load step of the rule that the firmware uses.
                                                          • The final firmware link uses a linker script that is generated from the set of other targets, so needs to do some substitutions on a text file with a load of string processing.

                                                          Aside from that, it’s a fairly normal linker invocation. The main thing is that we need to communicate the threads to this step.
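
                                                          The string processing for the generated linker script is roughly a template substitution; a minimal Lua sketch, with made-up marker names and thread fields:

                                                          ```lua
                                                          -- Build the thread-stack sections from the thread descriptions (hypothetical).
                                                          local function thread_stacks(threads)
                                                              local out = {}
                                                              for i, t in ipairs(threads) do
                                                                  out[i] = string.format(".thread_stack_%d : { . += 0x%x; }", i, t.stackSize)
                                                              end
                                                              return table.concat(out, "\n")
                                                          end

                                                          -- Substitute a placeholder marker in the linker-script template.
                                                          local function emit_linker_script(template, threads)
                                                              return (template:gsub("@THREAD_STACKS@", thread_stacks(threads)))
                                                          end
                                                          ```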

                                                          1. 1

                                                            Thanks, I think I am starting to get the picture. A few clarifying questions:

                                                            When we define a firmware target, we specify the threads. These are used to create some -D arguments to the compiler for a couple of other targets.

                                                            I assume those couple of targets are from the corresponding compartment?

                                                            Generally, it looks like there is a 1:1 relationship between threads and compartments. If that’s correct, would it be more natural to specify the thread information on the compartment, especially seeing that you need it when compiling the compartment’s source code? Something along these lines:

                                                            compartment(example,
                                                                        [ "example.cc" ],
                                                                        "thread" = {
                                                                            stackSize : 0x400,
                                                                            priority : 1,
                                                                            entryPoint : "entry",
                                                                        })
                                                            
                                                            compartment(example2,
                                                                        [ "example2.cc" ],
                                                                        "thread" = {
                                                                            stackSize : 0x200,
                                                                            priority : 32,
                                                                            entryPoint : "entry_point",
                                                                        })
                                                            
                                                            firmware(myDeviceFirmware, [ helpers, example, example2 ])
                                                            

                                                            Or am I missing some details here?

                                                            1. 1

                                                              No, threads are orthogonal to compartments. A compartment defines code and globals, a thread is a scheduled entity that owns a stack and can invoke compartments. Each thread starts executing in one compartment but can invoke others. Two threads can start in the same compartment.

                                                              A firmware image is built out of a set of compartments and libraries (libraries do not own state and so provide code that can be simultaneously in multiple security contexts), a few core components, and a set of threads.

                                                              We specialise the loader and scheduler with the definitions of the threads. We don’t allow dynamic thread creation (we’re targeting systems with 64-512 KiB of RAM) and so we want to pre-allocate all of the data structures for the thread state.

                                                              1. 1

                                                                Ok, this makes sense but then I am confused by your earlier statement:

                                                                When we define a firmware target, we specify the threads. These are used to create some -D arguments to the compiler for a couple of other targets.

                                                                If there are only compartments and libraries and both are independent of threads, which targets does this statement refer to?

                                                                1. 1

                                                                  The scheduler and loader. In the CMake and xmake versions each of these are separate targets. The scheduler is built like a compartment (it’s basically untrusted). The loader is just built as a pair of .o files (assembly stub that calls into C++ to do most of the work).

                                                                  To clarify: the compartments provided by the user are independent of the threads, the scheduler is not (threads cannot start in the scheduler, so there’s no circular dependency here). Users don’t provide the targets for the scheduler and loader, they are created by the build system for the firmware. They don’t need to be separate targets if there’s a way of doing it better in build2.

                                                                2. 1

                                                                  Ok, here is my initial take: https://github.com/build2/cherimcu

                                                                  It only uses Buildscript rules (no C++) but doesn’t yet cover threads (still waiting on some clarifications in the sibling reply). Let me know if this looks potentially interesting in which case I will develop it a bit further.

                                                                  1. 1

                                                                    Thanks. I think it’s a bit harder to use than the xmake currently. This is my writeup of using xmake for the same project. The snippet listed there is the user code to create our simple tutorial example that has two compartments that communicate, allocate and free some memory, and talk to a UART.

                                                                    Your example has this:

                                                                    compart{example}: objc{example}: cxx{example.cc}
                                                                    compart{example}: objc{details}: cxx{details.cc}
                                                                    objc{example details}: compartment = example
                                                                    

                                                                    I think the last line is setting the compartment name (it took me a while to realise that objc meant ‘object file for a compartment’ and not ‘Objective-C’: see my previous comment about the terse syntax of build2). I guess the lines above are a chain of rules: you compile a C++ file, you get an object file (? or is this doing something special) and link them into a compartment. If I forget the last line, then there’s a dynamic check in the rule.

                                                                    In contrast, the xmake version is correct by construction:

                                                                    compartment("example")
                                                                        add_files("example.cc", "details.cc")
                                                                    

                                                                    You can’t do anything with the files unless you add them to a compartment (or a library).

                                                                    The xmake interface for specifying the threads is less nice (and I think build2 might be nicer here). These two arrays of objects are used to set three pre-defined macros:

                                                                    • The number of threads
                                                                    • A C++ array literal with the sizes of the stacks and trusted stacks.
                                                                    • A C++ array literal with the names of the compartment entry points (mangled, but with an encoding which is basically {prefix}{identifier length}{identifier}{suffix}).
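
                                                                    That length-prefixed encoding is easy to generate from the build system; a hypothetical Lua sketch (the prefix and suffix strings are placeholders for illustration, not the real mangling):

                                                                    ```lua
                                                                    -- Length-prefixed name encoding: {prefix}{identifier length}{identifier}{suffix}.
                                                                    local function mangle(prefix, identifier, suffix)
                                                                        return prefix .. tostring(#identifier) .. identifier .. suffix
                                                                    end

                                                                    -- mangle("_P", "entry", "_S") --> "_P5entry_S"
                                                                    ```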

                                                                    I actually like the xmake version of this but I have a lot more confidence in build2 than xmake as a project that will be maintained long-term for users to depend on.

                                                                    1. 1

                                                                      I think the last line is setting the compartment name. I guess the lines above are a chain of rules: you compile a C++ file, you get an object file (? or is this doing something special) and link them into a compartment. If I forget the last line, then there’s a dynamic check in the rule.

                                                                      Yes, that’s all correct. At the core this is (improved) make with targets, prerequisites, pattern rules, etc.

                                                                      it took me a while to realise that objc meant ‘object file for a compartment’ and not ‘Objective-C’: see my previous comment about the terse syntax of build2

                                                                      Yeah, good point. Those are actually all custom names (in build/cherimcu.build) and we can rename them to something like compartment_object_file if you prefer ;-) (but also see the next point).

                                                                      In contrast, the xmake version is correct by construction

                                                                      I don’t know if you’ve noticed at the end of that example I’ve shown what it could look like if we re-implemented the compartment/firmware linking rules in C++ which would allow us to synthesize intermediate dependencies (so we don’t need to manually spell out obj*{} stuff) and back-propagate some variables (so we don’t need to manually set the compartment name):

                                                                      compart{example}: cxx{example.cc details.cc}
                                                                      

                                                                      The xmake interface for specifying the threads is less nice (and I think build2 might be nicer here)

                                                                      My current idea is to represent threads as non-file-based targets (similar to PHONY targets in make) which will allow us to “connect” them (via the dependency relationships) to multiple things, namely the firmware and the scheduler/loader:

                                                                      firmware{mydevice}: compart{example example2} library{helpers} thread{example example2 uart}
                                                                      
                                                                      thread{example}:
                                                                      {
                                                                        compartment = "example",
                                                                        priority = 1,
                                                                        entry_point = "entry_point",
                                                                        stack_size = 0x400,
                                                                        trusted_stack_frames = 2
                                                                      }
                                                                      
                                                                      thread{example2}:
                                                                      {
                                                                        compartment = "example2",
                                                                        priority = 2,
                                                                        entry_point = "entry_point",
                                                                        stack_size = 0x400,
                                                                        trusted_stack_frames = 2
                                                                      }
                                                                      
                                                                      thread{uart}:
                                                                      {
                                                                        compartment = "uart",
                                                                        priority = 31,
                                                                        entry_point = "entry_point",
                                                                        stack_size = 0x400,
                                                                        trusted_stack_frames = 2
                                                                      }
                                                                      
                                                                      1. 1

                                                                         I don’t know if you’ve noticed, but at the end of that example I’ve shown what it could look like if we re-implemented the compartment/firmware linking rules in C++. That would allow us to synthesize intermediate dependencies (so we don’t need to manually spell out the obj*{} stuff) and back-propagate some variables (so we don’t need to manually set the compartment name):

                                                                        compart{example}: cxx{example.cc details.cc}

                                                                        That looks like the kind of thing that I’d like to end up with. Is the ability to dynamically create targets the only thing missing to be able to do this in the script? xmake also has this limitation, which I work around by having a description-scope function called firmware that actually expands to the definition of three targets (the scheduler, the loader, and the firmware that depends on the first two). This is a bit of a hack but it does give the UI that I want.

                                                                        My current idea is to represent threads as non-file-based targets (similar to PHONY targets in make) which will allow us to “connect” them (via the dependency relationships) to multiple things, namely the firmware and the scheduler/loader:

                                                                        Yes, that’s the sort of shape that I’d like. I don’t really like the repetition. Why do I need to tell the firmware build rule which targets are compartments, which are threads, and which are libraries? Don’t the targets know this already? Or is this because the rules define separate namespaces (which, I guess, is how I can have a compartment and a thread both called example)?

                                                                        1. 1

                                                                          Is the ability to dynamically create targets the only thing missing to be able to do this in the script?

                                                                          Yes, that plus the ability to back-propagate values. Generally, with a C++ rule you can access the underlying build model directly and there is very little limitation about what kind of “synthesis” you can do (as long as you keep things race-free). With script rules we just expose the most generally applicable functionality.

                                                                          If you think this looks promising, I can add the C++ rule prototype (we can start directly in cherimcu.build and see if we want to factor this to a build system module later).

                                                                          Why do I need to tell the firmware build rule which targets are compartments, which are threads, and which are libraries? Don’t the targets know this already? Or is this because the rules define separate namespaces […]

                                                                          Yes, in build2 the target identity is the directory, type, and name, where type is the abstraction of file extension. So path /tmp/foo.1 in build2 becomes target /tmp/man1{foo} (here man1 is the target type). Besides helping with the “what is the extension of an executable” type of problems, this also allows us to have non-file-based targets without having to resort to hacks like the .PHONY marker in make.

                                                                          1. 1

                                                                            If you think this looks promising, I can add the C++ rule prototype (we can start directly in cherimcu.build and see if we want to factor this to a build system module later).

                                                                            That would be very interesting.

                                                                             The thing that I’m currently looking at adding to the xmake build is separating out SoC descriptions. These need to be able to specify compile flags that are propagated to all of the targets in a dependency chain, plus some things describing the memory map to go into the final linker script. I could make these JSON, but the lack of hex encoding for integers is a bit annoying there since they’re full of memory addresses. I could make them Lua literals for xmake - is there some nice alternative for build2?

                                                                            1. 1

                                                                              That would be very interesting.

                                                                              Ok, I will try to find some time in the next couple of days.

                                                                              I could make these JSON, but the lack of hex encoding for integers is a bit annoying there since they’re full of memory addresses. I could make them Lua literals for xmake - is there some nice alternative for build2?

                                                                              We have int64 and uint64 types and while currently there is no way to specify their values in hex, that would be trivial to add.

                                                                              But I am wondering why you need them to be recognized as integers at all if they are just being passed along. Unless you are doing some arithmetic on them?

                                                                              1. 1

                                                                                But I am wondering why you need them to be recognized as integers at all if they are just being passed along. Unless you are doing some arithmetic on them?

                                                                                 I’d like, at the very least, to do some sanity checking on them. I’d also like to be able to express MMIO regions as either (base, top) or (base, length) and translate between them.

                                                                                 Note that they don’t actually need to be 64-bit values for us; we have a 32-bit address space (which I really want to reduce to 28 bits).
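The translation and sanity checking described above can be sketched in a few lines of Python (all names here are invented for illustration; the only assumption taken from the thread is the 32-bit address space):

```python
# Hypothetical sketch: validating MMIO regions and translating between
# the (base, top) and (base, length) forms, for a 32-bit address space.

ADDRESS_BITS = 32

def check_addr(a):
    assert 0 <= a < (1 << ADDRESS_BITS), f"address {a:#x} out of range"

def to_base_length(base, top):
    """(base, top) -> (base, length), with sanity checks."""
    check_addr(base)
    check_addr(top)
    assert top > base, "empty or inverted region"
    return base, top - base

def to_base_top(base, length):
    """(base, length) -> (base, top), with sanity checks."""
    check_addr(base)
    assert length > 0, "empty region"
    check_addr(base + length - 1)  # last byte must be addressable
    return base, base + length

# e.g. a device with a 0x100-byte MMIO window at 0x10000000:
assert to_base_length(0x10000000, 0x10000100) == (0x10000000, 0x100)
assert to_base_top(0x10000000, 0x100) == (0x10000000, 0x10000100)
```

The same checks work regardless of which form the description file uses, which is the point of recognizing the values as integers rather than opaque strings.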

                                                                                1. 1

                                                                                  Ok, I’ve done another pass over the prototype and it now includes thread information and loader/scheduler generation: https://github.com/build2/cherimcu/

                                                                                  Note that I haven’t gone with the C++ implementation yet. Instead, I’ve decided to see how far I can take it with script rules by adding some missing features to build2 (like the hex notation for integers). So if you want to try it, then you will need to use the staged version of build2 until 0.16.0 is out in a couple of months: https://build2.org/community.xhtml#stage

                                                                                  You’ve mentioned that you would prefer for loader/scheduler not to be separate targets. I’ve done it this way (they are generated on the fly during the firmware linking) but I think ideally you would want to make them separate synthesized targets so that you avoid unnecessarily recompiling them if the relevant information hasn’t changed. But that, again, would only be possible with the C++ implementation.

                                                                                  Another thing I would like to point out is that this implementation does fairly accurate change tracking (which is one of the main design goals of build2). For example, if you change the thread stack size, then the relevant (and only the relevant) parts will be updated.

                                                                                  Let me know what you think. I would still like to add the C++ version, time permitting.

                                                                                  1. 1

                                                                                    Thanks, that looks very nice. The xmake version is now at a state that I’m happy with. The maintainer just added support for dynamically creating targets for us, which lets us clone a default loader target and add defines from the board description file. Since my last message, we’ve added board descriptions, which are JSON files containing the memory map for the target (used to generate the linker script) and macro definitions that must be set from the board. Xmake has built-in support for parsing JSON, so the user can just specify either a path to their own board definition or the name of one that we ship and we can pull all of these things out. Dynamic target creation means that we can have a single build file that builds the same firmware for two boards, without conflicts. I think that isn’t necessary in your approach because we don’t share the loader target between different firmware images in your version.

                                                                                     For thread descriptions, we need to produce a pair of predefined macros containing initialiser lists that each contain a subset of the information (with some permutations: we transform the compartment name and thread entry point into the mangled name of our export table symbol). I think these would probably be easier in C++.

                                                                                     I hope that we will open source this in the next few weeks. I have a bit more confidence in build2 as a long-term solution, and I bet I could extend the xmake build system to produce the build2 files for most cases (everything that you need is available for introspection in xmake), so there’s a nice migration path.

                                                                                    Recompiling the loader doesn’t bother me too much. It is one of our slowest files to build (it uses C++ templates to do a lot of compile time checks for correctness) but even then it’s only about a second and a complete clean build is well under 2 seconds for a fairly complex example (one where we start to worry about running out of SRAM for code). I’m much more worried about reproducible builds than speed, and I think Build2 has a strong story there.

                                                                                    1. 1

                                                                                      Xmake has built-in support for parsing JSON […]

                                                                                      I can see what you are doing here ;-).

                                                                                      Seriously, though, while we don’t have the buildfile-level json type yet, we do have the built-in JSON parser/serializer available to C++-based implementations. So a rule written in C++ could load a JSON file.

                                                                                      Another option would be to ditch JSON and just represent this data as a buildfile target, similar to thread, provided it’s not too deeply structured. This way you get the type-checked hex notation for integers. Also, if you are using xmake as the meta build system, I am sure you could generate these from JSON.

                                                                                       For thread descriptions, we need to produce a pair of predefined macros containing initialiser lists that each contain a subset of the information (with some permutations: we transform the compartment name and thread entry point into the mangled name of our export table symbol). I think these would probably be easier in C++.

                                                                                      I think you should be able to achieve this even with the script rules. Currently they just generate translation units with this information spliced in. Producing instead a list of macro definitions doesn’t feel like a major leap in complexity. Any reason you don’t just generate a header or source file instead of wrangling with escaping macro values on the command line?

                                                                                      1. 1

                                                                                        Seriously, though, while we don’t have the buildfile-level json type yet, we do have the built-in JSON parser/serializer available to C++-based implementations. So a rule written in C++ could load a JSON file.

                                                                                        A big part of the reason that we wanted JSON is that xmake and CMake can both parse it, so these files can be completely independent of the build system. If xmake turns out to be a mistake, we can teach CMake to consume these files and do the right thing. I assumed that, even if build2 didn’t have native support, then linking something like nlohmann/json into a C++ plugin would be pretty trivial (and not cost us much since we’d probably want one anyway).

                                                                                        I think you should be able to achieve this even with the script rules. Currently they just generate translation units with this information spliced in. Producing instead a list of macro definitions doesn’t feel like a major leap in complexity. Any reason you don’t just generate a header or source file instead of wrangling with escaping macro values on the command line?

                                                                                        We’d need a different header file per firmware image and so we’d have to pass a command-line parameter telling it where to find the file. We could do that, if it were significantly easier, but you don’t really need much escaping: -D"..." or "-D..." both work fine with clang (and gcc, though we only have LLVM support at the moment), and xmake handles the escaping for me anyway so all I need to do is create a string and pass it to xmake. I’m a bit surprised if this is hard with build2: correctly escaping command-line arguments seems like a pretty essential feature for a build system (and mostly doesn’t matter if the build system isn’t invoking tools via a shell, since execve takes an array of arguments, not an escaped string).

                                                                                        To give you a concrete example, this is what one of our examples generates:

                                                                                        -DCONFIG_THREADS={{1,1,1024,2},{2,2,1024,2},{3,31,1024,2},} "-DCONFIG_THREADS_ENTRYPOINTS={la_abs(__export_example__Z11entry_pointv),la_abs(__export_example2__Z11entry_pointv),la_abs(__export_uart__Z11entry_pointv),}" -DCONFIG_THREADS_NUM=3
                                                                                        

                                                                                         The first one is used by the loader when setting up the threads and contains the thread numbers (monotonic integers; we don’t actually need these, since they are the index), thread priorities (used by the scheduler to set up initial thread state), stack sizes (used by the loader to allocate the stack), and trusted stack depths (used by the loader to allocate the trusted stack). In the second, la_abs is the macro used to emit a relocation for an absolute address (not a capability) for a global; its arguments are generated from the compartment names (example, example2, and uart) and the entry point names (all of them are called entry_point, because naming things is hard). These are used by the loader to set up the initial program counter and global capabilities for the threads. The last one is the number of threads. The loader and the scheduler each need different subsets of this information (we should probably split it a bit better at some point, but it’s not urgent).

                                                                                        We use the first of these in a constexpr array, which lets us do fun things like add up the total space required for all stacks, trusted stacks, and register-save areas (using sizeof on some structures in C++) in a constexpr function that is then used to define a global that has this much space, in a special section so that our linker script puts it in the right place. Unfortunately, la_abs is not constexpr currently and so we split the entry points into a separate array.

                                                                                        CONFIG_THREADS_NUM is mostly used in assembly, we just have a static_assert in C++ that it matches the sizes of the other two arrays.

                                                                                         I’m not sure that putting these in a header file would save us any complexity, and it would add some (we had a header file when we wrote these by hand; removing it simplified the code).
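For illustration, the three macro values above can be derived mechanically from the thread descriptions. This is a Python sketch, not the actual build rule; the mangling helper only covers the zero-argument entry points in this example (Itanium ABI `_Z<len><name>v`):

```python
# Thread descriptions, as in the examples earlier in the thread:
threads = [
    # (compartment, priority, stack_size, trusted_stack_frames, entry_point)
    ("example",  1,  1024, 2, "entry_point"),
    ("example2", 2,  1024, 2, "entry_point"),
    ("uart",     31, 1024, 2, "entry_point"),
]

def export_symbol(compartment, entry):
    # e.g. __export_example__Z11entry_pointv; only handles functions
    # taking no arguments, which is all this example needs.
    return f"__export_{compartment}__Z{len(entry)}{entry}v"

# {{number,priority,stack_size,trusted_stack_frames},...}
config_threads = "{" + "".join(
    f"{{{i + 1},{prio},{stack},{frames}}},"
    for i, (_, prio, stack, frames, _) in enumerate(threads)
) + "}"

# {la_abs(<export symbol>),...}
entrypoints = "{" + "".join(
    f"la_abs({export_symbol(comp, entry)}),"
    for comp, _, _, _, entry in threads
) + "}"

print(f"-DCONFIG_THREADS={config_threads}")
print(f'"-DCONFIG_THREADS_ENTRYPOINTS={entrypoints}"')
print(f"-DCONFIG_THREADS_NUM={len(threads)}")
```

Run against the three-thread example, this reproduces the command line shown earlier in the thread.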

                                                                                        1. 1

                                                                                          I assumed that, even if build2 didn’t have native support, then linking something like nlohmann/json into a C++ plugin would be pretty trivial (and not cost us much since we’d probably want one anyway).

                                                                                          Theoretically, yes, though for now we recommend that build system modules don’t have any external dependencies for robustness reasons (imagine if two different build system modules require different versions of nlohmann-json and some poor user ended up using both in the same build).

                                                                                          I’m a bit surprised if this is hard with build2: correctly escaping command-line arguments seems like a pretty essential feature for a build system […].

                                                                                          Yes, escaping/quoting of arguments when calling exec or equivalent is of course handled automatically. I think I was remembering all the cases where we wanted to pass a string literal as a macro:

                                                                                          cxx.poptions += -DBUILD2_INSTALL_LIB=\"$regex.replace($install.resolve($install.lib), '\\', '\\\\')\"
                                                                                          

                                                                                          But your macros look fairly benign. For comparison, this is what my prototype generates:

                                                                                          #include <cstdint>
                                                                                          
                                                                                          struct thread
                                                                                          {
                                                                                            const char*    compartment;
                                                                                            std::uint64_t  priority;
                                                                                            const char*  (*entry_point) ();
                                                                                            std::uint64_t  stack_size;
                                                                                            std::uint64_t  trusted_stack_frames;
                                                                                          };
                                                                                          
                                                                                          extern "C" const char* entry_point ();
                                                                                          extern "C" const char* entry_point2 ();
                                                                                          
                                                                                          thread threads[2] = {
                                                                                            {"example", 1, &entry_point, 0x00000400, 2},
                                                                                            {"example2", 2, &entry_point2, 0x00004000, 3},
                                                                                          };
                                                                                          
                                                              2. 1

                                                                 Can you expand on why you ended up with this design? It seems to be a circular dependency: a thread depends on the code executed in it, and through your “-D arguments” the code depends on the thread.

                                                                I can imagine something like “the code needs to know if it gets called every 10ms or 50ms to measure time”. In our projects (proprietary automotive), we avoid such code and rather pay for the overhead of computing durations at runtime.

                                                                1. 1

                                                                   There’s no circular dependency. Threads depend on the compartment that they start in, the scheduler depends on knowing the number of threads, and the loader depends on knowing the number of threads, their entry points, and the sizes of their stacks. We could remove the loader’s dependency here and make it dynamic, but we get a bit better code density from compile-time specialisation (not very important: the loader erases itself after it runs and returns the memory to the heap allocator, so there’s little need to make the loader small). We do, however, want to specialise the scheduler’s data structures with the number of threads and the number of priority levels used.

                                                      1. 4

                                                         My tip: Instead of cp or scp, use rsync. It is more efficient for large files. It will skip the transfer if the target already exists with the same contents, and it will update the target if the contents differ. It even uses the same parameter ordering as cp (although it is wrong imho).

                                                        1. 4

                                                           On a higher level, there are two main questions in any discussion:

                                                          1. What exactly do we have to decide? This is the difference between “which framework is best?” and “we are about to create a new service, should we use the same framework as we usually do?” This puts the discussion into a context. Without context you can argue past each other forever because everybody assumes a different context with lots of hidden assumptions.

                                                           2. What are the relevant aspects for the decision? You can compare web frameworks according to their parts and how mature/builtin/supported they are. These are usually not the decisive aspects, though. The programming language matters, because your team is usually only really competent in one or two of them, so every other language comes with a big risk and learning effort. Sometimes you want to pick the long-term best option. Sometimes the next deadline is more important.

                                                           Without answering these two questions, precision in your discussions will be in vain. If you know the answers, then this article has good tactical advice.

                                                          1. 2

                                                            This doesn’t actually solve the hard problem, which is estimating the financial cost of a less-than-optimal implementation.

                                                            1. 3

                                                              I think the framework is there: you could calculate a probability distribution of financial cost based on the number of distinct issues and their microdefect amounts. Optimally you’d also use a probability distribution for the cost of a single defect instead of just using an average.

                                                              For an organization well-versed in risk management this might just work. But without understanding the concept of probabilistic risk I don’t believe the tradeoffs in implementation (and design) can be managed.

                                                              The article seems to focus on just the expected value of microdefects. This might be enough for some decisions, but it’s not a good way to conceptualize “technical debt”.

                                                              1. 3

                                                                One interesting implication is that if we can estimate the costs of different violations, we can estimate the cost-saving of tools that prevent them.

                                                                For example, if “if without an else” is $0.01, then a linter that prevents that or a language where conditionals are expressions rather than statements automatically saves you a dollar per 100 conditionals.
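Spelling out that arithmetic (the $0.01 figure is the thread's hypothetical, not a measured number):

```python
# If each "if without an else" carries an expected cost of $0.01, a linter
# (or a language with conditional expressions) that eliminates the pattern
# saves roughly a dollar per 100 conditionals.
cost_per_violation = 0.01
conditionals = 100
savings = cost_per_violation * conditionals  # in dollars
print(f"${savings:.2f} per {conditionals} conditionals")  # prints "$1.00 per 100 conditionals"
```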

                                                                1. 2

                                                                  you could calculate a probability distribution of financial cost based on the number of distinct issues and their microdefect amounts

                                                                  My point is, we can’t do that because we don’t know what the average cost of a defect is, and we have no way of finding out.

                                                                  1. 2

                                                                     I think we do (certainly I have some internal numbers for some of these things); the thing that we don’t know is the cost distribution of defects. For example, the cost of a security vulnerability that allows arbitrary code execution is significantly higher than the cost of a bug that causes occasional and non-reproducible crashes on 0.01% of installs. A bug that causes non-recoverable data corruption for 0.01% of users is somewhere in the middle. We also don’t have a good way of mapping the probability of any kind of bug to something in the source code at any useful granularity (we can say, for example, that the probability of a critical vulnerability in a C codebase is higher than in a modern C++ one, but that doesn’t help us target the things to fix in the C codebase, and rewriting it entirely is prohibitively expensive in the common case).

                                                                    1. 1

                                                                      What sorts of things do you have numbers for, if you can share? I have heard of people estimating costs, but only for performance issues when you can map it to machine usage costs pretty easily, so I’d be interested in other examples.

                                                                    2. 1

                                                                      It’s true we can’t know the distribution or the average exactly. But if you measured the cost of each found defect after it’s fixed, you could make a reasonable statistical model after N=1000 or so. And note that we do know lower and upper bounds for the financial cost of a defect: the cost must typically be between zero and the cost of bankruptcy.
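As a toy illustration of that statistical model (all numbers are synthetic; the lognormal shape is only an assumption standing in for N≈1000 measured fixes, chosen because defect costs are plausibly heavy-tailed):

```python
import random
import statistics

random.seed(42)
# Pretend these are the measured costs (in dollars) of 1000 fixed defects.
costs = [random.lognormvariate(5, 1.5) for _ in range(1000)]

mean = statistics.fmean(costs)
p95 = sorted(costs)[int(0.95 * len(costs))]  # 95th-percentile cost

print(f"mean cost per defect: ${mean:,.0f}")
print(f"95th percentile cost: ${p95:,.0f}")
# With a heavy tail, the mean alone undersells the risk: a small fraction
# of defects accounts for most of the total cost.
```

This is the distinction raised upthread: the expected value is easy to quote, but the tail is where the interesting decisions live.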

                                                                      1. 4

                                                                        if you measured the cost of each found defect after it’s fixed, you could make a reasonable statistical model after N=1000 or so

                                                                        You are also assuming the hard part. How are you measuring the cost of a defect?

                                                                        1. 1

                                                                          It depends a lot on the business you are in. For Open Source it is hopeless because you don’t know how many users you even have. My work is in automotive, where we can count the cost for customer defects quite well. Probably better than our engineering costs in general.

                                                                          1. 1

                                                                            we can count the cost for customer defects quite well

                                                                            Are these software defects or hardware defects? As a followup, if they are software defects, are they the sort of defects that would be described as “tech debt” or as outright bugs?

                                                                            1. 1

                                                                              Yes, the classification is still tricky. Assume we have a defect. We trace it down to a simple one line change in the software and fix it. Customer happy again. They get a price reduction for the hassle. That amount plus the effort invested for debugging and fixing is the cost of the defect.

                                                                              Now we need to consider what technical debt could have encouraged writing that bug: Maybe a variable involved violated the naming convention so the bug was missed during code review? Maybe the cyclomatic complexity of the function is too high? Maybe the Doxygen comment was incomplete? Maybe the line was not covered by a unit test? For all such possible causes, you can now adapt the microdefect cost slightly upwards.
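A minimal sketch of that upward adjustment (all numbers, rates, and category names here are invented; the shape is just "split a fraction of the defect cost across the causes observed near the fixed line"):

```python
# Cost of one traced-and-fixed defect: price reduction for the customer
# plus the debugging/fixing effort.
defect_cost = 5000.0

# Findings (e.g. from linters or review) present at or near the buggy line.
causes = ["naming_convention", "high_cyclomatic_complexity", "missing_unit_test"]

# Running per-category cost estimates (dollars per violation), updated
# defect by defect.
microdefect_cost = {
    "naming_convention": 0.50,
    "high_cyclomatic_complexity": 2.00,
    "missing_unit_test": 1.00,
    "incomplete_doxygen": 0.25,
}

ATTRIBUTION = 0.10    # fraction of the defect cost credited to latent causes
LEARNING_RATE = 0.01  # how much a single incident moves each estimate

share = ATTRIBUTION * defect_cost / len(causes)
for cause in causes:
    microdefect_cost[cause] += LEARNING_RATE * share
```

Categories that were not implicated (here, the incomplete Doxygen comment) keep their previous estimate; over many defects the estimates drift toward the causes that actually co-occur with expensive fixes.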

                                                                              1. 1

                                                                                 That’s an interesting idea. And then microdefects would work well, because you average out differences (like how much it costs to make a customer happy again) that don’t have much to do with the bug itself.

                                                                                Do you have a similar process for bugs that don’t affect customers, or correct but inefficient code implementations?

                                                                                1. 1

                                                                                  You are thinking of those “phew, glad we found that before anyone noticed” incidents, I assume. The cost is only the effort here.

                                                                                   We have something similar. Sometimes we find a defect which has already shipped, but which apparently neither the customer (OEM) nor the users have noticed. Then there is a risk assessment, where tradeoffs are considered:

                                                                                  • How many users do we expect to notice it? Mostly depends on how many users there are and how often the symptoms occur.
                                                                                  • How severe is the impact? If it is a safety risk, fixing it is mandatory.
                                                                                  • How much will it cost to fix it? Again, the more users there are the higher the cost.
                                                                                  • How visible is the fix? If you bring a modern car to the yearly inspection, chances are that quite a few bugfixes are installed to various controllers without you noticing it.

                                                                                  You can estimate anything but of course the accuracy and precision can get out of hand.
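                                                                                  The tradeoff described above could be sketched as a toy expected-value comparison (a C++ sketch with invented names and structure; a real assessment would be far less mechanical):

```cpp
// Toy sketch of the ship/fix tradeoff: compare the expected damage of
// leaving a shipped defect in place against the cost of rolling out a fix.
struct ShippedDefect {
    double users;            // installed base
    double noticeRate;       // fraction of users expected to hit the symptom
    double costPerIncident;  // support/goodwill cost per noticed occurrence
    double fixCostPerUser;   // rollout cost, scales with the installed base
    bool   safetyRisk;       // safety issues are fixed regardless of cost
};

bool shouldFix(const ShippedDefect& d) {
    if (d.safetyRisk) return true;  // mandatory, no tradeoff
    double expectedDamage = d.users * d.noticeRate * d.costPerIncident;
    double fixCost = d.users * d.fixCostPerUser;
    return expectedDamage > fixCost;
}
```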

                                                                1. 24

                                                                  I’m sympathetic to the goal of making reasoning about software defects more insightful to management, but I feel that ‘technical debt’ as a concept is very problematic. Software defects don’t behave in any way like debt.

                                                                  Debt has a predictable cost. Software defects can have zero costs for decades, until a single small error or design oversight creates millions in liabilities.

                                                                  Debt can be balanced against assets. ‘Good’ software (if it exists!) doesn’t cancel out ‘Bad’ software; in fact, it often amplifies the effects of bad software. Faulty retry logic on top of a great TCP/IP stack can turn into a very damaging DoS attack.

                                                                  Additive metrics like microdefects or bugs per line of code might be useful for internal QA processes, but especially when talking to people with a financial background, I’d avoid them, and words like ‘debt’, like the plague. They need to understand software used by their organization as a collection of potential liabilities.

                                                                  1. 11

                                                                    Debt has a predictable cost. Software defects can have zero costs for decades, until a single small error or design oversight creates millions in liabilities.

                                                                    I think you’ve nailed the key flaw of the “technical debt” metaphor here. It strongly supports this “microdefect” concept, explicitly by analogy to microCOVID, which the piece doesn’t mention is named for micromort. The analogy works really well to your point: these issues carry a very low cost for a long time, and then a sudden, potentially catastrophic failure. Maybe “microcrash” or “microoutage” would be a clearer term; I’ve seen “defect” used for pretty harmless issues like UI typos.

                                                                    The piece is a bit confusing in that it relies on the phrase ‘technical debt’ while trying to supplant it; it’d be stronger if it only used the term once or twice to argue its limitations.

                                                                    We’ve seen papers on large-scale analyses of bugfixes on GitHub. Feels like that route of large-scale analysis could provide some empirical justification for assessing values of different microdefects.

                                                                    1. 1

                                                                      I’m very surprised by the microcovid.org website not mentioning their inspiration from the micromort.

                                                                      1. 1

                                                                        It’s quite possible they invented the term “microCOVID” independently. “micro-” is a well-known prefix in science.

                                                                      2. 1

                                                                        One thing I think focusing on defects fails to capture is the way “tech debt” can slow down development, even if it’s not actually resulting in more defects. If a developer wastes a few days flailing because they didn’t understand something crucial about a system, e.g. because it was undocumented, then that’s a cost even if it doesn’t result in them shipping bugs.

                                                                        Tangentially relatedly, the defect model also implicitly assumes a particular behavior of the system is either a bug or not a bug. Often things are either subjective or at least a question of degree; performance problems often fall into this category, as do UX issues. But I think things which cause maintenance problems (lack of docs, code that is structured in a way that is hard to reason about, etc) often work similarly, even if they don’t directly manifest in the runtime behavior of the system.

                                                                        1. 1

                                                                          Microcovids and micromorts at least work out in the aggregate; the catastrophic failure happens to the individual, i.e. there’s no joy in knowing the chance of death is one in a million if you happen to be that fatality.

                                                                          Knowing the number of code defects might give us a handle on the likelihood of one having an impact, but not on the size of its impact.

                                                                        2. 3

                                                                          Actually, upon re-reading, it seems the author defines technical debt purely in terms of code beautification. In that case the additive logic probably holds up well enough. But since beautiful code isn’t a customer-visible ‘defect’, I don’t understand how monetary value could be attached to it.

                                                                          1. 3

                                                                            I usually see “tech debt” used to describe following the “no design” line on https://www.sandimetz.com/s/012-designStaminaGraph.gif past the crossing point. The idea is that the longer you keep on this part of the curve, the harder it becomes to create or implement any design, and the ability to maintain the code slows.

                                                                            1. 1

                                                                              I think this is the key:

                                                                              For example, your code might violate naming conventions. This makes the code slightly harder to read and understand which increases the risk to introduce bugs or miss them during a code review.

                                                                              Tech debt so often leads to defects that they become interchangeable.

                                                                              1. 1

                                                                                To me, this sounds like a case of the streetlight effect. Violated naming conventions are a lot easier to find than actual defects, so we pretend fixing one helps with the other.

                                                                            2. 3

                                                                              I think it’s even simpler than that: All software is a liability. The more you have of it and the more critical it is to your business, the bigger the liability. As you say, it might be many years before a catastrophic error occurs that causes actual monetary damage, but a sensible management should have amortized that cost over all the preceding years.

                                                                              1. 1

                                                                                I think it was Dijkstra who said something like “If you want to count lines of code, at least put them on the right side of the balance sheet.”

                                                                              2. 2

                                                                                Debt has a predictable cost

                                                                                Only within certain bounds. Interest rates fluctuate and the interest rate that you can actually get on any given loan depends on the amount of debt that you’re already carrying. That feels like quite a good analogy for technical debt:

                                                                                • It has a certain cost now.
                                                                                • That cost may unexpectedly jump to a significantly higher cost as a result of factors outside your control.
                                                                                • The more of it you have, the more expensive the next bit is.
                                                                                1. 1

                                                                                  especially when talking to people with a financial background, I’d avoid them, and words like ‘debt’, like the plague

                                                                                  Interesting because Ward Cunningham invented the term when he worked as a consultant for people with a financial background to explain why code needs to be cleaned up. He explicitly chose a term they knew.

                                                                                  1. 1

                                                                                    And he didn’t choose very wisely. Or maybe it worked at the time if it got people to listen to him.

                                                                                1. 2

                                                                                  I’m among those people who repeatedly claim that atomic commits are the one advantage of monorepos. This article tells me it isn’t a strong reason because this ability is rarely or never used. Maybe, I’m not a big fan of monorepos anyways.

                                                                                  I agree with the author that incremental changes should be preferred for risk mitigation. However, what about changes which are not backwards-compatible? If you only change the API provider, then all users are broken. You cannot do this incrementally.

                                                                                  Of course, changes should be backwards compatible. Do Google, Facebook, and Microsoft achieve this? Always backwards compatible?

                                                                                  1. 6

                                                                                    You’d rewrite that single non-backwards compatible change as a series of backwards compatible ones, followed by a final non-backwards compatible change once nobody is depending on the original behavior any more. I’d expect it to be possible to structure pretty much any change in that manner. Do you have a specific counter-example in mind?
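                                                                                    That recipe is sometimes called “expand, migrate, contract”. A C++ sketch for a hypothetical unit change (all names and the stub body are invented):

```cpp
// Hypothetical "expand, migrate, contract" sequence: one breaking API
// change split into backwards-compatible steps.

// Step 1 (compatible): introduce the new API alongside the old one.
inline double distanceKm(int /*tripId*/) { return 10.0; }  // stub body

// Step 2 (compatible): reimplement the old API on top of the new one so
// existing callers keep working while they migrate (in real code you would
// also mark it [[deprecated]]).
inline double distanceMiles(int tripId) {
    return distanceKm(tripId) / 1.609344;  // km -> miles
}

// Step 3 (compatible): migrate callers to distanceKm one commit at a time.
// Step 4 (the only breaking change): delete distanceMiles once no caller
// depends on the original behavior any more.
```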

                                                                                    1. 5

                                                                                      We used to have an internal rendering tool in a separate repo from the app (rendering tests were slow).

                                                                                      The rendering tool ships with the app! There’s no version drift or anything.

                                                                                      When it was a separate repo you’d have one PR with the changes to the renderer and another to the app. You had to cross-reference both (it’s a lot easier to check changes when you also see usage changes by consumers), then merge on one side, then update the version on the other side, and only then did you end up with a nice end-to-end change.

                                                                                      It’s important to know how to make basically any change backwards compatible, but the cost of doing that compared to the easy change is extremely high, and the process error-prone IMO. Especially when you have access to all the potential consumers.

                                                                                      1. 4

                                                                                        That approach definitely works, but it doesn’t come for free. On top of the cost of having to roll out all the intermediate changes in sequence and keep track of when it’s safe to move on, one cost that I see people overlook pretty often is that the temporary backward compatibility code you write to make the gradual transition happen can have bugs that aren’t present in either the starting or ending versions of the code. Worse, people are often disinclined to spend tons of effort writing thorough automated tests for code that’s designed to be thrown away almost immediately.

                                                                                        1. 3

                                                                                          You don’t have to, at least if you use submodules. You can commit a breaking change to a library, push it, run CI on it (have it build on all supported platforms and run its test suite, and so on). Then you push a commit to each of the projects that consumes the library that atomically updates the submodule and updates all callers. This also reduces the CI load because you can test the library changes and then the library-consumer changes independently, rather than requiring CI to completely pass all tests at once.

                                                                                          1. 3

                                                                                            I’m working in an embedded field where microcontrollers imply tight resource constraints. That often limits how many abstractions you can introduce for backwards-compatibility.

                                                                                            A simple change could be a type which stores “miles” and should now store “kilometers”. If you extend the type instead (backwards compatible), it becomes larger. Multiplied by many uses all across the system, that can easily blow up to a few kilobytes and cross some limits.

                                                                                            Another example: A type change meant that an adapter had to be introduced between two components where one used the old and the other the new type. Copying a kilobyte of data can already cross a runtime limit.

                                                                                            I do admit that microcontrollers are kinda special here and in other domains the cost of abstractions for backwards-compatibility is usually negligible.
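                                                                                            The size cost of the compatible extension can be made concrete (a minimal sketch; the types are invented):

```cpp
#include <cstdint>

// Extending a type for backwards compatibility makes every instance
// bigger: the new field doubles the footprint here.
struct DistanceV1 {
    std::uint16_t miles;
};

struct DistanceV2 {
    std::uint16_t miles;       // kept for backwards compatibility
    std::uint16_t kilometers;  // the compatible extension
};

static_assert(sizeof(DistanceV2) > sizeof(DistanceV1),
              "the compatible extension costs RAM in every instance");
```

                                                                                            With thousands of instances across a system, that difference is exactly the kind of thing that crosses a microcontroller's RAM limit.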

                                                                                        1. 1

                                                                                          in C++ the keyword const does not completely refer to immutability. For instance, you can use the keyword const in a function prototype to indicate you won’t modify it, but you can pass a mutable object as this parameter.

                                                                                          I don’t know C++ enough, but doesn’t const make the object itself immutable, not only the variable holding it? Unlike most languages, e.g. JavaScript, where const only makes the variable constant, not its value. I.e. you can’t call non-const methods on the object, and you can’t modify its fields. At least if it’s not a pointer to an object; it seems that for pointers it’s complicated. I thought this works almost the same way as in Rust, where you can’t modify through non-mut references.

                                                                                          1. 7

                                                                                            I don’t know C++ enough, but doesn’t const make the object itself immutable, not only the variable holding it?

                                                                                            It’s C++ so the answer to any question is ‘it’s more complicated than that’. The short answer is that a const reference in C++ cannot be used to modify the object, except when it can.

                                                                                            The fact that the this parameter in C++ is implicit makes this a bit difficult to understand. Consider this in C++:

                                                                                            struct Foo
                                                                                            {
                                                                                               void doAThing();
                                                                                            };
                                                                                            

                                                                                            This is really a way of writing something like:

                                                                                            void doAThing(Foo *this);
                                                                                            

                                                                                            Note that this is not const-qualified and so you cannot implicitly cast from a const Foo* to a Foo*. Because this is implicit, C++ doesn’t let you put qualifiers on it, so you need to write them on the method instead:

                                                                                            struct Foo
                                                                                            {
                                                                                               void doAThing() const;
                                                                                            };
                                                                                            

                                                                                            This is equivalent to:

                                                                                            void doAThing(const Foo *this);
                                                                                            

                                                                                            Now this works with the same overload resolution rules as the rest of C++: You can call this method with a const Foo* or a Foo*, because const on a parameter just means that the method promises not to mutate the object via this reference. There are three important corner cases here. First, consider a method like this:

                                                                                            struct Foo
                                                                                            {
                                                                                               void doAThing(Foo *other) const;
                                                                                            };
                                                                                            

                                                                                            You can call this like this:

                                                                                            Foo f;
                                                                                            const Foo *g = &f;
                                                                                            g->doAThing(&f);
                                                                                            

                                                                                            Now the method has two references to f. It can mutate the object through one but not the other. The second problem comes from the fact that const is advisory and you can cast it away. This means that it’s possible to write things in C++ like this:

                                                                                            struct Foo
                                                                                            {
                                                                                               void doAThing();
                                                                                               void doAThing() const
                                                                                               {
                                                                                                 const_cast<Foo*>(this)->doAThing();
                                                                                               }
                                                                                            };
                                                                                            

                                                                                            The const method forwards to the non-const one, which can mutate the class (well, not this one because it has no state, but the same is valid in a real thing). The second variant of this is the keyword mutable. This is intended to allow C++ programmers to write logically immutable objects that have internal mutability. Here’s a trivial example:

                                                                                            struct Foo
                                                                                            {
                                                                                               mutable int x = 0;
                                                                                               void doAThing() const
                                                                                               {
                                                                                                 x++;
                                                                                               }
                                                                                            };
                                                                                            

                                                                                            Now you can call doAThing with a const pointer but it will mutate the object. This is intended for things like internal caches. For example, clang’s AST needs to convert from C++ types to LLVM types. This is expensive to compute, so it’s done lazily. You pass around const references to the thing that does the transformation. Internally, it has a mutable field that caches prior conversions.

                                                                                            Finally, const does not do viewpoint adaptation, so just because you have a const pointer to an object does not make const transitive. This is therefore completely valid:

                                                                                            struct Bar
                                                                                            {
                                                                                                int x;
                                                                                            };
                                                                                            struct Foo
                                                                                            {
                                                                                              Bar *b;
                                                                                              void doAThing() const
                                                                                              {
                                                                                                b->x++;
                                                                                              }
                                                                                            };
                                                                                            

                                                                                            You can call this const method and it doesn’t modify any fields of the object, but it does modify an object that a field points to, which means it is logically modifying the state of the object.

                                                                                            All of this adds up to the fact that compilers can do basically nothing in terms of optimisation with const. The case referenced from the talk was of a global. Globals are more interesting because const for a global really does mean immutability: it will end up in the read-only data section of the binary and every copy of the program / library running will share the same physical memory pages, mapped read-only[1]. This is not necessarily deep immutability: a const global can contain pointers to non-const globals and those can be mutated.

                                                                                            In the specific example, that global was passed by reference and so determining that nothing mutated it required some inter-procedural alias analysis, which apparently was slightly deeper than the compiler could manage. If Jason had passed the sprite arrays as template parameters, rather than as pointers, he probably wouldn’t have needed const to get to the same output. For example, consider this toy example:

                                                                                            namespace 
                                                                                            {
                                                                                              int fib[] = {1, 1, 2, 3, 5};
                                                                                            }
                                                                                            
                                                                                            int f(int x)
                                                                                            {
                                                                                                return fib[x];
                                                                                            }
                                                                                            

                                                                                            The anonymous namespace means that nothing outside of this compilation unit can write to fib. The compiler can inspect every reference to it and trivially determine that nothing writes to it. It will then make fib immutable. Compiled with clang, I get this:

                                                                                                    .type   _ZN12_GLOBAL__N_13fibE,@object  # @(anonymous namespace)::fib
                                                                                                    .section        .rodata,"a",@progbits
                                                                                                    .p2align        4
                                                                                            _ZN12_GLOBAL__N_13fibE:     # (anonymous namespace)::fib
                                                                                                    .long   1                               # 0x1
                                                                                                    .long   1                               # 0x1
                                                                                                    .long   2                               # 0x2
                                                                                                    .long   3                               # 0x3
                                                                                                    .long   5                               # 0x5
                                                                                                    .size   _ZN12_GLOBAL__N_13fibE, 20
                                                                                            

                                                                                            Note the .section .rodata bit: this says that the global is in the read-only data section, so it is immutable. That doesn’t make much difference, but the fact that the compiler could do this transform means that all other optimisations can depend on fib not being modified.

                                                                                            Explicitly marking the global as const means that the compiler doesn’t need to do that analysis; it can always assume that the global is immutable because it’s UB to mutate a const object (and a compiler is free to assume UB doesn’t happen). You could pass a pointer to the global to another compilation unit that cast away the const and tried to mutate it, and on a typical OS that would then cause a trap. Remember this example the next time someone says compilers shouldn’t use UB for optimisations: if C/C++ compilers didn’t depend on UB for optimisation then they couldn’t do constant propagation from global constants without whole-program alias analysis.

                                                                                            For anything else, the guarantees that const provides are so weak that they’re useless. Generally, the compiler can either see all accesses to an object (in which case it can infer whether it’s mutated and get more accurate information than const) or it can’t see all accesses to an object (and so must assume that one of them may cast away const and mutate the object).

                                                                                            [1] On systems with MMUs. Sometimes it needs to contain relocations and so may actually be mutable unless you’ve linked with relro support.

                                                                                            1. 1

                                                                                              No, you might have D in mind where const is transitive.

                                                                                            1. 8

Wrong title. This post is fairly interesting and well written, but it doesn’t really explain why we need build systems. Instead, it tells us what build systems do. And while I do see the author trying to push us towards widely used build systems such as CMake, he offers little justification. He mentions that most developers seem to think CMake makes them suffer, but then utterly fails to address the problem. Are we supposed to just deal with it?

                                                                                              For simple build system like GNU Make the developer must specify and maintain these dependencies manually.

Not quite true, there are tricks that allow GNU Make to keep track of dependencies automatically, thanks to the -M option from GCC and Clang. Kind of a pain in the butt, but it can be done.
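A minimal sketch of that trick (file and target names are made up for illustration):

```make
# Ask the compiler to emit dependency files (-MMD) alongside objects,
# and include them so edits to headers trigger rebuilds.
SRCS := $(wildcard src/*.c)
OBJS := $(SRCS:.c=.o)

CFLAGS += -MMD -MP        # writes src/foo.d next to src/foo.o

prog: $(OBJS)
	$(CC) -o $@ $(OBJS)

# On the first build no .d files exist yet; -include silently skips them.
-include $(SRCS:.c=.d)
```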

                                                                                              A wildcard approach to filenames (e.g. src/*.cpp) superficially seems more straightforward as it doesn’t require the developer to list each file allowing new files to be easily added. The downside is that the build system does not have a definitive list of the source code files for a given artefact, making it harder to track dependencies and understand precisely what components are required. Wildcards also allow spurious files to be included in the build – maybe an older module that has been superseded but not removed from the source folder.

                                                                                              First, tracking dependencies should be the build system’s job. It can and has been done. Second, if you have spurious files in your source tree, you should remove them. Third, if you forget to remove an obsolete module, I bet my hat you also forgot to remove it from the list of source files.

                                                                                              Best practice says to list all source modules individually despite the, hopefully minor, extra workload involved when first configuring the project or adding additional modules as the project evolves.

                                                                                              In my opinion, best practice is wrong. I’ll accept that current tools are limited, but we shouldn’t have to redundantly type out dependencies that are right there in the source tree.


                                                                                              That’s it for the hate. Let’s talk solutions. I personally recommend taking a look at SHAKE, as well as the paper that explains the theory behind it (and other build systems as well). I’ve read the paper, and it has given me faith in the possibility of better, simpler build systems.

                                                                                              1. 3

We need to distinguish between build execution (ninja) and build configuration (autotools). The paper is about the execution. Most of the complexity is in the configuration. (The paper is great though 👍)

                                                                                                1. 2

                                                                                                  I have looked at SHAKE and its paper before, but I am curious: what would you like to see in a build system?

I ask because I am building one.

                                                                                                  1. 4

                                                                                                    I’m a peculiar user. What I want (and build) is simple, opinionated software. This is the Way.

                                                                                                    I don’t need, nor want, my build system to cater to God knows how many environments, like CMake does. I don’t care that my dependencies are using CMake or the autotools. I don’t seek compatibility with those monstrosities. If it means I have to rewrite some big build script from scratch, so be it. Though in all honesty, I’m okay with just calling the original build script and using the artefacts directly.

                                                                                                    I don’t need, nor want, my build system to treat stuff like unit testing and continuous integration specially. I want it to be flexible enough that I can generate a text file with the test results, or install & launch the application on the production server.

                                                                                                    I want my build system to be equally useful for C, C++, Haskell, Rust, LaTeX, and pretty much anything. Just a thing that uses commands to generate missing dependencies. And even then most commands can be as simple as calling some program. They don’t have to support Bash syntax or whatever. I want multiple targets and dynamic dependencies. And most of all, I want a strong mathematical foundation behind the build system. I don’t want to have to rebuild the world “just in case”.


Or, I want a magical build system where I just tell it where the entry point of my program is, and it fetches and builds the transitive closure of the dependencies. Which seems possible in some closed ecosystems like Rust or Go. And I want that build system to give me an easy way to run unit tests as part of the build, as well as installing my program, or at least giving me installation scripts. (This is somewhat contrary to the generic build system above.)

                                                                                                    That said, if the generic build system can stay simple and is easy enough to use, I probably won’t need the “walled garden” version.

                                                                                                    1. 2

                                                                                                      Goodness; you know exactly what you want.

                                                                                                      Your comment revealed some blind spots in my current design. I am going to have to go back to the drawing board and try again.

I think a big challenge would be to generate missing dependencies for C and C++, since files can be laid out haphazardly with no rhyme or reason. However, for most other languages, which have true module systems, it should be more feasible.

                                                                                                      Thank you.

                                                                                                  2. 2

                                                                                                    The real reason why globbing source files is unsound, at least in the context of CMake:

                                                                                                    Note: We do not recommend using GLOB to collect a list of source files from your source tree: If no CMakeLists.txt file changes when a source is added or removed, then the generated build system cannot know when to ask CMake to regenerate.

                                                                                                    I heard the same reason is why Meson doesn’t support it.

                                                                                                    1. 2

Oh, so it’s a limitation of the tool, not something we actually desire… Here’s what I think: such glob patterns would typically be useful at link time, where you want the executable (or library) to aggregate all object files. Now the list of object files depends on the list of source files, which itself depends on the result of the glob pattern.

So to generate the program, the system would fetch the list of object files. That list depends on the list of source files, and should be regenerated whenever the list of source files changes. As for the list of source files, well, it changes whenever we actually add or remove a source file. As for how we should detect that, well… this would mean generating the list anew every time and seeing if it changed.

Okay, so there is one fundamental limitation here: if we have many, many files in the project, using glob patterns can make the build system slower. It might be a good idea in this case to fix the list of source files. Now, I still want a script that lists all available source files so I don’t have to add each one manually every time I add a new file. But I understand the rationale better now.

                                                                                                      2. 1

                                                                                                        First, tracking dependencies should be the build system’s job. It can and has been done.

                                                                                                        see: tup

                                                                                                        Second, if you have spurious files in your source tree, you should remove them.

Conditionally compiling code on the file level is one of the best ways to do it, especially if you have some kind of plugin system (or class system). It’s cleaner than ifdefing out big chunks of code IMO.

                                                                                                        Traditionally, the reason has been because if you want make to rebuild your code correctly when you remove a file you have to do something like

SRCS := $(wildcard *.c)

# Stamp file holding a variable’s value, so a target can depend on it
# changing (e.g. when a source file is removed).
.%.var: FORCE
	@echo $($*) | cmp -s - $@ || echo $($*) > $@

FORCE:

my_executable: $(SRCS) .SRCS.var
	$(CC) $(LDFLAGS) -o $@ $(SRCS) $(LDLIBS)

                                                                                                        which is a bit annoying, and definitely error-prone.

                                                                                                        Third, if you forget to remove an obsolete module, I bet my hat you also forgot to remove it from the list of source files.

                                                                                                        One additional reason is that it can be nice when working on something which hasn’t been checked in yet. Imagine that you are working on adding the new Foo feature, which lives in foo.c. If you then need to switch branches, git stash and git checkout will leave foo.c lying around. By specifying the sources you want explicitly, you don’t have to worry about accidentally including it.

                                                                                                        1. 1

                                                                                                          Conditionally compiling code on the file level is one of the best ways to do it, especially if you have some kind of plugin system (or class system). It’s cleaner that ifdefing out big chunks of code IMO.

                                                                                                          Okay, that’s a bloody good argument. Add to that the performance implication of listing every source file every time you build, and you have a fairly solid reason to maintain a static list of source files.

                                                                                                          Damn… I guess I stand corrected.

                                                                                                      1. 2

                                                                                                        The first trick here is that function parameters evaluation order is unspecified, meaning that new Widget might be called, then priority(), then the value returned by new Widget is passed to std::shared_ptr(…)

I know the order of evaluation of function parameters is unspecified, but I’ve never heard of the compiler being allowed to skip around between multiple function calls, evaluating a parameter here, a parameter there… I don’t actually own EffC++; can someone verify this is true?

                                                                                                        In other words, my understanding is that the compiler will first fully evaluate one parameter of processWidget, then the other. The order may be unspecified, but after the “new” operator we know the next call will be to the shared_ptr constructor. Thus there’s no chance of a leak … as I understand it.

                                                                                                        1. 6

                                                                                                          I was surprised at first, but after double-checking, the boost docs and Herb Sutter back it up. Wild.

                                                                                                          1. 3

                                                                                                            Wild indeed. I still don’t want to admit this is true, 🙈 so I’m desperately seizing on the disclaimer at the top of the Sutter article:

                                                                                                            This is the original GotW problem and solution substantially as posted to Usenet. See the book More Exceptional C++ (Addison-Wesley, 2002) for the most current solution to this GotW issue. The solutions in the book have been revised and expanded since their initial appearance in GotW. The book versions also incorporate corrections, new material, and conformance to the final ANSI/ISO C++ standard.

                                                                                                            The article isn’t dated, but it must be from before 2002. I wonder if this part of the C++ spec has changed since then, considering how unintuitive this behavior is. 🤞🏻😬

                                                                                                            1. 2

                                                                                                              So C++ is non-strict?

                                                                                                            2. 3

                                                                                                              Imagine an arithmetic expression like (a+b)*(c+d). The two additions are independent and can be computed in parallel. If the CPU has multiple ALUs for the parallel computation and there are enough registers to hold the data, the compiler should interleave the evaluation to enable instruction-level parallelism.

                                                                                                              Such an optimization can result in this „skipping around“.

                                                                                                            1. 2

                                                                                                              My very first FP language was Scala, which has first class objects. Granted this is one of the rare ones, but OO FP does exist.

                                                                                                              1. 3

The article is all about how FP and OO co-exist without issue and there is no “vs”. I would go one step further and say there is no “OO language” or “FP language” – every language with a feature that can simulate a closure can express both paradigms at will.

                                                                                                                1. 2

                                                                                                                  Scala had the explicit goal to combine FP and OO nicely.

                                                                                                                1. 18

                                                                                                                  As the leading architect of a project, I asked some developers what they thought of my „leading“ there. One suggestion was that I could have been more confident.

                                                                                                                  I believe it is a human thing to long for confident leaders. Developers are no exception. The „strong opinions, weakly held“ meme is a symptom. It isn’t generally good or bad.

                                                                                                                  With the developers we concluded that I was roughly as confident as the circumstances permitted.

                                                                                                                  1. 5

Oh yeah, definitely. I’ll add that people also want leaders with prestige (or high status, if you will).

                                                                                                                    There is one negative interpretation and a positive one that I oscillate between:

                                                                                                                    1. People are bad with uncertainty, so it’s not received well if leadership says “we will do X, and it has a 75% chance of success”. Or worse: “we want X, but it’s not a strongly held opinion, feel free to disagree”

                                                                                                                    2. Part of leadership’s job is to create clarity and it’s necessary to just say “we’re sure about this decision, let’s go”. That doesn’t necessarily imply skewing the facts. But it helps tremendously to not have decision-makers that seem confused and fluffy and all over the place. Having insecure managers is terrible and not helpful at all.

                                                                                                                  1. 3

                                                                                                                    Something I really want is a UI (probably browser-based) that will let you type languages like pikchr on the left and render it on the right automatically.

                                                                                                                    I would use it for graphviz also, etc.

                                                                                                                    Does this already exist?

                                                                                                                    I wrote something sort of like this (motivated by R plotting) several years ago, but there are a bunch of things about it that aren’t great: https://github.com/andychu/webpipe

                                                                                                                    It uses inotify-tools connected to a “hanging GET” to auto-refresh on save.

                                                                                                                    1. 2
                                                                                                                      1. 1

                                                                                                                        Yes that’s the right idea! I think the button can replaced with a keyboard shortcut easily.

                                                                                                                        I would like something that generalizes to any tool, maybe with some kind of CGI interface. It looks like this is in Java, and I suspect https://pikchr.org/home/pikchrshow is in Tcl since it’s from Dr. Hipp. I probably would hack on the Tcl version first, although Python or PHP would also work.

                                                                                                                      2. 1

                                                                                                                        Pikchr has an inbuilt side-by-side sandbox functionality which makes the editing experience a lot easier.

                                                                                                                        1. 1

                                                                                                                          Ah OK I see this, it’s pretty close to what I want. I would like to enter a shell command and use it for any such tool! It can probably be hacked up for that purpose

                                                                                                                          https://pikchr.org/home/pikchrshow

                                                                                                                      1. 17

                                                                                                                        Am I missing something from this story?

                                                                                                                        She told me she knew I was busy with work and the app helped make certain she was on my mind and continued to keep communication high between us.

I thought the whole point of this is that you’re busy/distracted/mentally engaged/what have you with work, and thus she isn’t on your mind?

                                                                                                                        I’m also still not even sure what it’s supposed to do. What does “automate your text messages” even mean?

                                                                                                                        Does it just send random non committal messages to people? Or canned responses to messages?

                                                                                                                        For anything outside of the “I’m driving and will see this when I stop” type auto responses I don’t see how “automation” is actually useful?

                                                                                                                        1. 5

                                                                                                                          This starts conversations with people by sending the starting text, it doesn’t have the full conversation.

                                                                                                                          Some people, especially those that grew up with ubiquitous phones in school, see texting frequently as how you show you care about someone. This ensures that if you haven’t sent a text or called in a while, you start something automated to jump start the conversation.

                                                                                                                          1. 2

                                                                                                                            Perhaps it’s not clear enough from the story, but, for my use case, it allows me to provide my significant other with a quick “bid for affection” without breaking my focus; a notification pops up and I simply swipe it away and the automation happens. In other cases where I want more of a connection or dialog, I will configure the app to remind me at more convenient times and ask more open-ended questions.

                                                                                                                            1. 2

                                                                                                                              So it is more of a texting reminder with builtin suggestions?

                                                                                                                              The link to the app does not work for me and the article does not really describe the app itself.

                                                                                                                              1. 2

In a nutshell, yes. Depending on how you configure it, its behavior changes, and it also handles other communication methods. The story purposefully focuses on my experience with indie app development and not on the details of the app itself. The app is currently in beta, so it’s limited to 47 countries/regions to better support this, but feel free to let me know what country you are in or send me a message.

                                                                                                                          1. 1

                                                                                                                            LoR monorepo

                                                                                                                            Originally „monorepo“ meant one repo for the whole company. Here and also at my company people use the term for „one project repo“ now. Is that common?

                                                                                                                            It seems the term is getting useless like „big data“.

                                                                                                                            1. 3

                                                                                                                              I think the term generally means ‘one repo containing multiple things that could be built independently’. For example, there’s an LLVM monorepo that contains LLVM, clang, lld, libc++, and so on even though libc++ is a completely separable component and folks that work on it don’t need anything else from the LLVM repo.