Threads for Grive

  1. 1

    This article would be better if it expressed those issues in terms of affordance.

    In C, the standard functions are, most of the time, not good. That’s why most projects do not use them directly, and instead use wrappers around them. The question is not whether the standard is good enough, but whether it is possible at all to write a simple and safe function in C to do this job.

    For this particular problem, yes it is possible.

    The next question, then, is whether the language shepherds you toward creating this safe and simple interface. I think for C the answer is ‘no’, and that’s the issue with this language. Someone with enough experience might think, most of the time, to create this wrapper. But there is nothing in the language pushing toward it, or, even better, completely forbidding the use of the unsafe and impractical version. That ‘most of the time’ translates into many projects not using the safer and more practical wrapper.

    Some people use non-standard preprocessor directives for this job: they mark the unsafe or impractical standard function as deprecated and force the use of the wrapper within the current project.

    Any C project starting today should first try to switch to a newer language. If that is not possible, it should start with a very comprehensive boilerplate skeleton consisting of many, many wrappers that replace most of the standard library.

    1. 14

      Most people, even strident advocates of rebase

      Again with Fossil marketing, they paint people thinking differently as emotional and noisy, unlike their perfectly rational argumentation.

      Bisecting is made more difficult by having to deal with intermediate commits. Those commits can break, then fix, the build and/or tests before being integrated. Since the main tree jumps straight to the end of the series, those breakages won’t be seen until a bisect is attempted.

      When writing a series of commits, each individual one needs to build and pass tests. Since it will always happen that someone makes a mistake, the commits need to be fixable individually, and fixing them locally would be better.

      Not having a local draft history to improve and work on is just an anti-feature that makes Fossil a bad VCS. Reading another comment here about having to use two Fossil repositories, one for local work and one shared, is a good sign that something is missing. It’s also pretty hilarious considering the complacent tone of the article. I mostly feel for the developers forced to deal with this team’s opinions on software development.

      1. 6

        Why use C as a target at all? The code output is unreadable and useless. It’s also subject to interpretation by the various compilers.

        LLVM IR seems a better idea.

        1. 5

          Worse, it invites edits that could invalidate the proofs.

          1. 1

            It’s also subject to interpretation by the various compilers.

            This seems like a misconception. If your C output follows the C standard, it is not open to interpretation; if compilers don’t follow the standard, that is a bug in the compiler.

          1. 15

            “It tests CS fundamentals.” “It tests reasoning through a new problem.” These answers are contradictory

            They aren’t really; they’re alternative answers to the same question, and they both describe a useful outcome. Put together, they give you an idea of whether the interviewee knows the CS fundamentals and/or can figure them out as they go along.

            The interview at one of the clients I work with involves implementing a binary tree (just the basics: here’s the API, implement adding a new element to it, not even lookups). People we interview generally know what a binary tree is, and usually they have no trouble implementing it; it’s trivial enough that you shouldn’t have a problem with it even if you last saw it at university 10 years ago. We recently had a candidate who was a self-taught programmer with a mathematics degree. We started off showing her a linear search function to code review. She described it well and suggested improvements. The list in the example was sorted, and we asked if the search could be improved given the patterns in the example data. “Well, you could split the list into two halves…” and she described binary search. “Cool, does that algorithm have a name you’re familiar with?” “No, I don’t think so.” Okay then.

            So then there’s the binary tree. She looks at the drawing of it, describes the properties, then implements it in one go, correctly, calling the Node a Knot (in Polish they are the same word). Again, she had no idea whether this data structure had any specific name. Naturally she got the job.

            Linked lists may be useless in all applications nowadays (given that CPU caches generally make them an inferior choice in almost every scenario) (bad generalization, see comments below), but they do have value in interviews imho: they tell you whether the candidate knows what data structures and algorithms lie beyond the APIs they use in their code every day, and if they don’t, even better! It’s easy enough to figure out and write on the spot, and even if what they write isn’t fully correct, seeing the process and discussing the bugs is still valuable.

            That’s actually a bit off-topic from this article, which is well worth reading in full indeed, and makes sense if you think about it. I think the Spolsky quote actually nails it, even without the historical context: people would still prefer (or think that they’d prefer) to hire Real Programmers [tm] who can deal with pointers rather than “script jocks” who just copy code around.

            1. 20

              Linked lists may be useless in all applications nowadays (given that CPU caches generally make it an inferior choice in almost every scenario)

              This is not the first time I heard this (I recall a specific rust tutorial expressing this in particular), but this is such a weird take to me.

              I work on high-performance datapaths in software-defined networking. Those are highly optimized network stacks using the usual low-latency/high-IO architectures. In all of them, linked lists were fundamental as a data structure.

              One example: a thoroughly optimized hash table using cuckoo hashing, with open addressing and concurrent lookup. It implements buckets as a linked list of arrays of elements, each cell taking exactly one cache line. The linked list is so fundamental a construct there that it is arguably eclipsed by the other elements.

              Another example: a lockless list of elements to reclaim after the next RCU sync round. Either the RCU implementation allocates arrays of callbacks plus arguments to call once synchronized, or the garbage collector consists of a list of nodes embedded in other objects, and the reclaiming thread jumps from object to object to free them. The array-based approach has issues: each thread needs to reallocate a growing array of callbacks, requiring bigger and bigger spans of contiguous memory when attempting to free elements.

              Another example: when doing TSO on TCP segments, the segments are linked together and passed to the NIC to offload the merging. To avoid copying segments of data around during forwarding, they must be linked together.

              There are countless other examples where the high-performance requirement actually makes the linked list a good solution. It avoids allocating large swaths of contiguous memory and allows being much more nimble in how memory is managed.

              That being said, at the source of those elements (when new packets are received), everything must of course be done to pack the maximum of useful data for the fastpath into contiguous memory (using hugepages). But as you go higher in the stack this becomes more and more untenable. Then simple structures allow building ancillary modules for specific cases.

              I don’t see how linked lists will ever become completely useless. Maybe I’m just too far gone in my specific branch of software?

              1. 11

                Software is full of rules of thumb, and some of them are true. I’m pretty sure someone wrote an article years ago on mechanical sympathy pointing out that a CPU can chew through contiguous memory a bunch faster than it can chase unknown pointers, and since then “everyone knows” that linked lists are slow.

                The reality is always more complicated and your experience optimizing real world systems definitely trumps the more general recommendations.

                1. 1

                  I’m pretty sure someone wrote an article years ago

                  Bjarne Stroustrup did!

                2. 6

                  Linked lists make sense in reality because solutions to real problems are bigger than benchmarks. Complicated custom data structures with linked lists threaded through them are everywhere, but nobody looks at them when they’re looking to make a blanket statement about how fast linked lists are. (Such statements seem to rest on shaky foundations anyway; linked lists aren’t a thing that is fast or slow, but a technique that is more or less applicable.)

                  I use linked lists mainly for bookkeeping where objects are only iterated occasionally, but created and deleted often, where they appeal mostly because they avoid headaches; their constant factors may not be great, but they scale to infinity in an uncomplicated way (and if I’m creating or deleting an object I’m allocating something in any event). I don’t often see linked list operations in flame graphs, and when I do it’s normally accidentally-quadratic behaviour, so I replace them with things that aren’t list-like at all.

                  Final idle thought: not scientific at all, but I note that linked list nodes can live inside the element itself without problems, while array elements are often indirected through an extra pointer to deal with move issues. While it doesn’t absolutely have to, code using such arrays often looks like it suffers from the same data dependency issues linked lists tend to.

                  1. 2

                    People like to make performance claims without ever generating any numbers to back up their assertions.

                    1. 2

                      True, but the same is true for “linked lists are faster at insertions/deletions”. I brought it up because I heard it time and time again in my university days, and now time and time again at interviews etc., always at a scale where it doesn’t matter or where it is straight up incorrect. I’m sure they’ll always (or: for the foreseeable future) have their place, even if that place is not the common case. All the more reason to understand them even if you won’t often use them.

                      1. 1

                        There seems to be a long tradition of this kind of garbage. Decades ago I used to hear things like “malloc is slow” from my senior colleagues, with nobody being able to explain to me (the young punk) what led to the assertion.

                    2. 7

                      people would still prefer (or think that they’d prefer) to hire Real Programmers [tm] who can deal with pointers rather than “script jocks” who just copy code around.

                      Real Programmers™ is already a fallacy, though, and one that tends to amplify a team’s issues by hiring a group of people with very similar strengths, flaws, and viewpoints.

                      Put together, they give you an idea if the interviewee knows the CS fundamentals and/or can figure them out as they go along.

                      This may be true for some questions, like your binary search example, but I don’t think it’s true of the majority of linked list and pointer questions asked in an interview setting. Linked list cycle finding (i.e. tortoise and the hare) comes to mind. A lot of people in the early 2010s defended it as something that tested either whether you knew the fundamentals or whether you could piece them together, but it has been pointed out half to death by now that the algorithm itself wasn’t developed until years after cycle finding was a known problem that people were trying to solve. Almost everyone who passed a tortoise-and-hare question either knew it in advance or was given hints by the interviewer that led them there, without the interviewer believing they’d given it away (which is a pretty fraught and unstandardized thing).

                      In general, I think this is a high ideal that is really hard to design for, and people convince themselves they have achieved it at far too high a rate. When I first started interviewing for software jobs (~2010), I learned quickly that if you knew the answer or had seen the question before, the right thing to do was to pretend you hadn’t and perform discovering the answer. This is a problem with nearly all knowledge-based interview questions; there will always be a degree to which you’re testing whether the candidate is connected enough to know ahead of time what the questions will look like and what kinds of things will be tested.

                    1. 7

                      The main points are stability, portability, and obsolescence, and how they are a struggle.

                      But then the author moves to the latest macOS? Where is the stability? Apple is famous for breaking compatibility and biting the bullet whenever they can push a new proprietary API to ensnare devs (Metal?). Where is the portability? (Apple only cares about the hardware they sell, of course.) And where is the lack of planned obsolescence? That is the whole long-term strategy of Apple: tech as fashion and short hardware update cycles.

                      So this is why the author leaves the Linux desktop? He could run a recent-ish notebook, with ARM or x86 cores, and Linux would be perfectly fine. None of those issues would apply then.

                      This is a weird take.

                      1. 12

                        But then the author moves to the latest MacOS? Where is the stability?

                        On the user side of things :). A few months ago I got one of them fancy M1 MBPs, too, after not having used a Mac since back when OS X was on Tiger. Everything that I used back in 2007 still worked without major gripes or bugs. With a few exceptions (e.g. mutt) the only Linux programs that I used back in 2005 and still worked fine were the ones that were effectively abandoned at some point during this period.

                        Finder, for example, is still more or less of a dumpster fire with various quirks but they’re the same quirks. Nautilus, uh, I mean, Files, and Konque… uh, Dolphin, have a new set of quirks every six months. At some point you eventually want to get off the designer hobby project train.

                        In this sense, a lot of Linux software isn’t really being developed, as in, it doesn’t acquire new capabilities. It doesn’t solve new problems, it just solves the old problems again (supposedly in a more “usable” way, yeah right). It’s cool, I don’t want to shit on someone’s hobby project, but let’s not hold that against the people who don’t want to partake.

                        (Edit: to be clear, Big Sur’s design is hot garbage and macOS is all kinds of annoying and I generally hate it, but I wouldn’t go back to dealing with Gnome and GTK and Wayland and D-Bus and all that stuff for the life of me, I’ve wasted enough time fiddling with all that.)

                        1. 10

                          At some point you eventually want to get off the designer hobby project train.

                          THIS, so much.

                          1. 1

                            Well, just step off then?

                            Unlike Apple, you have some options with open source. Don’t like the latest Gnome craze? Get MATE, which is basically Gnome 2. There are lots of people who keep old window managers and desktop environments alive and working. The Ubuntu download page lists a couple, but many more can be installed with a few commands.

                            I think I have been running the same setup for six or seven years now, no problem at all.

                            1. 3

                              If you try to grab a Gnome 2 box, you’ll find that Mate is pretty different even if the default screen looks about the same. Not because of Mate but because of GTK3 general craziness. Sure, the panels look about the same, but as soon as you open an application you hit the same huge widgets, the same dysfunctional open file dialog and so on. It’s “basically the same” in screenshots but once you start clicking around it feels pretty different.

                              If all you want is a bunch of xterms and a browser, you have a lot of options, but a bunch of xterms and a browser is what I used back in 2001, too, and they were already obsolete back then. The world of computing has long moved on. A bunch of xterms and a browser is what many, if not most, experienced Linux users still use, simply because it’s either that or the perpetual usability circlejerk of the Linux desktop. I enjoy the smug feeling of green text on a black background as much as anyone, but at some point I kinda wanted to stop living in the computing world of the early 00s.

                              I’ve used the same WindowMaker-based setup for more than 10 years, until 2014 or so, I think. After that I could technically keep using it, but it was mostly an exercise in avoiding things. I don’t find that either fun or productive. I kept at it for 6+ years (basically until last year) but I hated it.

                              (Edit: imho, the options are really still the same that they were 15 years ago: Gnome apps, KDE apps, or console apps and an assortment of xthis and xthat from the early/mid-90s – which lately mostly boils down to “apps built for phones” and “apps built for the computers of the Hackers age”. Whether you run them under Gnome, KDE, or whatever everyone’s favourite TWM replacement is this year doesn’t make much of a difference. Lots of options, but not much of a choice.)

                      1. 4

                        And nowadays, all arguments that say that indexes should be 0-based are actually arguments that offsets are 0-based, indexes are offsets, therefore indexes should be 0-based. That’s a circular argument.

                        This is not a circular argument.

                        The rest seems mostly straw men, without any argument for 1-based indexing.

                        1. 6

                          This sounds a bit like it’s going to take the course of USB 3.0, which turned from an improved USB into an abomination that tried to do everything. While that certainly doesn’t have only downsides, I’m not sure it’s a good development.

                          1. 2

                            Yeah, reading “5G ready” on an antenna will hardly tell you which parts of the standard are supported. Will you be able to connect to the short-range frequencies (30-300 GHz)? The unlicensed spectrum (5-6 GHz)? When buying a phone, for example, this kind of info will usually be missing, just like “USB-C” does not tell you exactly what the port supports.

                            On the other hand, industry features (sidelinking, V2X, etc.) will change a lot of things, not only for corporations but also for citizens and society. Whether for good or bad remains to be seen.

                          1. 1

                            Nice! I love the controls, very usable. I will probably use it actually.

                            I had written something a little similar: shimpr, a POSIX shell presentation tool. For some things, like managing input and terminal geometry, it seems to reduce to a similar solution, but you avoided extended tput, for example. I actually wasn’t aware of its limitations.

                            And good to know that SIGWINCH is supposed to become POSIX.

                            1. 4

                              TL;DR: according to neuroimaging, programming looks very much like talking.

                              1. 4

                                Thank you, I find this article to be more interesting.

                                To see what happens in the brain during this process, the team used a functional magnetic resonance tomograph. The image data clearly showed that the test subjects’ left brain areas were activated, which are mainly associated with speech comprehension. “To our surprise, we could not observe any activity in the direction of mathematical or logical thinking,” said the researcher summarising the results. “Our research suggests that speech understanding plays a central role in programming. The renowned Dutch computer scientist Edsger W. Dijkstra already expressed this assumption in the 1980s,” Apel adds.

                                It’s the second time I’ve heard about this idea. Anecdotal evidence around me seems to support it. I know very competent programmers who had mental blocks about mathematics. All the competent programmers I know, however, are talented at writing and expressing themselves.

                                In my opinion, literacy and linguistic talent are a better predictor of the quality of a person’s programming output. Understanding algorithms and data structures is an important part, but even more important is communicating with your peers, and writing good-quality code is all about communicating intent to the other people on your team.

                                1. 1

                                  Indeed, I agree about program comprehension; however, I wonder if the same can be said about writing a new program. That is, if you are asked to implement a new algorithm from scratch, would it still light up the speech-processing parts of the brain? Going in reverse: if one is trying to understand a mathematical paper, would the logical parts of the brain still light up?

                                  1. 1

                                    In my opinion literacy and linguistic talent is a better predictor about the quality of the programming output of a person. I think because understanding algorithms and data structures is an important part, but even more important is communicating with your peer, and writing good quality code is all about communicating intent to other people in your team.

                                    Ultimately, programming is a form of communication, and art. Knuth argues as much here. You need to be able to understand your own thoughts well enough to put them down in a medium, which is fundamentally a form of art. It’s a dialogue with yourself, a dialogue with the platonic idea-space, and a dialogue with the computer, its systems, and the other people who read the code. There’s room for process, but that does not deny that it fundamentally has an artistic component that we as a field constantly ignore and stifle. Attempts to ‘codify’ programming will fail without that acknowledgement.

                                1. 1

                                  The initial and recurring issue of VLIW is needing a smart enough compiler.

                                  I have worked on another project using VLIW. On paper it is very interesting: you can reach theoretically very high FLOPS with rather low power, and the architecture itself seems pretty simple. You can take an older open architecture and adapt it (the project I worked on was also adapted from SPARC, like this one).

                                  But you need to have a lot of firepower on the software side, and have a real strategy to get a good compiler. I have yet to see it succeed.

                                  All those VLIW projects could maybe pool their efforts and maintain a common optimization pass specialized for VLIW in open-source compilers.

                                    1. 16

                                      Unfortunately, the comparison is written in such a clearly biased way that it probably makes Fossil sound worse than it is (I mean, you wouldn’t need to resort to weasel words and name-calling if Fossil were a valid alternative whose benefits spoke for themselves… right?). Why would anyone write like that if their aim is to actually promote Fossil?

                                      1. 5

                                        The table at the top is distractingly polemic, but the actual body of the essay is reasonable and considers both social and technical factors.

                                        My guess is that author is expecting the audience to nod along with the table before clicking through to the prose; it seems unlikely to be effective for anyone who doesn’t already believe the claims made in the table.

                                        1. 4

                                          This is what’s turned me off from even considering using it.

                                        2. 12

                                          “Sprawling, incoherent, and inefficient”

                                          Not sure using a biased comparison from the tool’s author is useful. Even then, the least they could do is use factual language.

                                          This is something that always gripes me when reading the recurring Fossil evangelism: git criticism is interesting, and a different view should give perspective, but the Fossil author always uses this kind of language, which makes it useless. Git adapts to many kinds of teams and workflows. The only thing I take from his comparison is that he never learned to use it and does not want to.

                                          Now, there is also a very valid criticism of git in there: it is not a turn-key solution; it needs polish, and a team needs to build a specific work organization around it. That’s a choice for the project team to make. Fossil wants to impose its own method, which of course gives a more architected, polished finish, but makes it impossible to use in many teams and projects.

                                          1. 2

                                            Maybe they don’t care about widely promoting fossil and just created that page so people stop asking about a comparison?

                                          2. 5

                                            One of the main reasons for me for not using Fossil is point 2.7 on that list: “What you should have done vs. What you actually did”. Fossil doesn’t really support history rewrites, so no “rebase” which I use nearly daily.

                                            1. 2

                                              This is also a problem with Git. Like you, I use rebase daily to rewrite history, when that was never really my objective; I just want to present a palatable change log before my changes are merged. Whatever happens before that shouldn’t require something as dangerous as a rebase (and force push).

                                              1. 4

                                                 I don’t think it makes any sense to describe rebases as ‘dangerous’, nor to say that you want to present a palatable change log without rewriting history, unless you’re saying you want the VCS to help you write nicer history in the first place?

                                                1. 2

                                                  Rebase is not dangerous. You have the reflog to get back to any past state if needed, you can rewrite as much as you need without losing anything.

                                                   Now, I see only two ways of presenting a palatable change log: either you write it perfectly the first time, or you correct it afterwards. I don’t see how any VCS would enable the first. If you use a machine to try to present it properly (as Fossil seems to strive to do), you will undoubtedly hit limitations, forcing the dev to work within those limitations to write something readable and meaningful to the rest of the team. I very much prefer direct control over what I want to communicate.

                                                  1. 2

                                                    I think whether rebase is dangerous depends on the interface you are using Git with. The best UI for Git is, in my opinion, Magit. And when doing a commit you can choose from a variety of options, one of them being “Instant Fixup”.

                                                     I often use this when I discover that I forgot to check in a new file with a commit, or something like that. It basically adds a commit, starts an interactive rebase, reorders the commits so that the fixup commit is next to the one being fixed, and executes the rebase pipeline.

                                                     There are other similar options for committing, and Magit makes them straightforward. So much so, indeed, that I have to look up how to do it manually when using the Git CLI.

                                                    1. 4

                                                       I prefer to work offline. Prior to Git I used SVK as a frontend for SVN since it allowed offline use. However, once Git was released I quickly jumped ship because of its benefits, i.e. a real offline copy of all the data and (for me) better functionality.

                                                      In your linked document it states “Never use rebase on public branches” and goes on to list how to use rebase locally. So, yes, using rebase on public branches and force-pushing them is obviously only a last resort when things went wrong (e.g. inadvertently added secrets).

                                                      Since I work offline, often piling up many commits before pushing them to a repo on the web, I use rebase in cases when unpushed commits need further changes. In my other comment I mentioned as example forgotten files. It doesn’t really make sense to add another commit “Oops, forgotten to add file…” when I just as easily can fixup the wrong commit.

                                                      So the main reason for using rebase for me is correcting unpushed commits which I can often do because I prefer to work offline, pushing the latest commits only when necessary.

                                                      1. 2

                                                        In addition to what @gettalong said, keep in mind the original use-case of git is to make submitting patches on mailing lists easier. When creating a patch series, it’s very common to receive feedback and need to make changes. The only way to do that is to rebase.

                                                  1. 4

                                                    We need to stop writing drivers in such ad-hoc and low-level ways. This is the second large effort to reverse-engineer Mali, after Lima, and the primary artifacts produced by these projects are piles of C. Long-term maintainability can only come from higher-level descriptions of hardware capabilities.

                                                    I’m a little frustrated at the tone of the article. I was required to write this sort of drivel when I was applying for grants, but by and large, the story of how I wrote a GPU driver is simple: I was angry because there wasn’t a production-quality driver, so I integrated many chunks of code and documentation from existing hard-working-but-exhausted hackers to make an iterative improvement. The ingredients here must have been quite similar. The deliberate sidelining of the Lima effort, in particular, seems rather rude; Panfrost doesn’t mark the first time that people have been upset with this ARM vendor refusing to release datasheets, and I think that most folks in the GPU-driver-authorship world are well aware of how the continual downplaying of Lima is part of the downward spiral that led to their project imploding.

                                                    I don’t think I ever hid that r300, r500, and indeed radeonhd, from the same author as Lima, were all big influences on my work, and that honest acknowledgement of past work is the only way to avoid losing contributors in the future.

                                                    1. 16

                                                      I’m a little frustrated at the tone of the article. I was required to write this sort of drivel

                                                      Is it not possible that the tone is a true, unforced reflection of the author’s enthusiasm? That’s how I read it. Maybe that’s just naive of me.

                                                      1. 9

                                                        Long-term maintainability can only come from higher-level descriptions of hardware capabilities.

                                                        Is there any source you can provide indicating that this would actually work? From my understanding, creating meaningful abstractions over hardware is an extraordinarily tough problem to solve. For example, device trees work as a general-purpose description of hardware, but they still need a lot of kernel-space driver fluff to get anything talking correctly. What kind of higher-level description do you think would work in this space?

                                                        FYI: I have zero experience in graphics driver land so I don’t actually know anything about this domain. ¯\_(ツ)_/¯

                                                        1. 1

                                                          I read it not as “assembling a driver from common code components and interfaces in c”, but as “write a high-level description of the hardware and api, from which implementations in C or Rust or whatever can be generated”.

                                                          But maybe we’re both reading it wrong :)

                                                        2. 4

                                                          Isn’t the primary artefact produced a set of instructions able to drive a GPU? I would think that comes first, before the piles of C.

                                                          Long-term maintainability can only come from higher-level descriptions of hardware capabilities.

                                                          This seems like an extraordinary claim. “Can only come from” is a very strong statement.

                                                          1. 4

                                                            GPU drivers are not required to have a single shape. Indeed, they usually have whatever shape is big and complex enough to fit their demanded API. The high-level understanding of GPUs is what allowed the entire Linux GPU driver tree to be refactored around a unified memory manager, and what allowed the VGA arbiter to be implemented. High-level descriptions, in particular datasheets, are already extremely valuable pieces of information which are essential for understanding what a driver is doing. At the same time, the modern GPU driver contains shader compilers, and those compilers are oriented around declarative APIs which deal with hardware features using high-level descriptions of capabilities.

                                                            Let me show you some C. This module does PCI ID analysis and looks up capabilities in a table, but it’s done imperatively. This module does very basic Gallium-to-r300 translation for state constants, but rather than a table or a relation, it is disgustingly open-coded. (I am allowed to insult my own code from a decade ago.) I won’t lie, Panfrost has tables, but this is, to me, only the slightest of signs of progress.

                                                            1. 3

                                                              Ah, I see what you mean.

                                                              Higher-level description means genericity, and genericity can lead to bloated code that tries to deal with the future at the expense of the present. Keeping the right balance between a description that is high-level enough and one that is low-level enough to be efficient is a challenge.

                                                              Helping the maintenance effort by lending a hand to refactor with a fresh mindset is my naive view of how to fight this, but I know this falls prey to the rewrite fallacy.

                                                        1. 3

                                                          I have used Snowden’s argument (referred to in this comment), however this is sometimes ineffective when the person is not politically interested in “solidarity” or resistance against oppression.

                                                          For the more individualistic people, I’ve switched to using “saying that you do not care about privacy because you have nothing to hide is like saying you don’t care about advertising because you have nothing to buy”. One of the core issues with privacy, one that affects everyone, is that privacy-invading services are built to coerce and manipulate opinion on a large scale. To some, this is acceptable when dealing with mundane stuff (what you will buy for your weekly groceries), but I’ve yet to encounter someone who thought it was okay when going into the political / social realm. Recent examples were pretty clear that the end-game here is not just advertising, but also vote manipulation and political destabilization of democracies.

                                                          When I say that it is like “saying that you do not care about advertising because you have nothing to buy”, the point is not that it is the same as ads (for better or worse), because many people are not at all opposed to seeing ads. It’s more about the few people saying that they are not themselves impacted by advertising. It affects everyone, even those who are aware of its effects. This is overt manipulation. The issue with losing privacy is that it gives ammunition to people trying to do exactly the same thing as advertising, but with other subjects, and even if you believe that it won’t affect you, everyone can fall prey to disinformation.

                                                          1. 4

                                                            Haha, I did exactly this, minus the coloring, for a presentation to a client. I only had SSH access to my machine and was demoing CLI tools, so I was too lazy to change context (and the slides could stay in a corner of the screen with the tool taking the other side).

                                                            Without much formatting, the script itself is very short; it was fun. Not sure the client was impressed, however! But the presentation went well (at least the product was exceeding expectations).

                                                            I’m not sure there are many reasons to do it in bash instead of POSIX shell, however. The only thing lacking might be local variables in functions, but many shells still implement them.

                                                            1. 3

                                                              Local variables (or more specifically the ‘local’ keyword) can be mimicked in POSIX shell by wrapping the code in ( ) at the expense of a sub-shell.


                                                              (
                                                                  var=value            # hypothetical assignment; local to the sub-shell
                                                                  printf '%s\n' "$var"
                                                              )
                                                              # '$var' will be unset.
                                                              printf '%s\n' "$var"

                                                              I also sometimes do this with functions themselves.

                                                              func() (
                                                                  # code here.
                                                              )
                                                              (Notice how ‘()’ is used in place of ‘{}’ for the function body).

                                                              1. 2

                                                                Interesting! I’m planning to use this in a talk I’m giving this weekend. Let’s see how that goes. :D

                                                                I’m not sure there are many reasons to do it in bash instead of POSIX shell however?

                                                                I use bash interactively and am fairly comfortable with its features; mapfile etc. are pretty nifty. Shouldn’t be too hard to port it to POSIX sh though—I just might.

                                                              1. 4

                                                                As someone working in the telecom industry, I can assert that mobile network operators have become less composed of technical experts creating and running their own things, and more about managing and juggling multiple vendor solutions. It has become rare to encounter mobile operators that know what they are doing without relying on consultants hired for specific projects, delegating everything to the cheapest third party that fulfills the RFP written by that consultant. Even more absurd, sometimes the consultants have no idea what they are talking about, and MNOs buy things they have no clue about and will never use, just for the hype of ticking a box on a sheet.
                                                                Let’s add that the field is hard to get into and the documentation is fierce to dive into. That leads to many security vulnerabilities, which in a lot of cases are not really vulnerabilities but features and configurations that haven’t been set properly, because no one had any idea what they were doing.

                                                                1. 1

                                                                  How much did the complexity of the standards contribute to this disaster? Why is the world still hooked on this mobile crap instead of having good public Wi-Fi coverage everywhere?

                                                                  1. 8

                                                                    The Wi-Fi standard is not well-suited for medium-distance communication; the frequencies only work over short distances. The mobile crap is like this because it is complex to build a good network with some distance between the nodes. Using Wi-Fi instead would not make it magically better; you’d need complex addenda to Wi-Fi to make it minimally viable.

                                                                    Funnily enough, the frequencies for 5G also favor short distances, meaning only high-density cities will have proper coverage.

                                                                    1. 1

                                                                      The standards are amendments over amendments over amendments, all stacking on one another and referencing one another. You literally have to jump between 10-20 documents all the time. So I think that yes, the standards being convoluted has pushed mobile operators away from implementing them and toward letting third parties handle the complexity. As for Wi-Fi, you’d need a brand new infrastructure to cover everything, which costs money, time, and legal approval in many countries. Mobile has the promise of being seamless. Let’s also not forget the players here: SIM card manufacturers also manufacture credit and debit cards, passports, and sometimes even cash money for countries. They are very big. Same for core network equipment manufacturers. There’s a whole ecosystem of actors that benefit from this.

                                                                      1. 1

                                                                        In many many countries that’s just not possible, while upgrading the existing infrastructure is easier to do and is almost invisible to the end-user.

                                                                    1. 3

                                                                      In the “Handling Collisions” section, he writes

                                                                      It is possible that hash_b will return 0, reducing the second term to 0. This will cause the hash table to try to insert the item into the same bucket over and over. We can mitigate this by adding 1 to the result of the second hash, making sure it’s never 0.

                                                                      How does adding 1 to a hash ensure that it’s not 0?

                                                                      1. 1

                                                                        It’s possible for hash_b(x) = 0, so that hash_a(x) + i * hash_b(x) is always hash_a(x), no matter how many times you increase i to get another bucket.

                                                                        1. 2

                                                                          It’s possible hash_b(x) = -1

                                                                          1. 2

                                                                            Looking at the ht_hash implementation, it doesn’t look like it will ever return negative values.

                                                                            1. 2

                                                                              So the hash function can return [0, INT_MAX], meaning that (INT_MAX + 1) is possible. If hash_a() can also return 0, we thus have 0 + (undefined behavior probably being -1) % num_buckets.

                                                                              -1 % num_buckets still returns -1. Not a good idea for an index.

                                                                              The guide is interesting and pretty simple. But those issues are fundamental in C and should be tackled right away. It does not add much complexity to limit the number of buckets, use only unsigned values (avoiding undefined behavior), and check for edge cases.

                                                                              The hash function itself can easily overflow a long integer, meaning that the precaution of up-casting to long, taking the modulo, then restricting to int is useless and incorrect. Still undefined behavior.

                                                                              There are other well-defined stream hash-functions, easy to write in a few lines that can be used instead.

                                                                              1. 5

                                                                                The hash function returns values in [0, m - 1] where m is the number of buckets (at least in ht_get_hash). That means, unless the hash table has INT_MAX buckets, hash_b + 1 won’t overflow.

                                                                                The overflow potential that is there is that hash += (long)pow(a, len_s - (i+1)) * s[i]; could overflow the long hash; it probably won’t because long is probably 64 bits, but on some systems (maybe especially the kind of system where you would want a home grown hash table implementation), long could be 32 or 16 bits. I agree that using unsigned there would be better.

                                                                                Also, while it doesn’t really matter to this discussion, 32-bit INT_MAX + 1 overflows from 01111111111111111111111111111111 to 10000000000000000000000000000000, which represents -2147483648 in two’s complement, not -1. -1 would be all bits set to 1. (Of course, this is UB and the compiler could just make the program terminate or launch nethack or whatever instead)

                                                                      1. 2

                                                                        I used to sort of like strl*

                                                                        But I now prefer to let my code die noisily and as soon as possible when I do such stupid…

                                                                        I then know I’m doing stupid.

                                                                        I then fix my code.

                                                                        My code is then no longer stupid.

                                                                        1. 3

                                                                          What are you trying to say? Do you prefer to use strncpy instead of strlcpy because strncpy is somehow noisier? What?

                                                                          1. 2

                                                                            I’m saying both are a Bad Idea.

                                                                            If you hit the case in strl* where you need to truncate the result so you can null terminate AND/OR the case in strn* where you don’t null terminate…..

                                                                            You have a bug.

                                                                            So wrap ’em both (or replace them) and check for that case and die noisily as soon as possible and then fix your code.

                                                                            Or use a memory safe language such as D.

                                                                            1. 1

                                                                              String truncation is not necessarily a bug. You might have outside requirements. The correct thing to do is to check it properly, which needs to be done in all languages.

                                                                                strlcpy is cumbersome for truncation checks. At least it leaves the memory in a better shape. strscpy is better for quickly checking that you properly copied your string.

                                                                              1. 2

                                                                                If we were all perfect… we wouldn’t even be talking about strn*

                                                                                We’re talking about it because the strn* APIs are a very, very common source of defects. They are at about 2.5 to 3 out of 10 on the Rusty Scale of API Goodness.

                                                                                So you’re right, “String truncation is not necessarily a bug”; it just is in about 99 out of 100 uses.

                                                                                My proposal bumps it up to 5/10 on the scale.

                                                                        1. 2

                                                                          It is not the job of strlcpy to initialize memory.

                                                                          Use memset or = {0}; instead; don’t rely on quirks of badly designed functions. strncpy() will always write len bytes, but that goes beyond the scope of a copy function.

                                                                          1. 1
                                                                            #define strscpy(dst, src, len)  \
                                                                                do {                        \
                                                                                    memset(dst, 0, len);    \
                                                                                    strlcpy(dst, src, len); \
                                                                                } while (0);

                                                                            How’s this? I bet there’s still a one-off bug somewhere.

                                                                            1. 3

                                                                              not that it matters, but memset(3) will return dst, so you could (maybe not should) also do

                                                                              #define strscpy(dst, src, len) \
                                                                              	strlcpy(memset(dst, 0, len), src, len)
                                                                              1. 2

                                                                                Still has the problem of evaluating len twice.

                                                                                For clarity’s sake, a better approach here would be to implement strscpy as a (potentially inline) function rather than a macro. The types of all the arguments are known and there’s no preprocessor trickery going on.

                                                                              2. 2

                                                                                Probably just a typo, but drop the semicolon after while (0). Having it defeats the purpose of wrapping your code in a do {} while loop in the first place.

                                                                                1. 1

                                                                                  You’re right that it’s a typo, but it doesn’t break anything, as far as I can see. It would be equally valid to add or to omit the semicolon in the real code.

                                                                                  1. 10

                                                                                    The whole point of using do { ... } while (0) is to handle the case where adding a semicolon in the real code is not valid. Consider the calling code

                                                                                    if (a)
                                                                                        macro();
                                                                                    else
                                                                                        do_something();

                                                                                    If you define your macro as #define macro() do { ... } while (0) then this works fine. But if you define it as do { ... } while (0); then this expands to

                                                                                    if (a)
                                                                                        do { ... } while (0);;  /* note two semicolons here */
                                                                                    else
                                                                                        do_something();

                                                                                    That extra semicolon counts as an extra empty statement between the body of the if and the else. You can’t have two statements in the body of an if (without wrapping things with curly braces) so the compiler will refuse to compile this. Probably complaining that the else has no preceding if. This is the same reason why plain curly braces don’t work properly in a macro.

                                                                                2. 2

                                                                                  How do you detect truncation?

                                                                                  strlcpy will also attempt to evaluate strlen(src), meaning that if src is malformed, you will read memory that should not be read, and you will waste time evaluating it in every case.

                                                                                  ssize_t strscpy(char *dst, const char *src, size_t len)
                                                                                  {
                                                                                  	size_t nleft = len;
                                                                                  	size_t res = 0;

                                                                                  	/* Copy as many bytes as will fit. */
                                                                                  	while (nleft != 0) {
                                                                                  		dst[res] = src[res];
                                                                                  		if (src[res] == '\0')
                                                                                  			return res;
                                                                                  		res++;
                                                                                  		nleft--;
                                                                                  	}

                                                                                  	/* Not enough room in dst, set NUL and return error. */
                                                                                  	if (res != 0)
                                                                                  		dst[res - 1] = '\0';
                                                                                  	return -E2BIG;
                                                                                  }
                                                                                  1. 1
                                                                                    char *dir, pname[PATH_MAX];
                                                                                    if (strlcpy(pname, dir, sizeof(pname)) >= sizeof(pname))
                                                                                        goto toolong;