1. 17

I personally prefer more tags to fewer, because I use tags to guide what to submit to Lobsters and what I want to read—often I’ll check out an otherwise-uninteresting title because the tags intrigue me. Also, more tags make it easier to label submissions: do data science and modeling fit under math, compsci, what? Having a datascience tag makes the appropriate choice obvious.

    1. 3

      I agree. Even insanely specific tags like apple-m1, react, or language-model don’t hurt. Tags give users an option to filter out the topics they find uninteresting, and at the same time group related submissions together for the interested, so the more tags the better.

      Also, what about data-science instead of datascience? :)

      1. 1

        There is a sort of circular issue here though with tags being used to ‘guide’ what to submit, for example lobsters has traditionally been less keen on drive-by press releases about new versions of whatever framework (which I think is a good thing from an SNR pov), but the existence of a ‘release’ tag implies that that sort of thing is on topic here. I think in reality it’s shades of grey rather than being in the set of on-topics or not in the set, as defined by the existence of a tag. Likewise ‘culture’ and ‘practices’ seem to be catch-alls for traditionally off-topic, high noise stuff. In short I don’t think the tail should start to wag the dog, wrt the tag system and why you might come to this site rather than any of the others.

        I would tag datascience with math, fwiw. Datascience just being statistics done on a macbook, etc etc

        Having written the above, and seen my sibling comment about apple-m1 tags and the like, if you really need to go down the taxonomy rabbit hole can they at least be hierarchical, eg apple-m1 is a subset of ‘hardware’ and language-model is a subset of ‘plt’, for those that don’t want to fully explore the fractal tag-space, or think they can’t submit their article about nim verification because they can only see a haskell-verification and rust-verification tag.

      1. 1

        It’s a great read for me, but I don’t quite understand the following part:

        Since each closure has its own type, there’s no compulsory need for heap allocation when using closures: as demonstrated above, the captures can just be placed directly into the struct value.

        How does each closure having a distinct type eliminate the need for heap allocation? While i32 and Vec have different types, they both normally live on the stack (unless you explicitly use something like Box).

        1. 3

          He’s saying that since each has its own distinct type, there’s no inherent need to abstract them behind a trait object. They can be stored directly on the stack instead of being behind a trait object pointer (Box<dyn Trait> or &dyn Trait)
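
For instance, a minimal sketch (mine, not from the article) of both options: the closure value itself lives on the stack and is statically dispatched through a generic parameter, and only the type-erased version needs a Box:

fn apply<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(x) // statically dispatched; F is this closure’s own unique type
}

fn main() {
    let offset = 10;
    let add_offset = |x| x + offset; // captures live in the closure value, on the stack
    println!("{}", apply(add_offset, 5)); // 15

    // Only when you want to erase the type (e.g. to store differently-typed
    // closures together) do you need a trait object, and possibly a Box:
    let boxed: Box<dyn Fn(i32) -> i32> = Box::new(move |x| x + offset);
    println!("{}", boxed(5)); // 15
}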

          1. 1

Suppose |x| x + 1 and |x| x + 2 had the same type; then I could do something like vec![|x| x + 1, |x| x + 2], which seems to have the same effect as using trait objects?

            1. 2

              Those two closures do not have the same type. Here’s a Rust Playground example showing this. Each closure is compiled into its own struct which implements the correct traits (from Fn, FnMut, and FnOnce) and which (if they don’t capture from the environment) may be coerced to a function pointer. Their types, even if the closures are identical, are not the same.
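
Something along these lines (a reconstruction in the spirit of that playground link, not its exact contents) shows both halves of the point:

fn main() {
    let f = |x: i32| x + 1;
    let g = |x: i32| x + 1;

    // Does not compile: even textually identical closures have distinct,
    // unnameable types, so they can’t share an array/Vec element type as-is.
    // let same = [f, g];

    // Because neither captures anything, both coerce to the same function
    // pointer type, and *that* can be stored homogeneously:
    let fns: [fn(i32) -> i32; 2] = [f, g];
    assert_eq!(fns[0](1) + fns[1](1), 4);
}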

              1. 1

                Exactly, and that’s why I said suppose they have the same type :P

                You pointed out that each closure having a unique type lifts the restriction that closures must be put behind a trait object, and I was saying there is no such restriction even if all closures with the same signature share one common type, because in that case we can do stuff like vec![|x| x+1, |x| x+2] which normally calls for trait objects.

                1. 2

From a type system pov you’d have two different implementations of Fn/FnMut for the same type, since each closure needs a different function body. That would be kind of weird. If you then put two instances into the same vec I’m not entirely sure how Rust would find the correct trait impl without adding extra markers on the struct. Which smells like dynamic dispatch already.

                  1. 1

I suppose the compiler can look at the closures and figure out the least restrictive trait for each closure (can I make this Fn? No? Then what about FnMut? Also no? Well, I guess it’s a FnOnce then) and find the greatest common divisor of all closures in the Vec.

                    1. 2

                      That’s not the issue. Even if there was only one trait for all, you would have overlapping implementations of the trait. The traits have a call method that will need to be implemented differently, one for x+1 and one for x+2.

                      1. 1

Oh, I see. In the case where some closures share a common type, instances of that type may include an additional field holding the pointer to its function body, which is essentially dynamic dispatch. That doesn’t call for heap allocation though, because the function pointers typically refer to .text instead of the heap.
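
A hedged thought experiment of what that shared type might look like if you desugared it by hand (this is not what rustc actually does):

struct SharedClosure {
    captured: i32,              // the environment
    body: fn(i32, i32) -> i32,  // pointer to code in .text: (captured, arg) -> result
}

impl SharedClosure {
    fn call(&self, x: i32) -> i32 {
        (self.body)(self.captured, x) // indirect call = dynamic dispatch, but no heap box
    }
}

fn add_body(captured: i32, x: i32) -> i32 { x + captured }
fn sub_body(captured: i32, x: i32) -> i32 { x - captured }

fn main() {
    // Plain struct values; the function pointers point into .text, not the heap.
    let add = SharedClosure { captured: 1, body: add_body };
    let sub = SharedClosure { captured: 2, body: sub_body };
    assert_eq!(add.call(10), 11);
    assert_eq!(sub.call(10), 8);
    let _both = [add, sub]; // a homogeneous collection without Box<dyn Fn>
}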

                        EDIT: I realized that if you put the instruction for a closure on the stack, then it would not have 'static lifetime and thus cannot be returned from a function. See more discussion here.

        1. 3

          Lobsters isn’t write-only. Submit other stuff, too, or comment on other people’s submissions.

EDIT: wait a sec, this article is from March. You’ve published ten things since then, and your Lobsters submissions are just in reverse chronological order. Are you just posting your entire archives to Lobsters? Is this just an inbound channel?

          EDIT2: From your homepage:

          I give you the strategy, technology, and skills to become a trusted brand […] Marketing gurus are a dime a dozen. My approach is different – think of me as a sparring partner who will sharpen your skills and plan your attack. We’ll work together to create a cohesive strategy, technology and tools that inspire long-term connections and fuel sustainable growth. It’s about trust. It’s about reputation. It’s about telling your story in a way that gets results.

          1. 2

            Lobsters isn’t write-only. Submit other stuff, too, or comment on other people’s submissions.

            I don’t understand: why can’t a user submit their own content exclusively, as long as the submissions are of high quality? From my perspective, well-received self-marketing materials are still a contribution to the community, even if the author themself may benefit from the submission.

            1. 3

              Good question. From Mitigating Content Marketing:

              We now consistently average over 20k visitors per weekday. Programming is an enormous, growing, lucrative, powerful industry and thus a very expensive demographic to advertise to. A link on our homepage sends traffic that would otherwise cost $15-30k on Twitter, AdWords, or LinkedIn.

              When this is sending attention to celebrate someone advancing the state of our understanding or sharing what they’ve created, it’s the internet at its best as gift economy. Unfortunately, some people see the site as a handful of rubes naively standing around a money fountain, so why not try to take a taste?

              People who are only submitting their own content exclusively are overwhelmingly (not entirely, but overwhelmingly) bad-faith actors who are just using us for free advertising. People who are commenting or submitting other stuff are at least making an effort to participate in the rest of the community.

              I’m being a bit harsher with this comment, because I was more polite with my comment on his previous story:

              Welcome to lobsters! Generally we encourage people to post and comment on articles they haven’t written, too, to participate more in the community.

              So either he’s not reading comments, which means he’s just spamming, or he read my comment and doesn’t care, in which case he’s bad-faith.

              1. 4

Hey, I am very sorry that my actions were seen as spammy behaviour, that was not my intention at all. I’ve been reading the philosophy section of lobsters for years and just wanted to contribute a few ideas myself.

                I’ll try to submit some other interesting links as well in the future and try to be more active with the comments.

                Thanks

                1. 3

                  People who are only submitting their own content exclusively are overwhelmingly (not entirely, but overwhelmingly) bad-faith actors who are just using us for free advertising.

                  Sure, but let’s not confuse heuristics with what those heuristics are trying to detect. This particular case feels suspect to me, since the site features a few CTAs for paid services. But I think it’s important to point out that it’s not inherently a bad faith action to only post articles to your own blog. The issue is when the content is trying to make a sale. There’s clearly grey area between “good faith posting of excellent content that only I have written” and “posting on your forum in order to sell you something”. And since we can’t read minds, I think that:

                  Submit other stuff, too, or comment on other people’s submissions.

                  …demanding engagement from someone like this seems unnecessarily hostile, setting a negative tone for newcomers. Why not just flag as spam, provide the feedback, and then let the mods take action if needed? For example, “you seem to only be posting links to your blog which is also trying to sell me services. this isn’t a place for content marketing, and people get banned for it pretty regularly. check out this post for background”

                  1. 5

                    For example, “you seem to only be posting links to your blog which is also trying to sell me services. this isn’t a place for content marketing, and people get banned for it pretty regularly. check out this post for background”

                    Man, that’s much better copy. Strikes the right balance of polite without being too “nice”. Gonna use that instead from now on

                2. 2

                  It’s not a hard and fast rule.

                  @hwayne is doing OP a favor by giving them a heads-up.

              1. 5

tl;dr: They are sum types just like in Swift or F#.

                1. 5

                  I hope we are going to see a post about how Rust immutability is amazing soon. After that we can have articles about every single ML feature that existed for decades that Rust successfully implemented.

                  1. 8

                    You mean, it’s not a good thing to implement great features of a language that basically nobody is using into a new language that makes things better and more appealing?

                    1. 2

No, I mean presenting it as a novel idea when in fact it has been around for 30 years is kind of funny.

                      that basically nobody is using into a new language

                      Factually incorrect. Please have a look at the languages in the ML family.

                    2. 4

                      It’s no secret that Rust was heavily inspired by ML, C, C++, etc. Rust itself has very few new language features, the borrow checker being one of them.

But Rust appeals to the low-level systems crowd, and brings with it a ton of nice-to-haves from ML and the like. So people who were previously stuck with C-like enums suddenly have nice options and want to talk about it.
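
For anyone who hasn’t seen the comparison, a tiny sketch of the difference (my example, not from the article): a C-like enum is just a set of bare tags, while a Rust enum is a sum type whose variants carry data and are matched exhaustively:

enum Shape {
    Circle { radius: f64 },
    Rect { w: f64, h: f64 },
}

fn area(s: &Shape) -> f64 {
    match s {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { w, h } => w * h,
    }
}

fn main() {
    println!("{}", area(&Shape::Rect { w: 2.0, h: 3.0 })); // 6
}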

                      1. 4

                        What’s so bad about highlighting the strengths of a programming language?

                      2. 2

                        PEP 634 is also bringing sum types to Python.

                      1. 2

                        UNIX uses the term file descriptor a lot, even when referring to things which are clearly not files, like network sockets

                        Nit: unix uses the term ‘file’ a lot, including to refer to things that are not persistent, hierarchically-accessed records, like network sockets.

                        1. 1

                          From my understanding, a file in Unix’s sense is essentially a stream of bytes.

                          1. 1

                            The term is not entirely well-defined (which is part of the problem). I think that, at the very least, a file encompasses a stream of bytes, but there are files which are not simply streams of bytes. Anything that can be ioctled, for instance.

                        1. 12

                          What ever happened with the dispute over attribution with regards to this and that third party package manager?

                          1. 9

                            Like a modern day David vs. Goliath, David lost.

                            1. 6

                              On the github:

                              We would like to thank Keivan Beigi (@kayone) for his work on AppGet which helped us on the initial project direction for Windows Package Manager.

                              IIRC Beigi at one point said that was enough for him.

                              1. 7

The fundamental rule for URLs is that they should not change. So if https://lobste.rs/t/formalmethods remains (as a redirect), it is OK.

                                1. 4

                                  I agree! However, https://lobste.rs/t/cryptocurrencies returns 404 after cryptocurrency was renamed to merkle-trees :(

                                  1. 4

                                    @pushcx Is it possible to create a redirect for renamed tags?

                                    1. 1

                                      Could always hard-patch in a route in the top or /t/ routers.

                                1. 8

Would this also apply to other tags, in general, like objectivec → objective-c?

                                  1. 15

                                    Yeah, I would also second that.

                                    An alternative is to rename merkle-trees to merkletrees so that it’s consistent with other hyphen-less tag names, but I personally prefer the presence of delimiters.

                                    1. 2

                                      Can always take the Chaotic Good approach and retrofit the rules so names in tags are followed (and preceded?) by hyphens.

                                      1. 3

                                        I propose that names in tags be followed by hyphens only if they end in an “odd” letter (counting starting from 1), so merkle-trees-, formalmethods-, objective-c-, rust, etc.

                                        1. 1

                                          Just strip the hyphens. I want o-b-j-e-c-t-i-vec to be valid, damnit!

                                          1. 8

                                            ObjectiVec is the name of my next OO vector library.

                                    1. 7

Can’t we just inject noise into that register as a mitigation? From my understanding, s3_5_c15_c10_1 is accessible to all applications, so it should be easy to overwrite.

                                      1. 9

Towards the end of the webpage there is a paragraph to that effect, saying it wouldn’t mitigate it: something about how doing so would peg the CPU at 100% and still not make the register useless.

Then again, the whole point of the webpage is to poke fun at infosec; it ultimately goes on to say this isn’t a big deal and people shouldn’t be worried.

                                        1. 3

I think this would technically work, but you’d need one process per cluster running to do it, so there would be significant power use and CPU capacity costs.

                                          1. 5

Also, using forward error correction, you can still reliably transfer data at a lower rate, even when some noise is injected. How much lower the rate will be depends on how much noise is injected. With that in mind, I think that even if you were willing to sacrifice a significant amount of power and CPU, this would not work as a practical mitigation.

                                        1. 6

                                          Very interesting but I remain unconvinced. It seems as though once you start adding the required features for productionization the models converge.

The obvious way to improve throughput in a pull-based model is to have the producer prefetch a buffer. This mimics the buffer that would exist on the DAG consumers in the push-based model. (In fact it may be better, as it is naturally shared among consumers.) In many real-world systems you will need to add back pressure to the producer in a push-based model, leading to it being basically equivalent to the pull-based model with buffering (again, except the buffers are in the consumers).
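
A rough sketch of that producer-side prefetch in a pull model (my own toy code, not from the article): a wrapper operator whose Next() usually hits a batch it pulled ahead of time.

use std::collections::VecDeque;

struct Prefetch<I: Iterator> {
    inner: I,
    buf: VecDeque<I::Item>,
    batch: usize,
}

impl<I: Iterator> Prefetch<I> {
    fn new(inner: I, batch: usize) -> Self {
        Self { inner, buf: VecDeque::with_capacity(batch), batch }
    }
}

impl<I: Iterator> Iterator for Prefetch<I> {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        if self.buf.is_empty() {
            // Refill a whole batch at once; in a real engine this is where the
            // producer would run ahead (possibly on another thread).
            for _ in 0..self.batch {
                match self.inner.next() {
                    Some(v) => self.buf.push_back(v),
                    None => break,
                }
            }
        }
        self.buf.pop_front()
    }
}

fn main() {
    let rows = Prefetch::new(1..=10, 4);
    assert_eq!(rows.sum::<i32>(), 55);
}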

The author does raise an interesting point though: PostgreSQL and CockroachDB materialize the entire table when using WITH clauses. In the past I have heard that this is treated as some sort of optimization hint in PostgreSQL, so I wonder if there is something fundamental about the model that is holding it back, or if it is a “feature” that it works this way.

                                          1. 3

                                            PostgreSQL used to materialize WITH clauses unconditionally, and it was often used as a mechanism for hand-optimizing queries. But as of PostgreSQL 12, by default it only materializes them if they are used more than once, are recursive, or have side effects. Otherwise it folds them into the main query for purposes of generating a query plan. If you still want to use them for manual optimization, you can explicitly say whether to materialize them.

                                            1. 2

                                              +1 Insightful. This is approximately my take too. I’ve written query engines and I opted for something that’s kind of a mix: Next() and Check() being the respective primitive operations to pull and push. The buffer you mention is a materialize operator that’s a logical noop to the plan but something the optimizer can add where it sees fit.

That said, there may be something to leaning more push than I have been, specifically for DAGs, which require hacks in a pull model. Furthermore, in a distributed setting, push works better because there’s much less back-and-forth. The tradeoff there is throwing a ton of data over the network. Still needs a hybrid.

                                              If you find it interesting, let me know what specifically, maybe I’ll put it on my blogging stack.

                                              1. 1

                                                I guess I’m late to the discussion, but what about coroutines, which appear to be the solution to all producer/consumer problems?

                                              1. 28

                                                Perhaps related: I find that a certain amount of redundancy is very useful for error checking, both by the speaker/author and listener/reader. This might take several forms:

                                                • I repeat myself often when giving talks or writing, so you never have to jump too far back to an earlier point in time / space. If I say something that is very similar but slightly different than before, I’m either doing it to draw attention to an intentional change, or I’ve made a mistake! Either way, people are paying attention closely to the difference.

                                                • When writing math, I never rely on symbols or words alone, but a mixture of both. The rule I follow is that you should be able to understand the sentence even if most of the symbols are deleted. This is invaluable for catching mistakes, and also for establishing notation – using words alone can be imprecise, and using symbols alone might be disorienting for a student or for someone familiar with slightly different conventions. The document should be skimmable, and every theorem should be more or less self-contained.

                                                • When writing code, I prefer to comment each logical block with some explanation of “what, why, how”, even if an educated reader might be able to infer from context. The point is that if their inference about the code does not match my comments, then they can confidently conclude the code has a bug, and it’s not just some mysterious hack they should be afraid of changing. Yes, this means maintaining comments along with the code, but I find that to be good practice anyway.
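
For the code point, a toy sketch of the “what, why, how” shape (invented example, nothing from the parent comment):

use std::collections::HashSet;

#[derive(Debug)]
struct Row { id: u32, payload: &'static str }

fn main() {
    let mut batch = vec![
        Row { id: 1, payload: "a" },
        Row { id: 1, payload: "a (retried)" },
        Row { id: 2, payload: "b" },
    ];

    // What: drop duplicate rows from the batch, keeping the first occurrence.
    // Why:  the upstream source may resend rows after a retry, and the sink is
    //       not idempotent, so duplicates would be written twice.
    // How:  a HashSet of ids; retain() keeps a row only if its id wasn’t seen.
    let mut seen = HashSet::new();
    batch.retain(|row| seen.insert(row.id));

    assert_eq!(batch.len(), 2);
    println!("{:?}", batch);
}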

                                                A good way to summarize might be: When communicating, it’s important to keep in mind the predictions / assumptions your audience will make about your speech / writing. They’ll spend a lot of cognitive effort working through the things you say that don’t match their predictions. So, make sure you only surprise them with the new information you’re trying to communicate, rather than potential distractions like unfamiliar notation or syntax. (non-standard notation is ok and sometimes preferable, but the meaning should be evident from context!)

                                                1. 9

                                                  The rule I follow is that you should be able to understand the sentence even if most of the symbols are deleted.

                                                  As someone bad at math, I appreciate you.

                                                  1. 12

                                                    Heavy use of symbols is an underappreciated roadblock for a lot of people who might otherwise be able to grasp math concepts, I think.

                                                    As someone who didn’t study math beyond basic calculus, I’ve had the experience of being confronted with a page full of unfamiliar glyphs and having no idea what to make of it, but then understanding the idea just fine once someone translated it to English for me.

                                                  2. 5

                                                    In one concrete example, I try to never use “former” and “latter” to refer to previous things because it requires a jump. You can usually summarize the options in a small phrase.

                                                    1. 3

                                                      I wonder how many, like me, have to go through mental gymnastics to separate “former” and “latter”. I know latter comes after former because of the resemblance to “later”.

                                                      1. 2

                                                        I use the same mnemonic!

Though I wouldn’t go so far as calling it gymnastics (for me personally), I do think it inherently causes a pause to backtrack and re-parse.

Another point for the Finnish language, because its terms have a tighter coupling to all the other words referring to something earlier or later. I can’t remember how this is properly expressed in German or French, but Swedish also gets this quite right.

                                                        English just makes it unnecessarily difficult.

                                                        1. 2

I thought the English construction was somehow derived from French (or Norman, rather), but apparently it’s from Old English, if this blog post is to be believed: https://www.grammarly.com/blog/former-vs-latter/

                                                    2. 2

This is an underrated statement. I find that the “avoid needless words” sentiment does more harm than good, even in fiction, and can be disastrous in technical writing.

                                                      Human speech naturally has some redundancy in it, and the higher the probability of errors and the higher the cost of those errors, the more redundancy is required. Clarity and redundancy aren’t mutually exclusive, sometimes clarity is redundancy.

                                                      1. 1

                                                        The point is that if their inference about the code does not match my comments, then they can confidently conclude the code has a bug, and it’s not just some mysterious hack they should be afraid of changing.

                                                        Nitpicking: what if it’s the comments that are incorrect or out of sync with the code? With both code and comment, now you have two sources of truth…

                                                        1. 9

                                                          Exactly! If the code doesn’t do what the comments say it does, whoever is reading will file an issue and the matter will be sorted out. This redundancy is really helpful for hunting down bugs and avoids two problematic scenarios:

                                                          1. If someone has changed the code but not the comment, they probably didn’t do a thorough enough job reading the surrounding context to actually understand the impact their changes will have on the rest of the code. So they’ve either introduced a bug, or they forgot to verbally explain the reason for their changes, which itself warrants a bug report.

2. I feel that code takes on just as much (or more) technical debt when the documentation is missing as when the documentation is incorrect. Incorrect documentation (as long as it is colocated with the code it describes) can act as a big red flag that something is wrong. When documentation is missing, anyone who attempts to modify the code will be afraid to make any changes, afraid that the suspicious lines really do serve some mysterious purpose that they just don’t understand yet. If someone is afraid to make changes, they might instead add code to wrap the mysterious bits rather than touch them directly.

                                                          Some caveats:

                                                          • It does take some discipline to work this way, and I haven’t tried enforcing this style in large codebases / large teams. I have seen it work well on smaller projects with one or two like-minded collaborators. It’s especially useful on solo projects where I might work on the project in bursts of a few days once every few months. This way I document my in-progress thought process and can be more confident that local changes to the code will work as intended, even if I haven’t touched the code in three months.

                                                          • For documentation that is not written inline with the code itself, and intended to be consumed by users of the software, not developers, I feel differently about missing vs incorrect documentation. In this scenario, it’s much harder to keep the documentation in sync, and incorrect documentation is more dangerous.

                                                          1. 4

                                                            I think that’s the point, just like parity bits on memory. At least you know something isn’t right.

                                                        1. 76

Imagine you lived in a country that has a strange tradition where children, once a year, go to strangers’ homes and are given candy. Then someone, in order to study the safety of this tradition, decides to give out candy laced with a mild and provably non-lethal toxin to children. This someone has a fool-proof plan to inform the children’s parents before anyone gets hurt. Not all parents test candy for toxins, but enough do – since things like this can happen, and parents in this country take safety reasonably seriously. One parent detected this toxin in the children’s candy. All the parents were informed and said candy was thrown out. No harm, no foul?

                                                          Imagine you lived in a country where no neighbors can be trusted. Imagine you worked in a low trust environment. Imagine stopping the OSS model because none of the contributors can be trusted.

                                                          That’s not the kind of world we want to operate in.

                                                          1. 32

                                                            I think this sums up why I felt a bit sick about that whole story. It undermines the community and is essentially antisocial behaviour disguised as research. Surely they could have found a way to prove their point in a more considerate way.

                                                            1. 8

                                                              Surely they could have found a way to prove their point in a more considerate way.

                                                              Could you propose some alternative approaches? As the saying goes, POC || GTFO, so I suppose the best way to prove something’s vulnerability is a harmless attack against it.

                                                              The kernel community appears to assume good faith in every patch they receive from random people across the Internet, and this time they get mad when the researchers from UMN prove this wishful assumption to be false. On the other hand, cURL goes to great lengths to prevent the injection of backdoors. The kernel is clearly more fundamental than any userland utilities, so either the cURL developers are unnecessarily cautious against supply chain attacks, or the kernel hackers are overly credulous.

                                                              1. 16

                                                                Another possible approach is to ask the lead maintainers if you can perform such an experiment. Linux has a large hierarchy and I think the top level maintainers pull huge patch sets as a bundle.

If they had permission to use an unrelated e-mail address then it could be pretty much as good. Honestly I would think a umn.edu address would give more credence to a patch, since it seems like it’s from someone at a reputable institution.

                                                                Of course they might not agree, in which case you don’t have consent to do the research.

                                                                1. 18

                                                                  This. You ask for permission. Talk to the kernel maintainers, explain your research and your methods, and ask if they want to participate. You can do things like promise a maximum number of bogus patches and a timeframe where they may occur, so people know they won’t get deluged with crap for the rest of time. You could even make a list of email addresses the patches will come from ahead of time and hand it to someone trustworthy involved in the kernel project who won’t be reviewing those patches directly, so once the experiment is over they can easily revert all the bad patches even if the researcher is hit by a bus in the mean time. It’s not that hard to conduct this sort of research ethically, these researchers just didn’t do it.

                                                                  1. 6

That’s a fair point, but I want to point out that the non-lead reviewers still unknowingly participate in the research, so that’s still not super ethical to them. Doing so merely shifts the moral pressure to the lead maintainers, who need to decide whether or not to “deceive” the rest of the community.

                                                                    But yeah, only lead reviewers can revert commits and have enough influence in the tech world, so getting their permission is probably good enough.

                                                                    1. 6

A top comment in a cousin thread on HN suggests that, with proper procedure, AFAIU all reviewers could actually be informed. The trick seems to be to then wait long enough (e.g. weeks or more) and send the patches from diverse emails (collaborating with some submitters outside your university). There should also be some agreed-upon way of retracting the patches. The comment claims that this is how it’s done in the industry, for pen testing or some other “wargames”.

                                                                  2. 5

                                                                    In the subsystems that I’ve contributed to, I imagine that it would be possible to ask a maintainer for code review on a patchset, with phrasing like, “I am not suggesting that this be merged, but I will probably ask you to consider merging it in the future.” After the code review is given, then the deception can be revealed, along with a reiterated request to not merge the patches.

                                                                    This is still rude, though. I don’t know whether it’s possible to single-blind this sort of study against the software maintainers without being rudely deceptive.

                                                                    1. 2

                                                                      I think you could ask them if you can anonymously submit some patches sometime over the next few months and detail how some of them will contain errors that you will reveal before merging.

                                                                      They might say no, but if they say yes it’s a reasonably blind test, because the maintainer still won’t know which patches are part of the experiment and which are not.

                                                                      Another way to do it would be to present the study as something misleading but also do it in private and with compensation so that participants are not harmed. Say you just want to record day-in-the-life stuff or whatever and present them with some patches.

                                                                      Finally, you could look at patches historically and re-review them. Some existing patches will have been malicious or buggy and you can see if a more detailed review catches things that were missed.

                                                                2. 17

                                                                  This research was clearly unethical, but it did make it plain that the OSS development model is vulnerable to bad-faith commits. I no longer feel what was probably a false sense of security, running Linux. It now seems likely that Linux has some devastating back doors, inserted by people with more on their minds than their publication records.

                                                                  1. 15

This is something every engineer, and every human, needs to be aware of at some point. Of course, given enough effort, you can fool another human into doing something wrong. You can send anthrax spores via mail, you can fool drivers into driving off a cliff with carefully planted road signs, you can fool a maintainer into accepting a patch with a backdoor. The reason it doesn’t happen all the time is that most people are not in fact dangerous sociopaths who have no problem causing real harm just to prove their point (whatever that is).

                                                                    The only societal mechanism we have for rare incidents such as this one is that they usually get eventually uncovered either by overzealous reviewers or even by having caused some amount of harm. That we’re even reading about patches being reverted is the sign that this imperfect mechanism has in fact worked in this case.

                                                                  2. 2

This country’s tradition is insanely dangerous. The very fact that some parents already tested candy is evidence that there were some attempts to poison children in the past — and we don’t know how many of these attempts actually succeeded.

So, if we assumed that the public outcry from this event led to all parents testing all the candy, or changing the tradition altogether, then doing something like this would result in more overall good than evil.

                                                                    1. 10

                                                                      Meanwhile in real life, poisoned Hallowe’en candy is merely an urban legend: According to Snopes, “Police have never documented actual cases of people randomly distributing poisoned goodies to children on Halloween.”

The very fact that some parents already tested candy is evidence that there were some attempts to poison children in the past

                                                                      Not really. Again in the real world, hospitals run candy testing services in response to people’s fears, not actual risks. From the same Snopes article: “Of several contacted, only Maryland Hospital Center reported discovering what seemed to be a real threat — a needle detected by X-ray in a candy bar in 1988. … In the ten years the National Confectioners Association has run its Halloween Hot Line, the group has yet to verify an instance of tampering”.

                                                                  1. 4

I’m not familiar with the development of the Linux kernel, but shouldn’t all commits be reviewed by a human contributor before entering the source tree? I mean, if the culprits from UMN hadn’t published that paper, would these invalid commits have gone unnoticed for good? In that case, any malicious user could sign up for an email account and inject garbage or even a backdoor into the kernel, which sounds like a big problem in the review process.

                                                                    1. 17

                                                                      I’m a former kernel hacker. Some malicious commits were found by human review. Humans are not perfect at finding bugs.

                                                                      As I understand it, the vast majority of kernel memory bugs are found by automated testing techniques. This isn’t going to change as long as the kernel is written mostly without automatic memory safety.

                                                                      1. 6

Thanks for the input, but I was not talking about detecting bugs in kernel code written in good faith. What surprises me is that the kernel maintainers seem to assume every patch to be helpful, and merge them without going through much human review. The result is dozens of low-effort garbage patches easily sneaked into the kernel (until the paper’s acceptance into some conference caught attention). Software engineers typically don’t trust user input, and a component as fundamental as the kernel deserves even more caution, so the kernel community’s review process sounds a little sloppy to me :/

                                                                        1. 5

                                                                          the kernel maintainers seem to assume every patch to be helpful, and merge them without going through much human review.

                                                                          You seem here to be assuming up-front the conclusion you want to draw.

                                                                          1. 1

If the patches were carefully reviewed by some human on submission, why didn’t the reviewer reject them? Well, maybe there are some ad-hoc human reviews, just not effective enough. These bogus commits went unnoticed until the publication of the paper, so it’s not like the kernel community is able to reject useless/harmful contributions by itself.

                                                                            1. 3

                                                                              If the patches were carefully reviewed by some human on submission, why didn’t the reviewer reject them?

                                                                              Because there are any number of factors which explain why a bug might not be caught, especially in a language which has an infamous reputation for making it easy to write, and hard to catch, memory-safety bugs. Assuming one and only one factor as the only possible explanation is not a good practice.

                                                                              1. 1

                                                                                As others have said, review is not easy. This is especially true when done under time pressure, as is essentially always the case in FOSS development. Pre-merge code review is a first line of defense against bad commits; no one expects it to catch everything.

                                                                      1. 6

I use UNIX sockets whenever I can. Using TCP/IP for connections to local programs seems so excessive and involves a lot more machinery and overhead. I’ve even seen people use SSL by mistake for local connections when using templating config managers.

Most common server programs support UNIX sockets: SQL databases, memcached, Python WSGI servers, nginx, Apache, etc. Maybe the most notable exception is RabbitMQ? I just think people don’t know they exist, or find them mysterious.

                                                                        1. 2

                                                                          You can, of course, use socat as a tcp proxy to a Unix domain socket. You’ll lose the performance benefits of UDS but can interact with TCP-only services without needing to pull in a network stack into your application.

                                                                          1. 2

                                                                            Using TCP/IP for connections to local programs seems so excessive and is involving a lot more machinery and overhead.

                                                                            Both TCP/IP and UNIX sockets are abstracted away by the kernel in my mental model. Where can I read more about their overheads?

                                                                            1. 2

                                                                              The main thing that comes to mind is this (postgres) comparison:

                                                                              https://momjian.us/main/blogs/pgblog/2012.html#June_6_2012

I have done a number of private benchmarks over the years which find about the same, but it is sometimes very obvious because, on Debian at least, the default Postgres connection is a unix socket, and when you start using TCP/IP instead (eg when using docker for integration tests) some applications can slow down a bit - particularly noticeable on test suites that output timing numbers.

                                                                              1. 1

                                                                                Thanks for the pointer! That’s a good read, but I want to understand the overheads from a theoretical perspective, like, which steps are handled under the hood by the kernel when I use a UNIX/TCP socket?

                                                                                1. 2

                                                                                  My own (perhaps naive) mental model is that the TCP/IP socket is approximately all the work described in my undergrad TCP/IP textbook:

                                                                                  • copy into a TCP frame
                                                                                  • copy into an IP packet (though I know these two steps are clubbed together in practice)
                                                                                  • figure out where to send the IP packet - easy as it’s localhost
                                                                                  • pulling apart the IP packet
                                                                                  • pulling apart the copied TCP frame (again, clubbed together normally)

                                                                                  as opposed to a unix socket which is basically two files, one in each “direction”. And on unix a file is “just” a seekable stream of bytes.

                                                                                  I suppose if I wanted to know exactly which steps are in userland vs which are in the kernel I would review the kernel syscalls that my fave language’s implementation uses.
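
To make the userland half of that concrete, here is a small sketch (std-only, Unix-only; my own code, nothing authoritative): both kinds of socket expose the same byte-stream API, and the difference is what the kernel does underneath (the full TCP/IP machinery on loopback vs. kernel-buffered copies addressed by a filesystem path).

use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::os::unix::net::{UnixListener, UnixStream};
use std::thread;

fn main() -> std::io::Result<()> {
    // TCP over loopback: host:port addressing, IP + TCP layers underneath.
    let tcp_srv = TcpListener::bind("127.0.0.1:0")?;
    let addr = tcp_srv.local_addr()?;
    thread::spawn(move || {
        let (mut conn, _) = tcp_srv.accept().unwrap();
        conn.write_all(b"hello over tcp").unwrap();
    });
    let mut buf = String::new();
    TcpStream::connect(addr)?.read_to_string(&mut buf)?;
    println!("{}", buf);

    // Unix domain socket: filesystem path addressing, no IP layer involved.
    let path = std::env::temp_dir().join("demo.sock");
    let _ = std::fs::remove_file(&path);
    let uds_srv = UnixListener::bind(&path)?;
    thread::spawn(move || {
        let (mut conn, _) = uds_srv.accept().unwrap();
        conn.write_all(b"hello over unix socket").unwrap();
    });
    buf.clear();
    UnixStream::connect(&path)?.read_to_string(&mut buf)?;
    println!("{}", buf);
    Ok(())
}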

                                                                                  My ideas for intro reading (ie the books I liked):

                                                                                  • For networks, a) Tenenbaum’s Computer Networks or b) TCP/IP Illustrated vol 1
                                                                                  • For files, the relevant bits of a) Tenenbaum’s Operating Systems or b) the Operating Systems dinosaur book

Two books I want to read are Robert Love’s Linux Kernel Development and his Linux System Programming. I think they would clear some mist out of my head in this area.

                                                                          1. 12

I got this problem when interviewing for my first programming job. I brilliantly solved it by arguing that a winning board in tictactoe is equivalent to having 3 numbers from 1-9 that add up to 15, and then checking the sums of all of each player’s 3-subsets.

                                                                            I did not get the job.

                                                                            1. 2

                                                                              Could you elaborate on how the positions on the board map to the numbers? The following layout doesn’t seem correct, since 1+2+3=6 but that’s still a win for X.

                                                                              123      XXX
                                                                              456  =>  O
                                                                              789      OO
                                                                              
                                                                              1. 6

                                                                                You map it to a magic square instead:

                                                                                276
                                                                                951
                                                                                438
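
A quick sketch of using it (my own code, not the parent’s interview answer): label each cell with the magic square above, and “three in a row” becomes exactly “some three of the player’s labels sum to 15”.

const MAGIC: [[u8; 3]; 3] = [
    [2, 7, 6],
    [9, 5, 1],
    [4, 3, 8],
];

// `cells` is the list of (row, col) positions held by one player.
fn has_win(cells: &[(usize, usize)]) -> bool {
    let labels: Vec<u8> = cells.iter().map(|&(r, c)| MAGIC[r][c]).collect();
    for i in 0..labels.len() {
        for j in i + 1..labels.len() {
            for k in j + 1..labels.len() {
                if labels[i] + labels[j] + labels[k] == 15 {
                    return true;
                }
            }
        }
    }
    false
}

fn main() {
    // The whole top row of the board: 2 + 7 + 6 == 15.
    assert!(has_win(&[(0, 0), (0, 1), (0, 2)]));
    // Three cells that don't form a line: 2 + 7 + 1 == 10, so no win.
    assert!(!has_win(&[(0, 0), (0, 1), (1, 2)]));
}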
                                                                                
                                                                                1. 1

                                                                                  Nice! Did you come up with it on the spot?

                                                                                  1. 1

                                                                                    This is the key to that solution. As someone responsible for giving technical interviews (not at google lol), you would have passed as far as I’m concerned.

                                                                                  2. 1

You can change rows by adding/subtracting 9, and change columns by adding/subtracting 3. This is equivalent to shifting the row/col towards/away from the center. The property that the numbers add to 15 only holds around the center.

But still, this doesn’t solve the question of how to avoid flagging e.g. 3, 4, 8 as a winning position.

                                                                                1. 3

                                                                                  If you are interested in PicoLisp, you may also like uLisp.

                                                                                  1. 2

                                                                                    There is a popular opinion among developers that SQLite is not suitable for the web, because it doesn’t support concurrent access. This is a myth. In the write-ahead log mode (available since long ago), there can be as many concurrent readers as you want. There can be only one concurrent writer, but often one is enough.

I wonder why SQLite3 developers choose to allow only one simultaneous write transaction. Does supporting multiple concurrent writers complicate the codebase or have other undesirable implications?

                                                                                    1. 6

                                                                                      For many applications what you want to do is actually take the WAL-recommendation seriously and enable it.

                                                                                      PRAGMA journal_mode=WAL;
                                                                                      

                                                                                      So much of “SQLite is slow” could be avoided by doing that. If you think you’d benefit from concurrent writes that’s most likely what you are looking for.

                                                                                      1. 1

The inability to have concurrent writers is also a usability problem: since SQLite does not handle that for you, you need to architect the application so that there are never concurrent writes. An alternative would be to have a lock.
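
A minimal sketch of the lock option (my own code; Db is a stand-in for whatever SQLite binding you actually use, and its execute method is hypothetical): funnel every write through one Mutex so the database only ever sees one write transaction at a time.

use std::sync::{Arc, Mutex};
use std::thread;

struct Db {
    log: Vec<String>, // placeholder for a real connection
}

impl Db {
    fn execute(&mut self, sql: &str) {
        // Stand-in for a real single-writer SQLite call.
        self.log.push(sql.to_string());
    }
}

fn main() {
    let db = Arc::new(Mutex::new(Db { log: Vec::new() }));

    let handles: Vec<_> = (0..4)
        .map(|i| {
            let db = Arc::clone(&db);
            thread::spawn(move || {
                // Only one thread at a time holds the lock, so writes are
                // serialized before they ever reach the database.
                let mut db = db.lock().unwrap();
                db.execute(&format!("INSERT INTO jobs VALUES ({})", i));
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(db.lock().unwrap().log.len(), 4);
}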

                                                                                      2. 5

                                                                                        Yes. If you have two writers in a transaction and they conflict, one of them has to be able to undo partial bits of its transaction. That complicates the design of data structures and will make performance worse if you never have more than one writer. For most consumers of an embedded database, there is at most one thing writing (sometimes zero - I’ve seen sqlite used to provide an indexed data source that’s easy to update independently of the application build but never actually modified by the database).

                                                                                        1. 1

                                                                                          almost read-only database :)

                                                                                          Seems like a good fit for things like RDF HDT

                                                                                        2. 3

                                                                                          As soon as you let multiple processes change database state, you need to worry about changes which affect the same row in different ways. You end up in eventual consistency land, or you do as sqlite3 does and ensure strong consistency with a single writer.

                                                                                          1. 2

My understanding is that concurrent writes require independent rollback journal files for each transaction to implement correctly. However, SQLite’s rollback journal doesn’t support concurrency at all (single reader / writer). In SQLite’s WAL mode, concurrent readers are possible, but because there is only one write-ahead log file, it cannot have multiple write transactions (you cannot interleave write transactions in the WAL file). I wrote a little about this speculation in https://dflat.io/notes/mwmr/

                                                                                            1. 1

Hey, I am sort of an okvs expert. If you need help with okvs programming, let me know. Mind the fact that mdbx works primarily from memory; for bigger-than-memory datasets, it might perform less well (check that claim). Also, apparently the API is not so good (check that claim too). Ping via lobsters messages :)

                                                                                              What about rocksdb?

                                                                                          2. 1

                                                                                            I wonder why SQLite3 developers choose to allow only one simultaneous write transaction.

It is much easier to implement isolation: there are never any write conflicts, hence snapshot isolation is trivial and, among other things, MVCC is not even necessary.

Something that is unclear is how quickly a write transaction is seen by other transactions; in theory, according to ACID rules, it should be immediate… In any case, it seems there is a performance optimization opportunity in declaring a transaction read-only. There is also an opportunity to allow reading data from the data file, lagging a little behind the WAL. It is also a simplification: when you accept that you might read old data relative to the WAL, you do not need to keep the WAL data around in memory (or read the WAL at each transaction). LMDB has a read-only flag.

                                                                                            Last time I checked, SQLite has no built-in support for enforcing a single writer, so the user needs to handle that. Since SQLite supports multiple processes, it gets complicated to synchronize several separate processes, each of which may have several writers. Compare that with a single-process database, where it is easier and would be more performant. Mind the fact that PostgreSQL relies on multiple processes, but they are all forked from the same parent; my guess is that shared memory and the like are easier to set up in that configuration.

                                                                                            ref: about isolation in SQLite https://www.sqlite.org/isolation.html
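
                                                                                            For what it’s worth, SQLite does expose a couple of knobs in this direction. A small sketch, again assuming the rusqlite crate (open_reader and open_writer are just illustrative names):

                                                                                                use rusqlite::{Connection, OpenFlags};
                                                                                                use std::time::Duration;

                                                                                                // Closest analogue of LMDB's read-only flag: the connection is opened
                                                                                                // read-only, so any write attempted through it fails with SQLITE_READONLY.
                                                                                                fn open_reader(path: &str) -> rusqlite::Result<Connection> {
                                                                                                    Connection::open_with_flags(path, OpenFlags::SQLITE_OPEN_READ_ONLY)
                                                                                                }

                                                                                                // Writers (including ones in other processes) queue behind the write lock
                                                                                                // for up to five seconds instead of failing immediately, which is often
                                                                                                // all the "single writer enforcement" an application needs.
                                                                                                fn open_writer(path: &str) -> rusqlite::Result<Connection> {
                                                                                                    let conn = Connection::open(path)?;
                                                                                                    conn.busy_timeout(Duration::from_secs(5))?;
                                                                                                    Ok(conn)
                                                                                                }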

                                                                                          1. 1

                                                                                            If languages cannot be “functional”, can they be “safe”?

                                                                                            On the one hand, it can be argued that it’s not what programming languages do, it’s what they shepherd you towards. Even in a safe language like Rust, you may still shoot yourself in the foot by doing funny things like 1 & 2.

                                                                                            On the other hand, it’s virtually impossible to write type/memory-incorrect programs in Rust if you don’t deliberately ask for trouble, so the corner cases above don’t really defeat the language’s safety. Ultimately, I can define a dialect of Rust by stripping all its unsafe parts, and that should be an absolutely safe language since you literally cannot write type/memory bugs with it.
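
                                                                                            As a small illustration of that boundary (a toy example of my own, not taken from the links above): safe Rust rejects a dangling reference at compile time, and the same bug only becomes expressible once you reach for raw pointers and unsafe.

                                                                                                fn main() {
                                                                                                    // Rejected by safe Rust: `v` does not live long enough (error E0597),
                                                                                                    // so this never compiles, let alone runs.
                                                                                                    //
                                                                                                    // let first;
                                                                                                    // {
                                                                                                    //     let v = vec![1, 2, 3];
                                                                                                    //     first = &v[0];
                                                                                                    // }
                                                                                                    // println!("{first}");

                                                                                                    // The dangling access has to be spelled out with a raw pointer and an
                                                                                                    // `unsafe` block - that is, you have to deliberately ask for trouble.
                                                                                                    let dangling: *const i32;
                                                                                                    {
                                                                                                        let v = vec![1, 2, 3];
                                                                                                        dangling = &v[0] as *const i32;
                                                                                                    } // v is freed here, so reading through `dangling` would be undefined behaviour:
                                                                                                    // unsafe { println!("{}", *dangling); }
                                                                                                    let _ = dangling;
                                                                                                }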

                                                                                            1. 1

                                                                                              But we just created a pull request where the “base branch” is a commit hash, not a branch. And anyone can create a new commit hash in the base repository, since GitHub shares commits between forks.

                                                                                              I can’t wrap my mind around it. What does “the base repository” refer to? I don’t think you can commit to the repo being forked without upfront write access to it.

                                                                                              My understanding is that when you update the base branch of the pull request from step (2) to the commit hash from step (3), you essentially point the base branch at a commit in your own fork, which may contain malicious Actions workflow code. But I fail to understand how this has anything to do with the base-branch-may-be-a-commit-hash bug. Can’t I just point the base branch at a new branch in my fork?

                                                                                              1. 6

                                                                                                I don’t think you can commit to the repo being forked without upfront write access to it.

                                                                                                See the end: “because GitHub shares commits between forks”. For reasons of efficiency, all repos that are related by forking are actually a single repo in the GitHub backend, and they just have their branches and tags namespaced. (And all the ref names are rewritten by their custom Git server on the way in and out, I guess.) If you fork some repo, make a commit, and then go to https://github.com/original-owner/reponame/commit/yourcommithashgoeshere you will see your commit, even though it “doesn’t exist” in the original repo. The UI does say “This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository”, but the logic for PR base refs (prior to the bug fix) didn’t care about checking that: it would just look up the commit, find it, and happily consider it to be part of the upstream repo.

                                                                                                Can’t I just point the base branch to a new branch in my fork?

                                                                                                No, it won’t accept a ref from another repo (barring the weird behavior discussed above). The field just contains a ref name, not owner/repo/ref. So you have to fool it into checking out something that it thinks is in the upstream repo but actually is controlled by a third party.

                                                                                                1. 1

                                                                                                  No, it won’t accept a ref from another repo (barring the weird behavior discussed above).

                                                                                                  I suppose you meant to say “a ref from another namespace in the same underlying repo”? There is no “another repo” on GitHub’s backend after all.

                                                                                                  Anyway, I think the fundamental issue is that commit hashes are not namespaced. I hope they have fixed that as well.

                                                                                                  1. 3

                                                                                                    Clearly you know what I mean :-P

                                                                                                    And no, the commit hashes aren’t namespaced and likely won’t be. I guess they just have to recognize (and presumably now do) that any time they’re parsing a refspec that has to respect the namespacing, they need to filter down to the right kinds of refs.

                                                                                              1. 3

                                                                                                If you’ve never written a Lisp before, I can recommend Janet as a nice introduction to that world.

                                                                                                1. 1

                                                                                                  May I ask why you like it, and how it compares to more traditional choices like Scheme and Common Lisp?

                                                                                                  1. 2

                                                                                                    Regarding the second point, they’re similar. The syntax is the same, and you have “similar enough” primitives for working with sequences. There are a few “big” differences, like most sequential data structures not being linked lists, but it still feels lisp-y to me.

                                                                                                    Racket is another beginner-friendly Lisp if you’re looking for something resembling Scheme. It has a nicely-written guide. The Janet documentation might be a bit confusing if you don’t have prior experience with lisp.

                                                                                                    1. 2

                                                                                                      I have a few reasons that I like it.

                                                                                                      I tried it on a whim for Advent of Code 2020, and ended up liking the PEGs quite a lot.

                                                                                                      Then, as I’ve used it more and more, I’ve grown to like it for its macros, and for the rather practical sensibilities it has around its data structures. It is still very young in an ecosystem sense, so it’s a great chance to learn how to build things for a small ecosystem, and/or to practice hacking on C extensions, or writing things in Zig (there are some people using Zig with it).

                                                                                                      I don’t have deep experience with Scheme and Common Lisp.