My company maintained a fork of LLVM, but not in any disciplined way. Because our changes were not structured as a logical set of patches, it was prohibitively difficult to merge from upstream (I tried once and gave up). So we ended up stuck indefinitely on an LLVM release several years out of date. Do not recommend.
Comments here remind me of the quote about testing under time pressure from Jamie Zawinski in Peter Seibel’s Coders at Work:
Seibel: In retrospect, do you think you suffered at all because of that? Would development have been easier or faster if you guys had been more disciplined about testing?
Zawinski: I don’t think so. I think it would have just slowed us down. There’s a lot to be said for just getting it right the first time. In the early days we were so focused on speed. We had to ship the thing even if it wasn’t perfect. We can ship it later and it would be higher quality but someone else might have eaten our lunch by then.
There’s bound to be stuff where this would have gone faster if we’d had unit tests or smaller modules or whatever. That all sounds great in principle. Given a leisurely development pace, that’s certainly the way to go. But when you’re looking at, “We’ve got to go from zero to done in six weeks,” well, I can’t do that unless I cut something out. And what I’m going to cut out is the stuff that’s not absolutely critical. And unit tests are not critical. If there’s no unit test the customer isn’t going to complain about that. That’s an upstream issue.
I hope I don’t sound like I’m saying, “Testing is for chumps.” It’s not. It’s a matter of priorities. Are you trying to write good software or are you trying to be done by next week? You can’t do both. One of the jokes we made at Netscape a lot was, “We’re absolutely 100 percent committed to quality. We’re going to ship the highest-quality product we can on March 31st.”
Anyone who ever used Netscape can confirm that its quality was exactly what you would expect from this quote.
And by using Netscape over its quality-assured competitors, you probably proved his point :) At least we live in a time when products evolve at great speed, which makes looking back fun and insightful.
If there is any situation where I really, really want tests, it’s that kind of frantic situation where everyone tries to pile on as many features as they possibly can in a short amount of time. Not for every feature, but definitely for the core stuff. Just to avoid the situation where a user can’t open the app anymore because someone changed the version number in the About dialog, and everyone was too busy to check it before shipping.
With tests, I can refactor my code. Without tests, the code will get worse and worse because you don’t feel confident enough to refactor it.
I wonder to what extent this is true. Poor-quality tests that merely pin down whatever the implementation happens to do also inhibit refactoring.
How do we know that codebases without tests are less refactorable? Not being refactored is evidence, but weak evidence - it might be a sign that refactoring is not needed.
How do you get out of this? Tests need to be non-negotiable. Tests are part of being a professional engineer.
How very Taylorist. The highly paid people who don’t do the work will tell the plebs exactly how they should do the work.
How do we know that codebases without tests are less refactorable?
Experience. Every large codebase I’ve seen has had a quagmire somewhere or other with the following properties:
It’s bad, in the sense of being much more complex than it needs to be, filled with bugs, poorly tested, and probably also extremely difficult to test even if you wanted to (e.g. from mixing state and IO with logic).
Everyone knows it’s bad.
Everyone is afraid to touch it, because if you touch it stuff will break. The normal way you feel comfortable modifying code is either from tests, or from understanding it, but the quagmire supports neither.
Many people know some ways to make it less bad, but doing so would (i) require a ton of time, and (ii) likely break stuff in the process.
Thus no one makes changes to the quagmire except for really local patches, which over time makes it even worse.
Compare Postgres vs. SQLite’s respective willingness to undertake invasive refactorings, and then compare each project’s test coverage. Just one data point but I think it’s telling.
Yeah, there are two opposite failure modes: (i) not testing tricky things, and (ii) over-testing simple things, and tying tests to the implementation. Both are bad.
EDIT: I’m having trouble telling if you’re arguing against having any tests at all. If you are, have you worked on a codebase with >10,000 LOC? At a certain point, tests become the only reliable way to make sure complex parts of the codebase work correctly. (It remains important not to test trivial things, and not to tie the tests too closely to the implementation.)
I’ve got files with several thousand lines of code. Believe me when I say you have to test things, but automated tests are not necessarily valuable for all codebases.
The problem is that you’re expecting the same developers who wrote the awful codebase to write the testing suite. Whatever the reasons for the terrible code (time pressure, misaligned incentives, good ol’ ineptitude), they still apply to the development of the tests.
Meaning if the code is really bad, the tests are going to be really bad, and bad tests are worse than no tests.
How do we know that codebases without tests are less refactorable?
In my case, 20 years experience, 11 of which have involved decent-to-good test coverage (I don’t mean just line coverage, but also different types of tests to verify different aspects, such as load tests). In a well-tested code base, some tests will simply break, ideally within milliseconds, if you do anything wrong during a refactor. In an untested code base an unknown number of interactions will have to be manually tested to make sure every impacted piece of functionality still works as intended. And I do mean “unknown”, because most realistically sized code bases involve at least some form of dynamic dispatch, configuration, and other logic far removed from the refactored code, making even the job of working out which parts are impacted intractable.
Having automated tests is such a no-brainer by now, I very much agree with the author. But the author isn’t telling developers they need to write tests, they are telling managers they need to allow developers time to write tests.
I have some things in LLVM that have good test coverage, and some that don’t, and some that do but only in a downstream fork. People routinely refactor the things with good test coverage without my involvement. They routinely break things in the second and third category, and the third is usually fairly easy for me to fix when I merge from upstream.
How do we know that codebases without tests are less refactorable? Not being refactored is evidence, but weak evidence - it might be a sign that refactoring is not needed.
I once had to upgrade an existing site (built by someone else) to a newer version of CakePHP because of PHP support (their hosting provider stopped supporting old PHP versions, and that version of CakePHP would break), and the newer CakePHP had completely overhauled the ORM. The code contained very tricky business logic for calculating prices (it was an automated system for pricing offers based on selected features). None of it was tested. In the end, I had to throw in the towel - too much shitty code and not enough familiarity with what was supposed to be the correct behaviour of that code.
The company had to rewrite their site from scratch. This would not have been necessary if there had been enough tests to at least verify correct behaviour.
In another example, I had to improve the performance of a shitty piece of code that an ex-colleague of mine had written. Again, no tests and lots of complex calculations. It made things unnecessarily difficult, and the customer had picked out a few bugs that crept in because of the missing tests. I think I managed to “test” it by checking the output against a verified correct output that was produced earlier. But there were many cases where code just looked obviously wrong, with no good idea of what the correct behaviour would’ve been.
In the first case it sounds like the core problem was business logic intertwined with ORM code, and secondarily a lack of indication of intent. Tests would actually have helped with both.
And in fairness the second one also sounds like tests would be the solution.
Indeed. Now to be fair, both were shitty codebases to begin with, and the same lack of discipline that produced such code resulted in the code not having tests in the first place. And if they had written tests, they’d likely be so tied to the implementation as to be kind of useless.
But tests are what make outside contributions more possible by providing some guardrails. If someone doesn’t fully grok your code, at least they’d break the test suite when making well-intentioned changes. Asserts would also help with this, but at a different level.
With tests, I can refactor my code. Without tests, the code will get worse and worse because you don’t feel confident enough to refactor it.
I wonder to what extent this is true.
True enough that my job for the past 2 months was porting a legacy component that meshes very poorly with the current system (written in C instead of C++, uses a different messaging system…). Having re-implemented a tiny portion of it, I can attest that we could re-implement the functionality we actually use in 1/5th to 1/10th of the original code. It was decided however that we could not do that. Reason being, it is reputed to work in production (with the old system, of course), and it basically has no tests. To avoid the cost of re-testing, they chose to port the old thing into the new system instead of doing it right.
I’ve heard some important design decisions behind the Boeing 737 MAX had a similar rationale.
Poor-quality tests that merely pin down whatever the implementation happens to do also inhibit refactoring.
Honestly it’s not that hard to write tests that test behaviors and contracts. I’d even say it’s harder to write tests that are so tied to the implementation that they make refactoring hard.
The way I remember the respective meaning of theirs vs. ours in merge vs. rebase is that merge is FROM $OTHER_BRANCH INTO $CURRENT_BRANCH, while rebase is FROM $CURRENT_BRANCH ONTO $OTHER_BRANCH.
It’s also an academic distinction because the correct way to resolve a conflict is completely independent of which is “ours” and which is “theirs”. Either way what you should do is merge the logical intent of the two changes. I wrote more about this at http://h2.jaguarpaw.co.uk/posts/git-rebase-conflicts/
Git almost always uses “commit” in the first sense, and indeed that is how commits behave in its data model as well.
Cherrypicking changes the meaning because it would otherwise be useless. But that’s something specific to cherrypicking (and base-swapped rebase I assume).
I agree. However, people often tend to think of a commit as a set of changes (as evidenced by some comments here), so I think that proves my point that there is a lack of conceptual clarity.
Empirically I agree: people are confused by this. I am not sure the fault is the word “commit” though. I am curious what you would suggest instead. To play devil’s advocate and defend git, let’s look at the cherry-pick help:
Given one or more existing commits, apply the change each one introduces, recording a new commit for each.
Conceptually, this doesn’t necessarily rely on the concept of a set of diffs, as in a patch file. You can still have “commit” mean purely “committing the entire state of the working directory to disk” and have a coherent concept of cherry-picking: the “change being introduced” is merely the difference between the commit and its parent. Hence cherry-picking is fundamentally contextual and implicitly references two commits, which is why it requires the -m flag on commits with multiple parents.
Cherry-picking also sets the state of the current working directory. When working with individual commits, I almost always want to also affect the current state in some way. So in that sense they are inherently intertwined. You could call them “revisions” or “changesets” or whatever, but I don’t think that would change the fact that in most workflows they are inherently coupled.
How so? In both cases IIUC you are just moving HEAD to point to a different commit.
If I asked you to raise your hand, I think you would understand that to mean that you also have to move your arm, and the distinction isn’t really that important. It is not a problem in everyday speech that things stand in for other things.
Is it different for Git? HEAD is the name we give to the “current” commit used as the basis of comparison for the working directory state. So we already have different names for #1 and #2. But almost no one says “what commit is HEAD pointing at?”–they say “what commit are you on?” The distinction is there, but most of the time it’s not that important. I wonder if what you’re criticizing is not a feature of Git, but a feature of language itself.
The weird thing about HEAD is that it can have different types depending on context. In non-detached state, it’s a reference to a reference to a commit: a reference to the current branch, which itself is a reference to a commit. But in detached state, one of the levels of indirection collapses: HEAD is just a reference to a commit.
If you check-out a commit, you tell git to set HEAD to the state associated with that commit.
If you cherry-pick a commit, you tell git to compute the difference between that commit and its parent, try to somehow “apply” that difference to the current HEAD, create a new commit and set HEAD to that.
Both can fail if your working directory contains uncommitted changes. The latter does a lot more, and it can also fail in a new way: when the contents of a file that is changed by the commit you’re cherry-picking have changed, you get a merge conflict. This is not possible (AFAIK) when checking out a commit.
Maybe you’re right and I am being pedantic. But in general I think it helps a lot with understanding concepts when you have clear names and don’t conflate things. With the current naming, it is hard to explain to a beginner what a commit is because it is an overloaded term.
With the current naming, it is hard to explain to a beginner what a commit is because it is an overloaded term.
Overloading the term would be to use the same term for different purposes or concepts. In git a commit is one thing. It is a recording of the revisions made to a set of files. You can record them, and once recorded you can consume them in multiple ways, each of which has a separate term. The word serves as both a noun and a verb, but this doesn’t make it overloaded, because the verb form is the act of creating the noun.
The two concepts you referred to have specific names. You used them: checkout and cherry-pick. Those are different actions you can take that each consume a commit (and create a new commit in the cherry-pick case). You can explain the relationship between the working directory and the commit to a beginner quite simply: if you have uncommitted changes to tracked files or even untracked files in your working directory, either stash them or commit them prior to consuming any other commit in any way.
My point is that a commit refers to a snapshot, but the term “commit” is also used to refer to the diff of the commit and its parent.
For example, in github if you click a commit SHA it will show you the diff with its parent, or when you cherry-pick it also considers diffs.
I think this is confusing to beginners (which most people here obviously aren’t) and a more explicit name should be used to distinguish between the two.
How does a commit affect the state of the working directory? It records the staged changes, which come from the working directory. I don’t know of any file changes it makes outside of the .git directory, though.
This didn’t even list my favourite Go footgun. Updates to fields of slice types are not guaranteed to be atomic and it is undefined behaviour to write to them concurrently from two threads (goroutines that are scheduled in parallel). If you do, a reader in another thread may observe the base of one slice and the bounds of another. If you are in an environment where you’re allowed to run arbitrary Go code but not the unsafe package, you can do this intentionally to escape from the Go memory safety model. If you’re not writing malicious code, the fact that Go’s type system has no notion of linearity means that you have to be very careful to not accidentally alias objects between threads and trigger this kind of thing intermittently, in a fashion that’s almost impossible to reproduce, let alone debug.
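A minimal sketch of the kind of race being described (names are illustrative, not from any real codebase): two goroutines store slices with different backing arrays and lengths into the same variable while a third reads it. Because a slice header is a multi-word value, the reader can observe the pointer of one slice combined with the length of the other; go run -race flags the races immediately.

package main

import "time"

var shared []byte // written by two goroutines, read by a third, with no synchronization

func main() {
    short := make([]byte, 1)
    long := make([]byte, 1<<20)

    go func() {
        for {
            shared = short
        }
    }()
    go func() {
        for {
            shared = long
        }
    }()
    go func() {
        for {
            s := shared
            if len(s) > 0 {
                _ = s[len(s)-1] // may fault if the header was torn (long length, short base)
            }
        }
    }()
    time.Sleep(time.Second)
}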
It’s a deeply special kind of language that manages to have a global garbage collector and still not be memory safe.
Because it would be expensive and they can’t be arsed. The Go people are fundamentally C people: it’s the fault of the developer if they fall into the language’s traps.
Slices are not the only affected type either, interfaces and maps also suffer from data races for sure. Generally speaking data races undermine memory safety and there’s a fair number of possible ways to get data races in Go: https://www.uber.com/blog/data-race-patterns-in-go/
The Go memory model is that everything is thread unsafe unless you put it behind a mutex or a channel. Why would slices be different from everything else?
That does not contradict my comment in any way. And the only thing of interest in that essay is that rsc is self-servingly contradictory:
Go’s approach sits between [C’s data races = invalid program and java’s data races = defined and safe memory behaviour]. Programs with data races are invalid in the sense that an implementation may report the race and terminate the program. But otherwise, programs with data races have defined semantics with a limited number of outcomes, making errant programs more reliable and easier to debug.
is immediately followed by
Note that this means that races on multiword data structures […] can in turn lead to arbitrary memory corruption.
I mean, it’s in the middle. It’s not classic UB nasal demons, but it could be a cause of data corruption. Maybe it’s a bad choice, but that’s what he chose and you can read his blog series to try to work out why he chose it.
I guess I’m just saying, it was a deliberate engineering tradeoff not “dysfunctional” because they didn’t “wrap it in a critical section or something”.
It’s funny how he first pays extensive lip service to Tony Hoare’s philosophy, especially this one:
As well as being very simple to use, a software program must be very difficult to misuse; it must be kind to programming errors, giving clear indication of their occurrence, and never becoming unpredictable in its effects
only to then cheerfully update the docs to explain that you can get arbitrary memory corruption if you “misuse” the program (well, language).
n := 0
for e := list; e != nil; e = e.next {
    n++
}
i := *p
*q = 1
All loops must terminate. Therefore we can assume this loop must terminate. Therefore we can rewrite the loop to access *p or *q before the loop happens as an optimization. (But what if it’s an infinite loop? Well, that’s UB, so we can assume it won’t.)
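That is, the transformation being described would turn the snippet above into roughly:

i := *p
*q = 1
n := 0
for e := list; e != nil; e = e.next {
    n++
}

which is only sound if the loop is guaranteed to terminate; if it can spin forever, the accesses to *p and *q have been hoisted ahead of a point the original program might never reach.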
Go is not any less safe than C/C++ and it specifically rules out some of the UB “optimizations” in C/C++ that give UB a bad name. So, slightly safer than C/C++, less safe than other languages.
I also think “wrap it in a critical section or something” is really breezing past how difficult this would be. Every slice/interface/map would need some kind of mutex or the type system would have to be radically different to prevent aliasing. You’re either talking about a huge GIL style performance hit or a totally different language with a much stronger type system.
Every slice/interface/map would need some kind of mutex or the type system would have to be radically different to prevent aliasing. You’re either talking about a huge GIL style performance hit or a totally different language with a much stronger type system.
I doubt it would be a “huge GIL style” performance impact - it’d be a mutex per slice, not a global mutex over all slices. There shouldn’t be much contention on these mutexes if you’re using it like “you’re supposed to”, anyway!
It seems even these days “it’s not fast enough” is still sufficient argument to avoid important safety features. Which is strange, because runtime bounds checking is part of Go. That’s also quite a big performance impact.
I guess it’s just a matter of time before someone opens a can of CVEs on some large Go codebases, and then we can have this whole discussion again.
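A minimal sketch of the “mutex per slice” idea, using a hypothetical wrapper type (nothing the runtime actually provides):

package main

import (
    "fmt"
    "sync"
)

// lockedSlice guards every read and write of the slice header with its own
// mutex, so concurrent assignments can never tear the (pointer, len, cap) triple.
type lockedSlice struct {
    mu sync.Mutex
    s  []byte
}

func (l *lockedSlice) Set(s []byte) {
    l.mu.Lock()
    l.s = s
    l.mu.Unlock()
}

func (l *lockedSlice) Get() []byte {
    l.mu.Lock()
    defer l.mu.Unlock()
    return l.s
}

func main() {
    var ls lockedSlice
    ls.Set([]byte("hello"))
    fmt.Println(string(ls.Get()))
}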
Performance. Assigning a slice-typed variable is a common operation. If you had to acquire some kind of lock every time that you couldn’t prove non-aliasing then it would slow Go code down a lot. As @masklinn says interface-typed fields in Go are a pair of a pointer to the object and a pointer to the type, so it’s possible to get type confusion in these by racing and reading one type and the other value.
For maps it’s somewhat more excusable. A map is a complex data structure and updating a complex data structure concurrently is a bad idea unless it’s explicitly designed for concurrency. I think the map implementation used to be in C (special, Plan 9-flavoured C), but it might be pure Go now. If it is, then races there should just leave it in a broken state (just as updating any other non-concurrent complex data structure with data races can), rather than break the fundamental guarantees of the environment.
It’s far less excusable for interface pointers and slices because these are value types that are carefully designed to look like they are primitive values. You pass them around just as you would an integer. If two threads write to the same integer variable at the same time, one will win the race and you’ll see a value that makes sense. This is not the case with other Go types.
The depressing thing is that a type system that understands isolation can address this. When I write parallel code, there’s one rule that I want to follow: no object is both mutable and aliased between threads. Go provides absolutely nothing in the type system to help you spot when you’ve broken this rule. For a language that was designed from the ground up for concurrency, this is inexcusable. This is probably why most of the successful Go programs that I’ve seen use it as statically compiled Python and don’t use goroutines at all (or in a handful of very special cases).
You pass them around just as you would an integer. If two threads write to the same integer variable at the same time, one will win the race and you’ll see a value that makes sense.
I learned Go from the Go Tour back in ~2011 or so; IIRC, slices and interfaces were explained as being fat pointers or tuples, so I’ve always thought of them as such rather than thinking of them as integers. As a result, I’ve never really run into these problems. I’m very curious how often people are running into this? One of the things I like about Go is it’s pretty straightforward how things work, so you can intuit about stuff like this. I suppose if someone was writing Go like many people write Python or JavaScript–with no idea about the underlying machinery–this might get people into trouble, but on the other hand I don’t know how you can write Go without understanding some basics about memory layout, pointer traversal, etc. Maybe I’ve just been doing this for too long to empathize well with beginners…
Performance. Assigning a slice-typed variable is a common operation. If you had to acquire some kind of lock every time that you couldn’t prove non-aliasing then it would slow Go code down a lot.
How often is that? Go should be in a pretty good position to reason about aliasing.
I agree with you about the way Go should have… the thing Go should have done. But it would probably be more on-brand for them to fix this by designing atomic slices that avoid atomic operations until they are actually contended. Do we know if they’ve tried that?
How often is that? Go should be in a pretty good position to reason about aliasing.
Why? The type system does not give the compiler any information that it can use to make that kind of decision. If the slice is a local and has not been address taken, it can be assumed to be safe. In pretty much any other situation, the compiler has to assume that it can have escaped to a concurrent context.
I agree with you about the way Go should have… the thing Go should have done. But it would probably be more on-brand for them to fix this by designing atomic slices that avoid atomic operations until they are actually contended. Do we know if they’ve tried that?
I think they were very reluctant to introduce atomics at all, they certainly don’t want more. They want you to design code where objects are held by a single goroutine and you never do racy updates.
Why? The type system does not give the compiler any information that it can use to make that kind of decision. If the slice is a local and has not been address taken, it can be assumed to be safe. In pretty much any other situation, the compiler has to assume that it can have escaped to a concurrent context.
TBF in most cases slices are passed by value, in which case there is aliasing on the backing buffer (and there can be data races on that depending what it stores), but there’s no aliasing on the slice itself. Most issues would occur with slices getting captured by a closure or go statement in which case they essentially “had their address taken”.
A bigger issue, I would think, is that you’d need to add a tripleword pseudo-atomic which pretty much means you need a lock (interfaces are only a doubleword so it’s a bit better). And while in theory you could use the low bits of the pointer as your lock flag I’m not sure there’s such a thing as a masked compare exchange not to mention a sub-byte futex / mutex?
Sorry, I still don’t understand. I have used tagged pointers with CAS many times and I don’t see the problem. Find an unused high or low bit in the pointer, save the initial value of the pointer, mask and test the bit you care about, and if the test passed then set/clear that bit and CAS the old value to this new value. Depending on the context, if the CAS fails then either abort or keep retrying (maybe with backoff) until it succeeds.
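For what it’s worth, a minimal Go sketch of that mask/test/CAS loop, using a plain uint64 as a stand-in for the tagged pointer (illustrative only):

package main

import (
    "fmt"
    "sync/atomic"
)

const tagBit = 1 // low bit used as the tag/lock flag

// trySetTag atomically sets the tag bit if it is currently clear.
// It returns false if the bit was already set; the caller can retry or back off.
func trySetTag(word *uint64) bool {
    for {
        old := atomic.LoadUint64(word)
        if old&tagBit != 0 {
            return false
        }
        if atomic.CompareAndSwapUint64(word, old, old|tagBit) {
            return true
        }
        // CAS lost a race with another writer; reload and try again.
    }
}

func main() {
    var w uint64 = 0xdeadbeef00 // pretend this is a pointer value with its low bits free
    fmt.Println(trySetTag(&w))  // true: tag was clear, now set
    fmt.Println(trySetTag(&w))  // false: tag already set
}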
Recent revisions of x86 and arm have fast two-word atomic reads and writes (avx and armv8.1 respectively). But more obscure architectures do not, so there are tradeoffs w.r.t. performance portability.
Because accessing anything across threads is already undefined behavior, and your idea would murder the performance of correct code for no real reason. Writing correct code is in no way difficult, and if you do happen to slip up, that’s why your app has a comprehensive test suite, and why you run go test -race in CI, which puts everything into “I want to be painfully slow” mode, but bombs as soon as you have a single cross-thread access without a synchronization edge.
If I want “arbitrary memory corruption”, I already know where to go for that. Do you really want memory-unsafety in a language that is marketed for externally-facing web servers? Java demonstrates that you can allow data races without compromising the integrity of the runtime itself.
I’ve been working with my current company and doing Go for about 9 years now. We’ve written several nontrivial services, a number of which handle more than 10k RPS. We’ve had zero production issues caused by data races, and… maybe two times in those nine years that someone wrote a race bug, which was caught by automated testing before it made it out the door. It’s not high on my list of concerns. The kinds of access patterns that could even theoretically run into this problem just don’t exist in our code, because people who understand the language have no inclination to write them.
Holy crap that’s dysfunctional! Why don’t they wrap it in a critical section or something to avoid this kind of bug?
I don’t know the specifics of this issue, but I do know that you’re not supposed to share a variable between go routines like that. If two go routines must work with the same data, you’re supposed to let them communicate it through a channel.
Whether that means it is OK to leave in a footgun like that is a different matter. But this is one of the many “problems with Go” that I somehow magically never encounter in real life.
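For completeness, a minimal sketch of the channel style mentioned above: ownership of the slice is handed from one goroutine to another, so only one of them touches it at a time (names are illustrative).

package main

import "fmt"

func main() {
    ch := make(chan []int)

    go func() {
        s := []int{1, 2, 3}
        ch <- s // hand the slice off; this goroutine stops using it afterwards
    }()

    s := <-ch // the receiver is now the sole owner
    s = append(s, 4)
    fmt.Println(s)
}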
I don’t see “so many footguns”. I see two things. The bit about append is something you learn in the first 30 minutes of using the language. And the other thing… I don’t even know. I can’t figure out what he thinks he’s doing there, or trying to prove, or what it might have to do with anything that might happen to anyone in the real world, because the presentation of the concept is basically nonexistent. And the “advice” about constructing slices is pure nonsense; there’s no justification given for it because none is possible.
When I worked on a research database I came up with the idea of storing 128-bit hash codes (instead of the actual keys) to optimize DISTINCT, but I’m not sure if any production database uses this approach.
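A minimal sketch of that idea, with hypothetical names and a truncated SHA-256 purely for illustration (a real engine would pick a faster non-cryptographic 128-bit hash and reason about collision probability):

package main

import (
    "crypto/sha256"
    "fmt"
)

// distinctCount deduplicates on a 128-bit hash of each key instead of the key
// itself, so only 16 bytes per distinct value need to be kept in memory.
func distinctCount(keys []string) int {
    seen := make(map[[16]byte]struct{}, len(keys))
    for _, k := range keys {
        sum := sha256.Sum256([]byte(k))
        var h [16]byte
        copy(h[:], sum[:16]) // truncate the digest to 128 bits
        seen[h] = struct{}{}
    }
    return len(seen)
}

func main() {
    fmt.Println(distinctCount([]string{"a", "b", "a", "c"})) // 3
}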
From the look of those numbers, io_uring isn’t much faster than naive blocking, maxing out at 112% of blocking. I imagine that’s because the writes are sequential instead of random, or because the userspace program isn’t doing anything CPU intensive.
For disk I/O, it’s unlikely to help much. There are two big benefits to io_uring:
It avoids system call overheads.
It avoids contention between threads on the file descriptor table.
The first matters a lot for operations like sending a small network packet, but if you’re writing entire disk blocks then this is amortised. The second matters a lot if you’re doing a bunch of independent operations across multiple threads (for example, handling thousands of active connections) but doesn’t matter much if you’re a single-threaded workload.
I expect io_uring would be more useful on spinning rust, so a single process can have lots of ops in flight that can be reordered to make the most of disk head scheduling. Should be more efficient than one op in flight per thread.
Why? An async batched interface makes it easier to exploit the internal parallelism of SSDs. And that’s all that io_uring really is: an asynchronous, batched syscall interface. Nothing conceptually unique to I/O even though that’s where this sort of interface is the most obvious win.
Most of that should be possible with existing mechanisms, though last time I checked the AIO subsystem in Linux was so much worse than everything else (FreeBSD, macOS, Solaris) that it was not worth using. Unless you’re actively bypassing the buffer cache, most writes will write to the buffer cache and then be asynchronously written to disk. They can be reordered in the storage stack and in the disk, right up until the next sync.
For reads, if you’re not doing prefetching, it might make a difference. Netflix used to do a one-byte aio_read to kick off prefetching (before they added aio_sendfile, which largely eliminated the need for this).
aio_read() doesn’t work like pread() but instead messes with the file offset 🤪
That’s true and annoying, but it’s a fairly minor change to fix it.
completion requires delivery of a signal 😱
You can also poll aio_error (which will return EINPROGRESS if the operation is still in flight). On FreeBSD, at least, you can call aio_waitcomplete to wait for a single completion and use kqueue to monitor multiple in-flight aio operations for completion.
you need two calls to get the errno and return value per op 🤡
I assumed that the most obvious implementation was to store both of these in the aiocb and have static inline functions that read them in the header (nothing in POSIX says that aio_return and aio_error have to be system calls). On FreeBSD, the error and status results are stored in the error and status fields of the _aiocb_private member of struct aiocb. I’d assumed that they were then queried from userspace (so aio_return and aio_error are just lightweight function calls), but apparently not and require system calls to read them.
It’s not the best API in the world, but it’s far from the worst bit of POSIX. At least most of the worst bits of it can be fixed by adding things, rather than removing things or redefining the semantics of things.
As far as I know, io_uring was specifically designed for disk IO, and most development effort still goes into higher speed disk IO. But you probably need huge enterprise NVMe or Optane setups for this to matter much.
Though, Windows copied io_uring specifically for the directx game asset loading thing, so I reckon it must be useful even on regular PCs, at least for easily queuing lots and lots of smallish loads.
Though, Windows copied io_uring specifically for the directx game asset loading thing, so I reckon it must be useful even on regular PCs, at least for easily queuing lots and lots of smallish loads.
This is DirectStorage you’re describing?
I suspect that even if it were only moderately useful now, it might also be viewed as insurance against the possibility that the ratio of disk IOPS to CPU cycles shoots up again in future. Say if someone productionises phase change memory.
Yeah. AFAIK the IORING facility they introduced in Windows a few years ago was originally developed just for DirectStorage.
Another point is that I think the NT kernel is much better at async stuff than Linux, so perhaps it was easier to get better results than it has been for io_uring.
I imagine that’s because the writes are sequential instead of random
I’m pretty sure that has nothing to do with the writes being sequential.
Practically all modern filesystems have a feature called delayed allocation, which means the program written in the article has already exited (and thus has already been benchmarked) before the filesystem even decides whether the file’s data will be allocated sequentially or not (never mind actually writing the data to disk).
This is likely to happen even in the 1 GB file case, as long as enough free RAM is available to hold the written data.
Lots of people don’t have a correct mental model of how filesystems and disk I/O work on modern OSes. The actual, physical disk writes done by the filesystem and the OS are almost always asynchronous with respect to the logical writes done by applications, and often purposefully delayed by 5s to 30s (unless a synchronization mechanism such as fsync() is used, and even with respect to fsync() there are popular false myths).
Fwiw I benchmarked versions of these programs using directio, and the blocking writes did terribly while the io_uring version did best, at 33% of its non-directio (i.e. kernel-buffered io) performance.
But yeah I haven’t turned on fsync because it seemed like that was irrelevant when comparing write methods.
It seemed only relevant if you wanted to care about absolute numbers, which I wasn’t doing.
Fwiw I benchmarked versions of these programs using directio, and the blocking writes did terribly while the io_uring version did best, at 33% of its non-directio (i.e. kernel-buffered io) performance.
Of course. Direct I/O is terrible for performance in almost all cases, which is why filesystems don’t do that unless specifically asked for. There are extremely good reasons for why buffered I/O is the default.
If you want to benchmark the actual I/O subsystem (including filesystem, I/O drivers such as NVMe, and the actual disk devices) in normal conditions, then you’d want to write at least 30 seconds worth of data (enough data to make the test somewhat representative with respect to all sources of timing noise) using normal buffered I/O, but still making sure that you do an fsync() at the end (before completing the benchmark), so that you also time all the actual disk writes.
Even then it’s much better to write at least several minutes worth of data, as there are devices that lie about doing fsync() and thus obtain better benchmark scores by cheating (at the expense of risking data corruption when there are power failures).
Note that doing an fsync() at the end is not the same as doing direct I/O, which is something completely different.
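A minimal Go sketch of that methodology (buffered writes for the whole run, one fsync at the end so the timing includes the physical writes); the path and sizes are illustrative:

package main

import (
    "fmt"
    "os"
    "time"
)

func main() {
    f, err := os.Create("/tmp/bench.dat") // hypothetical output path
    if err != nil {
        panic(err)
    }
    defer f.Close()

    buf := make([]byte, 1<<20) // 1 MiB per write
    const total = 1 << 30      // 1 GiB in total

    start := time.Now()
    for written := 0; written < total; written += len(buf) {
        if _, err := f.Write(buf); err != nil { // buffered by the page cache
            panic(err)
        }
    }
    if err := f.Sync(); err != nil { // fsync: wait for the data to actually reach the disk
        panic(err)
    }
    elapsed := time.Since(start)
    fmt.Printf("wrote %d MiB in %v (%.1f MiB/s)\n",
        total>>20, elapsed, float64(total>>20)/elapsed.Seconds())
}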
But yeah I haven’t turned on fsync because it seemed like that was irrelevant when comparing write methods.
I agree, I’m not sure if doing fsync is very relevant for benchmarking io_uring, since as @david_chisnall alluded to, io_uring is more like an alternative syscall interface rather than something inherently tied to I/O. It was initially conceived as a faster way to do I/O (since I/O can be syscall-heavy in some cases), but I think it’s somewhat of a misnomer now (or will be, in the future).
So in this case, benchmarking how long it takes for userspace writes to reach the kernel is already quite relevant. Benchmarking additional filesystem and I/O subsystems might also be interesting but probably not very relevant, as they’ll likely be doing the same thing whether using io_uring or not.
I find this a bit of a weird question. “AST” and “bytecode” are not methods of interpretation, but ways of representing a program. It’s made even more confusing by the fact that that the article itself doesn’t mean “AST walker” and “bytecode switch-dispatch loop”.
ASTs and bytecode are both abstract syntax. An AST walker is the natural interpreter for an AST, in the sense that AST walkers are homomorphisms from trees; similarly, iterative execution of codes (that is, a machine) is the natural interpreter for bytecode in linear memory.
The analogy goes further. Graphs can also be abstract syntax, and graph reduction is the natural interpreter for graphs. Matrices and semirings can also be used for abstract syntax, but I don’t know what their natural interpreters look like; semirings might be naturally interpreted by something like regular-expression engines, but the details are questionable.
And yet there are many, many ways to build interpreters on top of both that don’t fit this supposedly neat model: compiling ASTs to a series of nested lambdas, for example. Or threaded (direct or indirect) interpreters for bytecode. I think your assertion that there’s a natural/obvious single evaluation strategy for either representation is overly simplistic.
Perl is an amusing example. It’s an AST interpreter, which you might expect to be slower than a bytecode interpreter, but Perl is faster than Python. This is partly because Perl augments its AST with a pointer to the “next” tree node(s) in execution order, which mostly eliminates tree-walking faff. The inner loop basically just runs the current op and follows its pointer to the next op in execution order.
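A rough Go sketch of that dispatch style (illustrative only, not Perl’s actual implementation): each node carries a pointer to the next op in execution order, so the interpreter just runs an op and follows the pointer, with no recursive tree walk.

package main

import "fmt"

type vm struct {
    acc int
}

// op is one executable node: what it does, plus the next op in execution order.
type op struct {
    run  func(*vm)
    next *op
}

func main() {
    // Ops for: acc = 1; acc += 2; print acc
    print_ := &op{run: func(v *vm) { fmt.Println(v.acc) }}
    add := &op{run: func(v *vm) { v.acc += 2 }, next: print_}
    load := &op{run: func(v *vm) { v.acc = 1 }, next: add}

    v := &vm{}
    for o := load; o != nil; o = o.next { // the entire inner loop
        o.run(v)
    }
}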
Yeah, the performance of sensibly-written AST walkers is really underrated. There seems to be this widely held belief that bytecode interpreters are not just faster, but fundamentally faster. Nothing could be further from the truth. Bytecode dispatch can be really slow unless you’re doing something like translating the bytecode to threaded code.
Those aren’t counterexamples. A homomorphism out of trees merely needs to have the property that each branch is compiled with only the local context of its leaves, and compiling to nested lambdas (known as “compiling to closures”) is such a homomorphism. Similarly, bytecode is stored in a one-dimensional list, which can be viewed as a free monoid, and threaded interpreters like those for Forth are usually monoid homomorphisms, as long as the compiler is not doing DOES magic.
But they’re not, in practice, all the same. The performance of those various strategies end up being radically different. I have some examples here. So again, my original point stands: it’s bizarre to ask which is faster when each one covers a lot of different techniques.
But the end result is just cover fire. The competition has no choice but to spend all their time porting and keeping up, time that they can’t spend writing new features. Look closely at the software landscape. The companies that do well are the ones who rely least on big companies and don’t have to spend all their cycles catching up and reimplementing and fixing bugs that crop up only on Windows XP. The companies who stumble are the ones who spend too much time reading tea leaves to figure out the future direction of Microsoft.
An old analysis, but substitute the 800 pound gorilla du jour for Microsoft and it holds up well.
Have they been at loggerheads before? From what I’ve gleaned, the projects respect each other but have fundamental disagreements about how to structure a Unix-like.
Linus in 2008: “I think the OpenBSD crowd is a bunch of masturbating monkeys, in that they make such a big deal about concentrating on security to the point where they pretty much admit that nothing else matters to them.”
I interpreted the reply as mock offense at the insult of being called a monkey with the humorous and unwritten acceptance of the accusation that they were masturbating over security. I don’t know, it seemed funny and answered in kind without taking up an argument.
That’s funny, except I’m afraid it shows rather a lack of research effort :-) — none of the people named was an American at the time (and most of them aren’t now, either).
It’s not for HFT or any other specific domain, just an MVCC KV DB with about 200ns update transaction latency (currently snapshot isolation but I plan to offer a serializable option).
It’s a fork of https://github.com/gaia-platform/GaiaPlatform but I haven’t made the fork public yet, since I’m undecided on licensing. I’ve discarded most of the features in the original repo other than the core database, which is about 100x faster now (due to removing mmap and IPC from the critical path).
Linearizability is theoretically great (I’d say the ideal), but also very strict in practice. For example I’m mostly familiar with Postgres transaction levels, none of which support pure linearizability. The default (read committed) is pretty relaxed and allows a few unexpected behaviors, which is kind of scary when you compare it to the more logically “clean” idea of linearizability. My understanding is that linearizability would be correlated with very poor performance.
This definitely has an interesting impact on testing, because you have to plan for and allow a certain class of consistency errors. The definition of correctness has to allow these errors. This for me is vacuous truth - it’s technically true that a consistency error is correct behavior, but I also want to know that eventually the state is consistent. I don’t know if this means that eventual consistency should be expressed as a liveness property?
Anyway, the recommended followup reading to this is Strong consistency models. There’s a whole slew of consistency models out there.
I think you tend to see less talk of linearizability in the context of relational databases because linearizability is a constraint on the allowable orderings of operations on a single object, instead of the multi-object guarantees (like serializability) that someone making use of multiple tables in a transaction is likely more interested in. Linearizability isn’t irrelevant in that context, but it rarely comes up because, as you note, it’s generally a very expensive guarantee, but also because it doesn’t imply serializability: the combination of the two is strict serializability, and the only database I’m aware of that offers it is Spanner.
Speaking of Aphyr, his consistency models breakdown on jepsen.io is a really great resource for an overview of the full logical hierarchy at play and for links to relevant papers.
Most single-node databases that offer serializable isolation at all are also strictly serializable, because non-distributed CC protocols tend to fall out that way. E.g., vanilla 2-phase locking is strictly serializable, and so is vanilla OCC.
I disagree that strict serializability implies poor performance. It is quite possible for a strictly serializable database to perform millions of transactions/second. (Source: I’ve done it.)
I’m in a weird situation. My company did 30% layoffs this week. Roughly half were laid off immediately. I am in the other 15% who were incentivized with a large bonus to “please stay through March as the company transitions”. I’m planning on leaving immediately, but further communications have been slow and full of conflicting (and disappointing) information. So I’m trying to figure out what to do.
Thankfully I will be okay for a while. I’m using up my vacation days while things settle and enjoying the extra free time to hike and do open source stuff! I’ve made great progress on several projects already this week, and plan on doing more this weekend :)
Are you sure? The job market is still brutal for many types of roles. It is much easier and less stressful to look for a new job when you have a job than when your time is running out financially.
It is much easier and less stressful to look for a new job when you have a job.
I was just talking to a friend about how difficult it is to do the job search when you have a job. I feel very anxious every time I take an interview during a work day. Always sleep terribly the night before too. I hate lying to people about where I am. (Yes, even lies of omission.)
I also find it quite difficult to do a job search while working full time. Most of the time I quit the job and continued with the same commitment until the end (notice periods in Europe are multiple months), then took a break and started to search for a new one.
Valid points for sure. And I’m keeping my options open while I wait for more info. However:
Company culture has gone south over the last year, these layoffs aren’t helping (even those not laid off are unhappy). Many people in my situation are also planning on leaving
I would have a very hard time being happy and productive working with numbered days
I’m in a safe enough place financially that I’m comfortable leaving immediately (thanks YNAB!)
So while I may have added stress, overall I think I will be happier to leave now than to stay
Thanks for being a voice of reason. After a weekend to cool down emotionally I’ve decided to stay for a few weeks and reevaluate my situation. Not thriving at work today, but oh well :)
You are correct.
BUT: the last one who stayed has to write the documentation eeeek :D
I doubt that the large bonus is 17.6% more than the old salary (which is the work of the sacked 15% distributed onto the remaining 85% people). And the bonus will not rise as more and more of the second 15% break away one after the other. It will get increasingly stressful the longer it takes, as the best and most flexible ones go first.
Which eats the time that you should use to do job interviews.
Grab anything which could be useful (contacts of good colleagues, cheaply sold-off hardware and furniture, if you need it) and offboard asap, in a friendly manner. Save the four-letter words; you always meet more than once.
If you have a physical key to the office, toss it on the copier so that the key’s number can be read, and get date, company stamp and signature when you hand over the key. This saved my rear side more than once. Because if a few keys are missing from the locking system, they have to switch all keys which is super expensive, and you don’t want to be that guy.
I still have a yucca tree from a new economy office, it was more durable than the office chair.
My company did 30% layoffs this week. Roughly half were laid off immediately. I am in the other 15% who were incentivized with a large bonus to “please stay through March as the company transitions”.
/u/st3fan ’s statement is entirely correct, but on the other hand, fuck those guys.
Sometimes layoffs are unavoidable, but there are ways of handling them that are better or worse. The way that Microsoft handled them for my colleagues was the main reason that I decided to leave (and, since my official last day was Tuesday, I am no longer bound by their social media policy and can say fuck those guys). When you have to contact HR because you don’t believe that the way things are being handled is legal, it doesn’t actually matter what their response is: if someone even needs to ask the question then you’re not a place I want to work and not a place I want to encourage other people to work at.
My mental health has improved significantly since I handed in my resignation. In hindsight, I probably should have done it back in March.
When you have to contact HR because you don’t believe that the way things are being handled is legal
I don’t want to sound too bitter but it is important to understand that contacting HR is mostly pointless. They have exactly one purpose and that is protecting the company, not you as an individual. They are NEVER on your side or care about your concerns. HR is not your friend and HR will not defend you, they tolerate you at best. There are too many people that misunderstand this. At the end of the day it is important that the company survives, it does not matter if some people need to be fired or how that is handled from HR’s perspective.
My primary goal in contacting HR was to ensure that there was a written record that the folks on my team could require be presented in court if they sued for constructive dismissal, and to make it very clear that, if called as a witness in such a case, I would happily testify to that effect, so that they could use that in their assessment of corporate risk.
This is a very common internet sentiment but it’s important to not generalize to “NEVER contact HR.” Knowing that HR will engage in damage control and not operate with your well being in mind doesn’t mean you can’t or should never contact them. Sometimes they are the only outlet and you just have to speak your mind. Sometimes you need to get something on record for potential future discovery. Sometimes contacting HR is like getting a police report for a stolen item: it’s a necessary first step towards some other goal.
I didn’t get fired but it became clear that none of the reasons that I’d joined Microsoft five years ago (a company that can do big hardware/software co-designed shifts because it controls the entire stack, with a commitment to being a good citizen of the open source ecosystem, and which understands that you get the best out of talented people by treating them well) remained valid.
Fortunately for me, there is no shortage of jobs for people with 11 years CHERI experience. I will shortly be starting at another company that wants to build CHERIoT chips (more details once they’re happy with it being public). The folks on the silicon team that I was working with on CHERIoT have been incredibly supportive and I’m looking forward to working with them, since we will have some common interests going forward.
When I reached out to HR at Microsoft about a retaliatory performance review full of provably false things, the only action taken was to replace all the provably false things in the review with vaguer statements that were no longer disprovable.
Every megacorp will eventually behave toxically but it feels a little more common at Microsoft from my admittedly limited vantage point.
I’m sorry to hear you ran into it and I wish you well on your recovery.
It’s particularly disappointing because I was part of the T&R Diversity and Inclusion Council. There are senior folks in the Office of the CTO that really understand that behaving well towards employees is essential to get the best work out of them, which is essential to remaining competitive.
When I complained to my manager about how people are on my team were treated, he told me that it was out of his control and out of his manager’s control. That leaves Scott Guthrie and Satya Nadella as the only two that can be responsible. There’s a saying that you don’t leave bad companies, you leave bad managers. When the manager in question is either the CEO or someone who reports directly to the CEO (and whose performance review is conducted by the CEO) it’s very hard to tell the difference.
That wasn’t great, but closing a whole lab meant that the impact on the rest of the company was reduced. It was also constrained to MSR, not the whole of MS, and the statutory protections for employees in California are much weaker than in the UK. The most recent round left teams across the company understaffed for things that they’d committed to delivering. It also included redeployments, where people were moved to different groups, largely for bean counting purposes (the new groups rarely matched their interests and often didn’t need their skills), which harmed career development for these people (‘you’re an expert in X, why were you working in a team that doesn’t do X?’). The process that they followed did not match my understanding of their obligations under UK law and definitely was not aligned with the alleged corporate values of ‘respect, integrity, and accountability’.
Which part is egregious? If you’re going to ask people to work under less than ideal conditions (low morale, fewer workers) it only seems appropriate to give a bonus.
They are artificially creating the less-than-ideal conditions, asking people to work under them, probably paying them less than they deserve (15% fewer people tends to make more than 15% more work as deadlines slip and maintenance is deferred, and I doubt their large bonus is 15% of their salary), and then firing them. ie, intentionally burning out and discarding people.
Thus no one makes changes to the quagmire except for really local patches, which over time makes it even worse.
Compare Postgres vs. SQLite’s respective willingness to undertake invasive refactorings, and then compare each project’s test coverage. Just one data point but I think it’s telling.
Ok but that doesn’t tell us that codebases without tests have this property of being a quagmire. That tells us that many quagmires have no tests.
In my experience useless tests can make this even worse.
Yeah, there are two opposite failure modes: (i) not testing tricky things, and (ii) over-testing simple things, and tying tests to the implementation. Both are bad.
EDIT: I’m having trouble telling if you’re arguing against having any tests at all. If you are, have you worked on a codebase with >10,000 LOC? At a certain point, tests become the only reliable way to make sure complex parts of the codebase work correctly. (It remains important not to test trivial things, and not to tie the tests too closely to the implementation.)
I’ve got files with several thousand lines of code. Believe me when I say you have to test things, but automated tests are not necessarily valuable for all codebases.
The problem is that you're expecting the same developers who wrote the awful codebase to write the testing suite. Whatever the reasons for the terrible code (time pressure, misaligned incentives, good ol' ineptitude), they still apply to the development of the tests.
Meaning if the code is really bad, the tests are going to be really bad, and bad tests are worse than no tests.
In my case, 20 years experience, 11 of which have involved decent-to-good test coverage (I don’t mean just line coverage, but also different types of tests to verify different aspects, such as load tests). In a well-tested code base, some tests will simply break, ideally within milliseconds, if you do anything wrong during a refactor. In an untested code base an unknown number of interactions will have to be manually tested to make sure every impacted piece of functionality still works as intended. And I do mean “unknown”, because most realistically sized code bases involve at least some form of dynamic dispatch, configuration, and other logic far removed from the refactored code, making even the job of working out which parts are impacted intractable.
Having automated tests is such a no-brainer by now, I very much agree with the author. But the author isn’t telling developers they need to write tests, they are telling managers they need to allow developers time to write tests.
Perhaps for a better A:B example:
I have some things in LLVM that have good test coverage, and some that don’t, and some that do but only in a downstream fork. People routinely refactor the things with good test coverage without my involvement. They routinely break things in the second and third category, and the third is usually fairly easy for me to fix when I merge from upstream.
I once had to upgrade an existing site (built by someone else) to a newer version of CakePHP because of PHP support (their hosting provider stopped supporting old PHP versions, and the old CakePHP would break), a version in which they'd completely overhauled the ORM. The code contained very tricky business logic for calculating prices (it was an automated system for pricing offers based on selected features). None of it was tested. In the end, I had to throw in the towel - too much shitty code and not enough familiarity with what was supposed to be the correct behaviour of that code.
The company had to rewrite their site from scratch. This would not have been necessary if there had been enough tests to at least verify correct behaviour.
In another example, I had to improve performance of a shitty piece of code that an ex-colleague of mine had written. Again, no tests and lots of complex calculations. It made things unnecessarily difficult and the customer had picked out a few bugs that crept in because of missing tests. I think I managed to "test" it by checking the output against a verified correct output that was produced earlier. But there were many cases where code just looked obviously wrong, with no good idea on what the correct behaviour would've been.
In the first case it sounds like the core problem was business logic intertwined with ORM code, and secondarily a lack of indication of intent. Tests would actually have helped with both.
And in fairness the second one also sounds like tests would be the solution.
Indeed. Now to be fair, both were shitty codebases to begin with, and the same lack of discipline that produced such code resulted in the code not having tests in the first place. And if they had written tests, they’d likely be so tied to the implementation as to be kind of useless.
But tests are what make outside contributions more possible by providing some guardrails. If someone doesn’t fully grok your code, at least they’d break the test suite when making well-intentioned changes. Asserts would also help with this, but at a different level.
True enough that my job the past 2 months was about porting a legacy component that meshes very poorly with the current system (written in C instead of C++, uses a different messaging system…). Having re-implemented a tiny portion of it I can attest that we could re-implement the functionality we actually use in 1/5th to 1/10th of the original code. It was decided however that we could not do that. Reason being, it is reputed to work in production (with the old system of course), and it basically has no test. To avoid the cost of re-testing, they chose to port the old thing into the new system instead of doing it right.
I’ve heard some important design decisions behind the Boeing 737 MAX had a similar rationale.
Honestly it’s not that hard to make tests that test behaviors and contracts. I’d even say it’s harder to write tests that are so tied to the implementation that it makes refactoring hard
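For what it's worth, a minimal sketch of what "testing the contract, not the implementation" can look like (a hypothetical Go example; the package, function, and numbers are made up). The test only pins down observable behaviour, so the implementation can be refactored freely as long as the contract holds:

```go
// Hypothetical example: test observable behaviour, not internals.
// (In a real project the function and its test would live in separate files.)
package pricing

import "testing"

// Discount returns the fraction (0..1) to take off an order of n items.
func Discount(n int) float64 {
	if n >= 100 {
		return 0.10
	}
	return 0
}

func TestBulkOrdersGetADiscount(t *testing.T) {
	if Discount(100) <= 0 {
		t.Fatal("expected a discount for bulk orders")
	}
	if Discount(1) != 0 {
		t.Fatal("expected no discount for small orders")
	}
}
```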
The way I remember the respective meaning of "theirs" vs. "ours" in `merge` vs. `rebase` is that `merge` is FROM $OTHER_BRANCH INTO $CURRENT_BRANCH, while `rebase` is FROM $CURRENT_BRANCH ONTO $OTHER_BRANCH.
Another way: imagine you're Linus Torvalds. You maintain the main branch and you mostly incorporate changes others send you.
I merge “theirs” feature branch into “ours” main branch.
I rebase “theirs” feature branch onto “ours” main branch.
99% of people are not the head developer and rebase “ours” changes on top of “theirs” (upstream) updates.
It’s also an academic distinction because the correct way to resolve a conflict is completely independent of which is “ours” and which is “theirs”. Either way what you should do is merge the logical intent of the two changes. I wrote more about this at http://h2.jaguarpaw.co.uk/posts/git-rebase-conflicts/
There are some very long words in that commit message.
The term "commit" is badly chosen, in my opinion. It conflates the following things: (1) a snapshot of the entire state of the repository, and (2) the set of changes (the diff) between that snapshot and its parent.
You can set the state of a repository to a commit in the first sense. If you cherry-pick, you are using commits in the second sense.
Git almost always uses commits in the first sense, and indeed that is their data model behaviour as well.
Cherrypicking changes the meaning because it would otherwise be useless. But that’s something specific to cherrypicking (and base-swapped rebase I assume).
I agree. However, people often tend to think of a commit as a set of changes (as evidenced by some comments here), so I think that proves my point that there is a lack of conceptual clarity.
Empirically I agree: people are confused by this. I am not sure the fault is the word “commit” though. I am curious what you would suggest instead. To play devil’s advocate and defend git, let’s look at the cherry-pick help:
Conceptually, this doesn't necessarily rely on the concept of a set of diffs, as in a patch file. You can still have "commit" mean purely "committing the entire state of the working directory to disk" and have a coherent concept of cherry-picking: the "change being introduced" is merely the difference between the commit and its parent. Hence cherry-picking is fundamentally contextual and implicitly references two commits. And for that reason it requires the `-m` flag on commits with multiple parents.
Cherry-picking also sets the state of the current working directory. When working with individual commits, I almost always want to also affect the current state in some way. So in that sense they are inherently intertwined. You could call them "revisions" or "changesets" or whatever, but I don't think that would change the fact that in most workflows they are inherently coupled.
Yes, it does (but in a different way).
Yes, they are. My arm and my hand are also inherently intertwined, but they are very different things.
How so? In both cases IIUC you are just moving HEAD to point to a different commit.
If I asked you raise your hand, I think you would understand that to mean that you also have to move your arm, and the distinction isn’t really that important. It is not a problem in everyday speech that things stand in for other things.
Is it different for Git?
`HEAD` is the name we give to the "current" commit used as the basis of comparison for the working directory state. So we already have different names for #1 and #2. But almost no one says "what commit is HEAD pointing at?"; they say "what commit are you on?" The distinction is there, but most of the time it's not that important. I wonder if what you're criticizing is not a feature of Git, but a feature of language itself.
The weird thing about `HEAD` is that it can have different types depending on context. In non-detached state, it's a reference to a reference to a commit: a reference to the current branch, which itself is a reference to a commit. But in detached state, one of the levels of indirection collapses: `HEAD` is just a reference to a commit.
If you check-out a commit, you tell git to set HEAD to the state associated with that commit.
If you cherry-pick a commit, you tell git to compute the difference between that commit and its parent, try to somehow “apply” that difference to the current HEAD, create a new commit and set HEAD to that.
Both can fail if your working directory contains uncommitted changes. The latter does a lot more, and it can also fail in a new way: when the contents of a file that is changed by the commit you're cherry-picking have changed, you get a merge conflict. This is not possible (AFAIK) when checking out a commit.
Maybe you're right and I am being pedantic. But in general I think it helps a lot when understanding concepts when you have clear names and don't conflate things. With the current naming, it is hard to explain to a beginner what a commit is because it is an overloaded term.
Overloading the term would be to use the same term for different purposes or concepts. In git a commit is one thing. It is a recording of the revisions made to a set of files. You can record them and once recorded you can consume them in multiple ways, each of which has a separate term. The word serves as both a noun and a verb but this doesn't make it overloaded because the verb form is the act of creating the noun.
The two concepts you referred to have specific names. You used them: checkout and cherry-pick. Those are different actions you can take that each consume a commit (and create a new commit in the cherry-pick case). You can explain the relationship between the working directory and the commit to a beginner quite simply: if you have uncommitted changes to tracked files or even untracked files in your working directory, either stash them or commit them prior to consuming any other commit in any way.
My point is that a commit refers to a snapshot, but the term “commit” is also used to refer to the diff of the commit and its parent.
For example, in github if you click a commit SHA it will show you the diff with its parent, or when you cherry-pick it also considers diffs.
I think this is confusing to beginners (which most people here obviously aren’t) and a more explicit name should be used to distinguish between the two.
How does a commit affect the state of the working directory? It records the staged changes, which come from the working directory. I don't know of any file changes it makes outside of the .git directory, though.
ugh, so many footguns. I’m not a Go user, and the more I read about it the more likely I’ll stay away from it.
This didn’t even list my favourite Go footgun. Updates to fields of slice types are not guaranteed to be atomic and it is undefined behaviour to write to them concurrently from two threads (goroutines that are scheduled in parallel). If you do, a reader in another thread may observe the base of one slice and the bounds of another. If you are in an environment where you’re allowed to run arbitrary Go code but not the unsafe package, you can do this intentionally to escape from the Go memory safety model. If you’re not writing malicious code, the fact that Go’s type system has no notion of linearity means that you have to be very careful to not accidentally alias objects between threads and trigger this kind of thing intermittently, in a fashion that’s almost impossible to reproduce, let alone debug.
It’s a deeply special kind of language that manages to have a global garbage collector and still not be memory safe.
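A minimal sketch of the kind of race being described (illustrative only; the slices and sizes are made up, and whether it actually faults depends on scheduling). Two goroutines use the same slice variable without synchronization, so a reader may observe the base pointer of the short slice combined with the length of the long one; `go run -race` flags it:

```go
// Torn-slice-header race sketch: assigning a slice copies a multi-word header
// (pointer, length, capacity), and unsynchronized concurrent assignment lets a
// reader observe a mix of two different headers.
package main

import "time"

var shared = make([]byte, 1) // short slice

func main() {
	long := make([]byte, 1<<20) // long slice
	short := shared

	go func() {
		for {
			shared = short
			shared = long
		}
	}()
	go func() {
		for {
			s := shared     // unsynchronized read of a multi-word header
			_ = s[len(s)-1] // may use short's base with long's length
		}
	}()
	time.Sleep(time.Second)
}
```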
Holy crap that’s dysfunctional! Why don’t they wrap it in a critical section or something to avoid this kind of bug?
Because it would be expensive and they can’t be arsed, the Go people are fundamentally C people, it’s the fault of the developer if they fall into the language’s traps.
Slices are not the only affected type either, interfaces and maps also suffer from data races for sure. Generally speaking data races undermine memory safety and there’s a fair number of possible ways to get data races in Go: https://www.uber.com/blog/data-race-patterns-in-go/
Interestingly, just like java which has race conditions that can break the String class: https://wouter.coekaerts.be/2023/breaking-string
The Go memory model is that everything is thread unsafe unless you put it behind a mutex or a channel. Why would slices be different from everything else?
That something is not thread safe does not mean it breaks memory safety.
For instance as far as I understand while Java won’t protect the program from data races, the JMM guarantees memory safety in front of data races.
Okay, you can read a crap ton of writing by rsc if you want his take on this. https://research.swtch.com/gomm is the relevant part of a 3 part series.
That does not contradict my comment in any way. And the only thing of interest in that essay is that rsc is self-servingly contradictory:
is immediately followed by
which makes it completely moot.
I mean, it’s in the middle. It’s not classic UB nasal demons, but it could be a cause of data corruption. Maybe it’s a bad choice, but that’s what he chose and you can read his blog series to try to work out why he chose it.
I guess I’m just saying, it was a deliberate engineering tradeoff not “dysfunctional” because they didn’t “wrap it in a critical section or something”.
It’s funny how he first pays extensive lip service to Tony Hoare’s philosophy, especially this one:
only to then cheerfully update the docs to explain that you can get arbitrary memory corruption if you “misuse” the program (well, language).
Arbitrary memory corruption is waaay into classic UB nasal demon territory.
This is a nasal demon:
All loops must terminate. Therefore we can assume this loop must terminate. Therefore we can rewrite the loop to access `*p` or `*q` before the loop happens as an optimization. (But what if it's an infinite loop? Well, that's UB, so we can assume it won't.)
Go is not any less safe than C/C++ and it specifically rules out some of the UB "optimizations" in C/C++ that give UB a bad name. So, slightly safer than C/C++, less safe than other languages.
I also think “wrap it in a critical section or something” is really breezing past how difficult this would be. Every slice/interface/map would need some kind of mutex or the type system would have to be radically different to prevent aliasing. You’re either talking about a huge GIL style performance hit or a totally different language with a much stronger type system.
I doubt it would be a “huge GIL style” performance impact - it’d be a mutex per slice, not a global mutex over all slices. There shouldn’t be much contention on these mutexes if you’re using it like “you’re supposed to”, anyway!
It seems even these days "it's not fast enough" is still a sufficient argument to avoid important safety features. Which is strange, because runtime bounds checking is part of Go, and that is also quite a big performance impact.
I guess it’s just a matter of time before someone opens a can of CVEs on some large Go codebases, and then we can have this whole discussion again.
Performance. Assigning a slice-typed variable is a common operation. If you had to acquire some kind of lock every time that you couldn’t prove non-aliasing then it would slow Go code down a lot. As @masklinn says interface-typed fields in Go are a pair of a pointer to the object and a pointer to the type, so it’s possible to get type confusion in these by racing and reading one type and the other value.
For maps it’s somewhat more excusable. A map is a complex data structure and updating a complex data structure concurrently is a bad idea unless it’s explicitly designed for concurrency. I think the map implementation used to be in C (special, Plan 9-flavoured C), but it might be pure Go now. If it is, then races there should just leave it in a broken state (just as updating any other non-concurrent complex data structure with data races can), rather than break the fundamental guarantees of the environment.
It’s far less excusable for interface pointers and slices because these are value types that are carefully designed to look like they are primitive values. You pass them around just as you would an integer. If two threads write to the same integer variable at the same time, one will win the race and you’ll see a value that makes sense. This is not the case with other Go types.
The depressing thing is that a type system that understands isolation can address this. When I write parallel code, there’s one rule that I want to follow: no object is both mutable and aliased between threads. Go provides absolutely nothing in the type system to help you spot when you’ve broken this rule. For a language that was designed from the ground up for concurrency, this is inexcusable. This is probably why most of the successful Go programs that I’ve seen use it as statically compiled Python and don’t use goroutines at all (or in a handful of very special cases).
I learned Go from the Go Tour back in ~2011 or so; IIRC, slices and interfaces were explained as being fat pointers or tuples, so I’ve always thought of them as such rather than thinking of them as integers. As a result, I’ve never really run into these problems. I’m very curious how often people are running into this? One of the things I like about Go is it’s pretty straightforward how things work, so you can intuit about stuff like this. I suppose if someone was writing Go like many people write Python or JavaScript–with no idea about the underlying machinery–this might get people into trouble, but on the other hand I don’t know how you can write Go without understanding some basics about memory layout, pointer traversal, etc. Maybe I’ve just been doing this for too long to empathize well with beginners…
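To make the "fat pointer / tuple" view concrete, here is a tiny illustration (hypothetical values): copying a slice copies the header, not the backing array, so the two copies alias the same elements.

```go
// Copying a slice copies its (pointer, length, capacity) header; the backing
// array is shared, so writes through one copy are visible through the other.
package main

import "fmt"

func main() {
	a := []int{1, 2, 3}
	b := a            // header copy; same backing array
	b[0] = 99         // aliased write
	fmt.Println(a[0]) // 99
}
```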
How often is that? Go should be in a pretty good position to reason about aliasing.
I agree with you about the way Go should have… the thing Go should have done. But it would probably be more on-brand for them to fix this by designing atomic slices that avoid atomic operations until they are actually contended. Do we know if they’ve tried that?
Why? The type system does not give the compiler any information that it can use to make that kind of decision. If the slice is a local and has not been address taken, it can be assumed to be safe. In pretty much any other situation, the compiler has to assume that it can have escaped to a concurrent context.
I think they were very reluctant to introduce atomics at all, they certainly don’t want more. They want you to design code where objects are held by a single goroutine and you never do racy updates.
TBF in most cases slices are passed by value, in which case there is aliasing on the backing buffer (and there can be data races on that depending what it stores), but there’s no aliasing on the slice itself. Most issues would occur with slices getting captured by a closure or go statement in which case they essentially “had their address taken”.
A bigger issue, I would think, is that you’d need to add a tripleword pseudo-atomic which pretty much means you need a lock (interfaces are only a doubleword so it’s a bit better). And while in theory you could use the low bits of the pointer as your lock flag I’m not sure there’s such a thing as a masked compare exchange not to mention a sub-byte futex / mutex?
Why would you need such a thing? You can implement arbitrary RMW ops (on a single word) with cmpxchg.
Because if you want to smuggle the lock in a pointer you need to test and (un)set a single bit inside a value you don’t know.
Cmpxchg would require changing the structure of the slice to add a new member, at which point you might as well have a normal lock.
Sorry, I still don’t understand. I have used tagged pointers with CAS many times and I don’t see the problem. Find an unused high or low bit in the pointer, save the initial value of the pointer, mask and test the bit you care about, and if the test passed then set/clear that bit and CAS the old value to this new value. Depending on the context, if the CAS fails then either abort or keep retrying (maybe with backoff) until it succeeds.
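A sketch of the tagged-pointer CAS being described, assuming the low bit of an aligned pointer-sized word is free to act as a lock flag (purely illustrative; this is not how Go's runtime actually represents slices, and the placeholder value in main is made up):

```go
// Use the low bit of an aligned word as a lock flag and flip it with a
// compare-and-swap, retrying if another thread got there first.
package main

import (
	"fmt"
	"sync/atomic"
)

const lockBit uintptr = 1

func lock(word *uintptr) {
	for {
		old := atomic.LoadUintptr(word)
		if old&lockBit == 0 && atomic.CompareAndSwapUintptr(word, old, old|lockBit) {
			return // bit set; the pointer bits are unchanged
		}
		// someone else holds the bit (or the CAS raced); retry
	}
}

func unlock(word *uintptr) {
	for {
		old := atomic.LoadUintptr(word)
		if atomic.CompareAndSwapUintptr(word, old, old&^lockBit) {
			return
		}
	}
}

func main() {
	var w uintptr = 0x1000 // stand-in for an aligned pointer value
	lock(&w)
	fmt.Printf("locked: %#x\n", w) // 0x1001
	unlock(&w)
	fmt.Printf("unlocked: %#x\n", w) // 0x1000
}
```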
Recent revisions of x86 and arm have fast two-word atomic reads and writes (avx and armv8.1 respectively). But more obscure architectures do not, so there are tradeoffs w.r.t. performance portability.
Sorry, for posterity, it is armv8.4.
Because accessing anything across threads is already undefined behavior, and your idea would murder the performance of correct code for no real reason. Writing correct code is in no way difficult, and if you do happen to slip up, that's why your app has a comprehensive test suite, and why you run `go test -race` in CI, which puts everything into "I want to be painfully slow" mode, but bombs as soon as you have a single cross-thread access without a synchronization edge.
If I want "arbitrary memory corruption", I already know where to go for that. Do you really want memory-unsafety in a language that is marketed for externally-facing web servers? Java demonstrates that you can allow data races without compromising the integrity of the runtime itself.
I’ve been working with my current company and doing Go for about 9 years now. We’ve written several nontrivial services, a number of which handle more than 10k RPS. We’ve had zero production issues caused by data races, and… maybe two times in those nine years that someone wrote a race bug, which was caught by automated testing before it made it out the door. It’s not high on my list of concerns. The kinds of access patterns that could even theoretically run into this problem just don’t exist in our code, because people who understand the language have no inclination to write them.
I don’t know the specifics of this issue, but I do know that you’re not supposed to share a variable between go routines like that. If two go routines must work with the same data, you’re supposed to let them communicate it through a channel.
Whether that means it is OK to leave in a footgun like that is a different matter. But this is one of the many “problems with Go” that I somehow magically never encounter in real life.
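A sketch of the "communicate it through a channel" approach mentioned above (illustrative): ownership of the slice is handed from one goroutine to another, so only one goroutine touches it at a time.

```go
// Share by communicating: the sender hands the slice over and stops using it.
package main

import "fmt"

func main() {
	ch := make(chan []int)
	go func() {
		data := []int{1, 2, 3}
		ch <- data // hand ownership to the receiver; don't touch data afterwards
	}()
	got := <-ch
	got[0] = 99 // safe: only this goroutine uses the slice now
	fmt.Println(got)
}
```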
I don't see "so many footguns". I see two things. The bit about `append` is something you learn in the first 30 minutes of using the language. And the other thing… I don't even know. I can't figure out what he thinks he's doing there, or trying to prove, or what it might have to do with anything that might happen to anyone in the real world, because the presentation of the concept is basically nonexistent. And the "advice" about constructing slices is pure nonsense; there's no justification given for it because none is possible.
When I worked on a research database I came up with the idea of storing 128-bit hash codes (instead of the actual keys) to optimize DISTINCT, but I'm not sure if any production database uses this approach.
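As a rough illustration of the 128-bit-hash idea (not any particular database's implementation; the hash choice and sizes here are assumptions), deduplication for DISTINCT can key a hash set on a truncated cryptographic hash instead of the full key:

```go
// Count distinct rows by their 128-bit hash rather than by the key itself.
package main

import (
	"crypto/sha256"
	"fmt"
)

func distinctCount(rows []string) int {
	seen := make(map[[16]byte]struct{})
	for _, r := range rows {
		sum := sha256.Sum256([]byte(r))
		var h [16]byte
		copy(h[:], sum[:16]) // keep 128 bits of the hash
		seen[h] = struct{}{}
	}
	return len(seen)
}

func main() {
	fmt.Println(distinctCount([]string{"a", "b", "a"})) // 2
}
```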
Cranking away on a hypothetical press release and FAQ for my project since I’m meeting with some Amazon pals on Monday to discuss it :)
From the look of those numbers, io_uring isn’t much faster than naive blocking, maxing out at 112% of blocking. I imagine that’s because the writes are sequential instead of random, or because the userspace program isn’t doing anything CPU intensive.
For disk I/O, it's unlikely to help much. There are two big benefits to io_uring: first, you don't pay a system call (and its overhead) per operation, because submissions are batched through a shared ring; second, you can keep many independent operations in flight at once without needing a blocking call, and so a thread, per operation.
The first matters a lot for operations like sending a small network packet, but if you’re writing entire disk blocks then this is amortised. The second matters a lot if you’re doing a bunch of independent operations across multiple threads (for example, handling thousands of active connections) but doesn’t matter much if you’re a single-threaded workload.
I expect io_uring would be more useful on spinning rust, so a single process can have lots of ops in flight that can be reordered to make the most of disk head scheduling. Should be more efficient than one op in flight per thread.
Why? An async batched interface makes it easier to exploit the internal parallelism of SSDs. And that’s all that io_uring really is: an asynchronous, batched syscall interface. Nothing conceptually unique to I/O even though that’s where this sort of interface is the most obvious win.
Most of that should be possible with existing mechanisms, though last time I checked the AIO subsystem in Linux was so much worse than everything else (FreeBSD, macOS, Solaris) that it was not worth using. Unless you’re actively bypassing the buffer cache, most writes will write to the buffer cache and then be asynchronously written to disk. They can be reordered in the storage stack and in the disk, right up until the next sync.
For reads, if you're not doing prefetching, it might make a difference. Netflix used to do a one-byte `aio_read` to kick off prefetching (before they added `aio_sendfile`, which largely eliminated the need for this).
But the POSIX aio interface is horrible
That’s true and annoying, but it’s a fairly minor change to fix it.
You can also poll `aio_error` (which will return `EINPROGRESS` if the operation is still in flight). On FreeBSD, at least, you can call `aio_waitcomplete` to wait for a single completion and use `kqueue` to monitor multiple in-flight aio operations for completion.
I assumed that the most obvious implementation was to store both of these in the aiocb and have static inline functions that read them in the header (nothing in POSIX says that `aio_return` and `aio_error` have to be system calls). On FreeBSD, the error and status results are stored in the `error` and `status` fields of the `_aiocb_private` member of `struct aiocb`. I'd assumed that they were then queried from userspace (so `aio_return` and `aio_error` are just lightweight function calls), but apparently not, and they require system calls to read them.
It's not the best API in the world, but it's far from the worst bit of POSIX. At least most of the worst bits of it can be fixed by adding things, rather than removing things or redefining the semantics of things.
Talk about damning with faint praise.
Perhaps relevant, here is an article from a few years ago comparing io_uring to Linux AIO using FIO: https://www.phoronix.com/news/Linux-5.6-IO-uring-Tests
As far as I know, io_uring was specifically designed for disk IO, and most development effort still goes into higher speed disk IO. But you probably need huge enterprise NVMe or Optane setups for this to matter much.
Though, Windows copied io_uring specifically for the directx game asset loading thing, so I reckon it must be useful even on regular PCs, at least for easily queuing lots and lots of smallish loads.
This is DirectStorage you’re describing?
I suspect that even if it were only moderately useful now, it might also be viewed as insurance against the possibility that the ratio of disk IOPS to CPU cycles shoots up again in future. Say if someone productionises phase change memory.
Yeah. AFAIK the IORING facility they introduced in Windows a few years ago was originally developed just for DirectStorage.
Another point is that I think the NT kernel is much better at async stuff than Linux, so perhaps it was easier to get better results than it has been for io_uring.
Huh! My initial thought it might have been for WSL1, and added as a generic thing because it was useful outside the Linux context.
I’m pretty sure that has nothing to do with the writes being sequential.
Practically all modern filesystems have a feature called delayed allocation, which means the program written in the article has already exited (and thus has already been benchmarked) before the filesystem even decides whether the file’s data will be allocated sequentially or not (never mind actually writing the data to disk).
This is likely to happen even in the 1 GB file case, as long as enough free RAM is available to hold the written data.
Lots of people don't have a correct mental model of how filesystems and disk I/O work on modern OSes. The actual, physical disk writes done by the filesystem and the OS are almost always asynchronous with respect to the logical writes done by applications, and often purposefully delayed by 5s to 30s (unless a synchronization mechanism such as `fsync()` is used, and even with respect to `fsync()` there are popular false myths).
Fwiw I benchmarked versions of these programs using direct I/O and the blocking writes did terribly and the io_uring version did best, at 33% of its non-directio (i.e. kernel-buffered IO) result.
But yeah I haven't turned on fsync because it seemed like that was irrelevant when comparing write methods.
It seemed only relevant if you wanted to care about absolute numbers, which I wasn't doing.
Of course. Direct I/O is terrible for performance in almost all cases, which is why filesystems don't do that unless specifically asked for. There are extremely good reasons for why buffered I/O is the default.
If you want to benchmark the actual I/O subsystem (including filesystem, I/O drivers such as NVMe, and the actual disk devices) in normal conditions, then you'd want to write at least 30 seconds worth of data (enough data to make the test somewhat representative with respect to all sources of timing noise) using normal buffered I/O, but still making sure that you do an `fsync()` at the end (before completing the benchmark), so that you also time all the actual disk writes.
Even then it's much better to write at least several minutes worth of data, as there are devices that lie about doing `fsync()` and thus obtain better benchmark scores by cheating (at the expense of risking data corruption when there are power failures).
Note that doing an `fsync()` at the end is not the same as doing direct I/O, which is something completely different.
I agree, I'm not sure if doing `fsync` is very relevant for benchmarking `io_uring`, since as @david_chisnall alluded to, `io_uring` is more like an alternative syscall interface rather than something inherently tied to I/O. It was initially conceived as a faster way to do I/O (since I/O can be syscall-heavy in some cases), but I think it's somewhat of a misnomer now (or will be, in the future).
So in this case, benchmarking how long it takes for userspace writes to reach the kernel is already quite relevant. Benchmarking additional filesystem and I/O subsystems might also be interesting but probably not very relevant, as they'll likely be doing the same thing whether using `io_uring` or not.
Great suggestions, thank you!
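For concreteness, here is a minimal Go sketch of the "write with normal buffered I/O, then fsync before stopping the clock" advice above (illustrative only; the file name and sizes are made up):

```go
// Time buffered writes but include the flush-to-disk cost by calling Sync()
// before stopping the clock.
package main

import (
	"fmt"
	"os"
	"time"
)

func main() {
	f, err := os.Create("bench.dat")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	buf := make([]byte, 1<<20) // 1 MiB per write
	start := time.Now()
	for i := 0; i < 1024; i++ { // ~1 GiB total
		if _, err := f.Write(buf); err != nil {
			panic(err)
		}
	}
	if err := f.Sync(); err != nil { // fsync: pay for the real disk writes
		panic(err)
	}
	fmt.Println("elapsed including fsync:", time.Since(start))
}
```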
I find this a bit of a weird question. “AST” and “bytecode” are not methods of interpretation, but ways of representing a program. It’s made even more confusing by the fact that that the article itself doesn’t mean “AST walker” and “bytecode switch-dispatch loop”.
ASTs and bytecode are both abstract syntax. An AST walker is the natural interpreter for an AST, in the sense that AST walkers are homomorphisms from trees; similarly, iterative execution of codes (that is, a machine) is the natural interpreter for bytecode in linear memory.
The analogy goes further. Graphs can also be abstract syntax, and graph reduction is the natural interpreter for graphs. Matrices and semirings can also be used for abstract syntax, but I don’t know what their natural interpreters look like; semirings might be naturally interpreted by something like regular-expression engines, but the details are questionable.
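A minimal sketch of the two "natural" interpreters being contrasted (Go is used here purely for illustration; the tiny expression language is made up): a recursive walk over a tree, and a machine stepping through codes stored in a flat array.

```go
// An AST walker (a homomorphism over the tree) and a bytecode machine
// (iteration over codes in linear memory) for the same tiny language.
package main

import "fmt"

type Expr interface{ Eval() int }

type Lit struct{ N int }
type Add struct{ L, R Expr }

func (l Lit) Eval() int { return l.N }
func (a Add) Eval() int { return a.L.Eval() + a.R.Eval() }

const (
	opPush = iota // operand follows in the code stream
	opAdd
)

func run(code []int) int {
	var stack []int
	for pc := 0; pc < len(code); pc++ {
		switch code[pc] {
		case opPush:
			pc++
			stack = append(stack, code[pc])
		case opAdd:
			n := len(stack)
			stack[n-2] += stack[n-1]
			stack = stack[:n-1]
		}
	}
	return stack[len(stack)-1]
}

func main() {
	tree := Add{L: Lit{1}, R: Add{L: Lit{2}, R: Lit{3}}}
	fmt.Println(tree.Eval())                                               // 6
	fmt.Println(run([]int{opPush, 1, opPush, 2, opPush, 3, opAdd, opAdd})) // 6
}
```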
And yet there are many, many ways to build interpreters on top of both that don’t fit this supposedly neat model: compiling ASTs to a series of nested lambdas, for example. Or threaded (direct or indirect) interpreters for bytecode. I think your assertion that there’s a natural/obvious single evaluation strategy for either representation is overly simplistic.
Perl is an amusing example. It’s an AST interpreter, which you might expect to be slower than a bytecode interpreter, but Perl is faster than Python. This is partly because Perl augments its AST with a pointer to the “next” tree node(s) in execution order, which mostly eliminates tree-walking faff. The inner loop is basically,
And Ruby is an AST interpreter that (historically) is dog-slow.
Yeah, the performance of sensibly-written AST walkers is really underrated. There seems to be this widely held belief that bytecode interpreters are not just faster, but fundamentally faster. Nothing could be further from the truth. Bytecode dispatch can be really slow unless you’re doing something like translating the bytecode to threaded code.
Those aren't counterexamples. A homomorphism out of trees merely needs to have the property that each branch is compiled with only the local context of its leaves, and compiling to nested lambdas (known as "compiling to closures") is such a homomorphism. Similarly, bytecode is stored in a one-dimensional list, which can be viewed as a free monoid, and threaded interpreters like those for Forth are usually monoid homomorphisms, as long as the compiler is not doing `DOES` magic.
But they're not, in practice, all the same. The performance of those various strategies ends up being radically different. I have some examples here. So again, my original point stands: it's bizarre to ask which is faster when each one covers a lot of different techniques.
Linus’s reply to the mailing list might have been a better link than phoronix: https://lore.kernel.org/lkml/CAHk-=whFZoap+DBTYvJx6ohqPwn11Puzh7q4huFWDX9vBwXHgg@mail.gmail.com/
Damn, I was hoping for a flamewar between Linus and Theo de Raadt, but then Theo says “I agree completely”…
His reply, at length, is a good read: https://lore.kernel.org/lkml/55960.1697566804@cvs.openbsd.org/
I remember when he introduced `mimmutable(2)`. It really is amazing how much bs Chrome puts everyone through.
Fire and motion, fire and motion.
https://www.joelonsoftware.com/2002/01/06/fire-and-motion/
An old analysis, but substitute the 800 pound gorilla du jour for Microsoft and it holds up well.
What a week: Microsoft explains how to download and install Linux, and Theo and Linus are getting along on a mailing list.
Must be a bit chilly in hell
Have they been at loggerheads before? From what I’ve gleaned, the projects respect each other but have fundamental disagreements about how to structure a Unix-like.
Linus in 2008: “I think the OpenBSD crowd is a bunch of masturbating monkeys, in that they make such a big deal about concentrating on security to the point where they pretty much admit that nothing else matters to them.”
I think it was Marc Espie who replied “Who are you calling a monkey?” which was a perfect response.
I’ll take your word for it. I’m not that well-versed in US primary school insults.
I interpreted the reply as mock offense at the insult of being called a monkey with the humorous and unwritten acceptance of the accusation that they were masturbating over security. I don’t know, it seemed funny and answered in kind without taking up an argument.
That’s funny, except I’m afraid it shows rather a lack of research effort :-) — none of the people named was an American at the time (and most of them aren’t now, either).
Just shows nerds need some tutoring in how to beef. This stuff is pathetic.
Trying to find a business model for an ultra-low-latency database, before I take out an HEL for the next 6 months to work on it.
Something something high frequency trading, perhaps?
What is a HEL btw? Some form of sabbatical?
Home Equity Loan LOL
It’s not for HFT or any other specific domain, just an MVCC KV DB with about 200ns update transaction latency (currently snapshot isolation but I plan to offer a serializable option).
Is it open source? I’d be interested in checking out database projects!
It’s a fork of https://github.com/gaia-platform/GaiaPlatform but I haven’t made the fork public yet, since I’m undecided on licensing. I’ve discarded most of the features in the original repo other than the core database, which is about 100x faster now (due to removing mmap and IPC from the critical path).
If you’re interested in the core database stuff, check out https://github.com/gaia-platform/GaiaPlatform/blob/main/production/db/core/src/db_server.cpp. This has the key transactional logic (all of which has been moved to the client in my fork).
Linearizability is theoretically great (I’d say the ideal), but also very strict in practice. For example I’m mostly familiar with Postgres transaction levels, none of which support pure linearizability. The default (read committed) is pretty relaxed and allows a few unexpected behaviors, which is kind of scary when you compare it to the more logically “clean” idea of linearizability. My understanding is that linearizability would be correlated with very poor performance.
This definitely has an interesting impact on testing, because you have to plan for and allow a certain class of consistency errors. The definition of correctness has to allow these errors. This for me is vacuous truth - it’s technically true that a consistency error is correct behavior, but I also want to know that eventually the state is consistent. I don’t know if this means that eventual consistency should be expressed as a liveness property?
Anyway, the recommended followup reading to this is Strong consistency models. There’s a whole slew of consistency models out there.
I think you tend to see less talk of linearizability in the context of relational databases because linearizability is a constraint on the allowable orderings of operations on a single object, instead of the multi-object guarantees (like serializability) that someone making use of multiple tables in a transaction is likely more interested in. Linearizability isn’t irrelevant in that context, but it rarely comes up because, as you note, it’s generally a very expensive guarantee, but also because it doesn’t imply serializability: the combination of the two is strict serializability, and the only database I’m aware of that offers it is Spanner.
Speaking of Aphyr, his consistency models breakdown on jepsen.io is a really great resource for an overview of the full logical hierarchy at play and for links to relevant papers.
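To make the single-object nature of the guarantee concrete, here is a toy sketch (illustrative, not a database implementation): a mutex-protected register is linearizable on its own, but that says nothing about transactions spanning several such objects.

```go
// A linearizable register: reads and writes appear to take effect atomically,
// in an order consistent with real time. Multi-object transactional guarantees
// (serializability) are a separate property.
package main

import (
	"fmt"
	"sync"
)

type Register struct {
	mu sync.Mutex
	v  int
}

func (r *Register) Write(v int) { r.mu.Lock(); r.v = v; r.mu.Unlock() }
func (r *Register) Read() int   { r.mu.Lock(); defer r.mu.Unlock(); return r.v }

func main() {
	r := &Register{}
	r.Write(42)
	fmt.Println(r.Read()) // 42
}
```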
Most single-node databases that offer serializable isolation at all are also strictly serializable, because non-distributed CC protocols tend to fall out that way. E.g., vanilla 2-phase locking is strictly serializable, and so is vanilla OCC.
I disagree that strict serializability implies poor performance. It is quite possible for a strictly serializable database to perform millions of transactions/second. (Source: I’ve done it.)
I’m in a weird situation. My company did 30% layoffs this week. Roughly half were laid off immediately. I am in the other 15% who were incentivized with a large bonus to “please stay through March as the company transitions”. I’m planning on leaving immediately, but further communications have been slow and full of conflicting (and disappointing) information. So I’m trying to figure out what to do.
Thankfully I will be okay for a while. I’m using up my vacation days while things settle and enjoying the extra free time to hike and do open source stuff! I’ve made great progress on several projects already this week, and plan on doing more this weekend :)
Are you sure? The job market is still brutal for many types of roles. It is much easier and less stressful to look for a new job while you still have one than when your time is running out financially.
I was just talking to a friend about how difficult it is to do the job search when you have a job. I feel very anxious every time I take an interview during a work day. Always sleep terribly the night before too. I hate lying to people about where I am. (Yes, even lies of omission.)
I suspect this is more of a me problem though.
You are not alone.
I also find it quite difficult to do a job search while working full time. Most of the time I quit the job and continued with the same commitment until the end (notice periods in Europe are multiple months), then took a break and started to search for a new one.
Valid points for sure. And I’m keeping my options open while I wait for more info. However:
So while I may have added stress, overall I think I will be happier to leave now than to stay
Thanks for being a voice of reason. After a weekend to cool down emotionally I’ve decided to stay for a few weeks and reevaluate my situation. Not thriving at work today, but oh well :)
You are correct. BUT: the last one who stayed has to write the documentation eeeek :D
I doubt that the large bonus is 17.6% more than the old salary (which is what distributing the work of the sacked 15% onto the remaining 85% of people works out to: 15/85 ≈ 17.6%). And the bonus will not rise as more and more of the second 15% break away one after the other. It will get increasingly stressful the longer it takes, as the best and most flexible ones go first.
Which eats the time that you should use to do job interviews.
Grab anything which could be useful (contacts of good colleagues, cheaply sold-off hardware and furniture, if you need it) and offboard asap, in a friendly manner. Save the four-letter words; you always meet more than once.
If you have a physical key to the office, toss it on the copier so that the key’s number can be read, and get date, company stamp and signature when you hand over the key. This saved my rear side more than once. Because if a few keys are missing from the locking system, they have to switch all keys which is super expensive, and you don’t want to be that guy.
I still have a yucca tree from a new economy office, it was more durable than the office chair.
/u/st3fan ’s statement is entirely correct, but on the other hand, fuck those guys.
Sometimes layoffs are unavoidable, but there are ways of handling them that are better or worse. The way that Microsoft handled them for my colleagues was the main reason that I decided to leave (and, since my official last day was Tuesday, I am no longer bound by their social media policy and can say fuck those guys). When you have to contact HR because you don’t believe that the way things are being handled is legal, it doesn’t actually matter what their response is: if someone even needs to ask the question then you’re not a place I want to work and not a place I want to encourage other people to work at.
My mental health has improved significantly since I handed in my resignation. In hindsight, I probably should have done it back in March.
I don’t want to sound too bitter but it is important to understand that contacting HR is mostly pointless. They have exactly one purpose and that is protecting the company, not you as an individual. They are NEVER on your side or care about your concerns. HR is not your friend and HR will not defend you, they tolerate you at best. There are too many people that misunderstand this. At the end of the day it is important that the company survives, it does not matter if some people need to be fired or how that is handled from HR’s perspective.
My primary goal in contacting HR was to ensure that there was a written record that the folks on my team could require to be presented in court if they sued for constructive dismissal, and to make it very clear that, if called as a witness in such a case, I would happily testify to that effect, so that they could use that in their assessment of corporate risk.
This is a very common internet sentiment but it’s important to not generalize to “NEVER contact HR.” Knowing that HR will engage in damage control and not operate with your well being in mind doesn’t mean you can’t or should never contact them. Sometimes they are the only outlet and you just have to speak your mind. Sometimes you need to get something on record for potential future discovery. Sometimes contacting HR is like getting a police report for a stolen item: it’s a necessary first step towards some other goal.
You got fired? Where is your new job?
I’ve been enjoying reading your comments on this site, for whatever that is worth
I didn’t get fired but it became clear that none of the reasons that I’d joined Microsoft five years ago (a company that can do big hardware/software co-designed shifts because it controls the entire stack, with a commitment to being a good citizen of the open source ecosystem, and which understands that you get the best out of talented people by treating them well) remained valid.
Fortunately for me, there is no shortage of jobs for people with 11 years CHERI experience. I will shortly be starting at another company that wants to build CHERIoT chips (more details once they’re happy with it being public). The folks on the silicon team that I was working with on CHERIoT have been incredibly supportive and I’m looking forward to working with them, since we will have some common interests going forward.
When I reached out to HR at Microsoft about a retaliatory performance review full of provably false things, the only action taken was to replace all the provably false things in the review with vaguer statements that were no longer disprovable.
Every megacorp will eventually behave toxically but it feels a little more common at Microsoft from my admittedly limited vantage point.
I’m sorry to hear you ran into it and I wish you well on your recovery.
It’s particularly disappointing because I was part of the T&R Diversity and Inclusion Council. There are senior folks in the Office of the CTO that really understand that behaving well towards employees is essential to get the best work out of them, which is essential to remaining competitive.
When I complained to my manager about how people on my team were treated, he told me that it was out of his control and out of his manager's control. That leaves Scott Guthrie and Satya Nadella as the only two that can be responsible. There's a saying that you don't leave bad companies, you leave bad managers. When the manager in question is either the CEO or someone who reports directly to the CEO (and whose performance review is conducted by the CEO) it's very hard to tell the difference.
Congratulations on getting out.
I don’t know how this could have been any worse than how they handled the MSR SVC layoffs back in 2014.
That wasn't great, but closing a whole lab meant that the impact on the rest of the company was reduced. It was also constrained to MSR, not the whole of MS, and the statutory protections for employees in California are much weaker than in the UK. The most recent round left teams across the company understaffed for things that they'd committed to delivering. It also included redeployments, where people were moved to different groups, largely for bean counting purposes (the new groups rarely matched their interests and often didn't need their skills), which harmed career development for these people ('you're an expert in X, why were you working in a team that doesn't do X?'). The process that they followed did not match my understanding of their obligations under UK law and definitely was not aligned with the alleged corporate values of 'respect, integrity, and accountability'.
Which part is egregious? If you’re going to ask people to work under less than ideal conditions (low morale, fewer workers) it only seems appropriate to give a bonus.
They are artificially creating the less-than-ideal conditions, asking people to work under them, probably paying them less than they deserve (15% fewer people tends to make more than 15% more work as deadlines slip and maintenance is deferred, and I doubt their large bonus is 15% of their salary), and then firing them. ie, intentionally burning out and discarding people.
I can’t imagine the 30% layoffs coming as a surprise to anyone with access to see the actual accounts, and they’re the ones who should be planning for this sort of thing. To me it seems that if you have to axe a third of your workforce in one go that means things have been fucked up for a long time, and nobody with decision making power has done anything about it. I bet they’ll get raises for “maintaining positive net growth” though.
Oh, one of those realists :)
Translating between SQL dialects is difficult but at least seems somewhat tractable. But trying to emulate (presumably bug-for-bug) the behavior of decades-old software with millions of LOC seems like a fool’s errand.
PS See also CompilerWorks: https://www.compilerworks.com (looks like they were acquired by Google Cloud after I interviewed with them).
The reason I’m interested is because I just want fake servers for integration tests. Right now I’m working through integration tests against SQL Server in docker and I’ll be doing the same thing for Oracle soon.
I’d much rather have a fake server speaking the wire protocol and returning hardcoded data a la https://github.com/jackc/pgmock because it’s just so much faster not to run Oracle or SQL Server in tests.
Ultimately I haven’t found anything like pgmock for Oracle or SQL Server yet.
I don’t use async Rust, but I’ve seen complaints that the lang-level async support prematurely standardized on the “readiness-based” async model (e.g. epoll/kqueue), while the rest of the world seems to be moving toward the “completion-based” async model (e.g. IOCP/io_uring). What do actual Rust async users (say who might want to use underlying OS support in IOCP/io_uring) think of this?
It’s a common fallacy that the borrow checker is just a substitute for a GC, when it’s much more than that: it can give you safety guarantees (absence of mutable aliasing and data races) that e.g. Java or Go cannot.
This is true, but at least for me these additional guarantees aren’t preventing many bugs. In particular, my code is rarely subject to data races–usually any shared state is a file or network resource that is accessible by multiple processes on multiple hosts and Rust’s borrow checker doesn’t help in these regards, but I still have to pay for it with respect to productivity (yes, I know that productivity improves with experience, but returns diminish quickly and the apparent consensus seems to be that a wide productivity gap remains between GC and BC).
I’m not taking shots at Rust; I’m glad it exists, it’s impressive and ambitious and etc. It just isn’t going to eat Go’s lunch unless there’s some (ergonomic) way to opt out of borrow checking.
why do you say Go (or Java) compilers can not effect such analyses?
Because for a language with unrestricted semantics around mutability, like Go, this would be equivalent to the halting problem.
Languages like Rust or similar use linear/affine/unique types to restrict semantics in a way to make such type of analysis possible. That’s the whole point of linear types.
You can do it at runtime though (see Go’s race detector), and while that’s immensely useful, it’s not the same thing.
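For example, something like the following compiles without complaint and is only diagnosed at runtime by the race detector (a minimal, hypothetical sketch):

```go
// A data race the Go compiler happily accepts; `go run -race` reports it.
package main

import "fmt"

func main() {
	counter := 0
	done := make(chan struct{})
	go func() {
		for i := 0; i < 1000; i++ {
			counter++ // unsynchronized write
		}
		close(done)
	}()
	for i := 0; i < 1000; i++ {
		counter++ // unsynchronized write in the main goroutine
	}
	<-done
	fmt.Println(counter) // result is unpredictable
}
```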
Can you show me one that does (i.e. statically detects all mutable aliasing or data races)?
That didn’t answer the question asked.
They can, to a small extent; so can C compilers, or Python, or whatever. But without Rust’s move semantics and borrow system, the compiler doesn’t have much information to do the analysis with.
The compiler has to work in concert with the language: Rust isn’t just C++ with more rules on top; lifetimes and aliasing are a visible part of the type system that the programmer interacts with. The user spends more time explaining to the compiler, in more detail, what’s going on: in Java you have `T`, in Rust it’s your task to pick between `T`, `&'a T`, or `&'a mut T`. In exchange for this extra annotation burden, the compiler can reason precisely about aliasing and rule out things like data races or iterator invalidation. As an analogy, we can type-check JavaScript to some extent, but we really need TypeScript’s extra type annotations to make it really work.
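A tiny made-up example of the iterator-invalidation point: mutating a Vec while a shared borrow from its iterator is still live is a type error in Rust, not a runtime surprise (compare a ConcurrentModificationException in Java, or undefined behaviour in C++).

```rust
fn main() {
    let mut xs = vec![1, 2, 3];
    for x in &xs {
        if *x == 2 {
            // xs.push(4); // error[E0502]: cannot borrow `xs` as mutable
            //             // because it is also borrowed as immutable
            println!("found {x}");
        }
    }
    xs.push(4); // fine once the shared borrow has ended
}
```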
In other words: Rust is referentially transparent. If you would like a referentially transparent language (or a language that allows you to control and manage mutation) without spurious copies, I suggest Haskell or OCaml.
The ideal message delivery is a fiction, and building software that requires it is negligent.
What do you mean by “ideal message delivery”? Uniform atomic broadcast? If so, is every system built around Paxos or Raft “negligent”?
To be frank, no project that uses a GPL version later than GPLv2 will ever be the ‘boring’ variant. Not because it can’t be, but because you need to convince lawyers that it’s boring.
And GCC has done little to create a situation where it might be. Clang/LLVM splits the compiler in two, with a frontend and a backend that can evolve independently. Can you even do that with gcc, in a practical sense? I know the frontend and backend of gcc can technically be decoupled, but technically != plausibly.
What does a compiler’s choice of license have to do with its approach to undefined behaviour? Maybe I’m just being dense, but I don’t understand what point you’re making here.
Your information is outdated. You can use the GCC backend without a GCC frontend, and it has been an option supported by upstream since GCC 5. See https://gcc.gnu.org/wiki/JIT.
Since the compiler’s license has no effect on the license of your code, and GPLv3 doesn’t change much compared to GPLv2 in practice (I understand there is a lot of spin to the contrary), this seems more like an axe to grind than a contribution.
We couldn’t use post-GPLv3 versions of gcc when I was at Amazon (or recent versions of Emacs, for that matter). Do GOOG/MSFT/FB treat gcc differently?
Are you saying that Amazon engineers are not allowed to use Emacs as a text editor?
There were many engineers using emacs, but the official line was that you weren’t allowed to install any GPL3/AGPL software on a work machine for any purpose, and that explicitly included recent versions of emacs (and also recent versions of gcc, which meant the build system was stuck with obsolete versions of gcc). I suspect everyone just ignored the emacs restriction, though. I’m sure a lot has changed since I left in 2014 (I bet the build system has moved to clang), and I don’t know the current policy on GPL software.
Okay, that sounds bad. Thanks for the clarification! 👍🏽
At Microsoft, the policy is surprisingly sane: You can use any open source program. The only problems happen when you need to either:
There are approval processes for these. There’s no blanket ban on any license (that I’m aware of), but there are automatic approvals for some licenses (e.g. MIT), at least from the lawyers; the security folks might have different opinions if upstream has no coordinated disclosure mechanism, or even a way to report CVEs.
That sounds unsustainable. Don’t you already need new builds of GCC to build Linux? If not, surely you will eventually. And I can’t see Amazon ditching Linux any time soon.
Keep in mind my information is 7 years out of date (I left in 2014, when Amazon was just starting to hire Linux kernel developers).