I actually make pottery. I have more vague shapes these days, but I have been bad about updating the album. I also made a couple of nudes, which are handy when explaining to a toddler exactly where on her butt still needs wiping.
And I could totally accept one or two of the AI turds as art pottery. That’s the blue circle category. But that’s a lot of oddly-located weird pottery.
or because the signature was present but had since expired
The fact that PGP considers a signature produced 2 years ago invalid now because the key expired last month is just one more example of how broken its design is.
There seems to be a recurring pattern where everyone who promises permanent storage without charging money for it regrets it 5-10 years later. (The cost is relative to bytes currently stored, not bytes ingested, but you don’t notice this until your oldest drives start to fail and force data migration.)
However, they do charge money. So I think we’re good here :)
The blog post glaringly omits defer, which also has a runtime cost compared to straight function calls, but it’s far better than finalizers are. Finalizers are only appropriate when the objects are subject to complex lifetime graphs where you actually need the GC to figure out when they’re dead. For objects with simple lifetimes, just defer C.free_allocated(ptr).
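For that simple-lifetime case, the whole thing is just a deferred call. A minimal cgo sketch (alloc_thing/free_thing are made-up stand-ins for whatever the C side actually exposes):

```go
package main

/*
#include <stdlib.h>
void *alloc_thing(void) { return malloc(64); }
void free_thing(void *p) { free(p); }
*/
import "C"

func useThing() {
	p := C.alloc_thing()
	defer C.free_thing(p) // deterministic: freed when useThing returns, no GC involved
	// ... use p ...
}

func main() {
	useThing()
}
```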
Finalizers are only appropriate when the objects are subject to complex lifetime graphs where you actually need the GC to figure out when they’re dead.
Finalizers aren’t guaranteed to be called even when the GC reaps the value, so they don’t actually help in this case, except as a best-effort optimization. Said another way, there’s no reliable way to know whether a value is “dead” or not! Annoying, but true.
Finalizers ARE guaranteed to be run IF the GC reaps the value. What is not guaranteed is that any given dead value will ever actually be reaped.
And that is absolutely fine. Go is doing the right thing here.
If what you want is guaranteed destruction before program exit then what you do (and this is language-independent … I don’t actually use Go) is:
- add the object to a weak (doesn’t cause the object to not be GCd) collection of objects to be destroyed at exit
- add a finalizer to the object that removes itself from the at-exit collection, as well as any other actions
- at exit, call the finalizers for any objects still in the collection
NOTE if you do this then you have to be very careful because some of the objects may refer to each other, and there is no way in general to BOTH destroy only objects that no other object is referring to AND to make sure all objects are destroyed.
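A minimal Go sketch of that pattern (illustrative only: the names and the cFree stand-in are invented, and the “weak” part is achieved by keeping only the raw C pointer in the registry, never the Go wrapper, so the registry does not keep the wrapper alive):

```go
package main

import (
	"runtime"
	"sync"
	"unsafe"
)

var (
	mu      sync.Mutex
	pending = map[uint64]unsafe.Pointer{} // id -> raw resource; holds no *Wrapper
	nextID  uint64
)

type Wrapper struct {
	id  uint64
	ptr unsafe.Pointer // owned C allocation
}

func NewWrapper(p unsafe.Pointer) *Wrapper {
	mu.Lock()
	nextID++
	w := &Wrapper{id: nextID, ptr: p}
	pending[w.id] = p
	mu.Unlock()
	runtime.SetFinalizer(w, func(w *Wrapper) {
		mu.Lock()
		_, live := pending[w.id]
		delete(pending, w.id) // GC got here first: drop the at-exit entry
		mu.Unlock()
		if live {
			cFree(w.ptr)
		}
	})
	return w
}

// AtExit destroys whatever the GC never got around to reaping.
func AtExit() {
	mu.Lock()
	defer mu.Unlock()
	for id, p := range pending {
		delete(pending, id)
		cFree(p)
	}
}

func cFree(p unsafe.Pointer) { /* e.g. C.free_allocated(p) */ }

func main() {
	defer AtExit()
	// ... create and use Wrappers via NewWrapper(...) ...
}
```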
Not running GC at program exit is the right choice, because in fact you would need to run GC repeatedly until a run in which no new objects were reaped.
In fact it is often beneficial for short-lived programs to never GC at all.
Try making a library with…
void free(void *p){}
... and LD_PRELOAD it into a regular C program that uses malloc/free such as gcc. You will find that on a machine with a decent amount of RAM small compiles will actually go faster.
Try doing the same thing with Boehm GC compiled in transparent malloc/free replacement mode, so that free() is a NOP (as above) and malloc() does GC. Virtually any standard C/C++ program you try it on will 1) still work fine, and 2) run faster.
Finalizers ARE guaranteed to be run IF the GC reaps the value. What is not guaranteed is that any given dead value will ever actually be reaped.
I think this is the best summary of what’s happening, thank you, and my apologies for confusing the issue earlier.
If what you want is guaranteed destruction before program exit then … add the object to a weak (doesn’t cause the object to not be GCd) collection of objects to be destroyed at exit
AFAIK there is no way to construct a collection with this property in Go, but if you know of a way I’m happy to be educated!
Finalizers and weak maps are sort of different interfaces for equivalent capabilities. You can make a weak map using finalizers, but it’s a pain in the butt. See https://github.com/golang/go/issues/43615
Virtually any standard C/C++ program you try it on will 1) still work fine, and 2) run faster.
Frankly, I would be surprised to see that happen. GC can definitely beat malloc for interesting workloads, but although bdw is an excellent piece of engineering, it is at an inherent disadvantage compared to a proper gc, considering that it lacks precise layouts and cannot move. Historically, malloc implementations were quite bad, whereas bdw’s allocator was rather decent; nowadays, I think the mallocs have largely caught up.
No doubt libc malloc implementations are a lot better now than when I first tried this experiment in the late 90s. From memory, the last time I did the experiment was 2017 when I was working on an LLVM-based compiler for OpenCL. For compiling typical OpenCL kernels it was definitely better to use a simple bump-pointer malloc(), disable free(), and free all the RAM in one go after compiling the kernel.
In my experience, most objects that contain pointers are mostly pointers, and finding and parsing the relevant layout information takes longer than just assuming that everything might be a pointer. On anything where the heap is small compared to the address space (e.g. all 64 bit machines) a simple bounds check (size_t)(ptr-heapStart) < heapSize instantly eliminates virtually all non-pointers.
Being able to move / compact objects is an advantage, for sure, but C malloc() libraries can’t do that either, so BDWGC is not at a disadvantage relative to malloc().
Mallocs typically use freelists that have good temporal locality: you can free an object, and soon after, when you allocate another one, you can reuse the memory from the first, while it is still hot in cache. Good tracing gcs generally can’t do this, but make up for it with good spatial locality, which mallocs can’t typically provide. Bdw provides neither.
Bumping and never freeing is good if most objects live long, but this is unlikely to be the case for graph mungers such as compilers, so I find your result somewhat surprising. See e.g. shipilev on hotspot’s version of this. Are GPU kernels usually less heavily optimised than typical CPU code? Not my area of expertise here.
most objects that contain pointers are mostly pointers, and finding and parsing the relevant layout information takes longer than just assuming that everything might be a pointer
Conservative has its own issues—in particular, you have to identify where objects start, whereas you probably would not otherwise have needed to support interior pointers. And there are middle grounds. Hotspot will partition all the fields of a class into those that are pointers and those which are immediate, placing all the pointers contiguously, so your ‘layout information’ is just the offset where the pointer fields start. (And note you need some layout information anyway to find out how big the object is, so this is not a huge imposition.) Implementations of ocaml and lisp use tagged pointers, where the low bits of each word are a tag, which tells whether the high bits are immediate or a memory address, so this information can be gleaned latently (in lisp’s case, obligatorily, since it is dynamically typed). (This is why ocaml uses 63-bit or 31-bit integers—a language deficiency; they should have transparently upgraded to big integers, like lisp.)
Also note that tracing tends to be bound by latency/memory parallelism, so there are some cycles to burn parsing layouts; obviously you’d rather not have to store them, but.
Mallocs typically use freelists that have good temporal locality: you can free an object, and soon after, when you allocate another one, you can reuse the memory from the first, while it is still hot in cache.
Modern mallocs try very hard not to do this because it makes it very easy for attackers to turn a use-after-free bug into a stable exploit. In snmalloc, we treat the free lists as a queue and add some randomisation so that it’s hard for attackers to predict where a freed object will be reallocated. Other modern allocators have similar mitigations (though with variations to fit better with their internal data structures).
The overhead of most defer calls is significantly better after the implementation of open-coded defers in https://golang.org/cl/202340, but I’m not certain if this particular call benefits.
Here are my results from hacking in an allocate-defer benchmark to his code:
goos: linux
goarch: amd64
pkg: example/cgotest
cpu: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
BenchmarkAddition-8 19435881 52.43 ns/op
BenchmarkAllocate-8 10378263 110.5 ns/op
BenchmarkAllocateDefer-8 10262841 114.2 ns/op
BenchmarkAllocateFree-8 20076832 60.31 ns/op
BenchmarkAllocateAuto-8 1000000 1094 ns/op
BenchmarkAllocateAutoDummy-8 1000000 1142 ns/op
BenchmarkAdditionGo-8 1000000000 0.2452 ns/op
PASS
ok example/cgotest 7.603s
It seems to be way better than finalizers, but defer is still 3-4% slower than calling it manually. Not a huge loss, but it’s noticeable with this small workload.
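The added benchmark was presumably shaped something like this (a guess at the structure only; cgo can’t appear directly in _test.go files, and allocate_memory/free_allocated are stand-ins for whatever the post’s package actually wraps):

```go
// shim.go — cgo shims in regular package code; the C functions here are placeholders.
package cgotest

/*
#include <stdlib.h>
void *allocate_memory(void) { return malloc(1024); }
void free_allocated(void *p) { free(p); }
*/
import "C"
import "unsafe"

func allocateMemory() unsafe.Pointer { return C.allocate_memory() }
func freeAllocated(p unsafe.Pointer) { C.free_allocated(p) }
```

```go
// bench_test.go
package cgotest

import "testing"

// A single defer at the top of a function (not inside a loop) is eligible
// for the open-coded fast path from CL 202340.
func allocateDefer() {
	p := allocateMemory()
	defer freeAllocated(p)
}

func BenchmarkAllocateDefer(b *testing.B) {
	for i := 0; i < b.N; i++ {
		allocateDefer()
	}
}
```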
This would have been more clearly phrased in terms of capitalism and communism. Consumers in a capitalist society are trained to purchase products. That urge to purchase shouldn’t be conflated with genuine desires, whatever those may be.
Okay, let’s use the article’s example: consumers don’t have a genuine desire to purchase Twitter/Bluesky verification, nor to purchase Mastodon accounts. They have a genuine desire to interact with other people and express themselves in public. But capitalism can’t provide genuine human interaction as a product.
But capitalism can’t provide genuine human interaction as a product.
My daughter uses FaceTime on our iPad to speak with her grandparents in another country. She uses Messenger kids to organize playdates with her friends.
I see. When you wrote
But capitalism can’t provide genuine human interaction as a product.
You meant a literal human interaction. I interpreted it as giving us products that enable human interaction.
I am now curious. What economic system do you think provides genuine human interaction as a product, since you singled out capitalism?
Genuine human interaction isn’t an economic good! As soon as it is transactionalized, it is commodified, and any alterations ruin the candor. In general, any amateur interaction cannot be sold as a good.
But capitalism can’t provide genuine human interaction as a product.
I still don’t understand why you specifically put in capitalism in this sentence.
Because the entire thesis of the article is phrased with a consumerist worldview which is only viable under extractive capitalist societies. (And I’ve yet to see a non-extractive capitalist system.)
Genuine human interaction can be a product, but it cannot be an economic good. Capitalism deals in economic goods.
Yeah, even in a communist financial structure there would still be a need for polished final products. Otherwise you’re just producing things to say you’re producing things without a care for whether it actually gets used.
Otherwise you’re just producing things to say you’re producing things without a care for whether it actually gets used.
Wasn’t that exact outcome among the failure points of communist command economies?
(I hope I’m adding something, and not just ruining a joke by pointing it out; sometimes sensing tone on the internet is hard)
Not to derail what is apparently an off-topic thread, but what exactly does it mean for a software package to be a “final product”? I understand what it means in terms of capitalist logistics: each boxed copy of software is sold with its own license. But in the Free Software community – a functioning commune – what does it mean to finalize or productize a hunk of software, beyond compatibility with capitalist worldviews?
Maybe from a usability perspective: to finalize and productize is to take further steps than just the initial hacking to turn a tool or a system into a well-delineated and documented unit fit for use by others (and potentially many more) than oneself. That said, a lot of usability comes from familiarity and is thus largely based in the current culture of use, which can be intertwined with capitalism.
Converso elsewhere asserts that WhatsApp “generates unencrypted chat backups in Google Cloud or iCloud,” however this is also untrue, or misleading at best. WhatsApp backups are optional and end-to-end encrypted – unless the user turns off the encryption.
Misleading for sure. WhatsApp backups were uploaded to a dedicated slot in Google Drive, and reacharound access to this slot was handed to Facebook. This changed after it was exposed.
I would genuinely love AP over IMAP.
I want to get my friends’ posts into folders with their names on them. It would be more like a messaging app than a twitter-style firehose, but I could use the email client that’s accessible to me.
Might be something that could be implemented as a Dovecot storage plugin to save a lot of the IMAP implementation hassle (obviously with something else putting the AP stuff into the storage.)
Futexes are the thing that the implementation of your condition variable / monitor variable uses to park the thread when the condition isn’t satisfied.
I see. Condition and monitor vars are already extremely low-level synchronization primitives, and are notoriously difficult to “get right” in application code. A primitive at an even lower level of abstraction doesn’t strike me as something that can be effectively deployed by applications. But, to each their own!
You absolutely should not be using futex in application code. It is a kernel interface for implementing low-level locking (and a few other) primitives in standard libraries and concurrency libraries. The idea behind a futex is to allow a pure userspace fast path. If you acquire an uncontended mutex or release a mutex with no waiters, this can be a single atomic op that doesn’t use futex at all. When you are contended, futex gives you a way of blocking until a memory word changes or waking threads that have blocked on a specific word.
As the article describes, you can use it to implement a mutex. You can also use it for counting semaphores or for condition variables. It’s also flexible enough that you can use it for other things that don’t look quite like locks. For example, if you have a fixed-size lockless ring buffer with producer and consumer counters then you ideally want to sleep in the producer thread when the ring is full and in the consumer thread when it’s empty. You can use a futex wait operation on the producer and consumer threads for this, respectively. This is much simpler than building the same mechanism out of a mutex and a condition variable. That kind of data structure will typically live in a concurrency library though; you probably don’t want to implement it yourself.
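To make the fast-path/slow-path split concrete, here is a rough sketch of that shape in Go. Go doesn’t expose futex, so a buffered channel stands in for the wait/wake half; the point is only that the uncontended path is a single atomic operation:

```go
package main

import "sync/atomic"

// state: 0 = unlocked, 1 = locked, 2 = locked and there may be waiters.
type cheapMutex struct {
	state int32
	sema  chan struct{} // stands in for futex wait/wake
}

func newCheapMutex() *cheapMutex {
	return &cheapMutex{sema: make(chan struct{}, 1)}
}

func (m *cheapMutex) Lock() {
	if atomic.CompareAndSwapInt32(&m.state, 0, 1) {
		return // fast path: uncontended, one atomic op, no blocking
	}
	// Slow path: mark the lock as contended and block until woken.
	for atomic.SwapInt32(&m.state, 2) != 0 {
		<-m.sema // "futex wait"
	}
}

func (m *cheapMutex) Unlock() {
	if atomic.AddInt32(&m.state, -1) == 0 {
		return // fast path: nobody was waiting
	}
	atomic.StoreInt32(&m.state, 0)
	select {
	case m.sema <- struct{}{}: // "futex wake": wake one waiter
	default:
	}
}

func main() {
	m := newCheapMutex()
	m.Lock()
	m.Unlock()
}
```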
They’re at approximately the same level of abstraction imo, when you’re looking at application design without diving deep into performance, but futex is the one that’s a system call.
Here’s a concrete example of using it to implement a mutex: https://github.com/rust-lang/rust/blob/9881574/library/std/src/sys/unix/locks/futex_mutex.rs#L25-L61
Note that line 56 and line 94 are the only ones that actually use the futex syscall - the rest are all atomic operations.
It is definitely not only you, it seems that a lot of people think that the paper is worthless, because it has “wrong form” – doesn’t have working code, or doesn’t have formal proofs, or glosses over many details.
But to me that seems like an incorrect reaction? Like, it’s a perfectly fine paper, which presents an elementary algorithm in some level of detail. If this was a paper for a novel 2-SAT algorithm, rather than for 2-MAXSAT, there would be zero questions like this.
It is possible to read the paper and just point out a specific flaw. Maybe some reduction is wrong, and there’s a counter-example, where the algo is giving a wrong result. Maybe some size calculation is wrong, and some graph is not N*M, but N^M. Maybe there’s some specific gap in the reasoning, like “and now we check if these two graphs are isomorphic using an obvious quadratic algorithm”.
I guess, the crux of my objection is that the paper is perfectly falsifiable as is — if one has read through the mobile book, has pen&paper and a couple of hours of free time, one can just go through the paper and decide if it is correct or not. Demanding more rigour here seems unwarranted.
Of course, one can say that the prior for this being correct is extremely low, and that it’s not worth the time to look for the error, but that seems substantially different from “no implementation = does not count”.
Demanding more rigour here seems unwarranted.
I absolutely disagree. P=NP is the most famous problem in CS. If you’re going to claim to overturn it where 1000s of other CS experts have failed over the last 50 years, you need to supply more than hand-waving and zero peer review.
I think I disagree with that. The amount of peer review has zero bearing on whether the algorithm is correct or not. Physical truths are determined by experimental observations, mathematical truths are determined by the proofs. What other people think or say does not affect what actually is fact, and what is not.
You can and should use peer review as a heuristic for what is worth paying attention to. You can also use peer review as a proxy for cases you can not judge yourself.
But this is not the case where you can not judge yourself. This particular paper is elementary, you can just go and find a specific issue. It is falsifiable with little effort; you don’t need the peer-review shortcut if you can do the work directly.
But the nice thing about peer review is that it saves the other million onlookers like myself from having to wade through this as non-experts every time some crackpot thinks they have yet another P=NP solution.
I agree with both of you. Really, the author of any P=NP proof should provide extremely hard evidence:
Note that this last requirement looks a lot like the top comment here, itself mirrored from the orange site. This works because of the folklore that exponentially-large resolution trees naturally arise when proving pigeonhole problems for 3-SAT have no solutions; any P=NP proof must explain it and show how to defeat pigeonhole problems quickly.
If you really had a working algorithm for solving an NP-hard problem, you could also support your claim by posting some stranger’s private RSA keys.
Or if it’s fast enough, use it to mine a few bitcoin blocks.
I might be a fool, but I am worried about non-peer-reviewed literature from an economic perspective. I find it hard to refute that we live in an era of information overload. Given that we are just experiencing a boom of generative content, there will be much, much more academic production, so proofs that can be checked mechanically will be essential.
This is not to say that the idea does not merit evaluation, and I believe we can agree that we have to be extra zealous about rigor in this case? Because if you read the paper, you will not find what you would expect: would a hypothesis this deep really be solved without any new mathematics, yielding to a simple argument? Contrast with the proof of Fermat’s.
But I was too harsh on it; I am sorry for being impolite in this forum that is dear to us.
What is this hex0 program that they are talking about? I don’t understand how that is the starting point, could someone expand?
The program is here: https://github.com/oriansj/bootstrap-seeds/blob/master/POSIX/x86/hex0_x86.hex0
It’s a program that reads ASCII hex bytes from one file and outputs their binary form to the second file.
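Functionally it’s tiny. A rough Go equivalent of what it does (reading stdin and writing stdout instead of taking two file arguments, and treating ‘;’ and ‘#’ as comment starters, as the sed approximation further down suggests):

```go
// hex0-ish converter sketch: read ASCII hex from stdin, write raw bytes to
// stdout. ';' and '#' start a comment that runs to end of line; anything
// else that isn't a hex digit is ignored.
package main

import (
	"bufio"
	"io"
	"os"
)

func hexVal(c byte) (byte, bool) {
	switch {
	case c >= '0' && c <= '9':
		return c - '0', true
	case c >= 'a' && c <= 'f':
		return c - 'a' + 10, true
	case c >= 'A' && c <= 'F':
		return c - 'A' + 10, true
	}
	return 0, false
}

func main() {
	in := bufio.NewReader(os.Stdin)
	out := bufio.NewWriter(os.Stdout)
	defer out.Flush()

	var hi byte
	haveHi := false
	inComment := false
	for {
		c, err := in.ReadByte()
		if err == io.EOF {
			break
		} else if err != nil {
			panic(err)
		}
		if inComment {
			inComment = c != '\n'
			continue
		}
		if c == ';' || c == '#' {
			inComment = true
			continue
		}
		if v, ok := hexVal(c); ok {
			if haveHi {
				out.WriteByte(hi<<4 | v) // two hex digits -> one output byte
				haveHi = false
			} else {
				hi, haveHi = v, true
			}
		}
	}
}
```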
Yeah, I think this is pretty confusing unless you’re already very guix-savvy; it claims to be fully bootstrapped from source, but then in the middle of the article it says:
There are still some daunting tasks ahead. For example, what about the Linux kernel?
So what it is that was bootstrapped if it doesn’t include Linux? Is this a feature that only works for like … Hurd users or something?
They bootstrapped the userspace only, and with the caveat that the bootstrap is driven by Guix itself, which requires a Guile binary much larger than the bootstrap seeds, and there are still many escape hatches used for stuff like GHC.
reading the hex0 thing, it looks like this means that if you are on a Linux system, then you could build all of your packages with this bootstrapped thing, and you … basically just need to show up with an assembler for this hex0 file?
One thing about this is that hex0 calls out to a syscall to open() a file. Ultimately in a bootstrappable system you still likely have some sort of spec around file reading/writing that needs to be conformed to, and likely drivers to do it. There’s no magic to cross the gap of system drivers IMO
Hex0 is a language specification (like brainf#ck but more useful)
no, you don’t even need an assembler.
hex0.hex0 is an example of a self-hosting hex0 implementation.
hex0 can be approximated with: sed 's/[;#].*$//g' $input_file | xxd -r -p > $output_file
there are versions written in C, assembly, various shells and as it is only 255 bytes it is something that can be hand-toggled into memory or created directly in several text editors or even via BootOS.
It exists for POSIX, UEFI, DOS, BIOS and bare metal.
I have no existing insight, but it looks like https://bootstrapping.miraheze.org/wiki/Stage0 at least tries to shed some light on this :)
These were rejected from glibc as ‘horribly inefficient BSD crap’ by Ulrich Drepper (glibc maintainer at the time). I wonder how many millions of dollars of security vulnerabilities that one decision was responsible for.
There are good reasons for avoiding these functions. They are memory safe, but they might accidentally truncate strings, which can have other security implications (imagine if the strings are paths, for example). Generally, they catch bugs where you think you’re tracking the length of a string but the length is wrong, which should be impossible with a good string abstraction but is depressingly easy in C.
C++’s std::string is one of the worst string abstractions in any programming language, but eliminates the entire class of bugs that functions like this in C were designed to help mitigate. These days, if I need to deal with C string interfaces, I use std::string or std::vector and check in code review that every call to .c_str() or data() is accompanied by a call to .size() at the same location. If you have to use C, there are a load of string libraries (I think glib has a moderately competent one) that avoid all of these pitfalls. Dealing with raw C strings is best avoided where possible.
C++’s std::string is one of the worst string abstractions in any programming language
In what way? C++ deficiencies aside, and maybe the requirement for C-string compatibility (which I would argue is also a C++ deficiency), its worst sin seems to be that string_view took until C++17, which is hardly a fault with std::string as an abstraction.
So very many reasons. A few off the top of my head:

1. When it doesn’t contain unicode, it doesn’t capture the encoding that it is storing in any way, so good luck if someone hands to a std::basic_string<char> (a.k.a. std::string).
2. Indexing and iteration give you chars (code units), not characters, so anything that looks like character-level processing will silently corrupt multibyte characters.
3. There is no character- or code point-level API at all, so even enumerating the code points in a string requires an external library.
4. It exposes its representation, a single contiguous buffer of char, as part of its interface (which std::string_view also does). I’ve managed to get 50% aggregate transaction throughput processing improvements in server workloads and similar efficiency gains in desktop apps from changing the underlying representation of a string to tailor it to the specific workload; leaving that kind of performance on the table makes no sense for a language that generally micro-optimises for performance at the expense of everything else.
5. The size() method returns the number of characters in the string. A std::string is required to have a null byte at the end of its buffer (not explicitly, but as a result of a combination of other requirements), but whether this null byte is counted as part of the length depends on how the string was created and so you can end up with exciting length mismatches.

The only thing which doesn’t seem to be a C++ deficiency here is the 4th item, and maybe the 5th.

And the 4th seems relatively normal? You don’t explain what your changes were, but even for C++ the standard collections can’t cater to every use case, and trying to do so can yield significantly worse results (std::vector<bool> being one such cautionary tale).
The only thing which doesn’t seem to be a C++ deficiency here is the 4th item, and maybe the 5th.
I disagree, most of these are fixable in C++. A number of them are fixed in C++ with third-party libraries. ICU has string objects that don’t suffer from any of them, though ICU is a huge library to pull in to just fix these problems.
And the 4th seems relatively normal?
Most Smalltalk-family languages (I think JavaScript is probably the only exception?) provide an interface for strings and permit different implementations. Objective-C has APIs for very efficient iteration, which allow implementations to batch. ICU’s UText abstraction is very similar and is written in C++.
You don’t explain what your changes were, but even for C++ the standard collections can’t cater to every use case, and trying to do so can yield significantly worse results (std::vector<bool> being one such cautionary tale).
This is a false dichotomy. The goal is not to cater to every use case, it’s to allow interoperability between data structures that cater for each specific use case. My string might be represented as ASCII characters in a flat array, UTF-8 with a skip list or bitmap for searching code point breaks, a tree of contiguous UTF-32 code points for fast insertions, a list of reference-counted immutable (copy-on-write) chunks for fast substring operations, embedded in a pointer with the low bit for tagging to avoid memory allocation for short strings, or any combination of the above (or something completely different - this list is a subset of the data structures that I’ve used to represent runs of text in different use cases). In a language with a well-designed string abstraction, no one needs to know which of these I’ve picked and I don’t need to make that decision globally, I can choose different representations without changing any callers.
With Objective-C’s NSString and NSMutableString (which are not perfect, by any means), I can try a dozen different data structures for the strings in the places where string manipulation is the performance-critical part of my workload, without touching anything else. I can do the same in C++ codebases that use ICU’s UText. I can do it with (usually) better performance in C++ with a few C++ string libraries that define abstract interfaces and template their string operations over concrete instantiations of those interfaces. I cannot with interfaces that use std::string and I cannot easily write a templated interface where std::string is one of the options because its representation leaks into its interface in so many places.
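As a sketch of what that buys you, in Go for concreteness (nothing here is NSString or UText; the names are made up): when callers only see a small character-level interface, the representation behind it can be anything:

```go
package main

import "fmt"

// Text is the abstraction callers program against.
type Text interface {
	Len() int      // number of code points
	At(i int) rune // i-th code point
}

// asciiText: flat byte array, one byte per code point.
type asciiText []byte

func (t asciiText) Len() int      { return len(t) }
func (t asciiText) At(i int) rune { return rune(t[i]) }

// ropeText: two chunks concatenated, cheap substring/concat.
type ropeText struct{ left, right Text }

func (t ropeText) Len() int { return t.left.Len() + t.right.Len() }
func (t ropeText) At(i int) rune {
	if i < t.left.Len() {
		return t.left.At(i)
	}
	return t.right.At(i - t.left.Len())
}

// contains is oblivious to which representation it is handed.
func contains(t Text, r rune) bool {
	for i := 0; i < t.Len(); i++ {
		if t.At(i) == r {
			return true
		}
	}
	return false
}

func main() {
	flat := asciiText("hello world")
	rope := ropeText{asciiText("hello "), asciiText("world")}
	fmt.Println(contains(flat, 'w'), contains(rope, 'w')) // true true
}
```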
I disagree, most of these are fixable in C++. A number of them are fixed in C++ with third-party libraries. ICU has string objects that don’t suffer from any of them, though ICU is a huge library to pull in to just fix these problems.
You can’t change the existing APIs, they just can’t be fixed.
Adding “correct” behaviour and enumerating multibyte characters correctly means you need a significant chunk of the complexity of ICU, otherwise you’re restricted to just enumerating codepoints, and you lack many of the character introspection functions you would often want.
If you can operate on code points in the core string APIs then it’s easy to add the richer Unicode things in an external library that cleanly interoperates with your core standard library.
That argument applies to returning chars as well, and returning individual bytes is much more efficient, etc.
What is the case you see where enumerating code points is the correct behavior?
To me, it all comes down to the ability to provide my own data structures. Consumers of text APIs want to be able to modify code points (e.g. to add or remove diacritics); they never want to be able to add or remove a single byte in the middle of a multi-byte character because doing so can corrupt the whole of the rest of the string. If the storage format is exposed, I can’t store the raw data in a more efficient encoding. For example, both Apple and I have made some huge wins from observing that a very large number of strings in desktop applications are short ASCII strings (path components, field names in JSON) and providing an optimised encoding for these, behind an interface that deals with Unicode characters so calling code is oblivious to the fact that the data is embedded in the pointer most of the time (in desktop apps that I profiled, this one optimisation reduced the total number of malloc calls by over 10%; Apple took it further and added 6 and 5-bit encodings with the most common characters, so probably saves even more in iOS). If the caller needs to know that this string is ASCII (or one of Apple’s compressed encodings) because bytes are exposed then I can’t do this optimisation without modifying callers.
This implies it would be more valuable to have explicit iterators (e.g no randomized default), e.g. ::codepoints(), ::characters(), ::bytes_if_thats_what_you_really_want(), etc
::codepoints() on its own means that the common cases (displaying characters, substring, etc) aren’t possible without using a separate library to do the codepoint->character coalescing. The bytes->code point conversion is at least a trivial static one, that anyone could do. Code points to character conversion is the hard one, and it’s one where you want to be in agreement with your platform, not whatever external library version you’re on.
Of course the problem is that that might make iteration “less efficient” per the C++ committee, and heaven forbid anyone do something useful and correct if it might be slower than doing something incorrect \o/
When it doesn’t contain unicode, it doesn’t capture the encoding that it is storing in any way, so good luck if someone hands to a std::basic_string<char> (a.k.a. std::string).
This is no longer an issue. If I have an 8-bit string, the encoding is UTF-8 unless otherwise specified. And I’m going to demand a wrapper struct to carry the encoding identifier along side the string. And a detailed explanation for why you aren’t using UTF-8 or converting to UTF-8 at the system ingest boundary.
Good for you. C++, in contrast, will happily construct a std::string from a char* provided by some API that gives you characters in whatever character encoding LC_CTYPE happens to be set to by an environment variable for the current user. Now, to be fair, that’s a POSIX problem not a C++ one, but it’s still a big pile of suffering.
Pretty much all these problems are a by-product of its age, much like every other API from that era. Then you run into the general unbreakable-API problem all C++ APIs have.
The real issue is “wtf has there not been a new interface added that can handle the multibyte characters?”. Part of the problem is that the various C++ committees seem super opposed to anything that adds “overhead” and their definition of overhead can be annoying :-/
I would argue that conflating representation with interface is not a problem of age. Smalltalk-80 didn’t do this and it was one of the influences of C++. OpenStep kept the separation to great effect (and it was critical to NeXT being able to run rich DTP applications in 8MiB of RAM) and it predates the C++ standard that introduced the core of the modern standard library by six years.
whether this null byte is counted as part of the length depends on how the string was created
I’ve never heard of this issue and never run into a string whose length includes the trailing null — have I just gotten lucky in all the time I’ve been using std::string?
If you construct them from C strings, you’re fine. I’ve hit this problem twice in production code, in about 20 years of using C++. The second time was fairly quick to debug because I knew it could be a problem. The issue the first time was that one string was constructed from a C string, the other from a pointer and length, and the length (due to the flow that created it) was the length of the allocation including the null. One was used as a key in a hash table, the other was looking it up. The two strings were different and so they didn’t match. It took me a day of debugging because even printing the two strings in a debugger showed the same thing. I eventually noticed two identical strings with different lengths.
Oh! Yes, you can get string objects with embedded nulls, including at the end. I agree, that can be super confusing when it occurs.
I hope it’s obvious to everyone at this point that any time a corporation is comparatively better than others at respecting the people who rely on its stuff, this is purely a temporary state of affairs.
Free software projects are never as polished or featureful as corporate ones, but it’s quite rare for them to suddenly grow surveillance tooling. To me, this is worth it. Everyone can make their own decisions of course…
Free software projects are never as polished or featureful as corporate ones
Honestly, I’ve found the opposite to be true. I think it is fair to say they’re rarely as easy for novices to learn, however.
I’ve always found semaphores (compared to other concurrency concepts) hard to understand, and not terribly useful. I don’t think I’ve ever actually used one. I don’t doubt that they have their place for certain problems, but they always seemed more like a niche thing, and when reading about concurrency they always seemed to me to get more airtime than they deserved.
I might buy the ‘not terribly useful’ bit, since that’s very situational, but why are they hard to understand? They model a bucket full of flags. You want to do a thing, you take a flag. No flags? Wait. You want to allow someone else to do a thing? Put a flag in the bucket. These were used almost 200 years ago to control access to segments of railway lines, you don’t even need to understand anything about computers to understand them.
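The bucket-of-flags model maps almost literally onto, say, a buffered Go channel (just the metaphor in code, nothing from the article):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const flags = 3
	bucket := make(chan struct{}, flags)
	for i := 0; i < flags; i++ {
		bucket <- struct{}{} // put the initial flags in the bucket
	}

	var wg sync.WaitGroup
	for worker := 0; worker < 10; worker++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			<-bucket                                // take a flag; wait if the bucket is empty
			defer func() { bucket <- struct{}{} }() // put the flag back when done
			fmt.Println("worker", id, "holds a flag")
		}(worker)
	}
	wg.Wait()
}
```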
I knew a semaphore was a kind of flag irl, but I never realized this is what they had in mind with the abstraction….
Interesting. For me it’s the opposite, i never found any concurrency model that I can reason about easier than the original P() and V() introduced by Dijkstra. Typically one will wrap access to any shared resource in a semaphore to prevent race conditions.
What’s the difference between “wrap[ping] access to any shared resource in a semaphore” and protecting it with a mutex?
If you just mean using a semaphore like a mutex, that’s probably not the part which people have trouble understanding about semaphores.
What’s the difference between “wrap[ping] access to any shared resource in a semaphore” and protecting it with a mutex?
A mutex is a specialisation of a semaphore with two key properties:

- its count never exceeds one, so at most one thread holds it at a time, and
- it is owned: the thread that locked it is the thread that must unlock it.

This means that you can trivially use a semaphore for any use case where a mutex is also appropriate, but given its additional restrictions a mutex may be either more efficient or may give better debugging features. Some mutex implementations are thin wrappers around semaphores.
Generally, you want to protect a resource with a semaphore if you don’t want either of these properties. For example: a pool of N interchangeable resources that up to N threads may hold at once, or a signalling pattern where the thread that releases the semaphore is not the one that acquired it.
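A tiny sketch of the second kind of use (illustrative only): the goroutine that puts the flag back is not the one that takes it, which is exactly what a mutex’s ownership rule forbids:

```go
package main

import "fmt"

func main() {
	sem := make(chan struct{}, 1) // binary semaphore, starts "taken" (empty)
	var result int

	go func() {
		result = 42       // produce something
		sem <- struct{}{} // release: put the flag back from a different goroutine
	}()

	<-sem // acquire: wait until the worker has released
	fmt.Println("result:", result)
}
```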
Notably, the Rust MutexGuard type is !Send because some prominent OS mutex implementations will crash the process if a mutex is unlocked on a thread that did not lock it.
Depending on the options used to create it, POSIX says that unlocking a not-owned mutex is either UB or will return an error. On some platforms, normal mutexes default to behaving like error checking ones.
POSIX also has a notion of a ‘recursive’ mutex, which is a mutex that you can lock multiple times from the same thread and unlock the same number of times. To implement this, a mutex needs to contain an owning thread and a lock count. The lock count is typically also used in normal (non-recursive) mutexes, it’s just that you never try to move it up from 1. If you have space for the owner, then on non-recursive mutexes you basically get error-checking mutexes from the same code paths, and the implementations are simpler if recursive, error-checking, and normal mutexes are all the same shape. The cost of error-checking mutexes is pretty trivial - it’s one extra load and a compare-and-branch (if you store something like the %gs base there), and the branch is always statically predicted not taken.
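A sketch of that owner-plus-count shape (in Go for concreteness; Go doesn’t expose a thread/goroutine ID, so the caller passes an explicit owner token, and sync.Cond stands in for the futex-based parking a real implementation would use):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// RecursiveMutex records an owner and a lock count, which also makes it
// "error checking": unlocking something you don't own is reported.
type RecursiveMutex struct {
	mu    sync.Mutex
	cond  *sync.Cond
	owner int64 // 0 means unlocked
	count int
}

func NewRecursiveMutex() *RecursiveMutex {
	m := &RecursiveMutex{}
	m.cond = sync.NewCond(&m.mu)
	return m
}

func (m *RecursiveMutex) Lock(owner int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.owner == owner { // already held by us: just bump the count
		m.count++
		return
	}
	for m.owner != 0 { // held by someone else: wait
		m.cond.Wait()
	}
	m.owner, m.count = owner, 1
}

func (m *RecursiveMutex) Unlock(owner int64) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.owner != owner {
		return errors.New("unlock by non-owner")
	}
	m.count--
	if m.count == 0 {
		m.owner = 0
		m.cond.Signal()
	}
	return nil
}

func main() {
	m := NewRecursiveMutex()
	m.Lock(1)
	m.Lock(1)                // same owner: recursion is fine
	fmt.Println(m.Unlock(2)) // wrong owner: reports an error
	fmt.Println(m.Unlock(1), m.Unlock(1))
}
```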
I meant them as being equivalent. It’s mostly a matter of what you or your API calls it.
If you just mean using a semaphore like a mutex, that’s probably not the part which people have trouble understanding about semaphores
What else does it do? It just lets a finite number of concurrent executions access a resource. Oftentimes just one. What is there more to understand? Honest question.
wallclock-based rate limiters.
You mean something like that? https://stackoverflow.com/a/668327
This is actually an area I was really interested in a while back, and I spent a good bit of time experimenting with using neural networks to model logic gates.
Some activation functions like GCU are able to solve the XOR problem in a single neuron. They proved that using this fancy activation function allows for networks to be more expressive with fewer neurons and even train faster in some cases.
I built a little neural network demo that runs in the browser and included GCU as one of the available activation function types, and it provides benefits for this simple use case as well: https://nn.ameo.dev/
I found in my own experiments that it can often be effective to mix activation functions between layers. Complex ones like GCU work well in early layers, and then simpler ones work well in later layers for combining together the signals more precisely. I did some research into building the simplest activation function I could that could solve the XOR problem too, with my results here: https://cprimozic.net/blog/boolean-logic-with-neural-networks/
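To see why a single GCU unit can represent XOR, here is a tiny hand-weighted check (weights picked by hand, not trained; this is just the arithmetic, not the paper’s setup):

```go
// One GCU "neuron": with w1 = w2 = pi/4 and no bias, x*cos(x) gives ~0.55
// for the (0,1)/(1,0) inputs and 0 for (0,0)/(1,1), so a 0.3 threshold
// separates the XOR classes.
package main

import (
	"fmt"
	"math"
)

func gcu(x float64) float64 { return x * math.Cos(x) }

func main() {
	const w = math.Pi / 4
	for _, in := range [][2]float64{{0, 0}, {0, 1}, {1, 0}, {1, 1}} {
		z := w*in[0] + w*in[1]
		out := gcu(z)
		fmt.Printf("x=%v y=%v gcu=%.3f xor=%v\n", in[0], in[1], out, out > 0.3)
	}
}
```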
Another thing to consider is that artificial neurons as we currently use them are quite specific, and there’s a lot of room to expand or tweak neural network architectures without even having to move away from the automatic differentiation. One thing to worry about, though, is making things complicated or conditional enough that you can’t just matrix multiply your way through it. That could have the effect of making things much more expensive to train or slow to infer with.
to save a clickthrough: GCU is x * cos(x).
The paper does some dirty pool in cutting off the activation graphs before 2pi. The oscillation instability at high values where floating point rounding takes effect feels like it could easily be a problem.
But even so, systems using the LUKS2 header format used to default to argon2i, again not a memory expensive KDF. New versions default to argon2id, which is. You want to be using argon2id.
Please forgive me, but what the fuck is this?
I’ve implemented all versions of Argon2, and the claim that any one variant is somehow not memory hard, while the others are… well, new. Here’s what I’m aware of:

- Argon2d uses data-dependent memory access: it resists memory/time trade-offs best, but its access pattern leaks information about the password.
- Argon2i uses data-independent access: nothing leaks through the access pattern, and the known trade-off attacks are the reason it should be run with at least 3 passes.
- Argon2id runs the first half of the first pass like Argon2i and the rest like Argon2d, as a compromise between the two.
Now I’m assuming perfect timing attacks. I’m not aware of those having been performed, or even being possible at all. So I don’t know the best trade-off there. But as far as I know it is highly threat model dependent. It is not clear to me which would be better. I’m pretty sure about one thing though: Argon2i is not a bad default. Just use 3 passes with as much memory as is tolerable (I personally set my password manager to 1 second), and you’re done. Argon2id is better when you don’t fear timing attacks too much, but unless there’s a new attack I’m not aware of Argon2i remains memory hard.
Summoning @soatok in case I missed something, and here’s a link to /r/crypto as well.
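For what it’s worth, the “3 passes with as much memory as is tolerable” advice maps directly onto golang.org/x/crypto/argon2; a minimal sketch, with parameters chosen only for illustration:

```go
package main

import (
	"crypto/rand"
	"fmt"

	"golang.org/x/crypto/argon2"
)

func main() {
	password := []byte("correct horse battery staple")
	salt := make([]byte, 16)
	if _, err := rand.Read(salt); err != nil {
		panic(err)
	}

	// Argon2i: data-independent, 3 passes over 256 MiB (memory is in KiB), 4 lanes, 32-byte key.
	k1 := argon2.Key(password, salt, 3, 256*1024, 4, 32)

	// Argon2id: the hybrid variant, 1 pass over the same 256 MiB.
	k2 := argon2.IDKey(password, salt, 1, 256*1024, 4, 32)

	fmt.Printf("argon2i:  %x\nargon2id: %x\n", k1, k2)
}
```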
So the takeaway is “General usage: pick argon2i. For disk encryption specifically, argon2id is stronger.” because timing attacks are not a concern.
I could advise something like that. I do have to concede however that lately Argon2id tends to be recommended as the default. Libsodium changed its default from Argon2i to Argon2id a few years ago (likely 2017), and RFC 9106 does not recommend Argon2i at all. (In fact the RFC considers Argon2id to be the primary variant. Funny that historically it was the latest.)
For some time there was this idea that Argon2id was immune to timing attacks. If that were true it would utterly dominate Argon2i. Unfortunately it’s not that simple, so I need a deeper explanation. I don’t know, maybe there currently is no good side channel attack on Argon2id? Or maybe there is, but the chances of timing attacks are low enough that it’s worth the trade-off?
Note: The author has kindly disavowed their error, their claim is now stricken through.
Not sure about their new claim about GPU attacks though. The only thing I can say about that is that Argon2i uses 3 times less memory in practice (because 3 passes), and that could indeed make it a little more vulnerable. But if we’re talking strong password hashing as used for full disk encryption this is still a crapton of memory (at least 300MB), so I’m not sure it matters that much.
Personally I prefer to talk about the better, more theoretical attacks, which give us a better idea of what we’re up against in the long term (or with state-level attackers, which is very much the case with our anarchist friend). For the RFC (published in 2021/07), the best reported attacks when using 1GB of memory give attackers the following advantages (smaller is better):
For constant time defenders (they can spend a fixed amount of time on each hash), the strongest options are:
Now what if an attacker can mount a magical timing attack that reveals all secret-dependent access patterns, I think we get the following:
Argon2d reduces into a fast hash, which destroys its purpose as a password hash. That’s why it was never recommended for regular password hashing where timing attacks are a concern (like a PC where untrusted programs may be running). Argon2id gets its initial advantage multiplied by 5 (ouch). Argon2i is unaffected, and now the winner. Still, in relative terms the differences aren’t that big:
If side channels are a concern but not a certainty I’d be hard pressed to determine which is the better candidate.
One caveat: I may have painted Argon2i in a better light than is warranted: because it uses 3 passes it also uses 3 times less memory, and that makes it weaker in practice. I expect however that the effects are even subtler than what I’ve just outlined.
Not at a computer at the moment, but I kinda suspect that you can compare them, just not in the way the author is expecting.
If you define:
f := one.v
Then my guess is sameFunction(f, f) will return true, as it will only instantiate one.v once.
I’d be happy and slightly intrigued to be proven wrong though if somebody wants to try it out.
Just remembered this. I tried it this morning and sameFunction(f, f) does indeed return true.
f := one.v
fmt.Println("f matches itself:", sameFunction(f, f))
---
sameFunction: 0x102df9390 0x102df9390
f matches itself: true
Wow, the original version was accompanied with a comment of “Don’t even think about changing this.”
Haunted graveyards are bad!!
In the second paragraph, I’m already realizing the solution: Read + Seek + Clone. If something is random access, the Read+Seek object is a position handle, not the sole access pathway to the data (which Read must accommodate).