I guess I can’t do bulk insertion with this as with bloom filters. (Add 1000 elements to a fresh bloom filter, then OR that one on top of the one I want to insert into.) Also, bloom filters increase their failure rate with more insertions; this data structure simply fails instead.
An interesting variation of bloom filters to touch fewer cache lines is used in some ELF files: https://blogs.oracle.com/ali/gnu-hash-elf-sections
Visualization for bloom filters: https://www.jasondavies.com/bloomfilter/
Visualization for cuckoo hashing: http://www.lkozma.net/cuckoo_hashing_visualization/ (just imagine the two arrays are actually just one)
Future work: It is rather depressing to reflect on the amount of effort that is still required for such a project. Coq has a very steep learning curve. There is no tool support to automate the translation from Haskell to Coq.
Just want to point out that by now https://github.com/antalsz/hs-to-coq exists.
at first glance I thought this was about pypy, and my second thought: “heh, no way it builds in 6 minutes”
I personally dislike rustup. Why can’t cargo handle getting the right compiler version for building different projects?
Wouldn’t that encourage people to pin their current version of the compiler and never upgrade? As far as I can see the rust devs already take great care to not break existing code, so sticking with rustc master should be safe.
Rustup allows you to pin your current version as well. It’s very useful, especially if you’re working on nightly. Sticking to “rustc master” isn’t guaranteed to not break things, as that’s the whole point of nightly; there’s no stability guarantees. Those only apply to stable releases.
The real issue here is that rustup also manages cargo versions, so having cargo do it would be weird. It’s about separation of concerns: rustup manages toolchains, cargo manages dependencies and building, rustc manages actually compiling your code. You can replace anything at any layer.
I don’t really see a distinction between a toolchain and a dependency. I guess you need to break the loop somewhere though.
A knee-jerk reaction out of half-hearted privacy concerns. With a driver in kernel mode and hardware with direct RAM access, the computer can be considered ‘compromised’ anyway.
EDIT: Oh, and the irony when it asks you to download and run some random .exe at the bottom. Over http.
With a driver in kernel mode and hardware with direct RAM access, the computer can be considered ‘compromised’ anyway.
Doesn’t windows require driver signing these days (which doesn’t solve everything, but creates at least some accountability)? And modern hardware tends to come with an IOMMU that controls hardware’s access to RAM, no?
Is it a proposed feature, or is it in mainline? Is it micro-kernel features tacked onto Linux? How much is the overlap with current primitives?
More gifts from the greeks: https://www.phoronix.com/scan.php?page=news_item&px=BUS1-In-Development
There are a few things in this post that I’m unhappy with.
Translating subset relationship as →.
If you have two sets A and B and they are defined as A = {x ∈ X | p x} and B = {x ∈ X | q x}
then ∀ (x:X), p x → q x means the same as A is a subset of B.
But that’s not what this post uses.
To demonstrate the mistake made: “Real numbers are not a subset of the integers.”
Proper formalization: ¬(∀ (n:numbers), is_real n → is_integer n)
What this post uses: ¬(real → integer)
Since I have a proof for real → integer, namely floor, the latter can be disproven.
It can be disproven even though the original sentence is true, because they don’t represent the same proposition.
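The subset-as-implication reading can be stated directly; a sketch in Lean (assuming Mathlib’s `Set`), where the subset relation unfolds definitionally to the pointwise implication:

```lean
import Mathlib.Data.Set.Basic

-- A ⊆ B is, by definition, the pointwise implication ∀ x, p x → q x.
example (X : Type) (p q : X → Prop) :
    {x : X | p x} ⊆ {x | q x} ↔ ∀ x, p x → q x :=
  Iff.rfl
```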
Mixing the language being analyzed and the language in which the proof is made.
On the level of the proof S and T are not types, they are values. Values of a type I’ll call OOType.
You can of course define a function lift : OOType → Type, but then your signatures on the proof level would look like lift S → lift T, not S → T.
Missing the point of the LSP.
LSP is not a consequence. It’s an axiom. It defines what subtyping means.
Though “axiom” is probably flattering, because despite its formal look it has a “should” in it for a reason.
Without the “should” you could always construct a property so strict that only T fulfills it (so every S must be T), making it quite useless.
LSP is not a consequence. It’s an axiom. It defines what subtyping means.
The definition of subtyping is the rule of subsumption: “if S <: T and t : S, then t : T”. The Liskov substitution principle is an immediate corollary.
Though “axiom” is probably flattering, because despite its formal look it has a “should” in it for a reason.
No, there’s no reason for it to have a “should”. The Liskov substitution principle always holds in a type-safe language with subtyping. What actually doesn’t hold is that you can make logically valid assumptions about things that can be arbitrarily redefined. Such as the behavior of non-final methods.
In a different comment you wrote:
After putting the pieces together, the only thing we can conclude is that we just broke the Liskov substitution principle. But, wait, didn’t I just say that the Liskov substitution principle always holds? Where did we go wrong?
What’s your conclusion? The way I understand it, we did break it. Exactly because methods can be overridden.
Maybe I’m missing the point. Allow me to reword my third paragraph:
There is the concept of subtyping, and there is what individual languages use as “subtyping”. A language designer can choose to include a subtype relation in their language that fulfills the requirements of the LSP. With dynamic type dispatch (which overridable methods offer) you can leak dynamic type info to the program. Every property you construct around that information will naturally differ between S and T and violate the LSP.
You also said:
The definition of subtyping is the rule of subsumption: “if S <: T and t : S, then t : T”. The Liskov substitution principle is an immediate corollary.
Subsumption rule states that the property “is of type T” propagates from T to its subtypes.
LSP states that every decidable property propagates from T to its subtypes.
So I’d say it’s the other way round. LSP implies the subsumption rule because “is of type T” is just another property about which LSP makes a statement. In fact, I have a proof for it: http://lpaste.net/246224
What’s your conclusion? The way I understand it, we did break it. Exactly because methods can be overridden.
Nope, we didn’t break it. What’s actually broken is my assumptions: (0) and (2) are wrong. Not every Animal produces generic sounds, precisely because makeNoise isn’t a final method. In fact, without some analog of Haskell’s type class laws for [EDIT: oops!] overriding methods, you can’t guarantee anything about the behavior of non-final methods.
Subsumption rule states that the property “is of type T” propagates from T to its subtypes.
Subsumption says that, if S is a subtype of T, then every term t that has type S also has type T. The Liskov substitution principle is just a very trivial corollary of it.
LSP states that every decidable property propagates from T to its subtypes.
Every property, decidable or not, propagates from T to its subtypes.
It needs big lookup tables. See this thread: https://twitter.com/rygorous/status/716891236850749440
There’s a lot of complaining in that post, but ultimately, the compilers aren’t doing anything “wrong” or even technically unexpected. “Undefined behavior” is just that, undefined.
Early versions of GCC took that in classic hacker humor fashion, even. The C89 standard defines the “#pragma” directive as having “undefined behavior”. Early GCC versions would, upon encountering an unknown “#pragma”, launch NetHack or Rogue.
I’m sure we all agree that this compiler behaviour conforms to the letter of the C standard. That doesn’t make it productive or even, I would say, “right”.
In fact, they are doing something wrong. As Jones noted, the standard does not prohibit flagging an error or providing the expected machine-dependent behavior on UB. The compiler developers made a design choice to assume 0=1 on UB wherever that enables “optimizations”. But that’s a stupid design decision. I think the attitude that someone like DJB should just suck it up and waste his time trying to decipher the obscure “nuances” of the standard is ridiculous.
You’re not prohibited from compiling with -fwrapv either. Telling people about this option might actually help them.
Yeah, it’s the sort of thing where the compiler developers have put in the substantial amount of time necessary to really grok all the details of the C standard, and have little care for the idea that other people haven’t and won’t put in that same amount of time - or, more importantly, for the idea that compiler developers should have some degree of sympathy for those other people and try to avoid the sorts of assumptions and optimizations that will surprise anyone without the same advanced level of expertise.
To make a small analogy to role playing games, compiler developers are the rules-lawyer min-maxers, murderhoboing all over your code. And you’re looking for a good story and a fun experience, and the whole thing is off-putting when they make these sorts of aggressive optimizations, and their only defense for being like that is that you should read the standard. The only option becomes to be just like the murderhobo, and that’s not fun for a lot of people.
It’s really worth trying to read the C11 standard explanation of, for example, what type casts are permitted. Look at page 77 of (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf) for example.
I don’t think the compiler developers ever made an actual choice to “assume 0=1” or anything similarly pugnacious. Rather they make the choice to assume that constraints are not violated by the program and make use of this (together with the “undefined behaviour” allowance) to make optimisations that wouldn’t otherwise be possible. The problem is people expecting a certain behaviour from, for instance, signed integer overflow which they may have seen previously - because the compilers didn’t have such sophisticated (or “aggressive”, if you insist) optimisation as they do today, but never because that behaviour was actually mandated.
Actually, if you don’t rely on signed overflow and you don’t do type punning, C is still a perfectly usable language for encryption or anything else (though there are often much better choices).
I understand what you are saying - and I feel like sometimes the compiler vendors do go too far - but I feel like complaints such as these are often either misdirected (is it the compiler or the language standard to blame?) or come from a some kind of self-righteous indignation (dammit! why won’t this compiler do what I want it to? Who do these compiler developers think they are, telling me the bug is in my code?!!).
Yes, there are obscure nuances which e.g. allow type punning in some cases. But if you avoid type punning altogether - which many languages force you to do - you don’t need to understand those nuances. And if you don’t think of C as a kind of high-level assembly language, then you shouldn’t expect signed integer overflow to work - and if you do think of it as a high-level assembly, then you’re wrong.
I don’t think many C programmers were even aware of the process of adding additional UB. And obviously the standard committee itself even didn’t understand the implications of what they were doing - or else the char* hack would not have been needed.
The problem is that they do not tell you. Note that using signed integers for for-loop counters is about as idiomatic as you can go in C.
I don’t think there is such a process. What is UB now was always UB. The difference is that in the past, the behaviour tended to more closely mirror behaviour expected due to an understanding of the underlying hardware and an incorrect assumption about how the compiler operated.
Sure, but there’s nothing wrong with using signed integers for loop counters. The problem only comes when you write code that expects that if it keeps incrementing such a counter it will eventually jump to a negative value. That’s not idiomatic.
Also, fun fact, loop counters being idiomatically plain int is actually what motivates the compiler to rely on signed overflow not occurring. Otherwise it generates uglier code. For an example: https://gist.github.com/rygorous/e0f055bfb74e3d5f0af20690759de5a7
My experiments with gcc and clang show that using an unsigned int counter in a loop improves performance a bit.
Can you show these experiments?
they are totally simple
run a bunch of times and count
https://godbolt.org/g/fFpBuk
On gcc 7.1 with -O2 these two only differ by one opcode. Only at -O3 do they start to differ significantly.
And it isn’t even in the loop.
Yet these differences are rather unsubstantial. Differently named labels, swapped operands for cmp, and correspondingly swapped jump opcodes. The most substantial difference is that the code using ints does a few sign extensions. Worth noting is that all those differences take place at the “tail” of the loop, where the few remaining floats that weren’t done by the xmm-wide copy are handled. The hot part of the loop is identical, just as in the -O2 version.
Not correct. The standards have moved things into the UB list that were previously “indeterminate”, etc.
If you’re talking about the proposals of David Keaton, you need to be aware that:
In fact, there is a formal proposal here (which comes out of the Cerberus survey) which specifically suggests clarifying that the read of an uninitialized variable does not give UB:
http://www.cl.cam.ac.uk/~pes20/cerberus/n2089.html
If you’re talking about something else, other than minor clarifications of behaviour which was obviously dubious in the first place, I’m curious to hear what it is.
Note the original C ANSI standard included the following
This would forbid what we are now told is intrinsic to C compilation. Just as a matter of common sense, “optimizing” “a = b + c; assert(a > b && a > c)” to remove the assert because it is convenient to assume undefined behavior never happens is absurd. Or consider:
I believe that is now UB. Or consider
Compare that to the total hash in C11.
As for the proposal you link to, it’s indicative of a disregard of C practice. For example, the proposal is that if you compute a checksum on a partly initialized data structure, that checksum could be made unstable and unreliable. I don’t get the advantage of ruling the BSD and Linux network stacks to be subject to this sort of nonsense.
I disagree. The reason signed overflow is specified as undefined (and not, say, some specific set of possible behaviours including e.g. wraparound and termination) is precisely to allow such an optimisation. For idiomatic use of loop counters, for instance, this can allow for generation of significantly better code. Even allowing those specific behaviours would at least make the behaviour platform dependent.
In any case the assert could only be optimised away if the compiler was certain that b > 0 and c > 0. For the programmer to have written the above and not merely be trying to assert that (b > 0 && c > 0) implies that they were familiar with bitwise representation but ignorant of C’s operator semantics.
However:
I agree that this needs a solution, and I agree that the best solution is to mandate that reading an uninitialised variable (including a member of a partly initialised structure) and then reading it again should yield the same value both times. I don’t see much value in that proposal; I linked to it merely to point out that it’s not the case that reading an uninitialised value is necessarily going to become undefined behaviour.
I don’t read it that way. e.g. I think it’s fair to characterize the compiler’s behaviour in your example as ignoring the situation where b + c overflows completely, which is something the standard explicitly permits.
It’s not ignoring it - it is making the assumption that it cannot happen. Ignoring it would involve compiling the code as is, not rewriting the loop. Consider the case DJB notes - where the conforming null check was removed because the compiler assumed if the pointer was null the prior reference would be UB!
Removing a redundant assert is entirely normal and reasonable behaviour for an optimizing compiler, and that particular assert is flagrantly redundant in all situations except the one in which an explicitly permissible behaviour is “ignoring the situation completely with unpredictable results”.
It’s redundant in all false cases, yes.