TL;DW: A pen-and-paper secret sharing scheme for bitcoin keys with bech32 checksums using circular paper LUTs
It may be interesting for some people to know that the Atari 2600 has an active demoscene. Many cartridges also feature circuitry to expand the capabilities of the hardware, so-called mappers. These were also popular on the NES and some other consoles of the era. The capabilities of these mappers are somewhat hampered by the lack of a write line on the cartridge port: extended RAM has to have separate addresses for reading and writing.
The Pitfall II cartridge had an additional sound chip to enable the continuous multichannel music, and a larger ROM enabled by bank-switching to hold all the data for the game.
A peeve of mine is those so-called GDPR banners… but the actual regulation says they cannot interfere with the use of the site to be compliant. If only that were actually enforced!
This is my main frustration. All of these EU regulations were implemented with the right intent, but since the ad-industry failed to lobby, they are now fighting these regulations through malicious compliance.
The fact that many people refer to these banners as “EU cookies banners” shows how the EU lost the battle for public opinion, and the anti-privacy industry won. The real issue is that websites set cookies just for me to read an article which could be served statically. They could serve ads related to the article content, the same way DuckDuckGo serves ads related to the search query. (And surprise, surprise, DuckDuckGo has no cookie/GDPR banner.) But somehow the ad industry managed to shift the blame from their bad practices to the EU in the public’s mind.
Same goes with GDPR approval modals, many of them are non-compliant. (Der Spiegel’s modal is literally “either you pay to subscribe, or accept our trackers. Otherwise you cannot access the article.”) But nothing is enforced…
The fact that many people refer to these banners as “EU cookies banners” shows how the EU lost the battle for public opinion, and the anti-privacy industry won.
Part of the reason for this is the neoliberal want that users should ~make a choice~ in the matter rather than banning the practice outright.
All of these EU regulations were implemented with the right intent, but since the ad-industry failed to lobby, they are now fighting these regulations through malicious compliance.
I think it’s moved beyond the ad industry, too. At my previous job, I tried and failed to convince our UI designers that our site didn’t need a cookie popup because the cookies we were setting (login session ID kinds of things) were all fine under EU regulations. Their rebuttal wasn’t to cite the regulations back at me and show me I was wrong, but rather to say, “Better safe than sorry.”
“Better safe than sorry.”
This is the insidious problem with most regulation. Good or neutral actors over-comply in harmful ways, and bad actors continue to fail to comply. Usually the problem with a bad actor isn’t that there is no regulation they are breaking, but that no one is making them comply (or that they have so much power no one can reasonably make them comply). More regs is not (usually) a solution to bad actors – better strategies for getting compliance is.
[ I am not a lawyer, I am definitely not a lawyer who specialises in privacy regulations, this is not legal advice ]
You could try pointing them at the GitHub blog post that describes how they reached GDPR compliance without a banner. You could also try asking them to look at bounce rates: how many people just leave the site entirely rather than clicking through the banner (I do this most of the time I see one, not sure how representative I am).
Perhaps more effectively, you could talk to them about the flow to withdraw consent. The GDPR requires that this be visible. If visitors are granting consent then you must also provide an option for them to withdraw consent. This means that you also need a workflow for what it means to withdraw consent and what your process is for deleting the PII that you’ve collected on that visitor. Without this, you are definitely not safe: by requesting consent, you are publicly asserting that you are collecting PII, and without a process for tracking that PII and honouring withdrawal of consent you are open to liability, because you’d have to prove to the regulator that you actually weren’t collecting PII in the first place (and were therefore misrepresenting the operation of your site to visitors, which may open you up to a different kind of liability). The mantra ‘better safe than sorry’ definitely applies but having a misleading tracking banner does not make you safer.
Do you have a reference for where in the GDPR it says cookie banners can’t interfere with the site? Or an authoritative interpretation that says so? @acatton seems like you might know too.
This page is pretty good: https://gdpr.eu/gdpr-consent-requirements/
A few specific quotes:
“Freely given” consent essentially means you have not cornered the data subject into agreeing to you using their data. For one thing, that means you cannot require consent to data processing as a condition of using the service. They need to be able to say no.
Of course, the site owners might argue that a big obtrusive popup that says “YES” in huge letters, with “customize settings” in tiny little grey text on the side that makes you jump through a few hoops to turn it off (and is liable to randomly ask you again until you say yes), is not cornering the user and not making clicking it a condition of using the service. Technically, you can say no and still get to the site, you just have to do all this first. But if their mantra is “better safe than sorry”, why risk having to litigate it?
Consent must be specific. Consent must be informed. If the request for consent is vague, sweeping or difficult to understand, then it will be invalid.
How many times have you seen something like “this site uses cookies to improve your experience”? That’s completely meaningless so it violates these too.
Neat. I did some sketches a while back to see if this could be ported to the Atari 2600. I suspect it can, but you need a huge RAM expansion in the cartridge.
Is paging not a plausible thing for 2600 cartridges? I assume that they didn’t break a write pin out to the cartridge slot, but maybe you can do a door-knock or something.
Yes, the Atari 2600 cartridge slot has no R/W pin. Bankswitching is done by having the cartridge respond to reads of certain addresses. For example the F8 bankswitching scheme involves reading 1FF8/1FF9 to switch between bank 0/1. F6 gives 16k via 1FF6..1FF9 and F4 32k via 1FF4..1FFB.
Writes to cartridge RAM work by having separate addresses for reads and writes. This means operations like INC $1000 do not work. Instead you must LDX $1100, INX, STX $1000.
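If it helps to see the mechanism, here is a toy, emulator’s-eye Python sketch of the idea (addresses are taken from the two comments above; real cartridges decode this in hardware and the details vary by scheme):

class Cart:
    # Toy model of an F8-style cart with extra RAM, as an emulator might see it.
    def __init__(self, bank0, bank1):
        self.banks = [bank0, bank1]   # two 4K ROM images
        self.bank = 0
        self.ram = bytearray(256)     # extra cartridge RAM

    def read(self, addr):
        # Hotspots: merely reading these addresses switches banks (F8 scheme).
        if addr == 0x1FF8:
            self.bank = 0
        elif addr == 0x1FF9:
            self.bank = 1
        if 0x1100 <= addr < 0x1200:   # RAM read port (the LDX $1100 above)
            return self.ram[addr - 0x1100]
        return self.banks[self.bank][addr & 0x0FFF]

    def write(self, addr, value):
        if 0x1000 <= addr < 0x1100:   # RAM write port (the STX $1000 above)
            self.ram[addr - 0x1000] = value & 0xFF

The point is just that with no R/W line the cart has to infer everything from which addresses get touched.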
Gah, I meant “no no, bank switching was done AFAIK” not “no, no bank switching was done AFAIK”.
Stupid comma.
So, a modern analog might be something like google’s protocol buffers? Both start from a type definition which gets “compiled” into auto-generated source code in your favorite language. Then there are other binary encoding formats like CBOR which start from your own source code and move towards a compatible serializer / deserializer by specifying numerical field IDs for each serialized property on a type / struct.
Never surprises me when I find out something “new” is just a rewrite of something quite old :)
Protocol buffers serve a similar function, but for cryptographic applications it’s important to keep in mind that they don’t provide a unique encoding. In particular, a lot of things under the hood in protos are variable-length integers, and these integers are not required to be represented minimally. ASN.1 DER, in contrast, guarantees that any permissible value will have a unique encoding.
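To make “not required to be represented minimally” concrete, here is a rough Python sketch of the base-128 varint rule from the protobuf wire format (no protobuf library involved; the function is mine):

def decode_varint(data):
    # Little-endian base-128: low 7 bits per byte, high bit = "more bytes follow".
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):       # continuation bit clear: last byte
            return result
        shift += 7
    raise ValueError("truncated varint")

# Both byte strings decode to 1; only the first is the minimal encoding.
print(decode_varint(b"\x01"))        # 1
print(decode_varint(b"\x81\x00"))    # 1 (non-minimal: redundant zero group)

DER, by contrast, pins down exactly one legal encoding per value, which is the property you want when the bytes are going to be hashed or signed.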
That is the thing - CBOR and ProtoBuffers are encodings that also offer some kind of IDL, while ASN.1 (that is why it is called Abstract Syntax Notation One) is just an IDL with multiple encodings. Technically it should be possible to use ASN.1 for defining ProtoBuffs, CBOR, or any other encoding you like. It already supports JSON (JER) and XML (XER).
The reason why we now have a lot of different ASN.1-like things that each use their own IDL is that ASN.1 has grown a lot of cruft and the spec is abysmal. It would be great to have something like “ASN.2” that would remove all the cruft and start from a mostly “clean slate”, but I do not think that will become a thing any time soon. Too many people are focused on watching what Google will publish next.
CBOR is just “binary JSON”, it is a self-describing encoding. There is no IDL in CBOR. There is an external CDDL spec (or good old JSON Schema) but.. meh.
Protobuf and the performance focused alternatives are actually IDL-first. They just happen to have one standard encoding because there are benefits to developing IDL+encoding together as a cohesive system. You could use alternative encodings, but the only reason anyone actually does is to have a human-readable JSON representation for debugging.
a modern analog might be something like google’s protocol buffers?
No. protobuf is just another encoding. You can almost certainly define a protobuf-like encoding rule for ASN.1. This is the problem with all of these flavor-of-the-day formats. All they do is fracture this space.
No! Stop! Where do all of you get that idea?!
Protobuf is IDL first, encoding second. The main part of protobuf is the proto(2/3) language for describing data structures with deep support for format evolution (backward/forward compatibility).
Also, “flavor of the day”? Really? Protobuf has been public / open source for 13 years now.
As far as I can tell protobuf provides nothing that ASN.1 does not already provide. In my experience it’s worse. Try doing polymorphism in protobuf to see what I mean.
You could define a new wire format for it. But why do that when ASN.1 already has the infrastructure for all this?
It’s a good question. I’m not sure that npm is all that different from most other dependency managers. My feeling is that it’s more cultural than anything – why do JS developers like to create such small packages, and why do they use so many of them? The install script problem is exacerbated because of this, but really the same issue applies to RubyGems, PyPI, etc.
There are some interesting statistics in Veracode’s State of Software Security - Open Source Edition report (PDF link). Especially the chart on page 15!
Deno’s use of permissions looks very interesting too, but I haven’t tried it myself.
I’m not sure that npm is all that different from most other dependency managers. My feeling is that it’s more cultural than anything – why do JS developers like to create such small packages, and why do they use so many of them?
I thought this was fairly-well understood, certainly it’s been discussed plenty: JS has no standard library, and so it has been filled-in over many years by various people. Some of these libraries are really quite tiny, because someone was scratching their own itch and published the thing to npm to help others. Sometimes there are multiple packages doing essentially the same thing, because people had different opinions about how to do it, and no canonical std lib to refer to. Sometimes it’s just the original maintainers gave up, or evolved their package in a way that people didn’t like, and other packages moved in to fill the void.
I’m also pretty sure most people developing applications rather than libraries aren’t directly using massive numbers of dependencies, and the ones they’re pulling in aren’t “small”. Looking around at some projects I’m involved with, the common themes are libraries like react, lodash, typescript, tailwind, material-ui, ORMs, testing libraries like Cypress or enzyme, client libraries e.g. for Elasticsearch or AWS, etc… The same stuff you find in any language.
It’s more than just library maintainers wanting to “scratch their own itch.” Users must download the JS code over the wire every time they navigate to a website. Small bundle size is a problem that only JS and embedded systems need to worry about. Large utility libraries like lodash are not preferred without tree-shaking, which is easy to mess up and non-trivial.
People writing python code don’t have to worry about numpy being 30MB; they just install it and move on with their lives. Can you imagine if a website required 30MB for a single library? There would be riots.
I wrote more about it in a blog article:
https://erock.io/2021/03/27/my-love-letter-to-front-end-web-development.html
Sure, but that’s just the way it is? There is no standard library available in the browser, so you have to download all the stuff. It’s not the fault of JS devs, and it’s not a cultural thing. At first people tried to solve it with common CDNs and caching. Now people use tree-shaking, minification, compression etc, and many try pretty hard to reduce their bundle size.
I was thinking about Deno as well. The permission model is great. I’m less sure about URL-based dependencies. They’ve been intentionally avoiding package management altogether.
It’s at least interesting to consider that with deno, a package might opt to require limited access - and the installer/user might opt to invoke (a hypothetical js/deno powered dependency resolver/build system) with limited permissions. It won’t fix everything, but might at least make it easier for a package to avoid permissions it does not need?
You mean policy restrictions? Because that only applies if you don’t add any repos or install random downloaded Debs, both of which many routinely do
Boring things are good because we can rely on them. It enables automation proper. It’s the old AM/FM problem once again: Actual Machines vs Fucking Magic.
What this shows is that even an unplugged Ethernet cable can radiate energy which is detectable.
None of this is news to anyone with even basic knowledge of RF. Everything leaks EM radiation. Put a shortwave radio next to a laptop and you will hear keyboard presses on it up to quite a few meters away. Twisted pair will always be more leaky than coax, else it would be used in place of coax in labs. Shielding makes TP less shitty, but clearly there are limits.
The fact that it leaks EM radiation is not news, the fact that software can control the EM leakage to a sufficient degree to be able to establish a covert channel is news.
Except that we don’t have optical, we have opto-electronic and the optical transceivers generate a lot of EM noise and so are still likely to be susceptible to this kind of attack.
Great article. Similar things pop up whenever youngsters think they can replace old proven tech with $FLAVOR_OF_MONTH. NoSQL is to SQL as..
You mean NoSQL is significantly better for its intended use? Or are you just picking really bad examples?
JSON is to XML
JSON is an easy-to-parse serialisation format with a well-defined object model. It has a few weaknesses (no way to serialise 64-bit integers is the big one). Most of the ‘parsing minefield’ problems are related to handling invalid JSON which is far more of a problem with XML because there are so many ways to get things wrong in the standard.
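To be clear about the 64-bit point: the grammar happily allows big integers; the problem is consumers that map JSON numbers to IEEE-754 doubles (as JavaScript does), which only hold 53 bits of integer precision. A quick Python illustration, using a float round-trip to stand in for such a parser:

import json

big_id = 2**63 - 1                     # 9223372036854775807
text = json.dumps({"id": big_id})      # the JSON text itself is fine

# Python's json module keeps the integer exact...
print(json.loads(text)["id"] == big_id)   # True

# ...but a consumer that stores numbers as doubles (e.g. JavaScript) cannot:
print(int(float(big_id)) == big_id)       # False, precision is lost

Which is why APIs so often end up shipping 64-bit ids as strings.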
In a language that has unicode string support, you can write a JSON parser in about 200 LoC (I have). The XML standard is very complicated and so your choices are either something that supports a subset of XML or using libxml2 because no one else can manage to write a compliant parser (and libxml2 doesn’t have a great security record). Even just correctly handling XML entities is a really hard problem and so a lot of things punt and use a subset that doesn’t permit them. Only now they’re not using XML, they’re using a new poorly specified language that happens to look like XML.
XML does a lot more than JSON. It can be both a markup language and an object serialisation language and it allows you to build arbitrary shapes on top, but that’s also its failing. It tries to solve so many problems that it ends up being a terrible solution for any of them.
Matrix is to XMPP
I was involved in the XMPP process around 2002ish and for a few years. It was a complete mess. The core protocol was more or less fine (though it had some issues, including some core design choices that made it difficult to use a lot of existing XML interfaces to parse) but everything else, including account registration, avatars, encryption, audio / video calling, and file transfer were defined by multiple non-standards-track proposals, each implemented by one or two clients, many depending on features that weren’t implemented on all servers. There wasn’t a reference server implementation (well, there was. It was jabberd. No, the jabberd2 rewrite. No, ejabberd… The JSF changed their mind repeatedly) and no reference client library, so everything was interoperable at the core protocol level and nothing was interoperable at the level users cared about.
In contrast, Matrix has a much more fully specified set of core functionality and a permissively licensed reference implementation of the client interfaces.
DAB is to FM
DAB uses less bandwidth and requires less transmitter power for the same audio quality than FM. DAB+ (which is now over 15 years old) moved to AAC audio. Most of the early deployment problems were caused by either turning the compression up far too high or by turning the power down to a fraction of what the FM transmitter was using. For the same power budget and aiming for the same audio quality, you can have more stations and greater range with DAB than FM.
Hyperloop is to rail
Okay, you can have that one.
JSON is an easy-to-parse serialisation format with a well-defined object model. It has a few weaknesses (no way to serialise 64-bit integers is the big one). Most of the ‘parsing minefield’ problems are related to handling invalid JSON which is far more of a problem with XML because there are so many ways to get things wrong in the standard.
Funny, because json.org tells me 64-bit integers are perfectly valid. In fact any size integer is valid.
In a language that has unicode string support, you can write a JSON parser in about 200 LoC (I have).
Don’t write your own parsers. You will get it wrong and make things an even worse mess.
In contrast, Matrix has a much more fully specified set of core functionality
I see there is an actual Matrix spec now. Not bad. No RFC though. But you are right that the core spec of XMPP is very barebones. You need to add lots of XEPs on top to make it useful. Modern servers and clients do this. What the Matrix people have done is take developer effort away from XMPP, fracturing the federated chat ecosystem. Yes I’m upset about this.
One problem with Matrix is the godawful URI syntax. Instead of being able to say user@example.com like every other protocol, the Matrix devs in their junior wisdom decided to go with @user:example.com instead. How do I link to my Matrix account from my website? If things were sensible it would just be matrix:user@example.com. Perhaps matrix:@user:example.com? Or should my OS just know that the protocol “@user” means Matrix? Who knows.
All this is without getting into perhaps Matrix’ biggest problem: resource use.
DAB uses less bandwidth and requires less transmitter power for the same audio quality than FM
Yes, this is all true. But you’re also throwing out one of the main points of broadcast radio: to be able to reach the masses, especially in times of crisis. There are ways of retrofitting FM with digital subcarriers such that existing receivers don’t become paperweights. Because it is FM, you can use GMSK which has quite nice Eb/N0 behavior. Not as nice as OFDM used by DAB but eh.. Good enough.
edit: I realized I’m wrong about the modulation. It’s always going to be X-over-FM where X is any modulation. It must always run above the stereo pilot wave. Said pilot wave may be omitted, giving mono FM and more bandwidth for subcarriers.
Anyway, there’s been debate around this in Sweden and the only people who want DAB are the people selling DAB receivers. The broadcasting people don’t want it, the people running the transmitters don’t want it and there is zero pressure from the public.
True, radio itself is slowly dying; investing in more channels doesn’t really make sense for the consumer or the producer.
XML is certainly proven harmful by its list of exploits, for what is, of all things, a serialization format. JSON doesn’t have that issue.
You mean things like the billion laughs attack? That’s not enabled by default in any modern XML parser. JSON has its own set of parsing nightmares, and lacks a standardized way of writing schemas or handling extensions. On top of that you have things like SOAP, XSLT, XPATH and so on, all standardized.
Do people write new SOAP APIs anymore? Not sure who is also using XPath or XSLT.
IMHO, XML is a good document format, but has a lot of ambiguity for serialization (i.e. attributes or elements?).
Do people write new SOAP APIs anymore?
The EU does, as do many parts of the Swedish government.
IMHO, XML is a good document format, but has a lot of ambiguity for serialization (i.e. attributes or elements?).
This is a bit of a strange one with XML, I agree. Attributes have two useful properties however: there cannot be more than one of each and they don’t nest. This could be enforced on elements with a schema, but that came later.
This article posted on this very site a day or two ago: Parsing JSON is a Minefield
If parsing JSON is a minefield, parsing XML is a smoking crater.
Look XML is fine as a document description language, but it’s crazy to pretend like it is somehow a superior ancestor to JSON. JSON and XML just do different things. JSON is a minimal, all purpose serialization format. XML is a document description language. You can of course cram anything into anything else, but those are different jobs and are best treated separately.
And now we have things like JWT, where instead of DoS via (effectively this is what entity-expansion is) zip bombing, we can just jump straight to “you don’t need to check my credentials, I’m allowed to do admin things” attacks.
Like it or not, JSON the format is being transformed into JSON the protocol stack, with all the trouble that implies. Just as XML the format was turned into XML the protocol stack in the last age.
JWT is just poorly designed, over and above its serialization format. But as bad as it is, it is significantly more sane than whatever the SAML people were thinking. To be fair though, both JSON and XML are better than ASN.1. In all cases, the secure protocol implementers chose an off the shelf serialization format which was a significant mistake for something that needs totally different security properties than ordinary serialization. One would hope that the next scheme to come along won’t do this, but I’m guessing it will just be signed protobuffs or some such, and the same problems will occur.
billion laughs
I already addressed this.
XML is mature and does everything JSON does and more. Its maturity is evident in the way JSON people try to reinvent everything XML can already do. From a langsec perspective the only thing JSON has going for it is that it is context-free. There are XML dialects that have this property as well, if I remember correctly.
Tooling is good actually. And as I said to the other person, JSON people are busy reinventing most tools that already exist for XML.
JSON people are busy reinventing most tools that already exist for XML
Are they? Things I never use: JSON Schema (just adds noise to an internal project; can’t force it on an external one); JPath (your data should not be nested enough to need this); code generators beyond https://mholt.github.io/json-to-go/ (if your code can be autogenerated, it is a pointless middle layer and should be dropped); anywhere you’d use SAX with XML, you can probably use ND-JSON instead; XSLT is a weird functional templating language (don’t need another templating language, thanks)… Is there something I’m missing? I mean, the internet is big, and people reinvent everything, but I can’t say that there are XML tools that I’m jealous of.
Maybe we’re in different domains though. I just can’t really imagine having a job where I’m confused about whether to use XML or JSON. The closest is today I saw https://github.com/portabletext/portabletext which is a flavor of JSON for rich text. But I think that project is mistaken and it should just define a sane subset of HTML it supports instead of creating a weird mapping from HTML to JSON.
Things I never use
Yes,you never use them. But there are people who try to write protocols using JSON and they just end up reinventing XML, poorly. This means yet another dependency for everyone to pull in. Someone using JSON in their proprietary web app matters little. Someone baking it into an RFC matters a lot.
This one scratches only the surface of many many issues relating to digital video and audio. The <video> tag is a mess partly because of timestamp issues like this one. It makes me think the W3C doesn’t have many broadcast people on board.
The W3C did make an effort. The video and audio tags were not standardized until HTML5. At that time, W3C wanted to make choices in harmony with five major browser vendors: Apple, Google, Microsoft, Mozilla, and Opera. I recall that three of five vendors were required to agree in order to ratify anything, and they couldn’t agree on which formats to allow in multimedia tags; Microsoft refused to commit to anything, Apple and Google only signed off on formats in their respective patent pools, and Mozilla and Opera only signed off on open formats. They couldn’t even agree on PNG, if I recall correctly! (WP memorializes the discussions for video and audio tags respectively.)
Hah, yes, I remember these discussions on some mailing lists I’m on. At one point someone suggested H.261 as a common format since any patents on it are long expired. But the codec issue is not so much an issue now. What is an issue is timestamps. For some reason it was decided that timestamps should be floats. But in multimedia, timestamps are always exact fractions (rational numbers). Because of this there’s no reliable way to seek to a specific frame that works in every browser.
Another issue is if the video or audio doesn’t start at t=0 exactly. Audio starting before t=0 is particularly problematic, and is also very common for files in the wild since that is how ffmpeg deals with MDCT-ish audio codecs.
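To make the float-timestamp complaint concrete: container timestamps are rationals like n * 1001/30000, and a seek API that traffics in double-precision seconds (as currentTime does) plus a naive conversion back to a frame index will occasionally land on the wrong frame. A rough Python sketch of just the arithmetic (the frame rate and loop bound are arbitrary):

from fractions import Fraction

fps = Fraction(30000, 1001)            # NTSC-ish frame rate
misses = 0
for n in range(200000):
    t = float(n / fps)                 # exact timestamp squeezed into a double
    back = int(t * (30000 / 1001))     # naive float conversion back to a frame index
    if back != n:
        misses += 1
print(misses)                          # typically nonzero: some frames get missed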
My money is on SWAGGINZZZ :)
Ah.
My pet hobby horse.
Let me ride it.
It’s a source of great frustration to me that formal methods academia, compiler writers and programmers are missing the great opportunity of our life time.
Design by Contract.
(Small Digression: The industry is hopelessly confused by what is meant by an assert. And subtle disagreements about what is meant or implied by different programmers are an unending source of programmers talking past each other).
You’re welcome to your own opinion, but for the following to make any sense at all, put aside whatever you mean by “assert” for the duration, and accept what I mean. You can go back to your meaning after we have finished discussing this comment.
By an assert in the following I mean a programmer-written boolean expression such that, if it ever evaluates to false, the programmer knows the preceding code has an unknown bug that can only be fixed or handled by a new version of the code.
If it evaluates to true, the programmer fully expects the subsequent code to work, and that code will fully rely on the assert expression being true.
In fact, if the assert expression is false, the programmer is certain that the subsequent code will fail to work, so much so, there is no point in executing it.
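For concreteness, here is a small Python sketch of an assert used with exactly that meaning, in the usual DbC precondition/postcondition shape (the function and numbers are purely illustrative):

def normalize(weights):
    # Precondition: a higher layer has validated the input already; if this is
    # false, the preceding code has a bug and there is no point continuing.
    assert weights and all(w > 0 for w in weights), "upstream bug: invalid weights"

    total = sum(weights)
    result = [w / total for w in weights]

    # Postcondition: subsequent code (and callers) will fully rely on this.
    assert abs(sum(result) - 1.0) < 1e-9
    return result

print(normalize([3, 1, 1]))   # [0.6, 0.2, 0.2]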
So going back to DbC and formal methods.
Seriously. Writing postconditions is harder than just writing the code. Formal methods are way harder than just programming.
But we can get 90% of the benefit by specializing the postconditions to a few interesting cases…. aka. Unit testing.
So where can Formal Methods really help?
Assuming we’re choosing languages that aren’t packed with horrid corner cases… (eg. Signed integer overflow in C)…
Given a Design by Contract style of programming, where every function has a bunch of precondition asserts and a bunch of specializations of the postconditions……
My dream is a future where formal methods academics team up with compiler writers and give us…
So far, you might say, why involve the compiler? Why not a standalone linter?
Answer is simple… allow the optimizer to rely on these expressions being true, and make any downstream optimizations and simplifications based on the validity of these expressions.
A lot of optimizations are based on dataflow analysis; if the analysis can be informed by asserts, and can check the asserts, and be made more powerful and insightful by relying on these asserts… then we will get a massive step forward in performance.
My experience of using a standalone linter like splint… is that it forces you to write in a language that is almost, but not quite, like C. I’d much rather that whatever is parsed as a valid (although perhaps buggy) program in the language by the compiler is parsed and accepted as a valid program by the linter (although hopefully it will warn if it is buggy), and vice versa.
I can hear certain well known lobste.rs starting to scream about C optimizers relying on no signed integer overflow, since that would be, according to the standard, undefined, resulting in generated assembler that results in surprised-pikachu-faced programmers.
I’m not talking about C. C has too much confused history.
I’m talking about a new language that out of the gate takes asserts to have the meaning I describe and explains carefully to all users that asserts have power, lots and lots of power, to both fail your compile AND optimize your program.
As someone who has been using Frama-C quite a lot lately, I can’t but agree with this. There’s potential for a “faster than C” language that is also safer than C because you have to be explicit with things like overflow and proving that some code can’t crash. Never assume. Instead, prove.
for the following to make any sense at all, put aside whatever you mean by “assert” for the duration, and accept what I mean. You can go back to your meaning after we have finished discussing this comment
Didactic/pedagogical critique: in such a case, it may be more appropriate to introduce a new term rather than using one which has a common lay meaning.
My dream is a future where formal methods academics team up with compiler writers and give us […]
Sounds a lot like symbolic execution.
using one which has a common lay meaning.
Do assertions have a different meaning than the one given here?
I have colleagues for whom it means, “Gee, I didn’t think that input from outside the system was possible, so I want to know about it if I see it in unit test, and log it in production, but I must still handle it as a possibility”.
When I personally put in an assert at such a point, I mean, “a higher layer has validated the inputs already, and such a value is by design not possible, and this assert documents AND checks that is true, so in my subsequent code I clearly don’t and won’t handle that case”.
I have also seen debates online where people clearly use it to check for stuff during debugging, and then assume it is compiled out in production and hence has no further influence or value in production.
GTest (Google Test), which I’ve recently had to use for C++ school assignments, refers to this in their macro names as EXPECT. Conditions whose failure is fatal are labelled with ASSERT. This makes intuitive sense to me: if you expect something to be true you accept its potential falsehood, whereas when you assert something to be true you reject its potential falsehood.
A common design choice is that assertions are evaluated in test environments, but not production. In that case, plus a test environment that you’re not confident fully covers production use cases, you might use assertions for hypotheses about the system that you’re not confident you can turn into an error yet.
I’m not sure that’s a good idea, but it’s basically how we’ve used assertions at my current company.
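For what it’s worth, Python’s assert statement bakes in exactly that design choice: the checks run normally in development and test, and disappear entirely under optimized mode, so they can’t double as production error handling. A tiny illustration (the function is made up):

# dev:        python app.py      -> the assert fires on bad input
# production: python -O app.py   -> the assert is stripped, the bad value slips through
def apply_discount(price, discount):
    assert 0 <= discount <= 1, "hypothetical invariant: discount is a fraction"
    return price * (1 - discount)

print(apply_discount(100, 0.2))   # 80.0 either way
print(apply_discount(100, 20))    # AssertionError normally, -1900.0 under -O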
Alas, my aim there is to point out people think there is a common lay meaning…. but I have been involved in enough long raging arguments online and in person to realize… everybody means whatever they damn want to mean when they write assert. And most get confused and angry when you corner them and ask them exactly what they meant.
However, the DbC meaning is pretty clear and for decades explicitly uses the term “assert”… except a lot of people get stuck on their own meaning of “assert” and conclude DbC is useless.
Sounds a lot like symbolic execution.
Ahh, there is that black or white thinking again that drives me nuts.
Symbolic execution and program proving is a false aim. The halting problem and horrible things like busy beavers and horrible fuzzy requirements at the UX end of things make it certain that automated end to end program proving simply will never happen.
That said, it can be incredibly useful. It’s limited for sure, but within it’s limits it can be extraordinarily valuable.
The odds of symbolic execution working end to end on a production scale system? Not a chance.
However, it will be able to reason from assert A to assert B: given A, B will fail in these odd ball corner cases… i.e. you have a bug. Hey, that’s your grandfather’s lint on steroids!
You might find the Lean theorem proving language meets some of your requirements. As an example:
structure Substring :=
( original : string )
( offset length : ℕ )
( invariant : offset + length ≤ original.length )
In order to construct an instance of this Substring type, my code has to provide a proof of that invariant proposition. Any function that consumes this type can rely on the compiler having enforced that invariant, and can also make use of that invariant to prove things about the function’s postcondition.
I legit love the first image. Used to be it was guys with big beards and glasses who did insane stuff like run Linux on a GPU for fun, now it’s catboys in VR. The future is a magical place.
In particular it feels like the trans and furry communities are overrepresented in the “doing weird shit on computers” space. Or maybe this could be a sampling bias since the majority of the people I know are one or both of those.
It’s not just your experience; as a non-queer non-furry, I know quite a few queer furries doing weird shit on computers; quite a few I’d consider friends. I suspect I know some reasons why (and there’s been posts here musing why too; i.e. soatok’s).
It would be cool if you could create a subthread that muted … anyway.
Is it because it offends folks that need to be offended and it opens the Overton Window a little wider, as it should be?
Found the post, https://lobste.rs/s/apl5u6/why_furries_make_excellent_hackers
I think it’s an accidental overlap with VR, because VR lets you identify as a catgirl.
It’s an awesome accidental overlap though, way more awesome than, ‘half of all programmers are converted accountants’. The future is going to be richer for it; don’t tell the accountants.
In particular it feels like the trans and furry communities are overrepresented in the “doing weird shit on computers” space
To the point where “programmer socks” is a well-established meme. Programmers have always been weird, so that’s keeping the tradition alive
That’s a bit different - You choose your socks. You don’t really choose to be trans in the same way. Not quite sure where the furry point is on that line.
The “programmer socks” from the meme are typically very ‘feminine’ colored thigh-high socks, not the normal ankle-high kind the typical programmer wears.
How does git9 support staging only part of the changes to a file? From what I can tell it does not.
I would describe any version control system which doesn’t allow me to commit only some of the hunks in a file or edit the patch myself as “boneheaded.”
Can I quote you on the boneheaded bit? It seems like a great endorsement.
Anyways – this doesn’t fit my workflow. I build and test my code before I commit, and incrementally building commits from hunks that have never been compiled in that configuration is error prone. It’s bad enough committing whole files separately – I’m constantly forgetting files and making broken commits as a result. I’ve been using git since 2006, and every time I’ve tried doing partial stages, I’ve found it more pain than it was worth.
So, for me (and many others using this tool) this simply isn’t a feature that’s been missed.
That said, it’s possible to build up a patch set incrementally, and commit it: there are tools like divergefs that provide a copy on write view of the files, so you can start from the last commit in .git/fs/HEAD/tree, and pull in the hunks from your working tree that you want to commit using idiff. That shadowed view will even give you something that you can test before committing.
If someone wanted to provide a patch for automating this, I’d consider it.
Thanks for this response - it’s a very clear argument for a kind of workflow where staging partial changes to a file doesn’t make sense.
I work primarily as a data scientist using languages like R and Python which don’t have a compilation step and in which it is often the case that many features are developed concurrently and more or less independently (consider that my projects usually have a “utils” file which accumulates mostly independent trivia). In this workflow, I like to make git commits which touch on a single feature at a time and it’s relatively easy in most cases to select out hunks from individual files which tell that story.
As somebody who near-exclusively uses hggit, and hence no index, I can answer this from experience. If you want to commit only some of your changes, that’s what you do. No need to go through an index.
Commit only some of your changes?
hg commit --interactive
git commit --patch
Add more changes to the commit you’re preparing?
hg amend --interactive
git commit --amend --patch
Remove changes from the commit?
hg uncommit --interactive
git something-complicated --hopefully-this-flag-is-still-called-patch
The main advantage this brings: because the commit-you’re-working-on is a normal commit, all the normal verbs apply. No need for special index-flavoured verbs/flags like reset or diff --staged. One less concept.
If you want to be sure you won’t push it before you’re done, use hg commit --secret on that / those commits; then hg phase --draft when you’re ready.
You can do it like hg does with shelve - always commit what is on disk, but allow the user to shelve hunks. These can be restored after the commit is done. Sort of a reverse staging area.
I haven’t tried git9, but it should still be possible to support committing parts of files in a world without a staging area. As I imagine it, the --patch option would just be on the commit command (instead of the add command). Same with all other functionality of git add/rm/mv – these commands wouldn’t exist. Just make them options of git commit. It doesn’t matter if the user makes a commit for each invocation (or uses --amend to avoid that): If you can squash, you don’t need a staging area for the purpose of accumulating changes.
Proof of concept: You can already commit parts of files without using the index, and without touching the workspace: Just commit everything first, then split it interactively using git-revise (yes, you can edit the inbetween patch too). I even do that quite often. Splitting a commit is something you have to do sometimes anyway, so you might as well learn that instead. When you can do this – edit the commit boundaries after the fact, you no longer need to get it perfect on the first try, which is all that the staging area can help you with.
Rather than a staging area, I wish I could mark commits as “unfinished” (meaning that I don’t want to push them anywhere), and that I could refer to these unfinished commits by a never-changing id that didn’t change while working on them.
This fits my mental model much better too. Any time I have files staged and am not in the process of committing, I probably messed something up. The next step is always clear the index or add everything to the index and commit.
I feel the Plan 9 way would be to use a dedicated tool to help stash away parts of the working directory instead.
I would describe any version control system which doesn’t allow me to commit only some of the hunks in a file or edit the patch myself as “boneheaded.”
I would describe people wedded to the index in softer but similar terms.
Here’s the thing: if you’re committing only part of your working tree, then you are, by definition, committing code that you have never run or even attempted to compile. You cannot have tested it, you cannot have even built it, because you can’t do any of those things against the staged hunks. You’re making a bet that any errors you make are going to be caught either by CI or by a reviewer. If you’ve got a good process, you’ve got good odds, but only that: good odds. Many situations where things can build and a reviewer might approve won’t work (e.g., missing something that’s invoked via reflection, missing a tweak to a data file, etc.).
These aren’t hypotheticals; I’ve seen them. Many times. Even in shops with absolute top-tier best-practices.
Remove-to-commit models (e.g. hg shelve, fossil stash, etc.) at least permit you not to go there. I can use pre-commit or pre-push hooks to ensure that the code at the very least builds and passes tests. I’ve even used pre-push hooks in this context to verify your build was up-to-date (by checking whether a make-like run would be a no-op or not), and rejected the push if not, telling the submitter they need to at least do a sanity check. And I have, again, seen this prevent actual issues in real-world usage.
Neither of these models is perfect, both have mitigations and workarounds, and I will absolutely agree that git add -p is an incredibly seductive tool. But it’s an error-prone tool that by definition must lead to you submitting things you’ve never tested.
I don’t think my rejection of that model is boneheaded.
You cannot have tested it, you cannot have even built it, because you can’t do any of those things against the staged hunks.
Sure you can, I do this all the time.
When developing a feature, I’ll often implement the whole thing (or a big chunk of it) in one go, without really thinking about how to break that up into commits. Then when I have it implemented and working, I’ll go back and stage / commit individual bits of it.
You can stage some hunks, stash the unstaged changes, and then run your tests.
Here’s the thing: if you’re committing only part of your working tree, then you are, by definition, committing code that you have never run or even attempted to compile. You cannot have tested it, you cannot have even built it, because you can’t do any of those things against the staged hunks
While this is true, it isn’t quite as clear-cut as you make it seem. The most common case I have for this is fixing typos or other errors in comments or documentation that I fixed while adding comments / docs for the new feature. I don’t want to include those changes in an unrelated PR, so I pull them out into a separate commit and raise that as a separate (and trivial to review) PR. It doesn’t matter that I’ve never tried to build them because there are no changes in the code, so they won’t change the functionality at all.
Second, just because I haven’t compiled them when I commit doesn’t mean that I haven’t compiled them when I push. Again, my typical workflow here is to notice that there are some self-contained bits, commit them, stash everything else, test them, and then push them and raise a PR, before popping the stash and working on the next chunk. The thing that I push is tested locally, then tested by CI, and is then a small self-contained thing that is easy to review before merging.
But it’s an error-prone tool that by definition must lead to you submitting things you’ve never tested.
And yet, in my workflow, it doesn’t. It allows you to submit things that you’ve never tested, but so does any revision-control system that isn’t set up with pre-push hooks that check for tests (and if you’re relying on that, rather than pre-merge CI with a reasonable matrix of targets, as any kind of useful quality bar then you’re likely to end up with a load of code that ‘works on my machine’).
I mentioned there are “mitigations and workarounds,” some of which you’re highlighting, but you’re not actually disagreeing with my points. Git is the only SCM I’ve ever encountered where make can work, git diff can show nothing, git commit won’t be a no-op, and the resulting commit can’t compile.
And the initial comment I’m responding to is that a position like mine is “boneheaded”. I’m just arguing it isn’t.
Here’s the thing: if you’re committing only part of your working tree, then you are, by definition, committing code that you have never run or even attempted to compile.
I mean, sure. And there are many places where this matters.
Things like cleanly separating a bunch of changes to my .vimrc into logical commits, and similar sorts of activity, are… Not really among them.
That’s a good question. I imagine something like
@{cp file /tmp && bind /tmp/file file && ed file && git/commit file}
should work.
I was going to mention the BIT trick as a potential pitfall to this type of transpiling attempt, but I see this is addressed in the assembly tricks section. There’s more problems such as self-modifying code, which is a common speedcode technique. Likewise with compressed code.
Another problem is that recompiling things this way will completely destroy any timing-critical code. The NES is particularly temperamental, so you will likely get a rolling screen if you were to run it on real hardware or a good emulator.
On machines where the stack and hardware registers overlap, such as the Atari 2600, these kinds of analyses go out the window. You get things like pointing the stack at hardware registers and doing BRK to get maximum update rate on the screen, just to name one.
Yes, and I think that this is part of why metainterpretation with a JIT is the only practical way to maintain a stable and reasonably well-behaved accelerated emulator.
As someone who was involved in the HL1 and HL2 modding scene back in the day, this is really interesting. I hadn’t thought about how connected the various modding teams were, with resulting cross-pollination. Of my projects, the one that ended up taking off the most was Battleground 2 (ModDB), which racked up hundreds of thousands of downloads due to being one of the first HL2 mods released. In true modding fashion, development of that mod has recently been taken over by an entirely new team under the name Battleground 3.
It might interest people reading this to know that The Wastes has been ported to id Tech 3 (Q3) and re-released on Steam.
People like me have been saying this for quite some time. You could use traditional non-linear optimization techniques here to do even better than what the author’s simple random search does, for example gradient descent.
My old boss at uni used to point out that neural networks are just another form of interpolation, but far harder to reason about. People get wowed by metaphors like “neural networks” and “genetic algorithms” and waste lots of time on methods that are often outperformed by polynomial regression.
Most ML techniques boil down to gradient descent at some point, even neural networks.
Youtuber 3blue1brown has an excellent video on that: https://www.youtube.com/watch?v=IHZwWFHWa-w .
Yep, any decent NN training algorithm will seek a minimum. And GAs are just very terrible optimization algorithms.
I’d say that only a few ML algorithms ultimately pan out as something like gradient descent. Scalable gradient descent is a new thing thanks to the advent of differentiable programming. Previously, you’d have to hand-write the gradients which often would involve investment into alternative methods of optimization. Cheap, fast, scalable gradients are often “good enough” to curtail some of the other effort.
An additional issue is that often times the gradients just aren’t available, even with autodiff. In this circumstance, you have to do something else more creative and end up with other kinds of iterative algorithms.
It’s all optimization somehow or another under the hood, but gradients are a real special case that just happens to have discovered a big boost in scalability lately.
A large part of ML engineering is about evaluating model fit. Given that linear models and generalized linear models can be constructed in a few lines of code using most popular statistical frameworks [1], I see no reason for ML engineers not to reach for a few lines of a GLM, evaluate fit, and conclude that the fit is fine and move on. In practice for more complicated situations, decision trees and random forests are also quite popular. DL methods also take quite a bit of compute and engineer time to train, so in reality most folks I know reach for DL methods only after exhausting other options.
[1]: https://www.statsmodels.org/stable/examples/index.html#generalized-linear-models is one I tend to reach for when I’m not in the mood for a Bayesian model.
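“A few lines” really is all it takes; here is a hedged statsmodels sketch of the kind of baseline I mean (the data is synthetic, and you’d pick a family that matches your response variable):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                          # made-up features
y = rng.poisson(np.exp(X @ np.array([0.3, -0.2, 0.1])))  # made-up count response

# Fit a Poisson GLM and inspect the fit before reaching for anything heavier.
model = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson()).fit()
print(model.summary())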
For a two parameter model being optimized over a pretty nonlinear space like a hand-drawn track I think random search is a great choice. It’s probably close to optimal and very trivial to implement whereas gradient descent would require at least a few more steps.
Hill climbing with random restart would likely outperform it. But not a bad method for this problem, no.
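Both fit in a few lines, for the curious; here is a rough Python sketch where score() stands in for “run the car on the author’s track with these two parameters”:

import random

def score(a, b):
    # Stand-in objective for the track simulation; higher is better.
    return -((a - 0.3) ** 2 + (b - 0.7) ** 2)

def random_search(n=1000):
    # Sample n random parameter pairs, keep the best one.
    return max(((random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)),
               key=lambda p: score(*p))

def hill_climb(restarts=10, steps=100, sigma=0.1):
    # From several random starts, keep nudging the parameters while it helps.
    best = None
    for _ in range(restarts):
        a, b = random.uniform(-1, 1), random.uniform(-1, 1)
        for _ in range(steps):
            na, nb = a + random.gauss(0, sigma), b + random.gauss(0, sigma)
            if score(na, nb) > score(a, b):
                a, b = na, nb
        if best is None or score(a, b) > score(*best):
            best = (a, b)
    return best

print(random_search(), hill_climb())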
I suppose people typically use neural networks for their huge model capacity, instead of for the efficiency of the optimization method (i.e. backward propagation). While neural networks are just another form of interpolation, they allow us to express much more detailed structures than (low-order) polynomials.
There is some evidence that this overparameterisation in neural network models is actually allowing you to get something that looks like fancier optimisation methods[1] as well as it’s a form of regularisation[2].
The linked works are really interesting. Here is a previous article with a similar view: https://lobste.rs/s/qzbfzc/why_deep_learning_works_even_though_it
neural networks […] allow us to express much more detailed structures than (low-order) polynomials
Not really. A neural network and a polynomial regression using the same number of parameters should perform roughly as well. There is some “wiggle room” for NNs to be better or PR to be better depending on the problem domain. Signal compression has notably used sinusoidal regression since forever.
A neural network and a polynomial regression using the same number of parameters should perform roughly as well.
That’s interesting. I have rarely seen polynomial models with more than 5 parameters in the wild, but neural networks easily contain millions of parameters. Do you have any reading material and/or war stories about such high-order polynomial regressions to share?
This post and the associated paper made the rounds a while ago. For a linear model of a system with 1,000 variables, you’re looking at 1,002,001 parameters. Most of these can likely be zero while still providing a decent fit. NNs can’t really do that sort of stuff.
I’ve tried something similar with PNG by just throwing the lowest bits away in RGB24 mode, but this appears to be much more clever. Question is, is the extra computational power necessary worth it?
On the decoder side it doesn’t matter how it was done, and generally less data is better. If you don’t have limits on the encoding side, then you can make it as complex as you want.
Worth it how?
Existing lossless formats are extremely compute intensive. The author referenced QOI, think of this as in the same vein.
Are you saying that throwing away the lowest bits is easier and achieves the same result, that it doesn’t need the additional complexity (in the author’s work) to achieve the result they showed? Without a mathematical comparison between your work and the author’s it would be impossible to say if it was worth it.
Indeed. It would be interesting to compare this to a stupid quantizer like I suggest, combined with zopflipng. The latter is extremely slow. Perhaps we could get the best of both worlds? Would this be worthwhile compared to just using JPEG with full chroma and quality >= 90?
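For reference, the “stupid quantizer” I have in mind is only a couple of lines; a sketch assuming numpy and Pillow, with placeholder filenames, keeping the top 5 bits of each RGB channel before handing the image to the PNG encoder (or to zopflipng afterwards):

import numpy as np
from PIL import Image

img = np.asarray(Image.open("in.png").convert("RGB"))
quantized = img & 0b11111000          # drop the 3 least significant bits per channel
Image.fromarray(quantized).save("out.png", optimize=True)

Zeroing the low bits makes the post-filter residuals far more repetitive, which is where the DEFLATE/zopfli win comes from. The JPEG question still stands, though.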
Not really, unless you’re talking about lossless JPEG2000 Part 1. I work with decoding intra-only video in a broadcast context, where files are often 4k or 8k with 48 bits per pixel at 50 Hz or so, regularly above 1 Gbps, sometimes as high as 5 Gbps. Here decoding performance becomes an issue, especially for Part 1 files. Amusingly QOI is likely much too slow for this use case due to its serial nature.
These new formats don’t appear to be designed to solve any actual industry issue. No one cares about a format’s spec fitting on a single page.