This does not discuss tables, which comes as a surprise, because Tufte’s books have beautiful tables.
JSON doesn’t really have any semantics to speak of: when are two JSON values equal? When are they different? You could do better than JSON here by defining an equivalence relation over JCOF terms.
Maybe it would’ve been better to call it “the JSON data model”, which is what CBOR calls it. I’ll consider updating the readme.
Well, call it what you will, there’s no there there :-) The JSON data model is not well-defined enough to really be said to exist. I ranted a little on this topic here: https://preserves.dev/why-not-json.html
I took the time to try to define semantics, in a way I think is consistent with how JSON parsers and serializers are often implemented: https://github.com/mortie/jcof/pull/3/files
I would love some feedback on that. In particular, is everything clear enough? Should I have a more thorough explanation of how exactly numbers map to floating point values? On the one hand, it would’ve been nice; but on the other hand, correct floating point parsing and serialization is so complicated that it’s nice to leave it up to a language’s standard library, even if that results in slight implementation differences. (While doing research on how other languages do this, I even found that JavaScript’s number-to-string function has implementation-defined results.)
That’s really nice. You probably don’t have to pin down text representation of floats further, but you might say something like “the IEEE754 double value closest to the mathematical meaning of the digits in the number” if you like. It’s a bit thorny, still, depressingly, isn’t it! For preserves I pointed at Will Clinger’s and Aubrey Jaffer’s papers. It might also be helpful to give examples of JCOF’s answers to the questions I wrote down in my rant linked upthread. Also useful would be to simply point at the relevant bit of the spec for comparing two doubles: for preserves I chose to use the totalOrder predicate from the standard, because I wanted a total ordering, not just an equivalence, but I think the prose you have maps more closely to compareQuietEqual from section 5.11.
I actually originally had wording to the effect of “the IEEE 754 double value closest to the meaning of the digits”, but I tried to figure out if that’s actually what JavaScript’s parseFloat does, which is when I found out that JavaScript actually leaves it up to the implementation whether the value is rounded up or down after the 20th digit. So for the string "2.00000000000000000013" (1 being the 20th significant digit), it’s implementation-defined whether you get the float representing 2.0000000000000000001 or 2.0000000000000000002, even though the former is closer. I could try to copy the JavaScript semantics, as that probably represents basically what’s achievable on as broad a range of hardware as is reasonable. I certainly don’t think I should be more strict than JavaScript. Though I was surprised that JavaScript apparently doesn’t require that you can round-trip a float perfectly with parseFloat(num.toString()).
I also originally tried looking into how IEEE 754 defines equality, thinking I could defer to that instead of talking about values being bit-identical, and I found the predicate compareQuietEqual in table 5.1 in section 5.11. I was never able to find a description of what compareQuietEqual actually does, however, nor did I find anything else which describes how “equality” is defined. If you have any insight here, I’d like to hear it. (Additionally, my semantics would want to consider -0 and 0 to not be the same; this is actually why I use the phrase “the same” rather than “compare equal”. I wouldn’t want a serializer to encode -0 as 0.)
I also noticed that JavaScript doesn’t mention compareQuietEqual; it defines numbers x and y to be equal if, among other things, “x is the same Number value as y”, where “the Number value for x” is defined to be the same as IEEE754’s roundTiesToEven(x). And roundTiesToEven is just a way to go from an abstract exact mathematical quantity to a concrete floating point number. So that, to me, sounds like JavaScript is using bitwise equality, unless it uses “the same” to mean “compares equal according to compareQuietEqual”.
It always seems that once you dig deep enough into the specs underpinning our digital world, you find that at the core, it’s all just ambiguous prose, and our world hangs together because implementors happen to agree on interpretations.
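To make the “the same” versus “compares equal” distinction concrete, here is a tiny sketch using Python floats (which are IEEE 754 doubles); same_bits is a made-up helper for illustration, not anything from the JCOF spec:

```python
import struct

def same_bits(a, b):
    """'The same' in the bit-pattern sense: identical 64-bit encodings."""
    return struct.pack(">d", a) == struct.pack(">d", b)

print(0.0 == -0.0)            # True  - numeric equality (compareQuietEqual-style)
print(same_bits(0.0, -0.0))   # False - different bit patterns, so a serializer
                              #         must not turn -0 into 0
nan = float("nan")
print(nan == nan)             # False - NaN never compares numerically equal to itself
print(same_bits(nan, nan))    # True  - but it is bit-identical to itself
```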
Regarding the questions, my semantics answer most of them, but I would need to constrain float parsing to be able to answer the second one. The answers are (a quick check of a few of them follows the list):
Are the JSON values 1, 1.0, and 1e0 the same or different? They are all the same, since they parse to the same IEEE 754 double precision floating point number.
Are the JSON values 1.0 and 1.0000000000000001 the same or different? Currently ambiguous, since I don’t define parsing rules. If we used JavaScript’s rules, they would be different, since they differ in the 17th significant digit, and JavaScript’s parseFloat is exact until the 20th.
Are the JSON strings “päron” (UTF-8 70c3a4726f6e) and “päron” (UTF-8 7061cc88726f6e) the same or different? They are different, since they have different UTF-8 code units.
Are the JSON objects {"a":1, "b":2} and {"b":2, "a":1} the same or different? They are the same, since order doesn’t matter.
Which, if any, of {"a":1, "a":2}, {"a":1} and {"a":2} are the same? Are all three legal? The first one is illegal because keys must be unique. The second and third are different, since the value of key “a” is different.
Are {"päron":1} and {"päron":1} the same or different? They are the same if both use the same UTF-8 code point sequence for their keys.
Once we have the float parsing thing nailed down, it would be a good idea to add updated answers to the readme.
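For the unambiguous answers above, a quick check in Python (whose floats are IEEE 754 doubles and whose numeral parsing is correctly rounded, so it stands in here for a typical JSON parser):

```python
# Answer 1: 1, 1.0 and 1e0 all denote the same double.
print(float("1") == float("1.0") == float("1e0"))              # True

# Answer 3: the two spellings of "päron" differ at the code-unit level.
composed = "päron"             # uses U+00E4
decomposed = "pa\u0308ron"     # 'a' followed by a combining diaeresis
print(composed.encode("utf-8").hex())    # 70c3a4726f6e
print(decomposed.encode("utf-8").hex())  # 7061cc88726f6e

# Answer 4: object key order does not matter.
print({"a": 1, "b": 2} == {"b": 2, "a": 1})                    # True
```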
I think IEEE-754 floats are one area where the binary formats win over text. CBOR can represent IEEE-754 doubles, singles and halves exactly (including +inf, -inf, 0, -0, and NaN). When I wrote my own CBOR library, I even went so far as to use the smallest IEEE-754 format that would round trip (so +inf would be encoded as a half-float, for instance).
For Unicode, you may want to specify a canonical form (say, NFC or NFD) to ensure interoperability.
+1 for binary floats.
Re unicode normalization forms: I’d avoid them at this level. It feels like an application concern, not a transport concern to me. Different normalization forms have different purposes; the same text sometimes needs renormalizing to be used in a different way; etc. Sticking to just sequence-of-codepoints is IMO the right thing to do.
I won’t specify a Unicode canonicalization form, since that would require correct parsers to contain or depend on a whole Unicode library, and it would mean different JCOF implementations which operate with different versions of Unicode are incompatible. Strings will remain sequences of UTF-8 encoded code points which are considered “the same” only if their bytes are the same.
Regarding floats, I agree that binary formats have an advantage there, since they can just output the float’s bits directly. Parsing and stringifying floats is an annoyingly hard problem. But I want this to remain a text format. Maybe I could represent the float’s bits as a string somehow though; base64 encode the 8 bytes or something. I’ll think about it.
Hexfloats are a thing! https://gcc.gnu.org/onlinedocs/gcc/Hex-Floats.html
For preserves text syntax, I didn’t offer hexfloats (yet), instead escaping to the binary representation. https://preserves.dev/preserves-text.html#fn:rationale-switch-to-binary
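Both escape hatches mentioned here are exact, unlike decimal round-tripping. A small sketch in Python (hexfloats via float.hex, and base64 over the raw 8 bytes):

```python
import base64, struct

x = 0.1

# Hexfloat: exact textual form of the double (also available in C99, Java, etc.)
print(x.hex())                        # 0x1.999999999999ap-4
print(float.fromhex(x.hex()) == x)    # True

# Base64 of the 8 raw bytes: also exact, at the cost of readability.
blob = base64.b64encode(struct.pack(">d", x)).decode("ascii")
print(blob)                                                 # P7mZmZmZmZo=
print(struct.unpack(">d", base64.b64decode(blob))[0] == x)  # True
```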
Aha! Then you do want totalOrder after all, I think. When used as an equivalence it ends up being a comparison-of-the-bits IIRC. See here, for example.
1.0 =?= 1.0000000000000001
Wow, are you sure this is ambiguous for IEEE 754 and JavaScript? Trying it out in my browser, the two parseFloat to identical-appearing values. I can’t make it distinguish between them. What am I missing?
Per Wikipedia on IEEE 754 (not JavaScript numbers per se): doubles have “from 15 to 17 significant decimal digits precision […] If an IEEE 754 double-precision number is converted to a decimal string with at least 17 significant digits, and then converted back to double-precision representation, the final result must match the original number.” I used this info when cooking up the example.
Oh, wow, OK, I’ve just found RoundMVResult in the spec. Perhaps it’s aimed at those implementations that use, say, 80-bit floats for their numbers? But no, that can’t be right. What am I missing?
3 extra decimal digits is… about 10 bits of extra mantissa. Which gets us pretty close to the size of the mantissa of an 80-bit float. So maybe that’s the reason. Hmmm.
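The back-of-the-envelope arithmetic does work out (Python used purely as a calculator here):

```python
import math
print(3 * math.log2(10))  # ~9.97: three extra decimal digits is about 10 extra bits
print(64 - 53)            # 11: an x87 80-bit extended double has a 64-bit significand,
                          # versus 53 bits for a regular double
```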
One detail is the meaning of an object that uses the same key twice - what does that mean?
Namely, that the two lists, once they intersect, have the same elements. That’s an important assumption, and it’s what allows the approach outlined above to work. If the lists just intersected in one element, and then went their separate ways
I can’t see how this, i.e. going separate ways, would be possible. A node has a single pointer to the next element. So the tails of the lists have to be identical if the two lists intersect.
I agree. The author of the post seems to think, wrongly, that it’s the values stored in the nodes that need to be compared for equality, not the node pointers themselves. But the original problem statement clearly shows that they’re looking for the list structures joining.
The term “intersection” is IMHO wrong here; it would have been better to describe the two lists as “merging”.
A more interesting and useful way to describe this problem is to consider the links to be “parent” pointers in tree structures. The question is then, given two nodes, to find their common ancestor, or else show that they’re not in the same tree.
It is perhaps an unexpected definition of “intersection” that is being used here; you should read the linked problem statement, which may make this clearer: they aren’t considering the use case of [a,b] and [a], perhaps because they are thinking of blockchains or git histories or something like that, but for whatever reason it isn’t given as a possibility the program needs to deal with.
The author might also not be aware that “intersection” means something very different to some (many? most?) people, and that might be contributing to your confusion.
Maybe it’s too simple? This comment was part of the “My first impressions of web3” discussion we are having in parallel:
A protocol moves much more slowly than a platform. After 30+ years, email is still unencrypted; meanwhile WhatsApp went from unencrypted to full e2ee in a year. People are still trying to standardize sharing a video reliably over IRC; meanwhile, Slack lets you create custom reaction emoji based on your face.
I don’t want to discount Moxie’s otherwise entirely correct (IMHO) observation here but it’s worth remembering that, like everything else in engineering, this is also a trade-off.
IRC is an open standard, whereas WhatsApp is both a (closed, I think?) standard and a single application. Getting all implementations of an open standard on the same page is indeed difficult and carries a lot of inertia. Getting the only implementation of a closed standard on the same page is trivial. However, the technical capability to do so immediately also carries the associated risk that it’s not necessarily the page everyone wants to be on. That’s one of the reasons why, when WhatsApp is as dead as Yahoo! Messenger, people will still be using IRC. This shouldn’t be taken to mean that IRC is better than either WhatsApp or Slack – just that, for all its simplicity, there are in fact many good reasons why it outlasted many more capable and well-funded platforms, not all of them purely technical and thus not all of them outperformable by means of technical capability.
It’s also worth pointing out that, while Slack lets you create custom reaction emoji based on your face, the standard way to share video reliably over both IRC and Slack is posting a Youtube link. More generally, it has been my experience that, for any given task, a trivial application of a protocol that’s better suited for that task will usually outperform any non-core extensions of a less suitable protocol.
I mean, I’m not uploading an entire 10 minute video, but likely something short from my camera. To do so from Slack, I press the button and pick it out of my camera roll. From IRC, I have to upload it somewhere, copy and paste the link, and paste that…
Fair point, that’s not the kind of video I was thinking of. You’re right!
The simplest way for me to share pics among my friends on IRC is to upload the images to our Discord, then share the image from there. Sad, but true.
A big thing hobbling IRC is the lack of decent mobile support without paid services (IRCloud) or using a bouncer.
meanwhile, Slack lets you create custom reaction emoji based on your face.
This is exactly why email survived all the IMs du jour that have come and gone. A clear vision of what matters and what the core functionality is, as well as of the problem which it poses itself to solve. All of which Slack lacks.
Does this really matter, though? I’ve probably used 10 messaging platforms in the last 25 years, some concurrently for different purposes. They each served their purpose. The transition was, in some cases, a little rocky, but only for a short time. I don’t really think my life would have been meaningfully improved by having used only a single messaging system during that time period.
Mostly because all the disruptors bootstrapped their network effect by integration with XMPP and IRC and then dropped it when they had enough market share?
The flipside of Moxie’s (excellent) observation is that the slow-moving protocols are easy to support - so that is where the network effects live. If the world is split between centralised networks X and Y (which will refuse to interoperate with each other) and you can get the core functionality with the (more basic) protocol, then there is a value to the core protocol (you can speak to X and Y)
There are several components to a messaging system:
A client, that the end-user interacts with.
A server (optionally) that the client communicates with.
A protocol that the client and server use to communicate (or that clients use to communicate with each other in a P2P system).
A service, which runs the server and (often) provides a default client.
In my experience, the overwhelming majority of users conflate all four of these. Even for email, they see something like Hotmail or GMail as a combination of all of these things; the fact that Hotmail and GMail can communicate with each other is an extra, and they have no idea that it works because email is a standard protocol. The fact that WhatsApp doesn’t let you communicate with any other chat service doesn’t surprise them, because they see the app as the whole system. The fact that there’s a protocol, talking to a server, operated by a service provider, just isn’t part of their mental model of the system.
I read a paper a couple of years ago that was looking at what users thought end-to-end encryption meant. It asked users to draw diagrams of how they thought things like email and WhatsApp worked and it pretty much coincided with my prior belief: there’s an app and there’s some magic, and that’s as far as most users go in their mental models.
…then there is a value to the core protocol (you can speak to X and Y)
But if those two apps won’t talk to one another, why would they speak the shared protocol? Plus, most people don’t choose protocols, they choose applications, and most people don’t choose based on “core functionality”, they choose based on differentiated features. I’m not unsympathetic here, I kicked and screamed about switching from Hipchat to Slack (until Hipchat ceased to exist, of course), but I watched people demand the features that Slack offered (threaded conversations, primarily). They didn’t care about being able to choose their clients, or federation, or being at the mercy of a single company. They cared about the day-to-day user experience.
eh, I think it’s potentially more that email is so deeply embedded into society that it’s difficult to remove, rather than about any defining characteristics of the protocol itself :p
I’m still not sure though if you can call it “survived” when the lowest common denominator for email is still:
no DMARC
no SPF
MS blackholes at random (status code OK, mail gone; support can “upgrade” your IP, then you at least get a “denied”, and then you’re told to subscribe to some trust service for email delivery)
Google doesn’t like you sometimes
German Telekom doesn’t trust new IPs, wants an imprint to whitelist you, no SPF
there’s some random blacklist out there marking IPv4s as “dynamic” since 2002; give them money to change that (no imprint)
and if you want push notifications, the content goes through your mobile OS provider in plaintext
if you want SMTP/IMAP there is no second-factor login at all, so you’re probably using the same credentials that everything else uses
etc.
So everyone goes to AWS for a mail sling or some other service. Because the anti-spam / trust model is inherently broken and centralized.
Yes, email is still alive and I wouldn’t want to exchange it for some proprietary messenger, but it’s increasingly hard to even remotely self-host this, or even let some company host this for you that isn’t one of the big 5 (because one bad customer in the IP range and you’re out) - except if you don’t actually care whether your emails can be received or sent.
It is. To properly use IRC, you don’t just need IRC, you need a load of ad-hoc things on top. For example, IRC has no authentication at all. Most IRC servers run a bot called NickServ. It runs as an operator and if you want to have a persistent identity then you register it by sending this bot the username and a password. The bot then kicks anyone off the service if they try to use your username but don’t first notify the bot with your password. This is nice in some ways because it means that a client that is incredibly simple can still support these things by having the user send the messages directly. This is not very user friendly. There’s no command vocabulary and no generalised discovery mechanism, there’s just an ad-hoc set of extensions that you either support individually or you punt to the user. This also means that you’re limited to plain text for the core protocol. Back in the ’90s, Internet Explorer came with a thing called MS Comic Chat, which provided a UI that looked like a comic on top of IRC. Every other user saw some line noise in your messages because it just put the state of the graphics inline in the message. If IRC were slightly less simple then this could have been embedded as out-of-band data and users of other IRC clients could have silently ignored it or chosen to support it.
I’m still a bit sad that SILC never caught on. SILC is basically modernised IRC. It had quite a nice permissively licensed client library implementation and I wrote a SILC bot back when it looked like it might be a thing that took over from IRC (2005ish?). It was basically dead by around 2015 though. It had quite a nice identity model: usernames were not unique but public keys were and clients could tell you if the David that you’re talking to today was the same as the one you were talking to yesterday (or even tell you if the person decided to change nicknames but kept their identity).
SASL is a thing now. And clients can get by being simple by simply letting the user enter IRC commands directly.
Nickserv is bad mainly because it is an in-band communications mechanism; that has security implications. I have seen people send e.g. ‘msg nickserv identify hunter2’ to a public channel (mistakenly omitting the leading ‘/’).
There is. https://ircv3.net/specs/extensions/capability-negotiation https://ircv3.net/specs/extensions/sasl-3.1
Just about every client and almost every network (the only notable exceptions are OFTC/Undernet/EFnet/IRCnet) support both.
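For reference, a minimal sketch of the client side of the SASL PLAIN mechanism described in the sasl-3.1 spec linked above, assuming Python; the account name and password are made up:

```python
import base64

def sasl_plain_payload(account, password):
    """Base64 blob sent on the AUTHENTICATE line for SASL PLAIN (RFC 4616):
    authorization identity, authentication identity and password, NUL-separated."""
    raw = f"{account}\0{account}\0{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")

# Rough order of the client lines (server replies omitted):
#   CAP REQ :sasl
#   AUTHENTICATE PLAIN
#   AUTHENTICATE <payload>
#   CAP END
print(sasl_plain_payload("alice", "hunter2"))
```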
If IRC were slightly less simple then this could have been embedded as out-of-band data and users of other IRC clients could have silently ignored it or chosen to support it.
Today it could be done cleanly on many networks, thanks to https://ircv3.net/specs/extensions/message-tags
Get the distinct impression that Moxie is pulling his punches a little here in order to preserve the impression of being even handed.
If the goal is to persuade, max vehemence can be counter-productive. The goal is to be hard to dismiss.
He’s not pulling them too hard:
The cost of making a visual contribution increases over time, and the funds a contributor pays to mint are distributed to all previous artists (visualizing this financial structure would resemble something similar to a pyramid shape).
I think Moxie’s Autonomous Art is clever - it is the NFT equivalent of the Million Dollar Homepage. Both projects I would consider art because each reflects in its own way on the state of society.
He’s describing his own stuff there, though. The app he used to get a feel for the platform.
From the perspective of compilers of the Turbo Pascal era, compilers on Unix looked slow: they would call assemblers and linkers as a separate process whereas Turbo Pascal would create an executable directly. Certainly it is possible to create binaries more directly but other desirable features like FFI might still depend on doing this in steps.
You will always need a linker to call external functions from system libraries, but you could use this approach internally for almost everything, and then run a linker to link in libc.so and friends. I’m not sure how much link time would be saved, but my intuition is that it would be a lot.
Unless I misunderstood your point (i.e. we do need a dynamic linker at runtime for shared libraries), Go does not use a conventional linker step to link against a symbol in libc or any other shared library (see the //go:cgo_import_dynamic directive), nor is the presence of the target library required at build time (in fact, it is not accessed at all).
Older Windows/Mac IDEs like Turbo, Lightspeed/THINK, Metrowerks used the same compile-then-link architecture, they just had the compiler and linker built into the app instead of being separate binaries. I definitely recall waiting for THINK C++’s link phase to finish.
I found it interesting to look at stop watch apps on mobile devices. They offer many more capabilities, but few apps are able to leverage them. Tasks a stop watch could solve (a small sketch of a couple of these follows the list):
start and stop multiple stop watches together or sequentially
take a split time for individual or all running stop watches. The result could be either the lap time (since the last event) or the elapsed time for that stop watch.
combine a stop watch with a countdown timer: count down from a set time, then start the stop watch
make sure this is usable in the dark or with wet fingers. Your phone might not unlock when your fingers are wet…
keep the stop watch running in the background
control a group of stop watches from another device - with people at start, finish and intermediate points all operating a group of stop watches
label stop watches and save results, export results
maybe combine with a camera - take a photo and embed the current timers into the photo
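A minimal sketch of the first two items and the grouped start, assuming Python; all names here are made up for illustration, not taken from any existing app:

```python
import time

class StopWatch:
    def __init__(self, label):
        self.label = label
        self.start_time = None
        self.last_split = None

    def start(self, now=None):
        now = time.monotonic() if now is None else now
        self.start_time = self.last_split = now

    def split(self, now=None):
        """Return (lap time since the last event, total elapsed time) in seconds."""
        now = time.monotonic() if now is None else now
        lap, total = now - self.last_split, now - self.start_time
        self.last_split = now
        return lap, total

def start_together(watches):
    """Start a group of stop watches on the exact same timestamp."""
    now = time.monotonic()
    for w in watches:
        w.start(now)

boat, rate = StopWatch("boat"), StopWatch("stroke rate")
start_together([boat, rate])
```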
How does this work? Is this expanding the VM instructions into C code and thus providing some ahead-of-time compilation of WASM code? The README is quite terse.
Correct, it ahead-of-time compiles the WASM instructions to equivalent C code.
Here is a blog post by the author of wasm2c, another compiler which translates WASM to C (wasm2c was the inspiration for w2c2), which explains the approach and uses: https://kripken.github.io/blog/wasm/2020/07/27/wasmboxc.html.
Here is an example of the input and output using wasm2c (w2c2 is very similar): https://github.com/WebAssembly/wabt/tree/main/wasm2c#a-quick-look-at-facc
Only tangentially related: I always thought it would be nice if filesystems used free space to keep deleted files around, much as snapshots do, and reclaimed space by removing the oldest deleted files when needed. This would give users a safety net by making good use of free disk space. Is any operating system providing something like this?
Pleased to hear someone else is eager for this! I’ve often thought it an oversight that filesystems treat “delete” as such a destructive operation when really it doesn’t need to be. As you mention, we do get part of the way there with snapshots, particularly on ZFS.
I realise that there are speed issues related to fragmentation and complexity in terms of finding a free block on a disk with less real free space. Both of these are less of a concern on SSD though.
Having an “IsDeleted” field on rows in a SQL table is not an uncommon pattern; it would be great to have an equivalent for a filesystem.
I haven’t discovered any existing filesystem or operating system that does this, but I would love to see it!
That’s more or less what FAT did. Deleting a file was accomplished by just zeroing the first character of the filename; the file was only overwritten when you needed the space. The undelete tool for DOS would just ask you to fill in the first letter of the filename for the restore. Log-structured filesystems typically get this kind of thing almost for free: a delete operation is just an entry in the log saying that the next GC phase is allowed to reclaim the blocks. You could imagine adding a generational GC that would not actually reclaim blocks until they’d been marked as garbage for multiple cycles.
On an inode-based filesystem you could probably do something similar by just adding a hard link to deleted files into a hidden trash directory in a circular buffer and adding files from this back to the free list when space is constrained.
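A rough user-space sketch of the hard-link-into-a-hidden-trash-directory idea, assuming Python; the trash path and threshold are made up, and hard links only work within one filesystem:

```python
import os, shutil, time

TRASH = "/var/.trash"  # hypothetical hidden trash directory on the same filesystem

def soft_delete(path):
    """Instead of just unlinking, keep a hard link in the trash first."""
    os.makedirs(TRASH, exist_ok=True)
    os.link(path, os.path.join(TRASH, f"{time.time_ns()}-{os.path.basename(path)}"))
    os.unlink(path)

def reclaim(min_free_bytes):
    """Drop the oldest trashed files until enough space is free again."""
    entries = sorted(os.scandir(TRASH), key=lambda e: e.name)  # names start with a timestamp
    for entry in entries:
        if shutil.disk_usage(TRASH).free >= min_free_bytes:
            break
        os.unlink(entry.path)
```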
Amoeba’s filesystem (I think) never really did deletion; it was predicated on the idea that files grew more slowly than available storage (which wasn’t really true then but probably is now), and so it was an append-only log. I think they did something different for ephemeral things. Another system (or possibly Amoeba and I’m getting them mixed up) had no explicit delete at all and just GC’d files that hadn’t been accessed recently. Apparently users just ended up running cron jobs that touched every file in their home directory every night.
I know Sprite had a log-structured file system (maybe even the first?) and leaned heavily on client-side caching of normally-remote files with IIRC fairly aggressive deletion of local cached copies.
Files-11 on VMS did automatic file versioning: when you wrote to a file, a new version was created and versions older than the limit were removed. You had to explicitly specify a version when deleting, IIRC (there were facilities for purging all old versions as well).
I heard rumours that Simon Peyton Jones left Microsoft Research. Did he talk about that or any plans?
https://discourse.haskell.org/t/new-horizons-for-spj/3099
I worked on C-- with Norman Ramsey and Simon Peyton Jones. Indeed, it emerged around the same time as LLVM, which, as we know, became much more successful. C-- as a project had the right goals and was modest in some sense, as it was mostly a portable assembly language, but it suffered from over-ambition in the implementation: it was done as a literate program, which creates a barrier for contributors, and relied on an automatically generated back-end. The back-end would rewrite complex operations into simpler ones, which would then be recognised by a generated recogniser that would map them to target-specific instructions. While academically interesting, I believe this complexity did not help. The implementation was done in OCaml, which I still think is a good choice for a compiler project (witness WebAssembly).
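A toy illustration, in Python rather than the actual OCaml machinery, of the expand-then-recognise shape described above: complex operations are rewritten into simpler ones, and a recogniser (machine-generated in the real back-end, hard-coded here) maps the simple forms to target-specific instructions:

```python
def expand(op):
    """Rewrite a complex operation into simpler ones."""
    if op[0] == "muladd":                       # a*b + c
        _, a, b, c = op
        return [("mul", "t0", a, b), ("add", "t1", "t0", c)]
    return [op]

RECOGNISER = {                                  # operation -> target instruction template
    "mul": "mul {0}, {1}, {2}",
    "add": "add {0}, {1}, {2}",
}

def select(ops):
    """Map simple operations onto target-specific instructions."""
    return [RECOGNISER[o[0]].format(*o[1:]) for o in ops]

print(select(expand(("muladd", "a", "b", "c"))))
# ['mul t0, a, b', 'add t1, t0, c']
```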
“In the two years since Google’s Quantum Computer went live, a Chinese research team led by Jian-Wei Pan have upped the anti.” How can this get past any proof reading?
The article is also much much more brief than I thought it would be.
The idea of identifying code by structure globally is fascinating but how does this accommodate abstraction? The hallmark of abstraction is an interface that can be implemented in different ways (and trade-offs). This does not seem possible (or encouraged) here.
I think Unison has four ways to support polymorphism. (Though this is my novice-level interpretation so I might not get all the nuances.)
First and most visible, Unison is an FP language with functions as first class values. So you can always pass in a function (as long as the function’s type is acceptable) that contains a specific variation of behavior.
Second, functions can be polymorphic in their types.
Third, Unison has an effects system (called “abilities”) where the effect definition is an abstract interface and specific implementations can vary. Abilities occupy a special position in the syntax and semantics though, so not everything can be abstracted this way.
Fourth, there is work going on now about how to incorporate typeclasses.
So it might be that there is a surplus of abstraction techniques!
As I recall, its approach to abstraction is similar to Haskell’s; type-wise, if function a has the same types as function b, then the functions are treated the same by the rest of the code using them. Haskell seems to do ok with abstraction, but maybe I’m misinterpreting you.
I found that interesting in its subject, its method, and its presentation. I do wonder though: the model assumes that every node has the same structure in terms of number of neighbours. How much of a limitation is this, or can you construct your problem in such a way that this condition is met?
(In the past I was annoyed by Wolfram’s narrow and self-centered writing but was pleasantly surprised by this article.)
How does it compare with Opium? The concepts are similar (maybe that is inevitable) and it looks fairly comprehensive.
Looks like it uses opium’s ideas as inspiration (is the dream implied to be a pipe dream? ;).
I think I’m going to give it a try to see how the embedded template DSL works in practice.
For those following along in Lobsters, the author wrote a comparison: https://discuss.ocaml.org/t/excited-about-dream-web-framework/7605/21?u=yawaramin
I think a better name would be Parametric Typography.
I only glanced at the algorithm, but: is the algorithm minimising only the error for the next step, or globally? Could a small error in the current step force an error in the next step that is larger overall?
I am working on GPS analysis for rowers: it takes a GPS file and computes metrics that are useful to track performance. I’m doing this specifically for rowers in Cambridge UK who are rowing on a narrow river. The feature set is similar to what Strava does. Currently it is a command-line application that generates an HTML report. An example is here: https://lindig.github.io/tmp/example.html
The ideas are not necessarily tied to rowing. They could also be used for running or cycling, but those domains are much better served by existing tools than rowing is. The main ideas are (a sketch of the first one follows the list):
within a GPS track, find the fastest 100m, 200m, 300m, … - these may be overlapping
within a GPS track, find the fastest non-overlapping 500m intervals - to identify sprints
within a GPS track, recognise the passing of landmarks and the time spent between such landmarks
summarise time spent within certain speed brackets
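A sketch of the first idea, assuming Python and a track represented as (timestamp in seconds, cumulative distance in metres) pairs; this is only an illustration, not the actual tool:

```python
def fastest_window(track, metres):
    """Return (start_time, end_time) of the fastest stretch covering `metres`.

    `track` is a list of (t, cumulative_distance) points with non-decreasing
    distance. Two-pointer sweep: for each start point, advance the end point
    until the window covers the distance, and keep the shortest duration.
    Windows found this way may overlap, as intended above.
    """
    best, j = None, 0
    for t0, d0 in track:
        while j < len(track) and track[j][1] - d0 < metres:
            j += 1
        if j == len(track):
            break
        t1 = track[j][0]
        if best is None or t1 - t0 < best[1] - best[0]:
            best = (t0, t1)
    return best

# e.g. fastest_window(points, 500) for the fastest 500 m
```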
Surely interesting. The syntax is almost Lisp-like in its simplicity (which, for Lisp, a lot of people feel is too simplistic for mainstream success – so it’s extra interesting to use that in a product with mainstream users).
Out of curiosity: everything is a parameter except the last parameter, which is the body of the lambda?
That’s how I read it, too.
I’m not an expert on this problem, but I had this idea: