One of the exercises I had to do as a student was implement a metacircular evaluator for Prolog (i.e. a Prolog program that takes Prolog programs as input and evaluates them). It was quite eye-opening how little code this needed. Erlang was originally implemented in Prolog for much the same reason: it’s a really fast way of writing an interpreter for an AST. It was later rewritten because it’s definitely not a way of writing a fast interpreter. For rapidly prototyping AST interpreters, Prolog is excellent: the language is basically a graph-traversal engine.
I used a similar approach for generating error messages in a data validator system. It runs Prolog goals in a metainterpreter and collects failures, then transforms the failing goal into an appropriate error message.
One of the most powerful aspects of Prolog is how flexible the execution strategy can be. It’s a very cool language. Being able to query your own source code is super handy.
That’s really interesting: I’ve been wondering how a metainterpreter that provides error messages might work.
How did you decide which failing goals should be reported? If a disjunction fails, it must be that every branch fails, so do you report multiple errors?
Great question. This was for $work and I ended up going with a (relatively) simple implementation that might be a bit disappointing, but I think it could definitely be expanded on.
The metainterpreter limits reporting to the first failing top-level goal within valid/N clauses in which the head matches. It can report multiple goals if there are multiple clauses with a matching head. In your example, it would end up reporting valid(a) as the failing goal.
I added some extra syntax using {}/1 for silently failing goals, useful for guard conditions that we don’t want to report.
A simplified example of a way to generically validate enum values would be something like:
field_type(language, enum([en, ja, de])).

valid(Field, Value) :-
    { field_type(Field, enum(Values)) },
    member(Value, Values).

error_message(member(V, List), Msg) :-
    phrase(format_("Value must be one of: ~w", [List]), Msg).
Without the {guard} goal, this would end up producing errors for every field that isn’t an enum.
The downside of this is that
valid(foo, bar).
valid(foo, baz).
would need to be written as:
valid(foo, X) :- X = bar.
valid(foo, X) :- X = baz.
instead. But in practice it didn’t affect us because we could express the same conditions using other constructions (like the enum example above).
It would be possible to have the metainterpreter transform clauses to move ground terms in the 2nd argument from the head to the body of the clause to work around this.
I think there is a lot of room for improvement in this system, and I’d like to play around with open sourcing a fancy one someday that lifts some of these restrictions. In practice this was “enough” to do what we needed, but I think it would be fun to see how far you could take this.
Can you tell whether Mercury supports all the Prolog features that make it easy to implement an interpreter?
If so, you might be able to approximate the best of both worlds if you use Mercury, as it should be significantly faster than Prolog. In fact, its authors claim that Mercury is the fastest logic language in the world by a wide margin (according to Wikipedia).
Mercury supports functional programming features that make it easier to implement interpreters, but it has differences from Prolog that mean you take a different approach. More like Haskell or SML than Prolog. In Mercury you can’t implement a metacircular evaluator and thereby extend the Mercury system. Mercury also differs from Prolog in that you can’t have holes in data structures that are represented as unbound logic variables to be bound at a later time. On the other hand, Mercury has a lot of great features that make it an interesting language in its own right.
You are not going to get a very fast interpreter this way because AST interpreters are pretty slow regardless of what language you write them in. The fact that Prolog is pretty slow is only a (large) contributing factor.
You are not going to get a very fast interpreter this way because AST interpreters are pretty slow regardless of what language you write them in
That definitely was true, but I think Graal has shown that it doesn’t have to be. With their model, you do type specialisation in the AST as you interpret it. There’s a lot of overhead in traversing an AST (even in comparison to bytecode execution: on the order of 10-20 instructions to move to the next AST node rather than 2-3 to fetch and start executing the next bytecode) but that can often be amortised by doing more work. If each AST node represents a few hundred instructions worth of work, then you won’t get much speedup going to a bytecode interpreter or even a JIT. This can impact how you design the language. For example, if you have an AST node to multiply together two integers, the performance will be totally dominated by the cost of dispatch. If you have an AST node to multiply together two 1024x1024 matrices, the interpreter overhead will be in the noise.
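To make the dispatch-overhead point concrete, here’s a minimal C sketch (the names and structure are mine, purely illustrative, not taken from Graal or any real interpreter). Each evaluation step pays for an indirect call plus pointer chasing before any real work happens, which is why a node that only multiplies two numbers is dominated by dispatch, while a node wrapping a big matrix multiply would not be:

#include <stdio.h>

/* A tiny AST-walking interpreter: every eval is an indirect call plus
   pointer chasing, so dispatch cost dominates cheap nodes. */
typedef struct Node Node;
struct Node {
    double (*eval)(const Node *);   /* per-node dispatch */
    const Node *left, *right;
    double value;
};

static double eval_const(const Node *n) { return n->value; }

/* The actual work here is a single multiply, so almost all of the time
   goes into the two recursive dispatches above it. */
static double eval_mul(const Node *n) {
    return n->left->eval(n->left) * n->right->eval(n->right);
}

int main(void) {
    Node two   = { eval_const, NULL, NULL, 2.0 };
    Node three = { eval_const, NULL, NULL, 3.0 };
    Node mul   = { eval_mul, &two, &three, 0.0 };
    printf("%f\n", mul.eval(&mul));  /* walks the tree: 2 * 3 */
    return 0;
}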
Factor has a few significant advantages over other languages, most arising from the fact that it has essentially no syntax:
• It has powerful metaprogramming capabilities, exceeding even those of LISPs
I’m lightly familiar with LISPs (dabbled in Clojure like 10 years ago) and not at all with Factor - so if someone could ELI5 what that claim about metaprogramming means (and why that’s important/useful) I’d appreciate it!
You can write “parsing words” in Factor that read the tokens in the source code and generate the Factor code to be compiled in their place. You have full control over the syntax if you do this. For example, see a post I wrote on embedded grammars.
I’m still at a loss why anyone would knowingly use the Chrome browser. It was created with exactly one purpose: To help Google track you better. Oh wait, it wasn’t Google wanting to “give something back” or “donate something for free out of the kindness of their hearts”? Nope. It was created as a response to browsers like Firefox and Safari that were slowly improving their privacy settings, and thus reducing Google’s ability to violate your privacy for a fee.
And if you’re curious, Google didn’t create and give away fonts out of the kindness of their hearts, either. If you’re using Google fonts, you aren’t anonymous. Anywhere. Ever. Private mode via a VPN? Google knows who you are. Suckers. Seriously: How TF do you think they do perfect fingerprinting when everyone is using only a few browsers on a relatively small number of hardware devices?
TLDR - Google dropped the “don’t be evil” slogan for a reason.
To be fair, they also wanted their web apps to run better. They went with Google Chrome rather than making Gmail and Docs into desktop apps. If the owner of the platform I make my apps on is a direct competitor (Microsoft Office vs Google Docs), I wouldn’t be happy. Especially when that competitor platform sucks. Now that Chrome holds the majority of market share, Google can rest easy knowing that their stuff runs how they want for most users. Chrome also pushed the envelope for browser features they directly wanted to use in their apps.
The tracking and privacy thing is a LOT more pronounced now than it was in 2008 when Chrome came out. That’s definitely an issue that’s relevant today, but you can’t really pretend it was the sole driving force of the original release of Google Chrome.
I knew that Google was building Chrome for the purpose of tracking back when it was still in development, based on private comments from friends at Google. I don’t know if that was 2008, but it was somewhere in that time period. Yes, they needed a better browser experience to support some of their product goals, but Google’s overwhelmingly-critical product is tracking users, and protecting that one cash cow is important enough to give away gmail and browsers and fonts and phone OSs (and a thousand other things) for free.
Google loses money on pretty much everything, except advertising. And despite whatever the execs say in public, they’re actually quite content with that situation, because the advertising dollars are insane.
“If you’re not paying for the product, then you are the product.”
To be fair, they also wanted their web apps to run better.
They could have done that by funding development in Firefox.
It would have been hard to work within an existing technical framework, especially considering that Firefox in 2008 or whatever was probably saddled with more tech debt than it is today, but it’d certainly be an option.
You can’t strong-arm the web into adopting the features you want by simply funding or contributing to Firefox.
And it’s not clear to me that Google would’ve been able to get Mozilla to take the necessary steps, such as killing XUL (which Mozilla eventually did many many years later, to compete with Chrome). And sandboxing each tab into its own process is probably also the kind of major rework that’s incredibly hard to pull off when you’re just an outsider contributing code with no say in the project management.
I get why Google wanted their own browser. I think they did a lot of good work to push performance and security forwards, plus some more shady work in dictating the web standards, in ways that would’ve been really hard if they didn’t have their own browser.
I still feel a bit bitter about the retirement of XUL. Back in the mid-2000s you could get a native-looking UI running with advanced controls within days. I haven’t seen anything that comes close to that in speed of development so far, except maybe Visual Basic?
Yeah, which I’m sure very conveniently prevents them from attracting too much anti-trust attention, the same way that Intel or NVidia don’t just buy AMD. But I doubt they pay any developers directly to contribute to Firefox, the way that for example AMD contributes to Mesa, Valve contributes to WINE, Apple contributes to LLVM, etc.
There’s a difference between not crushing something because its continued existence is useful to you, and actually contributing to it.
On one hand, you’re totally right. Throwing cash at keeping other browsers alive keeps their ass away from the anti-trust party.
On the other hand, again, between 75% to 95% of [Mozilla’s] entire yearly budget comes from Google. At that volume of financial contributions, I don’t think it matters that they’re not paying Google employees to contribute to Firefox—they’re literally bankrolling the entire organization around Firefox, and by extension basically its paid developers.
They pretty much were back then. At the time Google weren’t happy with the uptake of Firefox vs IE, despite promoting FF on their own platforms, and wanted to pursue the option of their own browser. Mozilla weren’t particularly well known for being accepting of large contributions or changes to their codebase from third parties. There was no real embedding story either, which prevented Google from going with Gecko (the Firefox browser engine) as the base instead of WebKit.
I’m still at a loss why anyone would knowingly use the Chrome browser.
(1) Chrome was the first browser to sandbox Flash and put Java behind a ‘click to play’. This was an extreme game changer for security.
(2) Expanding on that, Chrome was the first browser to build sandboxing into the product from day 1. This was an extreme game changer for security.
Between (1) and (2) the threat landscape changed radically. We went from PoisonIvy and BlackHole exploits absolutely running rampant with 0 or 1 click code execution to next to nothing in a few years - the browser exploit market, in that form, literally died because of Chrome.
Continuing on,
Firefox had tons of annoying bugs that Chrome didn’t. “Firefox is already running” - remember that? Chrome had bugs but ultimately crashes were far less problematic, only impacting a tab. Back then that really mattered.
Chrome integrates with GSuite extremely well. Context Aware Access and browser management are why every company is moving to enforce use of Chrome - the security wins you get from CAA are incredible, it’s like jumping 10 years into the future of security just by choosing a different browser.
To help Google track you better.
Whether that’s true or not, the reality is that for most of Chrome’s lifetime, certainly at least until very recently, there were very very few meaningful privacy issues (read: none, depending on your point of view) with the browser. Almost everything people talked about online was just a red herring - people would talk about how Chrome would send out LLMNR traffic like it was some horrible thing and not just a mitigation against attacks, or they’d complain about the numerous ways Chrome might talk to Google that could just be disabled and were often even part of the prompts during installation.
I don’t see why it’s hard to believe that Google wanted more control over the development of the major browser because they are a ‘web’ company and controlling web standards is a massive competitive edge + they get to save hundreds of millions of dollars by not paying Firefox for google.com to be the homepage.
Chrome has been publishing whitepapers around its features for a long time. I don’t keep up anymore and things may have changed in the last few years but there was really nothing nearly as serious as what people were saying.
If you’re using Google fonts, you aren’t anonymous. Anywhere. Ever.
Just to be clear, what you’re talking about is, I assume, the fact that if a website loads content (such as fonts) from a CDN then your browser makes requests to that CDN. Google discusses this here, although I’m not sure why this behavior is surprising:
Is there some other aspect of Google Fonts that you’re referring to? Because I’ve really lost my ability to give statements like “Google Fonts tracks you” any benefit of the doubt after a decade of people misunderstanding things like “yes, a CDN can see your IP address”.
Seriously: How TF do you think they do perfect fingerprinting when everyone is using only a few browsers on relatively small number of hardware devices?
Who says they do perfect fingerprinting? Also since when are there a relatively small number of hardware devices? An IP, useragent, and basic device fingerprinting (“how big is the window”) is plenty to identify a lot of people.
Infosec people love Chrome for architectural reasons. Which just goes to show that privacy and security are separate concerns that only marginally overlap.
I agree, Privacy is totally separate from Security. That said, I do not believe Chrome has represented a privacy concern since its inception - at least until recently, which I only say because I no longer care to follow such things.
*$font,third-party - slap this bad boy in your uBlock Origin config & you won’t be downloading any fonts from third-party sites.
…But do be warned that laggards are still using icon fonts (even on contemporary sites!) despite it not being the best practice for icons for over a decade.
I’m still at a loss why anyone would knowingly use the Chrome browser.
I use it because (among other reasons) I want protection against viruses more than against Google. The last thing I heard about Firefox security (I don’t follow it actively) was Patrick Walton commenting that they have made significant improvements but still have never caught up to Chrome on security. I want Chrome’s security for lunch, there is no free lunch, and I’m okay with paying for that lunch in ad targeting data. With JavaScript disabled by default (for security), I never even see many of the ads that they might spend that data on targeting.
Your attitude is a good one: You’re conscious about what they’re probably trying to do with data about you, and you accept the trade-off for the services provided, and you do so consciously.
If Google were above-board in what they’re doing, I’d be 100% thumbs up for their behavior. But they’re not.
A major financial reason for Chrome is to save the cost of paying browser vendors to make Google the default search engine. Google pays Apple like $10 billion a year for this purpose on Safari. This is why Microsoft aggressively promoting Edge is such a threat to Google - fewer users using Google.
When I looked at this many years ago, I obviously had the same exact question, but I don’t actually have an answer. The same browser version (stock) with the same config (stock) on the same OS and OS version (stock) on the same hardware (stock) with the “same” Google fonts apparently generates a different Google fingerprint, apparently even in private browsing mode through the same VPN IP. Drop the Google fonts, and the fingerprint is apparently identical. It’s been a decade since I looked at any of this, so my memory is pretty fuzzy at this point. And of course Google doesn’t provide any hints as to how they are unveiling users’ identities; this is a super closely guarded trade secret and no one I knew at Google even gave me any hints. (My guess is that they are all completely unaware of how Google does it, since the easiest way to maintain secrets is to not tell all of your employees what the secret is.)
The Google fingerprinting does a render on a hidden canvas (including rendering text using fonts) and then sends Google a hash of that render. Somehow the use of Google fonts (note: specifically when downloaded from Google, which is what most web sites do) appears to give different users their own relatively unique hash. If I had to guess (WAG WARNING!!!), I’d suggest that at least one of the most widely distributed fonts is altered ever-so-imperceptibly per download – but nothing you can see unless you render large and compare every pixel (which is what their fingerprint algo is doing). Fonts get cached for a year, so if (!!!) this is their approach, they basically get a unique ID that lasts for the term of one year, per human being on the planet.
If you examine their legalese, you’ll see that they carefully carve out this possible exception. For example: “The Google Fonts API is designed to limit the collection, storage, and use of end-user data to what is needed to serve fonts efficiently.” Right. They don’t need to collect or store anything from the Fonts API. Because your browser would be doing the work for them. Similarly, “requests for fonts are separate from and do not contain any credentials you send to google.com while using other Google services that are authenticated, such as Gmail.” So they went out of their way to provide the appearance of privacy, yet somehow their fingerprinting is able to defeat that privacy.
The only thing that I know for certain is that Google hires tons of super smart people explicitly to figure out how to work around privacy-protecting features on other companies’ web browsers, and their answer was to give away fonts for free. 🤷♂️
I’m not normally accused of being a conspiracy theorist, but damn, writing this up I sure as hell feel like one now. You’re welcome to call me crazy, because if I read this shit from anyone else, I’d think that they were nuts.
That’s really ingenious, if true. To go along with supporting your theory, there is a bug open since 2016 for enabling Subresource Integrity for Google Fonts that still isn’t enabled.
I’m a bit sceptical about the concept; it seems like it comes with enormous downsides - fonts are not light objects, and really do benefit from caching. Whereas merely having the Referer header of the font request, in addition to timing information & what is sent with the original request (IP addr, User agent, etc), seems perfectly sufficient in granularity to track a user.
This feels too easy to detect for it not to have been noticed by now - someone would have attempted to add the SRI hash themselves and noticed it break for random users, instead of the expected failure case of “everyone, everywhere, all at once”.
The fonts are constantly updated on Google fonts at the behest of the font owners, so the SRI hash issue being marked as WONTFIX isn’t very exciting, as I wouldn’t be surprised at it being legally easier for Google to host one version of the font (as Google is often not the holder of the Reserved Font Name), as the Open Font License seems to be very particular about referring to fonts by name. Reading through the OFL FAQ (https://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL-FAQ_web), if I were a font distributor I would be hesitant to host old conflicting versions of the font amongst each other. Plus, easier to cache a single file than multiple, and lower effort on the side of a font foundry, as it means they do not need to have some versioning system set up for their font families (because they’re not just one weight, type, etc).
The fonts not being versioned goes beyond the SRI hash benefits: fonts often have breaking changes[0] in them, e.g. https://github.com/JulietaUla/Montserrat/issues/60, so a designer can’t know that future font changes won’t result in any changes to the page. So in my mind, it really feels like it’s the foundry that wants there to be a single authoritative version of the font.
0: I suppose a semver major version bump in the font world would be character width changes.
Even if cpurdy’s version is true, I’m sure they use every normal and not so normal trick in the book to track as well. If your singular goal is identifying users uniquely, you would be stupid to rely on only 1 method. You would want 100 different methods, and you would want to deploy every single one. So if a competing browser vendor or unique edge case happens to break a single method, you don’t really care.
I agree caching of fonts is useful, but the browser would cache the most common fonts locally anyway. It would behoove Google to set the cache lifetime of the font file as long as practically possible, even if they were not using it to track you.
I agree, fingerprinting is a breadth of signals game, but I just can’t believe this vector, it feels way too technically complicated for comparable methods available within the same context - the idea was minute changes in font to result in different canvas render hashes, but a user already has a lot of signals within JS & canvas (system fonts, available APIs, etc) that are much quicker to test.
Fonts are cached per site by the browsers as a means of avoiding fingerprinting via cross-domain timing effects - Safari & Chrome call it partitioned cache; Firefox, first party isolation. So Google can tell you’ve visited a site as the Referer gets sent on first load, unless they set Referrer-Policy: no-referrer, of course.
I agree it’s technically complicated, but I imagine they want a variety of hard and not very hard methods. Assuming they do it, perhaps they only run it when they can’t figure out who you are from some other, easier method.
I always have a degoogled Chrome fork installed as a backup browser, in case I have website compatibility problems with Firefox. Some of my problems might be caused by my Firefox extensions, but it’s easier to switch browsers than to start disabling extensions and debugging.
On desktop I use Ungoogled Chromium. On Android I use Bromite and Vanadium. My Android fork is GrapheneOS, which is fully degoogled by default. I am grateful that Google created Android Open Source to counteract iOS, it is an excellent basis for distros like Graphene. I use F-Droid for apps.
Also, I have the Google Noto fonts installed as a debian package (fonts-noto). It’s a great font that eliminates tofu, and I thank Google for creating it. I don’t think Google is spying on my debian installed package list. If I don’t connect to google servers, they can’t see the fonts used by my browser.
I primarily rely on Ublock Origin for blocking spyware and malware, including Google spying. It’s not perfect. I can always use Tor if I feel really paranoid. The internet isn’t anonymous for web browsers if you are connecting with your real IP address (or using the kind of VPN that you warned about). Google isn’t the only surveillance capitalist on the web; I expect the majority of sites spy on you. Even Tor is probably not anonymous if state actors are targeting you. I wouldn’t use the internet at all if I was concerned about that.
Chrome initially came out in late 2008, when Firefox and Safari were actually doing OK, and IE8 was just around the corner, its betas were already out on Chrome’s release date. Chrome wasn’t even a serious player* until about 2010 or 2011, by which time IE9 was out and IE 6 was really quite dead. This article from June 2010 has a chart: https://www.neowin.net/news/ie6-market-share-drops-6-ie8-still-in-top-spot/
You can see IE8 and Firefox 3.5 were the major players, with Safari on the rise (probably mostly thanks to the iphone’s growing popularity).
I remember when Chrome’s marketshare was big enough that I had to start working around its annoying bugs. I tried to ignore it at first hoping it would die, but once the boss himself started pushing it, the pain kept coming. But there was a period there I didn’t hate - IE8 and Firefox 3.5 actually both worked pretty well.
Firefox was the #2 browser (behind only IE*) when Chrome was introduced, and at the time it was still growing its market share year after year, mostly at the expense of IE.
After its release, Chrome quickly overtook Safari (then #3), and proceeded to eat almost all of IE’s and Firefox’s market share. It is now the #1 browser, by a significant margin.
Interestingly, Safari did not yield market share to Chrome, and continued to grow its market share, albeit at a much lower rate than Chrome did. I assume that this growth is based on the growth of iPhone market share, and relatively few iPhone users install Chrome. Today, Safari is solidly the #2 browser behind Chrome.
Edge (the new IE) is #3.
Firefox has dropped to the #4 position, in a three-way tie with Opera and Samsung.
Agreed. I’m not sure if IE7 was a thing until after Chrome. Also, when Chrome first came out it was a breath of fresh air, because at the time you either had to use Firefox or Opera, both of which had the issue of sites breaking that were made with IE in mind, or the whole browser locking up because one site was hung. While I won’t speculate that tracking was a primary goal of Chrome development, let’s not pretend that it wasn’t leaps and bounds ahead of what else was available at the time on the IE6-centric web.
Chrome was definitely aimed directly at IE, most likely because they couldn’t bribe MS to default to Google search and because its outdated tech made the push to web apps much harder - consider the fact that early versions didn’t run on anything other than Windows (about 6 months between 1.0 and previews for Mac and Linux), and the lengths they went to get sandboxing to work on WinXP.
I think it’s fair to say that Firefox did have an impact - but it wasn’t that Chrome was created as a response, rather that Firefox defeated the truism that nothing could dethrone IE because it was built into Windows.
I’m still at a loss why anyone would knowingly use the Chrome browser.
I generally don’t like when technology companies use their product to push some ideological agenda, so I would probably choose Chrome over Firefox if I had to choose between only those two. Also, the new Firefox tabs waste a lot of screen space, and they didn’t give any official way to return to the previous look, so that’s another argument (the last time I used FF I had to hack through some CSS, which stopped working a few updates later). The only thing I miss from FF is tab containers, but that’s about it.
But still, I use Vivaldi, which runs on Blink, so I’m not sure if I match your criteria, since your question is about “Chrome browser” not “Chrome engine”.
My work uses Google apps heavily and so I maintain a personal/work distinction in browsing by routing everything for work to Chrome. Privacy is a technical dead letter.
Yeah, I obviously have to use Chrome a bit, too. Because as a developer, ignoring the #1 browser seems stupid, any way you look at it. And a few sites only work on Chrome (not even on Safari or Firefox). I try to avoid Edge except for testing stuff, because the nag level is indescribable. “Are you super double extra sure that you didn’t want me to not be not your not default web browser? Choose one: [ Make Edge the default browser ] [ Use Microsoft suggested defaults for your web browser choice]” followed by “It’s been over three minutes since you signed in to your Microsoft cloud-like-thingy user account that we tied to your Windows installation despite your many protestations. Please sign in again, and this time we’ll use seven factor authentication. Also, you can’t not do this right now.” And so on.
I abhor and abjure the modern web, but we all have to live in it. On my Mac I use an app called ‘Choosy’ which lets me reroute URLs to arbitrary browsers, so I can use Safari without worry, given that I send all the garbage to either Chrome or a SSB.
It has been a long time since I lost track of Mozart/Oz and Alice ML.
I vaguely remember that their move from 32 bits to 64 bits was a hurdle, as they were lacking volunteers, IIRC.
Would you advise taking the time to look at them again?
I think Oz/Mozart was a very interesting project, ahead of its time in some ways, but you’re right that the language is not evolving and there are no signs of that changing. (From what I understand, it is still used in teaching in some places.)
What I found especially interesting about Alice ML was the module system, which implements some improvements over other ML module systems. (Some of the same ideas were/are part of the 1ML and SuccessorML initiatives, I think.)
In the case of Mozart/Oz, they worked on a ground up rewrite for 64-bit support called Mozart 2 in C++. It doesn’t have all the features of the original implemented yet. For example, distributed objects. I worked on porting the existing 32-bit engine, based on the 1.3.x series for compatibility with the book, to 64-bit. This works and runs and all tests pass. I still use it occasionally.
I’ve read through this previously, and I’m intrigued but I apparently haven’t spent enough time to fully grok the why, how, and all that jazz. I’m not coming from a strong logic programming background, though. Have you given talks, or written anything that provides an introduction that might create better intuition on rationale, target use cases, etc?
Actually, this type of language has very different (and simpler) semantics than Prolog, which means that one doesn’t need to be a Prolog aficionado. The main similarities are the syntax, logic variables and the concept of (output) unification. As doublec has pointed out, there is an article by me that tries to give an introduction to the topic. The Strand book is highly recommended, very accessible, and explains the basic concepts in a step-by-step manner.
FLENG/FGHC/Strand (and other languages of this family) are highly parallel, which has interesting implications: concurrency is always there and you concentrate on what to make sequential (there are several strategies and tools for this). Stream processing is a very powerful abstraction mechanism, and logic variables with the possibility of data structures having “holes” allows straightforward parallelisation of tasks. I would think this is very handy for GUIs, games and all sorts of simulations.
It’s a different way of programming, but it really opened my mind to dive deeper into these concepts. This implementation is an attempt to be simple, yet practical. The language makes multi-threaded programming much easier, compared to traditional techniques, which are, IMHO, lacking in elegance and convenience.
I can’t believe I wrote the Usenet post about Dylan referenced in this article 22 years ago - after all that time are there any mainstream languages using conditions/restarts?
Smalltalk doesn’t have exceptions as a language-level construct. Instead, blocks (closures/lambdas) can return from the scope in which the block was created (assuming it’s still on the stack). You implement exceptions by pushing a handler block onto the top of a global stack, invoking it at the ‘throw’ point, and having it return. Smalltalk ‘stacks’ are actually lists of activation records where every local is a field of the object and amenable to introspection, so you can also use it to walk up the stack and build resumable or restartable exceptions. Not really a mainstream language though.
Perhaps more interesting, the language-agnostic parts of the SEH mechanism on Windows fully support resumable and restartable exceptions. There are some comments about this in the public source in the MSVC redistributable things. To my knowledge, no language has ever actually used them. This is much easier with SEH than with Itanium-style unwinding. The Itanium ABI does a two-pass unwind, where one pass finds cleanups and catch handlers, the second pass runs them. This means that the stack is destroyed by the time that you get to the catch block: each cleanup runs in the stack frame that it’s unwinding through, so implicitly destroys anything that has been unwound through by trampling over its stack. In contrast, the SEH mechanism invokes ‘funclets’, new functions that run on top of the stack with a pointer to the stack pointer for the frame that they’re cleaning up. This means that it’s possible to quite easily shuffle the order in which they are executed and allow a far-off handler to decide, based on the exception object and any other state that it has access to, that it wants to just adjust the object and resume execution from a point in the frame that threw the exception or control the unwinding process further.
Oh, and one of the comments in the article talks about longjmping out of a signal handler. Never do this. setjmp stores only the callee-save register state: it is a function call and so the caller is responsible for saving any caller-save state. Signals are delivered at arbitrary points, not just at call points, and so you will corrupt some subset of your register state if you do this. setcontext exists specifically to jump out of signal handlers. Similarly, you should not throw out of signal handlers. In theory, DWARF unwind metadata can express everything that you need for this (the FreeBSD signal trampolines now have complete unwind info, so you get back into the calling frame correctly) but both LLVM and GCC assume that exceptions are thrown only at call sites and so it is very likely that the information will be incorrect for the top frame. It will usually work if the top frame doesn’t try to catch the exception, because generally spills and reloads happen in the prolog / epilog and so the unwind state for non-catching functions will correctly restore everything that the parent frame needs.
Oh, and one of the comments in the article talks about longjmping out of a signal handler. Never do this. setjmp stores only the callee-save register state: it is a function call and so the caller is responsible for saving any caller-save state. Signals are delivered at arbitrary points, not just at call points, and so you will corrupt some subset of your register state if you do this.
I think you’re wrong here.
The POSIX standard says this about invoking longjmp() from a signal handler:
The behavior of async-signal-safe functions, as defined by this section, is as specified by POSIX.1, regardless of invocation from a signal-catching function. This is the only intended meaning of the statement that async-signal-safe functions may be used in signal-catching functions without restriction.
…
Note that although longjmp() and siglongjmp() are in the list of async-signal-safe functions, there are restrictions on subsequent behavior after the function is called from a signal-catching function. This is because the code executing after longjmp() or siglongjmp() can call any unsafe functions with the same danger as calling those unsafe functions directly from the signal handler. Applications that use longjmp() or siglongjmp() out of signal handlers require rigorous protection in order to be portable.
It also says this:
Although longjmp() is an async-signal-safe function, if it is invoked from a signal handler which interrupted a non-async-signal-safe function or equivalent (such as the processing equivalent to exit() performed after a return from the initial call to main()), the behavior of any subsequent call to a non-async-signal-safe function or equivalent is undefined.
So it seems the extra restriction mentioned before is that longjmp() and siglongjmp() cannot be used from a signal handler that interrupted a non-async-signal-safe function.
There are other restrictions. The standard says this about them:
If the most recent invocation of setjmp() with the corresponding jmp_buf occurred in another thread, or if there is no such invocation, or if the function containing the invocation of setjmp() has terminated execution in the interim, or if the invocation of setjmp() was within the scope of an identifier with variably modified type and execution has left that scope in the interim, the behavior is undefined.
So the function with the setjmp() invocation must be in the same thread and still on the stack (i.e., the invoking function has not returned), and it also has to be in scope (i.e., execution must be in the same C scope as the setjmp() invocation).
In addition:
All accessible objects have values, and all other components of the abstract machine have state (for example, floating-point status flags and open files), as of the time longjmp() was called, except that the values of objects of automatic storage duration are unspecified if they meet all the following conditions:
They are local to the function containing the corresponding setjmp() invocation.
They do not have volatile-qualified type.
They are changed between the setjmp() invocation and longjmp() call.
All three of those conditions have to exist for the value of local variables to be unspecified. If you have a pointer to something in the stack, but in a function above, you’re fine; the pointer doesn’t change, and the contents at the pointer are specified. If you have a pointer to a heap allocation, the contents of the heap allocation are specified. And so on.
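To make those conditions concrete, here’s a tiny sketch of my own (not from bc, just an illustration): the counter below is modified between setjmp() and longjmp(), so it has to be volatile, while the limit never changes after setjmp() and can stay a plain int. The setjmp() call also sits in one of the allowed contexts quoted further down (an equality test that forms the entire controlling expression).

#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;

int main(void) {
    volatile int attempts = 0; /* modified between setjmp() and longjmp(): must be volatile */
    int limit = 3;             /* never modified after setjmp(): a plain int is fine */

    if (setjmp(env) != 0) {    /* allowed context: equality test as the whole condition */
        if (attempts >= limit) {
            printf("giving up after %d attempts\n", attempts);
            return 1;
        }
    }
    attempts = attempts + 1;
    printf("attempt %d of %d\n", attempts, limit);
    longjmp(env, 1);           /* in real code this would only happen on failure */
}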
All of this is to say that you are broadly right, that code should usually not longjmp() out of signal handlers. But to say never seems a bit much, if you do everything right. Of course, like crypto, you should only do it if you know what you’re doing.
Also, I believe that the quotes above mean that if an implementation does not save caller-save registers, it is wrong. The reason is that the compiler could have lifted one local into a caller-save register. If that local does not change between the initial setjmp() and the longjmp() back to it, that caller-save register should have the same value, and if it doesn’t, I argue that the implementation does not follow the standard, that the implementation is wrong, not the application, especially since it is legal for an implementation to use a macro. In fact, the standard has several restrictions on how setjmp() can be invoked to make it easier to implement as a macro:
An application shall ensure that an invocation of setjmp() appears in one of the following contexts only:
The entire controlling expression of a selection or iteration statement
One operand of a relational or equality operator with the other operand an integral constant expression, with the resulting expression being the entire controlling expression of a selection or iteration statement
The operand of a unary ‘!’ operator with the resulting expression being the entire controlling expression of a selection or iteration
The entire expression of an expression statement (possibly cast to void)
If the invocation appears in any other context, the behavior is undefined.
Source: I implemented a longjmp() out of a signal handler in my bc and followed all of the relevant standard restrictions I quoted above. It was hard, yes, and I had to have signal “locks” for the signal handler to tell if execution was in a non-async-signal-safe function (in which case, it sets a flag, and returns normally, and when the signal lock is removed, the jump happens then).
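For what it’s worth, a rough sketch of that locking pattern looks something like this (my own illustration, not the actual bc code; all the names are made up). The handler only jumps directly when execution is outside the “locked” region; otherwise it just records the signal, and the jump is taken when the lock is released:

#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static sigjmp_buf jump_env;
static volatile sig_atomic_t sig_lock = 0;    /* inside a non-async-signal-safe region? */
static volatile sig_atomic_t sig_pending = 0; /* a signal arrived while locked */

static void handler(int sig) {
    (void)sig;
    if (sig_lock) {
        sig_pending = 1;         /* defer: record it and return normally */
        return;
    }
    siglongjmp(jump_env, 1);     /* safe region: jump out immediately */
}

static void sig_lock_enter(void) { sig_lock = 1; }

static void sig_lock_leave(void) {
    sig_lock = 0;
    if (sig_pending) {
        sig_pending = 0;
        siglongjmp(jump_env, 1); /* take the deferred jump now */
    }
}

int main(void) {
    struct sigaction sa = { 0 };
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);

    if (sigsetjmp(jump_env, 1) != 0) {
        printf("interrupted, cleaning up\n");
        return 0;
    }
    for (;;) {
        sig_lock_enter();
        printf("doing non-async-signal-safe work\n"); /* e.g. stdio, allocation */
        sig_lock_leave();
        sleep(1);                /* safe to interrupt and jump from here */
    }
}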
Also, I believe that the quotes above mean that if an implementation does not save caller-save registers, it is wrong.
If that is the case, then there are no correct implementations. For example, the glibc version saves only 8 registers on x86-64. Similarly, the FreeBSD version stores 8 integer registers. This means that they are not storing all of the integer register set, let alone any floating-point or vector state. This is in contrast to ucontext, which does store the entire state. The sig prefixed versions also store the signal mask and restore it.
It looks as if ucontext is finally gone from the latest version of POSIX, but the previous version said this:
When a signal handler is executed, the current user context is saved and a new context is created. If the thread leaves the signal handler via longjmp(), then it is unspecified whether the context at the time of the corresponding setjmp() call is restored and thus whether future calls to getcontext() provide an accurate representation of the current context, since the context restored by longjmp() does not necessarily contain all the information that setcontext() requires. Signal handlers should use siglongjmp() or setcontext() instead.
Explicitly: longjmp does not store the entire context, any temporary registers may not be stored.
Also, I believe that the quotes above mean that if an implementation does not save caller-save registers, it is wrong. The reason is that the compiler could have lifted one local into a caller-save register. If that local does not change between the initial setjmp() and the longjmp() back to it, that caller-save register should have the same value, and if it doesn’t, I argue that the implementation does not follow the standard, that the implementation is wrong, not the application, especially since it is legal for an implementation to use a macro.
No, this is why setjmp explicitly requires that all local mutable state that you access after the return is held in volatile variables: to force the compiler to explicitly save them to the stack (or elsewhere) and reload them. If they are local, not volatile, and have changed, then accessing them is UB. Any subset of these is fine:
If they are not volatile and are local, but have not changed, then they must be stored either on the stack or in a callee-save register because otherwise the function call (for setjmp) may clobber them and so that’s fine.
If they are local and have not changed, but are not volatile, then they will be stored in the same location (register or stack slot) in the use point as before the setjmp call (which must be something that’s preserved across calls).
If they are not local, then any local modifications must have been written back to the global before setjmp and any access after setjmp returns must reload.
Source: I implemented a longjmp() out of a signal handler in my bc and followed all of the relevant standard restrictions I quoted above. It was hard, yes, and I had to have signal “locks” for the signal handler to tell if execution was in a non-async-signal-safe function (in which case, it sets a flag, and returns normally, and when the signal lock is removed, the jump happens then).
I hope that you mean siglongjmp, not longjmp, because otherwise your signal mask will be left in an undefined state (in particular, because you’re not resuming via sigreturn, the signal that you’re currently handling will be masked, which may cause very surprising behaviour and incorrect results if you rely on the signals for correctness). It’s not UB, it’s well specified, it’s just impossible to reason about in the general case.
Note that your signal-lock approach works only if you have an exhaustive list of async signal unsafe functions. This means that it cannot work in a program that links libraries other than libc. Jumping out of signal handlers and doing stack unwinding also requires that you don’t use pthread_cleanup, GCC’s __attribute__((cleanup)), or C++ RAII in your program. I honestly can’t conceive of a situation where I’d consider that a good tradeoff.
EDIT: I am dumb. I realized that under the ABIs currently in use, treating caller-save registers normally would actually be correct. So there’s no special code, and my bc will work correctly under all compilers. I’m leaving this comment up as a lesson to me and for posterity.
I hope that you mean siglongjmp, not longjmp, because otherwise your signal mask will be left in an undefined state (in particular, because you’re not resuming via sigreturn, the signal that you’re currently handling will be masked, which may cause very surprising behaviour and incorrect results if you rely on the signals for correctness). It’s not UB, it’s well specified, it’s just impossible to reason about in the general case.
Yes, I used siglongjmp(). I apologize if this was wrong, but in your original post, it sounded like you thought siglongjmp() should not be used either, so I lumped longjmp() and siglongjmp() together.
It looks as if ucontext is finally gone from the latest version of POSIX, but the previous version said this:
When a signal handler is executed, the current user context is saved and a new context is created. If the thread leaves the signal handler via longjmp(), then it is unspecified whether the context at the time of the corresponding setjmp() call is restored and thus whether future calls to getcontext() provide an accurate representation of the current context, since the context restored by longjmp() does not necessarily contain all the information that setcontext() requires. Signal handlers should use siglongjmp() or setcontext() instead.
Explicitly: longjmp does not store the entire context, any temporary registers may not be stored.
Yes, that looks like longjmp() does not have to restore the full context according to POSIX. (C99 might still require it, and C99 controls; more on that later.) However, it looks like siglongjmp() does. That paragraph mentions them separately, and it explicitly says that siglongjmp() should be used because of the restrictions on the use of longjmp(), which implies that siglongjmp() does not have those restrictions. And if siglongjmp() does not have those restrictions, then it should restore all of the register context.
I agree that it does not have to set floating-point context or reset open files, etc., since those are explicitly called out. But it certainly seems to me like siglongjmp() is expected to restore them.
No, this is why setjmp explicitly requires that all local mutable state that you access after the return is held in volatile variables: to force the compiler to explicitly save them to the stack (or elsewhere) and reload them.
This sounds completely wrong, so I just went through POSIX and checked: there is no such restriction on the compiler, specifically the c99 utility, nor anywhere else it mentions compilers. I also checked the C99 standard, and it says the exact same thing as POSIX.
I also searched the C99 standard to see if there were restrictions on handling of objects of automatic storage duration, and there were none.
So there could be, and are, several C99 compilers that follow the standard, such as tcc, cproc, icc, chibicc, and others, and yet, if someone compiles my bc with one of those compilers on FreeBSD, my bc could fail to work properly.
But even with Clang, does it either treat setjmp() or sigsetjmp() specially or refuse to promote local variables to caller-save registers? Can you point me to the code in Clang that does either one?
Is a failure in my bc the fault of the compiler? I argue that the C standard does not have any restriction that puts the compiler at fault.
Is a failure in my bc my fault? If I had used longjmp() instead of siglongjmp() in the signal handler, yes, it would be my fault. But I did not, and I followed all of the other restrictions on applications, including not relying on any floating-point state.
Thus, I argue that since neither C99 nor POSIX has a restriction on compilers that require special handling of setjmp(), and since POSIX has no restrictions on applications that I did not follow, neither I nor compilers can be at fault.
In addition, because POSIX says about setjmp():
The functionality described on this reference page is aligned with the ISO C standard. Any conflict between the requirements described here and the ISO C standard is unintentional. This volume of POSIX.1-2017 defers to the ISO C standard.
This means the C99 standard controls, and the C99 standard says:
The environment of a call to the setjmp macro consists of information sufficient for a call to the longjmp function to return execution to the correct block and invocation of that block, were it called recursively.
The line “information sufficient” is crucial; I argue that it is a restriction on the implementation, not the compiler or the application.
Now, we can argue what “sufficient information” means, and I think it might be different to every platform (well, ABI), but if FreeBSD and Linux want to obey the POSIX standard, I think each of them needs to have a conversation about what sigsetjmp() and setjmp() should save. Looking into this for this discussion has made me realize just how little platforms have actually considered what those require.
So yes, I argue that glibc and FreeBSD are wrong with respect to the POSIX standard. They might decide they are right for what they want.
Now, why has my bc worked so well on those platforms? Well, the standard compilers might be helping, but it might also be because my bc has been lucky to this point so far. I don’t like that thought.
So let’s have that conversation, especially on FreeBSD, especially since 13.1 just hit and my bc has begun to be used a lot more. I want my bc to work right, and I’m sure FreeBSD does too.
Note that your signal-lock approach works only if you have an exhaustive list of async signal unsafe functions. This means that it cannot work in a program that links libraries other than libc.
This is correct. I do want to have zero dependencies other than libc, so it wasn’t a problem.
Jumping out of signal handlers and doing stack unwinding also requires that you don’t use pthread_cleanup, GCC’s __attribute__((cleanup)), or C++ RAII in your program. I honestly can’t conceive of a situation where I’d consider that a good tradeoff.
Because of portability. If I had used any of those, my bc would not be nearly as portable as it is. Making it work on Windows (with the exception of command-line history) was simple. That would not have been the case if I had used any of those, except C++ RAII. But you already know my opinion about C++, so that wasn’t an option either.
So I did give myself something like C++ RAII in C.
EDIT: I just want to add that this heap-based stack also made it possible for me to implement conditions and restarts in C, to put things back on topic.
No, this is why setjmp explicitly requires that all local mutable state that you access after the return is held in volatile variables: to force the compiler to explicitly save them to the stack (or elsewhere) and reload them.
This sounds completely wrong, so I just went through POSIX and checked: there is no such restriction on the compiler, specifically the c99 utility, nor anywhere else it mentions compilers. I also checked the C99 standard, and it says the exact same thing as POSIX.
It’s implicit from the definition of volatile. The compiler must not elide memory accesses that are present in the C abstract machine when accessing volatile variables. If you read a volatile int then the compiler must emit an int-sized load. It therefore follows that, if you read a volatile variable, call a function, modify the variable, and then read the variable again, the compiler must emit a load, a call, a store, and a load. The C spec has a lot of verbiage about this.
But even with Clang, does it either treat setjmp() or sigsetjmp() specially or refuse to promote local variables to caller-save registers? Can you point me to the code in Clang that does either one?
Clang does have a little bit of special handling for setjmp, but the same handling that it has for vfork and similar returns-twice functions. It doesn’t need anything else, because (from the bit of POSIX that I referred to, which mirrors similar text in ISO C) if you modify a non-volatile local in between the first return and the second and then try to access it the results are UB. Specifically, POSIX says:
All accessible objects have values, and all other components of the abstract machine have state (for example, floating-point status flags and open files), as of the time longjmp() was called, except that the values of objects of automatic storage duration are unspecified if they meet all the following conditions:
They are local to the function containing the corresponding setjmp() invocation.
They do not have volatile-qualified type.
They are changed between the setjmp() invocation and longjmp() call.
This is confusingly written with a negation of a universally quantified thing, rather than an existentially quantified thing, but applying De Morgan’s laws, we can translate it into a more human-readable form: it is undefined behaviour if you access any local variable that is not volatile-qualified and is modified between the setjmp and longjmp calls.
As I said, this works because of other requirements. The compiler must assume the following (unless it can prove otherwise via escape/alias analysis; the fact that setjmp is either a compiler builtin or an assembly sequence blocks this analysis):
Any modification to a global or heap object is visible to code executing in a function call and so must store any changes to such objects back to memory.
Any function call may modify any global or heap object and so it must reload any values that it reads from such locations.
Loads and stores of volatile variables, including locals, must not be elided, and so any modification of a volatile local must preserve the write to the stack.
Local variables must be stored in a location that is not clobbered by a call. If a local variable (even a non-volatile one) is written before the setjmp call, then it must be stored either on the stack, or in a callee-save register, or it will be clobbered by the call.
These are not special rules for setjmp, they fall out of the C abstract machine and are requirements on any ABI that implements that abstract machine.
Note: I’m using ‘call’ as a shorthand here. You cannot implement setjmp in C because it needs to do some things that are permitted within the C abstract machine only by setjmp itself. The standard allows it to be a macro so that it can be implemented in inline assembly, rather than as a call to an assembly routine. In this case, the inline assembly will mark all of the registers that are not preserved as clobbered. Inline assembly is non-standard and it’s up to the C implementer to use whatever non-standard extensions they want (or external assembly routines) to implement things like this. On Itanium, setjmp was actually implemented on top of libunwind, with the jump buffer just storing a token that told the unwinder where to jump (this meant that it ran all cleanups on the way up the stack, which was a very nice side effect, and made longjmp safe in exception-safe C++ code as well).
Oh, and there’s a really stupid reason why setjmp is often a macro: the standard defines it to take the argument by value. In C, if it really is a function, you need something like #define setjmp(env) real_setjmp(&env). GCC’s __builtin_setjmp actually takes a jmp_buf& as an argument and so C++ reference types end up leaking very slightly into C. Yay.
Note that there probably are some compiler transforms that make even this a bit dubious. The compiler may not elide loads or stores of volatile values, nor reorder them with respect to other accesses to the same value, but it is free to reorder them with respect to other operations. _Atomic introduces restrictions with respect to other accesses, so you might actually need _Atomic(volatile int) for an int that’s guaranteed to have the expected value on return from a siglongjmp out of a signal handler. This does not contradict what the quoted section says: the value that it will have without the _Atomic specifier is still well-defined, it’s just defined to be one of a set of possible values in the state space defined by the C abstract machine (and, in some cases, that’s what you actually want).
This is particularly true for jumping into the top stack frame. Consider:
sigjmp_buf env;
volatile int x = 0;
volatile int y = 0;
volatile int z = 1;
if (sigsetjmp(env, 1) == 0)
{
    x = 1;
    y = z / y; // SIGFPE delivered here, signal handler calls `siglongjmp`.
}
else
{
    printf("%d\n", x);
}
This is permitted to print 0. The compiler is free to reorder the code in the block to:
z_0 = load z;
y_0 = load y;
y_1 = z_0 / y_0; // SIGFPE happens here
store y y_1
store x 1
If you used _Atomic(volatile int) instead of volatile int for the three locals then (I think) this would not be permitted, because each of these would be a sequentially-consistent operation and so the store to x may not be reordered with respect to the loads of z and y. I believe you can also fix it by putting all of the locals in a single volatile struct, because then the rule that prevents reordering memory accesses to the same volatile object would kick in.
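As a concrete sketch of the struct variant (my reading of the standard, not something I’d promise every optimiser respects):

volatile struct { int x, y, z; } s = { 0, 0, 1 };
/* s.x, s.y and s.z are now parts of a single volatile object, so accesses to
   them may not be reordered with respect to one another. */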
Note that this example will, I believe, do what you expect when compiled with clang at the moment because LLVM is fairly conservative with respect to reordering volatile accesses. There are some FIXMEs in the code about this because the conservatism largely dates from a pre-C[++]11 world when _Atomic didn’t exist and so people had to fudge it with volatile and pray.
In spite of the fact that it is, technically, possible to write correct code that jumps out of a signal handler, my opinion stands. The three hardest parts of the C specification to understand are [sig]setjmp, signals, and volatile. Anything that requires a programmer to understand and use all of these to implement correctly is going to be unmaintainable code. Even if you are smart enough to get it right, the next person to try to modify the code might be someone like me, and I’m definitely not.
longjmp() and siglongjmp() cannot be used from a signal handler that interrupted a non-async-signal-safe function.
Okay, sure, the text says that.
But that requirement, of knowing what function you have interrupted, is ludicrously hard to arrange except when the signal handler is installed in the same function that triggers it. It is certainly entirely unsuitable for a generic mechanism that might wrap arbitrary user code.
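For what it’s worth, a sketch of the one arrangement where that requirement is manageable, with the handler installed, triggered, and jumped back to all within a single function (illustrative only; error checking and sigaction niceties omitted):

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>

static sigjmp_buf env;

static void on_fpe(int sig)
{
    (void)sig;
    siglongjmp(env, 1); /* we know exactly what we interrupted: the division below */
}

int main(void)
{
    volatile int zero = 0;
    signal(SIGFPE, on_fpe);
    if (sigsetjmp(env, 1) == 0)
    {
        printf("%d\n", 1 / zero); /* raises SIGFPE on most platforms */
    }
    else
    {
        puts("recovered");
    }
    return 0;
}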
They are sometimes mentioned in newer research languages with algebraic effects. So no mainstream languages yet, but at least some people outside of the Lisp/Dylan world are thinking about them!
And ATS, which provides the ability to do pointer arithmetic and dereferencing safely without garbage collection. In general, claims of “only one that does” and “first to do” should be avoided.
It depends a bit on how you define garbage collection. I’ve seen it used to mean both any form of memory management that does not have explicit deallocation and, more narrowly, tracing-based approaches specifically. Swift inherits Objective-C’s memory management model, which uses reference counting, with cycles handled explicitly (weak references). Rust uses unique ownership. C++ provides manual memory management, reference counting, and unique ownership.
Ada isn’t completely memory safe, though I would say it’s a lot safer than C or C++ with all of the additional built-in error checking semantics it provides (array bounds checks, pre/post conditions, numeric ranges, lightweight semantically different (derived) numeric types, type predicates). I’ve found it hard to write bugs in general in Ada-only code. It’s definitely worth checking out if you haven’t.
As for modern C++, it feels like we made these great strides forward in safety only for coroutines to make it easy to add a lot of silent problems to our code. They’re super cool, but they have been a problem area for me.
Rust is also not completely memory safe: it has an unsafe escape hatch and core abstractions in the standard library as well as third-party frameworks require it.
I agree. Not a lot of people are familiar with Ada, so my point was to dispel the myth that it is completely safe, while also answering the common reply I’ve seen that “Ada isn’t memory safe, hence you shouldn’t use it.”
One could say that everything that deviates from manual memory management is some form of GC. Still, we do have the traditional idea that, generically speaking, GC implies a background process that deallocates the objects asynchronously at runtime.
If you think about it, stacks are GC because they automatically allocate and deallocate in function calls. 🤔 That’s why they were called “auto” variables in early C.
Galaxy brain: malloc is GC because it manages which parts of the heap are free or not. 🤯
ahahah great viewpoints, wow I learned so much with your comment, it got me looking into “auto” C variables. never realised that all vars in C are implicitly auto because of the behaviour that they get removed when they go out of scope 🤯 (I wonder how they got explicitly removed back then? and how does it all relate to alloca()? could it have been designed that way to allow for multiple stacks at the same time, with other segments besides the stack segment?)
that last bit about malloc is amazing, it is indeed managing virtual memory, it is us who make the leaks 😂
all vars in C are implicitly auto because of the behaviour that they get removed when they go out of scope 🤯 (I wonder how they got explicitly removed back then?
Back in the day, as I understand it, there was no “going out of scope” for C. All the variables used by a function had to be declared at the very top of the function so the compiler could reserve enough stack space before getting to any code. They only got removed when you popped the stack.
Sorry, no, nothing exists in that regard yet, as I got relatively little feedback so far (probably exactly because there is no central place to report bugs, ask questions, etc.… :-) A public git repository would also be nice, but I’m not sure where to host this. I will try to set something up. Stay tuned!
This was a great talk; talks about proof languages are usually too high-level/academic for my small engineer brain, but I was able to follow this one without any problem. I wish there had been a comparison with Ada/SPARK though, as I’m unsure what ATS offers over these.
I wouldn’t necessarily call ATS a “proof language” just because you can do proofs in it. Much of your code would, realistically, not be proven correct. When you say “proof language”, I think of languages exclusively for mathematical proving, like e.g. HOL Light.
Great article, I’m loving seeing more articles on Self. For those wanting to see a video of using the transporter, I have a short demonstration here.
I find the transporter and module system take quite a bit of getting used to. There are a lot of moving parts and it’s easy to mess up. But it is nice to be able to export Self code to files where it can be stored in a standard version control system. I do resort to grepping the Self code sometimes, despite the good graphical tools, to find things.
I find the transporter and module system take quite a bit of getting used to.
It’s definitely a different paradigm indeed. It’s quite different from other languages where you write the code first and it’s converted to the in-language structure later. Plus, the information you have to supply via annotations is kind of counter-intuitive at first, but I get why they are there. I hope we can figure out a better implementation for those.
I do resort to grepping the Self code sometimes despite the good graphical tools to find things.
Can you give an example? So far I haven’t felt the need to do this, because you can usually use the Find Slot... tool to find whatever you need in an object quite easily (though some slots are named… oddly. Looking at you traits string shrinkwrapped).
I mostly don’t need to grep Self code, but having the text was very useful a while back when I did a full reorganisation of the categories in globals. It was too invasive to do in a live image with the standard tools - even a Self world can’t keep running if the collections prototypes flicker in and out of existence :)
It was very useful to be able to do regex replaces on the .self files before building a new Self image.
At least once a week I spend some time trying to build simple things in a mind stretching language.
Lately that’s been APL for me, I find APL challenging and fun!
I spent several months learning how to write code with Joy ( https://en.wikipedia.org/wiki/Joy_(programming_language) ) and that was equally mind bending.
What else is on the edges? I have a job writing Haskell, and I got paid for decades of Python.
What other brain stretching languages should I try?
One that’s personally been on my list for too long is miniKanren. There’s a video that showed writing a program most of the way, then putting constraints on the possible output, which generated the rest of the code. It blew my mind and it’s sad I haven’t gotten a chance to dive in yet. Plus Clojure’s core.logic is basically an implementation of miniKanren and has a lot of users, so it looks like there’s actual use of it in the “actually gets things done” parts of the software world, which is always nice.
pure has been on my list for a while; I just can’t think of anything specific I want to do with it, and I haven’t been motivated to just work through something like 99 problems.
Joy and other concatenative languages have been a pet favourite of mine, and it is fun to play around with them. Here was one of my attempts to clothe PostScript in a concatenative-ish skin.
I was really interested in IPFS a few years ago, but ultimately was disappointed that there seemed to be no passive way to host content. I’d like to have had the option to say something along the lines of “I’m going to donate 5 GB for hosting IPFS data”, and have the software take care of the rest.
My understanding was that one has to explicitly mark a file as something you’d like to serve, and only then will it really be permanent. Unless that gets integrated into a browser-like bookmark system, I have the feeling that most content will be lost. Can anyone who has been following their development tell me if they have improved on this situation?
I thought they were planning to use a cryptocurrency (“Filecoin”) to incentivize hosting. I’m not really sure how that works though. I guess you “mine” Filecoins by hosting other people’s files, and then spend Filecoins to get other people to host your files.
This is a hard problem to solve, because you want to prevent people from flooding all hosters; so there has to be either some kind of PoW or money involved. And with money involved, there’s now an incentive for hosters to misbehave, so you have to deal with them, and this is hard; there are some failed projects that tried to address it.
IPFS’ authors’ solution to this is Filecoin which, afaik, they had in mind since the beginning of IPFS, but it’s not complete yet.
My understanding was that, one has to explicitly mark some file as something you’d like to serve too,
Sort of… my recollection is that when you run an IPFS node (which is just another peer on the network), you can host content on IPFS via your node, or you can pull content from the network through your node. If you publish content to your node, the content will always be available as long as your node is online. If another node on the network fetches your content, it will only be cached on the other node for some arbitrary length of time. So the only way to host something permanently on IPFS is to either run a node yourself or arrange for someone else’s node to keep your content in their cache (probably by paying them). It’s a novel protocol with interesting technology but from a practical standpoint, doesn’t seem to have much benefit over the traditional Internet in terms of content publishing and distribution, except for the fact that everything can be massively (and securely) cached.
There are networks where you hand over a certain amount of disk space to the network and are then supposedly able to store your content (distributed, replicated) on other nodes around the Internet. But IPFS isn’t one of those.
There are networks where you hand over a certain amount of disk space to the network and are then supposedly able to store your content (distributed, replicated) on other nodes around the Internet.
Freenet is one. You set aside an amount of disk space and encrypted chunks of files will be stored on your node. Another difference from IPFS is that when you add content to Freenet it pushes it out to other nodes immediately, so you can turn your node off and the content remains in the network through the other nodes.
VP Eng of Storj here! Yes, Storj is (kinda) one of them, with money as an intermediary. Without getting into details, if you give data to Storj, as long as you have enough STORJ token escrowed (or a credit card on file), you and your computers could walk away and the network will keep your data alive. You can earn STORJ tokens by sharing your hard drive space.
The user experience actually mimics AWS much more than you’d guess for a decentralized cryptocurrency storage product. Feel free to email me (jt@storj.io) if some lobste.rs community members want some free storage to try it out: https://tardigrade.io/satellites/
Friend, I’ve been following your work for ages and have had no real incentive to try it. As a distributed systems nerd, I love what you’ve come up with. The thing which worries me is this bit:
decentralized cryptocurrency storage product.
I’m actually really worried about the cryptocurrency part of this, since it imbues an otherwise-interesting product with a high degree of sketchiness. Considering that cryptocurrency puts you in the same boat as Bitcoin (and the now-defunct art project Ponzicoin), why should I rethink things? Eager to learn more facts in this case. Thanks for taking the time to comment in the first place!
I guess there’s a couple of things you might be saying here, and I’m not sure which, so I’ll respond to all of them!
On the technical side:
One thing that separates Storj (v3) from Sia, Maidsafe, Filecoin, etc, is that there really is no blockchain element whatsoever in the actual storage platform itself. The whitepaper I linked above is much more akin to a straight distributed systems pedigree sans blockchain than you’d imagine. Cryptocurrency is not used in the object storage hotpath at all (which I continue to maintain would be latency madness) - it’s only used for the economic system of background settlement. The architecture of the storage platform itself would continue to work fine (albeit less conveniently) if we swapped cryptocurrency for live goats.
That said, it’s hard to subdivide goats in a way that retains many of the valuable properties of live goats. I think live goats make for a good example of why we went with cryptocurrency for the economic side of storage node operation - it’s really much more convenient to automate.
As a user, though, our primary “Satellite” nodes will absolutely just take credit cards. If you look up “Tardigrade Cloud Storage”, you will be able to sign up and use the platform without learning one thing about cryptocurrency. In fact, that’s the very reason for the dual brands (tardigrade.io vs storj.io)
On the adoption side:
At a past cloud storage company I worked at before AWS existed, we spent a long time trying to convince companies it was okay to back up their most sensitive data offsite. It was a challenge! Now everyone takes it for granted. I think we are in a similar position at Storj, except now the challenge is decentralization and cryptocurrency.
On the legal/compliance side:
Yeah, cryptocurrency definitely has the feeling of a wild west saloon in both some good ways and bad. To that end, Storj has made a significant investment in corporate governance. There’s definitely a lot of bad or shady actors in the ecosystem, and it’s painfully obvious that by choosing cryptocurrency we exist within that ecosystem and are often judged by the actions of neighbors. We’re not only doing everything we can to follow existing regulations with cryptocurrency tokens, we’re doing our best to follow the laws we think the puck could move towards, and follow those non-existent laws as well. Not that it makes a difference to you if you’re averse to the ecosystem in general, but Storj has been cited as an example of how to deal with cryptocurrency compliance the right way. There’s definitely a lot of uncertainty in the ecosystem, but our legal and compliance team are some of the best in the business, and we’re making sure to not only walk on the right side of the line, but stay far away from lines entirely.
Without going into details I admit that’s a bit vague.
Anyway, given the length of my response you can tell your point is something I think a lot about too. I think the cryptocurrency ecosystem desperately needs a complete shaking out of unscrupulous folks, and it seems like that’s about as unlikely to happen as a complete shaking out of unscrupulous folks from tons of other money-adjacent industries, but perhaps the bar doesn’t have to be raised very far to make things better.
Only if the person hosting it turns off their server? IPFS isn’t a storage system like Freenet, but a protocol that allows you to fetch data from anywhere it is stored on the network (for CDN-style distribution, bandwidth sharing, and being harder to block). The person making the content available is still expected to bother storing/serving it somewhere themselves, just like with the normal web.
Factor was the first open-source project I ever contributed to and still a language I find fascinating. Always nice to encounter it again - like meeting an old friend! (Speaking of, hi doublec - I remember that username from #concatenative in ~2007)
Hi jamesvnc, I remember you from #concatenative too. “like meeting an old friend” is exactly how I feel when I fire up Factor too. The mid 2000s was a flurry of activity in Factor - so many good things came out of it.
I’m interested in reading more about it too. With ipfs you can trivially find out the IP address of the users that are seeding the content. There are ipfs cli commands to do it.
Dat, ipfs, scuttlebutt are all pull based. If I provide content on the network then I start off as the only seeder, then other users pull from me when they want the content. If I turn my device off before at least one other seeder has all the content then the content is unavailable until I turn it back on. How do you know when it’s safe to turn off the device? The general approach seems to be to set up a 24x7 server to seed the content or use a pinning service. Another issue with this, in my use of IPFS in the past, has been a thundering herd effect when initially announcing available data. If it’s popular anticipated content then a huge number of users will attempt to get the data initially but there’s only one provider and seeding drops to a crawl. Maybe this has been improved recently.
I like the Freenet approach where the data is pushed out into the network to multiple nodes. Once inserted the device can be turned off but the content remains out there. There’s a defined point where you know it’s safe to shut your node down, have nothing running anywhere, but users can still retrieve the content.
Brave is just a browser which hides ads (which every browser should do, because ads are a cancer on the Internet) and displays its own (which no browser should do — but users are free to install whatever software they want). And hey, it even adds a way for sites to make money if they want.
I don’t use Brave, I’ve never downloaded it, but it’s a-okay by me. Don’t want people to view your content without paying? Then don’t display it to them.
Besides potentially putting ads on sites that have deliberately chosen not to run any, Brave has done some other sketchy things. I don’t know if they still do, but they used to run those “we’re fundraising on behalf of this site” notices showing on sites that had no affiliation with them at all. Hopefully Eich finally got or listened to some lawyers and was told why that’s a bad idea, but it’s always seemed to me to be one of the classic desperate tactics of a certain class of crypto-fundraising scam.
There should at the very least be some program for websites to say that no, they’re not interested in money from Brave (that’s the idea, right? Brave puts ads on the website and gives a portion of the revenues to the website owner?).
My understanding is that Brave removes a site’s ads, and adds their own, then holds the money made by the impression hostage, splitting the money with the content creator if they ever come forward.
Brave does not add their own ads to a site. They block the site’s ads and provide a way for people to tip registered publishers, or auto-donate a share of a set amount based on time spent on the site. If the site is not registered they don’t receive the tips and they are returned to the donator.
Brave has its own ads that are part of the application and appear as pop up notifications unrelated to the site being visited. These are opt-in and users get a share of the ad revenue for seeing them.
If the site is not registered they don’t receive the tips and they are returned to the donator.
Right, so by using Brave you’re standing on Madison Avenue in NYC screaming “HERE’S PAYMENT FOR YOUR CONTENT!” but their office is actually in Hollywood. It’s not stealing if they don’t accept my payment, right?
Ko te mea whakarapa kē, I’m not familiar with American geography so don’t get your analogy. But it doesn’t matter - I wasn’t debating Brave’s model, I was correcting your misunderstanding of how Brave works.
I get it. Thanks for pointing out my misunderstanding! As for US geography, the two locations are on opposite sides of the US. Point being, if I try to pay you, and you don’t take my money because I am trying to pay in the wrong place, I didn’t pay.
Bookstores could choose to sell books in shrink wrap, but choose not to.
If ad based businesses wanted to give their content away for free, they wouldn’t put ads on their pages. It’s all about intent, and by blocking ads, you intend to deprive the creator from their source of revenue for your use of their content. Why isn’t that theft?
Bookstores could. If I opened the shrinkwrap, read the book, put it back on the shelf and left, I would not have committed theft (possibly property damage).
Websites could refuse to show me their content until I view an ad. They could even quiz me on the ad to make sure I paid attention. If I somehow circumvent that, I’m committing illegal access to a computer system (which, I believe, is a felony in the USA).
Theft deprives the victim of property, which is taken by the thief.
Now, you could argue that it’s wrong (fwiw, I’m sympathetic to that view), but if you use words contrary to their straightforward definitions (in law), I’m going to call bullshit.
It seems they can add ads to sites without ads. The original article complains about this (for example, in the very last paragraph). I wonder where the ads are added, and I would also be worried if ads were presented on my own ad-free website.
Definitely; if a portion of my userbase started seeing ads on my personal website, I would seriously consider at least adding a banner or something telling them that the ads they see aren’t mine and that their browser is adding ads on my ad-free page.
Actually, I should probably get Brave and check out how that whole thing works.
Ah, if it just blocks ads in the web content area and keeps all ads in the chrome or in notifications or someplace else where it’s obviously from the browser itself, that’s not really an issue at all.
I suspect Facebook’s longer term plans, were they to actually launch Libra, are to move to a permissionless proof-of-stake model […]
some of the testnets we are running for other BFT protocols are operating at those performance levels or above
Is it actually sound? I don’t know, it’s brand new and looks unfinished to me […]
Right now Libra is missing many features of other proof-of-stake blockchains like governance, however these mechanisms can be used to implement things like payment reversals […]
Nothing in the rebuttal other than the cryptography section sounds particularly strong.
The cryptography section isn’t wrong but it misses my point entirely. I claim that the more mission critical the cryptography is in a piece of software the more scrutiny it should be given and that these libraries are relatively new and not as tested as older ones. A company with resources like Facebook could have put a lot more resources into this and it would have been good for the Rust community for this to happen as it makes the whole ecosystem more robust.
When you’re going before Congress to testify that this system can safely handle private user data, the QA bar should be much higher for your software, because the public’s interest is involved. I thought this was a fairly uncontroversial opinion.
I agree, it isn’t a rebuttal of my original points. It’s more a statement of blind faith that the technology just needs to be invented and that the public should just trust that currently intractable problems will just be somehow solved before it goes live. I don’t have any faith in blockchain tech, I only look at what exists and is provable today.
Cool, it’s Shen! I became aware of this language years ago but have never really seen any content about it (I think it was closed source / not free for a while?)
I enjoyed this post, and I have one (probably stupid) question: do you actually have to type out lines of -------------- or =========== in the REPL when defining datatypes?
Yes, the _____ and ===== lines have to be typed in the REPL. You don’t need to match them to the length of the code, just one _ or = suffices. So these are equivalent:
N : number;
__________
N : integer;
N : number;
_
N : integer;
In the REPL I make it short for ease of typing, in files I make it long to look nicer.
FLENG’s author, @bunny351, has a Forth implementation and a microFLENG implementation for uxn.
What Prolog system are you using?
swipl
One of the exercises I had to do as a student was implement a metacircular evaluator for Prolog (i.e. a Prolog program that takes Prolog programs as input and evaluates them). It was quite eye opening how little code this needed. Erlang was originally implemented in Prolog for much this reason: it’s a really fast way of writing an interpreter for an AST. It was later rewritten because it’s definitely not a way of writing a fast interpreter. For rapid prototyping AST interpreters, Prolog is excellent: the language is basically a graph-traversal engine.
For anyone curious how small a Prolog metainterpreter can be, check out https://www.metalevel.at/acomip/
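For flavour, the classic three-clause “vanilla” metainterpreter is just this (a sketch: a usable one also has to handle built-ins, cut, negation and so on):

solve(true).
solve((A, B)) :- solve(A), solve(B).
solve(Head) :- clause(Head, Body), solve(Body).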
I used a similar approach for generating error messages in a data validator system. It runs Prolog goals in a metainterpreter and collects failures, then transforms the failing goal into an appropriate error message.
One of the most powerful aspects of Prolog is how flexible the execution strategy can be. It’s a very cool language. Being able to query your own source code is super handy.
That’s really interesting: I’ve been wondering how a meta interpreter to provide error messages might work.
How did you decide which failing goals should be reported? If a disjunction fails, it must be that every branch fails, so do you report multiple errors?
For example, if the rules are like:
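Presumably something like this (my guess at the rules the question assumes):

valid(leaf).
valid(branch(Left, Right)) :- valid(Left), valid(Right).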
Then why does valid(branch(a, leaf)) fail? It seems like there are many explanations:
a is not valid
a is not a leaf
a is not a branch(_, _)
branch(a, leaf) is not a leaf
The last reason is technically correct but seems less helpful since it does partially match.
Do you collect and report all of these? Try to prune some of the less helpful ones?
Great question. This was for $work and I ended up going with a (relatively) simple implementation that might be a bit disappointing, but I think it could definitely be expanded on.
In this example, it would end up reporting valid(a) as the failing goal.
I added some extra syntax using {}/1 for silently failing goals, useful for guard conditions that we don’t want to report.
A simplified example of a way to generically validate enum values would be something like:
Without the {guard} goal, this would end up producing errors for every field that isn’t an enum.
The downside of this is that plain facts, with the ground value in the clause head, would need to be written with that value moved into an explicit unification in the body instead. But in practice it didn’t affect us because we could express the same conditions using other constructions (like the enum example above). It would be possible to have the metainterpreter transform clauses to move ground terms in the 2nd argument from the head to the body of the clause to work around this.
I think there is a lot of room for improvement in this system, and I’d like to play around with open sourcing a fancy one someday that lifts some of these restrictions. In practice this was “enough” to do what we needed, but I think it would be fun to see how far you could take this.
Can you tell whether Mercury supports all the Prolog features that make it easy to implement an interpreter?
If so, you might be able to approximate the best of both worlds if you use Mercury, as it should be significantly faster than Prolog. In fact, its authors claim that Mercury is the fastest logic language in the world by a wide margin (according to Wikipedia).
Mercury supports functional programming features that make it easier to implement interpreters but it has differences from Prolog that mean you take a different approach. More like Haskell or SML than Prolog. In Mercury you can’t implement a metacircular evaluator and thereby extend the Mercury system. Mercury also differs from Prolog in that you can’t have holes in data structures that are represented as unbound logic variables to be bound at a later time. On the other hand Mercury has a lot of great features that make it an interesting language in its own right.
You are not going to get a very fast interpreter this way because AST interpreters are pretty slow regardless of what language you write them in. The fact that Prolog is pretty slow is only a (large) contributing factor.
That definitely was true, but I think Graal has shown that it doesn’t have to be. With their model, you do type specialisation in the AST as you interpret it. There’s a lot of overhead in traversing an AST (even in comparison to bytecode execution: on the order of 10-20 instructions to move to the next AST node rather than 2-3 to fetch and start executing the next bytecode) but that can often be amortised by doing more work. If each AST node represents a few hundred instructions worth of work, then you won’t get much speedup going to a bytecode interpreter or even a JIT. This can impact how you design the language. For example, if you have an AST node to multiply together two integers, the performance will be totally dominated by the cost of dispatch. If you have an AST node to multiply together two 1024x1024 matrices, the interpreter overhead will be in the noise.
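To make the dispatch-cost point concrete, here is a rough C sketch of a tree-walking interpreter node (illustrative only, not from any particular system): every evaluation step pays for an indirect call plus pointer chasing, which swamps a single integer multiply but would vanish next to a 1024x1024 matrix multiply.

#include <stdio.h>

typedef struct Node Node;
struct Node
{
    long (*eval)(const Node *); /* the per-node dispatch: an indirect call */
    const Node *lhs, *rhs;
    long value;
};

static long eval_const(const Node *n) { return n->value; }

static long eval_mul(const Node *n)
{
    return n->lhs->eval(n->lhs) * n->rhs->eval(n->rhs);
}

int main(void)
{
    static const Node six = { eval_const, 0, 0, 6 };
    static const Node seven = { eval_const, 0, 0, 7 };
    static const Node product = { eval_mul, &six, &seven, 0 };
    printf("%ld\n", product.eval(&product)); /* prints 42 */
    return 0;
}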
I’m lightly familiar with LISPs (dabbled in Clojure like 10 years ago) and not at all with Factor - so if someone could ELI5 what that claim about metaprogramming means (and why that’s important/useful) I’d appreciate it!
You can write “parsing words” in Factor that read the tokens in the source code and generate the Factor code to be compiled from them. You have full control over the syntax if you do this. For example, here is a post I wrote on embedded grammars.
I’m still at a loss why anyone would knowingly use the Chrome browser. It was created with exactly one purpose: To help Google track you better. Oh wait, it wasn’t Google wanting to “give something back” or “donate something for free out of the kindness of their hearts”? Nope. It was created as a response to browsers like Firefox and Safari that were slowly improving their privacy settings, and thus reducing Google’s ability to violate your privacy for a fee.
And if you’re curious, Google didn’t create and give away fonts out of the kindness of their hearts, either. If you’re using Google fonts, you aren’t anonymous. Anywhere. Ever. Private mode via a VPN? Google knows who you are. Suckers. Seriously: How TF do you think they do perfect fingerprinting when everyone is using only a few browsers on a relatively small number of hardware devices?
TLDR - Google dropped the “don’t be evil” slogan for a reason.
To be fair, they also wanted their web apps to run better. They went with Google Chrome rather than making Gmail and Docs into desktop apps. If the owner of the platform I make my apps on is a direct competitor (Microsoft Office vs Google Docs), I wouldn’t be happy. Especially when that competitor platform sucks. Now that Chrome holds the majority of market share, Google can rest easy knowing that their stuff runs how they want for most users. Chrome also pushed the envelope for browser features they directly wanted to use in their apps.
The tracking and privacy thing is a LOT more pronounced now than it was in 2008 when Chrome came out. That’s definitely an issue that’s relevant today, but you can’t really pretend it was the sole driving force of the original release of Google Chrome.
Note: I don’t use Chrome.
I knew that Google was building Chrome for the purpose of tracking back when it was still in development, based on private comments from friends at Google. I don’t know if that was 2008, but it was somewhere in that time period. Yes, they needed a better browser experience to support some of their product goals, but Google’s overwhelmingly-critical product is tracking users, and protecting that one cash cow is important enough to give away gmail and browsers and fonts and phone OSs (and a thousand other things) for free.
Google loses money on pretty much everything, except advertising. And despite whatever the execs say in public, they’re actually quite content with that situation, because the advertising dollars are insane.
“If you’re not paying for the product, then you are the product.”
They could have done that by funding development in Firefox.
It would have been hard to work within an existing technical framework, especially considering that Firefox in 2008 or whatever was probably saddled with more tech debt than it is today, but it’d certainly be an option.
You can’t strong-arm the web into adopting the features you want by simply funding or contributing to Firefox.
And it’s not clear to me that Google would’ve been able to get Mozilla to take the necessary steps, such as killing XUL (which Mozilla eventually did many many years later, to compete with Chrome). And sandboxing each tab into its own process is probably also the kind of major rework that’s incredibly hard to pull off when you’re just an outsider contributing code with no say in the project management.
I get why Google wanted their own browser. I think they did a lot of good work to push performance and security forwards, plus some more shady work in dictating the web standards, in ways that would’ve been really hard if they didn’t have their own browser.
I still feel a bit bitter about the retirement of XUL. Back in the mid-2000s you could get a native-looking UI with advanced controls running within days. I haven’t seen anything that comes close to that speed of development so far, except maybe Visual Basic?
They essentially do (and did) fund the development of Firefox.
Yeah, which I’m sure very conveniently prevents them from attracting too much anti-trust attention, the same way that Intel or NVidia don’t just buy AMD. But I doubt they pay any developers directly to contribute to Firefox, the way that for example AMD contributes to Mesa, Valve contributes to WINE, Apple contributes to LLVM, etc.
There’s a difference between not crushing something because its continued existence is useful to you, and actually contributing to it.
On one hand, you’re totally right. Throwing cash at keeping other browsers alive keeps their ass away from the anti-trust party.
On the other hand, again, between 75% to 95% of [Mozilla’s] entire yearly budget comes from Google. At that volume of financial contributions, I don’t think it matters that they’re not paying Google employees to contribute to Firefox—they’re literally bankrolling the entire organization around Firefox, and by extension basically its paid developers.
That’s probably fine if they don’t have a say in technical or business decisions.
They pretty much were back then. At the time Google weren’t happy with the uptake of Firefox vs IE, despite promoting FF on their own platforms, and wanted to pursue the option of their own browser. Mozilla weren’t particularly well known for being accepting of large contributions or changes to their codebase from third parties. There was no real embedding story either which prevented Google from going with Gecko (the Firefox browser engine) as the base instead of WebKit.
And yet, Google Chrome was nicknamed “Big Browser” from the start.
Chrome was the first browser to sandbox Flash and put Java behind a ‘click to play’. This was an extreme game changer for security.
Expanding on that, Chrome was the first browser to build sandboxing into the product from day 1. This was an extreme game changer for security.
Between (1) and (2) the threat landscape changed radically. We went from PoisonIvy and BlackHole exploits absolutely running rampant with 0- or 1-click code execution to having next to nothing in a few years - the browser exploit market, in that form, literally died because of Chrome.
Continuing on,
Firefox had tons of annoying bugs that Chrome didn’t. “Firefox is already running” - remember that? Chrome had bugs but ultimately crashes were far less problematic, only impacting a tab. Back then that really mattered.
Chrome integrates with GSuite extremely well. Context Aware Access and browser management are why every company is moving to enforce use of Chrome - the security wins you get from CAA are incredible, it’s like jumping 10 years into the future of security just by choosing a different browser.
Whether that’s true or not, the reality is that for most of Chrome’s lifetime, certainly at least until very recently, there were very very few meaningful privacy issues (read: none, depending on your point of view) with the browser. Almost everything people talked about online was just a red herring - people would talk about how Chrome would send out LLMNR traffic like it was some horrible thing and not just a mitigation against attacks, or they’d complain about the numerous ways Chrome might talk to Google that could just be disabled and were often even part of the prompts during installation.
I don’t see why it’s hard to believe that Google wanted more control over the development of the major browser because they are a ‘web’ company and controlling web standards is a massive competitive edge + they get to save hundreds of millions of dollars by not paying Firefox for google.com to be the homepage.
https://www.google.com/chrome/privacy/whitepaper.html
Chrome has been publishing whitepapers around its features for a long time. I don’t keep up anymore and things may have changed in the last few years but there was really nothing nearly as serious as what people were saying.
Just to be clear, what you’re talking about is, I assume, the fact that if a website loads content (such as fonts) from a CDN then your browser makes requests to that CDN. Google discusses this here, although I’m not sure why this behavior is surprising:
https://developers.google.com/fonts/faq#what_does_using_the_google_fonts_api_mean_for_the_privacy_of_my_users
https://developers.google.com/fonts/faq/privacy
Is there some other aspect of Google Fonts that you’re referring to? Because I’ve really lost my ability to give statements like “Google Fonts tracks you” any benefit of the doubt after a decade of people misunderstanding things like “yes, a CDN can see your IP address”.
Who says they do perfect fingerprinting? Also since when are there a relatively small number of hardware devices? An IP, useragent, and basic device fingerprinting (“how big is the window”) is plenty to identify a lot of people.
Infosec people love Chrome for architectural reasons. Which just goes to show that privacy and security are separate concerns that only marginally overlap.
I agree, Privacy is totally separate from Security. That said, I do not believe Chrome has represented a privacy concern since its inception - at least until recently, which I only say because I no longer care to follow such things.
*$font,third-party
Slap this bad boy in your uBlock Origin config & you won’t be downloading any fonts from third-party sites. But do be warned that laggards are still using icon fonts (even on contemporary sites!) despite it not being the best practice for icons for over a decade.
Out of interest, what is current best practice? I have stopped following most of the frontend stuff over a decade ago.
https://css-tricks.com/svg-symbol-good-choice-icons/ (which has further links, and it’s from 2014, which shows how vintage icon fonts are compared to using vector graphic symbols via SVG)
I use it because (among other reasons) I want protection against viruses more than against Google. The last thing I heard about Firefox security (I don’t follow it actively) was Patrick Walton commenting that they have made significant improvements but still have never caught up to Chrome on security. I want Chrome’s security for lunch, there is no free lunch, and I’m okay with paying for that lunch in ad targeting data. With JavaScript disabled by default (for security), I never even see many of the ads that they might spend that data on targeting.
Your attitude is a good one: You’re conscious about what they’re probably trying to do with data about you, and you accept the trade-off for the services provided, and you do so consciously.
If Google were above-board in what they’re doing, I’d be 100% thumbs up for their behavior. But they’re not.
A major financial reason for Chrome is to save the cost of paying browser vendors to make Google the default search engine. Google pays Apple like $10 billion a year for this purpose on Safari. This is why Microsoft aggressively promoting Edge is such a threat to Google - fewer users using Google.
How do they track using their fonts?
When I looked at this many years ago, I obviously had the same exact question, but I don’t actually have an answer. The same browser version (stock) with the same config (stock) on the same OS and OS version (stock) on the same hardware (stock) with the “same” Google fonts apparently generates a different Google fingerprint, apparently even in private browsing mode through the same VPN IP. Drop the Google fonts, and the fingerprint is apparently identical. It’s been a decade since I looked at any of this, so my memory is pretty fuzzy at this point. And of course Google doesn’t provide any hints as to how they are unveiling users’ identities; this is a super closely guarded trade secret and no one I knew at Google even gave me any hints. (My guess is that they are all completely unaware of how Google does it, since the easiest way to maintain secrets is to not tell all of your employees what the secret is.)
The Google fingerprinting does a render on a hidden canvas (including rendering text using fonts) and then sends Google a hash of that render. Somehow the use of Google fonts (note: specifically when downloaded from Google, which is what most web sites do) appears to give different users their own relatively unique hash. If I had to guess (WAG WARNING!!!), I’d suggest that at least one of the most widely distributed fonts is altered ever-so-imperceptibly per download – but nothing you can see unless you render large and compare every pixel (which is what their fingerprint algo is doing). Fonts get cached for a year, so if (!!!) this is their approach, they basically get a unique ID that lasts for the term of one year, per human being on the planet.
If you examine their legalese, you’ll see that they carefully carve out this possible exception. For example: “The Google Fonts API is designed to limit the collection, storage, and use of end-user data to what is needed to serve fonts efficiently.” Right. They don’t need to collect or store anything from the Fonts API. Because your browser would be doing the work for them. Similarly, “requests for fonts are separate from and do not contain any credentials you send to google.com while using other Google services that are authenticated, such as Gmail.” So they went out of their way to provide the appearance of privacy, yet somehow their fingerprinting is able to defeat that privacy.
The only thing that I know for certain is that Google hires tons of super smart people explicitly to figure out how to work around privacy-protecting features on other companies’ web browsers, and their answer was to give away fonts for free. 🤷♂️
I’m not normally accused of being a conspiracy theorist, but damn, writing this up I sure as hell feel like one now. You’re welcome to call me crazy, because if I read this shit from anyone else, I’d think that they were nuts.
That’s really ingenious, if true. To go along with supporting your theory, there is a bug open since 2016 for enabling Subresource Integrity for Google Fonts that still isn’t enabled.
I’m a bit sceptical about the concept; it seems like it comes with an enormous number of downsides - fonts are not light objects, and really do benefit from caching. Whereas merely having the Referer header of the font request, in addition to timing information and what is sent with the original request (IP address, user agent, etc.), seems perfectly sufficient in granularity to track a user.
This feels too easy to detect for it not to have been noticed by now - someone would have attempted to add the SRI hash themselves and noticed it break for random users, instead of the expected failure case of “everyone, everywhere, all at once”.
The fonts are constantly updated on Google fonts at the behest of the font owners, so the SRI hash issue being marked as WONTFIX isn’t very exciting, as I wouldn’t be surprised at it being legally easier for Google to host one version of the font (as Google is often not the holder of the Reserved Font Name), as the Open Font License seems to be very particular about referring to fonts by name. Reading through the OFL FAQ (https://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL-FAQ_web), if I were a font distributor I would be hesitant to host old conflicting versions of the font amongst each other. Plus, easier to cache a single file than multiple, and lower effort on the side of a font foundry, as it means they do not need to have some versioning system set up for their font families (because they’re not just one weight, type, etc).
The fonts not being versioned goes beyond the SRI hash benefits; fonts often have breaking changes[0] in them, e.g. https://github.com/JulietaUla/Montserrat/issues/60, so a designer has no way to know that future font changes won’t result in any changes to the page. So in my mind, it really feels like it’s the foundry that wants there to be a single authoritative version of the font.
0: I suppose a semver major version bump in the font world would be character width changes.
Even if cpurdy’s version is true, I’m sure they use every normal and not so normal trick in the book to track as well. If your singular goal is identifying users uniquely, you would be stupid to rely on only 1 method. You would want 100 different methods, and you would want to deploy every single one. So if a competing browser vendor or unique edge case happens to break a single method, you don’t really care.
I agree caching of fonts is useful, but the browser would cache the most common fonts locally anyway. It would behoove Google to set the cache lifetime of the font file as long as practically possible, even if they were not using it to track you.
I agree, fingerprinting is a breadth-of-signals game, but I just can’t believe this vector; it feels way too technically complicated compared to other methods available within the same context - the idea was that minute changes in the font result in different canvas render hashes, but a user already has a lot of signals within JS & canvas (system fonts, available APIs, etc.) that are much quicker to test.
Fonts are cached per site by the browsers as a means of avoiding fingerprinting via cross-domain timing effects - Safari & Chrome call it partitioned cache; Firefox, first-party isolation. So Google can tell you’ve visited a site, as the referer gets sent on first load - unless the site sets Referrer-Policy: no-referrer, of course.
I agree it’s technically complicated, but I imagine they want a variety of hard and not very hard methods. Assuming they do it, perhaps they only run it when they can’t figure out who you are from some other, easier method.
Considering the other BS that big companies tend to do, my first thought on this is basically https://www.youtube.com/watch?v=WRWt5kIbWAc
You win the interwebs today. The tick is the greatest superhero ever. ❤️
I always have a degoogled Chrome fork installed as a backup browser, in case I have website compatibility problems with Firefox. Some of my problems might be caused by my Firefox extensions, but it’s easier to switch browsers than to start disabling extensions and debugging.
On desktop I use Ungoogled Chromium. On Android I use Bromite and Vanadium. My Android fork is GrapheneOS, which is fully degoogled by default. I am grateful that Google created Android Open Source to counteract iOS, it is an excellent basis for distros like Graphene. I use F-Droid for apps.
Also, I have the Google Noto fonts installed as a debian package (fonts-noto). It’s a great font that eliminates tofu, and I thank Google for creating it. I don’t think Google is spying on my debian installed package list. If I don’t connect to google servers, they can’t see the fonts used by my browser.
I primarily rely on Ublock Origin for blocking spyware and malware, including Google spying. It’s not perfect. I can always use Tor if I feel really paranoid. The internet isn’t anonymous for web browsers if you are connecting with your real IP address (or using the kind of VPN that you warned about). Google isn’t the only surveillance capitalist on the web; I expect the majority of sites spy on you. Even Tor is probably not anonymous if state actors are targeting you. I wouldn’t use the internet at all if I was concerned about that.
This seems ahistoric to me, when Chrome was created, most popular browsers were IE6 and IE7?
Chrome initially came out in late 2008, when Firefox and Safari were actually doing OK, and IE8 was just around the corner, its betas were already out on Chrome’s release date. Chrome wasn’t even a serious player* until about 2010 or 2011, by which time IE9 was out and IE 6 was really quite dead. This article from June 2010 has a chart: https://www.neowin.net/news/ie6-market-share-drops-6-ie8-still-in-top-spot/
You can see IE8 and Firefox 3.5 were the major players, with Safari on the rise (probably mostly thanks to the iphone’s growing popularity).
Firefox was the #2 browser (behind only IE*) when Chrome was introduced, and at the time it was still growing its market share year after year, mostly at the expense of IE.
After its release, Chrome quickly overtook Safari (then #3), and proceeded to eat almost all of IE’s and Firefox’s market share. It is now the #1 browser, by a significant margin.
Interestingly, Safari did not yield market share to Chrome, and continued to grow its market share, albeit at a much lower rate than Chrome did. I assume that this growth is based on the growth of iPhone market share, and relatively few iPhone users install Chrome. Today, Safari is now solidly the #2 browser behind Chrome.
Edge (the new IE) is #3.
Firefox has dropped to the #4 position, in a three-way tie with Opera and Samsung.
Agreed. I’m not sure if IE7 was a thing until after chrome. Also when Chrome first came out it was a breath of fresh air because at the time you either had to use Firefox or Opera, both of which had the issue of sites breaking that were made with IE in mind or the whole browser locking up because one site was hung. While I won’t speculate that tracking was a primary goal of Chrome development let’s not pretend that it wasn’t leaps and bounds ahead of what else was available at the time on the IE6 centric web.
Chrome was definitely aimed directly at IE, most likely because they couldn’t bribe MS to default to Google search and because its outdated tech made the push to web apps much harder - consider the fact that early versions didn’t run on anything other than Windows (about 6 months between 1.0 and previews for Mac and Linux), and the lengths they went to get sandboxing to work on WinXP.
I think it’s fair to say that Firefox did have an impact - but it wasn’t that Chrome was created as a response, rather that Firefox defeated the truism that nothing could dethrone IE because it was built into Windows.
I generally don’t like when technology companies use their product to push some ideological agenda, so I would probably choose Chrome over Firefox if I had to choose between only those two. Also, the new Firefox tabs waste a lot of screen space, and they didn’t give any official way to return to the previous look, so that’s another argument (the last time I used FF I had to hack through some CSS, which stopped working a few updates later). The only thing I miss from FF are tab containers, but that’s about it.
But still, I use Vivaldi, which runs on Blink, so I’m not sure if I match your criteria, since your question is about “Chrome browser” not “Chrome engine”.
My work uses Google apps heavily and so I maintain a personal/work distinction in browsing by routing everything for work to Chrome. Privacy is a technical dead letter.
Yeah, I obviously have to use Chrome a bit, too. Because as a developer, ignoring the #1 browser seems stupid, any way you look at it. And a few sites only work on Chrome (not even on Safari or Firefox). I try to avoid Edge except for testing stuff, because the nag level is indescribable. “Are you super double extra sure that you didn’t want me to not be not your not default web browser? Choose one: [ Make Edge the default browser ] [ Use Microsoft suggested defaults for your web browser choice]” followed by “It’s been over three minutes since you signed in to your Microsoft cloud-like-thingy user account that we tied to your Windows installation despite your many protestations. Please sign in again, and this time we’ll use seven factor authentication. Also, you can’t not do this right now.” And so on.
I abhor and abjure the modern web, but we all have to live in it. On my Mac I use an app called ‘Choosy’ which lets me reroute URLs to arbitrary browsers, so I can use Safari without worry, given that I send all the garbage to either Chrome or a SSB.
It has been a long time since I lost track of Mozart/Oz and Alice ML. I sort of remember that their move from 32 bits to 64 bits was a hurdle, as they were lacking a volunteer workforce, IIRC. Do you advise taking the time to look at them again?
I think Oz/Mozart was a very interesting project, ahead of its time in some ways, but you’re right that the language is not evolving and there are no signs of that changing. (From what I understand, it is still used in teaching in some places.)
What I found especially interesting about Alice ML was the module system, which implements some improvements over other ML module systems. (Some of the same ideas were/are part of the 1ML and SuccessorML initiatives, I think.)
In the case of Mozart/Oz, they worked on a ground up rewrite for 64-bit support called Mozart 2 in C++. It doesn’t have all the features of the original implemented yet. For example, distributed objects. I worked on porting the existing 32-bit engine, based on the 1.3.x series for compatibility with the book, to 64-bit. This works and runs and all tests pass. I still use it occasionally.
I’ve read through this previously, and I’m intrigued but I apparently haven’t spent enough time to fully grok the why, how, and all that jazz. I’m not coming from a strong logic programming background, though. Have you given talks, or written anything that provides an introduction that might create better intuition on rationale, target use cases, etc?
Actually, this type of language has much different (and simpler) semantics than Prolog, which means that one doesn’t need to be a Prolog aficionado. The main similarities are the syntax, logic variables and the concept of (output) unification. As doublec has pointed out, there is an article by me that tries to give an introduction to the topic. The Strand book is highly recommended, very accessible, and explains the basic concepts in a step-by-step manner.
FLENG/FGHC/Strand (and other languages of this family) are highly parallel, which has interesting implications: concurrency is always there and you concentrate on what to make sequential (there are several strategies and tools for this). Stream processing is a very powerful abstraction mechanism, and logic variables with the possibility of data structures having “holes” allows straightforward parallelisation of tasks. I would think this is very handy for GUIs, games and all sorts of simulations.
It’s a different way of programming, but it really opened my mind to dive deeper into these concepts. This implementation is an attempt to be simple, yet practical. The language makes multi-threaded programming much easier, compared to traditional techniques, which are, IMHO, lacking in elegance and convenience.
Have you seen their The Joy of Concurrent Logic Programming article? The Strand book is also a great read.
Hmm. I’ve not seen “The Joy” article! I’ll take a look. Thanks!
I can’t believe I wrote the Usenet post about Dylan referenced in this article 22 years ago - after all that time are there any mainstream languages using conditions/restarts?
Smalltalk doesn’t have exceptions as a language-level construct. Instead, blocks (closures/lambdas) can return from the scope in which the block was created (assuming it’s still on the stack). You implement exceptions by pushing a handler block onto the top of a global stack, invoking it at the ‘throw’ point, and having it return. Smalltalk ‘stacks’ are actually lists of activation records where every local is a field of the object and amenable to introspection, so you can also use it to walk up the stack and build resumable or restartable exceptions. Not really a mainstream language though.
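The same shape, transliterated into a very rough C sketch (this is my illustration, not Smalltalk, and every name in it is invented): a global stack of handlers that the “throw” point walks; a handler can patch things up and let execution resume, or unwind to the frame where it was established.
#include <setjmp.h>
#include <stddef.h>
enum outcome { RESUME, UNWIND };
typedef struct handler {
    enum outcome (*fn)(int condition, void *state);
    void *state;
    jmp_buf unwind_point;            /* where to land if the handler unwinds */
    struct handler *next;
} handler;
static handler *handler_stack = NULL;    /* pushed on entry to a protected scope */
/* The "throw" point: ask the innermost handlers what to do. */
static void signal_condition(int condition) {
    for (handler *h = handler_stack; h != NULL; h = h->next) {
        switch (h->fn(condition, h->state)) {
        case RESUME:
            return;                          /* resumed: carry on from here     */
        case UNWIND:
            longjmp(h->unwind_point, 1);     /* back to the handler's own frame */
        }
    }
    /* no handler established: fall through, or abort() */
}
This only works as long as the establishing frame is still on the stack, which is exactly the caveat mentioned above.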
Perhaps more interesting, the language-agnostic parts of the SEH mechanism on Windows fully support resumable and restartable exceptions. There are some comments about this in the public source in the MSVC redistributable things. To my knowledge, no language has ever actually used them. This is much easier with SEH than with Itanium-style unwinding. The Itanium ABI does a two-pass unwind, where one pass finds cleanups and catch handlers, the second pass runs them. This means that the stack is destroyed by the time that you get to the catch block: each cleanup runs in the stack frame that it’s unwinding through, so implicitly destroys anything that has been unwound through by trampling over its stack. In contrast, the SEH mechanism invokes ‘funclets’, new functions that run on top of the stack with a pointer to the stack pointer for the frame that they’re cleaning up. This means that it’s possible to quite easily shuffle the order in which they are executed and allow a far-off handler to decide, based on the exception object and any other state that it has access to, that it wants to just adjust the object and resume execution from a point in the frame that threw the exception or control the unwinding process further.
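For what it’s worth, a little of that resumption ability is visible even at the C level through MSVC’s __try/__except, where the filter expression can ask for execution to continue at the faulting instruction. A Windows-only sketch (my example, using the guard-page idiom; this is not the funclet machinery described above):
#include <windows.h>
#include <stdio.h>
static char *page;
/* The filter runs while the faulting frame is still live; it can repair the
   cause of the fault and resume instead of unwinding. */
static int fixup(DWORD code) {
    if (code == EXCEPTION_ACCESS_VIOLATION) {
        DWORD old;
        VirtualProtect(page, 4096, PAGE_READWRITE, &old);
        return EXCEPTION_CONTINUE_EXECUTION;   /* retry the faulting store */
    }
    return EXCEPTION_CONTINUE_SEARCH;
}
int main(void) {
    page = VirtualAlloc(NULL, 4096, MEM_COMMIT | MEM_RESERVE, PAGE_READONLY);
    __try {
        page[0] = 42;                          /* faults: the page is read-only */
    } __except (fixup(GetExceptionCode())) {
        puts("not reached: the filter resumed execution instead");
    }
    printf("%d\n", page[0]);                   /* prints 42 */
    return 0;
}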
Oh, and one of the comments in the article talks about longjmp-ing out of a signal handler. Never do this. setjmp stores only the callee-save register state: it is a function call, and so the caller is responsible for saving any caller-save state. Signals are delivered at arbitrary points, not just at call points, and so you will corrupt some subset of your register state if you do this. setcontext exists specifically to jump out of signal handlers. Similarly, you should not throw out of signal handlers. In theory, DWARF unwind metadata can express everything that you need for this (the FreeBSD signal trampolines now have complete unwind info, so you get back into the calling frame correctly), but both LLVM and GCC assume that exceptions are thrown only at call sites, and so it is very likely that the information will be incorrect for the top frame. It will usually work if the top frame doesn’t try to catch the exception, because generally spills and reloads happen in the prolog / epilog and so the unwind state for non-catching functions will correctly restore everything that the parent frame needs.
I think you’re wrong here.
The POSIX standard does have specific things to say about invoking longjmp() from a signal handler, and a further passage on top of that. Reading them together, it seems the extra restriction mentioned before is that longjmp() and siglongjmp() cannot be used from a signal handler that interrupted a non-async-signal-safe function.
There are other restrictions. From what the standard says about them, the function with the setjmp() invocation must be in the same thread and still on the stack (i.e., the invoking function has not returned), and it also has to be in scope (i.e., execution must be in the same C scope as the setjmp() invocation).
In addition, the standard lists the conditions under which the value of a local variable becomes unspecified: it is an automatic local of the function containing the setjmp(), it is not volatile-qualified, and it is modified between the setjmp() and the longjmp(). All three of those conditions have to exist for the value of local variables to be unspecified. If you have a pointer to something on the stack, but in a function above, you’re fine; the pointer doesn’t change, and the contents at the pointer are specified. If you have a pointer to a heap allocation, the contents of the heap allocation are specified. And so on.
All of this is to say that you are broadly right, that code should usually not longjmp() out of signal handlers. But to say never seems a bit much, if you do everything right. Of course, like crypto, you should only do it if you know what you’re doing.
Also, I believe that the passages above mean that if an implementation does not save caller-save registers, it is wrong. The reason is that the compiler could have lifted one local into a caller-save register. If that local does not change between the initial setjmp() and the longjmp() back to it, that caller-save register should have the same value, and if it doesn’t, I argue that the implementation does not follow the standard; the implementation is wrong, not the application, especially since it is legal for an implementation to use a macro. In fact, the standard has several restrictions on how setjmp() can be invoked to make it easier to implement as a macro.
Source: I implemented a longjmp() out of a signal handler in my bc and followed all of the relevant standard restrictions mentioned above. It was hard, yes, and I had to have signal “locks” so the signal handler can tell whether execution was in a non-async-signal-safe function (in which case it sets a flag and returns normally, and when the signal lock is removed, the jump happens then).
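Something like the following is a rough sketch of that “signal lock” idea (my names and structure, not the actual bc code):
#include <setjmp.h>
#include <signal.h>
static sigjmp_buf jump_target;
static volatile sig_atomic_t in_unsafe_region = 0;   /* the "lock"        */
static volatile sig_atomic_t jump_pending = 0;       /* deferred signal   */
static void handler(int sig) {
    (void)sig;
    if (in_unsafe_region) {
        jump_pending = 1;              /* we interrupted e.g. malloc():    */
        return;                        /* set a flag and return normally   */
    }
    siglongjmp(jump_target, 1);        /* safe point: jump immediately     */
}
/* Wrap every call into non-async-signal-safe code with these two. */
static void signal_lock(void)   { in_unsafe_region = 1; }
static void signal_unlock(void) {
    in_unsafe_region = 0;
    if (jump_pending)
        siglongjmp(jump_target, 1);    /* the deferred jump happens now    */
}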
If that is the case, then there are no correct implementations. For example, the glibc version saves only 8 registers on x86-64. Similarly, the FreeBSD version stores 8 integer registers. This means that they are not storing all of the integer register set, let alone any floating-point or vector state. This is in contrast to ucontext, which does store the entire state. The sig-prefixed versions also store the signal mask and restore it.
It looks as if ucontext is finally gone from the latest version of POSIX, but the previous version was explicit about it: longjmp does not store the entire context, and any temporary registers may not be stored.
No, this is why setjmp explicitly requires that all local mutable state that you access after the return is held in volatile variables: to force the compiler to explicitly save them to the stack (or elsewhere) and reload them. If they are local, not volatile, and have changed, then accessing them is UB. Any subset of these is fine:
- Locals that you never access after the second return: anything (including the setjmp call itself) may clobber them and so that’s fine.
- Locals that you don’t modify in between: their value is whatever was established before the call (which must be something that’s preserved across calls).
- volatile locals: every modification is written out, and any access after setjmp returns must reload.
I hope that you mean siglongjmp, not longjmp, because otherwise your signal mask will be left in an undefined state (in particular, because you’re not resuming via sigreturn, the signal that you’re currently handling will be masked, which may cause very surprising behaviour and incorrect results if you rely on the signals for correctness). It’s not UB, it’s well specified, it’s just impossible to reason about in the general case.
Note that your signal-lock approach works only if you have an exhaustive list of async-signal-unsafe functions. This means that it cannot work in a program that links libraries other than libc. Jumping out of signal handlers and doing stack unwinding also requires that you don’t use pthread_cleanup, GCC’s __attribute__((cleanup)), or C++ RAII in your program. I honestly can’t conceive of a situation where I’d consider that a good tradeoff.
EDIT: I am dumb. I realized that under the ABIs currently in use, treating caller-save registers normally would actually be correct. So there’s no special code, and my bc will work correctly under all compilers. I’m leaving this comment up as a lesson to me and for posterity.
Yes, I used siglongjmp(). I apologize if this was wrong, but in your original post, it sounded like you thought siglongjmp() should not be used either, so I lumped longjmp() and siglongjmp() together.
Yes, it looks like longjmp() does not have to restore the full context according to POSIX. (C99 might still require it, and C99 controls; more on that later.) However, it looks like siglongjmp() does. That paragraph mentions them separately, and it explicitly says that siglongjmp() should be used because of the restrictions on the use of longjmp(), which implies that siglongjmp() does not have those restrictions. And if siglongjmp() does not have those restrictions, then it should restore all of the register context.
I agree that it does not have to set floating-point context or reset open files, etc., since those are explicitly called out. But it certainly seems to me like siglongjmp() is expected to restore the registers.
This sounds completely wrong, so I just went through POSIX and checked: there is no such restriction on the compiler, specifically the c99 utility, nor anywhere else it mentions compilers. I also checked the C99 standard, and it says the exact same thing as POSIX. I also searched the C99 standard to see if there were restrictions on the handling of objects of automatic storage duration, and there were none.
So there could be, and are, several C99 compilers that follow the standard, such as tcc, cproc, icc, chibicc, and others, and yet, if someone compiles my bc with one of those compilers on FreeBSD, my bc could fail to work properly.
But even with Clang, does it either treat setjmp() or sigsetjmp() specially, or refuse to promote local variables to caller-save registers? Can you point me to the code in Clang that does either one?
Is a failure in my bc the fault of the compiler? I argue that the C standard does not have any restriction that puts the compiler at fault. Is a failure in my bc my fault? If I had used longjmp() instead of siglongjmp() in the signal handler, yes, it would be my fault. But I did not, and I followed all of the other restrictions on applications, including not relying on any floating-point state.
Thus, I argue that since neither C99 nor POSIX has a restriction on compilers that requires special handling of setjmp(), and since POSIX has no restrictions on applications that I did not follow, neither I nor compilers can be at fault.
In addition, POSIX defers to the ISO C standard for setjmp(), which means the C99 standard controls, and the C99 wording is what matters here. The phrase “information sufficient” in it is crucial; I argue that it is a restriction on the implementation, not the compiler or the application.
Now, we can argue what “sufficient information” means, and I think it might be different for every platform (well, ABI), but if FreeBSD and Linux want to obey the POSIX standard, I think each of them needs to have a conversation about what sigsetjmp() and setjmp() should save. Looking into this for this discussion has made me realize just how little platforms have actually considered what those require.
So yes, I argue that glibc and FreeBSD are wrong with respect to the POSIX standard. They might decide they are right for what they want.
Now, why has my bc worked so well on those platforms? Well, the standard compilers might be helping, but it might also be because my bc has been lucky so far. I don’t like that thought.
So let’s have that conversation, especially on FreeBSD, especially since 13.1 just hit and my bc has begun to be used a lot more. I want my bc to work right, and I’m sure FreeBSD does too.
This is correct. I do want to have zero dependencies other than libc, so it wasn’t a problem.
Because of portability. If I had used any of those, my bc would not be nearly as portable as it is. Making it work on Windows (with the exception of command-line history) was simple. That would not have been the case if I had used any of those, except C++ RAII. But you already know my opinion about C++, so that wasn’t an option either.
But if you are unsatisfied with that answer, then I have another: in the project where I was thinking about implementing a malloc(), I implemented a heap-based stack with the ability to call destructors. With a little clever use of macros, I can apply it to every function in my code, and it is 100% portable! And it also does stack traces!
So I did give myself something like C++ RAII in C.
EDIT: I just want to add that this heap-based stack also made it possible for me to implement conditions and restarts in C, to put things back on topic.
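For what it’s worth, a rough guess at the general shape of such a thing (this is not the author’s code; the names and the macros are invented): a heap-allocated stack of cleanup records that macros push and unwind.
#include <stdlib.h>
typedef void (*dtor_fn)(void *);
typedef struct cleanup {
    dtor_fn fn;                   /* "destructor" to run on unwind        */
    void *arg;
    struct cleanup *next;
} cleanup;
static cleanup *cleanup_stack = NULL;   /* heap-allocated, not the C stack */
static void cleanup_push(dtor_fn fn, void *arg) {
    cleanup *c = malloc(sizeof *c);
    c->fn = fn;
    c->arg = arg;
    c->next = cleanup_stack;
    cleanup_stack = c;
}
/* Pop and run destructors down to a saved mark, e.g. on normal scope exit
   or when a condition handler decides to unwind. */
static void cleanup_unwind(cleanup *mark) {
    while (cleanup_stack != mark) {
        cleanup *c = cleanup_stack;
        cleanup_stack = c->next;
        c->fn(c->arg);
        free(c);
    }
}
/* Macro sugar so every function can bracket its body with a mark. */
#define SCOPE_BEGIN  cleanup *scope_mark_ = cleanup_stack
#define SCOPE_END    cleanup_unwind(scope_mark_)
Because the records live on the heap rather than in the interrupted C stack frames, a handler can walk and run them without any compiler-specific unwinding support, which is presumably what makes the approach portable.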
It’s implicit from the definition of volatile. The compiler must not elide memory accesses that are present in the C abstract machine when accessing volatile variables. If you read a volatile int then the compiler must emit an int-sized load. It therefore follows that, if you read a volatile variable, call a function, modify the variable, and then read the variable again, the compiler must emit a load, a call, a store, and a load. The C spec has a lot of verbiage about this.
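To make that sequence concrete, a small illustration (my example, not from the comment; g() is just an arbitrary external function):
extern void g(void);
int f(volatile int *p) {
    int a = *p;      /* load                    */
    g();             /* call                    */
    *p = a + 1;      /* store                   */
    return *p;       /* load (cannot be folded) */
}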
Clang does have a little bit of special handling for setjmp, but it’s the same handling that it has for vfork and similar returns-twice functions. It doesn’t need anything else, because (from the bit of POSIX that I referred to, which mirrors similar text in ISO C) if you modify a non-volatile local in between the first return and the second and then try to access it, the results are UB.
The POSIX wording is confusingly written, with a negation of a universally quantified thing rather than an existentially quantified thing, but applying De Morgan’s laws we can translate it into a more human-readable form: it is undefined behaviour if you access any local variable that is not volatile-qualified and is modified between the setjmp and longjmp calls.
As I said, this works because of other requirements. The compiler must assume (unless it can prove otherwise via escape / alias analysis; the fact that setjmp is either a compiler builtin or an assembly sequence blocks this analysis) that:
- Accesses to volatile variables, including locals, must not be elided, and so any modification of a volatile local must preserve the write to the stack.
- Anything (even a non-volatile one) that is written before the setjmp call must be stored either on the stack, or in a callee-save register, or it will be clobbered by the call.
These are not special rules for setjmp; they fall out of the C abstract machine and are requirements on any ABI that implements that abstract machine.
Note: I’m using “call” as a shorthand here. You cannot implement setjmp in C because it needs to do some things that are permitted within the C abstract machine only by setjmp itself. The standard allows it to be a macro so that it can be implemented in inline assembly, rather than as a call to an assembly routine. In this case, the inline assembly will mark all of the registers that are not preserved as clobbered. Inline assembly is non-standard and it’s up to the C implementer to use whatever non-standard extensions they want (or external assembly routines) to implement things like this. On Itanium, setjmp was actually implemented on top of libunwind, with the jump buffer just storing a token that told the unwinder where to jump (this meant that it ran all cleanups on the way up the stack, which was a very nice side effect, and made longjmp safe in exception-safe C++ code as well).
Oh, and there’s a really stupid reason why setjmp is often a macro: the standard defines it to take the argument by value. In C, if it really is a function, you need something like #define setjmp(env) real_setjmp(&env). GCC’s __builtin_setjmp actually takes a jmp_buf& as an argument and so C++ reference types end up leaking very slightly into C. Yay.
Note that there probably are some compiler transforms that make even this a bit dubious. The compiler may not elide loads or stores of volatile values, nor reorder them with respect to other accesses to the same value, but it is free to reorder them with respect to other operations. _Atomic introduces restrictions with respect to other accesses, so you might actually need _Atomic(volatile int) for an int that’s guaranteed to have the expected value on return from a siglongjmp out of a signal handler. This does not contradict what the quoted section says: the value that it will have without the _Atomic specifier is still well-defined, it’s just defined to be one of a set of possible values in the state space defined by the C abstract machine (and, in some cases, that’s what you actually want).
This is particularly true for jumping into the top stack frame. Consider:
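(A sketch of the kind of block being described; the locals x, y and z, the SIGALRM handler and the spin loop are my guesses at the shape of the example, not the original code.)
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
static sigjmp_buf env;
static void on_alarm(int sig) {
    (void)sig;
    siglongjmp(env, 1);           /* jump back into the top stack frame   */
}
int main(void) {
    volatile int x = 0, y = 0, z = 1;
    signal(SIGALRM, on_alarm);
    if (sigsetjmp(env, 1) == 0) {
        alarm(1);                 /* the signal arrives at some arbitrary  */
                                  /* instruction below                     */
        x = 1;                    /* source order: store to x first ...    */
        while (y != z)            /* ... then repeated loads of y and z    */
            ;                     /* spin until SIGALRM fires              */
    } else {
        /* The claim above is that the store to x and the loads of y and z
           are accesses to *different* volatile objects, so the compiler
           need not keep their source order; after the jump, this is
           permitted to print 0. */
        printf("x = %d\n", x);
    }
    return 0;
}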
This is permitted to print 0: the compiler is free to reorder the code in the block, sinking the store to x below the loads of y and z. If you used _Atomic(volatile int) instead of volatile int for the three locals then (I think) this would not be permitted, because each of these would be a sequentially-consistent operation and so the store to x may not be reordered with respect to the loads of z and y. I believe you can also fix it by putting each of the locals in a single volatile struct, because then the rule that prevents reordering memory accesses to the same volatile object would kick in.
Note that this example will, I believe, do what you expect when compiled with clang at the moment, because LLVM is fairly conservative with respect to reordering volatile accesses. There are some FIXMEs in the code about this because the conservatism largely dates from a pre-C[++]11 world when _Atomic didn’t exist and so people had to fudge it with volatile and pray.
In spite of the fact that it is, technically, possible to write correct code that jumps out of a signal handler, my opinion stands. The three hardest parts of the C specification to understand are [sig]setjmp, signals, and volatile. Anything that requires a programmer to understand and use all of these to implement correctly is going to be unmaintainable code. Even if you are smart enough to get it right, the next person to try to modify the code might be someone like me, and I’m definitely not.
Okay, sure, the text says that.
But that requirement, of knowing what function you have interrupted, is ludicrously hard to arrange except when the signal handler is installed in the same function it is triggered from. Certainly entirely unsuitable for a generic mechanism that might wrap arbitrary user code.
As a negative result, I believe Rust used to have such a system, but lost it.
Yep, added in 0.5
https://doc.rust-lang.org/0.5/core/condition.html https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-05-2012-12-21
Can’t find a good code example though.
See also the issue where they were removed – that issue & the see also links show why it didn’t work for Rust:
https://github.com/rust-lang/rust/pull/12039
They are sometimes mentioned in newer research languages with algebraic effects. So no mainstream languages yet, but at least some people outside of the Lisp/Dylan world are thinking about them!
I’m not sure this holds; for instance, Swift does memory management without using a GC.
And ATS, which provides the ability to do pointer arithmetic and dereferencing safely without garbage collection. In general, claims of “only one that does” and “first to do” should be avoided.
ATS is the first language to have a “t@ype” keyword.
ATS is very interesting! thanks for sharing it
Check out Aditya Siram’s “A (Not So Gentle) Introduction To Systems Programming In ATS,” it’s a great overview.
It depends a bit on how you define garbage collection. I’ve seen it used to mean both any form of memory management that does not have explicit deallocation or specifically tracing-based approaches. Swift inherits Objective-C’s memory management model, which uses reference counting with explicit cycle detection. Rust uses unique ownership. C++ provides manual memory management, reference counting, and unique ownership.
Both Rust and C++ provide all three.
The problem is that C++ can also provide stuff you usually don’t want in your code, without requiring an unsafe {} block.
Ada meets that definition, too.
Even modern C++ could claim to have run-time memory safety without GC, but that obviously requires the developer to use it correctly.
Ada isn’t completely memory safe, though I would say it’s a lot safer than C or C++ with all of the additional built-in error checking semantics it provides (array bounds checks, pre/post conditions, numeric ranges, lightweight semantically different (derived) numeric types, type predicates). I’ve found it hard to write bugs in general in Ada-only code. It’s definitely worth checking out if you haven’t.
As for modern C++, it feels like we made these great strides forward in safety only for coroutines to make it easy to add a lot of silent problems to our code. They’re super cool, but they have been a problem area for me.
Rust is also not completely memory safe: it has an unsafe escape hatch, and core abstractions in the standard library as well as third-party frameworks require it.
I agree. Not a lot of people are familiar with Ada, so my point was to dispel the myth that it is completely safe, while also answering the common reply I’ve seen that “Ada isn’t memory safe, hence you shouldn’t use it.”
Isn’t atomic reference counting a form of GC as well?
One could say that everything that deviates from manual memory management is some form of GC. Still, we do have the traditional idea that, generically speaking, GC implies a background process that deallocates the objects asynchronously at runtime.
If you think about it, stacks are GC because they automatically allocate and deallocate in function calls. 🤔 That’s why they were called “auto” variables in early C.
Galaxy brain: malloc is GC because it manages which parts of the heap are free or not. 🤯
ahahah great viewpoints, wow I learned so much with your comment, it got me looking into “auto” C variables. never realised that all vars in C are implicitly auto because of the behaviour that they get removed when they go out of scope 🤯 (I wonder how they got explicitly removed back then? and how it all relates to alloca()? could it have been designed that way to allow for multiple stacks at the same time with other segments besides the stack segment?)
that last bit about malloc is amazing, it is indeed managing virtual memory, it is us who make the leaks 😂
Back in the day, as I understand it, there was no “going out of scope” for C. All the variables used by a function had to be declared at the very top of the function so the compiler could reserve enough stack space before getting to any code. They only got removed when you popped the stack.
Is there a place where FLENG developers/users/interested parties hang and chat - mailing list or online hangout?
A gitlab repository has been set up here: https://gitlab.com/b2495/fleng, bug reports and suggestions are very welcome!
Sorry, no, nothing exists in that regard yet, as I got relatively little feedback so far (probably exactly because there is no central place to report bugs, ask questions, etc. :-) A public git repository would also be nice, but I’m not sure where to host this. I will try to set something up. Stay tuned!
I have also created a temporary channel named “#fleng” on libera.chat (hopefully to be registered soon), feel free to join!
This is great thanks! I’ll definitely drop in.
This was a great talk. Talks about proof languages are usually too high-level/academic for my small engineer brain, but I was able to follow this one without any problem. I wish there had been a comparison with Ada/Spark, though, as I’m unsure what ATS offers over these.
I wrote a post comparing Ada/Spark and ATS in the area of proving program invariants if interested.
Glad you liked it :)
I wouldn’t necessarily call ATS a “proof language” just because you can do proofs in it. Much of your code would, realistically, not be proven correct. When you say “proof language”, I think of languages exclusively for mathematical proving, like e.g. HOL Light.
Great article, I’m loving seeing more articles on Self. For those wanting to see a video of using the transporter, I have a short demonstration here.
I find the transporter and module system takes quite a bit of getting used to. There’s a lot of moving parts and it’s easy to mess up. But it is nice to be able to export Self code to files where they can be stored in a standard version control system. I do resort to grepping the Self code sometimes despite the good graphical tools to find things.
Thank you! The video is great as well.
It’s definitely a different paradigm indeed. It’s quite different from other languages where you write the code first and it’s converted to the in-language structure later. Plus, the information you have to supply via annotations is kind of counter-intuitive at first, but I get why they are there. I hope we can figure out a better implementation for those.
Can you give an example? So far I haven’t felt the need to do this, because you can usually use the Find Slot... tool to find whatever you need in an object quite easily (though some slots are named… oddly. Looking at you, traits string shrinkwrapped).
I mostly don’t need to grep Self code, but having the text was very useful a while back when I did a full reorganisation of the categories in globals. It was too invasive to do in a live image with the standard tools - even a Self world can’t keep running if the collections prototypes flicker in and out of existence :)
It was very useful to be able to do regex replaces on the .self files before building a new Self image.
At least once a week I spend some time trying to build simple things in a mind-stretching language. Lately that’s been APL for me; I find APL challenging and fun! I spent several months learning how to write code with Joy ( https://en.wikipedia.org/wiki/Joy_(programming_language) ) and that was equally mind bending.
What else is on the edges? I have a job writing Haskell, and I got paid for decades of Python. What other brain stretching languages should I try?
One that’s personally been on my list for too long is miniKanren. There’s a video that showed writing a program most of the way, then putting constraints on the possible output, which generated the rest of the code. It blew my mind and it’s sad I haven’t gotten a chance to dive in yet. Plus Clojure’s core.logic is basically an implementation of miniKanren and has a lot of users, so it looks like there’s actual use of it in the “actually gets things done” parts of the software world, which is always nice.
You might like Strand. It’s a fun language to play around with and the Strand book “Strand: New Concepts for Parallel Programming” is a great read.
Agreed, I’m really enjoying this one.
I did a tweetstorm on interesting obscure languages I’ve been meaning to try! Check it out here: https://twitter.com/hillelogram/status/1243599545218596864?s=20
I’d say something like Unison or Darklang. Solidity.
pure has been on my list for a while; I just can’t think of anything specific I want to do with it, and I haven’t been motivated to just work through something like 99 problems.
Joy and other concatenative languages have been a pet favourite of mine, and it is fun to play around with them. Here was one of my attempts to clothe PostScript in a concatenative-ish skin.
I was really interested in IPFS a few years ago, but ultimately was disappointed that there seemed to be no passive way to host content. I’d like to have had the option to say something along the lines of “I’m going to donate 5GB for hosting IPFS data, and the software will take care of the rest”.
My understanding was that one has to explicitly mark a file as something you’d like to serve too, and only then will it really be permanent. Unless this gets integrated into something like a browser bookmark system, I have the feeling that most content will be lost because of it. Can anyone who has been following their developments tell me if they have improved on this situation?
I thought they were planning to use a cryptocurrency (“Filecoin”) to incentivize hosting. I’m not really sure how that works though. I guess you “mine” Filecoins by hosting other people’s files, and then spend Filecoins to get other people to host your files.
This is a hard problem to solve, because you want to prevent people from flooding all hosters; so there has to be either some kind of PoW or money involved. And with money involved, there’s now an incentive for hosters to misbehave, so you have to deal with them, and this is hard; there are some failed projects that tried to address it.
IPFS’ authors’ solution to this is Filecoin which, afaik, they had in mind since the beginning of IPFS, but it’s not complete yet.
Sort of… my recollection is that when you run an IPFS node (which is just another peer on the network), you can host content on IPFS via your node, or you can pull content from the network through your node. If you publish content to your node, the content will always be available as long as your node is online. If another node on the network fetches your content, it will only be cached on the other node for some arbitrary length of time. So the only way to host something permanently on IPFS is to either run a node yourself or arrange for someone else’s node to keep your content in their cache (probably by paying them). It’s a novel protocol with interesting technology but from a practical standpoint, doesn’t seem to have much benefit over the traditional Internet in terms of content publishing and distribution, except for the fact that everything can be massively (and securely) cached.
There are networks where you hand over a certain amount of disk space to the network and are then supposedly able to store your content (distributed, replicated) on other nodes around the Internet. But IPFS isn’t one of those.
What are some of them? Is Storj one of those?
Freenet is one. You set aside an amount of disk space and encrypted chunks of files will be stored on your node. Another difference from IPFS is that when you add content to Freenet it pushes it out to other nodes immediately, so you can turn your node off and the content remains in the network through the other nodes.
VP Eng of Storj here! Yes, Storj is (kinda) one of them, with money as an intermediary. Without getting into details, if you give data to Storj, as long as you have enough STORJ token escrowed (or a credit card on file), you and your computers could walk away and the network will keep your data alive. You can earn STORJ tokens by sharing your hard drive space.
The user experience actually mimics AWS much more than you’d guess for a decentralized cryptocurrency storage product. Feel free to email me (jt@storj.io) if some lobste.rs community members want some free storage to try it out: https://tardigrade.io/satellites/
Friend, I’ve been following your work for ages and have had no real incentive to try it. As a distributed systems nerd, I love what you’ve come up with. The thing which worries me is this bit:
I’m actually really worried about the cryptocurrency part of this, since it imbues an otherwise-interesting product with a high degree of sketchiness. Considering that cryptocurrency puts you in the same boat as Bitcoin (and the now-defunct art project Ponzicoin), why should I rethink things? Eager to learn more facts in this case. Thanks for taking the time to comment in the first place!
Hi!
I guess there’s a couple of things you might be saying here, and I’m not sure which, so I’ll respond to all of them!
On the technical side:
One thing that separates Storj (v3) from Sia, Maidsafe, Filecoin, etc, is that there really is no blockchain element whatsoever in the actual storage platform itself. The whitepaper I linked above is much more akin to a straight distributed systems pedigree sans blockchain than you’d imagine. Cryptocurrency is not used in the object storage hotpath at all (which I continue to maintain would be latency madness) - it’s only used for the economic system of background settlement. The architecture of the storage platform itself would continue to work fine (albeit less conveniently) if we swapped cryptocurrency for live goats.
That said, it’s hard to subdivide goats in a way that retains many of the valuable properties of live goats. I think live goats make for a good example of why we went with cryptocurrency for the economic side of storage node operation - it’s really much more convenient to automate.
As a user, though, our primary “Satellite” nodes will absolutely just take credit cards. If you look up “Tardigrade Cloud Storage”, you will be able to sign up and use the platform without learning one thing about cryptocurrency. In fact, that’s the very reason for the dual brands (tardigrade.io vs storj.io)
On the adoption side:
At a past cloud storage company I worked at before AWS existed, we spent a long time trying to convince companies it was okay to back up their most sensitive data offsite. It was a challenge! Now everyone takes it for granted. I think we are in a similar position at Storj, except now the challenge is decentralization and cryptocurrency.
On the legal/compliance side:
Yeah, cryptocurrency definitely has the feeling of a wild west saloon in both some good ways and bad. To that end, Storj has made a significant investment in corporate governance. There’s definitely a lot of bad or shady actors in the ecosystem, and it’s painfully obvious that by choosing cryptocurrency we exist within that ecosystem and are often judged by the actions of neighbors. We’re not only doing everything we can to follow existing regulations with cryptocurrency tokens, we’re doing our best to follow the laws we think the puck could move towards, and follow those non-existent laws as well. Not that it makes a difference to you if you’re averse to the ecosystem in general, but Storj has been cited as an example of how to deal with cryptocurrency compliance the right way. There’s definitely a lot of uncertainty in the ecosystem, but our legal and compliance team are some of the best in the business, and we’re making sure to not only walk on the right side of the line, but stay far away from lines entirely.
Without going into details I admit that’s a bit vague.
Anyway, given the length of my response you can tell your point is something I think a lot about too. I think the cryptocurrency ecosystem desperately needs a complete shaking out of unscrupulous folks, and it seems like that’s about as unlikely to happen as a complete shaking out of unscrupulous folks from tons of other money-adjacent industries, but perhaps the bar doesn’t have to be raised very far to make things better.
The lack of a blockchain is a selling point. Thanks for taking the time to respond. I’ll check out the whitepaper ASAP!
… I kinda want to live in this world
You might want to check out Arweave.org.
Only if the person hosting it turns off their server? IPFS isn’t a storage system like Freenet; it’s a protocol that allows you to fetch data from anywhere it is stored on the network (for CDN-style distribution, bandwidth savings, and being harder to block). The person making the content available is still expected to bother storing/serving it somewhere themselves, just like with the normal web.
If you want to donate some disk space you can start following some of the clusters here: https://collab.ipfscluster.io .
Factor was the first open-source project I ever contributed to and still a language I find fascinating. Always nice to encounter it again - like meeting an old friend! (Speaking of, hi doublec - I remember that username from #concatenative in ~2007)
Hi jamesvnc, I remember you from #concatenative too. “like meeting an old friend” is exactly how I feel when I fire up Factor too. The mid 2000s was a flurry of activity in Factor - so many good things came out of it.
This is interesting, but it has the same resilience and IP address security issues as ipfs and scuttlebutt.
Can you talk more about this?
I’m interested in reading more about it too. With ipfs you can trivially find out the IP address of the users that are seeding the content. There are ipfs cli commands to do it.
Dat, ipfs, scuttlebutt are all pull based. If I provide content on the network then I start off as the only seeder, then other users pull from me when they want the content. If I turn my device off before at least one other seeder has all the content then the content is unavailable until I turn it back on. How do you know when it’s safe to turn off the device? The general approach seems to be to set up a 24x7 server to seed the content or use a pinning service. Another issue with this, in my use of IPFS in the past, has been a thundering herd effect when initially announcing available data. If it’s popular anticipated content then a huge number of users will attempt to get the data initially but there’s only one provider and seeding drops to a crawl. Maybe this has been improved recently.
I like the Freenet approach where the data is pushed out into the network to multiple nodes. Once inserted the device can be turned off but the content remains out there. There’s a defined point where you know it’s safe to shut your node down, have nothing running anywhere, but users can still retrieve the content.
The article linked by @skyfaller seems to suggest that ssb does some kind of “[a]utomatic publication of content to your friends”?
I will be writing something up in a big blog post in a month or so but right now I can’t.
Brave is just a browser which hides ads (which every browser should do, because ads are a cancer on the Internet) and displays its own (which no browser should do — but users are free to install whatever software they want). And hey, it even adds a way for sites to make money if they want.
I don’t use Brave, I’ve never downloaded it, but it’s a-okay by me. Don’t want people to view your content without paying? Then don’t display it to them.
Besides potentially putting ads on sites that have deliberately chosen not to run any, Brave has done some other sketchy things. I don’t know if they still do, but they used to run those “we’re fundraising on behalf of this site” notices showing on sites that had no affiliation with them at all. Hopefully Eich finally got or listened to some lawyers and was told why that’s a bad idea, but it’s always seemed to me to be one of the classic desperate tactics of a certain class of crypto-fundraising scam.
There should at the very least be some program for websites to say that no, they’re not interested in money from Brave (that’s the idea, right? Brave puts ads on the website and gives a portion of the revenues to the website owner?).
My understanding is that Brave removes a site’s ads, and adds their own, then holds the money made by the impression hostage, splitting the money with the content creator if they ever come forward.
Brave does not add their own ads to a site. They block the site’s ads and provide a way for people to tip registered publishers, or auto-donate a share of a set amount based on time spent on the site. If the site is not registered they don’t receive the tips, and the tips are returned to the donor.
Brave has its own ads that are part of the application and appear as pop up notifications unrelated to the site being visited. These are opt-in and users get a share of the ad revenue for seeing them.
Right, so by using Brave you’re standing on Madison Avenue in NYC screaming “HERE’S PAYMENT FOR YOUR CONTENT!” but their office is actually in Hollywood. It’s not stealing if they don’t accept my payment, right?
Ko te mea whakarapa kē, I’m not familiar with American geography so don’t get your analogy. But it doesn’t matter - I wasn’t debating Brave’s model, I was correcting your misunderstanding of how Brave works.
I get it. Thanks for pointing out my misunderstanding! As for US geography, the two locations are on opposite sides of the US. Point being, if I try to pay you, and you don’t take my money because I am trying to pay in the wrong place, I didn’t pay.
It’s not stealing full stop, any more than looking inside the books at a bookstore is.
Bookstores could choose to sell books in shrink wrap, but choose not to.
If ad based businesses wanted to give their content away for free, they wouldn’t put ads on their pages. It’s all about intent, and by blocking ads, you intend to deprive the creator from their source of revenue for your use of their content. Why isn’t that theft?
Bookstores could. If I opened the shrinkwrap, read the book, put it back on the shelf and left, I would not have committed theft (possibly property damage).
Websites could refuse to show me their content until I view an ad. They could even quiz me on the ad to make sure I paid attention. If I somehow circumvent that, I’m committing illegal access to a computer system (which, I believe, is a felony in the USA).
Theft deprives the victim of property, which is taken by the thief.
Now, you could argue that it’s wrong (fwiw, I’m sympathetic to that view), but if you use words contrary to their straightforward definitions (in law), I’m going to call bullshit.
It seems they can add ads to sites without ads. The original article complains about this (for example, in the very last paragraph). I wonder where the ads are added, and I would also be worried if ads were presented on my own ad-free website.
Definitely; if a portion of my userbase started seeing ads on my personal website, I would seriously consider at least adding a banner or something telling them that the ads they see aren’t mine and that their browser is adding ads on my ad-free page.
Actually, I should probably get Brave and check out how that whole thing works.
It seems ads are displayed as notifications. See https://whatisbat.com/2019/04/25/how-to-turn-on-brave-ads-and-earn-bat-with-the-brave-browser/ for a screenshot. Fine by me.
Ah, if it just blocks ads in the web content area and keeps all ads in the chrome or in notifications or someplace else where it’s obviously from the browser itself, that’s not really an issue at all.
A rebuttal, though from an OpenLibra contributor rather than Facebook.
Nothing in the rebuttal other than the cryptography section sounds particularly strong.
The cryptography section isn’t wrong but it misses my point entirely. I claim that the more mission critical the cryptography is in a piece of software the more scrutiny it should be given and that these libraries are relatively new and not as tested as older ones. A company with resources like Facebook could have put a lot more resources into this and it would have been good for the Rust community for this to happen as it makes the whole ecosystem more robust.
When you’re going before congress to testify that this system can safely handle private user data, the QA should be much higher on your software because the public’s interest is involved. I thought this was a fairly uncontroversial opinion.
the original: “these ideas are bad and incoherent, and this code cannot possibly implement reality consistently”
the riposte: “but look how shiny the pieces are! also, it has to suck cos it’s a blockchain,”
that’s really not a good rebuttal
I agree, it isn’t a rebuttal of my original points. It’s more a statement of blind faith that the technology just needs to be invented and that the public should just trust that currently intractable problems will just be somehow solved before it goes live. I don’t have any faith in blockchain tech, I only look at what exists and is provable today.
Thanks for finding this, always interesting to read a rebuttal on a similar level.
I struggle to understand the joke here, unless it’s a poke at cryptocurrency boosters to refer to conventional money as “FIAT” (in all caps).
Probably related to the rebrand.
Ah, I’ve missed that. Thanks. That is a fugly logo.
Cool, it’s Shen! I became aware of this language years ago but have never really seen any content about it (I think it was closed source / not free for a while?)
I enjoyed this post, and I have one (probably stupid) question: do you actually have to type out lines of -------------- or =========== in the REPL when defining datatypes?
Yes, the _____ and ===== lines have to be typed in the REPL. You don’t need to match them to the length of the code; just one _ or = suffices, so a single-character line and a long line are equivalent. In the REPL I make it short for ease of typing, in files I make it long to look nicer.