Threads for olliej

  1. 4

    Part of this is that unlike Rust, Swift is intended to be used to write programs for typical users. That means compiling an application shouldn’t mean including a copy of half of userspace, and it means you can’t require distinct builds of the program for every OS point release, etc.

    Rust is a good language in many respects, but the refusal to acknowledge abi stability matters outside of specific niches hinders its adoption outside of said niches.

    1. 6

      refusal to acknowledge abi stability matters

      I wouldn’t say that this is a fair summary of the situation, especially in light of the recent https://github.com/rust-lang/rust/pull/105586

      1. 3

        The main question here is of course: when is a language sufficiently mature that committing to a stable ABI is worth the downsides of doing so? You can’t make the same criticism of any language at any state of development.

        1.  

          Your point might be easier to discuss if you spelled out your assumptions here. My abbreviated history of rust development is that it began life at mozilla to underpin their Firefox browser. That seems like the epitome of a program written for typical users. It sounds like maybe you disagree or you think it has strayed from the original design constraints. If so, why?

        1. 3

          Wow, they were really unlucky that such tiny stuff broke on this motherboard during shipping… :)

          1. 4

            I’m not sure I believe the headline.

            Page 15 of the proposal says:

            In order not to hamper innovation or research, free and open-source software developed or supplied outside the course of a commercial activity should not be covered by this Regulation.

            There will be new expenses for companies like Docker, I guess, but I don’t see how it affects most open source projects.

            1. 1

              Linux is a commercial product, developed in a commercial environment, by many commercially funded engineers. Seems like these laws would apply there.

              Similar for many of the large oss projects.

              For the big overtly commercial projects like Chrome, Firefox, WebKit, etc., the quotes in the article would seem to rule out the nightly builds many web devs use.

              1. 1

                What makes Linux a commercial product? Has anyone ever bought a copy of the kernel?

                1. 3

                  What makes Linux a commercial product? Has anyone ever bought a copy of the kernel?

                  Yes. What is RHEL, if not a commercial Linux distribution? You pay Red Hat money to use it.

                  1. 2

                    RHEL is an operating system. Linux is a kernel. RHEL is a commercial product, Linux is not. I’m pretty sure /u/olliej was talking about the Linux project, not Linux-based operating systems, since talking about “Linux” as one entity (“Linux is a commercial product”) doesn’t make sense if “Linux” is used to refer to some large set of operating systems.

                    1. 1

                      I am talking about the linux kernel itself, but I’m talking about it in the way that it is actually used: The vast vast majority of linux users are not downloading and building the linux sources directly. They’re using the linux kernel as provided by their vendor.

                  2. 1

                    Oracle, IBM, RedHat, numerous cloud providers, etc

                    A plain reading of “Commercial product” does not mean “sold standalone”, otherwise that would make the law trivially circumventable, e.g. “We aren’t selling X, we’re selling support for X”. To claim that the above are not commercial entities for which linux is a component of a commercial product is fairly obviously nonsense.

                    1. 2

                      Alright, so we agree on that. Linux isn’t a commercial product, but Linux is a core component of a lot of commercial products. And of course, the companies which sell a commercial product with Linux as a core component will contribute to Linux development.

                      I would be interested to know how exactly the regulation treats that. While I think “Linux is a commercial product” is incorrect, “Linux is developed or supplied (in part) in the course of a commercial activity” is certainly correct. My guess is that RHEL/Oracle/IBM/Amazon/etc will have to make sure their product conforms to whatever requirements apply, which includes making sure that Linux conforms; meaning Linux could in principle ignore the regulation, but if it wants to continue being part of commercial products, it must meet the requirements which are imposed upon those commercial products.

                      1. 1

                        “X isn’t a commercial product, it’s just a core component of innumerable commercial products, developed largely by commercial interests, for commercial purposes”?

                        Anyway, even if we take your definition, which is I think “linux is a hobby OS, and no core developers are paid to develop it for its commercial value and use”, it’s still borked under this article’s interpretation of the law:

                        For all the people who work on the linux kernel for a corporation that uses linux as a commercial product, linux is a commercial product.

                        Afaict, that means that from their point of view, non-commercial contributors need to also treat linux as a commercial product. Which is where the problem happens: unpaid contributors can’t afford the EU rules - unless the EU is willing to pay all contributors that work on software that the EU benefits from, why would any of those people want to do anything but say “my code may not be used in the EU” and add an #ifndef EU around their code? If those contributors don’t do that, and don’t conform to the EU’s rules that are detached from reality, how can RedHat, IBM, ARM, Intel, etc work on or contribute to linux, when some people are not treating it as a commercial product and so are failing to provide the guarantees that the law ostensibly requires, unless they are using their own forks that strip out all such contributions?

                        1. 1

                          You seem to make no distinction between a project being non-commercial but used as a component in products by third-party entities, and a project itself being a commercial product. I do not understand that perspective. I do not know if the law agrees with you or with me on that topic.

                          I am not disagreeing with you on the point that this affects companies which are using Linux as part of a commercial product, which in practice affects Linux since Linux wants to be usable as part of commercial products. The concerns you outline seem like those I wrote in my earlier comment.

                          1. 1

                            As a follow up, it appears the US government takes an even stronger position on OSS being commercial software, hence my concern about what the EU thinks seems reasonable: https://dodcio.defense.gov/Open-Source-Software-FAQ/#q-is-oss-commercial-software-is-it-cots

                            1. 1

                              The point I’m trying to make is that from the point of view of this legislation, it seems like linux would be considered a commercial product - sufficiently so that commercial developers would have difficulty having “non-commercial” developers involved in what they were shipping.

                    2. 1

                      I don’t see the problem. Giant companies exploiting “open source” for their own benefit have some new expenses. Boo hoo.

                      The headline makes it sound like run of the mill open source projects have a dire crisis on their hands, when they really don’t.

                      1. 2

                        I’m clearly not explaining my concern here very well.

                        It’s not “oh noes, giant ass corporations will have to spend more”, it’s “giant ass corporations won’t contribute to, or allow changes from, people who can’t also afford the new costs, leading to a divergence between projects like linux and what is actually shipped”. Given those corporations do fund much of the core kernel dev, what happens if they start saying “contributors must agree to meet EU requirements”?

                        That’s my concern. I don’t give [insert your own favorite expletives] about a corporation that profits from OSS having to spend some more money to ensure they’re not shipping monstrously unsound products, or supporting people who bought a product expecting some basic level of security and support. I’m concerned about the impact on things like the ability of hobbyist devs to participate, etc.

                  1. 2

                    My master’s thesis was compiling Haskell to .NET and trying to coerce some kind of type-safe representation (my project was supporting Haskell without type erasure; this was just after generics were added to .NET - I actually did the PPC port of the Gyro patch that added generics to the .NET reference engine!). It was far from optimal - the reality is that supporting higher-kinded types when the VM’s type system doesn’t is misery.

                    Happily, my primary language these days is C++, and it does support higher-kinded types :D

                    1. 3

                      Are dedicated return address predictors relatively new, or just under-advertised? I’ve only heard them really talked about in the last few years, but it seems like they would be an easy optimization to make.

                      1. 5

                        Return stack caching is a very old one - I have a recollection of 90s-era Pentium 1s having it, but can’t trivially find evidence; and more to the point, if Intel had it I assume so did others.

                        Interestingly, early on in the JSC JIT we did terrible things to try to keep the return cache happy when we were doing OSR while avoiding a second branch to get to the new code, but alas there’s not any real way to convince the internal return stack cache that you have changed the actual return address on the stack. The cost of a mispredict on the return was invariably enough to outweigh any win you might get by avoiding an additional indirect jump.

                        1. 1

                          Agner sez:

                          The P1 has no return stack buffer, but uses the same method for returns as for indirect jumps. Later processors have a return stack buffer


                          we did terrible things to try to keep the return cache happy when we were doing OSR while avoiding a second branch to get to the new code, but alas there’s not any real way to convince the internal return stack cache that you have changed the actual return address on the stack. The cost of a mispredict on the return was invariably enough to outweigh any win you might get by avoiding an additional indirect jump.

                          I’m a bit confused by this statement. How can you do on-stack replacement without overwriting the return stack? (I guess you can throw out the entire thing and regenerate the call sequence from scratch, but that seems like a lot of weird, annoying work.) And doesn’t osr require other, far more expensive things to be done besides (like, I$ invalidation for the new code you generated)? I will note that, on return, the cpu still rolls on to the next entry regardless of whether you hit or missed, meaning that you avoid misses for all but the activation record you replace.

                          1. 2

                            Agner sez:

                            The P1 has no return stack buffer, but uses the same method for returns as for indirect jumps. Later processors have a return stack buffer

                            Mercifully I hedged :D

                            How can you do on-stack replacement without overwriting the return stack?

                            The lower levels of the JSC JIT codegen (especially the original JITs) maintain a stack layout compatible with the interpreter (this is doable as the JSC interpreter is written in a custom pseudo-assembler), but you can also ensure that you get a stable/mappable stack frame at OSR fix points (or for the deopt cases you maintain sufficient metadata to rebuild an interpreter frame). The problem I’m talking about is just switching which code is actually running - e.g. something happens that results in a callout to the VM that will replace the implementation of the calling context (deopt due to failure, backedge optimization, etc), but the VM wants to return to the correct point in that new code. Originally we would rewrite the return pointer, pointing to the correct continuation point in the new code, but in the end that turned out to be more expensive than returning the pointer and having the origin code jump to that returned address - e.g. the return stack mispredict is more expensive than the correctly predicted return + an unpredicted indirect branch.

                            1. 1

                              How can you do on-stack replacement without overwriting the return stack?

                              Typically, the return address predictor only tracks things like bl foo. So if you do ldp x29, x30, [sp], #16, you change where you’re going to return to, but the return address predictor doesn’t figure it out. This causes the mispredict.

                            2. 1

                              Sounds like it might be old enough that I haven’t heard of it because everyone already knew about them, then. XD

                            3. 2

                              I implemented one in our prototype CPU several years ago and was told (after feeling smug about inventing the idea) that it was safe because all of the relevant patents had expired because the idea was so old. They’ve been in the news recently because they have understandable behaviour and are the easiest to train to exploit speculative side effects.

                              The code from the article also has two branch instructions right next to each other, which can cause problems on a lot of Arm cores because the predictor runs on fetch granules before decode, rather than on decoded instructions. This means that the performance will change significantly depending on the alignment of the function.

                              1. 1

                                Yes, they have been around for quite a long time.

                                1. 1

                                  x86 has had it since Pentium Pro (1995), I believe.

                                1. 3

                                  Every January 1st, an army of open source developers rushes out to update their copyright attributions in licenses and documentation.

                                  Now, this same army is going to have nothing to do, except open issues in various GitHub projects about how the copyright years shouldn’t be updated.

                                  1. 3

                                    So basically this article is Stealing Jobs (tm) :D

                                  1. 5

                                    Wait, who is going around updating their copyright years??

                                     Ignoring anything else: you can’t simply put a new copyright year on your work to extend the period of coverage; the copyright runs from when you first published it. If January comes around and you slap a new year on your work you’re not doing anything - updating the copyright date only applies to new material, so you can update a copyright header when you make some other change.

                                     That said, given Disney has ensured the length of copyright approaches the heat death of the universe, I’m not sure there’s real value in that either.

                                    1. 6

                                      The argument goes:

                                      • The initial copyright date for a software project is the date when it was first written
                                      • But if the project undergoes any type of ongoing development, such that new code is being added to it, then that new code has a copyright date of whenever the new code was written
                                      • Therefore the copyright statement should reflect the range of dates involved in the various individual bits of code that make up the project

                                      So a project begun in 2015 would initially have “Copyright 2015”. Then if more code was added in 2016, it would become “Copyright 2015-2016”. And so on.

                                      Or at least that’s what I understand the argument to be for why the years need updating.

                                      The analogy would be a blog that’s kept over multiple years – each individual entry is copyrighted as of its date of authorship, so the blog’s sidebar or footer would display a pair of years in its copyright statement, reflecting the range of dates of copyright of the constituent entries.

                                      1. 6

                                        I understand updating the year when you make a copyrightable change, but some projects (e.g. FreeBSD) go and update copyright on unmodified files at the start of a year and this completely confused me.

                                        1. 1

                                          Copyright terms are based on death of author and not publication date anyway

                                    1. 1

                                      If in JavaScript I were to write:

                                      const res_a = a(value)
                                      const res_b = b(res_a)
                                      

                                      and so on, do mainline JS interpreters not optimise that out with constant folding or whatever?

                                      1. 3

                                        I’m not sure what you’re asking, there’s nothing to constant fold there

                                        1. 2

                                          Don’t worry, I’m not quite sure what I was asking either.

                                          1. 2

                                            I realize you may have been confused by their obsession with temporaries being bad. What’s happening there is that the authors are actually wanting to measure efficiency of code by number of characters. Of course they can’t say that because it’s an objectively false measure of efficiency and legibility, so they come up with a bunch of bogus insinuations about performance.

                                            For a modern JS engine there is no performance difference between

                                            f(g(h(a)))
                                            

                                            or

                                            let htemp = h(a)
                                            let gtemp = g(htemp)
                                            f(gtemp)
                                            

                                            or the hypothetical

                                            a |> h |> g |> f
                                            

                                            Unless there is an eval() in the scope (but no one uses eval right? :D), a closure containing an eval(), or a closure referencing htemp or gtemp, but of course if you wanted to do that then the local var has to exist regardless.
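                                            A tiny sketch of that caveat (with stub definitions for f/g/h, since the snippets above leave them abstract):

                                            const h = x => x + 1, g = x => x * 2, f = x => console.log(x);
                                            const a = 1;
                                            // No captures: an engine can keep htemp in a register; it has no real cost.
                                            let htemp = h(a);
                                            f(g(htemp));
                                            // A closure referencing htemp forces the binding to be materialized in an
                                            // environment record - but then the local had to exist anyway.
                                            let peek = () => htemp;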

                                            1. 1

                                              :D

                                        1. 1

                                          This is generally correct, but there’s a case I saw recently where a “security researcher” attempted to get RCE on an OSS project’s CI infrastructure. They did this not by filing a bug report saying “your CI looks suspicious, may I do C?”, but by creating a new GitHub account and posting a bunch of different patches targeting the CI.

                                          It was only during the work spent trying to identify whether the project was under attack that it came out that they were actually employed by a “pentesting” company with no relationship to anyone or anything involved with the project. Their presumed intent was a server compromise followed by a bug report saying “look, we compromised your servers”.

                                          Your right to pull the “I’m a security researcher and they shit on me” card goes out the window once you’re actively attacking and targeting code execution on a system that does not belong to you.

                                          1. 5

                                            Yeah, that’s why I never send traffic to any systems I don’t control. I only study source code, and rarely reverse engineer apps.

                                            Criminal activity is a horrible way to start a relationship with a business.

                                            (n.b. I never do API testing. Yes, even if there’s a Safe Harbor declaration somewhere. Aaron Swartz’s prosecution happened despite MIT and JSTOR not wanting to pursue hacking charges. I do not trust the US government, so it’s best to never run afoul of the Computer Fraud and Abuse Act if you can help it.)

                                          1. 1

                                            Without currying, piping kinda sucks, and none of the stdlib stuff would support it well either. Do you want [ 1, 2 ] |> Array.prototype.map.call(x => x + 1)? This only works if |> unexpectedly pipes to the first argument cough Elixir. If you are the type of person who wants a pipe operator you’re probably leaning functional in style, and you’ll almost certainly be happier with compile-to-JS languages that support it at a more fundamental level; there are dozens to choose from.

                                            1. 1

                                              The problem w.r.t. currying is that you cannot add currying to JS, as all functions can be called under- or over-applied. Currying requires at minimum arity-level typing of functions, and you just can’t make JS work with that - there’s no way a call site can know whether it’s a curried vs. non-curried call.
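                                              A quick sketch of why the call site is ambiguous (addCurried and addPlain are hypothetical):

                                              const addCurried = x => y => x + y;
                                              const addPlain = (x, y) => x + y;
                                              // Both shapes accept the call f(1, 2) without any error:
                                              addPlain(1, 2);   // 3
                                              addCurried(1, 2); // y => x + y - the 2 is silently ignored, not 3
                                              addCurried(1)(2); // 3, but only because the caller knew it was curried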

                                            1. 3

                                               Another day, another proposal for a JS language feature that optimizes for the fewest possible characters, as though fewer characters meant “more readable” (Perl and Haskell show this to be false) or more understandable.

                                              The proposal is an entire language feature and syntax to support a single construct:

                                               a(b(c(d())))
                                              

                                              Which is frankly not anything remotely common enough to warrant a custom syntax.

                                              Their argument against this is that it is hard to read if the nesting is too deep, and the temporaries are unclear, especially if reused. Which is an absurd argument, because why would you be reusing a single temporary, unless again you believe in optimizing for fewest characters written, as if that were the core metric for productivity.

                                              The proposal attempts to justify this by saying the proposal applies to other cases:

                                              a(b(c(), d()))
                                              

                                               and claiming that reusing the % operator as an unnamed placeholder token makes it more readable:

                                               c() |> b(%, d()) |> a(%)
                                              

                                               Which is more characters (gasp!) and I would say much less readable, but is also half-assed: which function should get to be %?

                                               The % anchor also has orthogonality problems:

                                              a |> (c |> f(%)) |> d
                                              

                                              The obvious response here is “don’t do that, it’s unreadable”, but that fails to acknowledge that the only reason this occurs is this new syntax being insufficiently thought through. Again, that’s because fundamentally the proposal owners only care about their a(b(c(d()))) case, and try to paper over this with the pretense of covering other cases.

                                              1. 1

                                                 I tend to agree. In using jQuery I’ve found this method-chaining style very convenient & readable. But adding a new operator and a new placeholder token to the language, just to make this style work with non-methods, seems like overkill. (I find the “F# style” much cleaner, but for whatever reason it’s already been rejected…)
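                                                 For anyone who hasn’t followed the proposals, the difference is roughly this (sketched with the proposals’ hypothetical syntax, not current JS):

                                                 // Hack style (the current proposal): the right side is an expression
                                                 // with a placeholder for the piped value.
                                                 value |> f(%)         // f(value)
                                                 value |> g(%, extra)  // g(value, extra)
                                                 // F# style (rejected): the right side is a function to be called.
                                                 value |> f                  // f(value)
                                                 value |> (x => g(x, extra)) // extra arguments need a wrapper function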

                                                Anyway, I’m glad I mostly use languages that support operator overloading, so I’m not dependent on some capricious language committee to create new operators :-p

                                                1. 1

                                                  I feel arbitrary custom operators are a hindrance to understanding code. A plain text function name can be read and understood by someone unfamiliar with the code, but if you use a random operator/sigil a person lacking codebase familiarity cannot just read code, they have to seek out and find the definitions for each operator.

                                                   Personally I don’t think the character savings for custom operators warrant the complexity increase. I’m aware that some code can “read” better purely left-to-right (another factitious argument from the earlier paper is that “normal” code is left-to-right), but I am not convinced custom infix operators are worth it, though some languages - say Haskell - do allow named infix operators.

                                              1. 2

                                                 Years ago I was writing a Haskell compiler based on GHC, and space leaks were a problem I just decided to ignore, because on .NET I found that weak references for logical global values were too easily dropped and resulted in multiple evaluations. It’s interesting to see that the issue is considered large enough to warrant a thesis topic. I’m also surprised to see Stanford offering honours degrees, which I didn’t think the US did?

                                                 Space leaks are also in no way limited to Haskell, and occur in every GC’d language, but they are especially chronic in languages with closures. They’re a significant problem in JS, and historically were even worse, as the early JS engines did not perform free variable analysis, so all locals would be captured. That led to variables that were not going to be used again, and that looked like they would not be captured, still being captured and keeping everything alive. Obviously this was worst in Trident, as its JS implementation had a tendency to use ref counting for some structures and did not have cycle breaking.
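                                                 A minimal illustration of the capture problem (the names are made up):

                                                 function makeHandler() {
                                                   const bigBuffer = new Uint8Array(64 * 1024 * 1024); // 64MB of scratch data
                                                   const id = bigBuffer[0]; // the only value needed later
                                                   // An engine with free variable analysis captures just `id`; the early
                                                   // engines described above captured the entire scope, so bigBuffer
                                                   // stayed alive for as long as the returned closure did.
                                                   return () => id;
                                                 }
                                                 const handler = makeHandler();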

                                                1. 2

                                                  The pdf is hosted on a stanford user page but seems to have been written for IIT Bombay, India. (Also, Seminar Report as part of a bachelor’s, not thesis.)

                                                  1. 1

                                                    If it’s the research report for an honours degree it’s ostensibly a mini-thesis, but being from IIT and just hosted on Stanford’s site explains the honours degree :D

                                                  2. 1

                                                      The major space leak problem in Haskell isn’t from closures, it’s from lazy evaluation. In most languages, an int is 64 bits (or whatever). In Haskell an Int (the small version, I don’t mean Integer) is arbitrarily large, because it might contain an arbitrarily large computation that hasn’t been forced yet.

                                                    Real world example: https://stackoverflow.com/questions/7768536/space-leaks-in-haskell#7769510

                                                    1. 1

                                                        I know what causes space leaks in Haskell; closures are the problem in other languages because lazy evaluation is not common.

                                                  1. 1

                                                      It’s so weird that there’s so much pressure to maintain 4k pages - it was worth the cost to Apple to take the compatibility issues from moving from 4k to 16k pages in hardware; I’m not sure why people are so hell bent on thinking that 4k is still the best option.

                                                    1. 2

                                                        It’s not, but for x86 hardware the only options are 4KB or 2MB, and even today 2MB is just a bit too big to be convenient for small programs. Looks like AArch64 has more options (16 and 64KB pages), which I actually didn’t know.

                                                      1. 1

                                                        I’m not sure why people are so hell bent on thinking that 4k is still the best option.

                                                        Which people are those?

                                                        1. 1

                                                          Has Apple managed this for macOS? I spoke to some of Apple’s CoreOS team about RISC-V having an 8 KiB default page size at ASPLOS a few years ago and their belief was that it would break large amounts of code. A lot of *NIX software assumes that you can mprotect a 4 KiB granule. For iOS, they’ve generally been willing to break a load of things and require people to put in porting effort but less so for macOS.

                                                          1. 2

                                                              The 4k assumption is actually even worse than just an “assumption”: even code that tried to do the right thing by using getpagesize() didn’t work, as it was a macro that expanded to 4k on Intel machines (at least on Darwin), which made Rosetta challenging. The M-series SoCs support some kind of semi-4k page allocation in order to deal with this problem under Rosetta.

                                                              ARM builds on iOS had been 16k for many years before that, so a moderate amount of Mac software (which shared code with iOS) had already made the source changes to do the right thing, and getpagesize() stopped being a macro, so new builds of software got the correct value.

                                                            1. 1

                                                              They seem to have, as part of the Intel -> Apple Silicon transition (which I guess requires some porting effort anyway). On the M1/M2, macOS has 16k page sizes, and a quick GitHub search turns up that this did initially break various things that assumed 4k.

                                                          1. 1

                                                             He’s using Python as an example of non-refcounted garbage collection, but historically Python was refcounted with cycle breakers - is it now fully mark-and-sweep?

                                                            1. 2

                                                              Python is still ref counting with cycle detection. It’s extraordinarily difficult to change because every function in every C extension is littered with Py_INCREF, Py_DECREF, etc.

                                                              FWIW one thing I really learned in implementing a collector is how “littered all over” the codebase memory management is. It’s hard to change later, unless you have a layer of indirection like PyPy or Oil, which I mentioned here:

                                                              https://www.oilshell.org/blog/2023/01/garbage-collector.html#a-delicate-octopus-with-thousands-of-arms

                                                              I think non-moving tracing (e.g. mark and sweep) is the least “littered all over” of all strategies


                                                              Aside: Looks like they have a new dev guide which is kinda cool, haven’t seen this:

                                                              https://devguide.python.org/internals/garbage-collector/

                                                              1. 2

                                                                 Yeah, I would have been shocked if they had managed to change it, though there are ways (for example, you could have a non-zero refcount push an object into a separate root table, though then you need to manage cycle breaking again :) ).

                                                            1. 1

                                                               This is mostly complaining that use of “idiomatic” std::optional results in potentially expensive constructor calls. The canonical example is unwrap_or (or whatever it’s called), which instantiates a std::string rather than somehow… not? But this is a standard foot gun that comes from C++ refusing to actually acknowledge that strings should have compiler support. Which is a common issue in C++ :-/

                                                              1. 1

                                                                This generally isn’t a problem with std::string because it can be move constructed so the constructions shouldn’t incur a heap allocation. The value method on std::optional returns a reference to the contained object and so there’s no construction there and you can move construct it out.

                                                                I still consider having a compiler aware of strings to be an antifeature in a language and C++ does the right thing there. The problem is that the standard library string model is not very good:

                                                                • It conflates interface and representation. If your standard library does refcounting or does the small-string optimisation and that isn’t appropriate for your use, you are stuck with it. For a lot of workloads, you can get a very big speedup (I’ve seen doubling of transaction throughput rates) by changing the representation of strings to something tailored to your workload. This isn’t possible if you’re using std::string and so a load of libraries add their own, which adds copying overhead everywhere.
                                                                • It doesn’t support unicode (it now supports unicode storage, but doesn’t even have iterators for unicode code points, let alone glyphs or words).
                                                                • It is inconsistent about whether simple operations belong as methods or functions that take a templated thing that might be a std::string.
                                                              1. 5

                                                                 I remember the surprise at seeing KHTML then. I’d used Konqueror, but it encountered a lot of pages that it couldn’t render correctly, even then. It looked like more engineering effort than I would expect a company like Apple (remember, this was pre-iPhone, with Macs having around 7% of the market) to be able to afford. Hugely impressed with the vision of the folks behind all of the decisions that led to that point. Safari 1.0 was so much better than IE and was far less clunky than Camino on OS X, though not quite as nice as Opera, but eventually Opera’s pricing model pushed me away and I never left Safari on the Mac (though the last version is so buggy I might: it refused to connect to any UK government web sites recently for me, whereas Firefox and Edge on the same machine had no problems, and it increasingly often goes into a loop of crashing a renderer process and telling me that it has encountered a problem with a page, on large mainstream company web sites).

                                                                1. 2

                                                                  Lots of people act like safari/webkit was just khtml, but the reality was that khtml was woefully behind gecko and trident, and it took a large amount of engineering effort to get compatibility to the level it had with safari’s first beta, let alone customer release.

                                                                   This isn’t to dump on the khtml folk: making a browser engine is a huge amount of work, and pre-whatwg/html5 & es3.1 I would argue much harder than it is today. Obviously the modern web has much more API surface and features, but the piss-poor specifications from the w3c and ecma meant that basically every part of the browser had to be implemented by some variation of trial and error, comparing behavior with gecko and trident. Nowadays you can follow along with serenity’s browser dev, and their dev process often starts by copying the spec text in as a comment and just implementing that, with fair confidence that it will be correct.

                                                                   If we want to be dismissive of the work apple engineers had to do to the underlying k* libraries, it would be better to point to the ksvg2 code that was eventually imported as the svg implementation. That was much closer to just working as produced by the KDE folk (Nicholas Zimmerman and Rob Buis iirc?) - most of the work once in webkit was performance and security rather than rendering correctness (probably due to SVG being much stricter and having much less content at the time). Obviously in the subsequent 15 years that has been rewritten as well, but the “it’s just khtml” claims re safari were never accurate, whereas “it’s just ksvg2” would have been reasonably accurate for at least 2 or 3 years of safari releases.

                                                                  1. 1

                                                                    I don’t mean to diminish that work in any way. I remember grabbing the tarballs of Apple’s KHTML fork (before they started developing it in the open) and comparing it against upstream and the changes were huge. Someone at Apple chose to commit enough engineers to a zero-revenue project to turn KHTML into something competitive and that was an amazing display of management foresight: I doubt the iPhone would have been possible, for example, without WebKit being mature.

                                                                    1. 1

                                                                      Sorry, I didn’t mean to say you were doing so - your comment was very clearly not - but seriously, even today I see people trying to say webkit/jsc is “just a khtml/kjs fork that did the real work”. I (and others) didn’t rewrite large swathes of everything multiple times to be dismissed as just being khtml/kjs :D

                                                                       Happily, the powers that be went for an open source base (rather than a clean room), which became the webkit project (post khtml’s reasonable “wtf with these tarballs” complaints) - that led pretty directly to my career :D

                                                                      1. 1

                                                                        The decision to take contributions to support other platforms was also very foresighted. I remember at one stage seeing over a dozen WebKit integrations, including S60 and other embedded platforms, in the main repo. It’s a real shame that Google decided with Blink to refuse to take patches for even platforms that are almost identical to their current ones (e.g. FreeBSD).

                                                                        1. 1

                                                                           I recall when the blink fork occurred and everyone talked about how much code they saved… but no one seemed to ask what those code savings were (removing JSC, the JSC DOM bindings, then the actual toolkit support: Qt, Gtk, Wx, etc). That said, the cost of supporting some of the JSC JIT back ends was annoying back when I worked on it (many ifdefs for things like MIPS and SH4, because as much as you try to abstract things, fundamentally when dealing with CPUs directly you get squirrely stuff).

                                                                           My recollection of the pre-blink fork period is also of a fairly high workload from dealing with google folk who would come by, get enough patches in to get commit access (often make-work tasks like reformatting, “clean up”, “documentation”, or such), and then never be seen again. The rumor mill was that google was offering bonuses to people who got commit and/or review rights in webkit, which obviously just created make-work churn for everyone else, as you’d help people get up to speed and then they’d disappear once they got whatever checkmark they were after. It also meant that trying to keep track of who else was involved became hard (there were the actual competent and focused engineers, but they were drowned out by the constant churn of people with no long-term interest in participation).

                                                                1. 2

                                                                  Who is storing this, and what is the attack vector?

                                                                  The cost model is drastically different for “my local password->drive encryption” vs “my password->access to all my passwords stored on a remote server”.

                                                                  Take the hard disk encryption w/a local only password. The solution on decent hardware is a 128 or 256bit random key, with an hsm gating access to the encryption material - or performing the encrypt/decrypt itself.

                                                                   In the absence of an hsm, but still local-only, you can easily burn a few hundred ms + gigs of memory on PBKDF/scrypt/etc during initial boot to get the drive key. Note that time+memory is important, and the modern pbkdfs are generally configurable to use as much of both as you want. What you want is as much as you can get away with without your users noticing the delay.

                                                                  If instead you’re a hypothetical password syncing company, you should be investing in enterprise/high throughput HSMs to gate access to the user’s [encrypted] data on the basis that user passwords are generally not great. This means if someone gets a dump of your user’s data it’s all encrypted via strong actually random keys, rather than relying on whatever weak passwords they chose + a pbkdf.

                                                                   For data encryption purposes it’s also good (sane?) practice to use the root key material to encrypt actual random keys for the real encryption; otherwise changing the password suddenly requires re-encrypting everything.
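                                                                   As a rough Node.js sketch of those last two paragraphs (the parameters are illustrative, not a tuned recommendation):

                                                                   const crypto = require('crypto');
                                                                   const password = 'user-chosen password'; // supplied by the user in practice
                                                                   const salt = crypto.randomBytes(16);
                                                                   // Derive a key-encryption key (KEK); N trades CPU+memory cost against
                                                                   // how long users will tolerate waiting at unlock time.
                                                                   const kek = crypto.scryptSync(password, salt, 32,
                                                                     { N: 1 << 17, r: 8, p: 1, maxmem: 256 * 1024 * 1024 });
                                                                   // Encrypt the data under a random data-encryption key (DEK) and only
                                                                   // wrap the DEK with the KEK: a password change re-wraps 32 bytes
                                                                   // instead of re-encrypting everything.
                                                                   const dek = crypto.randomBytes(32);
                                                                   const iv = crypto.randomBytes(12);
                                                                   const wrap = crypto.createCipheriv('aes-256-gcm', kek, iv);
                                                                   const wrappedDek = Buffer.concat([wrap.update(dek), wrap.final()]);
                                                                   const wrapTag = wrap.getAuthTag();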

                                                                  1. 14

                                                                     I do not like this article: it seems very blasé about the cost of breaking syntax, uninterested in why decisions about these things (like the syntax of async/await) were made, and lacking a practical understanding of how language features have to mechanically work in dynamic languages.

                                                                    Kill function scope and fix binding keywords

                                                                    If this matters so much, a lint tool to remove var is not hard, and doesn’t break any existing code, or require your site to ensure it never combines the wrong libraries together in single responses. A mode switch to remove var seems absurdly expensive.

                                                                    Fix import keyword ordering

                                                                     This makes it seem like the keyword ordering is an unintentional mistake, which it isn’t. It’s the result of unending discussions, and the end result was that the current ordering was the most sensible, but both are perfectly reasonable. The order switching that this is proposing means that now I can’t just see the list of identifiers being imported at the head of the line, which is just as important, if not more so. Reversing the order also means you’re creating an assign-to-the-right semantic that differs from the rest of the language. On net, the current ordering is technically the best choice here, but the edge is not so extreme as to say that either choice is inherently “wrong”, which is what this article is claiming.

                                                                    Get rid of await soup for most code

                                                                     Multiple issues here. First: there’s no such thing as an “async function” - async functions are a syntactic nicety to make working with promises easier, but any function can return a promise. Second, there is the semantic question of how you actually implement the continuation points, as the absence of ‘await’ means you no longer know where you need to break up the codegen to return a promise vs continuing execution. The use of |await| is also useful for people reading the code, as it shows you where the function can be interrupted; otherwise you have to assume every function call can result in the function being halted.
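                                                                     To the first point, a small illustration that await-ability is a property of the value, not of the function’s declaration:

                                                                     function plain() {
                                                                       return Promise.resolve(42); // not declared async, still returns a promise
                                                                     }
                                                                     async function sugared() {
                                                                       return 42; // implicitly wrapped in a promise
                                                                     }
                                                                     (async () => {
                                                                       console.log(await plain());   // 42
                                                                       console.log(await sugared()); // 42
                                                                     })();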

                                                                     This reads very much like the author has decided to intentionally ignore that function lookup in JS is dynamic, and that you don’t know anything about the call target at parse/codegen time.

                                                                    The easier await alternative

                                                                    I have no idea how the author thinks property access works, but there is no way to statically distinguish this new invented syntax from a normal property access. Moreover I fail to understand what makes foo().await superior to await foo()

                                                                    Make object-returning arrow functions cleaner

                                                                    Or you could just do

                                                                    let generatePerson =
                                                                      name => ({name, id: generateId()})
                                                                    

                                                                    Which is not challenging and doesn’t add a pile of syntactic ambiguity, parser complexity, and weird semantics to save two characters.

                                                                    1. 4

                                                                      Moreover I fail to understand what makes foo().await superior to await foo()

                                                                      From the article:

                                                                      On top of that, when we combine fluent APIs with this, suddenly we get code that fails the “can be read from left to right” test:

                                                                      const VIPs = (await getUsers()).filter(isVIP);
                                                                      

                                                                      Typing this is often “type getUsers(), await it, parens-wrap it, then add isVIP()”.

                                                                      This is definitely something I encounter a fair bit — it’s awkward.

                                                                      1. 1

                                                                        Ah, is the proposal actually that there’s an additional syntax:

                                                                        <expr> . |await| <identifier>
                                                                        

                                                                        Not saying

                                                                        <expr> . |await|
                                                                        

                                                                         is now something with different semantics from any other .ident. You could likely avoid some syntactic ambiguity issues in conjunction with ASI if it were

                                                                        <expr> await . <Ident>
                                                                        

                                                                        Which would have the benefit of fairly consistently extending to

                                                                        <expr> await [ <expr> ]
                                                                        
                                                                        1. 1

                                                                           I don’t understand the syntax you’re expressing this with at all, despite my best efforts! What?

                                                                          1. 1

                                                                             I’m saying that you logically get an “await.” token, e.g.:

                                                                            const VIPs = getUsers() await.filter(isVIP);
                                                                            

                                                                            or

                                                                            someThing await["foo"]
                                                                            
                                                                        2. 1

                                                                          It’s slightly awkward but a syntax change that saves one character and removes a potential object property is not a reasonable solution.

                                                                          1. 2

                                                                             I’m not terribly fussed about the object property argument here - we’re spitballing, so let’s say that instead of .await (per Rust) it was some other kind of postfix, maybe 💤 for lulz.

                                                                            The real benefit isn’t to save a character (also, does it?), but to fix the “inside-out” property that multiple awaits produce. Imagine function calls were similarly denoted with a prefix keyword, call. Instead of:

                                                                            function onClearBackgroundColor() {
                                                                                editor.chain().focus().unsetBackgroundColor().run()
                                                                            }
                                                                            

                                                                            We could have:

                                                                            function onClearBackgroundColor() {
                                                                                call (call (call (call editor.chain).focus).unsetBackgroundColor).run
                                                                            }
                                                                            

                                                                             Is this example kinda facetious? Sure. But what happens in a month’s time when the library I depend on decides to async-ify half their API? This is a way more plausible future than I’d like:

                                                                            function onClearBackgroundColor() {
                                                                                await (await (await (await editor.chain()).focus()).unsetBackgroundColor()).run()
                                                                            }
                                                                            
                                                                        3. 3

                                                                          Thanks for reading this and going in depth. I’d like to preface this by saying that I threw out these ideas a bit whimsically because I want to hear how people think about them (and don’t seriously expect any of them to ever be adopted), and this is a lot to chew on! You’re totally right about me being a bit blasé about the consequences, but I do think they’re all technically possible without “breakage” from transitive packages.

                                                                          You mentioned breaking existing code RE let/var syntax. In my magical universe you are opting into stricter mode at a file level. The internal representation used by JS engines can be unchanged! It’s a “keyword swap”, just one that I believe lands us where we would want to be if we had a do-over (and maybe reserve const for something closer to C++ const for example). But importantly this is at the source level so existing source files would continue to be usable as-is.

                                                                          Reversing the order also means you’re creating an assign-to-the-right semantic that differs from the rest of the language. On net, the current ordering is technically the best choice here, but the edge is not so extreme as to say that either choice is inherently “wrong”, which is what this article is claiming.

                                                                          Inverting the assignment direction is something I hadn’t thought of! I do think one interesting thing here is how other languages sidestep this with syntax like import my_package.foo (and now you have foo in the namespace), which feels even better to me but less trivial of a change implementation-wise.

                                                                          First: there’s no such thing as an “async function”

                                                                          Async functions are defined as their own thing in the spec, so while during the initial popularization phase they really were just syntactic sugar they are their own beasts. When they get executed there is special behavior to manage calling into them.

                                                                          any function can return a promise

                                                                          Yes, but async functions are guaranteed to return a promise and are opting into async/await syntax. I think I said this in the original post, but I am explicitly opting out of the “normal function returns a promise” case, because that is something that you can determine “statically” for a given function object. You want to be able to know whether this will require awaiting when calling the function, not on return!

                                                                          I also want to point out that I’m not proposing the outright removal of await, simply saying that with my proposed semantics await is needed much less often. This can be backed up via linters or things like TypeScript.

                                                                          “async functions are just functions returning promises” isn’t true unless you are transpiling as such. I do understand that codegen is a thing, but async functions are not transformed in the ways you are implying when pointing to modern targets.

                                                                          You do correctly point out that function lookups are dynamic (as they always are), so one might think this means each function call in an async function now has to check a bit. But actually you can do this the other way around: every async function can check its calling context to determine what to do with the generated promise! This is not a clean retrofit by any means, though.

                                                                          Ultimately though this was more an idea I wanted to get out of my head.

                                                                          but there is no way to statically distinguish this new invented syntax from a normal property access.

                                                                          To be clear, obj["await"] would not be considered an await command. What I am proposing is new syntax, and the actual syntax obj.await would have the same semantics as (await obj). This is something at the syntactic level, and not at the “property lookup” level.
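
                                                                          For instance (the commented line is the proposed, hypothetical syntax; it is not valid JS today):

                                                                          // Today, inside an async function, awaiting mid-chain needs parentheses:
                                                                          const user = await (await fetch("/api/user")).json()
                                                                          // Under the proposal, the same thing would read left to right:
                                                                          //   const user = fetch("/api/user").await.json().await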

                                                                          Parenthesized arrow function returns are indeed good. I want the best thing, but as you point out, it’s two characters.

                                                                          1. 1

                                                                            You mentioned breaking existing code RE let/var syntax. In my magical universe you are opting into stricter mode at a file level.

                                                                            You hit the same problems we had with strict mode. First, you break copy/paste for JS, as you now need to copy/paste into identical contexts. Then there’s the production use case: servers will concatenate responses for performance, and while you can run strict code in a non-strict context, your stricter mode would mean such concatenation would fail to parse.
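
                                                                            A minimal sketch of why concatenation survives strict mode but not a parsing change (the file names are made up):

                                                                            // a.js (sloppy mode)
                                                                            var legacy = 1;
                                                                            // b.js (strict mode): safe to concatenate, because the directive is function-scoped
                                                                            (function () {
                                                                              "use strict";
                                                                              var modern = 2;
                                                                            })();
                                                                            // c.js (hypothetical stricter mode with let/var swapped): cannot be concatenated,
                                                                            // because the same token stream must now parse differently per source file.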

                                                                            which feels even better to me but less trivial of a change implementation-wise.

                                                                            The order of terms in an import statement is entirely irrelevant from the PoV of the language implementation, so the only thing that matters is humans reading the code. Hence questions of language consistency, etc. drive the design decisions.

                                                                            Async functions are defined as their own thing in the spec, so while during the initial popularization phase they really were just syntactic sugar they are their own beasts.

                                                                            Sorry, badly phrased: the spec has the idea of an async function because it needs to describe the toString behavior, when the await and yield keywords are valid, and what the semantics of return are.

                                                                            VMs have to convert the JS code of an async function into a series of internal functions and continuations, based on where each await is.

                                                                            This change means that every function call has to be turned into yet another continuation, because you can’t know which calls will return a promise until you call it.
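
                                                                            A rough sketch of that lowering (illustrative only, not any engine’s actual output; g is a made-up callee):

                                                                            async function f() {
                                                                              const a = await g()
                                                                              return a + 1
                                                                            }
                                                                            // conceptually becomes one function per continuation, split at the await:
                                                                            function f_lowered() {
                                                                              return Promise.resolve(g()).then(function continuation(a) {
                                                                                return a + 1
                                                                              })
                                                                            }

                                                                            Under the proposal, every call site would need a continuation like this, not just the explicit awaits.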

                                                                            But actually you can do this the other way around: every async function can check its calling context to determine what to do with the generated promise! This is not a clean retrofit by any means, though.

                                                                            What exactly is an async function meant to do that is caller dependent? Spontaneously become blocking?

                                                                            I do understand that codegen is a thing, but async functions are not transformed in the ways you are implying when pointing to modern targets.

                                                                            No, this is exactly how they are transformed. JS runtimes aren’t generating a single piece of code and creating multiple entry points. The way async codegen works is you create a series of functions with continuations; that’s literally how you implement it.

                                                                            What I am saying is a new syntax, and the actual syntax obj.await would have the same semantics as (await obj). This is something at the syntactic level, and not at the “property lookup” level.

                                                                            To clarify: “await” would become a special and unique token that isn’t treated the same as any other token in JS? Because .identifier is property lookup, and JS does not have different semantics for a.b vs a["b"], e.g. a.of is the same as a["of"].
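
                                                                            For example, all of these are plain property lookups today:

                                                                            const a = { of: 1, await: 2 }
                                                                            a.of === a["of"]        // true
                                                                            a.await === a["await"]  // true: after a dot, "await" is an ordinary property name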

                                                                          2. 2

                                                                            I programmed in Python for a long time, and I always hated the import order there. For me, JavaScript’s import order just fits my brain better.

                                                                            1. 1

                                                                              To me, the whole await thing is the worst thing that could happen, because it instantly colors your functions. I don’t have to deal with that in Lua. Here’s a function that requests a gopher page with blocking calls:

                                                                              tcp = require "org.conman.net.tcp"
                                                                              
                                                                              function get_gopher(host,port,selector)
                                                                                local conn = tcp.connect(host,port)
                                                                                if conn then
                                                                                  conn:write(selector,"\r\n")
                                                                                  local document = conn:read("*a")
                                                                                  conn:close()
                                                                                  return document
                                                                                end
                                                                              end
                                                                              

                                                                              And here’s the function I would write if I needed to request a page in a network-event based project:

                                                                              tcp = require "org.conman.nfl.tcp"
                                                                              
                                                                              function get_gopher(host,port,selector)
                                                                                local conn = tcp.connect(host,port)
                                                                                if conn then
                                                                                  conn:write(selector,"\r\n")
                                                                                  local document = conn:read("*a")
                                                                                  conn:close()
                                                                                  return document
                                                                                end
                                                                              end
                                                                              

                                                                              If they look the same, it’s because they are, except for the require()—one pulls in the “blocking version” module and the other one pulls in the “non-blocking async version” module. To the programmer, the API is the same, it’s just the implementation is way different underneath the hood and I don’t have to deal with colored functions. Then again, JavaScript doesn’t have coroutines (to my knowledge, I don’t program in it) so maybe that’s why.

                                                                              1. 1

                                                                                JS explicitly does not have coroutines, because JS is explicitly single threaded. Retroactively adding concurrent execution would cause many, many problems - which makes sense when you recognize that JS is somewhat intrinsically tied to UI, and UIs also tend to be single threaded.

                                                                                That said, I’d argue your example is a good example of why |await| has value: I look at that code, and I can’t tell if it’s blocking or not, I can’t tell if I need to be aware of changes to global state, etc.
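
                                                                                For contrast, a hedged sketch of the same fetch in JS (Node.js, using the built-in net module; the shape is mine, not a drop-in equivalent of the Lua modules above), where the color is explicit in the function and in every caller:

                                                                                // sketch only: minimal error handling, illustrative shape
                                                                                const net = require("net")

                                                                                function getGopher(host, port, selector) {
                                                                                  return new Promise((resolve, reject) => {
                                                                                    const conn = net.createConnection(port, host, () => {
                                                                                      conn.write(selector + "\r\n")
                                                                                    })
                                                                                    const chunks = []
                                                                                    conn.on("data", (chunk) => chunks.push(chunk))
                                                                                    conn.on("end", () => resolve(Buffer.concat(chunks).toString()))
                                                                                    conn.on("error", reject)
                                                                                  })
                                                                                }

                                                                                // every caller is colored too:
                                                                                // const doc = await getGopher("gopher.example.com", 70, "/")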

                                                                                1. 1

                                                                                  Lua is also single threaded (technically, the C-based VM is not thread safe). Coroutines are cooperatively scheduled; there is no preemption, so all data is in a known state. Granted, once a coroutine yields, the state of the program can change, but not, for lack of a better term, unexpectedly with respect to running code. And why do you need to know what blocks and what doesn’t if the system will handle it for you?

                                                                                  When I see JavaScript with all the awaits and promises, all I can see is callback hell and hard-to-follow program flow.

                                                                                  1. 1

                                                                                    Lua blocking is fine because it’s not used in UI processes. That means randomly making some code block non-visibly is acceptable. Lua also runs only trusted code, so randomly alternating between blocking and non-blocking is acceptable as there’s functionally a single author.

                                                                            1. 8

                                                                              I also really like XML! It has two more things that make it really good for text markup:

                                                                              1. It preserves whitespace, so you can embed code and other whitespace-sensitive stuff in it
                                                                              2. You can nest tags inside content, <a>lik<b>e thi</b>s</a>.

                                                                              I used both features to cut a week off the delivery time of learntla, in a way that I couldn’t have done with JSON, YAML, or even rST.
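
                                                                              Both properties are easy to check with a quick sketch (browser environment, using DOMParser; the snippet is mine):

                                                                              const xml = `<a>lik<b>e thi</b>s, with
                                                                                  significant   whitespace</a>`
                                                                              const doc = new DOMParser().parseFromString(xml, "application/xml")
                                                                              doc.documentElement.textContent    // the newline and runs of spaces come back intact
                                                                              doc.querySelector("b").textContent // "e thi": tags nest inside the text content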

                                                                              1. 1

                                                                                Could you have done this with HTML?

                                                                                I think HTML has all the nice properties of XML, without a lot of weird stuff like XML namespaces

                                                                                The downside of HTML is that any typo still makes a valid doc. The browser often guesses what you mean in a weird way (i.e. non-balanced tags)

                                                                                It feels like HTML is too liberal, and XML is too strict, although I only really use HTML.

                                                                                What libraries did you use?

                                                                                1. 4

                                                                                  The downside of HTML is that any typo still makes a valid doc. The browser often guesses what you mean in a weird way (i.e. non-balanced tags)

                                                                                  That’s a pretty big downside. Also, HTML is now defined by a “living” standard put forth by the browser cartel, AKA WHATWG. In other words, HTML is what the big browser makers say it is. That’s a poor foundation on which to build.

                                                                                  All of that quirkiness in HTML, like the optional closing tags and so forth, is a relic of bygone days when most HTML was written by hand. It made the format friendlier to human authors. I really don’t want to ever write XML or HTML by hand in the general case. Doing so is the equivalent of manually speaking SMTP to a mail server. Sometimes I do manually speak SMTP to mail servers for debugging and such, but for the most part, that’s my MUA’s job, not mine.

                                                                                  1. 3

                                                                                    That’s a pretty big downside.

                                                                                    It does, however, have the benefit of matching what actually happens on the web. You can argue it’s bad, but the alternative is XML and XHTML, which have repeatedly failed due to their strictness requirements vs. human editing.

                                                                                    Also, HTML is now defined by a “living” standard put forth by the browser cartel,

                                                                                    Well, that’s nonsense. Anyone can make proposals and contribute to the various HTML specs, which pre-HTML5/WHATWG they could not.

                                                                                    In other words, HTML is what the big browser makers say it is.

                                                                                    I mean, yes: if the browsers don’t implement something, then the fact that it’s in a “spec” is irrelevant. If the browsers do implement something, then you want it to be specified so you don’t recreate the pre-WHATWG version of HTML, where the spec does not match reality and cannot be used to write a browser.

                                                                                    That’s a poor foundation on which to build.

                                                                                    I’m sorry, this is complete BS. Before the WHATWG and the “browser cabal”, what we had was a bunch of specifications that were incomplete, ambiguous, and oftentimes just outright incorrect. The entire reason the WHATWG came to exist is because groups like the W3C, ECMA, etc. would define what they wanted things to be, and not what they actually were. You want the spec to be “living” because what exactly do you think the alternative is? Again, we’ve had static “versioned” specs, and they were bad.

                                                                                    I get that many people have started their careers in the last decade or so, and I think that means they are unaware of what the pre-WHATWG world was like, and the costs that came with it. Similarly, there are strange beliefs about how these standards bodies actually operate, which can only come about from never actually interacting with them (and certainly without ever trying to participate in the pre-WHATWG bodies, because you generally needed to be an invited expert or a paying corporate member, which seems less open than the apparently evil cabal).

                                                                                  2. 3

                                                                                    HTML isn’t extensible. It’s an object model customized for displaying content in a browser. You could technically replace all markup usages of XML with HTML, but it gets ugly fast:

                                                                                    • docx, pdf — use XML to create a different object model that’s meant for being printed
                                                                                    • translating a work into lots of languages
                                                                                    • annotating music or lyrics

                                                                                    Markup isn’t necessarily for display; it’s adding data to text.

                                                                                    1. 2

                                                                                      What’s not extensible about HTML?

                                                                                      That was the idea behind microformats

                                                                                      https://developer.mozilla.org/en-US/docs/Web/HTML/microformats

                                                                                      I guess I’m mainly talking about the syntax of HTML vs. the syntax of XML.

                                                                                      I know XML has DOM and SAX APIs. Let’s leave out DOM altogether.

                                                                                      If you just consider a SAX API (event-driven), then I don’t see a huge difference between XML and HTML, besides syntax (and all the optional elaborations that are only used by certain apps?)


                                                                                      I’m asking what libraries people use because maybe XML has a bunch of great libraries that make things easier than using HTML, but so far I’m not aware of the benefit.

                                                                                      1. 3

                                                                                        Isn’t HTML specifically about using a specific set of tags? Like if you are writing HTML with a bunch of custom tag names you’re actually writing XML and not HTML?

                                                                                        1. 1

                                                                                          I think that was the theory with XHTML, but it never happened in practice. XHTML meant that HTML was just XML with specific tags.

                                                                                          But we don’t use XHTML anymore – we use HTML5, which is more compatible with HTML 4, HTML 3, etc.

                                                                                          All browsers have always ignored HTML tags they don’t understand, because otherwise HTML could never be upgraded … e.g. I just tested this and it just shows “foo”. Maybe it makes it behave like <span> or something.

                                                                                          echo '<mytag>foo</mytag>' > _tmp/i.html
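
                                                                                          You can check what the browser actually does with it (a quick browser-console sketch; the tag name is arbitrary):

                                                                                          const el = document.createElement("mytag")
                                                                                          el.constructor.name           // "HTMLUnknownElement"
                                                                                          document.body.appendChild(el)
                                                                                          getComputedStyle(el).display  // "inline", i.e. it renders much like a <span>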
                                                                                          
                                                                                          1. 1

                                                                                            Fun fact: if you put a hyphen in a tag name in HTML, it’s guaranteed never to be a spec-defined tag, so you can create your own formatting around custom tags instead of classes.
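
                                                                                            That hyphen rule is the same one the custom elements API relies on (a minimal sketch; MyNote is made up):

                                                                                            class MyNote extends HTMLElement {}
                                                                                            customElements.define("my-note", MyNote) // a name without a hyphen is rejected here
                                                                                            // and in CSS you can target the tag directly: my-note { display: block }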

                                                                                            1. 1

                                                                                              That’s not quite correct. All tags are explicitly valid; what changes is whether a tag has

                                                                                              • Additional non-display semantics (e.g. <a>, <form>, <input>, …)

                                                                                              • Built-in non-default styling, e.g. <b>, <marquee> (lol!), etc: tags that predate the ability to apply styling/layout that isn’t explicitly built into the browser

                                                                                              But fundamentally the tag name of an element is not important: the DOM APIs are generic and string-based (document.getElementsByTagName), and CSS’s sigil-free (so arguably “default”) identifier selector matches the tag of an element.
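
                                                                                              For example (the tag name here is arbitrary):

                                                                                              document.getElementsByTagName("anything") // tag names are just strings to the DOM
                                                                                              // and in CSS a bare identifier selects by tag, spec-defined or not:
                                                                                              // anything { color: red }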