Threads for cmcaine

  1. 4

    You can store the length of the tail in the node to avoid the two extra trips through the list (to compute length).

    I wouldn’t assume someone who uses malloc/new would worry too much about putting another ptrdiff_t length in the node, but if this gives you pause, consider this: up to an obscene 256TiB, the 16-byte-aligned node has 20 bits of “free space” in its pointer (4 at the bottom, and 16 at the top), and you can steal at least an extra 16 bits from the value, meaning you’ve basically got a “free” 36-bit integer with which to store the length of the tail. The bit-shifting might seem expensive, but when you’re pointer-chasing the cache effects will almost certainly dominate, and a better-than-3x improvement in time is probably worth it.
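
    (A minimal sketch in C++ of the pointer half of that trick, assuming 16-byte-aligned nodes and 48-bit user-space addresses; it packs only the 20 spare pointer bits and leaves the value’s bits alone, and the field layout and names are just one possible choice.)

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Sketch: pack the tail length into the unused bits of the `next` pointer.
    // Assumes nodes are 16-byte aligned (low 4 bits free) and addresses fit in
    // 48 bits (top 16 bits free), i.e. 20 spare bits in total.
    struct alignas(16) Node {
        uint64_t next_and_len;  // bits 48-63: len >> 4, bits 4-47: pointer, bits 0-3: len & 0xF
        int64_t  value;
    };

    constexpr uint64_t kPtrMask = ((uint64_t{1} << 48) - 1) & ~uint64_t{0xF};

    inline Node* next(const Node& n) {
        return reinterpret_cast<Node*>(n.next_and_len & kPtrMask);
    }

    inline uint32_t tail_len(const Node& n) {
        return uint32_t(((n.next_and_len >> 48) << 4) | (n.next_and_len & 0xF));
    }

    inline void set_next(Node& n, Node* nxt, uint32_t len) {
        uint64_t p = reinterpret_cast<uint64_t>(nxt);
        assert((p & ~kPtrMask) == 0 && len < (1u << 20));  // aligned, canonical, length fits
        n.next_and_len = p | (len & 0xF) | ((uint64_t{len} >> 4) << 48);
    }
    ```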

    1. 1

      Sure, but then you lose O(1) insertion into the lists, and if you’re not using that then why are you using linked lists at all?

      (if the tails of some lists are shared then you may lose the ability to do insertion at all unless you use even more memory on bookkeeping)

      1. 1

        You lose O(1) insertion at the tail but you keep it at the head, so prepending items to the list is still O(1)

        1. 1

          Sure, but if you’re only adding items at one end you might as well use an array (well, I guess if insertion latency matters you might still prefer a list).

      2. 1

        You can store the length there. But there might be something more important to store in those bits. Which is part of the issue with the linked post: it doesn’t contextualise the problem sufficiently to take a holistic approach to optimising it.

      1. 2

        I agree, but I feel that it’s gonna take a long time, given that so many open data geo projects, especially ones related to government or governmental organizations, seem to use it as the go-to. And a lot of the typical software it is used with expects it. I recently added support for it to a piece of software because of this.

        Also I am not so sure about whether GeoPackage is such a great idea in terms of parsing it. That is, it is using SQLite, which is more an implementation than a standard.

        While the other options like GeoJSON have their own downsides, I’d really go for “container formats” that either have many implementations (JSON being natively implemented in almost every language) or are easy to parse and convert into whatever your software needs, without relying on a single major implementation and without having to essentially implement large parts of a database.

        1. 2

          While I agree with parts of your point, IMO SQLite is so incredibly ubiquitous, supporting basically every platform and language you could imagine, that I think the point is moot. Whether we think about it or not, a very large amount of the software in our lives runs on SQLite. It’s incredibly high quality, and one of the best tested pieces of code in wide use. It’s also a format which is seeing a lot of other interesting uses in our industry right now (I keep thinking SQLite is so hot right now, as there are one or two articles about it a week on lobste.rs at the moment).

          There’s no reason we need to have The One True Format, but as a replacement for Shapefiles and for transmitting static geospatial data, GeoPackage seems like an excellent place to start. When I was working on nationalmap.gov.au, shapefiles were the bane of my existence; so often I’d get an upside-down Australia made of dots over Japan, or a teeny weeny Australia sitting right next to null island. And promoting a non-proprietary format over anything from ESRI will always get support from me.

          1. 1

            I completely agree on that. This was in no way meant to be directed against SQLite (I love it and I wish people at large understood it better), but as a consideration for long-term standards, for example in the context of governmental institutions releasing open data.

            It’s certainly a better option than shapefiles for the reasons given in the article.

            1. 1

              I’m not sure what you meant by the long-term comment; isn’t SQLite one of the formats recommended by the US Library of Congress?

          2. 2

            Also I am not so sure about whether GeoPackage is such a great idea in terms of parsing it. That is, it is using SQLite, which is more an implementation than a standard.

            The database file format for SQLite hasn’t changed since 2006, is reasonably simple, and several third-party parsers exist. I don’t think this is likely to be a problematic format, even for people who for whatever reason don’t want to use the sqlite3 library.

            1. 1

              I wouldn’t say it’s hard or problematic, but simply that SQLite is a database format and not a generic or GIS-specific one. Of course one can always use the other formats, but given that shapefiles are currently, sadly, often used as the only format, I think being conservative here makes sense, or else we might just end up with a different list of downsides. Or in other words, if you want to convince everyone to use a specific, widely used format, you want to eliminate as many reasons to use something else as possible, and having to go through SQLite first might be such a reason in certain situations.

              I just hope it doesn’t cause a lot of headaches and weird situations should SQLite decide to change that format, or become scared of doing so, or end up with both an old and a new format. Standards that everyone agrees on can have very long-term consequences, after all.

          1. 16

            Oof, accessing out-of-bounds memory is pretty surprising to me for a dynamic language … But I guess it’s not surprising if your goal is to compile to fast native code (e.g. omitting bounds checks).

            I don’t know that much about how Julia works, but I feel like once you go there, you need to have very high test coverage, and also run your tests in a mode that catches all bounds errors at runtime. (Do they not have this?)

            Basically it’s negligent not to use ASAN/Valgrind with C/C++ these days. You can shake dozens or hundreds of bugs out of any real codebase that doesn’t use them, guaranteed.

            Similarly if people are just writing “fast” Julia code without good tests (which I’m not sure about but this article seems to imply), then I’d say that’s similarly negligent.


            I’ve also learned the hard way that composability and correctness are very difficult aspects of language design. There is an interesting tradeoff here between code reuse with multiple dispatch / implicit interfaces and correctness. I would say they are solving O(M x N) problems, but that is very difficult, similar to how the design of the C++ STL is very difficult and doesn’t compose in certain ways.

            1. 8

              I haven’t tested it, but I also wondered “how can this be?” You can launch Julia as julia --check-bounds=yes which should override the @inbounds disabling of bounds checking.

              If that works, then the fact that the @inbounds bugs of the original article persisted for many years in spite of this “flip a switch and find them” option probably says the issue is more one of “culture”. People often confuse culture and PLs, but it is true that as a consumer (who does not write all their own code) both matter.

              1. 4

                Yeah one thing I would add that’s not quite obvious is that you likely need “redundant” tests for both libraries and APPLICATIONS.

                This is because composing libraries is an application-specific concern, and it can be done incorrectly. With Julia’s generality and dynamic nature, that concern is magnified.

                Again I’d make an analogy to C++ STL – you can test with one set of template instantiations, e.g. myfunction<int, float>. But that doesn’t mean myfunction<int, int> works at all! Let alone myfunction<TypeMyAppCreatedWhichLibraryAuthorsCannotKnowAbout, int>.

                In C++ it might fail to compile, which is good. But it also might fail at runtime. In a dynamic language you only have the option of failing at runtime.
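
                (A contrived C++ sketch of that point, with made-up names: the tested instantiation works, and nothing forces the untested one to even compile.)

                ```cpp
                #include <vector>

                // Hypothetical generic routine, presumably only ever tested as sum_as<long>(ints).
                template <typename Acc, typename T>
                Acc sum_as(const std::vector<T>& xs) {
                    Acc acc{};
                    for (const T& x : xs) acc += x;  // silently requires `Acc += T` to exist
                    return acc;
                }

                struct AppPoint { double x, y; };    // a type the library author never saw

                int main() {
                    std::vector<int> ints{1, 2, 3};
                    long ok = sum_as<long>(ints);    // the combination covered by tests
                    (void)ok;

                    std::vector<AppPoint> pts{{1, 2}, {3, 4}};
                    // sum_as<double>(pts);          // fails to compile: no `double += AppPoint`
                    (void)pts;
                }
                ```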


                I have a habit that I think is not super common – I write tests for the libraries I use, motivated by the particular use cases of my application. I also avoid package managers that pull in transitive dependencies (e.g. npm, Cargo, etc.)

                But yeah it sounds like there is a cultural change needed. I have some direct experience with an open source C maintainer rejecting ASAN changes… simply due to ignorance. It can be hard to change the “memes”.

                So to summarize I would say that in Julia it’s not enough for libraries to run tests with --check-bounds=yes – applications also need tests that run with it. And the tests should hit all the library code paths that the application hits.

                1. 5

                  Any Julia project (library or application) that has tests and runs them in the standard way will run them with bounds checking turned on.

                  The issue was that Yuri was using a type with an indexing scheme the author hadn’t expected, and that scenario was not tested.

                  1. 4

                    Yeah so then I think it boils down to the (slightly non-obvious) practice of applications writing tests for libraries they depend on.

                    This is not limited to Julia – I do it all the time, e.g. for C and C++. But it does seem more crucial because of the power of the language.

                    1. 1

                      I think my take is that Julia libraries should do (more) property testing/fuzzing, and that the type of array and the type of number should be some of the dimensions that vary.

                      Tho I also agree that you should test the combinations of libraries you use in experimental work, and perhaps the relative immaturity of Julia’s ecosystem means that extends to quite a lot of projects in Julia.

                  2. 3

                    So to summarize I would say that in Julia […] applications also need tests that […] hit all the library code paths that the application hits.

                    You are probably correct; this sounds like a well informed opinion.

                    It is also great information warning off any would-be users of Julia. Knowing this is a legitimate best practice in the language ecosystem there is no way I would even consider getting entangled.

                    1. 3

                      I haven’t used Julia much, but for some problems your only real options are Matlab and Julia (excluding C++ and Fortran). If I had those types of problems I would pick Julia!

                      (On the other hand, I pick R for data manipulation and stats. Python and Julia are both trying to port R’s abstractions in that domain, but R is where they originated, and I think they’re the best.)

                      FWIW I do think Julia has some really good language design decisions. For example I borrowed their whitespace stripping rule for multi-line strings in Oil.

                      And there is a huge UPSIDE to multiple dispatch – but there is also a downside. I think it can be mitigated with culture.

                  3. 2

                    Bounds checking is turned on by default when running package tests. The issue is that the bounds are not actually violated for regular arrays. If some tests had been written with OffsetArrays, then the errors would have been seen.

                    There’s also the context that many of the affected packages were written before Julia supported arrays that are not indexed from 1 and were not updated (to be fair, not that many people use weirdly indexed arrays).

                    1. 3

                      It sounds like they need a “meta-test” aka a “test linter” to validate that “tricky” types are tested. This gets “AI complete” quickly, of course, but just taking a list of types that includes OffsetArray {EDIT: or your new EvilArrays} by default to “check that you test” might not be a bad start… :) Maybe this also already exists but is not used?

                      “Tools that could help if only they were used, but they aren’t” sounds like a cultural problem to me. As I said, that does not make it an unreal problem. Academic coders can be especially difficult to habituate to any sort of discipline. In fact, I might say it’s a harder problem to solve than more technical things. Many can just “graduate and move on”.

                      I cannot find it now, but there was a recent article about “what programming languages usher/cajole you into doing” mattering. As with any culture, there is also “what kind of people a PL attracts”. They relate. Julia has been REPL/JIT compile focused from the start (to displace Matlab/Python). People who prioritize that kind of development over Text Editor/Version Controlled/Test-Driven|Regulated dev are simply different. {EDIT: or it could be the same person wearing different hats/being in a different mindset…Humans are complicated. :-) }

                  4. 2

                    FWIW I feel my message led to some “piling on”, and not all the replies understood the difficult language design issues here.

                    So I posted this counterpoint: The Unreasonable Effectiveness of Multiple Dispatch (JuliaCon 2019)

                    I would rather work with a language that composes, but where you have to be slightly careful. (Shell is a lot like this.)

                    As opposed to working with QUADRATIC AMOUNTS of brittle, poorly composing, bespoke code. That leads to a different explosion in the test matrix, which makes programs even more unreliable.


                    I think the Julia designers did something very difficult, with huge benefits. But there is a downside, which is very well expressed by this post.

                    But I think that solving / mitigating O(M x N) code explosions is the most important and most difficult thing in language design. Drafts here related to “narrow waists”:

                    https://oilshell.zulipchat.com/#narrow/stream/266575-blog-ideas/topic/Solving.20M.20x.20N.20code.20explosions.20is.20the.20most.20important.20thing (requires login)

                    In other words, in all good languages, everything is an X. For Julia that’s fine-grained and dynamically typed pure data, which can be operated on by functions with multiple dispatch, and which can be compiled to efficient code.

                  1. 9

                    I can definitely +1 the code quality issues around community packages. The core language quality is pretty good, but there are so many projects that are one-offs written by an academic, and these become core libraries for other more complex libraries!

                    1. 4

                      That’s because they alienate professional engineers by not fixing the warm-up issue, the compiled exe size, and other such earthly considerations. Too bad too, because the language itself is beautifully designed.

                      1. 6

                        I think that’s a bit unfair. If those issues were easily solved then they would have been already. The core devs have put a lot of effort into both of those issues, and things are getting better (as of v1.8 you can now opt-out of bundling llvm with your binary and can strip symbols and IR for smaller binaries; and there have been loads of changes over the last few years to reduce warm-up time (really, compilation time)).

                        1. 1

                          I don’t mean to be unfair. I’m sure there are difficulties in achieving those goals.

                          My impression is that these problems could have been solved, had they been prioritized.

                          Either way, intentions aside, the result is that Julia currently isn’t practical for most real-world software, i.e. software that involves distributing to users and providing a reasonable user experience. I’m fairly certain it’s a big factor in the immaturity of the ecosystem.

                          1. 2

                            I completely disagree. Julia is extremely practical and it’s replacing a TON of R / Python / Fortran code. Especially in the HFT / ML / Bioinformatics worlds.

                        2. 2

                          I don’t think that’s true. There’s a fair number of professional engineers working on libraries in the space. The problem is that you have very smart academics who are writing one-off libraries to finish their research, then no one wants to maintain it.

                      1. 2

                        So… no QA?

                        1. 13

                          QA is a much broader topic than “gating deployments on manual testing” which is what you seem to imply. If you do many of the other things right, you might find that gating on manual testing no longer catches enough defects to be worth it.

                          In fact, gating on manual testing actually works counter to many other good QA practises – some of which are described in TFA, like small batches.

                          1. 2

                            This. Sometimes you need to gate this way, but it’s a sign of a problem, not a feature to bring everywhere.

                          2. 10

                            QA is integrated into development in the form of full tests.

                            Having QA be an extra few month process is a sign of less than ideal development practices.

                            1. 8

                              Hopefully more companies take this approach, it is easier to compete with companies that don’t have QA.

                              1. 13

                                Honestly I’ve never worked anywhere where I felt like a QA organization added much value. They would automate tests for product requirements which were usually covered by product-dev test suites anyway, except the QA tests would take longer to run, they’d be far flakier, and they would take much longer to identify the cause of failure. Moreover, the additional manual testing they would perform in-band of releases rarely turns up issues that would reasonably block a release (they aren’t usually the ones finding the security issues, for example). It really feels like QA is something that needs to be absorbed into product/dev much like the devops philosophy regarding infrastructure–basically, developers should just learn to write automated tests and think about failure modes during development (perhaps via a design review process if necessary).

                                It seems like the thinking behind QA orgs is something like “these well-paid devs spend all of this time writing tests! let’s just hire some dev bootcamp grads for a fraction of the cost of a developer and have them write the tests!”. Unfortunately, writing good tests depends on an intimate understanding of the product requirements and the source code under test as well as strong automation skills, and the QA orgs I’ve worked with rarely tick any of these boxes (once in a while you get a QA engineer who ticks one or more of these boxes, but they’re quickly promoted out of QA).

                                1. 2

                                  I have had an experience where the QA team found a showstopper “go back and fix before our customers get broken” bug.

                                  Sometimes a long QA cycle can really help, but it’s not something you want to need.

                                  1. 1

                                    I don’t doubt that these happen, but plausibly a company without a QA org might’ve done more diligence and caught it early rather than throwing it over the wall to QA. Moreover, even if the QA org caught this one, it must be weighed against the overall inefficiency that the org introduces. Is one show stopper really worth being consistently late to market? In some orgs the answer might be “yes” but in most it probably isn’t (unless “show stopper” means “a bug so serious it sinks the company” by definition).

                                2. 6

                                  As it turns out, not always (e.g. https://spectrum.ieee.org/yahoos-engineers-move-to-coding-without-a-net)

                                I have worked at companies with long QA cycles for good reasons - among other things we shipped software on physical devices and things like iOS apps where you cannot fix a mistake instantly because of validation. But at a Web-based software company I would also say it is probably not always a good trade-off.

                                  1. 2

                                    FWIW, Monzo is mostly accessed through apps on your phone. When they launched the bank you couldn’t even login on their website. Their apps do seem like they’re mostly just browsers accessing some private site, though.

                              2. 2

                              Fast deploys and QA aren’t incompatible. The classic way of handling this is to write code behind feature flags so that code can be integrated into production-esque environments (like a staging env) and handled at various paces.

                                This has the added benefit of making it easy to roll out refactors and other pieces of code without having long-running branches that accumulate a lot of stuff. And you can do stuff like release features for a subset of users, do betas, roll back feature releases, all mostly decoupled from the code writing.

                              And of course 100 deploys to prod per day in a full CI/CD system is likely a lot of tiny deploys that are just like… version bumps of libraries for some random service and the like.

                                1. 1

                                We do QA! This is not something I’m very involved with, so I’ll struggle to give you details. I’d recommend you check out this recent blog post on our approach to QA.

                                1. 2

                                  How would you notice an error returned by the close() syscall for a file?

                                  1. 1

                                    Exceptions, maybe?

                                    1. 2

                                      In Rust?

                                      1. 1

                                        Panic, probably.

                                        1. 3

                                          Error handling in drop is problematic indeed. Some libraries provide close(self) -> Result for handling this the hard way when you really care about the result.

                                          1. 2

                                          std::fs::File chooses to ignore those errors.

                                            https://doc.rust-lang.org/stable/std/fs/struct.File.html

                                            1. 2

                                              Ah, but importantly, it gives you the option to explicitly handle them and also explicitly documents what happens in the default case:

                                              https://doc.rust-lang.org/stable/std/fs/struct.File.html#method.sync_all

                                      2. 1

                                        To be honest how would you like to handle that situation in your program? Terminate it completely? Retry closing? What if you can’t close the file at all? This is one of those scenarios where error handling isn’t obvious.

                                        1. 1

                                          I agree that there is no obvious right answer. But hiding the error is obviously wrong.

                                      1. 6

                                        Applications can be built either using cloud native buildpacks, Dockerfiles or arbitrary docker images that you generated with something like Nix’s pkgs.dockerTools.buildLayeredImage

                                        How long until we can just start using Nix easily in deployment environments without jumping through unnecessary hoops with containers or buildpacks? Why not built-in support to run the Flake’s defaultApp?

                                        1. 2

                                          What benefit would that give?

                                          1. 5

                                            Much faster builds with package-level caching. Deploying from a flake may even be cheaper for the provider than dealing with docker images.

                                            1. 1

                                              Thanks for explaining.

                                        1. 4

                                          Another thing to consider is that a cryptographic hash can reliably give you 128 bits that will uniquely and deterministically identify any piece of data. Very handy if you need to make UUIDs for something that doesn’t have them.
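
                                          (A rough sketch of that, assuming OpenSSL is available for SHA-256; any strong hash would do. Note this just formats 16 deterministic bytes in the familiar dashed-hex style; it isn’t one of the RFC 4122 UUID versions.)

                                          ```cpp
                                          #include <cstdio>
                                          #include <string>
                                          #include <openssl/sha.h>  // assumes OpenSSL; any cryptographic hash works

                                          // Sketch: derive a stable 128-bit identifier by truncating SHA-256 of the content.
                                          std::string content_id(const std::string& data) {
                                              unsigned char d[SHA256_DIGEST_LENGTH];
                                              SHA256(reinterpret_cast<const unsigned char*>(data.data()), data.size(), d);
                                              char out[37];
                                              std::snprintf(out, sizeof out,
                                                            "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x",
                                                            d[0], d[1], d[2], d[3], d[4], d[5], d[6], d[7],
                                                            d[8], d[9], d[10], d[11], d[12], d[13], d[14], d[15]);
                                              return out;
                                          }
                                          ```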

                                          1. 1

                                            But even a cryptographic hash has collisions, however unlikely. So there is always a chance that two distinct pieces of data will end up with the same id. But probably the same can happen, for example, with random UUID (except here you rely on the quality of your randomness source rather than the quality of your hash function and the shape of your data). Somehow using a crypto hash as a long-term id always feels iffy to me.

                                            1. 3

                                              Given the size of hashes, using the hash of the content as its id is totally safe. The hashing algorithms are designed such that collisions are so unlikely that they are impossible in practice.

                                              If this wasn’t the case, all systems based on content addressing would be in serious trouble. Systems like git or ipfs.

                                              1. 1

                                                If this wasn’t the case, all systems based on content addressing would be in serious trouble. Systems like git or ipfs.

                                                Any system that assumes a hash of some content uniquely identifies that content is in fact in serious trouble! They work most of the time, but IPFS is absolutely unsound in this regard. So is ~every blockchain.

                                                1. 2

                                                  I’ve seen several cryptographic systems rely on exactly this fact for their security. So while it’s probabilistic, you’re relying on the Birthday Paradox to ensure it’s highly unlikely.

                                                  From that table, for a 128-bit hash function, for a 0.1% chance of a collision, you’d need to hash 8.3e17 items. In practical terms, a machine that can hash 1,000,000 items per second would need to run for just over 26 millennia to have 0.1% chance of a collision.
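
                                                  (For reference, those figures come from the usual birthday-bound approximation; a quick check of the 128-bit, 0.1% case:)

                                                  ```latex
                                                  n \approx \sqrt{2 \cdot 2^{b} \ln\tfrac{1}{1-p}}, \qquad b = 128,\; p = 10^{-3}
                                                  \;\Rightarrow\; n \approx \sqrt{2 \cdot 2^{128} \cdot 1.0005 \times 10^{-3}} \approx 8.3 \times 10^{17}
                                                  ```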

                                                  For systems that use 256-bit digests (like IPFS), it would take many orders of magnitude longer.

                                                  1. 1

                                                    I’ve seen several cryptographic systems rely on exactly this fact for their security. So while it’s probabilistic, you’re relying on the Birthday Paradox to ensure it’s highly unlikely.

                                                    If collisions are an acceptable risk as long as their frequency is low enough, then sure, no problem! Engineering is all about tradeoffs like this one. You just can’t assert that something which is very improbable is impossible.

                                                    1. 1

                                                      I can still pretend and almost certainly get away with it. If the chance of getting bitten by this pretense is ten orders of magnitude lower than the chance of a cosmic ray hitting the right transistor in a server’s RAM and causing something really bad to happen, then for all practical purposes I’m safe to live in blissful ignorance. And what a bliss it is; assuming a SHA-256 hash uniquely identifies a given string can immensely simplify your system architecture.

                                                    2. 1

                                                      You’ve misread the table. 1.5e37 is for 256 bit hashes. For 128 bits it’s 8.3e17, which is obviously a lot smaller.

                                                      For context with IPFS, the Google search index is estimated to contain between 10 and 50 billion pages.

                                                      1. 1

                                                        You’ve misread the table

                                                      Thanks. Although it’s what happens when you start with a 256-bit example, then remember everyone’s talking about UUIDs, and hastily re-calculate everything. :/

                                                2. 3

                                                  No, this is absolutely not a consideration. Your CPU and RAM have lower reliability than a 128-bit cryptographic hash. If you ever find a collision by chance, it’ll be more likely to be a false positive due to hardware failure (we’re talking a 100-year timespan at a constant rate of billions of UUIDs per second).

                                                  And before you mention potential cryptographic weaknesses, consider that useful attacks need a preimage attack, and the “collision” attacks known currently are useless for making such uuids collide.

                                                  1. 2

                                                    Whenever you map from a set of cardinality N (content) to a set of cardinality less than N (hashes of that content) by definition you will have collisions. A hash of something is just definitionally not equivalent to that thing, and doesn’t uniquely identify it.

                                                    As an example, if I operate a multi-tenant hosting service, and a customer uploads a file F, I simply can’t assert that any hash of the content of F can be used as a globally unique reference to F. Another customer can upload a different file F’ which hashes identically.

                                                    “Highly unlikely” isn’t equivalent to “impossible”.

                                                    1. 4

                                                      It’s absolutely impossible for all practical purposes. It’s useless pedantry to consider otherwise.

                                                      Remember we’re talking about v4 UUID here which already assumes a “risk” of collisions. Cryptographic hashes are indistinguishable from random data, and are probably more robust than your prng.

                                                      The risk of an accidental collision is so small, you can question whether there’s enough energy available to our entire civilisation to compute enough data to ever collide in the 128-bit space in its lifetime.

                                                      1. 1

                                                        It’s absolutely impossible for all practical purposes. It’s a useless pedantry to consider otherwise.

                                                        I mean it literally isn’t, right? “Absolutely impossible” is just factually not equivalent to “highly improbable” — or am I in the wrong timeline again? 😉 Going back to the hosting example, if you want to use UUIDs to identify customer documents that’s fine, but you can’t depend on the low risk of collisions to establish tenant isolation, you still have to namespace them.

                                                        1. 1

                                                          By your definition of impossible, there is literally no possible solution since it would require infinite memory. At that point you should question why using a computer at all.

                                                          The fact is that people don’t quite understand UUIDs and use them everywhere in meme fashion. In most UUID usages as database keys I’ve seen, the UUID is even stored as a string, which creates much more serious problems than those in discussion here.

                                                          1. 1

                                                            You don’t need infinite memory to uniquely identify documents among customers. You just need a coördination point to assign namespaces.

                                                            I agree that UUID usage in the wild is… wild. I don’t think most developers even really understand that the dashed-hex form of a UUID is actually just one of many possible encodings of what is ultimately just 16 bytes in memory.

                                                      2. 3

                                                        The only system behaviour that the universe guarantees is that all systems will eventually decay.

                                                        For any other behaviour you have to accept some probability that it won’t happen (hardware failure, bugs, operator error, attacks, business failure, death, and, yes, hash collisions).

                                                        Hash collisions with a good algorithm will often be a risk much lower than other factors that you can’t control. When they are, what sense does it make to worry about them?

                                                        1. 1

                                                          There is a categorical difference between hash collisions and the kind of unavoidable risk you’re describing, e.g. solar flares flipping a bit in my main memory.

                                                      3. 1

                                                        After doing some more thinking/reading, I would agree with you for something like SHA-256. But using a 128-bit hash still seems like a bad idea. I found this paragraph from a reply on HN summarizes it quite well:

                                                        In cases where you can demonstrate that you only care about preimage resistance and not collision resistance, then a 128-bit hash would be sufficient. However often collision attacks crop in in unexpected places or when your protocol is used in ways you didn’t design for. Better to just double the hash size and not worry about it.

                                                        I think Git’s messy migration from SHA-1 is a good cautionary tale. Or do you believe it’s completely unnecessary?

                                                        1. 1

                                                          Git’s migration is due to a known weakness in SHA-1, not due to the hash size being too small. I believe git would be perfectly fine if it used a different 128-bit cryptographic hash.

                                                          The first sentence you’ve quoted is important. There are uses of git where this SHA-1 weakness could matter. For UUID generation it’s harder to imagine scenarios where it could be relevant. But remember you don’t need to use SHA-1 — you can use 128 bits of any not-yet-broken cryptographic hash algorithm, and you can even pepper it if you’re super paranoid about that algorithm getting cracked too.

                                                          1. 1

                                                            The first sentence you’ve quoted is important. There are uses of git where this SHA-1 weakness could matter.

                                                            Yes, but isn’t git’s situation exactly what the rest of that paragraph warns about: SHA-1 is susceptible to a collision attack, not a preimage attack. And now everyone is trying to figure out whether this could be exploited in some way, even though on the surface git is just a simple content-addressable system where collision attacks shouldn’t matter. And as far as I can tell there is still no consensus either way.

                                                            And as the rest of that reply explains, 128-bit is not enough to guarantee collision resistance.

                                                            1. 1

                                                              If the 128-bit space is not enough for you, then it means you can’t use UUID v4 at all.

                                                              The whole principle of these UUIDs is based on the fact that random collisions in the 128-bit space are so massively improbable that they can be safely assumed to never ever happen. I need to reiterate that outputs of a not-broken cryptographic hash are entirely indistinguishable from random.

                                                              Resistance of a hash algorithm to cryptographic (analytic) attacks is only slightly related to the hash size. There are other much more important factors like the number of rounds that the hash uses, and that factor is independent of the output size, so it’s inaccurate to say that 128-bit hashes are inherently weaker than hashes with a larger output.

                                                              Please note that you don’t need to use SHA-1. SHA-1’s weakness is unique to its specific algorithm, not to 128-bit hashes in general. You can pick any other algorithm. You can use SHA-2, SHA-3, bcrypt/scrypt, or whatever else you like, maybe even a XOR of all of them together.

                                                  1. 6

                                                    100 versions later

                                                    This seems to be playing a little loose with the facts. At some point Firefox changed their versioning system to match Chrome, I assume so that it wouldn’t sound like Firefox was older or behind Chrome in development. Firefox did not literally travel from 1.0 to 100. So it probably either has fewer or more than 100 versions, depending on how you count. UPDATE: OK I was wrong, and that was sloppy of me, I should have actually checked instead of relying on my flawed memory. There are in fact at least 100 versions of Firefox. Seems like there are probably more than 100, but it’s not misleading to say that there are 100 versions if there are more than 100.

                                                            That said, this looks like a great release with useful features. Captions for picture-in-picture video seem helpful, and I’m intrigued by “Users can now choose preferred color schemes for websites.” On Android, they finally have HTTPS-only mode, so I can ditch the HTTPS Everywhere extension.

                                                    1. 6

                                                      Wikipedia lists 100 major versions from 1 to 100.

                                                      https://en.m.wikipedia.org/wiki/Firefox_version_history

                                                      What did happen is that Mozilla adopted a 4 week release cycle in 2019 while Chrome was on a 6 week cycle until Q3 2021.

                                                      1. 4

                                                        They didn’t change their version scheme, they increased their release cadence.

                                                        1. 7

                                                          They didn’t change their version scheme

                                                          Oh, but they did. In the early days they used a more “traditional” way of using the second number, so we had 1.5, and 3.5, and 3.6. After 5.0 (if I’m reading Wikipedia correctly) they switched to increasing the major version for every release regardless of its perceived significance. So there were in fact more than 100 Firefox releases.

                                                          https://en.wikipedia.org/wiki/Firefox_early_version_history

                                                          1. 3

                                                            I kinda dislike this “bump major version” every release scheme, since it robs me of the ability to visually determine what may have really changed. For example, v2.5 to v2.6 is a “safe” upgrade, while v2.5 to v3.0 potentially has breaking changes. Now moving from v99 to v100 to v101, well, gotta carefully read release notes every single time.

                                                            Oracle did something similar with JDK. We were on JDK 6 for several years, then 7 and then 8, until they ingested steroids and now we are on JDK 18! :-) :-)

                                                            1. 7

                                                              Sure for libraries, languages and APIs, but Firefox is an application. What is a breaking change in an application?

                                                              1. 4

                                                                I got really bummed when Chromium dropped the ability to operate over X forwarding in SSH a few years ago, back before I ditched Chromium.

                                                                1. 1

                                                                  Changing the user interface (e.g. keyboard shortcuts) in backwards-incompatible ways, for one.

                                                                  And while it’s true that “Firefox is an application”, it’s also effectively a library with an API that’s used by numerous extensions, which has also been broken by new releases sometimes.

                                                                  1. 1

                                                                    My take is that it is the APIs that should be versioned because applications may expose multiple APIs that change at different rates and the version numbers are typically of interest to the API consumers, but not to human users.

                                                                    I don’t think UI changes should be versioned. Just seems like a way to generate arguments.

                                                                2. 6

                                                                  It doesn’t apply to consumer software like Firefox, really. It’s not a library for which you care if it’s compatible. I don’t think version numbers even matter for consumer software these days.

                                                                  1. 5

                                                                    Every release contains important security updates. Can’t really skip a version.

                                                                    1. 1

                                                                      Those are all backported to the ESR release, right? I’ve just noticed that my distro packages that; perhaps I should switch to it as a way to get the security fixes without the constant stream of CADT UI “improvements”…

                                                                      1. 2

                                                                        Most. Not all, because different features and such. You can compare the security advisories.

                                                                  2. 1

                                                                    Oh, yeah, I guess that’s right. I was focused in on when they changed the release cycle and didn’t think about changes earlier than that. Thank you.

                                                              1. 5

                                                                Great news (assuming there is a sustainable business model to fund further development)! On the gossip side, how much do you think this was influenced by Dagger.io entering the same niche of portable CI pipelines (and being open-source from the get-go)?

                                                                Edit: (fragment of) a related discussion on HN with Dagger’s CEO take https://news.ycombinator.com/item?id=30858823

                                                                1. 1

                                                                  From twitter it seems like more people know about Dagger.io than Earthly already, which is a shame.

                                                                  1. 1

                                                                    Why is that a shame? Is earthly notably better?

                                                                    1. 2

                                                                              No, I don’t know which one is better yet, but they fill very, very similar roles. In my opinion it’s a shame just because I wish there was more momentum behind either one of them rather than fragmentation. Earthly probably should have got it, given its head start.

                                                                      On the flip side competition is nice as we can see in this case, but I don’t particularly trust Docker after Docker Desktop’s pricing/usage changes.

                                                                      1. 1

                                                                                Note if anyone’s still reading this: I was confused by Dagger’s tagline “From the creators of Docker”. Those individuals are probably fine, and were not necessarily the ones who made the choices leading to the relative downfall of Docker, including their recent pricing changes.

                                                                        1. 1

                                                                          Thanks for explaining.

                                                                  1. 5

                                                                    I always cringe a bit when I read things like:

                                                                    However, the most recent major update of text changed its internal string representation from UTF-16 to UTF-8.

                                                                    One of the biggest mistakes that a language can make is to have a string representation. Objective-C / OpenStep managed to get this right and I’ve seen large-scale systems doubling their transaction throughput rate by having different string representations for different purposes.

                                                                    This is particularly odd for a language such as Haskell, which excels at building abstract data types. This post is odd in that it demonstrates an example of the benefits of choosing a string representation for your workload (most of their data is ASCII, stored as UTF-8 to handle the cases where some bits aren’t), yet the entire post is about moving from one global representation to another.

                                                                                For their use, if most of their data is ASCII, then they could likely get some big performance boosts from having two string representations (a rough sketch of the general idea follows the list below):

                                                                    • A unicode string stored as UTF-8, with a small (lazily-built - this is Haskell, after all) look-aside structure to identify code points that span multiple code units.
                                                                    • A unicode string stored as ASCII, where every code point is exactly one byte.
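
                                                                                (A rough C++ sketch of the general shape, with hypothetical names; to keep it short, the second representation here is fixed-width UTF-32 rather than the UTF-8-plus-lazy-look-aside variant described above.)

                                                                                ```cpp
                                                                                #include <cstddef>
                                                                                #include <string>
                                                                                #include <vector>

                                                                                // One abstract string type, more than one concrete representation.
                                                                                struct AbstractString {
                                                                                    virtual ~AbstractString() = default;
                                                                                    virtual std::size_t length() const = 0;                 // in code points
                                                                                    virtual char32_t codePointAt(std::size_t i) const = 0;  // O(1) in both variants below
                                                                                };

                                                                                // ASCII: one byte per code point, densest for mostly-English data.
                                                                                struct AsciiString final : AbstractString {
                                                                                    std::string bytes;
                                                                                    std::size_t length() const override { return bytes.size(); }
                                                                                    char32_t codePointAt(std::size_t i) const override {
                                                                                        return static_cast<unsigned char>(bytes[i]);
                                                                                    }
                                                                                };

                                                                                // UTF-32: four bytes per code point, still fixed-width.
                                                                                struct Utf32String final : AbstractString {
                                                                                    std::vector<char32_t> points;
                                                                                    std::size_t length() const override { return points.size(); }
                                                                                    char32_t codePointAt(std::size_t i) const override { return points[i]; }
                                                                                };
                                                                                ```
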
                                                                    1. 6

                                                                      One of the biggest mistakes that a language can make is to have a string representation.

                                                                                  By this optic, we are in luck! Haskell has ~6 commonly used string types: String, Text, lazy Text, ByteString, lazy ByteString, ShortByteString, and multiple commonly used string builders! /i

                                                                      I am very happy with the text transition to UTF-8. Conversions from ByteString are now just a UTF-8 validity check and buffer copy and in the other direction a zero-copy wrapper change.

                                                                      1. 4

                                                                        I think what David is saying is that ObjC has one string type (NSString/NSMutableString) with several underlying storage representations, including ones that pack short strings into pointers. That fact does not bubble up into several types at the surface layer.

                                                                        1. 3

                                                                          Exactly as @idrougge says: a good string API decouples the abstract data type of a string (a sequence of unicode code points) from the representation of a string and allows you to write efficient code that operates over the abstraction.

                                                                          NSString (OpenStep’s immutable string type) requires you to implement two methods:

                                                                          • length returns the number of UTF-16 code units in the string (this is a bit unfortunate, but OpenStep was standardised just before UCS-2 stopped being able to store all of unicode. This was originally the number of unicode characters.)
                                                                          • characterAtIndex: returns the UTF-16 code unit at a specific point index (again, designing this now, it would be the unicode character).

                                                                          There is also an optional -copyCharacters:inRange:, which amortises Objective-C’s dynamic dispatch cost and bounds checking costs by performing a batched sequence of -characterAtIndex: calls. You don’t have to provide this, but things are a lot faster if you do (the default implementation calls -characterAtIndex: in a loop). You can also provide custom implementations of various other generic methods if you can do them more efficiently in your implementation (for example, searching may be more efficient if you convert the needle to your internal encoding and then search).

                                                                          There are a couple of lessons that ICU learned from this when it introduced UText. The most important is that it’s often useful to be able to elide a copy. The ICU version (and, indeed, the Objective-C fast enumeration protocol, which sadly doesn’t work on strings) provides a buffer and allows you to either copy characters to this buffer, or provide an internal pointer, when asked for a particular range and allows you to return fewer characters than are asked for. If your internal representation is a linked list (or skip list, or tree, or whatever) of arrays of unicode characters then you can return each buffer in turn while iterating over the string.

                                                                          The amount of performance that most languages leave on the floor from mandating that text is either stored in contiguous memory (or users must write their entire set of text-manipulation routines without being able to take advantage of any optimised algorithms in the standard library) is quite staggering.

                                                                          1. 4

                                                                            a good string API decouples the abstract data type of a string (a sequence of unicode code points) from the representation of a string and allows you to write efficient code that operates over the abstraction.

                                                                            How, when different abstractions have different tradeoffs? ASCII is single-byte, UTF-8 and UTF-16 are not, and so indexing into them at random character boundaries is O(1) vs. O(n). The only solution to that I know of is to… write all your code as if it were a variable-length string encoding, at which point your abstract data type can’t do as well as a specialized data type in certain cases.

                                                                            1. 3

                                                                                          Tangentially, you can find the start of the next (or previous) valid codepoint from a byte index into a UTF-8 or UTF-16 string with O(1) work. In UTF-8, look for the next byte that doesn’t start with “0b10” in the upper two bits. In a known valid UTF-8 string it’ll occur within at most 6 bytes. :)
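
                                                                                          (A small sketch of that forward scan in C++, assuming the buffer is known-valid UTF-8.)

                                                                                          ```cpp
                                                                                          #include <cstddef>
                                                                                          #include <cstdint>

                                                                                          // Advance a byte offset in a valid UTF-8 buffer to the start of the next code
                                                                                          // point by skipping continuation bytes (0b10xxxxxx).
                                                                                          std::size_t next_codepoint_start(const std::uint8_t* buf, std::size_t len, std::size_t i) {
                                                                                              ++i;
                                                                                              while (i < len && (buf[i] & 0xC0) == 0x80) ++i;
                                                                                              return i;  // == len if we ran off the end
                                                                                          }
                                                                                          ```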

                                                                              (Indexing into a unicode string at random codepoint indices is not a great thing to do because it’s blind to grapheme cluster boundaries.)

                                                                              Serious question, have you ever actually indexed randomly into ASCII strings as opposed to consuming them with a parser? I can’t personally think of any cases in my career where fixed-width ASCII formats have come up.

                                                                              1. 2

                                                                                Serious question, have you ever actually indexed randomly into ASCII strings as opposed to consuming them with a parser? I can’t personally think of any cases in my career where fixed-width ASCII formats have come up.

                                                                                I have, yes, but only once for arbitrary strings. I was writing a simple mostly-greedy line-breaking algorithm for fixed-width fonts, which started at character {line length} and then walked forwards and backwards to find word breaks and to find a hyphenation point. Doing this properly with the dynamic programming algorithm from TeX, in contrast, requires iterating over the string, finding potential hyphenation points, assigning a cost to each one, and finally walking the matrix to find the minimal cost for the entire paragraph.

                                                                                I’ve also worked with serialised formats that used fixed-width text records. For these, you want to split each line on fixed character boundaries. These are far less common today, when using something like JSON adds a small amount of size (too much in the ’80s, negligible today) and adds a lot more flexibility.

                                                                                For parallel searching, it’s quite useful to be able to jump to approximately half (/ quarter / eighth / …) of the way along a string, but that can be fuzzy: you don’t need to hit the exact middle, if you can ask for an iterator about half way along then the implementation can pick a point half way along and then scan forwards to find a character boundary.

                                                                                            More commonly, I’ve done ‘random access’ into a string because integers were the representation that the string exposed for iterators. It’s very common to iterate over a string, and then want to backtrack to some previous point. The TeX line breaking case is an example of this: For every possible hyphenation point, you capture a location in the string when you do the forward scan. You then need to jump to those points later on. For printed output, you probably then do a linear scan to convert the code points to glyphs and display them, so you can just use an integer (and insert the hyphen / line break when you reach it), but if you’re displaying on the screen then you want to lay out the whole paragraph and then skip to the start of the first line that is partially visible.

                                                                                ICU’s UText abstraction is probably the best abstract type that I’ve seen for abstracting over text storage representations. It even differentiates between ‘native’ offsets and code unit offsets, so that you can cache the right thing. The one thing I think NSString does better is to have a notion of the cheapest encoding to access. I’d drop support for anything except the unicode serialisations in this, but allow 7-bit ASCII (in 8-bit integers), UTF-8, UTF-16, UTF-32 (and, in a language that has native U24 support, raw unicode code points in 24-bit integers) so that it’s easy to specialise your algorithm for a small number of cases that should cover any vaguely modern data and just impose a conversion penalty on people bringing data in from legacy encodings. There are good reasons to prefer three of the encodings from that list:

                                                                                • ASCII covers most text from English-speaking countries and is fixed-width, so cheap to index.
                                                                                • UTF-8 is the densest encoding for any alphabetic language (important for cache usage).
                                                                                • UTF-16 is the densest encoding for CJK languages (important for cache usage).

UTF-32 and U24 unicode characters are both fixed-width encodings (where accessing a 32-bit integer may be very slightly cheaper than a 24-bit one on modern hardware), though it’s still something of an open question to me why you’d want to be able to jump to a specific unicode code point in a string, given that it might be in the middle of a grapheme cluster.

Apple’s NSString implementation has a 6-bit encoding for values stored in a single pointer, which is an index into a tiny table of the 64 most commonly used characters based on some large profiling thing that they’ve run. That gives you a dense fixed-width encoding for a large number of strings. When I added support for hiding small (7-bit ASCII) strings in pointers, I reduced the number of heap allocations in the desktop apps I profiled by over 10% (over 20% of string allocations); I imagine that Apple’s version does even better.
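
To illustrate the idea (not Apple’s actual scheme, and the alphabet below is made up rather than profiled), here is a sketch of packing up to ten characters from a 64-character table into a single 64-bit word: a 4-bit length tag plus 6 bits per character.

    static final String ALPHABET =
        " etaoinsrhldcumfpgwybvkxjqz0123456789ETAOINSRHLDCUMFPGWYBVKXJQZ.";

    // Pack a short string into one 64-bit word. Returns -1 if it doesn't fit
    // (too long, or a character outside the 64-entry table).
    static long pack(String s) {
        if (s.length() > 10) return -1;
        long word = s.length();                          // low 4 bits: length tag
        for (int i = 0; i < s.length(); i++) {
            int code = ALPHABET.indexOf(s.charAt(i));
            if (code < 0) return -1;                     // not in the table
            word |= ((long) code) << (4 + 6 * i);        // 6 bits per character
        }
        return word;
    }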

                                                                              2. 1

                                                                                I’ve written code in Julia that uses the generic string functions and then have passed in an ASCIIStr instead of a normal (utf8) string and got speedups for free (i.e. without changing my original code).

                                                                                Obviously if your algorithm’s performance critically depends on e.g. constant time random character access then you’re not going to be able to just ignore the string type, but lots of the time you can.

                                                                                1. 1

                                                                                  indexing into them at random character boundaries is O(1) vs. O(n).

                                                                                  Raku creates synthetic codepoints for any grapheme that’s represented by multiple codepoints, and so has O(1) indexing. So that’s another option/tradeoff.

                                                                                  1. 1

                                                                                    Julia similarly allows O(1) indexing into its utf8 strings, but will throw an error if you give an index that is not the start of a codepoint.

                                                                                    1. 3

                                                                                      But that’s just UTF-8 code units, i.e. bytes; you can do that with C “strings”. :)

                                                                                      Not grapheme clusters, not graphemes, not even code points, and not what a human would consider a character.

                                                                                      If you have the string "þú getur slegið inn leitarorð eða hakað við ákveðinn valmöguleika" and want to get the [42]nd letter, ð, indexing into bytes isn’t that helpful.

                                                                                      1. 1

Oh, I see, I misunderstood. So Raku is storing vectors of graphemes, with any multi-codepoint grapheme treated as a single codepoint. Do you know how it does that? A vector of 32-bit codepoints with the non-codepoint numbers given over to graphemes, plus maybe a table mapping each synthetic codepoint to its grapheme string?

                                                                                  2. 1

                                                                                    How, when different abstractions have different tradeoffs? ASCII is single-byte, UTF-8 and UTF-16 are not, and so indexing into them at random character boundaries is O(1) vs. O(n).

                                                                                    Assuming that your data structure is an array, true. For non-trivial uses, that’s rarely the optimal storage format. If you are using an algorithm that wants to do random indexing (rather than small displacements from an iterator), you can build an indexing table. I’ve seen string representations that store a small skip list so that they can rapidly get within a cache line of the boundary and then can do a linear scan (bounded to 64 bytes, so O(1)) to find the indexing point.

                                                                                    If you want to be able to handle insertion into the string then a contiguous array is one of the worst data structures because inserting a single character is an O(n) operation in the length of the string. It’s usually better to provide a tree of bounded-length contiguous ranges and split them on insert. This also makes random indexing O(log(n)) because you’re walking down a tree, rather than doing a linear scan.
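
A minimal sketch of a simplified variant of that skip-table idea (the names are mine, and valid UTF-8 is assumed): record the byte offset of every 64th code point, so indexing by code point becomes a table lookup plus a scan bounded to 64 code points.

    final class IndexedUtf8 {
        private final byte[] bytes;
        private final int[] checkpoints;  // checkpoints[k] = byte offset of code point 64*k

        IndexedUtf8(byte[] utf8) {
            bytes = utf8;
            java.util.List<Integer> marks = new java.util.ArrayList<>();
            for (int i = 0, cp = 0; i < utf8.length; cp++, i += charLen(utf8[i])) {
                if (cp % 64 == 0) marks.add(i);
            }
            checkpoints = marks.stream().mapToInt(Integer::intValue).toArray();
        }

        // Byte offset of the n-th code point: jump to the nearest checkpoint,
        // then scan forward at most 63 code points.
        int byteOffsetOf(int n) {
            int i = checkpoints[n / 64];
            for (int cp = (n / 64) * 64; cp < n; cp++) {
                i += charLen(bytes[i]);
            }
            return i;
        }

        // Length in bytes of a UTF-8 sequence, given its lead byte.
        private static int charLen(byte lead) {
            int b = lead & 0xFF;
            if (b < 0x80) return 1;  // 0xxxxxxx
            if (b < 0xE0) return 2;  // 110xxxxx
            if (b < 0xF0) return 3;  // 1110xxxx
            return 4;                // 11110xxx
        }
    }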

                                                                                  3. 1

                                                                                    I really miss working in the NS* world.

                                                                                  4. 2

                                                                                    ByteString isn’t a string type though, it’s a binary sequence type. You should never use it for text.

                                                                                    1. 3

ByteString is the type you read UTF-8-encoded data into, then validate that it is properly encoded before converting into a Text. It is widely used in places where other languages use “strings”, such as IO, because it is the intermediate representation of specific bytes. It fits very well with the now-common Haskell mantra of “parse, don’t validate” (https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/): we know we have some data and need a type to represent it; we parse it into a Text, which we then know is definitely valid (and which these days is just a zero-copy validation of a UTF-8-encoded ByteString). It’s all semantics, but we’re quite happy talking about bytestrings as one of the string types, because it represents a point in the process of dealing with textual data. Not all ByteStrings are text, but all texts can be ByteStrings.

                                                                                  5. 2

                                                                                    This comment reads very much like you’re quite ignorant of the actual state of strings in Haskell, particularly given how many people complain that we have too many representations.

                                                                                    Also, this article is specifically about code which relies on internal details of a type, so I’m not sure how your suggestions help at all - this algorithm would need to be written for the specific representations actually used to be efficient.

                                                                                    One thing I have wanted to do for a while is add succinct structures to UTF-8 strings which allow actual O(1) indexing into the data, but that’s something that can be built on top of both the Text and ByteString types.

                                                                                    1. 1

                                                                                      It sounds like you missed the /i in the parent post. I know, it’s subtle ;)

                                                                                      1. 1

                                                                                        That is not the parent post. Axman6 was replying to David. :)

                                                                                        1. 1

                                                                                          argh, thread’s too too long :)

                                                                                      2. 1

                                                                                        This comment reads very much like you’re quite ignorant of the actual state of strings in Haskell, particularly given how many people complain that we have too many representations.

I don’t use Haskell, but the complaints that I hear from folks who do have nothing to do with the number of representations; they are about the number of abstract data types that you have for strings and the fact that each one is tied to a specific representation.

Whether text is stored as a contiguous array of UTF-{8,16,32} or ASCII characters, as a tree of runs of characters in some encoding, embedded in an integer, or in some custom representation tailored to a specific use should affect the performance but not the semantics of any of the algorithms that are built on top. You can then specialise some of the algorithms for a specific concrete representation if you determine that they are a performance bottleneck in your program.
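
Java’s CharSequence is a handy way to illustrate the point (though it still fixes the unit of iteration to UTF-16 code units): the algorithm below only sees the abstraction, so a String, a StringBuilder, or a custom rope implementing CharSequence can all be passed in. The function name is my own example.

    // Works for any backing representation that implements CharSequence.
    static int countOccurrences(CharSequence text, char target) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == target) count++;
        }
        return count;
    }

    // countOccurrences("banana", 'a') == 3
    // countOccurrences(new StringBuilder("banana"), 'a') == 3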

                                                                                        One thing I have wanted to do for a while is add succinct structures to UTF-8 strings which allow actual O(1) indexing into the data, but that’s something that can be built on top of both the Text and ByteString types.

                                                                                        It’s something that can be built on top of any string abstract data type but cannot be easily retrofitted to a concrete type that exposes the implementation details without affecting the callers.

                                                                                        1. 1

                                                                                          number of abstract data types that you have for strings and the fact that each one is tied to a specific representation

                                                                                          The types are the representations.

                                                                                          You can write algorithms that would work with any of String and Text and Lazy.Text in Haskell using the mono-traversable package.

                                                                                          However, that whole bunch of complexity is only justified if you’re writing a library of complex reusable text algorithms without any advanced perf optimizations. Otherwise in practice there just doesn’t seem to be that much demand for indirection over string representations. Usually a manual rewrite of an algorithm for another string type is faster than adding that whole package.

                                                                                    1. 3

                                                                                      I was amused that the first three times I tried to read the article it showed the content briefly then replaced it with an error message about not being able to load a JS dependency.

                                                                                      1. 2

                                                                                        Interesting, do you know what the dependency was? Would definitely like to fix that, the site is still relatively new so have been working out the kinks!

                                                                                        1. 3

                                                                                          The error message didn’t say and I can’t get it to reproduce any more. I was (and am) using Firefox for Android, if that helps at all.

                                                                                          1. 3

That does! I think I’ll add some basic error reporting. I didn’t want to include a full framework like Sentry (they seem to track a lot of info and are pretty huge, and I wanted the site to be as minimal and non-invasive as possible), but adding an extra error-handling route will be a breeze, so I can get some indicator of folks hitting client-side errors.

                                                                                        2. 2

You know what, I think this may have been an update I made rolling out: the previous scripts/deps would have failed to load, but the app may have been cached (Cloudflare has a fairly long caching period on the free plan, I’ve found), so it was attempting to load them asynchronously and failing. Definitely something I’ll be more mindful of when making updates!

                                                                                          1. 2

                                                                                            Cool, happy to help!

                                                                                        1. 18

                                                                                          Failing to consider jQuery a “framework” seems arbitrary and wrong.

Yes, jQuery provided massive compatibility fixes that the very fad-driven “first frameworks” from the article lack, but to then dismiss it as just a compatibility layer is nonsense. jQuery was so widely deployed that people pushed for it to be included as part of the literal standards (well intentioned, but wrong :) ).

Beyond those core compatibility and convenience functions, jQuery featured an extensive plugin architecture and provided UI components, reactive updates, form handling, and other user-data support. All of which sounds not significantly different from the feature set of those “frameworks”.

                                                                                          This article then goes on to dismiss Ruby on Rails with a single sentence. Given Ruby on Rails pretty much created the entire concept of the integrated frontend and backend, with integrated databases, that seems bizarre?

Honestly, reading this post felt like it was written by someone who had encountered a few fad frameworks, added a few of the still-major ones, and then called that a history. I don’t think this is worth spending time reading if your goal is to actually learn something about the history of web frameworks.

                                                                                          1. 9
                                                                                            1. It’s definitely true that there was a massive ecosystem around jQuery, but I think it’s fair to say it was (and so were MooTools and Prototype) doing a fundamentally different thing than the “frameworks” which emerged in the early 2010s. The shift from managing state and reactivity in the DOM to managing it in the framework layer was a major shift.
                                                                                            2. Ruby on Rails… isn’t a JavaScript framework. So it’d be pretty weird to spend a bunch of time on it in a post about eras in JS frameworks!
                                                                                            3. The author was a major contributor to and member of the Framework Core team for Ember.js for several years, helped build its modern reactivity system, and has spent quite a bit of time with others along the way. Your dismissal is kind of hilariously wrong.
                                                                                            1. 4

                                                                                              I disagree with (1), unless the post were updated to state that it is talking about a specific framework architecture, rather than “frameworks” in general.

As I think about it, I agree on (2), because now I recall that people were still using separate libraries in client code. What I was thinking of when writing the above was the adoption of the concept of “application frameworks”, which Rails was a major early driver of; but as you say, it didn’t actually interact with JS directly: you were using frameworks like jQuery etc. in the client, and Rails was just providing the application data and state.

                                                                                              I’ll take response (3) as a mea culpa :D

                                                                                              1. 5

                                                                                                I think it’s a fair point on further reflection. By the time I was starting “application frameworks” were just the default, Ruby on Rails and Django had already been around and matured, modern JS frameworks were also trying to be entire application frameworks, etc. And in our modern context, when we refer to frameworks, we’re usually talking about application frameworks.

But that doesn’t mean UI widget frameworks are any less of a “framework”; it’s just that we were collectively thinking about software differently back then. I unfortunately don’t have that context: to me jQuery was always a “library” whereas Backbone was a “framework”, but I can totally see your perspective here. If I have time I’ll try to go back and work that in somehow in my discussion of the first era, thanks for reading and commenting!

                                                                                                1. 4

                                                                                                  I really think we have failed at having the required communication structure for an Internet forum :D

                                                                                              2. 2

                                                                                                Maybe we’re all a bit wrong and right here? That would seem to be the theme of eternal September in JavaScript frameworks.

Put another way, I think it’s possible for someone to know current JavaScript frameworks quite deeply and still miss the history or the underlying terrain that’s shaped it. A few things I thought of reading this:

1. On prior art for transpilation and components: ClosureTools was announced in 2009, and included the tooling Google had been using for years to provide minified JavaScript with type checking (using JSDoc comments). Closure Library also included things like goog.ui.Component, and also shipped with a template library. These things had been used in Gmail and Google Docs for some time, though I’m not sure they’re still used there anymore.
                                                                                                2. On prior art for full frameworks and server side rendering: similarly see Google Web Toolkit. The Google Wave launch for instance was all GWT, IIRC with server side rendering.
3. On “browser integration of components”: I can’t speak in detail, but I remember the Working Draft on Web Components coming out in 2013… I’m not sure, but I don’t think much progress happened then. The author suggests that this will change. But I wonder, like Bourdieu: how is today different from yesterday?
                                                                                                1. 3

Put another way, I think it’s possible for someone to know current JavaScript frameworks quite deeply and still miss the history or the underlying terrain that’s shaped it.

                                                                                                  Totally! The author fully acknowledges a knowledge gap in the era you’re commenting on and invites people to give exactly the kind of info you’re responding with. :D

                                                                                                  Web components… have indeed not really made a ton of progress. There’s more motion on some of the fundamental problems in that space in the past couple years but they were stuck for a very long time. My own take is that they are trying to do something quite different from what the component-oriented view layers and frameworks were trying to do: the APIs are basically “How would you implement a new built-in element?” rather than picking up the themes around reactivity etc. that the view-layer-frameworks tackled. We’ll see if and how they change going forward.

                                                                                                  1. 3

                                                                                                    FWIW, I admitted in the “before times” section that I didn’t have a ton of knowledge of how everything worked prior to 2012 or so, when I started coding 😅 definitely simplified and miss bits of history there for sure, but it’s hard to capture everything without writing a novel (or having been there).

                                                                                                    Re: Google’s tooling, that’s amazing to hear about now, but I don’t believe these tools were really adopted by the community. At least, I’ve never heard of an app other than Google ones being built with them. I did point out that Google proved JS frameworks could work though, with Gmail being the first app most people seem to remember as being the moment when they realized how powerful JS had become.

Re: Web components, there has been a lot of progress here actually! In my work on decorators I’ve been collaborating closely with the folks who are pushing them forward, such as Lit and Fast; they are in fact a standard and part of the browser now. That said, they are severely limited compared to mainstream JS frameworks, I think in large part because the platform moves much more slowly than the ecosystem as a whole. But, if we step back, this is similar to the pattern we saw with View-Layer frameworks: letting patterns evolve on their own, and adopting the best ones. Some of the current patterns they’re working on include:

                                                                                                    1. SSR standards
                                                                                                    2. Contexts
                                                                                                    3. Using imports instead of global namespace-based registration

                                                                                                    Given time, I think they still have a lot of potential, but I also think that they’re not really usable for larger-scale apps at the moment (I had a particularly painful experience with Stencil.js last year, would not go back). But for smaller components and UI widgets, they’re pretty great!

                                                                                                    1. 3

                                                                                                      Re: Google’s tooling, that’s amazing to hear about now, but I don’t believe these tools were really adopted by the community. At least, I’ve never heard of an app other than Google ones being built with them. I did point out that Google proved JS frameworks could work though, with Gmail being the first app most people seem to remember as being the moment when they realized how powerful JS had become.

                                                                                                      It’s probably worth elaborating on the reasons for this because they repeat with different technologies:

                                                                                                      1. The more popular products built with these tools didn’t survive, or have been supplanted by other products.
                                                                                                      2. When the products have survived, the codebases have been modernized with replacement of the original frameworks. Many places have standardized around React for better, or worse.
                                                                                                      3. Barriers to adoption meant that the frameworks only got serious adoption in niches. When Google’s Closure was attempting to make a splash, it was still primetime for Ruby on Rails. You can look inwards at what RoR offers, or you can look outwardly at how it sat within the web development ecosystem. People had started building SPAs, but they hadn’t displaced the approaches of the day.
                                                                                                      1. 1

                                                                                                        Re: Google’s tooling, that’s amazing to hear about now, but I don’t believe these tools were really adopted by the community. At least, I’ve never heard of an app other than Google ones being built with them. I did point out that Google proved JS frameworks could work though, with Gmail being the first app most people seem to remember as being the moment when they realized how powerful JS had become.

                                                                                                        For what it’s worth, ClojureScript heavily depends on the Google Closure Compiler to perform optimization of generated JS code, and the official docs encourage people to use some of the features from the Google Closure Library.

                                                                                                1. 1

                                                                                                  Java allocates a new array containing the result of each successive concatenation resulting in the O(N^2) runtime complexity.

                                                                                                  Why is this quadratic rather than linear?

                                                                                                  1. 4

                                                                                                    It’s N^2 where N is the number of strings you want to concatenate or the total length of the strings (either framing is fine so long as the strings are roughly equal in length).

                                                                                                    Imagine each string is 1 character long, first you copy the first two strings into a new 2 character string, then you copy the 2 character string plus the third string into a 3 character string, … eventually we copy the N-1 length string, plus the final 1 character string into the result.

Count each copy of a character as an operation: there are N(N-1)/2 copies in total (the sum of the first N-1 natural numbers), so that’s O(N^2) once you drop the coefficients and constant terms.

                                                                                                    1. 1

                                                                                                      … eventually we copy the N-1 length string, plus the final 1 character string into the result

And then we are done…? It’s linear time.

Quite a stretch to count copying char by char as an iteration. Such operations are highly optimised and cheap in terms of cycles.

                                                                                                      The charts do show linearity except in very extreme cases.

                                                                                                      1. 4

No it’s not; a basic loop like

    String result = "";
    for (int i = 0; i < n; i++) {
        result += "c";   // each += copies all i characters accumulated so far into a new String
    }

will run in O(N^2). It doesn’t matter how fast the append is: each iteration requires copying i+1 bytes. You do that for i in 0..<N iterations. That means on average you copy N/2 bytes, and you do that N times. That gives you O(N^2).

This is a known footgun that people hit when learning languages with automatic string handling.

                                                                                                        1. 1

                                                                                                          Hopefully olliej and thedufer have persuaded you on the other points, so I’ll just mention that it’s perfectly legit to count a copy of a single character as an operation for the purposes of asymptotic complexity analysis.

                                                                                                          Imagine that your architecture can actually copy 1000 characters in a single operation. As N grows much larger than 1000 (approaching infinity) the number of characters copied in a single operation is irrelevant so long as it is a constant number.

                                                                                                          If that doesn’t make sense then it’s probably best to go back to basics and read the wikipedia article on complexity analysis or something if you want to understand it better.

                                                                                                          1. 1

Sorry, that is just for the case of growing a string to an arbitrarily large size, which per se is already a memory leak, regardless of the method you use.

String concatenation is not exponential as it is stated to be in the article. If you have a large collection of strings and append another one to each of them, the complexity is linear.

What is the use case of jamming, say, gigabytes of data into a string anyway?

                                                                                                            1. 1

                                                                                                              You are describing a different problem to the one in the article.

                                                                                                              The article (and I) are talking about joining many strings together to make a new string. This is a common enough task and many standard libraries have a string.join method or similar.

                                                                                                              Anyway, if you do it naïvely with a loop of concatenations, that will be quadratic in many languages for the reasons that we have explained.

                                                                                                              To do it in linear time you need to either use a Rope, allocate a large buffer, or collect the strings into an array and then use the standard library function.
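
For instance, in Java (the language from the article) the “large buffer” and “standard library” options look roughly like this; the class and method names are my own illustration:

    import java.util.List;

    class JoinDemo {
        // Grow one buffer: each character is copied a constant number of times.
        static String joinWithBuilder(List<String> parts) {
            StringBuilder sb = new StringBuilder();
            for (String part : parts) {
                sb.append(part);           // amortised O(part.length())
            }
            return sb.toString();          // one final copy of the whole result
        }

        // Or let the standard library do it.
        static String joinWithLibrary(List<String> parts) {
            return String.join("", parts);
        }
    }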

                                                                                                          2. 1

                                                                                                            Note that the plot omits the n^2 cases (because they’d stretch the scale too much). The plotted ones are all n or n*log(n), which is why they look roughly linear. If you look at the “Java 8 naive +=” or “Java 14 naive +=” rows in the table, you can see clear quadratic behavior - each successive column is a doubling in # of concatenations, and those rows show roughly a quadrupling in each step (actually a bit worse than that).

                                                                                                      1. 11

                                                                                                        People get the browsers they deserve.

                                                                                                        We’d see more competition in this space, but developers have voted with their feet every time Google or whoever implements a feature and dangles it out. Developers wanted a more complex web and more complicated services–well guess what that means for browser complexity? Webshits played themselves. Don’t complain about browser monocultures enabling spying at the same time you support endless feature creep and evergreen standards.

                                                                                                        We’d see better privacy, but consumers flocked to hand over their digital everything to anybody willing to dangle a blinking cat picture or whatever in their face. People who don’t take responsibility for behaviors that, by construction, undermine their freedom and privacy shouldn’t act surprised when they lose either.

                                                                                                        1. 8

                                                                                                          The domination of Chrome came way before “stuff only works in Chrome” things started becoming the norm. Chrome got popular cuz it was super fast and had a smooth UI.

                                                                                                          I do understand that an expensive-to-implement standard plays into the lock-in effect… I do think it’s not super cut and dry, though. Flash existed, plugins existed… maybe the web shouldn’t have any of those either, but lots of people wanted them. And I’m honestly glad I don’t have to download “the netflix application”.

I don’t know how you square the circle of “people want to use interactive applications in a low-friction way” with “we should not make web browsers Turing machines”, without the gaps being filled by stuff that could be worse. I don’t have a good solution though.

                                                                                                          1. 6

                                                                                                            Do you really think developer preferences played a large role in Chrome’s dominance of the market? Seems to me that Google created their market share through PR and advertising, especially on their own sites, and from their control of the default apps on Android.

                                                                                                            1. 4

                                                                                                              This is where the glib “nobody actually cares about privacy” rejoinder comes from. When it comes down to it, consumers don’t actually seem to care about privacy. I don’t know if it’s an education thing (“hey look your personal data is being sold to target ads to you”) or maybe people really don’t care and it’s odd folks like us that do. These days I genuinely believe that data privacy is a niche interest and the average user doesn’t care as long as they can connect with their friends and loved ones.

                                                                                                              At the very least GDPR style disclosures of what data is being collected can help folks who are willing understand what data they are giving up.

                                                                                                              1. 12

                                                                                                                This comic tried to address it near the end but I think the big problem is that most consumers don’t really understand what it means to lose something as nebulous as ‘privacy’. If you asked if they want a webcam in their bedroom streaming data to Google / Amazon / Facebook, that’s one thing, but having one of these companies track everything that you do on the web? It’s much harder to understand why that’s bad. As the comic explains, the real harm comes from aggregation and correlation signals. Even then, most of the harm isn’t done directly to the individual who is giving up their privacy.

                                                                                                                Bruce Schneier had a nice example attack. If people see ‘I have voted’ badges on their friends social media things, then they are around 5-10% more likely to vote. If you track browsing habits, especially which news sites people visit, then you can get a very good estimate of someone’s voting intention. You can easily correlate that with other signals to get address. In a constituency with a fairly narrow margin (a lot of them in states with effectively two-party systems) then you can identify the people most likely to vote for candidates A and B. If you hide ‘I’ve voted’ badges from the social media UIs for people who lean towards B and show them for people who lean towards A then you have a very good chance of swinging the election.

                                                                                                                That said, the fact that a person using Chrome / Facebook / WhatsApp / whatever is giving that company a hundred-millionth of the power that they need to control the government in their country is probably not a compelling reason for most people to switch. Individually, it doesn’t make much of a difference whether you use these services or not.

                                                                                                                Unless you’re a member of a minority, of course. Then you have to worry about things like the pizza-voucher attack (demonstrated a few years ago, you can place an ad with Google targeting gay 20-somethings in a particular demographic with a voucher that they can claim for discounted pizza delivery. Now you have the names and addresses of a bunch of victims for your next hate crime spree).

                                                                                                                1. 9

                                                                                                                  I think the 2 main reasons people don’t care about privacy are that

                                                                                                                  • it simply doesn’t make a huge difference in their lives whether their right to privacy is respected or not. Most people simply have bigger fish to fry and don’t have the cycles to spare on things that may be bad but aren’t actively causing them harm.
                                                                                                                  • technology companies like Google, Meta, etc. have done a great job of presenting their software as “free”. I think most people think of signing up for Gmail or Instagram like they would getting a driver’s license or library card; they’re just signing up for some public service. These companies do the most to avoid framing this for what it is: an exchange of value, just like any other. You’re paying with your data, and you’re getting access to their service in exchange for that data. As long as using “free” software isn’t understood by consumers as a value exchange, they will never demand protection of their right to privacy and data dignity.

                                                                                                                  As someone who works in the data privacy and governance space, it’s encouraging to see growing awareness of these issues at the consumer and government regulation level. Hopefully with enough movement from the government and private sector, we can keep fighting “Big Tech’s” deceptive narratives around data and their software.

                                                                                                              1. 15

                                                                                                                That got me thinking: there’s no reason you couldn’t use a selenium instance as your daily browser. 99% of the time you’d treat it like a normal Firefox browser, but you could also run scripts to control it. This would be more powerful than injecting javascript as you could automate across tabs and pages. I’ve never tried it out in practice, though.

Projects like tridactyl, qutebrowser, surfingkeys, vim-vixen, nyxt browser, etc. all offer programmable browsers that are (IMO) a bit easier to use than Selenium. These typically provide an extra vim-inspired user interface in the browser using a web extension, though qutebrowser and nyxt are their own browsers.

                                                                                                                Disclaimer: I am one of the two original developers of tridactyl.

                                                                                                                1. 26

Using subdomains as identity leaks, via DNS, information that would otherwise stay in the requested path.

                                                                                                                  https://example.com/saysbadthingsaboutbadpeopleinpower - DNS request is for example.com, TLS connection prevents request information from leaking

                                                                                                                  https://saysbadthingsaboutbadpeopleinpower.example.com - DNS gets the requested user

                                                                                                                  I am fortunate to live in a country where the latter is extraordinarily unlikely to be immediately actionable by law enforcement and my action is protected by well-established law.

                                                                                                                  Moreover, a path-based identity is better for marketing. Put the company name first so people know “Oh, right, Example, that social media platform,” and not “SomethingIdontreallycarethatmuchabout on Example.”

                                                                                                                  1. 15

                                                                                                                    Given that this user is also providing each subdomain a certificate, you could enumerate the entire userbase by looking at the CT Logs with a tool like crt.sh

                                                                                                                    1. 4

                                                                                                                      Given that this user is also providing each subdomain a certificate, you could enumerate the entire userbase by looking at the CT Logs with a tool like crt.sh

                                                                                                                      Even if they use wildcard certs?

                                                                                                                      1. 3

Obviously that will not work in that case.

                                                                                                                      2. 2

                                                                                                                        Oh wow that’s a big one

                                                                                                                      3. 5

                                                                                                                        I’m definitely not an expert, just a curious observer. But if everyone used DNS over HTTPS, would this no longer be an issue? DoH does have some problems though (centralization, can be blocked, SNI leaks, etc), and I’m not sure how widespread it is.

                                                                                                                        1. 4

                                                                                                                          Even with DNS over HTTPS you’d still be leaking the domain name to a third party, so it is less private than putting the same info in the path. Obviously for many use cases that is fine.

                                                                                                                          1. 2

                                                                                                                            DoH to a third-party would help but then there’s a SPOF for resolution.

                                                                                                                            1. 1

                                                                                                                              Or DoT or DNS over Tor or any other privacy solution.

                                                                                                                              Breaking websites is not the solution.

                                                                                                                          1. -1

                                                                                                                            Modal, terminal based text editor written in… Rust? In 2022?

                                                                                                                            Wouldn’t C be a better fit here, considering the 70s sensibilities?

                                                                                                                            1. 15

                                                                                                                              Wait til you find out that people write vim plugins in typescript

                                                                                                                              1. 14

                                                                                                                                It’s okay to not like things

                                                                                                                                1. 4

                                                                                                                                  I think your sarcasm was on lost on people here. I detected and appreciated it.

                                                                                                                                  1. 2

                                                                                                                                    Guess I’m dense, could you explain?

                                                                                                                                    1. 3

                                                                                                                                      It’s my impression that Emacs and Vim are largely inspired by development in editors from the 70s and 80s. The joke is that developing tools with their aesthetic would lead one toward C to reflect the time period appropriately.

                                                                                                                                  2. 2

                                                                                                                                    Until the new Strict Provenance work, Rust embraced the PDP-11’s model of memory, so it seems very appropriate here.

                                                                                                                                  1. 2

                                                                                                                                    I love these tiny personal programs.

I’ve been working on an OmniFocus clone that I just have running on Heroku for my own usage; it’s a clunky program, but it’s also nice cuz I’m not having to really do as much product design.

One thing that’s frustrating for me recently (especially with web stuff) is that I have to choose between “nice to code in” and “easy to deploy/make accessible”. I really wish the Django deployment story was nicer (and, like, single-file Django projects were doable), cuz I’m super familiar with it, but the “easy” solution (Heroku) is pretty costly.

                                                                                                                                    1. 2

                                                                                                                                      Have you looked at Caprover? I’ve used it to consolidate some apps onto a single VPS.

                                                                                                                                      1. 1

                                                                                                                                        @rtpg see also https://dokku.com/, which claims to be heroku compatible (but it’s all hosted on one box you rent).

                                                                                                                                      2. 2

                                                                                                                                        More modern alternatives to Heroku like fly.io or render.com have generous free plans which include databases / storage if you are inclined to use those.

                                                                                                                                      1. 74

                                                                                                                                        The fact that the NFT spec only stores an image URL on the blockchain, and no hash to verify the image, is just absolutely astounding. Moxie is more generous than I am; IMO it highlights the grift that NFTs are – like whoever threw it together gave no thought to security or longevity.

                                                                                                                                        1. 28

                                                                                                                                          Yeah to me this is the “tell” that the main thing driving it is other people’s FOMO money.

                                                                                                                                          Basically software devs in this ecosystem realized they didn’t have to do their jobs to make money. The whole system keeps working even if you don’t do your job!!!

                                                                                                                                          You just have to repeat the memes long enough, and it doesn’t matter if the tech actually does what it says it does, because nobody’s checking! Nobody really wants to stand up their own copies of these things and check if it works.

                                                                                                                                          There’s little benefit in that. The benefit is talking about it and repeating it to your friends.


I was interested in IPFS before it grew the blockchain component. I went back to the original paper, and it is genuinely a good idea, and something we still need: a cross between git and BitTorrent (it mentions this in the abstract). I have long wanted Debian repos, PyPI, CPAN, etc. to be stored in such a system, i.e. consistently versioned with metadata.

                                                                                                                                          But it’s 6 years later, and when I go to look at it, apparently it just doesn’t work very well. And it probably won’t because it grew so many components. (Gall’s law: to get a big working system, you have to start from a small working system.)

                                                                                                                                          https://news.ycombinator.com/item?id=20137918

                                                                                                                                          So what should we expect of IPFS? At five years old, is this a project that’s usable ‘here and now’, as the homepage promised in 2017? Are all the parts in place, just waiting for web and application developers to see the light? Have the stumbling blocks I noticed in 2017 been smoothed over?

                                                                                                                                          No

                                                                                                                                          IPFS is still not usable for websites.

                                                                                                                                          https://esteroids.medium.com/how-can-ipfs-reach-wide-adoption-42b9a5011bdf

                                                                                                                                          As one commenter astutely put it, “IPFS struggles to host a plaintext bulletin board with logins like you’d find in the late 80s”


And to shed some light on the other side … Despite having some interest in Bitcoin since 2013 or so, I first bought it last year.

                                                                                                                                          Because the founder of SciHub asked for donations in crypto on her site. https://en.wikipedia.org/wiki/Alexandra_Elbakyan

So I just did it via Coinbase and I suppose it worked. So I would say there’s a non-zero number of real use cases for cryptocurrency and blockchain. I think IPFS started out as genuine too, but it basically got ruined as working tech by an influx of money and employees.

                                                                                                                                          1. 11

The situation with IPFS also affected gittorrent, another combination of git and BitTorrent. Blockchains are like strangler figs; as they grow, they choke whatever software projects originally gave them structure and purpose.

                                                                                                                                            1. 2

                                                                                                                                              Hm what happened to it? It doesn’t look like very much code.


                                                                                                                                              Thinking more about the original comment … To “steel man” the blockchain, I do think there is a place for regular old web apps to serve data from a blockchain. I don’t think that is inherently a bad architecture. You just have to be honest about it!

                                                                                                                                              I think there will always be a place for money in the blockchain, however niche, as the Scihub example shows.

It’s more logical for an end user like me to use Coinbase, which is centralized, but that’s OK. (Also a big irony: it relies on sending your driver’s license in for verification, so it’s building on top of the US state regulations that crypto wants to be free of.)

                                                                                                                                              One viewpoint I heard is “Bitcoin is a settlement network, not a payment network”. That is, the logical end user is banks and companies like Coinbase, not consumers. It could make sense to have a two-tiered system.

                                                                                                                                              (Although then you have the same problem that those banks are subject to regulation of the countries they operate in, so you’ve lost a lot of the purported benefit of blockchain. I think people will gradually come to understand this and a much smaller set of use cases will be ironed out.)

                                                                                                                                              But I do think the logical evolution of blockchain is to evolve to be less “consumer” and more “enterprise”.

                                                                                                                                              I think the problems with the centralization of the cloud are real, and while web3 is mostly a BS term, and it’s not really a solution as is, I can see this type of distributed consensus as a useful primitive for small but important parts of real systems.

                                                                                                                                            2. 2

I believe that the Dat protocol and its successor Hypercore are both basically BitTorrent plus version control, but with no connection to blockchains or cryptocurrency that I know of. Please correct me if I’m wrong :)

                                                                                                                                              I think Beaker Browser is a really cool project that suggests what could be done with Hypercore, but unfortunately a lot of websites and apps that were designed for its first iteration using Dat didn’t succeed in making the transition to the Hypercore version. I think the tech change cost it some momentum / users. Hopefully it will build up steam again, but I’m afraid IPFS has stolen its thunder by providing similar tech plus a chance to get rich quick :(

                                                                                                                                              1. 2

                                                                                                                                                Dat is so good aside from one critical flaw (IMHO, anyway): the “spec” is more or less “whatever the version you pull from NPM right now does.” Yes, there is some additional documentation, and a running reference implementation is great, but going a decade+ without a compatible version in some environment other than Node is a pretty big handicap and IMHO a major oversight by the project owners.

Secure Scuttlebutt suffers from the same problem, and the “rewrite it in Rust” efforts for both are at best implicitly (if not explicitly, as in the most recent SSB -> Rust work) aimed at performance, not broad interop or adoption.

So neither one can effectively run without hundreds of MBs of JS packages, and there’s no support for languages that offer better native platform integration or type safety… heck, they don’t even ship TS definitions, and the internal APIs are so idiosyncratic that it’s extremely difficult to build new apps on the platform.

In a world where I had infinite time and attention to spend on an open software + network stack I would love to build a libdat or libssb that was straightforward to link into a Python project, or an iOS app, or really anything that can do normal C FFI. Alas, I don’t, so I haven’t. Maybe someday.

                                                                                                                                                1. 2

                                                                                                                                                  Hm I heard of Dat a few years ago, but I didn’t know about Hypercore. Thanks for the pointer!

                                                                                                                                                  I think eventually we will get something like this … just because there is a need. In the meantime I might write my own with shell scripts invoking git, and then curl or BitTorrent :)

                                                                                                                                                  I forget which post I read this in, but there are really 2 kinds of “decentralized systems”, i.e. ones that don’t rely on “the cloud”:

                                                                                                                                                  1. Something like git, where it’s trivial to stand up your own. BitTorrent is also in this category. As well as a web server with Apache or Nginx.
                                                                                                                                                  2. Something like BitCoin where there’s no central node.

                                                                                                                                                  Notably, it’s not only hard to run your own Ethereum node, but you also don’t want to run your own Ethereum network!

                                                                                                                                                  So the word is overloaded, and they have very different architectures. I care more about the first kind. So really I don’t want a global file system like IPFS – I’d rather have distributed storage that runs on a few nodes, is easy to administer, etc.

                                                                                                                                                  1. 1

                                                                                                                                                    Whoops, it seems Beaker Browser is being discontinued: https://github.com/beakerbrowser/beaker/discussions/1944

I’m not surprised but I am really sad about this. I thought it had a lot of potential for building a resilient web that could e.g. handle network outages by making it trivial to continue sharing websites over a LAN. It seemed to be great at sharing static websites, which is where I’m putting my development efforts these days, but it appears the developer was frustrated by this and wanted people to share web apps instead.

Agregore looks like it may be continuing on with support for Hypercore, but it also supports IPFS, and I considered not having to interact with Protocol Labs and their reliance on / promotion of cryptocurrency to be a significant feature.

                                                                                                                                                    1. 1

IPFS is a very pure “better BitTorrent”, and the fact that some web3 stuff hosts content on IPFS should not taint the protocol, even if you happen to hate web3. There’s lots of web1 content on IPFS too; the protocol predates web3 by a lot, etc.

                                                                                                                                                      1. 4

                                                                                                                                                        Part of the problem here is that “hosts content on IPFS” is a misnomer, for the same reason that things aren’t “hosted on BitTorrent”. IPFS is a distribution mechanism, not a storage mechanism (despite what their marketing implies), and so something can only be distributed through IPFS - it needs to be hosted somewhere else, and that “somewhere else” is left undefined by IPFS.

                                                                                                                                                        That might sound like pedantry, but it has some crucial implications: it means that by default, the network cannot provide meaningfully better availability than BitTorrent can, which is to say the availability is really bad and unsuitable for hosting websites on. You could address this by seeding your content from a server, but then what have you really accomplished, other than a potential ‘download accelerator’ (which is again distribution, not storage)?

                                                                                                                                                        1. 1

                                                                                                                                                          which is to say the availability is really bad and unsuitable for hosting websites on. You could address this by seeding your content from a server

                                                                                                                                                          This seems like a contradiction. Of course you cannot host an IPFS powered website without seeding it from somewhere! That doesn’t make availability bad or unsuitable any more than HTTP is bad or unsuitable. What I love about IPFS is that I can pin my websites on any computer connected to the internet, and usually much more than one! No special setup is needed, and if I want to change what computer pins the content I can easily do so at any time without changing any settings anywhere else. It just seamlessly keeps working.
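For anyone curious, the workflow is roughly this (a sketch driving the Kubo/go-ipfs CLI from Python; the site path is a placeholder, and it assumes a local daemon is running):

```python
import subprocess

def publish_site(site_dir: str) -> str:
    """Add a static site to the local IPFS node and return its root CID."""
    # -r: add the directory recursively; -Q: print only the final root CID.
    out = subprocess.run(
        ["ipfs", "add", "-r", "-Q", site_dir],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

def pin_on_another_machine(cid: str) -> None:
    """Run on any other box with an IPFS node; nothing else has to change."""
    subprocess.run(["ipfs", "pin", "add", cid], check=True)

cid = publish_site("./my-static-site")  # placeholder path
print(f"/ipfs/{cid} stays reachable for as long as at least one node pins it")
```

Because the address is derived from the content, moving the pin to a different machine doesn’t change any links.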

                                                                                                                                                    2. 1
                                                                                                                                                      IPFS is still not usable for websites.
                                                                                                                                                      

                                                                                                                                                      I have been using IPFS for websites for years. It works great.

                                                                                                                                                      1. 1

                                                                                                                                                        Where do you persist user-mutable state and how?

                                                                                                                                                        1. 2

Websites do not have user-mutable state. I guess if you want your blog to have a user theme toggle or some other app-y feature, you can use a cookie or localStorage just as you would on the regular web.

                                                                                                                                                    3. 7

                                                                                                                                                      Moxie is more generous than I am

                                                                                                                                                      Indeed. That’s what makes this a balanced critique. Good-faith.

                                                                                                                                                      1. 44

I don’t think bad faith is necessary to provide valid criticism. NFTs are sold to the public as ‘stored forever in the blockchain’, while in reality there is no mechanism storing anything other than a URL pointing to the content; that is almost by definition a scam. If I were going around selling ownership contracts to a house I don’t own, I’d be charged with fraud.

                                                                                                                                                        1. 4

                                                                                                                                                          I don’t think bad faith is necessary to provide valid criticism

That’s not what I meant. When @tao_oat said “Moxie is more generous than I am”, the difference between a “generous” take and a “not generous” take can often be the assumption of good faith. Are you trying to keep a neutral eye while evaluating this argument, or are you just looking for more evidence to confirm your existing bias? That’s the difference between having good faith and approaching a topic cynically.

I think you’re prejudiced against NFTs and are just looking to criticize them. That’s fine; I think NFTs being used to attest ownership of art, and their associated speculation, are pretty stupid, so you aren’t gonna find me defending any of this. I also agree with each of Moxie’s criticisms. That said, I just don’t think the sort of cynical, charged rhetoric in this thread is indicative of good faith. Your comment even assumes a position from me that I don’t have. Good faith keeps discussions intellectually interesting IMO. I don’t have that much more to say here; I’ll let everyone else continue ranting in anger.

                                                                                                                                                          1. 4

                                                                                                                                                            FWIW I agree with you – on doing more research, a good-faith take would be that whoever wrote the ERC721 spec might have seen the promise of an image that changes over time. They might not have foreseen that art collectibles would be the primary driver behind NFTs. In the spec they write:

                                                                                                                                                            A mechanism is provided to associate NFTs with URIs. We expect that many implementations will take advantage of this to provide metadata for each NFT. The image size recommendation is taken from Instagram, they probably know much about image usability. The URI MAY be mutable (i.e. it changes from time to time). We considered an NFT representing ownership of a house, in this case metadata about the house (image, occupants, etc.) can naturally change.

                                                                                                                                                            (Though having a sentence like “The image size recommendation is taken from Instagram, they probably know much about image usability” in an official spec doesn’t exactly scream professionalism or care to me).
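For concreteness, here is roughly what that indirection looks like in practice (a sketch; the URI is a placeholder, and the field names follow the ERC-721 metadata JSON schema): the contract exposes a tokenURI, the URI returns JSON, and the JSON’s image field is just another URL.

```python
import json
import urllib.request

# Placeholder: in reality this URI comes from the contract's tokenURI(tokenId).
token_uri = "https://example.com/metadata/42.json"

with urllib.request.urlopen(token_uri) as resp:
    metadata = json.load(resp)

# Per the ERC-721 metadata JSON schema this is typically
# {"name": ..., "description": ..., "image": "https://..."} -- and since the
# spec says the URI MAY be mutable, whoever controls either server controls
# what the token appears to represent.
print(metadata.get("image"))
```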

                                                                                                                                                            I should direct my critique against those who push art NFTs without mentioning the serious technical issues, and not the spec authors themselves.

                                                                                                                                                            I think the broader question is: when you keep seeing red flags in an ecosystem/community how long can or should you make an effort to retain good faith?

                                                                                                                                                            1. 1

                                                                                                                                                              I think the broader question is: when you keep seeing red flags in an ecosystem/community how long can or should you make an effort to retain good faith?

I think that’s the wrong question to ask. Technology usually has two components. One is the design and engineering that goes into it: think of XMPP, the protocol and standards used to send/receive messages. The other is adoption: success in number of users, usage, or revenue. There are plenty of technologies, ones much less complicated and more “clearly” incremental than a blockchain, that have failed purely because their adoption was lacking, despite years of attempts to make themselves relevant. Examples here are LaserDisc, Betamax, HD DVD, and more.

Adoption of a technology has much more to do with culture, user experience, or business considerations than the engineering behind it. These questions you’re asking, about whether the space is filled with hucksters and such, are questions affecting the adoption of the technology. Discussions concerning adoption are much more complicated (well, IMO at least, probably because I’m an engineer first and foremost!) than discussions dealing with the design and engineering behind a technology, and also off-topic for Lobsters (probably due to the challenge of dealing with the nature of those discussions properly). I will say though, a space filled with fraud doesn’t exactly instill confidence, especially when it’s a person’s money or other resources on the line…

                                                                                                                                                              But I also don’t think it’s necessary to get inordinately angry at the blockchain space. Businesses are made and die every day, some full of hucksters, others just blatant copies of existing businesses. Community projects are made and die every day. It’s the churn of human creativity. If humans knew exactly which projects would succeed and which would fail, then we’d already have solved our problems, wouldn’t we?

                                                                                                                                                          2. 3

                                                                                                                                                            I don’t really care for collectibles, real or virtual, but to be fair the NFT is stored in the blockchain. The media is not the NFT, just an associated decoration.

                                                                                                                                                            1. 24

This sounds like a solution looking for a problem, and then someone inventing a problem for the solution to fix. One thing I’ll give ‘crypto bros’ - especially those pumping the NFT racket - is that they’ve managed to befuddle the world into thinking they created something actually revolutionary. Every single problem the article describes is something anyone who understands how this technology works could’ve guessed from day 1. Ultimately ‘web3’ is just a nebulous term that means nothing and everything, depending on who you ask and what time you ask them.

                                                                                                                                                              1. 2

Sure, I don’t collect baseball or Magic cards either, so the whole thing doesn’t connect with me personally.

                                                                                                                                                              2. 18

But nobody says that an NFT is a URL, i.e. a string starting with “https://…”, because nobody would find such a string, in itself, interesting enough to pay money for.

                                                                                                                                                                The scam is that people describe and sell NFTs as being a file, or even a unique fingerprint of such a file, when they’re no such thing.

                                                                                                                                                                It’s like selling you a painting, only it turns out you only bought the frame, and the gallery reserves the right to swap out the canvas for another one, or just take it away, at their pleasure.

                                                                                                                                                                1. 8

                                                                                                                                                                  Yeah but the funny thing is: who cares if it was actually pinned to the right file?

                                                                                                                                                                  What’s to stop me from minting another NFT for the same artwork and selling it? I just have to convince enough people it’s valuable.

                                                                                                                                                                  As far as I can tell, the thing that makes NFTs “work” in any sense is specific pockets of social media, e.g. Twitter. Reality is socially constructed.

                                                                                                                                                                  Like the artist Beeple has to go on Twitter and say he’s selling some number that represents art that he created.

                                                                                                                                                                  https://twitter.com/beeple

                                                                                                                                                                  And other people witness that and they believe it is him, i.e. the person who keeps creating all the enjoyable pictures. Twitter and other centralized social networks provide some continuity of identity. lobste.rs does this too.

                                                                                                                                                                  If somebody else tries to sell an NFT of his artwork, maybe he can use Twitter to shame them or whatever. That’s about it.


As far as I can see, social media is really the thing that matters, and not anything in the blockchain. It seems clear that no end users ever really look at the blockchain and verify things. (Hell, I didn’t when I sent the donation to the Scihub founder. Did it really get sent? I just trusted Coinbase.)

                                                                                                                                                                  So I think there’s no notion of identity, exclusivity, or authenticity on the blockchain. You always have to make some jump between the number/hash and the “thing”, and other people can make that jump too.

                                                                                                                                                                  I’d be interested in any arguments otherwise … I have never actually used NFTs or Ethereum, but it seems like there is an obvious hole either way.

i.e. the blockchain is really built on trust established by social media; it can’t stand alone. Social media is very powerful – the former US president got elected in large part because of it, and then he got blocked from it with huge consequences, and now he wishes he had his own Twitter and there are multiple efforts in that direction, etc. It has a lot of real consequences in the world, and is intimately connected to it. Blockchain doesn’t have that property at all!

                                                                                                                                                                  1. 9

                                                                                                                                                                    Right — I can just touch one pixel in the image, or add a no-op EXIF tag, and suddenly it’s a different file with a different digest that I can sell as a different NFT.

                                                                                                                                                                    That was my day-one objection to NFTs. In terms of my analogy, it’s like buying a limited edition print where I only have the artist’s promise she won’t issue more. But the realization that it’s just a URL makes them orders of magnitude sillier and more scam-like.
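A tiny sketch of the “touch one pixel” point (the bytes are placeholders for a real image file): change a single byte and the fingerprint is completely unrelated, so nothing stops the “same” artwork being fingerprinted, and minted, any number of times.

```python
import hashlib

original = b"...image bytes..."  # placeholder for the real file contents
tweaked = original + b"\x00"     # e.g. an appended no-op metadata byte

print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(tweaked).hexdigest())  # entirely different digest
```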

                                                                                                                                                                    1. 8

                                                                                                                                                                      Yeah this has been beat to death, but I was reading conversations earlier this year, and one analogy is the company that used to sell naming rights to the stars:

                                                                                                                                                                      https://news.ycombinator.com/item?id=26488430

                                                                                                                                                                      Like they would sell you a certificate that a star was named after you.

                                                                                                                                                                      Never mind that anybody can rename the same star for a different person, and sell that. Nobody ever used those names – not even the one scientist who might have a picture of that star or care about it. (Again, reality is socially constructed.)


Another problem is that you’re literally just getting the NFT number itself. You’re not getting the copyright to that work!!!

Like you could buy the NFT, and then the artist can sell the actual artwork to somebody else, which IS enforceable by copyright law!!! But your NFT isn’t.

                                                                                                                                                                      Also with music, there are separate rights to perform songs in public places, to play recordings at bars, and to publish the sheet music. You do NOT get that if you buy an NFT of a song.

                                                                                                                                                                      In fact I remember seeing a VC blog or podcast where this was brought up ….

                                                                                                                                                                      And so that basically proves that it doesn’t matter if the NFT has an immutable hash or not. You’re buying a useless pointer to a thing, not anything related to the thing itself … so it doesn’t matter what’s in it!

                                                                                                                                                                  2. 3

                                                                                                                                                                    I don’t understand this sticking point. An NFT is just a provably-unique (i.e. non-fungible) thing maintained on a chain. What it maps to off-chain, or how it does so, is basically incidental. It’s a contract, meaningful only in a specific domain: for legal contracts, that domain is typically a government jurisdiction; for NFTs, it’s the chain on which they exist.

                                                                                                                                                                    1. 4

I think it’s because the loudest use is for art collectibles, so people get hung up on that. For me the best analogy is baseball cards. No one would say “but the card doesn’t contain an actual baseball player! You don’t get the right to boss the real human around!” But somehow detractors think NFTs should “be” the art or “be” the copyright in a way that they were never intended to be.

                                                                                                                                                                      1. 7

                                                                                                                                                                        The problem with that analogy is that the issuer of the baseball card could, if it were an NFT, blank the contents of the card or change it to something else at any point. If baseball cards had that property, would people trade them? If a baseball card just had a QR code on it that let you go to a web page of stats about the player, would they be as valuable? Would they keep being valuable once some of the servers hosting the stats went offline (or would ones pointing to dead URLs become more valuable?)? What about when someone buys the domain for a popular card and points it at a porn site?

                                                                                                                                                                        1. 2

                                                                                                                                                                          The problem with that analogy is that the issuer of the baseball card could, if it were an NFT, blank the contents of the card or change it to something else at any point.

                                                                                                                                                                          In this analogy the NFT, the contract which exists on-chain, is itself the baseball card. The fact that one of the metadata fields of the NFT is a URL that may or may not resolve is more or less incidental.

                                                                                                                                                                          Would they keep being valuable once some of the servers hosting the stats went offline (or would ones pointing to dead URLs become more valuable?)?

                                                                                                                                                                          The important properties of NFTs are that, in the context of the chain on which they exist, they’re provably unique, non-fungible, and owned. A broken URL in the metadata doesn’t impact those properties. Of course, value is determined by the market, and the crypto markets are wildly irrational, so you may have a point.

                                                                                                                                                                          1. 1

                                                                                                                                                                            In this analogy the NFT, the contract which exists on-chain, is itself the baseball card. The fact that one of the metadata fields of the NFT is a URL that may or may not resolve is more or less incidental.

                                                                                                                                                                            I’ll buy this, but practically speaking, what then is the use of the NFT? If you’re saying the URL isn’t important, what exactly are you buying?

                                                                                                                                                                            1. 1

You’re buying something which is guaranteed to be unique and non-fungible, in the context of a specific chain. There is no intrinsic value, any more than a specific painting or whatever has intrinsic value. The value relies on the market belief that the NFT’s chain-specific scarcity is valuable.

                                                                                                                                                                              It’s kind of like a deed. The property isn’t legally yours until the relevant legal regime accepts that you own the deed. The deed isn’t the property but it uniquely represents the property in a specific domain.

                                                                                                                                                                              But like value is not the only interesting thing about NFTs. The non-fungibility itself is novel and opens the door to lots of interesting things.

                                                                                                                                                                        2. 2
                                                                                                                                                                        3. 2

                                                                                                                                                                          Sure, but that is absolutely not how NFTs are understood by 99.9999% of people. That mismatch is the scam.

                                                                                                                                                                2. 3

                                                                                                                                                                  The fact that the NFT spec only stores an image URL on the blockchain

                                                                                                                                                                  This is not universally true. A lot of NFT projects use IPFS, Filecoin or similar decentralized storage.

                                                                                                                                                                  But for the ones using Amazon S3 it’s kinda hilarious, yes

                                                                                                                                                                  1. 3

                                                                                                                                                                    A lot of NFT projects use IPFS, Filecoin or similar decentralized storage.

These solutions have a similar problem to BitTorrent: the stuff that’s not popular won’t get seeded/hosted. It just adds one layer of indirection to the storage issue, and once the business that’s pushing NFTs goes out of business these files are probably not going to get shared.

                                                                                                                                                                    1. 1

I don’t see how that’s a problem in this context? If the URL is an IPFS URI you have a hash, which was the objection being replied to here.

                                                                                                                                                                      1. 1

My understanding of how this works is: the NFT is an object on the blockchain, whose rules enforce its uniqueness. We’re gonna assume the chain is going to continue to be mined and therefore “exist” indefinitely.

                                                                                                                                                                        The NFT contains a hash denoting a location on IPFS. Accessing it using an IPFS gateway will show the user the JPG portraying whatever they paid $2.4M in funny money for. But that JPG has to reside on disk somewhere. And when the company or user goes out of business or the VPS is decommissioned or they get kicked out of AWS for scamming, where is the JPG?

Of course, concerned parties can… right click on the image, save it to disk, and then use that as a source of data for the IPFS hash, but that does kind of give the lie to the popular imagination of how all this nonsense works.
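Roughly, the failure mode looks like this (a sketch; the CID is a placeholder): a gateway fetch only succeeds while at least one node still provides the bytes, and anyone who kept a copy can re-add it and get the same content address back.

```python
import urllib.request

cid = "Qm...placeholder..."  # whatever CID the NFT's metadata points at
gateway_url = f"https://ipfs.io/ipfs/{cid}"

try:
    with urllib.request.urlopen(gateway_url, timeout=30) as resp:
        image_bytes = resp.read()  # works only while some node still hosts the data
except Exception:
    image_bytes = None             # nobody is hosting it any more

# Anyone who saved the file can re-add it (e.g. `ipfs add saved.jpg`) and,
# because CIDs are derived from the content, it reappears under the same
# address (assuming the same add settings).
```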

                                                                                                                                                                        1. 2

                                                                                                                                                                          You can become a node in the IPFS network yourself, and host just the image.

                                                                                                                                                                          1. 1

Just like the baseball player (the “hash”, i.e. the picture, on a baseball card) may die, so may the hosting for an NFT’s associated JPG go away. Of course if you want to preserve it you can simply pin it yourself and thus your own computer becomes a host for it. So it’s actually more resilient than the baseball player :). The NFT is not the image; the image is an associated decoration.