1. 7

    My own philosophy is that blogs don’t need comments. If you want to respond you can contact me or write your own blog post or tweet or tiktok about it - whatever it may be. I don’t want to assume responsibility for moderating and/or curating somebody else’s words that will show up alongside my own.

    1. 2

      Personally, I’m often disappointed to find that a blog post doesn’t have a comments section. Writing an e-mail just to express your gratitude feels weirder than leaving a comment.

      1. 3

        I appreciate gratitude expressed by email way more than that expressed by comment.

        I also feel a comment section is meant to serve other readers of the article at least as much as its author, so in fact I feel expressions of gratitude about the article are misplaced in the comments.

        1. 1

          I also feel a comment section is meant to serve other readers of the article at least as much as its author

          That’s a good point, and I agree!

          What I dislike about writing e-mails is the ceremony involved. I need to explain what article I’m referring to, why I’m writing, say hello and goodbye and so forth. In a comments section, all of those things are implied. (That’s not to say I dislike writing e-mails in general, just as a replacement for comments.)

          1. 1

            Indeed. That is why I appreciate emailed expressions of gratitude so much more – here I have someone so compelled to say thanks that all the trouble they had to go through didn’t deter them. It just hits differently.

            (Doesn’t mean a graceful and sincere response necessarily comes naturally, so sometimes I have taken a long time to reply and in a handful of cases I never did – but don’t let fear of shouting into a void discourage you. I promise, if the author does see your message (which spam filters unfortunately do make uncertain), it will almost certainly be gladly received, even if you never hear back.)

        2. 1

          I make a point of writing emails to express gratitude. Maybe it’s a little weird, but it’s almost always appreciated.

        3. 1

I wonder how many (if any) TikToks exist that are replies to blog posts.

        1. 1

Anyone have any numbers for the size of 100 SVG files vs. an icon font with the same glyphs? I imagine it’s a pretty big difference. SVG is XML, which is the least efficient way to encode a Bézier path I can think of. Whereas web fonts are TrueType, which, having been designed circa 1989, goes to great pains to keep path data compact.

          Plus, isn’t the browser making 100 HTTP requests to fetch those 100 SVGs?

          I do understand the accessibility reasons against web fonts. I just don’t think this article took into account their size benefits.

          1. 6

            You can put multiple icons in the same SVG file, define a viewbox with a different id for each of them, and address that ID with a fragment on the SVG URL to limit the view to just that icon. You can use such URLs in background-image and then use background-size to scale the icon as needed.

            Works in all browsers, all the way back to IE 9 (!) if you want to.

            And unlike with icon fonts (where you have to juggle meaningless codepoint numbers or cryptic Unicode characters in the CSS or markup) and PNG sprites (where you have to juggle meaningless pixel coordinates in the CSS), these SVG sprites also lead to completely readable and maintainable CSS, assuming you assign meaningful names to the IDs (e.g. “background-image: url(toggle.svg#enabled);” or such).
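For concreteness, here is a minimal sketch of that setup: one SVG exposing each icon through a named <view>, addressed by fragment from CSS. The file name, IDs, and shapes here are made up, so treat it as an illustration of the technique rather than anyone's actual sprite.

<!-- icons.svg: two icons placed side by side, each exposed via a named <view> -->
<svg xmlns="http://www.w3.org/2000/svg" width="48" height="24">
  <view id="enabled"  viewBox="0 0 24 24"/>
  <view id="disabled" viewBox="24 0 24 24"/>
  <circle cx="12" cy="12" r="10" fill="#2a2"/>
  <circle cx="36" cy="12" r="10" fill="#999"/>
</svg>

/* CSS: address each icon by fragment and scale it as needed */
.toggle.on  { background-image: url(icons.svg#enabled);  background-size: 1.5em 1.5em; }
.toggle.off { background-image: url(icons.svg#disabled); background-size: 1.5em 1.5em; }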

            1. 3

              You can put multiple icons in the same SVG file,

And then you gzip it and compress away most of the redundancy.

              1. 1

This is pretty close to how the sprite in Feather works. Google Material Icons also used to have a sprite sheet, but the set got bloated and they did a structural overhaul in 4.0 that left it out.

            1. 1

              I do wish nation states would report exploits to get them fixed instead of hoarding them to attack other nation states.

              1. 1

                Imagine if all of them had a publicly funded version of Project Zero.

                1. 2

I once had to make a URL shortener at a previous job. One of the requests for it was that any query parameters passed to the short URL would be passed along to the long URL. So a request to sho.rt/xyz?foo=bar would redirect to example.com/?page_id=123&foo=bar. The main reason for this was that the short URLs would be posted on Facebook for campaigns, and if you then promote a post (buy advertising) Facebook would add extra query parameters onto the URL in the post. So passing on these parameters would allow people to know how many clicks came through organically and how many were through the paid advertisement.
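A rough sketch of that parameter-merging step in Ruby (the function name and URLs are made up, and looking up the long URL from the short code is assumed to have happened already):

require "uri"

# Append the query parameters of the incoming short-URL request to the stored
# long URL, e.g. sho.rt/xyz?foo=bar -> example.com/?page_id=123&foo=bar
def expand(long_url, incoming_query)
  uri    = URI.parse(long_url)
  params = URI.decode_www_form(uri.query.to_s) + URI.decode_www_form(incoming_query.to_s)
  uri.query = params.empty? ? nil : URI.encode_www_form(params)
  uri.to_s
end

expand("https://example.com/?page_id=123", "foo=bar")
# => "https://example.com/?page_id=123&foo=bar"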

Thinking about this again, it seems like a silly use case for a URL shortener, but it might be that people wanted the ability to change what the short URL pointed at after it was posted; I can’t recall.

                  1. 1

                    Link shorteners were created in the first place mainly because of sites with huge query strings with lots of junk in them. Their effect has instead been to normalise the wrapping of links with extra tracking layers. The fact that this has gone so far as to mean adding query strings with junk to “shortened” URLs is so deep in unintended consequences territory as to end up sunk in the irony swamplands.

                  1. 1

                    @tedu: You followed “be liberal in what you accept”, but that’s only half the law, and I don’t see where you can credibly claim to have followed “be conservative in what you emit”. It’s a useful set of observations that if you are liberal in what you accept then testing against yourself is insufficient to ensure that you are also conservative in what you emit; that therefore you may be aspiring to Postel but actually failing; and that a failed aspiration to Postel may be costlier than no aspiration at all. But to call this observation a rebuttal of the law is overstretching it. Which actually does it injustice; it does not need to have that grand an implication to be worthwhile.

                    1. 6

Seems like web ads are becoming worse and worse for publishers and viewers over time.

                      1. 1

                        Pray I don’t alter it any further.

                      1. 1

                        Aren’t there some people, somewhere, who work in reasonably well defined and unchanging environments? Isn’t it at least theoretically possible to engineer out a whole bunch of these soft variables? How does NASA do things these days?

                        1. 4

NASA isn’t unusual for its ability to deliver on time and budget; it’s unusual for its ability to create computing systems reliable enough to put in spacecraft. They can achieve this because they can absorb a level of development cost that (for better or for worse) no business would tolerate.

                          In the context of the article linked here, NASA is not a relevant example.

                          1. 4

NASA (and ESA, and Roskosmos, and JAXA…) constantly miss their ETAs and run over budget. And since the house-building analogy is even more true for them, it’s not surprising.

                          1. 2

                            I still feel that this is an American issue and isn’t such a big issue in Europe where in ‘most’ countries you can trust your ISP to do the right thing. Another issue I have with forcing DoH is that the system DNS is no longer respected and now all your DNS requests are going to be collected by Cloudflare.

                            1. 1

                              collected by Cloudflare.

                              Instead of speculating, see: https://developers.cloudflare.com/1.1.1.1/commitment-to-privacy/

                              • Cloudflare will not retain any personal data / personally identifiable information, including information about the client IP and client port.
                              • Cloudflare will retain only limited transaction data for legitimate operational and research purposes, but in no case will such transaction data be retained by Cloudflare for more than 24 hours.

                              And this is audited:

                              And we wanted to put our money where our mouth was, so we committed to retaining KPMG, the well-respected auditing firm, to audit our practices annually and publish a public report confirming we’re doing what we said we would.

                              1. 3

                                Turning “trust us” into “we paid consultants a lot of money, so trust us” doesn’t change anything fundamental about the proposition. Even if Cloudflare want to do the right thing, they are set to be a single point of snoopage in terms of the network. Even if Cloudflare are doing the right thing, it may not matter – the NSA didn’t exactly tell Google or Yahoo when they tapped their infrastructure.

                                Of course, your DNS requests always go somewhere, so there is always a trust relationship. You have to make an informed decision here about whether Cloudflare is a bigger risk than wherever you would normally get your DNS service from.

                            1. 2

                              Here’s my related question: should a serialization program be non-deterministic?

                              Because I’m not sure I want to live in a world where json({a:1, b:1}) != json({a:1, b:1})

                              1. 2

                                If both serialised forms represent the same data structure, why should that matter? You care about the data, not its representation (unless you’re writing a json library yourself, or in daft situations like the article, of course).

                                1. 1

Same here. Also, I gave it some thought, and I couldn’t find any value provided by controlling the serialization order :/

                                  1. 1

If your map data structure isn’t already deterministic, then non-deterministic serialisation allows serialising keys out as the map is enumerated, which saves both memory and clock cycles. If you are in the right circumstances to be able to use that fact and also in the circumstances to need it, you’d want it.
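A rough sketch of what that looks like (Ruby hashes happen to be insertion-ordered, so this is only about the streaming aspect, not about Ruby specifically):

require "json"

# Emit each pair as the map is enumerated; nothing beyond the current pair
# is buffered, and no sorted copy of the keys is ever built.
def stream_json_object(hash, io)
  io << "{"
  first = true
  hash.each do |key, value|
    io << "," unless first
    first = false
    io << key.to_s.to_json << ":" << value.to_json
  end
  io << "}"
end

stream_json_object({ "b" => 1, "a" => 1 }, $stdout)   # prints {"b":1,"a":1}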

                                    You certainly don’t want to live in a world where determinism isn’t the default though.

                                    1. 1

                                      Looking at the problem from a more mathematical perspective, we can see that encode: Hash -> String is not a function as it’s multi-valued. However, decode: String -> Hash is a function, although not injective.

                                      We can introduce an equivalence relation in String (for simplicity, I’m assuming only valid JSON outputs) such that s1 and s2 are equivalent iff decode(s1) == decode(s2). Then we can fix the problem by either:

1. Making encode map to the corresponding equivalence class, i.e. to a set of strings that all decode to the input object.
                                      2. Picking a canonical representation in each equivalence class.

                                      Implementation-wise, option 1 would mean defining a type like:

                                      module JSON
                                        class EncodedValue
                                          attr_reader :object
                                          protected :object
                                      
                                          def initialize(object)
                                            @object = object
                                            @result = nil
                                          end
                                      
                                          def to_s
                                            @result ||= actually_encode(@object)
                                          end
                                      
                                          def ==(other)
      # Two encoded values are equal iff they wrap equal objects.
      other.is_a?(EncodedValue) && @object == other.object
                                          end
                                      
                                          # ...
                                        end
                                      end
                                      

Option 2 entails deciding on how to represent values if multiple options are allowed (a sketch follows the list). This includes:

                                      • Key ordering in objects.
                                      • Number representation (e.g. 100 and 1e2 are both valid JSON).
                                      • White space use.
                                      • Escape sequences in strings.
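A minimal sketch of option 2 in Ruby, handling only the key-ordering part (number formatting, whitespace, and escaping are left to the JSON library’s defaults, so this is an illustration, not a full canonical-JSON scheme):

require "json"

# Canonical encoding: recursively sort object keys, then serialize.
def canonicalize(value)
  case value
  when Hash  then value.sort_by { |k, _| k.to_s }.map { |k, v| [k, canonicalize(v)] }.to_h
  when Array then value.map { |v| canonicalize(v) }
  else value
  end
end

def canonical_encode(value)
  JSON.generate(canonicalize(value))
end

canonical_encode({ "b" => 1, "a" => 1 }) == canonical_encode({ "a" => 1, "b" => 1 })  # => true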
                                    1. 2

                                      New frameworks […] make it easy to break up a UI into components. They are designed to handle rendering a page as well as updating it. jQuery is typically only used for updating a page, relying on the server to provide the initial page. React, Angular, and Vue components, on the other hand, allow for a tight coupling between HTML, code, and even CSS.

                                      Yes, few developers (and, it seems, even fewer designers) have ever truly valued progressive enhancement. It only was, for a time, the least inconvenient way of doing things. Once designs started calling for orders of magnitude more complexity, developers turned against the inconvenience with approaches that throw progressive enhancement out the window; with such approaches available, they are now filtering down to everything. So progressive enhancement now requires a choice to actively inconvenience oneself, and as a result, we are finding out that progressive enhancement never actually was a widely-held value. It merely happened to be the default, for a time, by a happy accident of history.

                                      1. 30

                                        Per their email to customers:

                                        We’re sending this note because people are now asking if this could happen with Keybase teams. Simple answer: no. While Keybase now has all the important features of Slack, it has the only protection against server break-ins: end-to-end encryption.

                                        This is a facile false equivalence on the part of Keybase. Slack’s extenuating incident was because of code injection in their client application. If an attacker achieved code injection in a Keybase client the breach would be exactly as bad as Slack’s.

                                        End-to-end encryption is worth little if the client doing the encryption/decryption is compromised, and Keybase’s implicit claim that end-to-end encryption protects against compromised clients is dangerously inaccurate.

                                        1. 5

                                          where did you see that the injection was client side? I’m wondering if I’m parsing the disclosure incorrectly, but I’m not seeing that spelled out explicitly.

                                          From Slack’s post:

                                          In 2015, unauthorized individuals gained access to some Slack infrastructure, including a database that stored user profile information including usernames and irreversibly encrypted, or “hashed,” passwords. The attackers also inserted code that allowed them to capture plaintext passwords as they were entered by users at the time.

                                          You’re of course correct about the vulnerability of keybase clients. They talk about that here: https://keybase.io/docs/server_security

                                          EDIT: After a reread of the Keybase post, I’m not seeing anywhere that they claim Keybase can 100% protect against client side attacks, but their assertion about server side attacks is true. Where did you see that they’re claiming e2e crypto protects against client attacks?

                                          1. 1

                                            I’m not seeing that spelled out explicitly

                                            capture plaintext passwords as they were entered by users

                                            You can’t do that without injecting code into the client.. plus, modification of server side code is usually not called “injection” at all

                                            1. 9

                                              A) Yes, you can (by modifying server code) - basically no sites hash passwords before sending them over the wire.

B) Modifying running code on the server without changing the code on disk is usually called injection in my experience. Happened at Twitter (remote code execution in the Rails app exploited to add an in-memory-only middleware that siphoned off passwords).

                                              1. 2

                                                basically no sites hash passwords before sending them over the wire.

                                                Is there a good scheme for doing that?

                                                You can’t just hash on the client, because then the server is just directly storing the credential sent by the client, i.e. as far as the server-side is concerned, you are back to merely directly storing passwords in the clear.

                                                You can implement a scheme where the client proves knowledge of the password by using it to sign a token sent by the server (as in APOP, HTTP Digest Auth, etc.). But then server needs to have the plaintext password stored, otherwise it can’t check the client’s proof of its knowledge.

                                                So either way, the server is storing in the clear whatever thing the client needs to authenticate itself.

                                                The advantage of the usual scheme where the client sends the actual password and the server then stores a derivative of it is that the client sends one thing, and then the server stores another, and the thing the client needs to send cannot be reversed out of the thing the server has stored. That yields the property that exfiltrating the stored credentials data from the server doesn’t allow you to impersonate its users.

But to get this property, the server must know what the actual password is – at least at one instant in time – because the client needs to prove knowledge of this actual password. So you cannot get away with never sending the actual password to the server.

                                                Well, that’s not the only way to get that property. The other way is public key cryptography.

                                                Of course trying to go into that direction runs into entirely other trust issues: if you ship the crypto code to the client, you might as well not bother. Notably, “send the actual password to the server” avoids that whole issue too.
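For concreteness, a minimal sketch of what the usual scheme’s server side looks like, assuming Ruby’s openssl bindings (the KDF parameters here are illustrative, not a recommendation):

require "openssl"
require "securerandom"

# The client sends the actual password (over TLS); the server stores only a
# salted, slow, one-way derivative of it.
def store_password(password)
  salt   = SecureRandom.random_bytes(16)
  digest = OpenSSL::KDF.pbkdf2_hmac(password, salt: salt,
                                    iterations: 200_000, length: 32, hash: "SHA256")
  { salt: salt, digest: digest }
end

# Verification re-derives from the submitted password and compares in constant
# time; the stored digest cannot be reversed into the password.
def verify_password(password, record)
  candidate = OpenSSL::KDF.pbkdf2_hmac(password, salt: record[:salt],
                                       iterations: 200_000, length: 32, hash: "SHA256")
  OpenSSL.secure_compare(candidate, record[:digest])
end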

                                                1. 2

                                                  If you don’t want the server to know the password, you can use client certs, which have worked since the nineties. The browser UI for it is universally horrid, though, and whatever terminates your TLS needs to provide info to your application server.

This is a hurdle in both development and production, which - coupled with the browser UX being bad - has left client certs criminally underused.

                                                  1. 2

                                                    Oh, right. I mentioned public key crypto myself… but I still didn’t even think of client certificates.

                                                    1. 1

                                                      The missing feature is some mechanism to create the TLS certificate.

Like a “Do you want to use secure passwordless authentication with that site?” prompt that creates a user@this.site CSR, uploads it for signing in a POST request, gets the signed cert back, and stores it for the next time this.site asks for it.
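Sketching the client-side half of that flow with Ruby’s openssl bindings (the CN and file names are made up; the signing round-trip and the browser integration are exactly the parts that don’t exist today):

require "openssl"

# Generate a key pair and a CSR for "user@this.site"; the CSR would be POSTed
# to the site, and the signed certificate stored for later TLS handshakes that
# request a client certificate.
key = OpenSSL::PKey::RSA.new(2048)

csr = OpenSSL::X509::Request.new
csr.version    = 0
csr.subject    = OpenSSL::X509::Name.new([["CN", "user@this.site"]])
csr.public_key = key.public_key
csr.sign(key, OpenSSL::Digest.new("SHA256"))

File.write("client.key", key.to_pem)
File.write("client.csr", csr.to_pem)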

                                                      1. 2

                                                        … at which point you need a mechanism to extract your credential from the browser and sync it across devices and applications. Hmm.

                                                        1. 1

Yes, unless you remember them all (and how many is that?!), mind space wasted…

I use a USB key on which my passwords are stored, and add that to my physical keyring.

I am now more vulnerable to physical access and less exposed to remote attackers.

                                                          It is not perfect. It is working.

                                                  2. 1

Deriving a private key from the password gives you a private key, from which you may derive a public key, and get public-key crypto in JavaScript in the browser.

So… you are trusting JavaScript code served by the server while doing that (uh oh). If that is compromised, it can upload the password somewhere else in the clear: if you trust the server, just send the password in the clear to it (within TLS, using TLS certificates).

                                                2. 1

That seems like a huge assumption about the architecture of Slack; unless you work there, I wouldn’t assert that. It doesn’t even seem very plausible — why only infect a subset of clients? Do Electron apps get served off of a server somewhere? (Unless I’m massively misunderstanding them, no.) And if popping a shell on a database server gave an attacker lateral access to push malicious code to clients, Slack has HUGE problems.

                                                  1. 1

                                                    electron apps

                                                    Ah, I didn’t even think about these. Slack is accessible as a normal web site too, I was only thinking about that.

                                                    Also, my assumption was that “as they were entered by users” meant “letter by letter, keylogger style” :D

                                                  2. 1

                                                    capture plaintext passwords as they were entered by users

                                                    You can’t do that without injecting code into the client..

                                                    Even today, Slack sends credentials from the client to the server in plaintext (just like almost every other website).

                                                    Try it yourself: https://tmp.shazow.net/screenshots/screenshot_2019-07-21_3d7d.png

                                                    Having a remote code execution to modify the server-side code to consume the plain-text passwords that their server receives and exfiltrate them would work just fine.

                                                    Who knows what else they might have modified.

                                                    1. 2

                                                      my assumption was that “as they were entered by users” meant “letter by letter, keylogger style”

                                              1. 1
WITH bounds AS (
  -- Requested date range plus a total used later to compute the 'available'
  -- count (the subquery is a placeholder).
  SELECT
    (?)::date as lower,
    (?)::date as upper,
    (SELECT count(*) FROM somewhere WHERE something) as total --- XXX
),
daystates AS (
  -- Expand every overlapping schedule entry into one row per day and state,
  -- counting how many entries are in each state on each day of the range.
  SELECT
    generate_series(
      greatest( bounds.lower, s.begin ),
      least(    bounds.upper, s.until ),
      '1 day'::interval
    )::date as day,
    s.state,
    count(s.state) as count
  FROM
    schedule s, bounds
  WHERE
        s.type = 1
    AND daterange( s.begin::date, s.until::date, '[]')
        &&
        daterange( bounds.lower,  bounds.upper,  '[]')
  GROUP BY
    day, s.state
)
-- Aggregate the per-day rows into a single jsonb object keyed by day, where
-- each value is an array of {count, state} entries plus a synthetic
-- 'available' entry derived from the total.
SELECT
  jsonb_object_agg(d.day, d.states) as calendar
FROM (
  SELECT
    ds.day,
    (
      jsonb_build_object(
        'count', bounds.total - sum(ds.count),
        'state', 'available'
      )
      ||
      jsonb_agg((SELECT r FROM (SELECT ds.count, ds.state) r) ORDER BY ds.state)
    ) as states
  FROM daystates as ds, bounds
  GROUP BY ds.day, bounds.total  -- bounds.total must be grouped (or aggregated) in PostgreSQL
) as d
;
                                                
                                                1. 1

                                                  While I agree change is needed - let’s (on the topic of language needing to be updated) call “the dragon’s appetite,” and “ambient privacy,” what they are:

                                                  People freely handing over information and (continuing) to have conversation in the public discourse.

Until there is change, we should own our behavior and stop expecting a government (who we have so little faith in anyway /s?) to slap the hands of those (who are just operating according to the principles of the free market /s?) we can’t stop freely giving information to.

                                                  I have no pity for those individuals; especially after they post articles like, “I tried to stop using Amazon, or Google, BUT IT FELT SO GOOD and IT’S IMPOSSIBLE.”

You would find a way; if you can’t, start solving the problem by removing yourself from the public commentary.

                                                  1. 5

                                                    I have no pity for those individuals; especially after they post articles like, “I tried to stop using Amazon, or Google, BUT IT FELT SO GOOD and IT’S IMPOSSIBLE.”

                                                    Well, it’s been demonstrated in the past that at least some companies would build and maintain a profile even on non-users, based on data collected about them from other people. And I’d be surprised if it weren’t still happening today.

                                                    Browser fingerprinting and “supercookie” techniques are such that unless you have well-above-average knowledge of how the technology works you’re unlikely to be able to use the web without being surveilled and profiled. Merely installing an ad-blocking extension, for example, is not enough, and some of the supercookie techniques exploit things (like HSTS) that are deliberately not meant to be blockable.

                                                    So even someone who’s intelligent and reasonably tech-savvy likely cannot avoid inadvertently being tracked, profiled, and so on by major tech companies. You say you have no pity for those people, but I do. And I wonder why it is that they should have to completely disengage from modern society just to have the sort of basic privacy people took for granted a mere generation ago.

                                                    You would find a way, if you can’t, start solving the problem by removing yourself from the public commentary.

                                                    Being exiled from society should not be the cost of suggesting that we should improve society somewhat.

                                                    1. 4

I can’t agree more. The information asymmetry between the “big dragons” (Google, Facebook) and an individual user is staggering.

The analogy with the environment drawn in the article is apt - everyone needs to breathe, eat and drink water. If these resources are threatened by environmental degradation, you can’t just tell people to stop breathing.

Privacy is higher up on Maslow’s hierarchy, but it’s still a right as fundamental as being able to breathe without getting sick.

                                                      1. 1

                                                        Being exiled from society should not be the cost of suggesting that we should improve society somewhat.

                                                        But it should - absurdity for absurdity.

                                                        There was no cost to the individual to make the suggestion in the first place. “Suggestions for free” is a bizarre notion upon which to build real societal change.

                                                        Substitute “cost” for “effort” if you please, for this scenario.

                                                        In what absurd world do we expect something for nothing?

                                                      2. 3

                                                        I must wonder if I am worthy of your pity.

                                                        While Google+ existed, I had a profile there. I didn’t want one. I didn’t make one. But enough other people uploaded their address books that Google+ knew I existed and knew who a whole bunch of my friends and acquaintances were, even though I didn’t want it to know of me. In the end I claimed my profile just so I could write that on it and to tell people who had me in their circles to please not share things with me on there because I wasn’t going to see them. Hooray. How successful do you think I would have been if I emailed all of these people and told them to please either not use Google+ or to selectively not share the data they have about me with Google+?

                                                        Facebook does the same thing, but secretly, and calls it “shadow profiles”. How successful do you think I’m going to be if I try to convince all my relatives to stay off of Facebook or not upload any photos or other data that includes me?

                                                        Aside from that, there is a whole bunch of social activity I have self-excluded myself from because I refuse to use Facebook (or one of its other properties). And I’m glad I’m no longer a student, because if I were, that wouldn’t even be possible. How successful do you think I’m going to be if I try to convince the people in my ballroom dance class not to use a WhatsApp group?

                                                        I’ve never so much as looked at the signup page for GMail, but GMail nevertheless has probably 80% of my mail, because it’s the mail service used by so many of the people I send mail to. Is it feasible for me to refuse to send mail to my boss because he uses GMail?

                                                        It goes on and on.

                                                        The problem is this: I do not exist in a vacuum.

                                                        I may have individual choice in what data about me I give to tech giants, but I can’t feasibly tell everyone I send an email to to not store my address in their address books (and certainly not in their inboxes, if their mail is hosted by GMail), or similarly control everything other people know about me.

I do all I can to block adtech and other tracking on the web and generally to keep my data profile low. But if my parents knowing who I am becomes a potential privacy breach, the problem isn’t with my individual choice and consent.

                                                        And I said that I may have individual choice in what data about me I give to tech giants – but actually I don’t. You dismiss the notion that it’s impossible to quit the tech giants, but it actually is. Check out this article series about trying to block all of one’s traffic to them, and specifically the experiences with Amazon and Microsoft: because they sell services to other businesses (in a very big way), you cannot possibly elect to avoid at least those two. And to a somewhat lesser extent the author found this to also apply to Google. (The only tech giant the author found feasible to opt out of is – surprised? – Apple.)

                                                        So:

                                                        start solving the problem by removing yourself from the public commentary

                                                        Yes, and move to the desert, and become unborn by your parents. If you aren’t willing to move to the desert and to have no relationships and to never participate in society, then you aren’t interested in privacy and shouldn’t be talking about it.

                                                        The emptiness of this notion of privacy is the point of Maciej’s essay.

                                                      1. 3

                                                        Glad to see this article brought to light again, I’ve long since switched to .tar.lz privately, but the new greatness now is … Zstd, I suppose? Is Zstd suitable for long-term archiving?

                                                        1. 2

It is an RFC now (RFC 8478), so probably. Zstandard is pretty nice since decompression speed is basically the same regardless of compression level, so you only pay the cost at creation time.

                                                          1. 2

                                                            It is an RFC now, so probably.

                                                            Just because it’s an RFC doesn’t inherently mean anyone analysed the format under the same criteria as examined in this article. The question is, has anyone?

                                                            1. 1

There’s zchunk (“A file format designed for highly efficient deltas while maintaining good compression” - https://github.com/zchunk/zchunk), whose description makes me think errors in a zstd stream might “kill” large blocks of the output. That’s only an uninformed supposition on my end though.

                                                        1. 10

This page is really painful to read: it’s quite aggressive towards the author of xz. The tone is really needlessly nasty. There are only elements against xz/lzma2, nothing in favor; it’s just criticism whose conclusion is “use lzip [my software]”.

Numbers are presented in whichever way makes them look bigger: a “0.015% (i.e. nothing) to 3%” efficiency difference is then turned into “max compression ratio can only be 6875:1 rather than 7089:1”, but that’s over 1 TB of zeroes and only 3% relative to the compressed data, which amounts to a 4*10^-6 difference on the uncompressed data! (And if you’re compressing that kind of thing, you might want to look at lrzip.)

                                                          The author fails to understand that xz’s success has several causes besides compression ratio and the file format. It’s a huge improvement over gzip and bzip2 for packages. The documentation is really good and helps you get better results both with compression ratio and speed (see “man xz”). It is ported pretty much everywhere (that includes OS/2 and VMS iirc). It is stable. And so on.

                                                          As a side-note, this is the only place where I’ve seen compression formats being used for archiving and expecting handling of potential corruption. Compression goes against archiving. If you’re doing archiving, you’ll be using something that provides redundancy. But redundancy is what you eliminate when you compress. What is used for archiving of audio and video? Simple formats with low compression at best. The thing with lzip is that while its file format might be better suited for archiving, lzip itself as a whole still isn’t suited for archiving. And that’s ok.

Now, I just wish the author would get less angry. That’s one of the ways to a better life. Going from project to project and telling them they really should abandon xz in favor of lzip for their source code releases is only proof of frustration and a painful life.

                                                          1. 6

                                                            The author fails to understand that xz’s success has several causes besides compression ratio and the file format.

                                                            But the author doesn’t even talk about that? All he has to say about adoption is that it happened without any analysis of the format.

                                                            Compression goes against archiving. If you’re doing archiving, you’ll be using something that provides redundancy. But redundancy is what you eliminate when you compress.

                                                            This sounds like “you can’t be team archiving if you are team compression, they have opposite redundancy stat”. It’s not an argument, or at least not a sensical one. Compression makes individual copies more fragile; at the same time, compression helps you store more individual copies of the same data in the same space. So is compression better or worse for archiving? Sorry, I’m asking a silly question. The kind of question I should be asking is along the lines of “what is the total redundancy in the archiving system?” and “which piece of data in the archiving system is the weakest link in terms of redundancy?”

                                                            Which, coincidentally, is exactly the sort of question under which this article is examining the xz format…

                                                            What is used for archiving of audio and video? Simple formats with low compression at best.

                                                            That’s a red herring. A/V archiving achieves only low compression because it eschews lossy compression and the data typically doesn’t lend itself well to lossless compression. Nevertheless it absolutely does use lossless compression (e.g. FLAC is typically ~50% smaller than WAV because of that). This is just more “team compression vs team archiving”-type reasoning.

                                                            The thing with lzip is that while its file format might be better suited for archiving, lzip itself as a whole still isn’t suited for archiving.

                                                            Can you actually explain why, rather than just asserting so? If lzip has deficiencies in areas xz does well in, could you step up and criticise what would have to improve to make it a contender? As it is, you seem to just be dismissing this criticism of the xz format – which as a universal stance would result in neither xz nor lzip improving on any of their flaws (in whatever areas those flaws may be in).

                                                            As a side-note, this is the only place where I’ve seen compression formats being used for archiving and expecting handling of potential corruption.

                                                            Juxtaposing this with your “author fails to understand” statement is interesting. Should I then say that you fail to understand what the author is even talking about?

                                                            This page is really painful to read: it’s quite aggressive towards the author of xz.

                                                            I saw only a single mention of a specific author. All the substantive statements are about the format, and all of the judgements given are justified by statements of fact. The very end of the conclusion speaks about inexperience in both authors and adopters, and it’s certainly correct about me as an adopter of xz.

                                                            There are only elements against xz/lzma2, nothing in favor; it’s just criticism which conclusion is “use lzip [my software]”.

                                                            Yes. The authors of xz are barely mentioned. They are certainly not decried nor vilified, if anything they are excused. It’s just criticism. That’s all it is. Why should that be objectionable? I’ve been using xz; I’m potentially affected by the flaws in its design, which I was not aware of, and wouldn’t have thought to investigate – I’m one of the unthinking adopters the author of the page mentions. So I’m glad he took the time to write up his criticism.

                                                            Is valid criticism only permissible if one goes out of one’s way to find something proportionately positive to pad the criticism with, in order to make it “fair and balanced”?

                                                            Frankly, as the recipient of such cushioned criticism I would feel patronised. Insulting me is one thing and telling me I screwed up is another. I can tell them apart just fine, so if you just leave the insults at home, there’s no need to compliment me for unrelated things in order to tell me what I screwed up – and I sure as heck want to know.

                                                            1. 2

                                                              The author fails to understand that xz’s success has several causes besides compression ratio and the file format.

                                                              But the author doesn’t even talk about that? All he has to say about adoption is that it happened without any analysis of the format.

Indeed, this is more a comment about what appears to be bitterness from the author. This isn’t part of the linked page (although the tone of the article is probably a consequence).

                                                              Compression goes against archiving. If you’re doing archiving, you’ll be using something that provides redundancy. But redundancy is what you eliminate when you compress.

                                                              This sounds like “you can’t be team archiving if you are team compression, they have opposite redundancy stat”. It’s not an argument, or at least not a sensical one. Compression makes individual copies more fragile; at the same time, compression helps you store more individual copies of the same data in the same space. So is compression better or worse for archiving? Sorry, I’m asking a silly question. The kind of question I should be asking is along the lines of “what is the total redundancy in the archiving system?” and “which piece of data in the archiving system is the weakest link in terms of redundancy?”

Agreed. I’m mostly copying the argument from the lzip author. That being said, one issue with compression is that corruption of compressed data is amplified, with no chance of reconstructing the data, even by hand. Intuitively I would expect the best approach for archiving to be compression followed by adding “better” (i.e. more even) redundancy and error recovery (within the storage budget). Now, if your data has some specific properties, the best approach might be different, especially if you’re more interested in some parts (for instance, with a progressive image you might value the less specific parts more, because losing the more specific ones only costs you image resolution).

                                                              Which, coincidentally, is exactly the sort of question under which this article is examining the xz format…

                                                              What is used for archiving of audio and video? Simple formats with low compression at best.

                                                              That’s a red herring. A/V archiving achieves only low compression because it eschews lossy compression and the data typically doesn’t lend itself well to lossless compression. Nevertheless it absolutely does use lossless compression (e.g. FLAC is typically ~50% smaller than WAV because of that). This is just more “team compression vs team archiving”-type reasoning.

If you look at some material from archivists, FLAC isn’t one of the preferred formats. It is acceptable, but the preferred one still seems to be WAV/PCM.

                                                              Sources:

                                                              The thing with lzip is that while its file format might be better suited for archiving, lzip itself as a whole still isn’t suited for archiving.

                                                              Can you actually explain why, rather than just asserting so? If lzip has deficiencies in areas xz does well in, could you step up and criticise what would have to improve to make it a contender? As it is, you seem to just be dismissing this criticism of the xz format – which as a universal stance would result in neither xz nor lzip improving on any of their flaws (in whatever areas those flaws may be in).

I had intended the leading sentences to explain that. The reasoning is simply that compression by itself is mostly at odds with long-term preservation. As discussed above, proper redundancy and error recovery can probably turn that into a good match, but then the qualities of the compression format itself don’t matter that much, since the “protection” is done at another layer that is dedicated to that and also provides recovery.

                                                              As a side-note, this is the only place where I’ve seen compression formats being used for archiving and expecting handling of potential corruption.

                                                              Juxtaposing this with your “author fails to understand” statement is interesting. Should I then say that you fail to understand what the author is even talking about?

                                                              You’re obviously free to do so if you wish to. :)

                                                              This page is really painful to read: it’s quite aggressive towards the author of xz.

                                                              I saw only a single mention of a specific author. All the substantive statements are about the format, and all of the judgements given are justified by statements of fact. The very end of the conclusion speaks about inexperience in both authors and adopters, and it’s certainly correct about me as an adopter of xz.

Being full of facts doesn’t make the article objective. It’s easy to not mention some things, and while the main author of xz/liblzma could technically answer, he doesn’t really wish to do so (especially since it would cause a very high mental load). That being said, I’ll take the liberty of quoting from IRC, where I basically only lurk nowadays (nicks replaced by “Alice” and “Bob”). This is a recent discussion; there were more detailed ones earlier but I’m not only taking the most recent one.

                                                              Bob : Alice the lzip html pages says that lzip compresses a bit better than xz. Can you tell me the technical differences that would explain that difference in size ?

                                                              Bob : Alice do you have ideas on how improving the size with xz ?

                                                              Alice : Bob: I think it used to be the opposite at least with some files since .lz doesn’t support changing certain settings. E.g. plain text (like source code tarballs) are slightly better with xz –lzma2=pb=0 than with plain xz. It’s not a big difference though.

                                                              Alice : Bob: Technically .lz has LZMA and .xz has LZMA2. LZMA2 is just LZMA with chunking which adds a slight amount of overhead in a typical situation while being a bit better with incompressible data.

                                                              Alice : Bob: With tiny files .xz headers are a little bloatier than .lz.

                                                              Alice : Bob: In practice, unless one cares about differences of a few bytes in either direction, the compression ratios are the same as long as the encoders are comparable (I don’t know if they are nowadays).

                                                              Alice : Bob: With xz there are extra filters for some files types, mostly executables. E.g. x86 executables become about 5 % smaller with the x86 BCJ filter. One can apply it to binary tarballs too but for certain known reasons it sometimes can make things worse in such cases. It could be fixed with a more intelligent filtering method.

                                                              Alice : Bob: There are ideas about other filters but me getting those done in the next 2-3 years seem really low.

                                                              Alice : So one has to compare what exist now, of course.

                                                              Bob : Alice btw, fyi, i have tried one of the exemples where the lzip guy says that xz throws an error while it shouldn’t

                                                              Bob : but it is working fine, actually

                                                              Alice : Heh

Two main points here: the chunking and the view that the differences are very small; and the fact that one of the complaints seems wrong.

                                                              If I look for “chunk” in the article, the only thing that comes up is the following:

                                                              But LZMA2 is a container format that divides LZMA data into chunks in an unsafe way. In practice, for compressible data, LZMA2 is just LZMA with 0.015%-3% more overhead. The maximum compression ratio of LZMA is about 7089:1, but LZMA2 is limited to 6875:1 approximately (measured with 1 TB of data).

Indeed, the sentence “In practice, for compressible data, LZMA2 is just LZMA with 0.015%-3% more overhead.” is probably absolutely true. But there is no mention of what happens with incompressible data. I can’t tell whether that omission was voluntary or not, but it makes this paragraph quite misleading.

                                                              Note that xz/liblzma’s author acknowledges some of the points of lzip’s author, but not the majority of them.

                                                              There are only elements against xz/lzma2, nothing in favor; it’s just criticism which conclusion is “use lzip [my software]”.

                                                              Yes. The authors of xz are barely mentioned. They are certainly not decried nor vilified, if anything they are excused. It’s just criticism. That’s all it is. Why should that be objectionable? I’ve been using xz; I’m potentially affected by the flaws in its design, which I was not aware of, and wouldn’t have thought to investigate – I’m one of the unthinking adopters the author of the page mentions. So I’m glad he took the time to write up his criticism.

                                                              Is valid criticism only permissible if one goes out of one’s way to find something proportionately positive to pad the criticism with, in order to make it “fair and balanced”?

                                                              I concur that writing criticism is a good thing but the article is not really objective and probably doesn’t try to be. In an ideal world there would be a page with rebuttals from other people. In a real world, that would probably start a flamewar and the xz/libzma author does not wish to get involved into that.

                                                              I’ve just looked up the author name + lzip and first result is: https://gcc.gnu.org/ml/gcc/2017-06/msg00044.html “Re: Steering committee, please, consider using lzip instead of xz”.

Another scary element is that neither “man lzip” nor “info lzip” mentions “xz”. They mention gzip and bzip2 but not xz (“Lzip is better than gzip and bzip2 from a data recovery perspective.”). Considering the length of this article, not seeing a single mention of xz makes me think the lzip author does not have a peaceful relationship with xz.

                                                              You might think that the preference of lzip in https://www.gnu.org/software/ddrescue/manual/ddrescue_manual.html would be a good indication but the author of that manual is also lzip’s author!

                                                              And now scrolling down my search results, I see https://lists.debian.org/debian-devel/2015/07/msg00634.html “Re: Adding support for LZIP to dpkg, using that instead of xz, archive wide” and the messages there again make me think he doesn’t have a peaceful relationship with xz.

                                                              I don’t like criticizing authors, but with this one-sided article, with its surprising omissions and incorrect elements (no idea whether that’s because things changed at some point), I think more context (and an author’s personality and history are context) helps decide how much to trust the whole article.

                                                              Frankly, as the recipient of such cushioned criticism I would feel patronised. Insulting me is one thing and telling me I screwed up is another. I can tell them apart just fine, so if you just leave the insults at home, there’s no need to compliment me for unrelated things in order to tell me what I screwed up – and I sure as heck want to know.

                                                              Yes, it’s cushioned because, as I said above, I don’t like criticizing authors, so I’m uncomfortable doing it and try to avoid it; but sometimes that can’t be separated from a topic or article, so I still ended up doing it at least a bit (you can now see that I did it as little as possible in my previous message). With that being said, I don’t think the author needs to be told all of this, or at least I don’t want to start such a discussion with an author who seems able to keep it going for years (and, to be honest, I’m not sure that’s healthy for him).

                                                              edit: fixed formatting of the IRC quote

                                                            2. 3

                                                              As a side note, this is the only place where I’ve seen compression formats used for archiving with the expectation that they handle potential corruption. Compression goes against archiving. If you’re doing archiving, you’ll be using something that provides redundancy.

                                                              This is not true at all. [Edit: Most of the widely used professional backup and recovery software that was specifically designed for long-term archiving also included compression as an integral part of the package, and advertised its ability to work in a robust manner.]

                                                              BRU for UNIX, for example, does compression, and is designed for archiving and backup. This tool is from 1985 and is still maintained today.

                                                              Afio is specifically designed for archiving and backup. It also supports redundant fault-tolerant compression. This tool is also from 1985 and is still maintained today.

                                                              [Edit: LONE-TAR is another backup product I remember using from the mid 1980s, was originally produced by Cactus Software. It’s still supported and maintained today. It provided a fault-tolerant compression mode, so it would be able to restore (most) data even if there was damage to the archive.]

                                                              As to all your other complaints, it seems you are attacking the document’s “aggressive tone” and you mention that you find it painful (or offensive) to read, but you haven’t actually refuted any of the technical claims that the author of the article makes.

                                                              1. 1

                                                                Sorry, I had compression software in mind when I wrote that sentence. I meant that I had never seen a compression software that made the resistance to corruption such an important feature.

                                                                Thanks for the links! I’m not that surprised that there are some pieces of software that already exist and fit in that niche (I would have had to build a startup otherwise!). I’m quite curious about their tradeoff choices (space vs. recovery capabilities), but since two of them are proprietary, I’m not sure I can find that out, unfortunately.

                                                                As to all your other complaints, it seems you are attacking the document’s “aggressive tone” and you mention that you find it painful (or offensive) to read, but you haven’t actually refuted any of the technical claims that the author of the article makes.

                                                                Indeed. Part of that is because the comments here are probably not a good place for it, since the article itself is very long. The other part is because xz’s author does not wish to get into that debate, and I don’t want to pull him in by publishing his answers on IRC on the topic. It’s not a great situation and I don’t really know what to do, so I end up hesitating, which isn’t perfect either. I mostly just hope to get people to question the numbers and facts on that page a bit, and not to forget everything else that goes into making a file format useful in practice; the absence of a rebuttal doesn’t mean the article is true, spot-on, unbiased, and so on.

                                                              2. 2

                                                                I agree about the tone of the article, but I’m not sure that archiving and compression run counter to each other.

                                                                I’ve spent a lot of time digging around for old software, in particular to get old hardware running, but also to access old data. Already we are having to dig up software from 20+ years ago for these things.

                                                                In another 20 years, when people need to do the same job, it will be more complicated: if you need to run one package, you may find yourself needing tens of transitive dependencies, or worse. If you’re looking in some ancient Linux distribution mirror on some forgotten server, what are the chances that all the bits are still 100% perfect? And certainly nobody’s going to mirror all these in some uncompressed format ;-)

                                                                This is one case where being able to recover corrupted files is important. It’s also helpful to be able to do best-effort recovery on these; in any given distro archive you can live with corruption in some proportion of the resulting payload bytes - think of all the documentation files you can live without - but if a bit error causes the entire rest of the stream to be abandoned then you’re stuffed.
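
                                                                To make that failure mode concrete, here is a minimal sketch using Python’s zlib as a stand-in for any conventional single-stream compressor (not xz or lzip specifically): flip one bit in the middle of a compressed stream and see how much the standard decoder still hands back. Depending on where the flip lands, the decoder may bail out early, only complain at the final checksum, or in rare cases not notice at all; the point is that once it gives up, there is no way to resynchronise with the rest of the stream.

                                                                    import zlib

                                                                    payload = b"important data " * 100_000            # ~1.5 MB of compressible data
                                                                    compressed = bytearray(zlib.compress(payload, 9))
                                                                    compressed[len(compressed) // 2] ^= 0x01          # flip a single bit mid-stream

                                                                    decoder = zlib.decompressobj()
                                                                    recovered = bytearray()
                                                                    try:
                                                                        # Feed the damaged stream in small chunks so we keep whatever decoded
                                                                        # before the decoder hits the corruption.
                                                                        for i in range(0, len(compressed), 256):
                                                                            recovered += decoder.decompress(compressed[i:i + 256])
                                                                        recovered += decoder.flush()
                                                                    except zlib.error as exc:
                                                                        print(f"decoder gave up: {exc}")
                                                                    print(f"kept {len(recovered):,} of {len(payload):,} payload bytes")

                                                                With a format designed for recovery, you would expect to get back everything outside the damaged region instead.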

                                                                I’d argue that archival is something we already practice in everyday release of software. The way people tend to name release files as packagename-version.suffix is a good example: it makes the filename unique and hence very easy to search for in the future. And here, picking one format over another where it has better robustness for future retrievers seems pretty low-cost. It’s not like adding parity data or something that increases sizes.

                                                                1. 2

                                                                  Agreed. :)

                                                                  Makes me think of archive.org and softwareheritage.org (which already has pretty good stuff if I’ve understood correctly).

                                                              1. 5

                                                                Is it just me, or does his argument about proprietary software and free software hinge on equivocating between proprietary software licenses protecting their authors against users and free software licenses protecting their authors against companies?

                                                                In the article he uses the word “users” in both cases, but everyone knows free software licenses don’t protect authors from the end-users. Free software licenses don’t protect the author at all, except in so far as the author is just another user.

                                                                Edit: OK fine; free software licenses do protect the author from the user from a liability perspective, but that’s unrelated to the argument in the article.

                                                                1. 4

                                                                  I thought the same thing immediately and am surprised no one else commented on it.

                                                                  DHH claims Stallman was afraid he wouldn’t get modifications back, when in fact Stallman was afraid he wouldn’t be able to ensure that his software kept being given away – because others could first make modifications to it before passing it on and then hold those modifications back as a means to make their users dependent on themselves. Legally enforcing the publication of those modifications wasn’t so Stallman could get them; it was so that users could get them.

                                                                  DHH tries to get in front of objections by saying this:

                                                                  You might find this comparison a stretch (or even offensive), and I’m sympathetic to the challenge.

                                                                  But I find it neither a stretch nor offensive; I simply find it entirely invalid. That doesn’t invalidate the argument that there is a shared architecture in Gates’ and Stallman’s strategies, which is in fact interesting to consider. But it does mean that the extrapolation from that shared architecture to a shared mindset of scarcity is unsound.

                                                                  Indeed, while such a mindset can be attributed to Gates, Stallman’s motivation is, in fact, a mindset of abundance and a fear of manufactured scarcity, as a look at their “origin stories” makes clear: Bill Gates’ Open Letter to Hobbyists vs Stallman’s printer driver anecdote.

                                                                  The whole article hinges on a complete mischaracterisation of what copyleft licenses demand. DHH claims that a user who modifies the software must give those modifications back – which, highly ironically, is exactly the old Microsoft FUD scare tactic against the GPL. The reality is that modifications have to be given back (in source form) only if those modifications are being distributed already (in build artefact form).

                                                                  All DHH has to say to that (indirectly, so I’m paraphrasing) is “well nobody ever attempted a proprietarily controlled fork of Rails”. It’s certainly interesting to consider that one particular case, but it does nothing to address the known cases where such a thing has happened, nor Stallman’s motivating experience.

                                                                  1. 2

                                                                    I agree with this. I find the article interesting as a whole, but the lumping together of RMS and Gates is unfair. DHH could develop Rails and give it away for free and build a reputation and a company on it because Stallman thought through the consequences of Free Software and carefully designed the GPL to address a bunch of issues.

                                                                    In other words, he’s arguing from a position of privilege he owes in large part to RMS.

                                                                    I have a lot of sympathy for the views of the BSD/MIT license crowd. The GPL isn’t for everyone. But without RMS’ tireless advocacy there would be no Free Software or Open Source. BSD/MIT-licensed software would be developed in universities and monetized by companies, and kept away from users outside universities and companies. RMS expanded the userbase of Free Software for everyone, creating the ecosystem in which DHH could even consider writing Rails.

                                                                1. 5

                                                                  2014 for anyone who is wondering roughly how old this blog post is. I think it ultimately fails to address the whole reason why people seek out the Learn X in Y hours books, which is to learn enough to get a job so that they can be trained on the job.

                                                                  1. 4

                                                                    This article also makes the mistake of assuming “Learn X in 24 Hours” is about learning to be a software developer. It’s not. It’s about learning the basics and gotchas of a specific language. I went to college to learn the stuff he rattles off like the realities of the hardware limitations. But I actually had Learn C++ in 24 Hours, because I needed to look up simple stuff like how C++ does vectors vs Java.

                                                                    Nowadays you just look that up in a language’s tutorial docs or on Stack Overflow, but these books served that function back then.

                                                                    Also, not all jobs that require a bit of coding skill require you to be a full software developer. Visual Basic is reviled for unmaintainable spaghetti code–and yet those programs did mission critical functions for business.

                                                                    1. 2

                                                                      Also, not all jobs that require a bit of coding skill require you to be a full software developer. Visual Basic is reviled for unmaintainable spaghetti code–and yet those programs did mission critical functions for business.

                                                                      Still do! Production-Driven Development is what I call it. It’s buggy as hell for the first year, but after 20 years it becomes rock solid. While the process is completely insufferable, once the code becomes so extremely battle-tested it can be hard to replace, because “it works and doesn’t have any bugs”.

                                                                    2. 1

                                                                      2014 for anyone who is wondering roughly how old this blog post is

                                                                      Uh. I don’t know where you got 2014 from, but it’s rather older than that.

                                                                    1. 1

                                                                      These days I just use DNS-based blocklists. Just run dnsmasq with an adblock blocklist locally, or on your home network with a Raspberry Pi.

                                                                      1. 5

                                                                        You say that, and it’s fine for some sites, but a lot of them have anti-adblock scripts baked in alongside the site logic. The only way you’re going to work around that is with redirect rules, like what uBlock Origin does. It also isn’t possible to do annoyance removal, like getting rid of fixed banners, using DNS.

                                                                        1. 3

                                                                          For the sites that it doesn’t work for, I close the tab and move on. It wasn’t worth my time anyway.

                                                                          1. 1

                                                                            To me, attempting to get blanket web-wide annoyance removal feels like freeloading. That’s not why I block ads. It’s my prerogative to avoid privacy invasion, malware vectors, and resource waste; if the site owner goes to lengths to make it hard to get the content without those, that’s their prerogative, and I just walk away. I’m not going to try to grab something they don’t want me to have. (The upshot is that I don’t necessarily even use an ad-blocker, I simply read most of the web with cookies and Javascript disabled. If a page doesn’t work that way, too bad, I just move on.)

                                                                            1. 1

                                                                              I figure that living in an information desert of my own making is not a very effective form of collective action. There simply aren’t enough ascetics to make it worth an author’s time testing their site with JavaScript turned off. And if it isn’t tested, then it doesn’t work. If even Lobsters, a small-scale social site that you totally could’ve boycotted, can get you to enable JavaScript, then it’s a lost cause. Forget about getting sites with actual captive audiences to do it.

                                                                              People need to encourage web authors to stop relying on ad networks for their income, and they need to do it without becoming “very intelligent”. An ad blocker that actually works, like uBlock Origin, is the only way I know of to do that; it allows a small number of people (the filter list authors) to sabotage the ad networks at scale, in a targeted way.

                                                                              1. 1

                                                                                Thank you for bringing up Mr. Gotcha on your own initiative, because that sure feels like what you’re doing to me here. “You advocate for browsing with Javascript off. Yet you still turn it on in some places yourself.”

                                                                                That’s also my objection to the line of argument advanced in the other article you linked: “JavaScript is here. It is not going away. It does good, useful things. It can also do bad, useless, even frustrating things. But that’s the nature of any tool.” I’m sorry, but the good-and-useful Javascript I download daily is measured in kilobytes; the amount of ad-tech Javascript I would be downloading if I didn’t opt out would be measured in at least megabytes. That’s not “just like I can show you a lot of ugly houses”; it inverts the argument to “sure, 99.9% of houses are ugly but pretty ones do exist as well, you know”. Beyond that, it’s a complete misperception of the problem to advocate for “develop[ing] best practices and get[ting] people to learn how to design within the limits”. The problem would not go away if webdevs stopped relying on Javascript, because the problem is not webdevs using Javascript, the problem is ad-tech. (And that, to respond to Mr. Gotcha, is why I enable JS in some places, even if I mostly keep it off.)

                                                                                In that respect I don’t personally see how “if you insist on shovelling ads at me then I’ll just walk away” is a lesser signal of objection than “then I’ll crowdsource my circumvention to get your content anyway”. But neither seems to me like a signal that’s likely to be received by anyone in practice anyway, and I think you operate under an illusion if you are convinced otherwise. I currently don’t see any particularly effective avenue for collective action in this matter, and I perceive confirmation of that belief in the objectively measurable fact that page weights are inexorably going up despite the age and popularity of the “the web is getting bloated” genre. All webbie/techie people agree that this has to stop, and have been agreeing for years, yet it keeps not happening, and instead keeps getting worse. Maybe because business incentives keep pointing the other way and defectors keep being too few to affect that.

                                                                                Until and unless that changes, all I can do is find some way of dealing with the situation as it concerns me. And in that respect I find it absurd to have it suggested that I’m placing myself in any sort of “information desert of my own making”. Have you tried doing what I do? You would soon figure out that the web is infinite. Even if I never read another JS-requiring page in my life, there is more of it than I can hope to read in a thousand lifetimes. Nor have I ever missed out on any news that I didn’t get from somewhere else just as well. The JS-enabled web might be a bigger infinity than the non-JS-enabled web (I am not even sure of that, but let’s say it is), but one infinity’s as good as another to this here finite being, thank you.

                                                                                1. 2

                                                                                  But neither seems to me like a signal that’s likely to be received by anyone in practice anyway.

                                                                                  I, personally, can handle a script blocker and build my own custom blocking list just fine. I can’t recommend something that complex to people who don’t even really know what JavaScript is, but I can recommend uBlock Origin to almost anyone. They can install it and forget about it, and it makes their browser faster and more secure, while still allowing access to their existing content, because websites are not fungible. Ad networks are huge distributors of malware, and I don’t mean that in the “adtech is malware” sense; I mean it in the “this ad pretends to be an operating system dialog and if you do what it says you’ll install a program that steals your credit card and sells it on the black market” sense. I find it very easy to convince people to install ad blockers after something like that happens, which it inevitably does if they’re tech-illiterate enough to have not already done something like this themselves.

                                                                                  uBlock Origin is one of the top add-ons in Chrome and Firefox’s stores. Both stores indicate millions of users. Ad blocker usage is estimated at around 20% in the United States, around 30% in Germany, and at similar levels in other countries, while the percentage of people who browse without JavaScript is around 1%. I can show you sites whose anti-adblock JavaScript doesn’t run when JavaScript is turned off entirely, and so can be defeated by using NoScript, indicating that they’re more concerned about ad blockers than script blockers. Websites that switched to paywalls cite lack of profitability from ads, caused by a combination of ad blockers and plain old banner blindness.

                                                                                  Don’t be fatalistic. The current crop of ad networks is not a sustainable business model. It’s a bubble. It will burst, and the ad blockers are really just symptomatic of the fact that no one with any sense trusts the content of a banner ad anyway.

                                                                                  1. 1

                                                                                    Oh, absolutely. For tech-illiterate relatives for whom I’m effectively the IT support, I don’t tell them to do what I do. Some of them were completely unable to use a computer before tablets with a touchscreen UI came out – and still barely can, like having a hard time even telling text inputs and buttons apart. Expecting them to do what I do would be a complete impossibility.

                                                                                    I run a more complex setup with minimal use of ad blocking myself, because I can, and therefore feel obligated by my knowledge. And to be clear, for the same reason, I would prefer if it were possible for the tech-illiterate people in my life to do what I do – but I know it simply isn’t. So I don’t feel the same way about those people using crowdsourced annoyance removal as I’d feel about using it myself: I’m capable of using the web while protecting myself even without it; they aren’t.

                                                                                    It’s a bubble.

                                                                                    I’m well aware. It’s just proven to be a frustratingly robust one, quelling several smaller external shifts in circumstances that could have served as seeds of its destruction – partly why I’m pessimistic about any attempt at accelerating its demise from the outside. Of course it won’t go on forever, simply because it is a bubble. But it’s looking like it’ll have to play itself out all the way. I hope that’s soon, not least because the longer it goes, the uglier the finale will be.

                                                                                    And of course I would love for reality to prove me overly pessimistic on any of this.

                                                                          2. 2

                                                                            I use /etc/hosts as a block list, but it’s a constant arms race with new domains popping up. I use block lists like http://someonewhocares.org/hosts/hosts and https://www.remembertheusers.com/files/hosts-fb but I don’t want to blindly trust such third parties to redirect arbitrary domains in arbitrary ways.

                                                                            Since I use NixOS, I’ve added a little script to my configuration.nix file which, when I build/upgrade the system, downloads the latest version of these lists, pulls the source domain out of each entry, and writes an /etc/hosts that sends them all to 127.0.0.1. That way I don’t have to manually keep track of domains, but I also don’t have to worry about phishing, since the worst that can happen is that legitimate URLs (e.g. a bank’s) get redirected to 127.0.0.1 and error out.
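
                                                                            For anyone who wants the same trick outside NixOS, here is a rough sketch of the idea in Python; the URLs are just the two lists mentioned above, while the output filename and the exact filtering are my own choices rather than what my Nix snippet literally does:

                                                                                #!/usr/bin/env python3
                                                                                """Rough sketch: fetch hosts-style blocklists, keep only the domain
                                                                                names, and write entries that all point at 127.0.0.1."""

                                                                                import urllib.request

                                                                                # The two lists mentioned above.
                                                                                BLOCKLISTS = [
                                                                                    "http://someonewhocares.org/hosts/hosts",
                                                                                    "https://www.remembertheusers.com/files/hosts-fb",
                                                                                ]

                                                                                def extract_domains(hosts_text):
                                                                                    """Yield domain names from 'ip domain [domain...]' lines, skipping comments."""
                                                                                    for line in hosts_text.splitlines():
                                                                                        line = line.split("#", 1)[0].strip()   # strip comments
                                                                                        parts = line.split()
                                                                                        if len(parts) >= 2:                    # e.g. "0.0.0.0 ads.example.com"
                                                                                            for domain in parts[1:]:
                                                                                                if domain not in ("localhost", "localhost.localdomain", "broadcasthost"):
                                                                                                    yield domain

                                                                                domains = set()
                                                                                for url in BLOCKLISTS:
                                                                                    with urllib.request.urlopen(url, timeout=30) as resp:
                                                                                        domains.update(extract_domains(resp.read().decode("utf-8", "replace")))

                                                                                # Every entry points at 127.0.0.1 no matter what the upstream list said,
                                                                                # so the worst a bad list can do is make a legitimate site unreachable.
                                                                                with open("blocked-hosts", "w") as out:
                                                                                    for domain in sorted(domains):
                                                                                        out.write(f"127.0.0.1 {domain}\n")

                                                                            The key design point is the same as in my Nix version: only the domain names are taken from the upstream lists and every entry is rewritten to point at 127.0.0.1, so a malicious or compromised list can at worst make a site unreachable, never redirect it somewhere else. The resulting file can be appended to /etc/hosts, or handed to dnsmasq via its addn-hosts option if you want to block for a whole network.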

                                                                            1. 2

                                                                              For anyone interested in implementing this without Pi-hole, I have a couple of scripts on GitHub which might help. I adapted them from the Pi-hole project a while back when I wanted to do something a bit less fully-featured. They can combine multiple upstream lists, and generate configurations for /etc/hosts, dnsmasq, or zone files.

                                                                            1. 8

                                                                              The attackers did not get in through a security flaw in Matrix itself, but via an outdated Jenkins.

                                                                              The nice thing about Matrix is that it is federated (much like e-mail is); there’s no reason to use the matrix.org server. Instead, you can use your own trusted home server. Another reason to use your own is that the Matrix main server tends to be pretty overloaded so everything is quite slow.

                                                                              1. 3

                                                                                I mean, it doesn’t say anything about the quality of the Matrix codebase as such, but some things do make you wonder about the level of understanding that the people working on it bring to it:

                                                                                Attacker gains access to production infrastructure by hijacking a forwarded SSH agent logging into the compromised Jenkins worker

                                                                                … and the corresponding Issues stop just short of overtly saying that that account had root access on all servers. The picture painted by that combination of a handful of facts makes me really wary…

                                                                                1. 0

                                                                                  It looks like the core parts of the protocol (including E2E encryption) could now be the target of a takeover-in-progress by French national security agencies.

                                                                                  New Vector doesn’t look likely to say no to one more reasonable-looking paid change request from France, nor likely to find security implications if there is a nation-state effort to hide them. Some of the incentives are good, though (it was promised that French government agencies will have an option of limited external federation with mainline installations; the current public story even claims that the agencies will run the mainline server code).

                                                                                  For some bugfixes in Synapse, it is worrying that the pre-fix state was widely deployed for months…

                                                                                  1. 3

                                                                                    So, speaking as the project lead for Matrix, this really isn’t true. ANSSI (the french cybersecurity agency) have not made any code contributions to the E2E crypto or the protocol itself. If they did make contributions, we’d audit them ourselves incredibly carefully, if we even accepted them at all. So far we’ve minimised the number of contributors to libolm (the E2E library) to a very small set of individuals.

                                                                                    To be clear: we would be utterly stupid to sabotage Matrix by somehow giving any entity (whether that’s France, or New Vector, or Ericsson or whoever) a competitive advantage, no matter how much they offered to pay us. We aren’t building Matrix out of short-termist greed, but long-term altruism - to build an open global standard for comms. And we are not naive, and are very aware that some parties might try to sneak in trojan horses, and will do everything to fight them. We’ve spent a lot of time trying to codify this into the governance of the Matrix.org Foundation over at https://github.com/matrix-org/matrix-doc/blob/matthew/msc1779/proposals/1779-open-governance.md.

                                                                                    Now, in terms of some of Synapse’s long-lived pre-1.0 bugs being concerning (e.g. state resets; fungible event IDs)… this is true. But we were fixing these independently of the French deployment as part of our existing 1.0 roadmap. The only difference is that we knew we had to have them fixed in time for France to go live. The actual design and solution and code was written entirely by us, and France does run off the same public synapse tree as everyone else and so got them at the same time with no privileged access.

                                                                                    So, TL;DR: there is categorically not a takeover-in-progress, and it’s very unfortunate if it seems that way from the outside. Instead, DINSIC have been amazingly good open source citizens, and I only wish all government ministries operated like this.

                                                                                    1. 1

                                                                                      I am indeed surprised that none of the bugs that looked like hotfixes (and persisted for months) came from an ANSSI audit. Interesting, thanks for commenting about that.

                                                                                      The only threat model I consider is them managing to find and explain a real problem in a way that leads to a natural fix with unintended, complicated implications — but hopefully they don’t want to create a risk for something they will also be expected to run.

                                                                                      I actually hoped they had already started pushing bug reports — a fresh set of qualified reviewers who have an actual motivation to find out whether the protocol works has a good chance of being useful.

                                                                                      Sorry for not wording some parts of my comment well, and thanks for clarifying that they haven’t yet given you the results of their (hopefully already ongoing) audit.

                                                                                      1. 3

                                                                                        So to be clear - there have been at least 3 audits, but primarily around the infrastructure of the deployment rather than the E2EE in Matrix, mainly because Matrix is obviously still developing rapidly. Once there’s a full audit on the Matrix side I’d push for it to be public, just like our US-government-funded E2EE audit from 2016.

                                                                                  1. 2

                                                                                    Ugh. I’ve thought about opening up the Jenkins server on one of my open source projects to the world, but hesitate due to crap like this. Even though I update it regularly, the risk seems high just to have public build statuses (which I could probably just proxy through something else).

                                                                                    I also hate how you have to mount the underlying machine’s Docker socket to allow Docker build agents. Surely there’s got to be a better user-mode Docker solution.

                                                                                    1. 2

                                                                                      Yeah, that’s unfortunate. Maybe you could script it to generate some static artifacts and then drop them somewhere web accessible?