1. 20

    I love plain text protocols, but … HTTP is neither simple to implement nor fast to parse.

    1. 7

      Yeah the problem of parsing text-based protocols in an async style has been floating around my head for a number of years. (I prefer not to parse in the async or push style, but people need to do both, depending on the situation.)

      This was motivated by looking at the nginx and node.js HTTP parsers, which are both very low level C. Hand-coded state machines.


      I just went and looked, and this is the smelly and somewhat irresponsible code I remember:

      https://github.com/nodejs/http-parser/blob/master/http_parser.c#L507

      /* Proxied requests are followed by scheme of an absolute URI (alpha).
       * All methods except CONNECT are followed by '/' or '*'.

      I say irresponsible because it’s network-facing code with tons of state and rare code paths, done in plain C. nginx has had vulnerabilities in the analogous code, and I’d be surprised if this code didn’t.


      Looks like they have a new library and admit as much:

      https://github.com/nodejs/llhttp

      Let’s face it, http_parser is practically unmaintainable. Even introduction of a single new method results in a significant code churn.

      Looks interesting and I will be watching the talk and seeing how it works!

      But really I do think there should be text-based protocols that are easy to parse in an async style (without necessarily using Go, where goroutines give you your stack back)

      A while back I did an experiment with netstrings, because length-prefixed protocols are easier to parse async than delimiter-based protocols (like HTTP and newlines). I may revisit that experiment, since Oil will likely grow netstrings: https://www.oilshell.org/release/0.8.7/doc/framing.html
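To make the length-prefix point concrete, here is a minimal sketch of a push-style netstring decoder in Python. It is not the experiment mentioned above; the class name `NetstringParser` and the `max_len` cap are assumptions made for illustration. The parser only needs to keep a byte buffer: once the length prefix has arrived, it knows exactly how many more bytes to wait for.

```python
class NetstringParser:
    """Push-style netstring decoder: feed it arbitrary byte chunks."""

    def __init__(self, max_len=1 << 20):
        self._buf = bytearray()
        self._max_len = max_len  # illustrative cap, not part of the format

    def feed(self, data):
        """Append data and return a list of every payload that is now complete."""
        self._buf += data
        out = []
        while True:
            colon = self._buf.find(b":")
            if colon == -1:
                # Length prefix not finished yet; so far it must be all digits.
                if self._buf and not self._buf.isdigit():
                    raise ValueError("malformed length prefix")
                return out
            prefix = bytes(self._buf[:colon])
            if not prefix.isdigit():
                raise ValueError("malformed length prefix")
            length = int(prefix)
            if length > self._max_len:
                raise ValueError("payload too large")
            end = colon + 1 + length  # index of the trailing comma
            if len(self._buf) <= end:
                return out  # payload (or its comma) has not arrived yet
            if self._buf[end] != ord(","):
                raise ValueError("missing trailing comma")
            out.append(bytes(self._buf[colon + 1:end]))
            del self._buf[:end + 1]
```

Feeding `b"5:hel"` yields nothing, and feeding `b"lo,"` afterwards yields the payload `b"hello"`; no hand-written state machine is needed beyond the buffer itself.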


      OK wow that new library uses a parser generator I hadn’t seen:

      https://llparse.org/

      https://github.com/nodejs/llparse

      which does seem like the right way to do it: do the inversion automatically, not manually.

      1. 4

        Was going to say this. Especially when you have people misbehaving around things like Content-Length and Transfer-Encoding: chunked, and thus request smuggling, it seems to imply it’s too complex. Plus, I still don’t know which response code is appropriate for every occasion.

        1. 2

          Curious what part of HTTP you think is not simple? And on which side (client, server)

          1. 5

            There’s quite a bit. You can ignore most of it, but once you get to HTTP/1.1 where chunked-encoding is a thing, it starts getting way more complicated.

            • Status code 100 (continue + expect)
            • Status code 101 - essentially allowing hijacking of the underlying connection to use it as another protocol
            • Chunked transfer encoding
            • The request “method” can technically be an arbitrary string - protocols like webdav have added many more verbs than originally intended
            • Properly handling caching/CORS (these are more browser/client issues, but they’re still a part of the protocol)
            • Digest authentication
            • Redirect handling by clients
            • The Range header
            • The application/x-www-form-urlencoded format
            • HTTP 2.0 which is now a binary protocol
            • Some servers allow you to specify keep-alive to leave a connection open to make more requests in the future
            • Some servers still serve different content based on the User-Agent header
            • The Accept header

            There’s more, but that’s what I’ve come up with just looking quickly.
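As a rough illustration of just one item on that list, here is a sketch of decoding a complete chunked body in Python. The function name `decode_chunked` is made up for this example, and the sketch makes simplifying assumptions: the whole body is already in memory, chunk extensions are skipped, and trailer fields after the last chunk are ignored.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"


def decode_chunked(body):
    """Decode a complete chunked body held in memory.

    Raises ValueError on malformed or truncated input.
    """
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)  # ValueError if the size line is cut off
        # Drop any chunk extension after ';' and keep only the hex size.
        size_field = body[pos:eol].split(b";", 1)[0].strip()
        if not size_field or not all(c in HEX_DIGITS for c in size_field):
            raise ValueError("bad chunk size")
        size = int(size_field, 16)
        pos = eol + 2
        if size == 0:
            return bytes(out)  # last chunk; trailers are ignored in this sketch
        chunk = body[pos:pos + size]
        if len(chunk) != size or body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated chunk")
        out += chunk
        pos += size + 2
```

Even this toy version has to decide what to do about extensions, trailers and malformed sizes, which is where real implementations tend to disagree.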

            1. 3

              Would add to this that it’s not just complicated because all these features exist, it’s very complicated because buggy halfway implementations of them are common-to-ubiquitous in the wild and you’ll usually need to interoperate with them.

              1. 1

                And, as far as I know, there is no conformance test suite.

                1. 1

                  Ugh, yes. WPT should’ve existed 20 years ago.

              2. 2

                Heh, don’t forget HTTP/1.1 Pipelining. Then there’s caching, and ETags.

            2. 2

              You make a valid point. I find it easy to read as a human being though which is also important when dealing with protocols.

              I’ve found a lot of web devs I’ve interviewed have no idea that HTTP is just plain text over TCP. When the lightbulb finally goes on for them a whole new world opens up.

              1. 4

                It’s interesting to note that while “original HTTP” was plain text over TCP, we’re heading toward a situation where HTTP is a binary protocol run over an encrypted connection and transmitted via UDP—and yet the semantics are still similar enough that you can “decode” back to something resembling HTTP/1.1.

                1. 1

                  UDP? I thought HTTP/2 was binary over TCP. But yes, TLS is a lot easier thanks to ACME cert issuance and Let’s Encrypt for sure.

                  1. 2

                    HTTP/3 is binary over QUIC, which runs over UDP.

              2. 1

                SIP is another plain text protocol that is not simple to implement. I like it and it is very robust though. And it was originally modeled after HTTP.

              1. 2

                Hi, creator here, thanks for submitting and please share any feedback or thoughts you might have!

                1. 21

                  Without more context it’s difficult at a glance to know how to interpret “Harmful”.

                  It looks like it’s saying “Mozilla’s implementation of the Serial API is harmful” but it sounds like what it’s actually saying is “Mozilla considers the Serial API to be harmful” which is very different!

                  1. 1

                    Hmm, yeah, good point. The wording is taken from their own site which is linked to when one clicks on the status.

                    Suggestion on how it could be improved?

                    1. 8

                      Personally I have no idea what this website is about. Perhaps add a few lines on top that explain what it is?

                      1. 1

                        Further down I’ve written:

                        observations of APIs with controversy around them and where hard facts has often been hard to find

                        Maybe replacing/extending the current “Background” in the top with something similar? Maybe like this:

                        Gathering of Web API specifications that has caused controversy among browser vendors, giving them relevant context

                        1. 3

                          I think you need even more context than that. What is Web API? Why is it controversial? The nice thing about the FAQ format is that you can spend the first 1–3 items answering questions like these and anyone who already has this context can just skip over them.

                          A design note—the text in the FAQ expands to fill the full width of the screen (or at least the 1,280 pixels of my browser window), and there is also no margin between the text and the edge of the screen. Both of these things make the text harder to read. You might consider limiting the width of the text to 800 px or 40 em (very approximate numbers) and, on smaller screens, adding at least 10 px of whitespace on either side.

                          1. 1

                            Suggestion on wording and such is much appreciated, this is just something I threw together quickly in an afternoon to try and gather references in these topics :)

                      2. 3

                        Perhaps you’d consider changing the colour scheme? To me, green = GOOD and red = BAD, which makes it hard to understand what’s actually going on at first glance.

                        1. 1

                          In what way? Green = positive about the state of the spec, Red = negative about the state of the spec, isn’t that the correct way?

                          1. 4

                            I think the problem is that it’s not immediately obvious that this ‘judgement’ of good vs bad is about the spec. At first glance this just looks like chrome has everything green and is thus good, while firefox/safari have everything red and are thus bad.

                            1. 1

                              Yeah, good feedback, will try to find time to improve it asap

                        2. 2

                          “Harmful to Users” or “Deemed Harmful to Users” perhaps? The key point that needs communicating is that Mozilla has determined that implementing the spec would be harmful to its own users, e.g. someone might use the serial api to modify their insulin delivery device.

                          1. 1

                            Well, Mozilla’s own description of their “Harmful” label is “Mozilla considers this specification to be harmful in its current state.”

                            That the focus is on the spec, not the implementation, should get clarified.

                            1. 5

                              “Harmful” doesn’t mean anything on its own, you have to tell a person what is being harmed.

                              1. 1

                                Totally, but that’s better explained by the ones considering it to be harmful than for me to try and summarize and maybe misinterpret

                                1. 1

                                  By using the single word “Harmful” I’d argue that you have summarized. It’s just that that summary is ambiguous and prone to misinterpretation, as others in this thread have pointed out.

                                  1. 1

                                    How would you summarize it better? I would love to do it better

                                    1. 1

                                      Maybe “Mozilla considers it harmful” or “No plans to implement”? I know these are more wordy than what you’ve got now, but I can’t think of a shorter bit of text that still conveys the right meaning.

                            2. 1

                              Intentionally omitted.

                            3. 2

                              Added an issue for it to ensure it doesn’t get lost: https://github.com/voxpelli/webapicontroversy.com/issues/1

                              1. 1

                                Perhaps “Rejected” instead of “Harmful”?

                                Though Mozilla themselves refer to it as “harmful”.

                                1. 1

                                  They often haven’t rejected the specs though, rather they have found that they in their current state would be harmful to the web.

                                  Remember: All of these specs are drafts and still under discussion, even though Chrome has decided to ship them

                            4. 2

                              I think it would be nice to link to discussions directly in the details, e.g. https://github.com/mozilla/standards-positions/issues/336

                              1. 2

                                I prefer to link to the most official kind of reference and have it refer to the discussions they feel are relevant, feels like that has a better chance of being up to date and staying as objective as possible

                            1. 3

                              Another tip for catching invalid uses of IDs is to use a different autoincrement value for each type (ideally different primes), so increase user IDs by 3, page IDs by 7, widget IDs by 11, and so on. This way they won’t overlap with each other. If you ever mix them up, you’re more likely to notice when the program fails due to “missing” data than if it continued with incorrect data.

                              1. 4

                                ideally different primes

                                Note that the increments must be relatively prime to each other or else you will get collisions. Making them all prime is the easiest way to ensure this.

                                1. 1

                                  Actually, disregard what I said, it’s nonsense. If you increment user IDs by 3 and page IDs by 5 then 15, 30, 45, … are still going to appear in both sequences.

                                  If you want non-overlapping sequences of IDs one way to get them is to use powers of distinct primes, so for example the user IDs would be 3¹, 3², 3³, etc., the page IDs would be 5¹, 5², 5³, and so on. The downside is that you’ll exhaust your storage space quickly: there are only 63 powers of 2 that fit into 64 bits (I’m excluding 2⁰, since 2⁰ = 3⁰ = 5⁰ = …), only 40 powers of 3, and only 27 powers of 5.
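Both claims are easy to check with a few lines of Python. This is only a sanity check of the arithmetic, with helper names (`multiples`, `count_powers`) invented for the example and an unsigned 64-bit range assumed.

```python
def multiples(step, limit):
    """IDs produced by starting at `step` and incrementing by `step`."""
    return set(range(step, limit + 1, step))


# Steps of 3 and 5 collide at every multiple of 15.
shared = sorted(multiples(3, 50) & multiples(5, 50))


def count_powers(base, bits=64):
    """Count exponents k >= 1 for which base**k fits in an unsigned `bits`-bit integer."""
    count, value = 0, base
    while value < 2 ** bits:
        count += 1
        value *= base
    return count


print(shared)                                             # [15, 30, 45]
print(count_powers(2), count_powers(3), count_powers(5))  # 63 40 27
```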

                                  All in all, I heartily recommend solving this problem in a different way.

                                2. 1

                                  This will get more complex if you need to do replication with active:active(:active…), - which I’d argue is any site that has an uptime goal of anything above DILIGAF - as offsets are used to avoid simple conflicts in auto increments.

                                1. 7

                                  If you’re using integer IDs to identify data or objects, don’t start your IDs at 1. Start them at a huge number (e.g., 2^33) so that no object ID will ever appear in any other role in your application (like a count, a natural index, a byte size, a timestamp, etc).

                                  Passing around random integers and logic like “well it’s somewhere in the order of eight and a half billion, so it must be a user” sounds like a really fucking shitty way to write most programs - both in terms of making assumptions, and in terms of developer productivity.

                                  Ok, very memory sensitive, massively concurrent systems will see a noticeable operational benefit to passing around an integer, rather than an Object, but I’d wager that 99.9% of people will never work on such a project. Even if you don’t want to go full on Model Objects, at least use wrapped integers for your IDs (e.g. class UserID { public int $id }). Then your methods (or your global functions if that’s your kink) can at least typehint to require a UserID, so a hypothetical function get_friends(UserID $id): array will throw immediately if you pass in, say, a PhotoID, a GroupID, or any other random integer.
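For readers who want to see the wrapped-ID idea spelled out, here is a rough Python analogue of the PHP described above. `UserID`, `PhotoID` and `get_friends` are the hypothetical names from the comment; unlike PHP's type declarations, Python does not enforce annotations at runtime, so this sketch adds an explicit `isinstance` check to get the same fail-immediately behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserID:
    value: int


@dataclass(frozen=True)
class PhotoID:
    value: int


def get_friends(user_id: UserID) -> list:
    """Hypothetical lookup that refuses anything that is not a UserID."""
    # Annotations alone are not checked at runtime in Python, hence this guard.
    if not isinstance(user_id, UserID):
        raise TypeError(f"expected UserID, got {type(user_id).__name__}")
    return []  # placeholder: a real version would query storage here
```

Calling `get_friends(PhotoID(7))` or `get_friends(7)` raises `TypeError` regardless of how large or small the underlying integer is.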

                                  1. 7

                                    well it’s somewhere in the order of eight and a half billion, so it must be a user

                                    That’s not what they’re saying: What the article states is “if you put your user IDs outside the accidentally reachable number space, accidentally trying to parse a small number won’t hand out some user data”. At no point is that arrangement supposed to mean if x >= large_num { x_is_user = true; }

                                    1. 4

                                      Like I said: relying on an arbitrary integer being “outside accidentally reachable space” sounds like a fucking terrible idea, rather than just, you know, using the type system available to you, to say “hey we need a fucking User ID, not just any random integer”.

                                      You (the proverbial you not you specifically) may as well also propose using ranges of integers starting at each billion, for different object types, so you can do away with foreign keys in your RDBMS.

                                      Let me put this another way: if your codebase is written in such a way that you’re relying on user ID’s being some magically unique number, not appearing in any other form, to provide any semblance of security or privacy, you’ve already failed.

                                      1. 3

                                        It’s very poorly explained, but the idea is that you have a bunch of things indexed by some kind of numerical ID. If your language doesn’t give you nice unit types, it’s very easy to confuse integer-representing-a-Foo-ID and integer-representing-a-Bar-ID (and loop induction variable that was supposed to be an index into an array of Foo IDs, and many other things). If you start everything at 0, then an accidental type confusion in your program will probably still find a valid thing. If you start both at different, moderately large, random indexes, then type confusion will probably trigger some kind of thing-not-found error. This is much easier to find in testing: the observable failure is close to the bug.

                                        It’s not about segregating the types and guaranteeing that different numerical ranges refer to different types, it’s about finding the errors where you make the confusion.
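A small Python sketch of that failure mode, under assumed names (`Table`, `first_id`) and assumed starting values: when two tables both start counting at 1, passing a photo ID where a user ID was expected silently finds a user; when they start at different large values, the same bug raises a not-found error.

```python
import itertools


class Table:
    """Toy in-memory table whose IDs start at a configurable value."""

    def __init__(self, first_id):
        self._ids = itertools.count(first_id)
        self.rows = {}

    def insert(self, row):
        new_id = next(self._ids)
        self.rows[new_id] = row
        return new_id

    def get(self, row_id):
        return self.rows[row_id]  # KeyError if the ID does not exist


def confused_lookup(first_user_id, first_photo_id):
    """Simulate the bug: a photo ID is passed where a user ID was expected."""
    users, photos = Table(first_user_id), Table(first_photo_id)
    users.insert("alice")
    photo_id = photos.insert("cat.jpg")
    return users.get(photo_id)
```

`confused_lookup(1, 1)` returns `"alice"` without complaint, while `confused_lookup(2 ** 33, 2 ** 34)` raises `KeyError`, so the observable failure sits right next to the bug.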

                                        If you were doing this in a language like C++, you’d have a separate type for each of these IDs and mark the casts to and from ints as explicit, so you’d have to explicitly write something like: ProductID id(sessionid.as_integer()) and that would be likely to be picked up in code review. PHP doesn’t really help you here.

                                        1. 2

                                          PHP doesn’t really help you here.

                                          Like I said, if your scale is such that passing around actual model instances isn’t feasible (so already a pretty slim minority of the software world), a wrapper class with a single integer property is going to use minimal memory, and still lets you use types to only accept/return an object that is “known” to be a User ID, or Product ID or whatever.

                                          I’m not sure why you would think this won’t work in PHP, or pretty much any language that has even the most basic concept of classes.

                                          1. 3

                                            So you use this nice user type in your code but at some point your code is supposed to present a list of users (e.g. their facebook friends) to the user on the other side of the HTTP connection. For that, you read out the user data (name, picture, …) and present them, but you also need some identifier to put into the URL that is opened when they click on the friend’s link. Now what?

                                            Of course that data gets sanitized on input (and could be wrapped into a User ID object at that point) again, but still: there’s this number floating around in some shape or form. Doesn’t hurt to keep it outside the “normal” number space to avoid running into funny issues down the road (because, you know, coders make mistakes).

                                            I’ve seen similar advice to start such numbers at 2^53 if there’s a chance that double-by-default languages such as Javascript (or JSON parsers that try to be compliant) mess with them, just so developers see resulting issues immediately rather than at some distant point in time when nobody remembers what’s going on.
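The 2^53 boundary can be demonstrated directly, since Python's `float` is an IEEE 754 double like a JavaScript number. The helper name `survives_double` is made up for this sketch.

```python
LIMIT = 2 ** 53


def survives_double(n):
    """True if integer n comes back unchanged after a round trip through a double."""
    return int(float(n)) == n


print(survives_double(LIMIT - 1))  # True
print(survives_double(LIMIT))      # True
print(survives_double(LIMIT + 1))  # False: rounds to 2**53
```

An ID counter that starts around this value hits the first unrepresentable integer almost immediately, which is the point of the advice: the breakage shows up in early testing rather than years later.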

                                            It’s simply a very cheap defensive programming technique when you deal with something that carries an ID somewhere.

                                            1. 3

                                              Now what? Now you have some actual security. You lookup a user in a table, you check if the active session user has permissions to do whatever the link is supposed to do - neither of those is different logic if the underlying integer is 8 or 8 billion.

                                              Doesn’t hurt to keep it outside the normal search space to avoid running into funny issues down the road (because, you know, coders make mistakes).

                                              What the fuck is “normal search space”?

                                              No user input should be trusted. Your argument is proof of why this ridiculous theory is bad security theatre - there’s nothing to stop a client sending a request with 8 rather than 2^33 in the parameter that identifies the user. If your application is written well, that shouldn’t matter: regular security/privacy should prevent them from seeing/doing things they shouldn’t. At worst they should get an error message.

                                              Your suggestion implies that they might be able to do/see something they shouldn’t be able to, because e.g. the number they send happens to be the ID of a non-user object.

                                              If that is the case, you’re basically arguing in favour of security by obscurity. If that is not the case, then you’re arguing in favour of security theatre.

                                              So which is it?

                                              1. 4

                                                Your suggestion implies that they might be able to do/see something they shouldn’t be able to, because e.g. the number they send happens to be the ID of a non-user object.

                                                I’m arguing that some coder might factor out some user ID handling code into a piece of work that translates it into a plain integer type and write their functions around that. And then have some coder (maybe the same, but just as clueless) cast values when using those functions instead of fixing the mess, and again by mistake, they happen to cast an enum type (which typically cast into the 0..n range for small n). And this is all in a strongly-typed environment: The Phabricator folks use PHP and therefore I’d assume that they write their blog posts for a PHP-using audience, and PHP’s type system provides fewer guarantees (although they’re cleaning up their act, slowly).

                                                I’d rather have it explode on them then, than give reasonably-looking-at-a-glance data because the CEO thought it’s cool to have UID 1.

                                                As I wrote, coders make mistakes.

                                                Counter question: what irritates you so much about simply starting a counter at a large value that you exploded like that (see the expletives in the first post)? It’s a no-cost guard rail that is ideally never needed, but as it costs nothing, and might protect against stupid mistakes (even though it’s pretty weak), why bother?

                                                1. 2

                                                  “It’s a no-cost guard rail that is ideally never needed, but as it costs nothing, and might protect against stupid mistakes (even though it’s pretty weak), why bother?”

                                                  I think the problem is that, as stephenr says in his reply, it’s akin to security by obscurity. It’s convincing yourself (or your future self, or whoever looks at this system later) that all’s fine because these numbers are big and therefore we’ve solved the problem. By doing something that ‘might protect’ rather than something that will protect, the problem gets worse, as now we’re lulled into a false sense of security.

                                                  Yes, a system shouldn’t explode because someone wanted UID 1, but the system shouldn’t act like it’s fine for 4 years and then explode because we’ve been comparing UIDs and creation timestamps like that joke signpost that seems to be all over the world (population + height above sea level … total = …) and we’ve only just hit the timestamp and the UID where that mattered.

                                                  Typing is important and what the article advocates is an easy ‘solution’ that’s dangerous and is something software development should have moved past by now. UUIDs/GUIDs are now commonplace and absolutely appropriate for, well, unique identifiers. Namespacing prefixes work reasonably well (UID-123) but are prone to humans making up rules (‘I’ve only ever seen UIDs with 3 digits so therefore if someone tells me they have UID 1 that means I should write UID-001 and an ID of UID-1000 is invalid’).

                                                  1. 2

                                                    I would rather have it explode at them as soon as possible rather than when you reach 2^n users, by which time a fix might be much more difficult both to do and to trace.

                                                    1. 2

                                                      as it costs nothing, and might protect against stupid mistakes, why bother

                                                      Lots of things cost nothing and someone claims “might” do something. I’d rather just do something that does protect against the issue, and ignore the security theatre.

                                                      Edited, @vakradrz makes a good point.

                                                      1. 3

                                                        I agree with your argument (and have made my own reply to the parent comment) but please try to keep it civil, as while it’s an important topic and I’m sure we’ve seen disasters due to such designs, there’s still good intention here and education is more difficult with this kind of tone.

                                                        1. 2

                                                          You make a good point.

                                        2. 4

                                          For me, the most compelling reason to assign IDs as described in the article is so that you can grep your log files for those IDs and probably not get false positives. Even just starting your IDs at 256 means that you won’t get collisions with the components of IPv4 addresses. Starting at 10,000 means you won’t get collisions with the components of IPv6 addresses, starting at 32,769 means you won’t collide with PIDs (depending on how your system is configured), and so on. You can go whole-hog with this and use UUIDs for everything, and then you’re even less likely to have collisions, but that has its own drawbacks.

                                          That being said, I think it’s important to view this as a minor developer affordance and not as a substitute for a type system. If you find yourself doing this because you’re getting loop counters confused with entity IDs… I don’t think making the entity IDs larger is the right solution.

                                          1. 2

                                            … is it really that hard to have your logs reflect the type as well as the ID? How do you grep for anything that isn’t using an artificially large PK? I’d have thought user:123 was easier to get a valid result against than just 9993939191939393

                                            1. 2

                                              Sure, of course your logs should be clear about the meaning of each piece of information. But if you’re grepping through logs multiple times per day, every day, then not having to type user: each time starts to give you a nontrivial time savings—and, more significantly, it feels like there’s less friction in the process. I’m assuming that the numbers are going to be copied and pasted anyway, which means there isn’t a time difference between grepping for a shorter number or a longer one. And your approach depends on a higher level of consistency in writing log messages than I think is common at most places—if someone leaves out the user: in one particular message, you’re back at grep -w.

                                              1. 3

                                                This logic doesn’t make any sense to me, because in any non-trivial application, users are just one type of thing that you’d want to be able to identify.

                                                I don’t buy the idea that users are some unique thing you’d want to search for, but products, sales, payments, groups, etc etc - whatever the actual business items of the application are - are not equally as important. This is why none of the arguments presented make any sense to me: offsetting one type of object by 2^33 doesn’t solve the same supposed issue for all the other object types you have, so unless your application is so trivial you only have two object types: users and… something else, the offset mechanism is not a useful solution to any of the problems presented, IMO.

                                                1. 2

                                                  I think the idea is that you would offset users by 2^33 (for example), products by 2^34, sales by 2^35, and so on. Of course there are obvious problems with this scheme: if you have more than 2^34 products, the IDs for those are going to start overlapping with the IDs for sales. If you have more than around 30 entities in your system, you won’t be able to offset all of them like this and stay within 64 bits.

                                                  That’s why I think it’s vital to treat this scheme as just a developer affordance and not any kind of data integrity or type safety feature. Like I said, I think this is only really helpful for grepping logs… I think I may disagree with some of the other commenters on that point.

                                                  1. 1

                                                    Your comment actually made me go back and re-read the article. I think you’re right that they are talking about all objects, not just users specifically, but their reasoning seems to be essentially, what you alluded to before:

                                                    If you find yourself doing this because you’re getting loop counters confused with entity IDs… I don’t think making the entity IDs larger is the right solution.

                                                    In the example given, getting a list of users returns an associative array using integer IDs as the key, and boolean true as the value. They then proceed to call array_slice without setting the preserve_keys flag to true, and get back a 0-indexed array of boolean true.
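For anyone who has not run into this, here is a rough Python stand-in for the behaviour described. It is not PHP and not the article's code: `array_slice` below is a hand-written imitation that assumes integer keys only, and the `friends` data is invented. PHP's real `array_slice` renumbers integer keys from 0 unless `preserve_keys` is true.

```python
def array_slice(array, offset, length, preserve_keys=False):
    """Imitation of PHP's array_slice for a dict with integer keys only."""
    items = list(array.items())[offset:offset + length]
    if preserve_keys:
        return dict(items)
    # Default PHP behaviour for integer keys: renumber from 0.
    return {i: value for i, (_key, value) in enumerate(items)}


# Invented data: user IDs as keys, True as the value.
friends = {2 ** 33 + 5: True, 2 ** 33 + 9: True, 2 ** 33 + 12: True}

print(array_slice(friends, 0, 2))                      # {0: True, 1: True}
print(array_slice(friends, 0, 2, preserve_keys=True))  # keys kept intact
```

In the default case the user IDs are gone and the keys 0 and 1 look like plausible small IDs, which is the confusion the article's large starting value is meant to surface.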

                                                    I wouldn’t be surprised if this is a real world example from Facebook, given some of the absolutely garbage examples shown in the leaked dumps of the codebase from several years ago - but to use this ridiculous pattern as a reason for starting your object IDs at 34 billion, is beyond stupid.

                                                    1. 2

                                                      Agreed. I ended up re-reading the article too, and apparently the “larger IDs make searching logs easier” point was something I made up; the article didn’t say that. None of the reasons they give for making the IDs larger seem like sound engineering to me.

                                          2. 2

                                            This isn’t meant to be a primary way to distinguish valid IDs from random numbers, but as a defense in depth in case you screw up your code.

                                            Most likely you’re going to need to work with SQL, JSON, URLs, and other places where you’ll have to put an untyped number. Newtype in the language doesn’t help in cases like this:

                                            get_friends(new UserID($_GET['photo_id']))
                                            
                                            1. 2

                                              So what happens when someone changes your URL from ?uid=9809890809809890 to ?uid=9?

                                              SQL of all places is a ridiculous example. Are you searching every table looking for a PK match?

                                              1. 3

                                                I think you’re still reframing this as if it was meant to be a security measure or some kind of bullet-proof protection. It’s not. It’s a “lint” that may help catch a programmer’s error. It is not intended to catch nor detect any outside interference.

                                                I’ve chosen SQL, because SQL doesn’t accept PHP types as arguments (unless you implement a very fancy type-safe ORM, I guess?). There could be mistakes like misaligning ? placeholders and their values, or selecting columns in a wrong order. Even when you only use named placeholders and fetch rows as assoc arrays, if you join multiple tables with an id column, you might accidentally pick the wrong one. Bugs can happen. The trick is about making such bugs fail louder, sooner.
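
                                                A minimal sketch of the "wrong column order" case, in Python with sqlite3 rather than PHP. The schema, the table names, and the `user_id_base` offset are all made up for illustration; the only point is that the same bug returns a plausible wrong user when both ID ranges start near 1, and returns nothing when user IDs start far away from photo IDs.

```python
import sqlite3


def make_db(user_id_base):
    """Build a throwaway in-memory database; schema and names are hypothetical."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, owner_id INTEGER)")
    db.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(user_id_base + i, f"user{i}") for i in (1, 2, 3)],
    )
    # Photo IDs are small (2, 3, 4); every photo belongs to user1.
    db.executemany(
        "INSERT INTO photos VALUES (?, ?)",
        [(i + 1, user_id_base + 1) for i in (1, 2, 3)],
    )
    return db


def owner_name_buggy(db):
    """Selects (id, owner_id) but unpacks the two columns in the wrong order."""
    owner_id, photo_id = db.execute(
        "SELECT id, owner_id FROM photos ORDER BY id LIMIT 1"
    ).fetchone()
    row = db.execute("SELECT name FROM users WHERE id = ?", (owner_id,)).fetchone()
    return row[0] if row else None


# Overlapping ID ranges: the bug silently returns the wrong user.
small_ids = owner_name_buggy(make_db(user_id_base=0))
# Disjoint ID ranges: the same bug finds no user at all, which is harder to miss.
big_ids = owner_name_buggy(make_db(user_id_base=34_000_000_000))
```

                                                With overlapping ranges the photo ID happens to be a valid user ID, so the lookup "works"; with the large offset it cannot match anything.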

                                                1. 2

                                                  The trick is about making such bugs fail louder, sooner.

                                                  If your developers don’t notice that their piece of code is returning the wrong user, I honestly don’t think they’ll notice that it’s not returning any user, because they’re clearly not testing what they write, even in the most basic of “I tried this once on my local machine” sense.

                                            2. 1

                                              Yeah this is weird. If I am doing ‘get friends’ I want a list of friends, not integers.

                                            1. 17

                                              TLDR:

                                              1. Start with big IDs so you don’t run into weird PHP behaviors.
                                              2. Use UTF-8, always.
                                              3. Allow-list system with deny by default for anything related to security.
                                              4. Monitor your SQL queries.
                                              1. 5

                                                The one time I read the article before the comments…

                                                1. 2

                                                  Monitor your SQL queries.

                                                  The idea that SQL queries are a special kind of error is really weird. You should be reporting all your errors that aren’t caused by bad user input or network hiccups.

                                                  The allow-list bit is a good call but the rest seems oddly language-specific. Starting with big IDs implies you’re using integers, which is not a great idea to begin with; stick with UUIDs and the whole problem isn’t even possible.

                                                  1. 3

                                                    The idea that SQL queries are a special kind of error is really weird.

                                                    Because SQL syntax errors indicate injection bugs, and SQL injection bugs can be exploited to accomplish literally anything. The OP isn’t talking about queries that are supposed to return exactly one result returning more than one, which are also bugs, but probably not exploitable to completely pwn the database.
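
                                                    To make that concrete, here is a small sketch in Python with sqlite3 (not the article's PHP stack; the table and the function names are hypothetical). A perfectly innocent input containa quote makes the string-concatenated query fail with a syntax error, which is exactly the signal worth alerting on, while the parameterized version handles the same input fine.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (?, ?)", (1, "O'Brien"))


def find_unsafe(name):
    # Vulnerable on purpose: user input is pasted into the SQL text.
    return db.execute(f"SELECT id FROM users WHERE name = '{name}'").fetchall()


def find_safe(name):
    # Parameterized: the driver passes the value separately from the SQL text.
    return db.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()


try:
    find_unsafe("O'Brien")
    syntax_error = None
except sqlite3.OperationalError as exc:
    # In production this is the event you would report and investigate:
    # input was able to change the shape of the query.
    syntax_error = str(exc)

safe_result = find_safe("O'Brien")
```

                                                    If a stray apostrophe can break the query, a deliberately crafted input can rewrite it.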

                                                    1. 1

                                                      The OP isn’t talking about queries that are supposed to return exactly one result returning more than one, which are also bugs […]

                                                      If it’s a bug … why are you ignoring it? You should … fix it instead?

                                                      1. 2

                                                        If a bug exists but nobody discovers it, is it truly a bug?

                                                        1. 2

                                                          If a bug exists and all your users find out about it before you do, are you really a professional?

                                                          1. 1

                                                            Precisely. I’m sure I don’t need to tell a fellow professional how this kind of disparate-parity bug can quietly develop over time.

                                                        2. 2

                                                          Because I need to fix the highest severity bugs first, and SQL injections are much higher severity.

                                                      2. 2

                                                        What’s wrong with using integers as keys? If you want to make sure that two integers with different semantic meanings are never confused for each other, you should be relying on a type system, not the improbability of UUID collisions. (Unless I’ve mistaken your point?)

                                                        1. 2

                                                          Integers are enumerable, while (some kinds of) UUIDs are not.

                                                          1. 1

                                                            What’s the problem with enumerability—that someone might be able to guess a valid ID?

                                                            Which kinds of UUIDs aren’t enumerable?

                                                            1. 3

                                                              You can technically enumerate UUIDs, but considering how many there are (quite a few) and how they are generated (randomly for version 4 UUIDs), it may take you some time.
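
                                                              Some back-of-the-envelope arithmetic for "some time", as a Python sketch. A version 4 UUID has 122 random bits (128 minus 4 version bits and 2 variant bits); the guess rate passed in is an arbitrary assumption, not a measurement of any real attacker.

```python
import uuid

RANDOM_BITS = 122  # 128 bits minus 4 version bits and 2 variant bits
ID_SPACE = 2 ** RANDOM_BITS
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_enumerate(guesses_per_second):
    """Years needed to walk the whole version 4 space at a given guess rate."""
    return ID_SPACE / guesses_per_second / SECONDS_PER_YEAR


sample = uuid.uuid4()  # randomly generated, so not predictable from its neighbours
years_at_a_billion_per_second = years_to_enumerate(1e9)
```

                                                              Even at a billion guesses per second the full walk is on the order of 10^20 years, whereas sequential integers can be walked in whatever time it takes to count up to the newest ID.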

                                                    1. 6

                                                      I really like the writing. Hope to see more from you!

                                                      I’ve always (i.e. the couple of times I’ve played with x86 asm) preferred the Intel syntax and wondered why GNU Assembler defaults to this weird backwards syntax. It’s good to have the opinions of someone more experienced to lean on.

                                                      The great hacker tradition of voicing strong opinions via long, thorough rants has been waning in the last decade(s?). This is a pity, because learning from strong opinions seems a prerequisite of well thought out moderate opinions.

                                                      1. 2

                                                        I really like the writing. Hope to see more from you!

                                                        Agreed! I hope OP will consider adding an RSS/Atom/JSON feed for their blog.

                                                      1. 20

                                                        Python package maintainers rarely use semantic versioning and often break backwards compatibility in minor releases. One of several reasons that dependency management is a nightmare in Python world.

                                                        1. 18

                                                          I generally consider semantic versioning to be a well-intentioned falsehood. I don’t think that package vendors can have effective insight into which of their changes break compatibility when they can’t have a full bottom-up consumer graph for everyone who uses it.

                                                          I don’t think that Python gets this any worse than any other language.

                                                          1. 20

                                                            I’ve heard this opinion expressed before… I find it to be either dangerously naive or outright dishonest. There’s a world of difference between a) the rare bug fix release or nominally-orthogonal-feature-add release that unintentionally breaks downstream code and b) intentionally changing and deprecating APIs in “minor” releases.

                                                            In my view, adopting SemVer is a statement of values and intention. It communicates that you value backwards compatibility and intend to maintain it as much as is reasonably possible, and that you will only knowingly break backwards compatibility on major release increments.

                                                            1. 18

                                                              In my view, adopting SemVer is a statement of values and intention. It communicates that you value backwards compatibility and intend to maintain it as much as is reasonably possible, and that you will only knowingly break backwards compatibility on major release increments.

                                                              A “statement of values and intention” carries no binding commitment. And the fact that you have to hedge with “as much as is reasonably possible” and “only knowingly break” kind of gives away what the real problem is: every change potentially alters the observable behavior of the software in a way that will break someone’s reliance on the previous behavior, and therefore the only way to truly follow SemVer is to increment major on every commit. Which is the same as declaring the version number to be meaningless, since if every change is a compatibility break, there’s no useful information to be gleaned from seeing the version number increment.

                                                              And that’s without getting into some of my own direct experience. For example, I’ve been on the Django security team for many years, and from time to time someone has found a security issue in Django that cannot be fixed in a backwards-compatible way. Thankfully fewer of those in recent years since many of them related to weird old functionality dating to Django’s days as a newspaper CMS, but they do happen. Anyway, SemVer’s answer to this is “then either don’t fix it, or do but no matter how you fix it you’ve broken SemVer and people on the internet will scream at you and tell you that you ought to be following SemVer”. Not being a fan of no-win situations, I am content that Django has never and likely never will commit to following SemVer.

                                                              1. 31

                                                                A “statement of values and intention” carries no binding commitment.

                                                                A label on a jar carries no binding commitment to the contents of the jar. I still appreciate that my salt and sugar are labelled differently.

                                                                1. 2

                                                                  Selling the jar with that label on it in many countries is a binding commitment and puts you under the coverage of food safety laws, though.

                                                                2. 6

                                                                  Anyway, SemVer’s answer to this is “then either don’t fix it, or do but no matter how you fix it you’ve broken SemVer and people on the internet will scream at you and tell you that you ought to be following SemVer”.

                                                                  What do you mean? SemVer’s answer to “this bug can’t be fixed in a backwards-compatible way” is to increment the major version to indicate a breaking change. You probably also want to get the message across to your users by pushing a new release of the old major version which prints some noisy “this version of blah is deprecated and has security issues” messages to the logs.

                                                                  It’s not perfect, I’m not saying SemVer is a silver bullet. I’m especially worried about the effects of basing automated tooling on the assumption that no package would ever push a minor or patch release with a breaking change; it seems to cause ecosystems like the NPM to be highly fragile. But when taken as a statement of intent rather than a guarantee, I think SemVer has value, and I don’t understand why you think your security issue anecdote requires breaking SemVer.

                                                                  1. 7

                                                                    What do you mean? SemVer’s answer to “this bug can’t be fixed in a backwards-compatible way” is to increment the major version to indicate a breaking change.

                                                                    So, let’s consider Django, because I know that well (as mentioned above). Typically Django does a feature release (minor version bump) every 8 months or so, and every third one bumps the major version and completes a deprecation cycle. So right now Django 3.1 is the latest release; next will be 3.2 (every X.2 is an LTS), then 4.0.

                                                                    And the support matrix consists of the most recent feature release (full bugfix and security support), the one before that (security support only), and usually one LTS (but there’s a period at the end of each where two of them overlap). The policy is that if you run on a given LTS with no deprecation warnings issued from your code, you’re good to upgrade to the next (which will be a major version bump; for example, if you’re on 2.2 LTS right now, your next LTS will be 3.2).

                                                                    But… what happens when a bug is found in an LTS that can’t be fixed in a backwards-compatible way? Especially a security issue? “Support for that LTS is cut off effective immediately, everybody upgrade across a major version right now” is a non-starter, but is what you propose as the correct answer. The only option is to break SemVer and do the backwards-incompatible change as a bugfix release of the LTS. Which then leads to “why don’t you follow SemVer” complaints. Well, because following SemVer would actually be worse for users than this option is.

                                                                    1. 3

                                                                      But… what happens when a bug is found in an LTS that can’t be fixed in a backwards-compatible way?

                                                                      Why do people run an LTS version, if not for being able to avoid worrying about it as a dependency? If you’re making incompatible changes: forget about semver, you’re breaking the LTS contract, and you may as well drop the LTS tag and tell people to run the latest.

                                                                      1. 1

                                                                        you may as well drop the LTS tag and tell people to run the latest

                                                                        I can think of only a couple instances in the history of Django where it happened that a security issue couldn’t be fixed in a completely backwards-compatible way. Minimizing the breakage for people – by shipping the fix into supported releases – was the best available option. It’s also completely incompatible with SemVer, and is a great example of why SemVer is at best a “nice in theory, fails in practice” idea.

                                                                        1. 3

                                                                          Why not just tell them to upgrade? After all, your argument is essentially that stable APIs are impossible, so why bother with LTS? Every argument against semver also applies against LTS releases.

                                                                          1. 3

                                                                            After all, your argument is essentially that stable APIs are impossible

                                                                            My argument is that absolute perfect 100% binding commitment to never causing a change to observable behavior ever under any circumstance, unless also incrementing the major version at the same time and immediately dropping support for all users of previous versions, is not practicable in the real world, but is what SemVer requires. Not committing to SemVer gives flexibility to do things like long-term support releases, and generally people have been quite happy with them and also accepting of the single-digit number of times something had to change to fix a security issue.

                                                                      2. 2

                                                                        “Support for that LTS is cut off effective immediately, everybody upgrade across a major version right now” is a non-starter

                                                                        If it’s a non-starter then nobody should be getting the critical security patch. You’re upgrading from 2.2 to 3.0 and calling it 2.2.1 instead. That doesn’t change the fact that a breaking change happened and you didn’t bump the major version number.

                                                                        You can’t issue promises like “2.2.X will have long term support” because that’s akin to knowing the future. Use a codename or something.

                                                                        1. 7

                                                                          It’s pretty clear you’re committed to perfect technical adherence to a rule, without really giving consideration to why the rule exists. Especially if you’re at the point of “don’t commit to supporting things, because supporting things leads to breaking SemVer”.

                                                                          1. 4

                                                                            They should probably use something like SemVer but with four parts, e.g. Feature.Major.Minor.Patch

                                                                            • Feature version changes -> We’ve made significant changes / a new release (considered breaking)
                                                                            • Major version change -> We’ve made breaking changes
                                                                            • Minor version change -> Non breaking new features
                                                                            • Patch version change -> Other non-breaking changes

                                                                            That way 2.*.*.* could be an LTS release, which would only get bug fixes, but if there was an unavoidable breaking change to fix a bug, you’d signal this in the version by e.g. going from 2.0.5.12 to 2.1.0.0. Users will have to deal with the breaking changes required to fix the bug, but they don’t have to deal with all the other major changes which have gone into the next ‘Feature’ release, 3.*.*.*. The promise that 2.*.*.*, as an LTS, will get bug fixes is honored. The promise that the major version must change on a breaking change is also honored.

                                                                            SemVer doesn’t work if you try to imbue the numbers with additional meanings that can contradict the SemVer meanings.
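
                                                                            A rough Python sketch of how tooling could read such a four-part number. The `Version` type and both helper functions are hypothetical names for illustration; this is one possible reading of the scheme above, not an existing standard or library.

```python
from typing import NamedTuple


class Version(NamedTuple):
    feature: int
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text):
        parts = [int(p) for p in text.split(".")]
        if len(parts) != 4:
            raise ValueError(f"expected Feature.Major.Minor.Patch, got {text!r}")
        return cls(*parts)


def is_breaking(old, new):
    """A change in either of the first two parts signals a breaking change."""
    return (new.feature, new.major) != (old.feature, old.major)


def same_lts_line(old, new):
    """Releases stay on the same LTS line while the feature number is unchanged."""
    return old.feature == new.feature


before = Version.parse("2.0.5.12")
security_fix = Version.parse("2.1.0.0")   # unavoidable breaking fix within the LTS
next_feature = Version.parse("3.0.0.0")   # the next big release
```

                                                                            Here `2.0.5.12` to `2.1.0.0` is flagged as breaking but stays on the 2.x LTS line, while `3.0.0.0` is breaking and leaves it, so both promises can be checked independently.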

                                                                            1. 3

                                                                              This scheme is very similar to Haskell’s Package Versioning Policy (PVP).

                                                                            2. 1

                                                                              I’m saying supporting things and adhering to SemVer should be orthogonal.

                                                                      3. 5

                                                                        every change potentially alters the observable behavior of the software

                                                                        This is trivially false. Adding a new helper function to a module, for example, will never break backwards compatibility.

                                                                        In contrast, changing a function’s input or output type is always a breaking change.

                                                                        By failing to even attempt to distinguish between non-breaking and breaking changes, you’re offloading work onto the package’s users.

                                                                        Optimize for what should be the common case: non-breaking changes.

                                                                        Edit: to expand on this, examples abound in the Python ecosystem of unnecessary and intentional breaking changes in “minor” releases. Take a look at the numpy release notes for plenty of examples.

                                                                        1. 7

                                                                          Python’s dynamic nature makes “adding a helper function” a potentially breaking change. What if someone was querying, say, all definitions of a module and relying on the length somehow? I know this is a bit of a stretch, but it is possible that such a change would break code. I still value semver though.
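
                                                                          A deliberately contrived sketch of that stretch. The module contents and the `fragile_consumer` function are invented for illustration; the modules are built in memory so the snippet is self-contained.

```python
import types


def make_module(source):
    """Create an in-memory module from source text (stand-in for a real package)."""
    module = types.ModuleType("example_lib")
    exec(source, module.__dict__)
    return module


def public_names(module):
    return sorted(name for name in vars(module) if not name.startswith("_"))


V1_SOURCE = "def area(w, h):\n    return w * h\n"
# The "minor release": same code plus one new helper function.
V2_SOURCE = V1_SOURCE + "def perimeter(w, h):\n    return 2 * (w + h)\n"

lib_v1 = make_module(V1_SOURCE)
lib_v2 = make_module(V2_SOURCE)


def fragile_consumer(module):
    # Relies on the module exposing exactly one public definition.
    (only_name,) = public_names(module)
    return getattr(module, only_name)(2, 3)


result_v1 = fragile_consumer(lib_v1)
try:
    fragile_consumer(lib_v2)
    broke_on_v2 = False
except ValueError:
    # Unpacking fails once a second public name exists.
    broke_on_v2 = True
```

                                                                          Nothing about `area` changed between the two versions, yet the consumer breaks, purely because it depended on something introspection made visible.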

                                                                          1. 3

                                                                            The number of definitions in a module is not a public API. SemVer only applies to public APIs.

                                                                            1. 4

                                                                              If you can access it at run-time, then someone will depend on it, and it’s a bit late to call it “not public”. Blame Python for exposing stuff like the call stack to introspection.

                                                                              1. 2

                                                                                Eh no? SemVer is very clear about this. Public API is whatever software declares it to be. Undeclared things can’t be public API, by definition.

                                                                                1. 7

                                                                                  Python has no concept of public vs private. It’s all there all the time. As they say in Python land, “We’re all consenting adults here”.

                                                                                  I’m sure, by the way, when Hettinger coined that phrase he didn’t purposely leave out those under the age of 18. Language is hard. :P

                                                                          2. 1

                                                                            Adding a new helper function to a module, for example, will never break backwards compatibility.

                                                                            Does this comic (xkcd’s “Workflow”, the one about spacebar heating) describe a violation of SemVer?

                                                                            You seriously never know what kinds of things people might be relying on, and a mere definition of compatibility in terms of input and output types is woefully insufficient to capture the things people will expect in terms of backwards compatibility.

                                                                            1. 6

                                                                              No, it does not describe a violation of SemVer, because spacebar heating is not a public API. SemVer is very clear about this. You are right that people will still complain about backward compatibility even if you are keeping 100% correct SemVer.

                                                                        2. 6

                                                                          I would agree if violations were rare. Every time I’ve tried to solve dependency issues on Python, about 75% of the packages I look into have broken semver on some level. Granted, I probably have a biased sampling technique, but I find it extremely hard to believe that it’s a rare issue.

                                                                          Backwards compatibility is hard to reason about, and the skill is by no means pervasive. Even having a lot of experience looking for compatibility breaks, I still let things slip, because it can be hard to detect. One of my gripes with semver is that it doesn’t scale. It assumes that tens of thousands of open source devs with no common training program or management structure all understand what a backwards breaking change is, and how to fix it.

                                                                          Testing for compatibility breaks is rare. I can’t think of any Python frameworks that help here. Nor can I think of any other languages that address this (Erlang might, but I haven’t worked with it first-hand). The most likely projects to test for compatibility between releases are those that manage data on disk or network packets. Even among those, many rely on code & design review to spot issues.

                                                                          It communicates that you value backwards compatibility and intend to maintain it as much as is reasonably possible, and that you will only knowingly break backwards compatibility on major release increments.

                                                                          It’s more likely that current package managers force you into semver regardless of whether you understand how it’s supposed to be used. The “statement of values” angle is appealing, but without much evidence. Semver is merely popular.

                                                                          1. 7

                                                                            I guess this depends on a specific ecosystem? Rust projects use a lot of dependencies, all those deps use semver, and, in practice, issues rarely arise. This I think is a combination of:

                                                                            • the fact that semver is the only option in Rust
                                                                            • the combination of the guideline to not commit Cargo.lock for libraries + cargo picking maximal versions by default. This way, accidental incompatibilities are quickly discovered & packages are yanked.
                                                                            • the guideline to commit Cargo.lock for binaries and otherwise final artifacts: that way folks who use Rust and who have the most deps are shielded from incompatible updates.
                                                                            • the fact that “library” is a first-class language construct (crate) and not merely a package manager convention + associated visibility rules makes it easier to distinguish between public & private API.
                                                                            • built-in support for writing tests from the outside, as if you are a consumer of the library, which also catches semver-incompatible changes.

                                                                            This is not to say that semver issues do not happen, just that they are rare enough. I’ve worked with Rust projects with 200-500 different deps, and didn’t perceive semver breakage as being a problem.

                                                                            1. 5

                                                                              I would add that the Rust type system is expressive enough that many backwards incompatible changes require type signature changes which are much more obvious than violations of some implicit contract.

                                                                          2. 6

                                                                            I don’t think I have a naïve view of versioning; putting on my professional hat here, I have a decade of experience dealing with a dependency modeling system that handles the versions of hundreds of thousands of interrelated software artifacts that are versioned more or less independently of each other, across dozens of programming languages and runtimes. So… some experience here.

                                                                            In all of this time, I’ve seen every single kind of breaking change I could imagine beforehand, and many I could not. They occurred independent of how the vendor of the code thought of it; a vendor of a versioned library might think that their change is minor, or even just a non-impacting patch, but outside of pure README changes, it turns out that they can definitely be wrong. They certainly had good intentions to communicate the nature of the change, but that intention can run hard into reality. In the end, the only way to be sure is to pin your dependencies, all the way down, and to test assiduously. And then upgrade them frequently, intentionally, and on a cadence that you can manage.

                                                                            1. 1

                                                                              I don’t think I have a naïve view of versioning; putting on my professional hat here, I have a decade of experience dealing with …

                                                                              Hear, hear. My experience isn’t exactly like @offby1’s but I can vouch for the rest.

                                                                            2. 4

                                                                              to be either dangerously naive or outright dishonest

                                                                              This phrase gets bandied around the internet so much I’m surprised it’s not a meme.

                                                                              SemVer is … okay, but you make it sound like lives depend on it. There’s a lot of software running mission critical systems without using SemVer and people aren’t dying every day because of it. I think we can calm down.

                                                                          3. 3

                                                                            That’s the problem with the package management being so old. Back then semantic versioning wasn’t that common, and it never really caught on. In my opinion the PyPA should make a push to get more packages to use semantic versioning. I’m seeing this trend already, but it’s too slow…

                                                                          1. 35

                                                                            E-mail has a lot of legacy cruft. Regardless of the technical merits of e-mail or Telegram or Delta Chat, Signal, matrix.org or whatever, what people need to be hearing today is “WhatsApp and Facebook Messenger are unnecessarily invasive. Everyone is moving to X.” If there isn’t a clear message on what X is, then people will just keep on using WhatsApp and Facebook Messenger.

                                                                            It seems clear to me that e-mail is not the frontrunner for X, so by presenting it as a candidate for replacing WhatsApp and Facebook Messenger, I think the author is actually decreasing the likelihood that most people will migrate to a better messaging platform.

                                                                            My vote is for Signal. It has good clients for Android and iOS and it’s secure. It’s also simple enough that non-technical people can use it comfortably.

                                                                            1. 26

                                                                              Signal is a silo and I dislike silos. That’s why I post on my blog instead of Twitter. What happens when someone buys Signal, the US government forces Signal to implement backdoors or Signal runs out of donation money?

                                                                              1. 10

                                                                                Signal isn’t perfect. My point is that Signal is better than WhatsApp and that presenting many alternatives to WhatsApp is harmful to Signal adoption. If Signal can’t reach critical mass like WhatsApp has it will fizzle out and we will be using WhatsApp again.

                                                                                1. 12

                                                                                  If Signal can’t reach critical mass like WhatsApp has it will fizzle out

                                                                                  Great! We don’t need more silos.

                                                                                  and we will be using WhatsApp again.

                                                                                  What about XMPP or Matrix? They can (and should!) be improved so that they are viable alternatives.

                                                                                  1. 13

                                                                                    (The majority of) people don’t care about the technology (how); they care about the goal (why).

                                                                                    They don’t care if it’s Facebook, Whatsapp, Signal, Email, XMPP, they want to communicate.

                                                                                    1. 14

                                                                                      Yeah, I think the point of the previous poster was that these systems should be improved to a point where they’re just really good alternatives, which includes branding and the like. Element (formerly riot.im) has the right idea on this IMHO, instead of talking about all sorts of tech details and presenting 500 clients like xmpp.org, it just says “here are the features element has, here’s how you can use it”.

                                                                                      Of course, die-hard decentralisation advocates don’t like this. But this is pretty much the only way you will get any serious mainstream adoption as far as I can see. Certainly none of the other approaches that have been tried over the last ~15 years worked.

                                                                                      1. 7

                                                                                        …instead of talking about all sorts of tech details and presenting 500 clients like xmpp.org, it just says “here are the features element has, here’s how you can use it”.

                                                                                        Same problem with all the decentralized social networks and microblogging services. I was on Mastodon for a bit. I didn’t log in very often because I only followed a handful of privacy advocate types since none of my friends or other random people I followed on Twitter were on it. It was fine, though. But then they shut down the server I was on and apparently I missed whatever notification was sent out.

                                                                                        People always say crap like “What will you do if Twitter shuts down?”. Well, so far 100% of the federated / distributed social networks I’ve tried (I also tried that Facebook clone from way back when and then Identi.ca at some point) have shut down in one way or another and none of the conventional ones I’ve used have done so. I realize it’s a potential problem, but in my experience it just doesn’t matter.

                                                                                        1. 4

                                                                                          The main feature that cannot be listed in good faith and which is the one that everybody cares about is: “It has all my friends and family on it”.

                                                                                          I know it’s just a matter of critical mass and if nobody switches this will never happen.

                                                                                        2. 1

                                                                                          Sure, but we’re not the majority of people.. and we shouldn’t be choosing yet another silo to promote.

                                                                                        3. 5

                                                                                          XMPP and (to a lesser extent) Matrix do need to be improved before they are viable alternatives, though. Signal is already there. You may feel that ideological advantages make up for the UI shortcomings, but very few nontechnical users feel the same way.

                                                                                          1. 1

                                                                                            Have you tried joining a busy Matrix channel from a federated homeserver? It can take an hour. I think it needs some improvement too.

                                                                                            1. 2

                                                                                              Oh, definitely. At least in the case of Matrix it’s clear that (1) the developers regard usability as an actual goal, (2) they know their usability could be improved, and (3) they’re working on improving it. I admit I don’t follow the XMPP ecosystem as closely, so the same could be true there, but… XMPP has been around for 20 years, so what’s going to change now to make it more approachable?

                                                                                          2. 4

                                                                                            […] it will fizzle out

                                                                                            Great! We don’t need more silos.

                                                                                            Do you realize you’re cheering for keeping the WhatsApp silo?

                                                                                            Chat platforms have a strong network effect. We’re going to be stuck with Facebook’s network for as long as other networks are fragmented due to people disagreeing which one is the perfect one to end all other ones, and keep waiting for a pie in the sky, while all of them keep failing to reach the critical mass.

                                                                                            1. 1

                                                                                              Do you realize you’re cheering for keeping the WhatsApp silo?

                                                                                              Uh, not sure how you pulled that out of what I said, but I’m actually cheering for the downfall of all silos.

                                                                                              1. 2

                                                                                                I mean that by opposing the shift to the less-bad silo you’re not actually advancing the no-silo case, but keeping the status quo of the worst-silo.

                                                                                                There is currently no decentralized option that is secure, practical, and popular enough to be adopted by mainstream consumers in numbers that could beat WhatsApp.

                                                                                                If the choice is between WhatsApp and “just wait until we make one that is”, it means keeping WhatsApp.

                                                                                            2. 3

                                                                                              They can be improved so that they are viable alternatives.

                                                                                              Debatable.

                                                                                              Great! We don’t need more silos.

                                                                                              Domain-name federation is a half-assed solution to data portability. Domain names basically need to be backed by always-on servers, not everybody can have one, and not everybody should. Either make it really P2P (Scuttlebutt?) or don’t bother.

                                                                                              1. 2

                                                                                                I sadly agree, which is why logically I always end up recommending Signal as ‘the best of a bad bunch’.

                                                                                                I like XMPP, but for true silo-avoidance you need to run your own server (or at least have someone run it under your domain, so you can move away). This sucks. It’s sort of the same with Matrix.

                                                                                                The only way around this is real p2p as you say. So far I haven’t seen anything that I could recommend to former whatsapp users on this front however. I love scuttlebutt but I can’t see it as a good mobile solution.

                                                                                            3. 8

                                                                                              Signal really needs a “web.signal.com”; typing on phones sucks, and the desktop app is ugh. I can’t write my own app either so I’m stuck with two bad options.

                                                                                              This is actually a big reason I like Telegram: the web client is pretty good.

                                                                                              1. 3

                                                                                                I can’t write my own app either so I’m stuck with two bad options.

                                                                                                FWIW I’m involved with Whisperfish, the Signal client for Sailfish OS. There has been a constant worry about 3rd party clients, but it does seem like OWS has loosened its policy.

                                                                                                The current Whisperfish is written in Rust, with separate libraries for the protocol and service. OWS is also putting work into their own Rust library, which we may switch to.

                                                                                                Technically you can, and the risk should be quite minimal. At the end of the day, as OWS doesn’t support these efforts, and if you don’t make a fool of them, availability and use increase their brand value.

                                                                                                Don’t want to know what happens if someone writes a horrible client and steps on their brand, so let’s be careful out there.

                                                                                                1. 2

                                                                                                  Oh right; that’s good to know. I just searched for “Signal API” a while ago and nothing really obvious turned up so I assumed it’s either impossible or hard/hackish. To be honest I didn’t look very deeply at it, since I don’t really care all that much about Signal 😅 It’s just a single not-very-active chat group.

                                                                                                  1. 1

                                                                                                    Fair enough, sure. An API might sound too much like some raw web thing - it is based on HTTPS after all - but I don’t think all of it would be that simple ;)

                                                                                                    The work gone into the libraries has not been trivial, so if you do ever find yourself caring, I hope it’ll be a happy surprise!

                                                                                                2. 2

                                                                                                  The Telegram desktop client is even better than the web client.

                                                                                                  1. 3

                                                                                                    I don’t like desktop clients.

                                                                                                    1. 4

                                                                                                      Is there a specific reason why? The desktop version of Telegram is butter smooth and has the same capabilities as the phone version (I’m pretty sure they’re built from the same source as well).

                                                                                                      1. 3

                                                                                                        Security is the biggest reason for me. Every other week, you hear about a fiasco where a desktop client for some communication service had some sort of remote code execution vulnerability. But there can be other reasons as well, like them being sloppy with their .deb packages and messing up with my update manager etc. As a potential user, I see no benefit in installing a desktop client over a web client.

                                                                                                        1. 4

                                                                                                          Security is the reason that you can’t easily have a web-based Signal client. Signal is end-to-end encrypted. In a web app, it’s impossible to isolate the keying material from whoever provides the service so it would be trivial for Signal to intercept all of your messages (even if they did the decryption client-side, they could push an update that uploads the plaintext after decryption).

                                                                                                          It also makes targeted attacks trivial: with the mobile and desktop apps, it’s possible to publish the hash that you get for the download and compare it against the versions other people run, so that you can see if you’re running a malicious version (I hope a future version of Signal will integrate that and use it to validate updates before it installs them by checking that other users in your network see the same series of updates). With a web app, you have no way of verifying that you’re running the same code that you were one page refresh ago, let alone the same code as someone else.
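To make the hash comparison concrete, here is a minimal sketch, assuming the publisher posts a SHA-256 hex digest somewhere you can fetch out of band. The function names `sha256_of_file` and `matches_published_hash` are made up for illustration; this is not how Signal itself validates updates.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in fixed-size chunks so large installers need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare a local download against a digest obtained out of band.

    `published_hex` is assumed to be a hex string, possibly with
    surrounding whitespace or upper-case letters.
    """
    return sha256_of_file(path) == published_hex.strip().lower()
```

The check is only as good as the channel the published digest came from: if the same server hands you both the download and the hash, a match proves very little. A web app cannot offer even this much, since there is no stable artifact to hash between page loads.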

                                                                                                          1. 1

                                                                                                            A web-based client has no advantages with regard to security. They are discrete topics. As a web developer, I would argue that a web-based client has a significantly larger surface area for attacks.

                                                                                                            1. 1

                                                                                                              When I say security, I don’t mean the security of my communications over that particular application. That’s important too, but it’s nothing compared to my personal computer getting hacked, which means my entire digital life getting compromised. Now you could say a web site could also hijack my entire computer by exploiting weaknesses in the browser, which is definitely a possibility, but that’s not what we hear every other week. We hear stupid zoom or slack desktop client containing a critical remote code execution vulnerability that allows a completely unrelated third party complete access to your computer.

                                                                                                          2. 1

                                                                                                            I just don’t like opening a new window/application. Almost all of my work is done with one terminal window (in tmux, on workspace 1) and a browser (workspace 2). This works very well for me as I hate dealing with window management. Obviously I do open other applications for specific purposes (GIMP, Geeqie, etc) but I find having an extra window just to chat occasionally is annoying. Much easier to open a tab in my browser, send my message, and close it again.

                                                                                                  2. 3

                                                                                                    The same thing that’s happening now with whatsapp - users move.

                                                                                                    1. 2

                                                                                                      A fraction of users is moving, the technically literate ones. Everyone else stays where their contacts are or, as is often the case, installs another messenger and then uses n+1.

                                                                                                      1. 2

                                                                                                        A fraction of users is moving, the technically literate ones

                                                                                                        I don’t think that’s what’s happening now. There have been a lot of mainstream press articles about WhatsApp. The technical users moved to Signal when Facebook bought WhatsApp; I’m now hearing non-technical folks ask what they should migrate to from WhatsApp. For example, one of our administrators recently asked about Signal because some of her family want to move their family chat there from WhatsApp.

                                                                                                        1. 1

                                                                                                          Yeah these last two days I have been asked a few times about chat apps. I have also noticed my signal contacts list expand by quite a few contacts, and there are lots of friends/family who I would not have expected to make the switch in there. I asked one family member, a doctor, what brought her in and she said that her group of doctors on whatsapp became concerned after the recent announcements.

                                                                                                          I wish I could recommend xmpp/OMEMO, but it’s just not as easy to set up. You can use conversations.im, and it’s a great service, but if you are worried about silos you are back to square one if you use their domain. They make using a custom domain as friction-free as possible but it still involves DNS settings.

                                                                                                          I feel the same way about matrix etc. Most people won’t run their own instance, so you end up in a silo again.

                                                                                                          For the closest thing to whatsapp, I have to recommend Signal. It’s not perfect, but it’s good. I wish you didn’t have to use a phone number…

                                                                                                    2. 2

                                                                                                      What happens when someone buys Signal, the US government forces Signal to implement backdoors or Signal runs out of donation money?

                                                                                                      Not supporting signal in any way, but how would your preferred solution actually mitigate those risks?

                                                                                                      1. 1

                                                                                                        Many different email providers all over the world and multiple clients based on the same standards.

                                                                                                        1. 6

                                                                                                          Anyone who has written email software used at scale by the general public can tell you that you will spend a lot of time working around servers and clients which do all sorts of weird things. Sometimes with good reasons, often times with … not so good reasons. This sucks but there’s nothing I can change about that, so I’ll need to deal with it.

                                                                                                          Getting something basic working is pretty easy. Getting all emails handled correctly is much harder. Actually displaying all emails well is harder still. There are tons of edge cases.

                                                                                                          The entire system is incredibly messy, and we’re actually a few steps up from 20 years ago when it was even worse.

                                                                                                          And we still haven’t solved the damn line wrapping problem 30 years after we identified it…

                                                                                                          Email both proves Postel’s law correct and wrong: it’s correct in the sense that it does work, it’s wrong because it takes far more time and effort than it really needs to.

                                                                                                          1. 2

                                                                                                            I hear you (spent a few years at an ESP). It’s still better than some siloed walled garden proprietary thing that looks pretty but could disappear for any reason in a moment. The worst of all worlds except all others.

                                                                                                            1. 2

                                                                                                              could disappear for any reason in a moment

                                                                                                              I’m not so worried about this; all of these services have been around for ages and I’m not seeing them disappear from one day to the next in the foreseeable future. And even if it does happen: okay, just move somewhere else. It’s not even that big of a deal.

                                                                                                              1. 1

                                                                                                                Especially with chat services. There’s not that much to lose. Your contacts are almost always backed up elsewhere. I guess people value their chat history more than I do, however.

                                                                                                    3. 11

                                                                                                      My vote is for Signal. It has good clients for Android and iOS and it’s secure. It’s also simple enough that non-technical people can use it comfortably.

                                                                                                      I’ve recently started using it, and while it’s fine, I’m no fan. As @jlelse said, it is another closed-off platform that you have to use, making me depend on someone else.

                                                                                                      They seem to (as of writing) prioritize “security” over “user freedom”, which I don’t agree with. There’s the famous thread, where they reject the notion of distributing Signal over F-Droid (instead having their own special updater, in their Google-less APK). What also annoys me is that their desktop client is based on Electron, which would have been very hard for me to use before upgrading my desktop last year.

                                                                                                      1. 6

                                                                                                        My vote is for Signal. It has good clients for Android and iOS and it’s secure. It’s also simple enough that non-technical people can use it comfortably.

                                                                                                        What I hate about signal is that it requires a mobile phone and an associated phone number. That makes it essentially useless - I loathe mobile phones - and very suspect to me. Why can’t the desktop client actually work?

                                                                                                        1. 2

                                                                                                          I completely agree. At the beginning of 2020 I gave up my smartphone and haven’t looked back. I’ve got a great dumb phone for voice and SMS, and the occasional photo. But now I can’t use Signal as I don’t have a mobile device to sign in to. In a world where Windows, Mac OS, Linux, Android, and iOS all exist as widely used operating systems, Signal is untenable as it only has full-featured clients for two of these operating systems.

                                                                                                          Signal isn’t perfect.

                                                                                                          This isn’t about being perfect, this is about being accessible to everyone. It doesn’t matter how popular it becomes, I can’t use it.

                                                                                                          1. 1

                                                                                                            What I hate about signal is that it requires a mobile phone and an associated phone number.

                                                                                                            On the bright side, Signal’s started to use UUIDs as well, so this may change. Some people may think it’s gonna be too late whenever it happens, if it does, but at least the protocols aren’t stagnant!

                                                                                                            1. 1

                                                                                                              They’ve been planning on fixing that for a while, I don’t know what the status is. The advantage of using mobile phone numbers is bootstrapping. My address book is already full of phone numbers for my contacts. When I installed Signal, it told me which of them are already using it. When other folks joined, I got a notification. While I agree that it’s not a great long-term strategy, it worked very well for both WhatsApp and Signal to quickly bootstrap a large connected userbase.

                                                                                                              In contrast, most folks’ XMPP addresses were not the same as their email addresses and I don’t have a lot of email addresses in my address book anyway because my mail clients are all good at autocompleting them from people who have sent me mail before, so I don’t bother adding them. As a result, my Signal contact list was instantly as big as my Jabber Roster became after about six months of trying to get folks to use Jabber. The only reason Jabber was useable at all for me initially was that it was easy to run an ICQ bridge so I could bring my ICQ contacts across.

                                                                                                              1. 1

                                                                                                                Support for using it without a phone number remains a work in progress. The introduction of PINs was a stepping stone towards that.

                                                                                                          1. 2

                                                                                                            Yes, that graph is showing in gigabytes. We’re so lucky that bandwidth is free on Hetzner.

                                                                                                            But it says 300 Mil on the left. And “bytes” on top. So I guess Mil stands for million, and 300 million bytes is 300 decimal megabytes, not gigabytes, unless my math is all wrong. Is my math all wrong?

                                                                                                            1. 1

                                                                                                              You’re correct that 300 million bytes is 300 MB (or around 286 MiB).
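For anyone who wants to check the arithmetic, a quick sketch of the conversion (the helper name `describe_bytes` is made up for illustration; MB and GB are decimal units, MiB is the binary one):

```python
def describe_bytes(n_bytes):
    """Express a byte count in decimal (MB, GB) and binary (MiB) units."""
    return {
        "MB": n_bytes / 10**6,   # decimal megabytes
        "GB": n_bytes / 10**9,   # decimal gigabytes
        "MiB": n_bytes / 2**20,  # binary mebibytes
    }


# "300 Mil" bytes, as read off the left axis of the graph.
units = describe_bytes(300 * 10**6)
print(f"{units['MB']:.0f} MB = {units['GB']:.1f} GB = {units['MiB']:.1f} MiB")
# prints "300 MB = 0.3 GB = 286.1 MiB"
```

So the axis reads as a few hundred megabytes, roughly a third of a gigabyte, not hundreds of gigabytes.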

                                                                                                              1. 1

                                                                                                                My bad. I was reading the cloudflare graph when I wrote that. I think I uploaded the wrong image to Twitter. Oops. I’ll fix it.

                                                                                                                1. 1

                                                                                                                  I think nevertheless your scale of “this could get expensive” would only be right if you were on a very expensive provider like google cloud. Or maybe if this were 15 years ago. Hetzner predicts that 20TB/mo is completely normal, and you are nowhere near that! A gigabyte in 2021 is a small thing.

                                                                                                                  Of course, it’s fine to plan very much ahead and optimize things, but maybe this will give people the wrong idea that it’s absolutely necessary to put Cloudflare or a caching CDN in front of their website, or cut down RSS feeds, when even at your great level of popularity it isn’t really needed.

                                                                                                            1. 10

                                                                                                              It feels like this article is trying to have it both ways regarding encryption and compatibility. It touts it as a feature that “Delta Chat encrypts messages automatically” but then says that “Delta Chat allows you to communicate even with people who don’t use Delta Chat at all, all you need is an email address!” That’s great, but it implies not only that (1) Delta Chat can be used in an unencrypted mode, which is not even possible with e.g. Signal, but also that (2) people are being encouraged to use it in an unencrypted mode while also being told that the service is encrypted. Is there UI clarifying which connections are encrypted and which are cleartext? If not, technical users will understand that if you send an email to some arbitrary address then of course it’s going to be unencrypted, but ordinary users are being set up for failure.

                                                                                                              The article seems to allude to this when it says that “it would be advisable not to use Gmail, Outlook or GMX as email service”. You can have an encrypted chat channel, or you can have a backward-compatible unencrypted channel that your email provider can read, but you can’t have both. (Or if you can, then this article doesn’t explain how that would work.)

                                                                                                              Edit: I feel bad about posting such a negative comment. I too have been annoyed at the proliferation of incompatible messaging services. I like federation, and I like the idea that someone would be able to use this without creating yet another account. But I’m worried that people are going to misunderstand the limits of Delta Chat’s opportunistic encryption.

                                                                                                              1. 3

                                                                                                                There’s UI to show whether a chat is encrypted or not; encrypted messages will have a lock symbol. But you’re right, it’s possible to use it without encryption too. There are three options: 1. Message someone without support for Autocrypt - the chat will be unencrypted. 2. Message someone with Autocrypt (i.e. Delta Chat) - only the first message will be unencrypted. 3. Scan someone’s QR code - all messages will be encrypted.

                                                                                                                1. 3

                                                                                                                  encrypted messages will have a lock symbol

                                                                                                                  As unencrypted HTTP has become less and less common, browsers have switched from highlighting when a page is encrypted to highlighting when it isn’t encrypted. How obvious does Delta Chat make it when a chat is plaintext?

                                                                                                                2. 2

                                                                                                                  BTW, Signal can be used to send normal SMS to your non-Signal contacts, so it can be used in unencrypted mode.

                                                                                                                  1. 1

                                                                                                                    Ah, I forgot about that. (This is an Android-only feature of Signal, IIUC, and I use iOS.)

                                                                                                                  2. 2

                                                                                                                    I clicked to the comments because very similar thoughts occurred to me.

                                                                                                                    I don’t think having insecure messages go through the “secure” chat interface is a good idea at all. A decent compromise might be:

                                                                                                                    • user attempts to send a secure chat message to someone whose email address they have
                                                                                                                    • delta alerts them that secure messaging is not possible and offers to send them an insecure message instead
                                                                                                                    • if the user opts to send an insecure message, it uses the system’s mailto: url handler to open the default mail client so that this looks like every other insecure message the user sends.

                                                                                                                    Sending plaintext from your secure tool should be scary and involve extra steps, IMO.

                                                                                                                    I have other reservations about selecting email as the substrate for this, but those aren’t security related.

                                                                                                                  1. 3

                                                                                                                    Additionally one prefix operator for negation would be nice. Since - is already in use, I chose ~.

                                                                                                                    It seems like it might be possible to treat - as either a binary or a unary operator depending on the context: if it has whitespace before it but not after it—which is illegal for other operators—it’s the unary operator, and otherwise it’s the binary operator. That would make it possible to write things like 3 + -2. (One downside is that the mandatory whitespace on the left would prevent you from writing things like 20+-10 / 2, which would require parentheses if you were going to insist on negating the 10.)
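
                                                                                                                    The whitespace rule described above can be sketched in a few lines of Python. This is only an illustration, not the calculator's actual code: the function name is made up, and treating the start of the input as if it were preceded by whitespace is an extra assumption.

```python
def classify_minus(expr: str) -> list:
    """Label each '-' in expr as 'unary' or 'binary' using the whitespace rule.

    A '-' is unary when it has whitespace before it but not after it.
    Assumption: the start of the input counts as whitespace before.
    """
    kinds = []
    for i, ch in enumerate(expr):
        if ch != "-":
            continue
        space_before = i == 0 or expr[i - 1].isspace()
        space_after = i + 1 >= len(expr) or expr[i + 1].isspace()
        kinds.append("unary" if space_before and not space_after else "binary")
    return kinds


print(classify_minus("3 + -2"))      # ['unary']
print(classify_minus("3 - 2"))       # ['binary']
print(classify_minus("20+-10 / 2"))  # ['binary']
```

                                                                                                                    In the last case the - comes out as binary because nothing separates it from the +, so a parser would see two binary operators in a row and reject the input, which is the downside mentioned above.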

                                                                                                                    1. 2

                                                                                                                      Haskell does something similar: if - has nothing on its right-hand side, it’s a prefix negation operator; otherwise it’s subtraction. This rule is an exception to how the rest of the language works, but it’s considered an acceptable compromise.

                                                                                                                      Another thing I considered was making + and - part of the number literal, so that -1 can be parsed as a single token in addition to - being an infix operator. This makes parsing numbers a bit more complicated, but 1 -2 is still invalid, so it seems to work. The practical effect of this is similar to your idea, and I believe Fortran does this.
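
                                                                                                                      A rough Python sketch of the signed-literal idea, assuming tokens are separated by whitespace (as the calculator's operator rule suggests). The names and the simple "operands and operators must alternate" check are made up for illustration.

```python
import re

# A number literal may carry its own sign: -1, +2.5, 10
NUMBER = re.compile(r"[+-]?\d+(\.\d+)?")
OPERATORS = {"+", "-", "*", "/"}


def tokenize(expr: str) -> list:
    """Split on whitespace and tag each piece as a number or an operator."""
    tokens = []
    for word in expr.split():
        if NUMBER.fullmatch(word):
            tokens.append(("num", float(word)))
        elif word in OPERATORS:
            tokens.append(("op", word))
        else:
            raise ValueError(f"unrecognized token: {word!r}")
    return tokens


def is_valid(expr: str) -> bool:
    """Valid input alternates num, op, num, ... and starts and ends with a num."""
    tokens = tokenize(expr)
    if len(tokens) % 2 == 0:
        return False
    return all(kind == ("num" if i % 2 == 0 else "op")
               for i, (kind, _) in enumerate(tokens))


print(tokenize("3 + -2"))  # [('num', 3.0), ('op', '+'), ('num', -2.0)]
print(is_valid("1 - 2"))   # True
print(is_valid("1 -2"))    # False
```

                                                                                                                      Here 1 -2 is rejected because it tokenizes as two adjacent numbers with no operator between them.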

                                                                                                                      I didn’t agonize over the decision though. This calculator is mostly a proof-of-concept for an idea that I wanted to try out, and the simplest thing was to pick a different symbol for unary negate.

                                                                                                                      It’s worth noting too that the concept of -1 can be understood as beginning with an implicit 0, as in 0-1. This makes a negation operator pretty unnecessary anyway!

                                                                                                                      1. 3

                                                                                                                        That’s not an issue here. The modern browser will use HTTPS because it observed an HSTS header after a redirect, or because the domain is preloaded, so no injection of malware there. The old browser will not support the HTTPS page, but can still attempt HTTP. For your malicious ISP it doesn’t matter if your webserver answers with a redirect to HTTPS or with content on HTTP; after a successful MITM it will be whatever they want it to be.

                                                                                                                        1. 1

                                                                                                                          It’s true that any ISP that is willing to just serve you a completely different site can do so if you connect over unencrypted HTTP. But for ISPs that aren’t willing to go that far—who are only willing to inject ads or Bitcoin-mining JavaScript into pages that are otherwise the ones you were asking for—that activity will be prevented by upgrading all HTTP connections to HTTPS.

                                                                                                                          Put another way, there are different levels of malice that are possible from an ISP, and upgrading HTTP to HTTPS won’t defend against all of them but it can defend against some of them.
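
                                                                                                                          For concreteness, the two server-side pieces mentioned in this thread (the redirect from HTTP to HTTPS, and the HSTS header that makes a modern browser go straight to HTTPS next time) can be sketched as plain Python functions. The function names are hypothetical and no particular server framework is assumed; the Strict-Transport-Security header name and its max-age and includeSubDomains directives are the standard ones.

```python
def redirect_to_https(host: str, path: str):
    """Response for a plain-HTTP request: a permanent redirect to the HTTPS URL."""
    return 301, {"Location": f"https://{host}{path}"}


def hsts_header(max_age_seconds: int = 31536000, include_subdomains: bool = True) -> dict:
    """Header to send on HTTPS responses; browsers ignore it on plain HTTP."""
    value = f"max-age={max_age_seconds}"
    if include_subdomains:
        value += "; includeSubDomains"
    return {"Strict-Transport-Security": value}


print(redirect_to_https("example.com", "/feed.xml"))
print(hsts_header())
```

                                                                                                                          The redirect itself still travels over unencrypted HTTP, which is why it cannot stop an ISP that is willing to replace the whole response.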

                                                                                                                      1. 2

                                                                                                                        I don’t understand the author’s issue with static site generators. By that logic, you should only write assembly and not C, because having a compiler adds another dependency.

                                                                                                                        1. 6

                                                                                                                          I’m not the OP, but I think the category of static site generators (like the category of front-end frameworks*) has a bit of a reputation for moving fast and breaking compatibility. Maybe their objection is not to having a dependency, but to relying on a tool that forces them either to rework their configuration every six months or to make an extra effort to stay on an old version.

                                                                                                                          * And unlike the category of endofunctors. (Sorry, couldn’t help myself.)

                                                                                                                          1. 2

                                                                                                                            Maybe, but you don’t need to use an immature or fast-changing tool, or if you do use one, you probably don’t need to ever update it.

                                                                                                                            And in the worst case, a static site generator is a very easy thing to write yourself.
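
                                                                                                                            To give a sense of scale, here is a minimal sketch of such a generator in Python, using only the standard library. Everything about it is an assumption made for illustration: pages are .txt files whose first line is the title and whose remaining text is split into paragraphs on blank lines, and the template and function names are invented.

```python
import html
from pathlib import Path

# Hypothetical page template; {title} and {body} are filled in by render().
PAGE = """<!doctype html>
<html lang="en">
<head><meta charset="utf-8"><title>{title}</title></head>
<body>
<h1>{title}</h1>
{body}
</body>
</html>
"""


def render(text: str) -> str:
    """Turn 'title line + blank-line-separated paragraphs' into an HTML page."""
    title, _, rest = text.partition("\n")
    paragraphs = [p.strip() for p in rest.split("\n\n") if p.strip()]
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return PAGE.format(title=html.escape(title.strip()), body=body)


def build(src: str, out: str) -> list:
    """Render every .txt file in src to an .html file in out."""
    out_dir = Path(out)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for page in sorted(Path(src).glob("*.txt")):
        target = out_dir / (page.stem + ".html")
        target.write_text(render(page.read_text(encoding="utf-8")), encoding="utf-8")
        written.append(target)
    return written
```

                                                                                                                            Calling build("posts", "public") would write one HTML file per text file; anything beyond that (feeds, an index page, Markdown) is extra work, but the core loop stays this small.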

                                                                                                                            1. 1

                                                                                                                              I was gonna say then you’re using the wrong static site generator :)

                                                                                                                              I favor choosing one that’s written in a programming language you understand, with a design that’s simple enough to fully wrap your head around.

                                                                                                                              That’s why I use and love Pelican.

                                                                                                                            2. 1

                                                                                                                              Also not op, but I went from a static site generator to HTML+CSS a few years ago; and for me it was mostly that I kept forgetting how the particular static site generator that I was using worked. So every half a year or so when I wanted to write a blog post, I then had to relearn the tool as well.

                                                                                                                            1. 5

                                                                                                                              Aren’t SEO and Lighthouse scores things that change all the time? So, that requires maintenance.

                                                                                                                              I wonder if we’ll ever go through another transition like ‘mobile-first’ that’d also break your CSS on new devices.

                                                                                                                              OP also compares Wordpress.com and GitHub Pages, two hosted services, while implying his page on GitHub will never have security vulnerabilities. Sure, if your host is outside of your security scope, then your ‘Wordpress.com stack’ will also never have security vulnerabilities.

                                                                                                                              Ok, that last one was probably super nitpicky. And I do agree and appreciate simple fast pages. :-)

                                                                                                                              1. 3

                                                                                                                                To be fair, “a transition that breaks your CSS” is a lot easier to handle when you only have one CSS file :-) I think this is probably a point in favor of the OP’s approach and against something like WordPress, where I imagine that many popular themes would be updated quickly but that a long tail of other themes would be updated much more slowly.

                                                                                                                                1. 0

                                                                                                                                  SEO and Lighthouse scores are to be ignored as a matter of principle.

                                                                                                                                1. 4

                                                                                                                                  You don’t need <head> or <body>. If <html> didn’t have a lang tag you wouldn’t need that either. Also, as @arp242 already mentioned, you probably don’t need x-ua-compatible anymore.

                                                                                                                                  -edit: source (via)

                                                                                                                                  1. 5

                                                                                                                                    The goal of this project (which probably isn’t explained well) is to provide a base template to be extended. With that goal in mind, I think it makes sense to include <head> and <body> tags as a placeholder for people to add to. It’s the same reason that the charset / viewport / description / title tags are included in the head.

                                                                                                                                    Edit: Seeing your PR is making me ponder this comment a little – but I’m still not sure I’m ready to ditch those wrappers.

                                                                                                                                    1. 5

                                                                                                                                      I think your HTML file would be silly if it didn’t include <head> and <body>. If someone is trying to golf every last byte out of their HTML file then fine, but it would be odd to use such a thing as a starting point.

                                                                                                                                    2. 4

                                                                                                                                      It’s often been said that the regular expression that matches HTML is simply .*, so realistically speaking you don’t need anything – but that sounds like an edgy teen saying “you don’t need to do anything but die (… so I won’t do my homework)”.

                                                                                                                                      1. 1

                                                                                                                                        unfortunately, if you are developing a plain HTML website and using something like live-server to watch for changes as you edit the file, it needs the <html><body> tags to work properly.

                                                                                                                                        1. 2

                                                                                                                                          That’s probably a bug. The DOM won’t look any different if you omit <html><body>, as the tags are implied. I don’t know live-server, but I suppose it looks for these tags to figure out where to inject its auto-reload JS magic?

                                                                                                                                          (yep, it seems like that’s the problem, and someone made a PR to fix this in 2017, is the project dead?)
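
                                                                                                                                          To illustrate the kind of fix involved, here is a Python sketch of snippet injection that does not depend on the optional tags being present. This is not live-server’s code (live-server is a Node.js tool, and its internals are not shown in this thread); the function name and the fallback behaviour are assumptions.

```python
import re

# Matches a closing body tag in any letter case, e.g. </body> or </BODY >.
CLOSING_BODY = re.compile(r"</body\s*>", re.IGNORECASE)


def inject_snippet(page: str, snippet: str) -> str:
    """Insert snippet before the last </body>, or append it if there is none."""
    matches = list(CLOSING_BODY.finditer(page))
    if matches:
        pos = matches[-1].start()
        return page[:pos] + snippet + page[pos:]
    # No explicit </body>: the tag is implied, so appending still lands in the body.
    return page + snippet


reload_js = "<script>/* auto-reload code would go here */</script>"
print(inject_snippet("<body><p>hi</p></body>", reload_js))
print(inject_snippet("<p>hi</p>", reload_js))
```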

                                                                                                                                      1. 1

                                                                                                                                        I think it’s a little unclear where the “delete this” part is supposed to end. (I mean, it’s clear to me, but I do this professionally.) I would suggest adding a blank line before </body> and/or another comment saying “end of the section to delete” or something like that.

                                                                                                                                        1. 2

                                                                                                                                          It looks like the Demo link is 404ing right now.

                                                                                                                                          1. 4

                                                                                                                                            As someone who is building a new tool to generate ePubs right now, I feel your pain. At the same time, I’m happy that I’m not alone doing these kinds of spec juggling.

                                                                                                                                            1. 3

                                                                                                                                              I’m disconcerted that I’ve had nobody come along and say “you’re doing it wrong, use this tool to turn an ODT or DOCX into something that passes epubcheck first time, you foolish person.” Like, not even LO’s internal ePub export passes epubcheck. I thought this would be a sufficient nerdbait …

                                                                                                                                              1. 4

                                                                                                                                                This article is pretty old now but it suggests the author is using just Pandoc to create the EPUB. It’s not clear how much testing and validation they did on different ereader devices and programs, though.

                                                                                                                                                1. 1

                                                                                                                                                  I went and tried pandoc (ver 2.5 from Ubuntu 20.04) and it did a mostly okay job from DOCX. It still messed up cross-reference endnotes, but everything does. And the output doesn’t pass epubcheck. Tralala!

                                                                                                                                                2. 4

                                                                                                                                                  In my book, the more tools the better! It has been 15 minutes since a book built by my own tool passed epubcheck for the first time without any error or warning. I still need better markup for the cover and some way to generate a proper table of contents.

                                                                                                                                                  I’m wishing you all the luck with your books!

                                                                                                                                              1. 4

                                                                                                                                                I agree with all the comments here; but no one mentions that this issue is because of FFI.

                                                                                                                                                If chown was written in Rust, there’d be no problem.

                                                                                                                                                1. 2

                                                                                                                                                  I’m not sure it’s a matter of chown being written in Rust so much as that the version in the libc crate doesn’t expose the full capability of the original function—as the OP points out. If the “real” libc function can accept an argument of -1, it’s arguably a bug that the Rust version requires an unsigned integer for that parameter. (More broadly, it’s a bug that the Rust version lacks a capability that the original version had. It could also allow, say, an Option<u32> for the group ID, with None meaning “I don’t want to change the gid.” That would do a little better at making illegal states unrepresentable.) I don’t see how it matters whether the actual implementation is in C or Rust or whatever.
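
                                                                                                                                                  The Option<u32> idea can be illustrated without Rust. Here is a rough Python analogue, assuming a Unix-like system: Python’s os.chown documents -1 as “leave this id unchanged”, and the hypothetical wrapper below hides that sentinel behind None. The names raw_id and change_owner are made up for this sketch.

```python
import os
from typing import Optional

UNCHANGED = -1  # sentinel that os.chown treats as "do not change this id"


def raw_id(value: Optional[int]) -> int:
    """Map None to the sentinel and reject negative ids, so -1 cannot slip in by accident."""
    if value is None:
        return UNCHANGED
    if value < 0:
        raise ValueError("ids must be non-negative; use None for 'unchanged'")
    return value


def change_owner(path: str, uid: Optional[int] = None, gid: Optional[int] = None) -> None:
    """Change owner and/or group; None means 'leave as is' (Unix only)."""
    os.chown(path, raw_id(uid), raw_id(gid))
```

                                                                                                                                                  With this shape, change_owner(path, uid=1000) changes only the owner, and the “magic” -1 never appears at the call site.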

                                                                                                                                                  1. 4

                                                                                                                                                    There’s no bug in Rust’s libc, see last bullet at https://lobste.rs/s/9e7o8e/comparative_unsafety#c_btqrdt.

                                                                                                                                                    1. 1

                                                                                                                                                      Ah, I understand now. Thanks for explaining in that other comment!

                                                                                                                                                1. 7

                                                                                                                                                  Anyone know how this compares to Fennel, another Lisp-on-Lua language?

                                                                                                                                                  1. 9

                                                                                                                                                    AFAIK Urn is effectively its own language with its own semantics, and Fennel I think is better thought of as a lisp-syntax-frontend for Lua. They both compile to Lua, sure, but I believe the Urn compiler is doing a lot more leg work.

                                                                                                                                                    1. 6

                                                                                                                                                      That’s correct. The big difference is that Urn does a lot more static analysis; I think at one point they were building a gradual type system into it? But some of that made interop with Lua less direct, and it also made it difficult to support interactive development with a repl. (I started using Urn a bit before I discovered Fennel, but the lack of a repl was rather problematic for me, even tho I liked a lot of other things about it.)