1. 13

    seL4 is the peak of operating system design, so let’s assume that’s the basis. If your “OS” can be implemented on top of L4, then it should be. The fundamental security layer for a system should not need to be updated every few months.

    Complex systems, which will inevitably exist, will need to be constantly changed to account for constantly-changing user requirements. “Able to run most popular applications that have existed throughout history” is a good sign that your system will be able to host anything that exists in the future (not perfect, but predicting the future never is). This should be possible without horrible hacks. This does mean that I’m kind of rejecting your premise of being tabula rasa, because otherwise, how would I know if it was any good?


    The filesystem is probably the part of POSIX-y and Windows-y systems that I dislike the most, because there’s an impedance mismatch when SQLite tries to build on top of it. Abstraction inversions like dividing a file into blocks are a good sign that the OS-provided abstraction is too high level. My ideal operating system essentially caters to the needs of databases, and treats the directory tree as a cooperatively-maintained database just like SQLite itself is.

    In my ideal operating system, the OS itself would only really prescribe an interface where a process is given access to an undifferentiated region of bytes. The interface between the logical block storage service and the application is a movable, mmapped window into a larger storage area. To sandbox a subprocess, you take your logical access capability token, and use a service call to turn it into a capability token that only gives access to a subsection of what you currently have access to, like how memory is managed in basic L4. It would also expose primitives to take a volatile, CoW snapshot of an arbitrary region of the process’s accessible block storage area, so that an application that just wants to read the data can do it without having to copy it all into memory by hand (with the danger of torn reads), and primitives to make small, atomic reads and writes to the storage area.
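    To make the narrowing step concrete, here is a toy Python model of the idea. Every name in it (BlockCapability, narrow, snapshot) is hypothetical, standing in for the service calls described above:

```python
# Hypothetical model of capability narrowing over a flat byte region.
# Names (BlockCapability, narrow, snapshot) are illustrative, not a real API.

class BlockCapability:
    """A token granting access to bytes [start, start+length) of a store."""

    def __init__(self, store: bytearray, start: int, length: int):
        self._store = store
        self.start = start
        self.length = length

    def narrow(self, offset: int, length: int) -> "BlockCapability":
        # The service call: derive a token covering a strict subset of what
        # this token can already reach.  Bounds are checked, so a sandboxed
        # child can never escape its parent's region.
        if offset < 0 or length < 0 or offset + length > self.length:
            raise PermissionError("narrowed region exceeds parent capability")
        return BlockCapability(self._store, self.start + offset, length)

    def read(self, offset: int, n: int) -> bytes:
        if offset < 0 or offset + n > self.length:
            raise PermissionError("read outside capability")
        return bytes(self._store[self.start + offset : self.start + offset + n])

    def snapshot(self) -> bytes:
        # Stand-in for a volatile CoW snapshot: a consistent view with no
        # torn reads (trivially here, by copying).
        return bytes(self._store[self.start : self.start + self.length])

disk = bytearray(b"supersecret|public-data")
root = BlockCapability(disk, 0, len(disk))
child = root.narrow(12, 11)          # can only reach "public-data"
print(child.read(0, 11))             # b'public-data'
```

    A child holding only the narrowed token has no handle it could use to reach the rest of the region.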

    A process can also take a non-atomic read-write mmap of a block storage region. Two processes that do this can use CPU-level atomic memory ops to engage in shared memory concurrency without the block storage service being in the hot path at all, and it would only be involved if the system is low on RAM and it needs to flush it to disk (just like Linux’s block cache does). The only way to guarantee that any of this gets written to permanent storage, however, is to make a sync call to the logical block storage service.

    This sort of P2P protocol is how directories would be implemented, as they would simply be a library that runs in your own address space. The directory would be structured as a B-Tree, so updating the directory would be done by mmapping in some free space, writing a copied, but changed, version of the applicable node, syncing it, and then performing a “write if equals” service call to replace the old node. Or, more likely, it would go through a journal first, to allow a fast path where updating a directory is just a write and a sync without having to copy B different nodes (it should have amortized O(1) complexity).
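    A toy sketch of that update loop, with “write if equals” modeled as a compare-and-swap on a root pointer slot (all names are hypothetical; a real implementation would sync the new node to storage before publishing it):

```python
# Sketch of the copy-on-write update described above.  "Write if equals"
# is modeled as a compare-and-swap on a pointer slot; names are hypothetical.

class BlockStore:
    def __init__(self):
        self.blocks = {}          # block number -> bytes
        self.root_ptr = None      # pointer slot holding the live root node
        self._next = 0

    def alloc_write(self, data: bytes) -> int:
        # "mmap in some free space and write the changed node"
        self._next += 1
        self.blocks[self._next] = data
        return self._next

    def write_if_equals(self, expected, new) -> bool:
        # The atomic service call: install the new root only if nobody
        # else replaced it first.  A loser simply retries.
        if self.root_ptr != expected:
            return False
        self.root_ptr = new
        return True

def update(store, mutate):
    while True:
        old = store.root_ptr
        new_node = mutate(store.blocks[old])   # copied, but changed
        new = store.alloc_write(new_node)      # write (and sync) elsewhere
        if store.write_if_equals(old, new):    # publish atomically
            return new

store = BlockStore()
store.root_ptr = store.alloc_write(b"node:v1")
update(store, lambda b: b.replace(b"v1", b"v2"))
print(store.blocks[store.root_ptr])            # b'node:v2'
```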

    The point of this radical design is to eliminate the current-day arbitrary tradeoff of “lots of small files” vs “one huge file” that applications like Git have to deal with. Both would have similar performance (acknowledging that they also have the same weaknesses: if your application doesn’t use the battle-hardened directory library, you can wind up corrupting the directory tree).

    This design also precludes some of the fancier permission systems that POSIX-y operating systems use; if you want more fine-grained control, use services and message passing layered on top instead, like an RDBMS. A POSIX subsystem might be implemented as a service on top of this. Similarly, the content-hosting processes of a web browser would only interact with the filesystem through a broker process that actually uses the directory library to get at files; due to the insufficiently-sophisticated permissions systems in Windows and in POSIX, this is how they wind up working anyhow.


    The general rule, that follows from the above “filesystem” (really “logical block storage”) design, is that there should be no strings in the operating system ABI. Ever. If you’re designating special characters, then you’re doing it wrong, and if your operating system is prescribing a text encoding, then it’s too high level to be properly future-proof.

    1. 2

      3 kinds. Lists 2. What?

      1. 2

        From the Java docs:

        This class consists exclusively of static methods for obtaining encoders and decoders for the Base64 encoding scheme. The implementation of this class supports the following types of Base64 as specified in RFC 4648 and RFC 2045.

        • Basic

          Uses “The Base64 Alphabet” as specified in Table 1 of RFC 4648 and RFC 2045 for encoding and decoding operation. The encoder does not add any line feed (line separator) character. The decoder rejects data that contains characters outside the base64 alphabet.

        • URL and Filename safe

          Uses the “URL and Filename safe Base64 Alphabet” as specified in Table 2 of RFC 4648 for encoding and decoding. The encoder does not add any line feed (line separator) character. The decoder rejects data that contains characters outside the base64 alphabet.

        • MIME

          Uses the “The Base64 Alphabet” as specified in Table 1 of RFC 2045 for encoding and decoding operation. The encoded output must be represented in lines of no more than 76 characters each and uses a carriage return ‘\r’ followed immediately by a linefeed ‘\n’ as the line separator. No line separator is added to the end of the encoded output. All line separators or other characters not found in the base64 alphabet table are ignored in decoding operation.

        MIME and Basic both use the same alphabet.
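        These three variants are easy to poke at with any language’s standard library; a quick Python sketch (note that Python’s MIME-style helper wraps at 76 characters but uses ‘\n’ rather than the ‘\r\n’ the Java docs describe):

```python
import base64

# 3-byte pattern whose encoding exercises the characters that differ
# between the alphabets ('+' and '/' vs '-' and '_')
data = b"\xfb\xef\xff" * 30

basic = base64.b64encode(data)            # Table 1 alphabet, no line breaks
urlsafe = base64.urlsafe_b64encode(data)  # '-' and '_' replace '+' and '/'
mime = base64.encodebytes(data)           # wrapped at 76 chars ('\n', not '\r\n')

print(basic[:8])    # b'++//++//'
print(urlsafe[:8])  # b'--__--__'
print(max(len(line) for line in mime.splitlines()))  # 76
```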

      1. 2

        Looks great. I wish the config was not done in JSON, which does not allow comments.

        1. 1

          Thanks!

          The reason the config file is in JSON is because it’s always written to programmatically. I know it’s slightly annoying but I believe it’s a worthwhile compromise; when I wrote it I considered SQLite but settled on prettified JSON to allow the file to be tracked in git or similar.

          1. 1

            Fair, you could have opted for YAML or TOML, which are also easy to write programmatically, but also allow humans to edit them easily (including comments).

            Great project btw, I am going to play with it on a test server.

            1. 2

              Fair, you could have opted for YAML or TOML, which are also easy to write programmatically, but also allow humans to edit them easily (including comments).

              Sure, but I’m not aware of any serialiser that would retain the comments when writing the file.

              Great project btw, I am going to play with it on a test server.

              Thanks :)

              1. 2

                Sure, but I’m not aware of any serialiser that would retain the comments when writing the file.

                https://docs.rs/toml_edit/0.2.0/toml_edit/

        1. 4

          One point I have to disagree with is the complaint about NAT. I know very little about networks, but I do know NAT is the bane of end-to-end connectivity, and is naturally hostile to peer-to-peer communications. There’s still hole punching, but that requires a server somewhere. Without that, people must configure their router, with no help from the software vendor (the router is independent of the computer or phone it routes).

          • Want to hide the layout of your network? Just randomise your local addresses.
          • Want to hide the existence of part of your network? Use a firewall.
          • Want to drop incoming traffic by default? Use a firewall, dammit.

          We could even have a “NAT” that doesn’t translate anything, but opens ports like an IPv4 NAT would. Same “security” as before, and roughly the same disadvantages (can’t easily get through without the user explicitly configuring allowances).
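          For what it’s worth, the “firewall, dammit” option is only a few lines of configuration. A minimal nftables sketch (the port choice is illustrative): drop incoming by default, but allow replies to traffic we initiated, which is the stateful behaviour a NAT gives you implicitly:

```
table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;

        # stateful part a NAT gives you implicitly: replies to
        # connections we opened are allowed back in
        ct state established,related accept

        # loopback, plus explicitly opened services only
        iif "lo" accept
        tcp dport 22 accept
    }
}
```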

          That said, don’t we have protocols to allow the computer on the home network to talk to the router and ask it to open up some port on a permanent basis? I just want to start my game, advertise my IP on the match server, and wait for incoming players. Or set up a chat room without using any server, because I too care about my privacy.

          1. 9

            The NAT section absolutely reads like this person doesn’t understand firewalls.

            Every computer just by default accessible anywhere in the world unless you specifically firewall things?

            Where in the world is there NAT without a stateful firewall denying all incoming connections by default? Remove the NAT (because IPv6) and you’d be left with a firewall. Where’s the issue? Maybe they think you need to configure the firewall on each computer specifically? That’s what this seems to imply:

            the kind of devices that do actively use IPv6 (mobile devices, mainly), are able to just zeroconf themselves perfectly, which is nice from a “just works” perspective

            1. 2

              NATs do provide some obfuscation of addresses and can make it more difficult for an attacker to reach a device directly (ignoring, of course, intentionally forwarded ports…)

              1. 5

                I’m not convinced that this is true. Most IPv6 stacks (including the Windows one) support generating a new IPv6 address periodically. The stack holds on to the old address until all connections from it are closed. IPv6 lets you take this to extremes and generate a new IPv6 address for every outbound connection. It’s fairly easy to have a stable IPv6 address that you completely firewall off from the outside and use for connections from the local network, and pick a new address for every new outbound connection. If someone wants to map your internal network, they will never see the address that you listen on and they can’t easily tell when two connections are from the same machine or different ones.

                In contrast, IPv4 NAT relies on heuristics and is actively attacked by benign protocols such as STUN, so implementations often have vulnerabilities (there was one highlighted on this site this week).

                1. 3

                  No: I receive SSH bruteforce on my LAN from private IPv4 packets coming from outside, joyfully going through my ISP router from WAN to LAN. Firewalling, in IPv4 and IPv6 alike, prevents that, not NAT alone.

                  1. 2

                    Also Yes: a misconfigured firewall with NAT might have upstream routers not route devices’ individual addresses to it.

                    Also TLS does have session resumption cookies, and maybe rotating the IPv6 address at every request is possible? Not practical though…

                    Good point: How would we implement privacy-focused VPNs with IPv6? The Big Bad NAT?

                    1. 1

                      Also TLS does have session resumption cookies, and maybe rotating the IPv6 address at every request is possible? Not practical though…

                      With what I’ve researched, it’s possible, assuming the server supports them, which a number do not (because a lot of mechanisms like that, in the interest of accessibility or speed, are a compromise to security)

                2. 1

                  Where in the world is there NAT without a stateful firewall denying all incoming connections by default?

                  Ideally, nowhere. In the world I live in? I’ve seen a good handful of people order their firewall incorrectly and end up placing an ALLOW ALL rule as the first in the sequence, meaning they have effectively no firewall.

                  With IPv4, accidentally leaving your firewall wide open like that, assuming you run a network fully behind NAT, would lead to no real issue, since any port not given a NAT rule has no actual destination to pass it to.

                  For the record: I am in no way saying NAT alone should be your security policy. But looking at the design documents, the conclusion of them seems to say that if your firewall dies, for some reason, NAT at least still plays traffic cop (well, maybe more like traffic controller)

                3. 1

                  NAT is the bane of P2P, and is something that, yes, I do indeed understand why it’s not considered part of IPv6.

                  However, the ability to have, at current count, 6 different hosts all accessible under the same IP with just a port number to deal with is nice - I don’t need to remember multiple addresses, certainly not stupidly long ones; I just know that anywhere in the world I can type in 96.94.238.189 and that’s the only number I need to memorize. And as far as my research has led me, that’s just not possible in IPv6.

                  1. 2

                    haproxy?

                    1. 1

                      Of which, 90% of my network traffic is routed through. However, HAProxy is not the end-all be-all here, some things it just can’t do:

                      • SSH, which, at least, OpenSSH, does not have support for the PROXY protocol, nor host based routing. I need, currently, 3 separate ports NATted through to deal with the latter. We’ll get to the former in just a moment.
                      • Mail, be it SMTP, POP3, or IMAP, but especially SMTP, where the IP address connecting is massively important (see also: SPF), and, again, Postfix, which is what I’m using as an MTA, does not support the proxy protocol either, to the best of my knowledge.
                      • Very long-run connections, like IRC. While yes, it can handle these (and no, I don’t mean logging; option logasap is a thing), it really doesn’t seem like HAProxy was exactly meant to keep track of connections like that, especially when, once again, my ircd, UnrealIRCd, pays attention to the connecting IP (there is discussion about allowing PROXY support; I believe it’s experimental with WebIRC blocks, but for the entire server it’s not supported)
                      • Anything with enough security reasoning to solidly slap fail2ban on it, such as… SMTP and SSH. Fail2Ban doesn’t understand PROXY, though, admittedly, you only need the application service to understand that and log it. But either way if I’m routing connections through another machine then fail2ban is useless without a fair bit of configuration to make it work cross machine, something that I wrote an entire service for, just to allow my Apache instances to correctly ban IPs at the HAProxy level.

                      As much as HAProxy is an amazing piece of kit that is very functional and flexible, not everything expects, or exactly allows, arbitrary reverse proxies without a lot of fiddling.
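                      For contrast, host-based routing does work for TLS even in plain TCP mode, because the ClientHello carries the target hostname (SNI) in cleartext; SSH has no equivalent field, hence the separate NATted ports. A hedged HAProxy sketch, with made-up hostnames and addresses:

```
# TLS can be routed by hostname because the ClientHello carries SNI;
# SSH exposes no such field, so it cannot be routed this way.
frontend tls_in
    mode tcp
    bind :443
    tcp-request inspect-delay 5s
    tcp-request content accept if { req_ssl_hello_type 1 }
    use_backend web_a if { req_ssl_sni -i a.example.org }
    use_backend web_b if { req_ssl_sni -i b.example.org }

backend web_a
    mode tcp
    server a1 192.0.2.10:443

backend web_b
    mode tcp
    server b1 192.0.2.11:443
```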

                    2. 1

                      Can’t you just use the same prefix for all IPv6 addresses? I imagine something like:

                      aa:bb:cc:dd:ee:ff:gg:hh::1
                      aa:bb:cc:dd:ee:ff:gg:hh::2
                      aa:bb:cc:dd:ee:ff:gg:hh::3
                      aa:bb:cc:dd:ee:ff:gg:hh::4
                      

                      The first part may still be a pain if you have to remember it and type it by hand, but the rest doesn’t sound that bad…

                      1. 1

                        In theory, yes. Also in theory, you should never have to type in an IPv6 address because DNS anyways (let’s just forget that sometimes when I’m configuring a fresh host on the network it has no DNS)

                        And minus setting up network prefix translation for your chosen prefix inside the private space, you’ll be dealing with something out of your control. For example, I just disconnected my phone from wifi. It now has a public IPv6 of 2600:387:c:5113::129. And it’s a lot easier to memorize 24 bits of decimal than 64 bits of hex. Heck, if you run a /24 inside the standard 192.168 prefix range for IPv4, you only really have to remember two numbers: your chosen prefix (in my case, 5), and the IP of the host you want to reach (say, 158). Therefore I can mentally remember that the pair (5, 158) is, say, the new container I just brought up, and I can probably hammer out 192.168.5.158 into my browser’s address bar before I’ve even fully recalled that.

                        IPv6, however, would likely cause me to have to memorize an entire address, or always be going back to my trusty ip a command to copy it. And something like 2600:387:c:5113 as a prefix isn’t something I can really compact, like I can compact 192.168.5 to 5. And being much longer, it’ll take more repetitions to successfully memorize that recall away, meaning I just need to keep my host IP portion in memory.

                        God forbid if any part of that address changes on you though. Hopefully dynamic IP assignments (“from an ISP” dynamic, not “from DHCP” dynamic) won’t be a thing in IPv6 the way they are in IPv4

                  1. 1

                    I wonder if you could partially alleviate this for links without high variance in throughput by throttling the upper PPP connection to a little less than the expected throughput of the lower IP link?

                    Maybe a better idea: switch on ECN bits on the packets just before you put them into PPP, any time the lower TCP connection says its send window is full? I think that might make connections in the upper TCP stack respond to congestion seen by the lower TCP stack much faster.

                    If the outer TCP stack notified you when it detected a lost packet, perhaps you could start seeing ECN bits even sooner. That would require adding more features to the sockets API though, which I guess is a tall order.

                    1. 2

                      The Yggdrasil approach is to use a very high MTU (to try to reduce the number of control messages) and to drop packets if the upper TCP layer sends them faster than the lower TCP layer can send them.

                      1. 1

                        Interesting! Does the larger MTU change the granularity at which clients get feedback (or don’t) about how well their connections are going?

                        Dropping packets is perfectly reasonable; it’s just that ECN seems more elegant because it is nominally supposed to have the same effect on client send rate as a drop, but doesn’t involve throwing away a perfectly good packet that may have already traversed several hops, consuming bandwidth along the way.

                    1. 2

                      Harmful things that are so superfluous and useless that require no alternative:

                      • PoSix locales.

                      I can sort of get not liking POSIX locales. POSIX locales specify encoding, while nowadays you really should only be specifying language and measurements, and just use UTF-8 as the encoding. They also affect the behaviour of functions that are used in software<->software interfaces, which can break stuff*.

                      But “superfluous”? Why would you bother with UTF-8, so much hassle to support multiple languages, just to turn around and not actually support translating the interface? Do they have some sort of alternative method in mind for how to specify one’s preferred, default natural language?

                      * classic “UNIX Hater’s Handbook” talking point; UNIX design is to have one interface that serves double-duty as a UI and an API, resulting in an interface that’s subpar for both
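                      The software↔software breakage is easy to demonstrate: numeric formatting and parsing both follow the process-wide LC_NUMERIC. A small Python sketch (only the always-present “C” locale is actually set, since other locales may not be installed):

```python
import locale

# Formatting and parsing both follow the process-wide LC_NUMERIC
# setting -- fine for UIs, dangerous for file formats and protocols.
locale.setlocale(locale.LC_NUMERIC, "C")
print(locale.str(3.14))          # '3.14'
print(locale.atof("3.14"))       # 3.14

# Under a German locale (if installed) the same calls flip separators:
#   locale.setlocale(locale.LC_NUMERIC, "de_DE.UTF-8")
#   locale.str(3.14)  ->  '3,14'   (a peer expecting '.' now mis-parses)
```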

                      1. 5

                        POSIX locales as harmful is a nice piece of cultural imperialism from lazy programmers:

                        As a native of a non-anglo culture, I can see several ways in which my native culture is better suited for daily use in localized form than the default POSIX American-English culture (e.g. the date format is yyyy.mm.dd, not the… stuff used in the USA, mm/dd/yyyy).

                        Also having support for my own culture is convenient for me. Users should have the right to good things, even if some Plan9 wankers think “oh no, it is inconvenient for the sw author: it needs work to be done!”. Guess the appeal of this attitude is a factor in the success of their toy.

                        1. 2

                          I don’t think people opposed to POSIX locales are opposed to the idea of localisations, but rather to the way POSIX locales implement it. In fact, the arguments I’ve seen against POSIX locales are that they make correct localisations much harder than they need to be.

                          I never worked much with this so I don’t really have an opinion one way or the other, but I don’t think your comment is a fair portrayal of people’s actual arguments.

                          1. 1

                            I’ve got some experience with .Net localizations and web solutions for localizations, and some 15 year old memories of POSIX localization.

                            I think it cannot be done in an elegant way without making everyday tasks overly verbose and cumbersome, as it is a cross-cutting concern for pretty much the whole system, not only business logic (it has its effect there: in Hungary the week starts on Monday, in other places it starts on Sunday, and we are still within the western calendaring system…), and also the underlying system/framework.

                            Its effect on frontend/visualization is trivial, that is the only place where I think it is not considered a cross-cutting concern, rather a clear dependency (translations for presentation strings)

                            Having character encoding in the locale is a dirty hack though. For me that is the only part of POSIX locales that is clearly bad.

                        2. 2

                          Pretty sure people aren’t arguing against localization, just that doing it as low level as the libc is not the way to go about it. It also doesn’t help that the API is poorly designed.

                          1. 1

                            Poorly designed APIs are a C/POSIX trademark.

                            As long as number-to-text formatting, date-to-text formatting, and date handling are done in the libc, it must handle localization.

                        1. 6

                          Posts like these always make me feel like I’m living on another planet than some people. Why use docker? Why use a pi-hole at all? Is this all just for the web interface?

                          I personally think it’s much better to run DNSCrypt Proxy and just either point it to an upstream adblocking DNS or host my internal one with its own set of blocklists that use the same lists as the Pi-hole. That could probably even be simplified to a set of firewall rules instead of DNS, or just a local DNS resolver without DNSCrypt.

                          1. 3

                            I signed up for NextDNS about two weeks ago due to some excited Slack chatter about it (and to test my Handshake domain) and I quite like it. I’m gonna see about applying it to my router, if possible, next week.

                            1. 3

                              Honestly I just use one of the public resolvers that does AdBlocking on my phone or mobile device and at home I run an internal resolver that blackholes using the uBlock origin lists and a tiny script that turns it into unbound format. All of these solutions seem… Massively complex for what they really are.
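                              A converter of that kind fits in a dozen lines. A Python sketch, assuming hosts-style input lines and using unbound’s always_nxdomain local-zone type to blackhole names:

```python
# Tiny converter of the kind described: hosts-format blocklist lines in,
# unbound local-zone directives out.

def to_unbound(lines):
    out = []
    for line in lines:
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        parts = line.split()
        # accept both "0.0.0.0 host" pairs and bare hostnames
        host = parts[1] if len(parts) > 1 else parts[0]
        out.append(f'local-zone: "{host}" always_nxdomain')
    return out

print(to_unbound(["0.0.0.0 ads.example.com", "# comment", "tracker.example.net"]))
```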

                              1. 1

                                Oh that’s neat, thanks for sharing!

                                1. 1

                                  Since public resolvers can see DNS requests originating from your network, the privacy impact can be quite severe. I’d suggest choosing your upstream provider wisely. That’s why I’d never choose a public DNS server from Google, for example. Since you are already running unbound, you could also choose to take another way:

                                  I’ve set up unbound to query the root dns servers directly and increased cache size to 128 megs. When the prefetch option is set, cache entries are revalidated before they expire. Not only does this increase privacy, but also dramatically reduces response times for most sites when the cache is warmed up. Be aware that the DNS traffic goes up by around 10 percent or so.
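                                  The setup described maps to a short unbound.conf fragment (sizes mirror the numbers above; tune to taste):

```
server:
    # resolve via the root servers directly (no forward-zone),
    # and keep a large warm cache
    msg-cache-size: 64m
    rrset-cache-size: 128m
    # refresh popular entries shortly before their TTL expires
    prefetch: yes
```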

                              2. 2

                                People don’t understand how things work, so instead of learning how to build something simple, they throw heaps of complex software on top of each other, because that is how things are done in 2020.

                                I too have a cron job that creates an unbound block list. The great thing is that I can easily debug it, because I understand all of it.

                                1. 1

                                  How many devices do you own that talk to the internet?

                                  If it’s literally just me, then I would configure a thing on my laptop and call it done. I live with a bunch of other people, and even if I could individually configure all of their devices (some of them are too locked down for that), I wouldn’t really want to have to learn how to configure ad blocking on six different operating systems from three different vendors.

                                  A centralized solution is actually easier, and it inherently gives ad blocking to everyone. It also has a web interface, so you can teach someone how to turn the ad blocker off if they really, really need to, but turning it off is enough of a pain in the neck that they usually just decide that reading such and such a listicle isn’t worth it.

                                  1. 1

                                    8 physical devices and 30 virtual machines (technically 20 talking to the internet because the others are active directory labs for testing and they switch around depending on my needs). The reality is that if I were in your situation I’d just set my router to give out the DHCP nameserver for dns.adguard.com or to the local resolver to recurse up. That wouldn’t even require software installs but does rely entirely on a third party resolver.

                                    1. 1

                                      That would’ve been an option, too. I did consider it.

                                      OTOH, as you mentioned, “is it just for the web interface?” Yes, that’s one of the biggest reasons.

                                1. 2

                                  The Paradox of the Sandbox

                                  A successful sandbox is self-negating; there’s no safety after everyone gets in.

                                  Operating systems and web browsers are the same thing. To paraphrase Nicholas Nethercote, they’re both “execution environments that happen to have some multimedia capabilities.” In that vein, Google, as an actor, is irrelevant in the steady venn overlap of operating system and web browser. Microsoft feared Netscape for the same reason.

                                  1. 5

                                    Virtually every sandbox is non-nestable. IMO this is core to the issue; if your sandbox is useful but not nestable then you need another sandbox inside it.

                                    Lua contexts are closer to the right kind of thing but are not as generally useful as I would like.

                                    1. 3

                                      Totally! Once everyone is in the same sandbox, then someone makes a new sandbox inside of the old, and the cycle begins anew. Because of that, I wonder if sandboxes are cheap, easy, and wrong. One alternative security model I like is capability-based security.

                                      How do Lua contexts work? I’m not familiar.

                                      1. 2

                                        Most programming languages have a global namespace (e.g. for classes) and if your code calls “fs.Open” it gets the syscall.

                                        Lua lets you craft a new namespace and run other code within it. That namespace could have a different definition of “fs.Open”, and it’s 100% transparent. These are nestable. Code running in a namespace without the network or filesystem defined cannot access those things.
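                                        A rough Python analogue of the shape of the idea (illustration only: unlike a hardened Lua environment, CPython’s exec is famously escapable, so this is not a real security boundary):

```python
# Run code inside a crafted namespace: the only powers it has are the
# names we chose to put there.  Illustration only -- CPython's exec is
# escapable and must not be used as a real sandbox.
sandbox = {
    "__builtins__": {},                  # no ambient globals
    "transform": lambda s: s.upper(),    # the one capability we delegate
}

exec("result = transform('hello')", sandbox)
print(sandbox["result"])                 # HELLO

try:
    exec("open('/etc/passwd')", sandbox)
except NameError:
    print("blocked: 'open' is simply not a name in this namespace")
```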

                                        1. 1

                                          If I understand you correctly, a nested context is a strict subset of its surrounding context with no way to jailbreak. Very cool!

                                          1. 1

                                            Yep! Main issue is that it wasn’t designed for untrusted code (e.g. the interpreter isn’t hardened) or non-Lua code (limiting usefulness). Still very cool.

                                            1. 2

                                              Another issue, which is tackled directly by communicating-event-loop designs, is how to avoid plan interference, the legendary concurrency bug class. It is important not just to be able to run code with a new context of objects, which can limit the authority to invoke various powers, but also to be able to run the code with a new execution context (a new continuation/thread/etc.), so that the containing code is not denied its own ability to manage its invariants.

                                              I think that some Lua environments handle this, and they do it through communicating event loops just like E.

                                              1. 1

                                                This is really neat. Feels like a dynamic analogue to effect systems in statically typed FP langs.

                                                1. 0

                                                  FWIW Oil will likely grow this subinterpreter feature, which Lua and Tcl have. (In contrast, there have been many attempts to put it into Python, but the code fundamentally isn’t architected that way.)

                                                  https://github.com/oilshell/oil/issues/704 (a bunch of links here about Tcl, node.js, and so forth)

                                                  Use cases:

                                                  • evaluating untrusted config files (similar to Lua’s original use case)
                                                  • writing interactive shells in Oil, and separating user state from shell state
                                                  • maybe: Lua-like “process”-like concurrency with states and threads (not sure if anyone uses this, but it’s in a Lua paper)
                                            2. 2

                                              Since you said the magic phrase, I should fork the comment thread here to note that capability-based security properties are relatively cheap in formal settings. In particular, the ability to isolate one computation from another is free in all of the pure lambda-calculi.

                                              This has an immediate and attractive suggestion for language design, which I want to avoid mystifying: Consider the ability of one object to interfere with another unsuspecting object as an impurity or side effect. That includes function calls! We do have to work to ensure that some objects are sufficiently tame so as to not commit side effects; this is usually called freezing and the resulting objects are not just immutable, but transitively immutable and unable to store private references for any reason. There will be no hidden caches, debugging routines, timers, or other potential side channels.

                                              We don’t have to have all objects be frozen. Instead, we can hope that the objects which represent possible behaviors are frozen; this then allows us to instantiate objects we know to be safe, and combine them with those frozen objects, and know that the worst that can happen are the normal Turing-complete things. Specifically, I think that when modules are frozen objects, then code loading can be as (un)safe as the user desires. The user can design their own sandbox, and be confident that the programs inside that sandbox will not be able to import any outside references.
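
                                              To make the freezing idea concrete in more mainstream terms, here is a rough Rust sketch (my own illustration, not from any capability system): a shared reference to a type with no interior mutability is transitively read-only, which approximates one of the properties frozen objects need.

                                              ```rust
                                              // Rough analogue of a "frozen" object: Config contains no
                                              // Cell/RefCell/Mutex, so a &Config is transitively read-only.
                                              #[derive(Debug)]
                                              struct Config {
                                                  retries: u32,
                                              }

                                              // Untrusted code gets only a shared reference; it can read,
                                              // but it cannot mutate or stash a writable handle.
                                              fn untrusted(view: &Config) -> u32 {
                                                  // view.retries = 0; // would not compile: cannot assign through `&`
                                                  view.retries
                                              }

                                              fn main() {
                                                  let c = Config { retries: 3 };
                                                  assert_eq!(untrusted(&c), 3);
                                                  println!("{:?}", c);
                                              }
                                              ```

                                              This is only an approximation: real frozen objects also rule out hidden references and side channels, which Rust does not promise by itself.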

                                              Right now, the main design problem is the one that both you and Munroe refer to. The missing solution which will end the cycle is a common interchange format for object references, so that delegations of authority can happen in a truly uniform fashion; this breaks the cycle at Munroe’s northern arrow by suggesting that we can interchange limited authority between existing power structures without requiring both structures to be embedded within a single common context. Of course, literally every power broker on the planet would rather that this not happen, and so instead we are stuck in our current situation, where we must import each proprietary API by hand and integrate its object model to our desired approximation.

                                              (This last bit tugs at a philosophical pondering I have been aiming to reasonably justify for some time. If some Alice and Bob have a dispute, then justice should consist of both Alice and Bob being satisfied in the arrangement, regardless of who they are. (This is the famous cake-cutting concept.) But then, if some judge Judy is summoned to adjudicate the dispute and agrees, or if she shows up of her own volition and intercedes, then surely justice should consist of all of Alice, Bob, and Judy being satisfied in the arrangement. Otherwise, Judy may well use some private knowledge to deprive both Alice and Bob of what would otherwise have been equitable.)

                                          2. 1

                                            A successful sandbox is self-negating; there’s no safety after everyone gets in.

                                            Isn’t that what the Same Origin Policy is supposed to resolve? That is, if you want more sandboxes, use more domain names.

                                            1. 1

                                              Yes, and do you think it has been successful? I don’t.

                                          1. 2

                                            I was already opposed to copyleft when I read this post, but I found it very convincing and I feel like my position has shifted a little further towards being opposed to intellectual property altogether. I’m not quite convinced that the benefit of dual licensing beyond MIT outweighs the cost of not having Github parse and display the licenses properly, but I have more appreciation for the intent behind the UNLICENSE than before.

                                            Thanks for writing this, burntsushi. Also thank you for having such calm and thoughtful responses to all of the comments in this thread, it was a very pleasant read.

                                            1. 1

                                                towards being opposed to intellectual property altogether

                                              copyright abolitionists unite!

                                              I am happy to use copyleft as a strategy for now, but if we ever got close to a world where we could just weaken or end copyright I would jump on that bandwagon so fast.

                                              1. 5

                                                I only started being opposed to copyleft after this FSF / Pirate Party debacle. It seems like a stark case where the means to the end wind up becoming the ends unto themselves: the FSF opposes shortened copyright terms because it shortens the applicable time limit of the GPL.

                                                Never mind that, if someone’s use case is fine with a five year old version of Linux, their use case would probably be satisfied with a BSD flavor anyway (in other words, I think the biggest impediment to just cloning a GPL’ed application to form a proprietary or MIT-licensed equivalent is if it has a constant stream of valuable updates). Never mind that SaaS allows software to remain proprietary with no intellectual property protection at all. The FSF basically decided that it was okay to harm the cultural commons in order to prevent companies from making proprietary forks of outdated versions of GPL’ed software.

                                                1. 1

                                                  There are many things where the FSF and I disagree. This is certainly one of them.

                                                2. 1

                                                  Is there anything that would convince you that copyleft is counter-productive to abolishing copyright?

                                                  1. 1

                                                    Given that that was my original reason for opposing copyleft and I’ve since switched my position, it seems unlikely ;)

                                                    Abolishing copyright would of course make my copyleft strategies stop working, but that’s fine in the service of the greater goal. No strategy lasts forever.

                                              1. 1

                                                With all the things that are false about names, time, and now addresses, I wonder if it’d be easier just to list all the things that are true.

                                                1. 1

                                                  Are you worried about what the post office will expect, or do you have some other use case for the addresses?

                                                1. 8

                                                  I’ve noticed too that building Rust programs from isolated components/crates with well-defined interfaces is very convenient. I can develop and test each component individually. When they’re not a part of a monolith, I don’t need dependency injection to test them. And then putting components together is as easy as building with Lego.

                                                  Many crates can be small enough to actually be reusable across projects. The larger the library, the more you need to make it flexible and configurable, which adds complexity. But when a crate is trivial, you either reuse it or you don’t, so they can stay simple and focused.

                                                  1. 29

                                                    No offense, but isn’t that how it works with every language? You build libraries, which do their own thing, are tested individually. And then later you connect them all, which is your main app. But that’s easy, since the individual parts are very likely to work.

                                                    1. 28

                                                      On a high level it’s supposed to be like that everywhere, but I see qualitative differences:

                                                      • In C and C++ the culture is to actively avoid having “unnecessary” dependencies. They’re considered a hassle and a liability, so they’re used only if it’s too hard to avoid them. That requires dependencies to be complex or large enough to justify their existence.

                                                        In C, splitting a project into separate translation units or even libraries gives almost no isolation: linkable symbols are global, and nothing stops the wrong part of the code from pulling in a header it shouldn’t have. In C, proper isolation is something that requires discipline from the programmer, not a tool that keeps lazy programmers in check.

                                                      • npm has the same culture of small modules (and people mock it for left-pad and is-array). The build experience is mostly similar to Cargo’s, but Rust’s strong type system adds an extra level of assurance. With npm I miss things like docs.rs and guarantees around immutability and borrowing, so in JS I use more defensive coding, and I’m more worried that if I change the implementation of one module I’m going to break some other module.

                                                      I don’t have much experience with Java, but the few projects I’ve worked on were a single monolith with DI, and used libraries only for 3rd party code, not its own. In PHP it was like that too: monolith on top of a framework + maybe a few libraries for specific things. Microservices are closest to the level of internal splitting Rust projects do.
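
                                                      To illustrate the contrast with C’s global symbol namespace mentioned above, a minimal Rust sketch (module and function names are hypothetical): privacy is checked by the compiler rather than left to programmer discipline.

                                                      ```rust
                                                      // Sketch: Rust modules give real isolation. Private items
                                                      // cannot be reached even by accident, unlike C, where any
                                                      // translation unit can link against any symbol.
                                                      mod storage {
                                                          pub fn open() -> &'static str {
                                                              "opened"
                                                          }

                                                          // Not `pub`: invisible outside this module.
                                                          #[allow(dead_code)]
                                                          fn internal_detail() {}
                                                      }

                                                      fn main() {
                                                          assert_eq!(storage::open(), "opened");
                                                          // storage::internal_detail(); // error[E0603]: function is private
                                                      }
                                                      ```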

                                                      1. 8

                                                        In C and C++ the culture is to actively avoid having “unnecessary” dependencies. They’re considered a hassle and a liability, so they’re used only if it’s too hard to avoid them. That requires dependencies to be complex or large enough to justify their existence.

                                                        I see no problem with avoiding “unnecessary” dependencies. Every additional line of code is a potential liability (both technically and legally) in any language, and an additional risk that something will break in the future. It’s just that in C and C++ dealing with dependencies is more painful than “add a line to a dependency file”, so you think about it more.

                                                        1. 6

                                                          That’s exactly my point: the C/C++ view is that dependencies are a liability, while Cargo made them work well enough that Rust users see them as a good thing to have.

                                                          an additional risk that something will break in the future

                                                          The alternative view is that dependencies lower the risk of your software breaking in the future, because they’ve been tested by multiple people, on multiple platforms. If they need a fix, someone will patch them before you even realize you needed this (e.g. I don’t know Windows well, so I get better Windows support if I use deps than if I write something myself. Conversely, I send patches to other crates that improve macOS support, and everyone benefits).

                                                          Rust projects choose to split themselves into multiple libraries, because there’s very little downside, and projects benefit from enforced decoupling and easier development on the “leaf” components. In C/C++ you wouldn’t do that, because it seems like a weird thing to do and wasted effort that just makes build scripts more complex. Tooling and culture makes all the difference.
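
                                                          As a sketch of how cheap that split is with Cargo (hypothetical crate names; a workspace is one way to do it):

                                                          ```toml
                                                          # Workspace sketch: one repository, several small crates that
                                                          # are developed and tested independently but built together.
                                                          [workspace]
                                                          members = [
                                                              "storage",   # leaf crate: no dependency on the app
                                                              "indexer",   # depends only on storage's public interface
                                                              "app",       # thin binary that wires the pieces together
                                                          ]
                                                          ```

                                                          Each member gets its own tests and its own dependency list, and `cargo build` at the top still builds everything in one step.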

                                                          1. 6

                                                            while Cargo made them work well enough that Rust users see them as a good thing to have.

                                                            I haven’t used Rust but I’m curious; how much of that is Cargo and how much of that is “having a type system that’s not rooted in 1970s ideas”?

                                                            1. 18

                                                              Pedantic note: don’t blame the 1970s, Hindley published his paper in 1969 and Milner in 1978.

                                                              1. 5

                                                                To expand on the modularity section of the post, I think this is fundamentally rooted in the module system. Which is, in Rust, mostly orthogonal to the type system (unlike OCaml, modules are not first-class values), but is very much a language concern. Getting rid of a global namespace of symbols is important. The insight about cyclic/non-cyclic dependencies both being useful, and the compartmentalization of the two kinds into modules and crates, is invaluable.

                                                                It’s interesting though that the nuts and bolts of the module system are not perfect – it is way too sophisticated for the task it achieves. It’s a shame to spend so much complexity budget on something that could’ve been boring. (I won’t go into the details here, but suffice it to say that Rust has two flavors of module system, 2015 and 2018, because the first try was empirically confusing.) But these annoyances do not negate the fundamentally right structure.

                                                                However, I also do think that the theory of a dependency-hell-less library ecosystem would be useless without Cargo being there and making it easy to put everything into practice. A lot of thought went into designing Cargo’s UX, and I doubt many people would notice the language-level awesomeness if not for this tool.

                                                                1. 4

                                                                  It’s a combination of:

                                                                  • ecosystem taking semver seriously (compare to amount of work Linux distros need to put in to keep sem-whatever packages work together)
                                                                  • compiler taking 1.0 back-compat seriously (in contrast, Node.js has semver-major breaking changes regularly, and my old JS projects just don’t run any more)
                                                                  • the concept of crates exists in the language (e.g. Rust can have the equivalent of -std=cXY set by each dependency individually. C/C++ can’t quite do it, because headers are shared)
                                                                  • not giving up on Windows as a first-class supported platform (in C it’s “not my problem that Windows sucks”. Cargo chose to make it its problem to fix)
                                                                  • zero-effort built-in unit testing. There’s no excuse for not having tests :)

                                                                  Type system also helps, because e.g. thread-safety is described in types, rather than documentation prose. Rust also made some conscious design decisions to favor libraries, e.g. borrow checking is based only on interfaces to avoid exposing implementation details (this wasn’t an easy decision, because it makes getters/setters overly restrictive in Rust).
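
                                                                  As a small illustration of thread-safety being described in types (my own sketch, using std’s Arc and Rc):

                                                                  ```rust
                                                                  use std::rc::Rc;
                                                                  use std::sync::Arc;
                                                                  use std::thread;

                                                                  fn main() {
                                                                      // Arc<T> is Send, so sharing it across a thread
                                                                      // boundary compiles.
                                                                      let shared = Arc::new(vec![1, 2, 3]);
                                                                      let clone = Arc::clone(&shared);
                                                                      let handle = thread::spawn(move || clone.iter().sum::<i32>());
                                                                      assert_eq!(handle.join().unwrap(), 6);

                                                                      // Rc<T> is not Send; the commented line is a compile
                                                                      // error, not a data race found in production.
                                                                      let local = Rc::new(1);
                                                                      // thread::spawn(move || *local); // error[E0277]: `Rc<i32>` cannot be sent between threads
                                                                      assert_eq!(*local, 1);
                                                                  }
                                                                  ```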

                                                                  1. 5

                                                                    I’m not sure the Rust compiler, or its surrounding crates ecosystem, take backwards compat “more seriously” than Node does. They just have better tools available to them to detect and fix it. Take this lovely warning, for instance (yes, this is “real” code, not a contrived test case, though I had to check out an old commit to find it because the current version of the program is fixed):

                                                                    warning: cannot borrow `block` as mutable because it is also borrowed as immutable
                                                                    
                                                                        |
                                                                    687 |                 let variable_id = match &block.instructions[place.0 as usize] {
                                                                        |                                          ------------------ immutable borrow occurs here
                                                                    ...
                                                                    691 |                 stack.push(block.push(Instruction::Assign(*variable_id, value)));
                                                                        |                            ^^^^^                          ------------ immutable borrow later used here
                                                                        |                            |
                                                                        |                            mutable borrow occurs here
                                                                        |
                                                                        = note: `#[warn(mutable_borrow_reservation_conflict)]` on by default
                                                                        = warning: this borrowing pattern was not meant to be accepted, and may become a hard error in the future
                                                                        = note: for more information, see issue #59159 <https://github.com/rust-lang/rust/issues/59159>
                                                                    

                                                                    Rust is capitalizing on a lot of technical advantages, here:

                                                                    • It’s able to detect the problem with just static analysis. Comparable problems in Node would probably not be detectable until runtime, which means the problematic code has to actually be reached in order to detect the problem, at which point it might be too late.

                                                                    • Compiler warnings from Rust are precise, rare, and produced around the same time as compiler errors (which you have to pay attention to in order to produce a working program), so Rust has an open mic to communicate deprecation warnings to the developer.

                                                                    • Even if I slept through all of the warnings and got hit with breakage, it would result in my program no longer compiling, not silently changing behaviour or crashing in production.

                                                                    1. 1

                                                                      the concept of crates exists in the language (e.g. Rust can have the equivalent of -std=cXY set by each dependency individually. C/C++ can’t quite do it, because headers are shared)

                                                                      Can you explain what this means in language-agnostic terms? I don’t know C or C++.

                                                                      1. 3

                                                                        You can use a crate using the “abc” version of the language from a crate using the “def” version of the language. You can’t do this in C/C++ (or really any other language I know of).

                                                                        Rust calls these versions editions; C/C++ calls them… not really sure, but C++11/C++17 and so on.
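
                                                                        A minimal sketch of how that looks in practice (hypothetical crate names): each crate’s Cargo.toml pins its own edition, and crates on different editions link together without any coordination.

                                                                        ```toml
                                                                        # A crate that stays on the 2015 edition of the language...
                                                                        [package]
                                                                        name = "legacy_helper"
                                                                        version = "0.1.0"
                                                                        edition = "2015"

                                                                        [dependencies]
                                                                        # ...can freely depend on a crate written in edition 2018;
                                                                        # each crate is compiled under its own edition's rules.
                                                                        modern_lib = "1.0"
                                                                        ```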

                                                                        1. 1

                                                                          You can use a crate using the “abc” version of the language from a crate using the “def” version of the language. You can’t do this in C/C++ (or really any other language I know of).

                                                                          Indeed, but you can also use an ABI in C/C++. I’m not excusing the hell that is dependency management in C/C++ projects, but this certainly isn’t specific to Rust. C and C++ can also interact with binary code written in vastly different languages and runtimes, as long as they conform to the ABI of the platform they’re shipping on. Rust cannot do this without work.

                                                                          1. 2

                                                                            I’m not sure why this is a reply to me. It has nothing to do with language versions.

                                                                            It’s also not really true.

                                                                            All of Rust, C, and C++ speak the various “C” ABIs with about the same level of work. You have to write a language-specific description of the calls in the ABI; after that, they can both seamlessly call into it.

                                                                            In Rust this is an extern block that looks like extern "C" { fn abi_fn(x: i32) -> i32; }; in C it looks like extern int32_t abi_fn(int32_t);.

                                                                      2. 1

                                                                        not giving up on Windows as a first-class supported platform (in C it’s “not my problem that Windows sucks”. Cargo chose to make it its problem to fix)

                                                                        I’ve used Visual Studio for both C and C++ dev on Windows. How does Rust offer better Windows support than C/C++ ? Are you referring to the ease of cross-compilation?

                                                                        1. 6

                                                                          The problem is that Visual Studio doesn’t work on non-Windows systems, and Unix tools like autotools and pkg-config don’t work on Windows (or have quirky ports in mingw/cygwin), so it’s hard to make a project that builds on both. Package management on Windows is a fragmented mess. MSVC has different flags, pragmas and system headers than gcc and clang. C support in MSVC is incomplete and buggy, because Microsoft thinks C (not ++) is not worth supporting. It’s nothing insurmountable, but these are thousands of paper cuts.

                                                                          OTOH: cargo build works on Windows the same way as on any other platform. Unless you’re doing something very system-specific, it just works. And even if you touch something that’s system-specific, chances are there’s already a dependency you can use to abstract that away.

                                                                          Cross-compilation in Rust is not as nice as I’d like. While Rust itself can cross-compile object files and static libraries easily, they need linking and system libraries. Rust uses the C toolchain for linking, so it inherits many of C’s cross-compilation pains.

                                                                          1. 1

                                                                            Oh, I see. While I’m not always the biggest fan of Rust’s insistence on doing everything in Rust, I agree that this is a huge win for Rust and Cargo ergonomics. When writing C/C++ on Windows, you are basically writing for a different C runtime, so you need to tailor it accordingly. Rust’s libraries abstract away the differences (though not without great effort, especially if you look at TUI libraries). I really hope other languages continue to explore this space.

                                                                          2. 3

                                                                            C programs written for unix-like operating systems often ignore Windows support entirely. Hence WSL, Cygwin, and so on. Likewise I’m sure few of your Windows C/C++ programs would run on Linux or MacOS unless you exclusively use cross platform libraries and avoid any Windows-specific interfaces.

                                                                            1. 1

                                                                              Yeah, most definitely. And even the tooling is dramatically different outside of the IDE, with stuff like nmake.

                                                                      3. 2

                                                                        In which way do they work better than in other languages and why?

                                                                        1. 1

                                                                          My point is that every line of code that you write, or that you use, might have bugs or vulnerabilities in it. Dependencies don’t come for free. Any of these dependencies could also have its maintainer just walk away, which has bitten me multiple times in the past.

                                                                          Whenever you bring in a dependency, you’re also implicitly depending on all of the dependencies of that thing as well. Tracking dependencies down and being responsible for them is a major issue with the Rust cargo mechanism in that it makes it more difficult for people who have legal liability for code they bring into a project, such as 3rd party contractors.

                                                                    2. 7

                                                                      Yes and no.

                                                                      Heylucas correctly mentions tooling, but it’s also the language.

                                                                      If I’m programming in C or Python and using a library, I have to carefully read the documentation to know what I can pass where without breaking things. If it’s Java and I’m using a library, I have to keep track of a few things like nulls… by carefully reading the documentation. Rust libraries for the most part actually manage to make it difficult to misuse them, so I can be much less careful about reading the documentation.

                                                                      Moreover, on the documentation point, Rust as a language lends itself to concise and accurate auto-generated documentation much more than most languages do, so Rust libraries will generally have better documentation despite it being less important. Partially this is tooling, but it is also things like not having SFINAE (C++), not having automatic implementation of interfaces (Go), and so on.

                                                                      It is also my belief that the average Rust library has fewer bugs than the average library in the average other language, as a result of the language’s greater focus on helping programmers write correct programs (compared to most other languages, which generally focus more on writing programs quickly). As a result, the “giant ball of libraries” model is less likely to be a giant buggy mess of other people’s code.
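
                                                                      One concrete flavor of “difficult to misuse” (my own sketch, using the standard library’s NonZeroU32): a precondition that would live in documentation prose elsewhere becomes part of the function signature.

                                                                      ```rust
                                                                      use std::num::NonZeroU32;

                                                                      // The docs-only rule "divisor must not be zero" becomes
                                                                      // unrepresentable: you cannot construct the argument from zero.
                                                                      fn divide(dividend: u32, divisor: NonZeroU32) -> u32 {
                                                                          dividend / divisor.get()
                                                                      }

                                                                      fn main() {
                                                                          let divisor = NonZeroU32::new(4).expect("4 is nonzero");
                                                                          assert_eq!(divide(12, divisor), 3);
                                                                          // divide(12, 0) would not compile: expected `NonZeroU32`, found integer
                                                                      }
                                                                      ```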

                                                                      1. 5

                                                                        I’m guessing @kornel’s point is more about how easy that is with Rust’s tooling. But I agree, the same can be done for any other language. How easy/convenient it is might depend on the tooling ecosystem.

                                                                        1. 1

                                                                          People have their brains so plagued with frameworks that they don’t know how to use their language. Importing a library and testing it by simply calling it is something many developers (cough cough Java, C++) have already forgotten how to do, or were never familiar with to start with.

                                                                          It is a programming language. Whether dependency injection is needed is simply a function of how isolated side effects are. There’s nothing language-specific there.

                                                                      1. 8

                                                                        Regarding sum/product, I recently saw this: https://mail.haskell.org/pipermail/libraries/2020-October/030862.html

                                                                        It seems like there’s a proposal to make sum and product strict. So perhaps some time in the future Haskell will have one less wart :)

                                                                        1. 8

                                                                          Yep, and a merge request is in the works as well, so we’re pretty much done with that point. :)

                                                                          1. 2

                                                                            These seem like some good improvements!

                                                                            Let’s hope that other programming communities (like Rust) can use this as a learning opportunity!

                                                                            1. 0

                                                                              sum, product, and fold are already strict in Rust.

                                                                              1. 1

                                                                                I meant fixing things in general.

                                                                          1. 3
                                                                            • mosh
                                                                            • iotop
                                                                            • iftop
                                                                            • mytop
                                                                            • goaccess
                                                                            • ripgrep
                                                                            1. 1

                                                                              +1 for mosh. In combination with tmux, unbeatable.

                                                                            1. 5

                                                                              Yes.

                                                                              For me a big issue is Unicode support. For example with a language like Go, you can do something like this:

                                                                              package main
                                                                              func main() {
                                                                                 println("😀")
                                                                              }
                                                                              

                                                                              and it just works. Same goes for C#, D, Dart, Nim, PHP, Python, Ruby, Rust, and probably others. But with C++, you have to do this:

                                                                              #include <codecvt>
                                                                              #include <iostream>
                                                                              int main() {
                                                                                 std::ios_base::sync_with_stdio(false);
                                                                                 std::locale utf8(std::locale(), new std::codecvt_utf8_utf16<wchar_t>);
                                                                                 std::wcout.imbue(utf8);
                                                                                 std::wcout << L"😀" << std::endl;
                                                                              }
                                                                              

                                                                              Somehow C++ is 2 decades older than some of these languages, yet it’s a decade behind in Unicode support. That is not acceptable.

                                                                              1. 2

                                                                                But with C++, you have to …

                                                                                Bullshit.

                                                                                This works fine:

                                                                                #include <cstdio>
                                                                                int main(){
                                                                                	puts("😀");
                                                                                }
                                                                                

                                                                                as does loads of other things.

                                                                                1. 1

                                                                                  Only if your platform defines C strings as UTF8. Windows doesn’t guarantee this, and, in fact, only added support for it recently. Neither does the Linux Standard Base, or POSIX, though some distributions have dropped support for non-UTF8 locales, so you can safely assume UTF8 if you’re on Ubuntu or something.

                                                                                  1. 2

                                                                                    I think this does work to the same extent it works in, e.g., Go and Rust? Just shoves UTF-8 bytes into fd 1. So, the upthread comparison does seem a bit unfair to C++ to me.

                                                                                      1. 1

                                                                                        Oh wow, totally didn’t expect that, thanks for the pointer!

                                                                                    1. 1

                                                                                      Prove it: Take a screenshot of an environment where golang produces the correct output and my C++ example doesn’t.

                                                                                      I run my programs on computers, and I think that code I posted will run correctly on anything you can buy in a store today, or your money back.

                                                                                          1. 2

                                                                                            I’d rather use wcout than mess with the registry. Messing with the registry hurts your application’s ability to coexist with other applications on the same machine, only works on Windows, and requires admin access.

                                                                                            1. 1

                                                                                              You can also do it from the Control Panel

                                                                                              https://stackoverflow.com/questions/56419639

                                                                                              1. 2

                                                                                                That’s not better. I write stuff for Windows that needs to run on computers that other people own. I am not going to ask them to enable experimental features in the Control Panel.

                                                                                          2. 0

                                                                                            Are you aware rust and golang are different languages?

                                                                                      1. 1

                                                                                        You are right.

                                                                                        I am not sure what was giving me trouble previously. Maybe I was using an old compiler. I have tried many combinations trying to get your example to break, but it does work, as long as I have at least Windows 10 1903 with at least Windows Terminal 0.3.2142, and then set Unicode:

                                                                                        Windows Registry Editor Version 5.00
                                                                                        [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage]
                                                                                        "OEMCP"="65001"
                                                                                        

                                                                                        and restart. After that your code does work.

                                                                                    1. 1

                                                                                      I went off to the article hoping to learn how to do pagination “correctly”.

                                                                                      Unfortunately, TFA makes points only about performance. The points are valid and good in that respect, though.

                                                                                      My hunch is that it’s quite impossible to have a reasonably performant (in time and space) paginated API design, because you would need to maintain a snapshot of the list of things to be paginated for every single client. In many systems, you do not know whether your client is still live and still interested in that list, so you need to keep these snapshots around for a long time.

                                                                                      I think pagination only kinda sorta works for human consumption, especially when the update rate is slow, say for a blog. For machine-consumed lists, pagination is a recipe for disaster. As a client it’s typically impossible to get a coherent list; as a server it’s too expensive to provide such coherent views.

                                                                                      1. 2

                                                                                        This approach always gives you a plausible list of items. It tends to not give you items repeatedly if someone is adding new items to the start of the result set.

                                                                                        If you want a snapshot, ask for a snapshot. Add data versioning to the API. Page through the list asking for the rows as they existed at the time you started querying. This works fine for browsing, say, Wikipedia or an svn repo. :)

                                                                                        1. 1

                                                                                          That’s the thing that makes cursor pagination good if, and only if, you’re showing in something like chronological or reverse chronological order. Algorithmic order, like on Lobsters, really can’t give you good pagination without taking a full snapshot.

                                                                                      1. 1

                                                                                        This article pretty much overlaps with my attitude towards third-party-loaded stuff.

                                                                                        But, if you really have to load something from a CDN, just put it on a subdomain of your org and set up a reverse proxy for it. For example, if your site is babecook.com, just add scripts.babecook.com or assets.babecook.com.

                                                                                        Not bbcdn.com, babeasetts.com or babecookusercontent.com. How the hell does that even add up in yearly domain / SSL cert costs?

                                                                                        This approach opens several possibilities:

                                                                                        • We could finally make a “first party only” policy work for real and be able to browse sites when all other domains are denied.
                                                                                        • You can replace CDNs without any issues or code updates, and you can even store content on multiple CDN providers round-robin switching for cost optimization
                                                                                        • You would spend less on domains and SSL wildcards
                                                                                        • If something goes wrong, you can simply start to host assets on your own or move infrastructure behind that into whatever you want as long as domain stays unchanged
                                                                                        • This adds a somewhat cleaner visual indication of what is loaded or not, for example: scripts.babecook.com/js/jquery/1.2.3/jquery-1.2.3.min.js is much more understandable than bbcdn.net/5d41402abc4b2a76b9719d911017c592.js.
                                                                                        1. 4

                                                                                          As a general security principle, you don’t want to let potentially active content controlled by someone else be served from your domain or any subdomain thereof. This is why so many sites use a different domain (or sometimes just different TLD with same “brand” name, like github.com versus github.io) for their asset hosting/any user-generated content.

                                                                                          1. 3

                                                                                            Not bbcdn.com, babeasetts.com or babecookusercontent.com. How the hell does that even add up in yearly domain / SSL cert costs?

                                                                                            They’re doing that because they want to avoid sending cookies for every tiny avatar pic. Being in a different origin is literally the whole purpose of doing it that way.

                                                                                            1. 3

                                                                                              It’s a security thing. You don’t want to host potentially active user content under a domain that has auth cookies.

                                                                                              1. 2

                                                                                                HTTP/2 has reversed a lot of such best practices. Now cookie headers cost almost nothing thanks to HPACK, sharding is obsolete, and extra domains add round trips and interfere with H/2 prioritization.

                                                                                              2. 1

                                                                                                How the hell does that even add up in yearly domain / SSL cert costs?

                                                                                                Probably about $10 per domain per year in total?

                                                                                                1. 10

                                                                                                  We could also remove Medium-hosted sites entirely. I hadn’t realized they have a quota paywall when I made that last comment; I thought it was only tacky position: absolute footers and such. I sent Medium a note around then saying we were considering it to ask them to respond, but I never heard back.

                                                                                                  We’ve previously removed links that don’t have anything to read like an ad for a book or a lead gen form that offers a report in exchange for an email address, and a paywall that appears after a few clicks seems like it may fit in the same bucket. The partner program carrying poor incentives is a bit more reason to ban. Any more reasons to? Any reasons not to?

                                                                                                  (Related: We strip Medium’s undocumented “friend links” because the parameter looks like ad attribution and we have no idea what its effects or limitations are. Absent a compelling answer from Medium I’m reluctant to permit it.)

                                                                                                  1. 3

                                                                                                    We strip Medium’s undocumented “friend links” because the parameter looks like ad attribution and we have no idea what its effects or limitations are. Absent a compelling answer from Medium I’m reluctant to permit it

                                                                                                    LWN Subscriber Links are allowed, so there’s precedent. The only difference is that Medium’s friend links use a query instead of being part of the path.

                                                                                                    1. 1

                                                                                                      I think that @pushcx’s view is that the difference between LWN and Medium is that LWN has documented its Subscriber Links (https://lwn.net/SubscriberLink/MakeLink) whereas Medium apparently has not.

                                                                                                    2. 1

                                                                                                      Stripping Medium’s friend links functionally adds a paywall where one previously (with an unmodified URL) would not exist. So, in my opinion, the problems people have with Medium links on lobste.rs come not from Medium but from the automatic stripping of friend links. Were the situation inverted – i.e., if a parameter, when present, produced a paywall on some site – it would be obvious that it would not be acceptable for lobste.rs to add this parameter automatically to all submitted links.

                                                                                                      We could quite reasonably ban Medium links that do not have a friend link parameter attached – i.e., links that go to paywalled sites but do not allow readers to bypass that paywall. This gets around the primary problem people have with Medium – that some articles are subject to the metered paywall. (It does not solve the other problems people have, which seem to revolve around the use of javascript, but we don’t ban other javascript-heavy sites either.)

                                                                                                      1. 1

                                                                                                        I think you’re right. Medium links without a Friend Link parameter are the problem. I think @pushcx is taking the stand that, without documentation, Friend Links are indistinguishable from ad attributions, and ad attributions should be stripped. Maybe the desire to strip something that looks like an undocumented “Friend Link parameter” should be relaxed?

                                                                                                        1. 2

                                                                                                          From what little we know, only the original author can create them. So allowing them certainly wouldn’t flip all Medium links to work because only a minority are submitted by the author, who may not even think to generate one.

                                                                                                          Banning Medium links without the parameter might work but feels like it would exacerbate the poor incentives of the partner program. I dunno, it’s hard to feel confident here when we can’t know much. I lean away from coding a feature to help a single site that pays for distribution to attribute our traffic when I’ve been putting work into making Lobsters less attractive to marketers. How is the error message for a non-author going to explain the situation, and what next steps would it suggest? How would we justify blocking other attribution methods if we privilege Medium’s?

                                                                                                          1. 1

                                                                                                            I mean, lobste.rs doesn’t block sites that host ads, nor does it block other sites where authors get paid based on number of views.

                                                                                                            Our goal is just to prevent spam, right? In other words, the problem here is a visible paywall that keeps users from viewing perfectly good content – and if the content isn’t good, then it should be blocked for that reason and not because of the host.

                                                                                                            1. 1

                                                                                                              Are the criteria/goals documented somewhere? I don’t think the goal is just to prevent spam. There is also a goal of making lobste.rs less attractive to marketers/marketeers (serving some larger goal of preventing spam?). Maybe having a well-defined set of goals (if there isn’t one already) would help clarify how to approach these issues?

                                                                                                              1. 1

                                                                                                                I’m not sure.

                                                                                                                How is reducing spam not the same thing as being unattractive to marketers, in this case? (Unless one makes the relatively weak claim that some advertisements are ‘desirable’ and therefore ‘not spam’ – but this would have the opposite effect and result in a culture more like HN’s.) If you squash advertising, then advertising (including all spam) is squashed.

                                                                                                                I’m trying to figure out why lobste.rs would want a policy that makes people pay for things over a policy that makes people not pay for things.

                                                                                                                The claim that a friends link would aid in tracking is dubious: there’s only one friends link code per story, created by the author; while medium’s stat page shows the number of views through a friends link, it does not allow you to cross-reference that information with referer, date, read versus view, or anything else. As far as I can tell, this feature exists solely to allow people who get the link from the author (rather than a google search or an internal medium recommendation) to bypass the paywall as though they were paying members of the site – the rough equivalent of the token parameter added to private google docs to allow people with a particular link to view the document. (Surely if somebody discovered that was being stripped, we’d add an exception, right?)

                                                                                                    1. 1

                                                                                                      I actually really like my caps lock key. I sometimes think about remapping it but then would realize I’d miss it a lot if it was gone… it is really useful for trying to type one-handed, which I’ve been doing a lot more of lately with the baby and all.

                                                                                                      1. 1

                                                                                                        I really like using a remapped caps lock combined with Sticky Keys. It means I can get a shift-lock by tapping the shift key twice.

                                                                                                        1. 1

                                                                                                          I’ve found casually mentioning “I use caps lock sometimes” brings people out of the woodwork really fast for some reason.

                                                                                                          I have my reasons, such as C_MACROS. Holding shift is a waste of effort.

                                                                                                        1. 7

                                                                                                          While I hate using XML for config files or other human readable documents, I’ve been a big fan of using XML as an RPC serialization format (or as a way to interact with REST APIs). It’s easy to construct through string concatenation, it’s fairly easy to whip up a quick parser, and there’s tons of high quality, fast implementations out there. Along with schematization it makes it fast and easy to send/verify XML payloads.

                                                                                                          1. 9

                                                                                                            I hate XML plenty but I keep finding myself and people I work with reinventing basic features of XML like comments or namespaces or query languages on top of our JSON configuration files. Or people try to use TOML or YAML which become harder to understand or reason about as the complexity increases.

                                                                                                            I don’t have an answer. It’s just an observation. We threw out the baby with the bathwater.

                                                                                                            1. 5

                                                                                                              Along with schematization it makes it fast and easy to send/verify XML payloads.

                                                                                                              This is the big win for me. You can pass a set of XML schemas to any business partner and they can quickly and generically validate the message on any platform. And with facets and comments, the meaning and properties of the message can be conveyed implicitly and in great detail.

                                                                                                              1. 2

                                                                                                                I really have trouble understanding this. Why use XML for serialization, especially in RPC or anything going over the network? It’s ludicrously inefficient for that (json is, too, but slightly less so). Just do yourself a favor and pick msgpack/cbor/bencode/protobuf/… or anything really that doesn’t require complicated escaping of the payload. If you want something easy to parse, bencode is much easier than XML anyway.

                                                                                                                1. 4

                                                                                                                  In terms of verbosity, transport encoding (gzip or whatever) probably gets rid of most of the difference. The great thing about XML is that a lot has been invested in efficient implementations of encoders and decoders. Theoretically others could be more performant but are they? And there’s a proliferation of different XML codec implementations - do you want a DOM interface or streaming or something that maps to native objects? Being old and popular has a lot of upsides.

                                                                                                                  1. 3

                                                                                                                    XML is useful in this case when both of the following are true:

                                                                                                                    • The sender and receiver are different organizations
                                                                                                                    • The payload is more like a document than a serialized data structure

                                                                                                                    In these cases, an XML schema of one sort or another is very useful for keeping both sides “honest.” The encodings you mention are not typically all that extensible, so you wind up versioning your data structures. You do more work up-front with the XML to save yourself some pain as the years drag on. The pain isn’t worth it if your data structures are small and simple. But sometimes you have one or many external parties that want to do data interchange with you, and defining a common schema in XML gives you a lingua franca that is both richer and harder to screw up than IDL-like binary encodings or ad-hoc JSON or its binary analogs.

                                                                                                                    It may seem like this never happens, but it may be that there is a document-like object being served out piecemeal by a family of nested REST APIs. If the REST calls are almost always performed in a certain order (get the main thing, get the pieces of the thing, get the pieces of the pieces…) then efficiency might be improved by just doing one call to get the complex thing. You might be able to improve the robustness of the handling on both sides by using XML in cases like that because it’s just easier to extend it without changing the shape in a way that will break the existing parsers.

                                                                                                                    All this said, if I had my druthers, I’d still probably use XML for a new system once or twice a year, versus using REST+JSON on a weekly basis.

                                                                                                                    1. 1

                                                                                                                      That’s a good point, thanks. XML makes a lot of sense for content that is more document-like. Someone on IRC mentioned DocBook as an example where XML is adequate.

                                                                                                                    2. 2

                                                                                                                      For REST APIs, I would just use JSON. Sure, the format itself is inefficient, but if you’re using the REST API from inside a web browser (and if you expect other people to use this API, then you ought to be using it yourself), it’s hard to beat the efficiency of having a JSON codec already included.

                                                                                                                      You might be able to design your server to use HTTP content negotiation to simultaneously support JSON and Msgpack. Their data models are pretty similar.

                                                                                                                      1. 1

                                                                                                                        It’s ludicrously inefficient for that (json is, too, but slightly less so). Just do yourself a favor and pick msgpack/cbor/bencode/protobuf/

                                                                                                                        Have you done any measurements to come to this conclusion? Especially compared to using EXI envelopes. SOAP is standardized and widespread, so you’d need a very good reason to use anything else.

                                                                                                                        When you get into fields like HPC, where RPC performance actually matters, you don’t actually use any of these formats.

                                                                                                                        1. 1

                                                                                                                          Indeed not, I didn’t know about EXI. Is that… a binary encoding for XML?! It does indeed seem less inefficient. But also note how a lot of “modern” RPC is done via thrift, gRPC, finagle, etc., all of which rely on underlying binary encodings to be efficient. And even then they try to optimize for variable-length integers and zero-copy decoding.

                                                                                                                          I can’t even articulate my point properly. In big companies using SOAP, I’m sure there’s tons of good tooling around XML. But if you’re not already using it, it seems to have very little appeal for RPC compared to, say, thrift. Thrift will be faster, smaller on the wire, and also comes with a schema.

                                                                                                                          1. 1

                                                                                                                            The point is that you need to justify, using actual numbers, why picking anything other than the established standard (SOAP) is a good idea. Not using SOAP smacks of junior dev-ness. SOAP and XML are going to be around way longer than whatever flavour of the month always crops up in threads like these.

                                                                                                                            If EXI is not enough, I’m sure someone has figured out how to use ASN.1 with SOAP. This would enable using for example uPER as the wire format.