1. 1

    The ultimate in retro challenges: Install ITS and actually make it do something. Prepare to learn PDP-10 assembler :)

    1. 2

      Much easier these days; there’s a script at https://github.com/PDP-10/its that will do everything for you. Good luck changing your hostname, though; that does still require PDP-10 assembly.

    1. 7

      Well, some of us are in this category (as the article points out):

      If you’re building API services that need to support server-to-server or client-to-server (like a mobile app or single page app (SPA)) communication, using JWTs as your API tokens is a very smart idea. In this scenario:

      • You will have an authentication API which clients authenticate against, and get back a JWT
      • Clients then use this JWT to send authenticated requests to other API services
      • These other API services use the client’s JWT to validate the client is trusted and can perform some action without needing to perform a network validation

      so JWT is not that bad. Plus, it is refreshing to visit a website that says ‘there are no cookies here’… in their privacy policy.
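
      In practice that flow is only a few lines. A minimal sketch using Python and the PyJWT library (the claim names, lifetime, and shared secret are purely illustrative):

      # Minimal sketch of the flow quoted above, using the PyJWT library
      # (pip install pyjwt). Claim names and the secret are illustrative only.
      import time
      import jwt  # PyJWT

      SECRET = "shared-hmac-secret"  # a real deployment would rather distribute a public key

      # 1. The authentication API issues a JWT after checking credentials.
      def issue_token(user_id):
          claims = {
              "sub": user_id,
              "iat": int(time.time()),
              "exp": int(time.time()) + 15 * 60,  # valid for 15 minutes
          }
          return jwt.encode(claims, SECRET, algorithm="HS256")

      # 2. Any other API service validates the token locally (no network call,
      #    no session lookup) and trusts the embedded claims.
      def verify_token(token):
          return jwt.decode(token, SECRET, algorithms=["HS256"])  # raises if expired or tampered

      if __name__ == "__main__":
          token = issue_token("alice")
          print(verify_token(token)["sub"])  # -> alice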

      1. 17

        Plus, it is refreshing to visit a website that says ‘there are no cookies here’… in their privacy policy.

        The EU “Cookie Law” applies to all methods of identification — cookies, local storage, JWT, parameters in the URL, even canvas fingerprinting. So it shouldn’t have any effect on the privacy policy whatsoever.

        1. 9

          You can still use sessions with cookies, especially with an SPA. Unless the JWT is stateless and short-lived, you should not use it. JWT isn’t the best design either, as it gives too much flexibility and too many possibilities for misuse. PASETO tries to resolve these problems by versioning the protocol and reducing the number of possible hash/encryption methods.

          1. 1

            Why shouldn’t you use long lived JWTs with a single page application?

            1. 4

              Because you cannot invalidate that token.

              1. 6

                Putting my pedant hat on: technically you can, using blacklists or swapping signing files, but that then negates the benefit of encapsulating a user “auth key” into a token, because the server will have to do a database lookup anyway, and at that point it might as well be a traditional cookie-backed session.

                JWTs are useful when short-lived for “server-less”/lambda APIs, so they can authenticate the request and move along quickly, but for more traditional things they can present more challenges than solutions.

                1. 7

                  Putting my pedant hat on: technically you can, using blacklists or swapping signing files, but that then negates the benefit of encapsulating a user “auth key” into a token, because the server will have to do a database lookup anyway, and at that point it might as well be a traditional cookie-backed session.

                  Yes, that was my point. It was just a mental shortcut: if you do that, then there is no difference between “good ol’” sessions and using JWT.

                  Simple flow chart.

                  1. 1

                    Except it is not exactly the same, since losing a blacklist database is not the same as losing a token database, for instance. The former will not invalidate all sessions but will re-enable old tokens. Which may not be that bad if the tokens are sufficiently short-lived.

                    1. 1

                      Except “reissuing” old tokens has much less impact (at most your clients will be a little annoyed) than allowing leaked tokens to be valid again. If I were a client, I would much prefer the former to the latter.

          2. 5

            One of my major concerns with JWT’s is that retraction is a problem.

            Suppose that I have the requirement that old authenticated sessions have to be remotely retractable; how on earth would I make a certain JWT invalid without having to consult the database for “expired sessions”?

            The JWT to be invalidated could still reside on the devices of certain users after it has been invalidated remotely.

            The only way I can think of is making them so short-lived that they expire almost instantaneously. Like a few minutes at most, which means that user sessions will be terminated annoyingly fast as well.

            If I can get nearly infinite sessions and instant retractions, I will gladly pay the price of hitting the database on each request.

            1. 8

              JWT retraction can be handled in the same way that a traditional API token would be: you add it to a blacklist, or in the case of a JWT, the “secret” it is signed against can be changed. However, both solutions negate the advertised benefit of JWTs, or rather they negate the benefits I have seen JWTs advertised for: namely, that they remove the need for a session lookup in the database.

              I have used short-lived JWTs for communicating with various stateless (server-less/lambda) APIs, and for that purpose they work quite well; each endpoint has a certificate it can check the JWT’s validity with, and having the user’s profile and permissions encapsulated means not needing a database connection to know what the user is allowed to do; a 60s validity period gives the request enough time to authenticate before the token expires while removing the need for retraction.

              I think the problem with JWTs is that many people have attempted to use them as a solution for a problem already better solved by other things that have been around and battle tested for much longer.

              1. 7

                However, both solutions negate the advertised benefit of JWTs, or rather they negate the benefits I have seen JWTs advertised for: namely, that they remove the need for a session lookup in the database.

                I think the problem with JWTs is that many people have attempted to use them as a solution for a problem already better solved by other things that have been around and battle tested for much longer.

                This is exactly my main concern and also the single reason I haven’t used JWTs anywhere yet. I can imagine services where JWTs would be useful, but I have yet to see or build one where some form of retraction wasn’t a requirement.

                My usual go-to solution is to generate some 50-100 character long string of gibberish and store it both in a cookie on the user’s machine and in a database table consisting of <user_uuid, token_string, expiration_timestamp> triples, which is then joined with the table that contains the user data. Such queries are usually blazing fast, and retraction is then a simple DELETE query. Also: scaling usually isn’t that big of a concern, as most DBMSs tend to have the required features built in already.

                Usually, I also set up some scheduled event in the DBMS which deletes all expired tokens from that table periodically. Typically once per day at night, or when the number of active users is low. It makes for a nice fallback just in case some programming bug inadvertently creeps in.
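
                In code the whole thing is small. A rough sketch using Python’s secrets module, with SQLite standing in for the real DBMS (table and column names follow the triple above):

                # Rough sketch of the opaque-token-in-a-database approach described above.
                # SQLite is used for brevity; any DBMS works the same way.
                import secrets
                import sqlite3
                import time

                db = sqlite3.connect(":memory:")
                db.execute("""CREATE TABLE sessions (
                    user_uuid TEXT NOT NULL,
                    token_string TEXT PRIMARY KEY,
                    expiration_timestamp INTEGER NOT NULL)""")

                def create_session(user_uuid, ttl_seconds=30 * 24 * 3600):
                    token = secrets.token_urlsafe(64)  # ~86 characters of gibberish for the cookie
                    db.execute("INSERT INTO sessions VALUES (?, ?, ?)",
                               (user_uuid, token, int(time.time()) + ttl_seconds))
                    return token

                def lookup_session(token):
                    row = db.execute("SELECT user_uuid FROM sessions"
                                     " WHERE token_string = ? AND expiration_timestamp > ?",
                                     (token, int(time.time()))).fetchone()
                    return row[0] if row else None  # None means "not logged in"

                def retract_session(token):
                    db.execute("DELETE FROM sessions WHERE token_string = ?", (token,))

                def purge_expired():  # the scheduled nightly cleanup
                    db.execute("DELETE FROM sessions WHERE expiration_timestamp < ?",
                               (int(time.time()),))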

                But I guess this was the original author’s point as well.

              2. 1

                I’ve never done any work with JWTs so this might be a dumb question - but can’t you just put an expiration time into the JWT data itself, along with the session and/or user information? The user can’t alter the expiration time because presumably that would invalidate the signature, so as long as $(current_time) is less than the timestamp you’d be good to go? I’m sure I’m missing something obvious.

                1. 5

                  If someone steals the JWT they have free rein until it expires. With a session, you can remotely revoke it.

                  1. 1

                    That’s not true. You just put a black mark next to it and every request after that will be denied - and it won’t be refreshed. Then you delete it once it expires.

                    1. 7

                      That’s not true. You just put a black mark next to it and every request after that will be denied - and it won’t be refreshed. Then you delete it once it expires.

                      The problem with the black mark is that you have to hit some sort of database to check for that black mark. By doing so, you invalidate the usefulness of JWTs. That is one of OP’s main points.

                      1. 2

                        Well, not necessarily. If you’re making requests often (e.g., every couple of seconds) and you can live with a short delay between logging out and the session being invalidated, you can set the timeout on the JWT to be ~30 seconds or so and only check the blacklist if the JWT is expired (and, if the session isn’t blacklisted, issue a new JWT). This can save a significant number of database requests for a chatty API (like you might find in a chat protocol).
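
                        In (pseudo-)Python the idea looks roughly like this (a sketch using the PyJWT library, with an in-memory set standing in for whatever blacklist store is actually used):

                        # Rough sketch: ~30-second JWTs, and the blacklist is only consulted
                        # when a token has expired and needs to be re-issued.
                        import time
                        import jwt  # PyJWT

                        SECRET = "example-secret"
                        blacklist = set()  # stand-in for the real revocation store

                        def issue(session_id):
                            return jwt.encode({"sid": session_id, "exp": int(time.time()) + 30},
                                              SECRET, algorithm="HS256")

                        def authenticate(token):
                            try:
                                # Fast path: signature and expiry check only, no database touched.
                                return jwt.decode(token, SECRET, algorithms=["HS256"])["sid"]
                            except jwt.ExpiredSignatureError:
                                # Slow path: token expired, so consult the blacklist before re-issuing.
                                sid = jwt.decode(token, SECRET, algorithms=["HS256"],
                                                 options={"verify_exp": False})["sid"]
                                return None if sid in blacklist else sid  # caller then issues a fresh token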

                        1. 1

                          Or refresh a local cache of the blacklist periodically on each server, so it’s a purely in-memory lookup.

                          1. 4

                            But in that case, you’d be defeating their use as session tokens, because you are limited to very short sessions. You are just one hiccup of the network away from failure, which also defeats their purpose (which was another point of the OP).

                            I see how they can be useful in situations where you are making a lot of requests, but the point is that 99.9% of websites don’t do that.

                2. 1

                  For mobile apps that have safe storage for passwords, the retraction problem is solved by issuing refresh tokens (which live longer, like passwords in the password store of a mobile phone). The refresh tokens are then used to issue new authorization tokens periodically, and it is transparent to the user. You can re-issue the authorization token using the refresh token every 15 minutes, for example.

                  For web browsers, using refresh tokens may or may not be a good idea. Refresh tokens are, from the security perspective, the same as ‘passwords’ (although temporary). So their storage within the web browser should follow the same policy as one would have for passwords.

                  So if using refresh tokens for your single page app is not an option, then invalidation would have to happen during access control validation, on the backend. (The backend is still responsible for access control anyway, because it cannot be done securely on web clients.)

                  It is more expensive, and requires a form of distributed cache if you have a distributed backend that allows stateless, non-IP-bound distribution of requests…

                  1. 1

                    For mobile apps that have safe storage for passwords, the retraction problem is solved by issuing refresh tokens (which live longer, like passwords in the password store of a mobile phone).

                    But then why use 2 tokens instead of a single one? It makes everything more complicated for the sake of the perceived simplification of not doing 1 DB request on each connection. Meh. And you can even use a cookie, as in your web UI, so in the end it will make everything simpler, as you do not need 2 separate auth systems in your app.

                    1. 1

                      It makes everything more complicated for the sake of the perceived simplification of not doing 1 DB request on each connection.

                      This is not really why 2 tokens are used (an authentication token and a refresh token). 2 tokens are used to a) allow fast expiration of an authentication request and b) prevent passing the actual user password through to the backend (it only needs to be passed when creating a refresh token).

                      This is a fairly standard practice though, not something I invented (it requires an API-accessible, secure password store on the user’s device, which is why it is prevalent in mobile apps).

                      I also cannot see how a) and b) can be achieved with a single token.
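
                      Roughly, the two-token flow looks like this (a hand-wavy Python sketch; the credential check and storage here are placeholders, not any particular framework):

                      # Hand-wavy sketch of the access-token / refresh-token split described above.
                      import secrets
                      import time
                      import jwt  # PyJWT

                      SECRET = "example-secret"
                      refresh_store = {}  # refresh_token -> user_id; a real backend persists this

                      def login(user_id, password):
                          # b) the password is only ever seen here, when the refresh token is created
                          assert check_password(user_id, password)  # hypothetical credential check
                          refresh_token = secrets.token_urlsafe(48)
                          refresh_store[refresh_token] = user_id
                          return refresh_token  # kept in the device's secure password store

                      def refresh(refresh_token):
                          # a) access tokens expire fast; revocation = deleting the refresh token
                          user_id = refresh_store[refresh_token]
                          return jwt.encode({"sub": user_id, "exp": int(time.time()) + 15 * 60},
                                            SECRET, algorithm="HS256")

                      def revoke(refresh_token):
                          refresh_store.pop(refresh_token, None)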

              1. 31

                My position has essentially boiled down to “YAML is the worst config file format, except for all the other ones.”

                It gets pretty bad if your documents are large or if you need to collaborate (it’s possible to have a pretty good understanding of parts of YAML but that’s not always going to line up with what your collaborators understand).

                I keep wanting to say something along the lines of “oh, YAML is fine as long as you stick to a reasonable subset of it and avoid confusing constructs,” but I strongly believe that memory-unsafe languages like C/C++ should be abandoned for the same reason.

                JSON is unusable (no comments, easy to make mistakes) as a config file format. XML is incredibly annoying to read or write. TOML is much more complex than it appears… I wonder if the situation will improve at any point.

                1. 22

                  I think TOML is better than YAML. Sure, it has the complex date stuff, but that has never caused big surprises for me (just small annoyances). The article seems to focus mostly on how TOML is not Python, which it indeed is not.

                  1. 14

                    It’s syntactically noisy.

                    Human language is also syntactically noisy. It evolved that way for a reason: you can still recover the meaning even if some of the message was lost to inattention.

                    I have mixed feelings about TOML’s table syntax. I would rather have explicit delimiters like curly braces. But if the goal is to keep INI-like syntax, then it’s probably the best thing to do. The thing I find really annoying is inline tables.

                    As for user-typed values, I came to the conclusion that everything that isn’t an array or a hash should just be treated as a string. If you take user input, you cannot just assume that the type is correct and need to check or convert it anyway, so why even bother having different types at the format level?

                    Regardless, my experience with TOML has been better than with alternatives, despite its flaws.

                    1. 6

                      Human language is also syntactically noisy. It evolved that way for a reason: you can still recover the meaning even if some of the message was lost to inattention.

                      I have mixed feelings about TOML’s table syntax. I would rather have explicit delimiters like curly braces. But if the goal is to keep INI-like syntax, then it’s probably the best thing to do. The thing I find really annoying is inline tables.

                      It’s funny how the exact same ideas made me make the opposite decision. I came to the conclusion that “the pain has to be felt somewhere” and that the config files are not the worst place to feel it.

                      I have mostly given up on different config formats and just default to one of the following three options:

                      1. Write .ini or Java properties-file style config-files when I don’t need more.
                      2. Write a dtd and XML when I need tree or dependency-like structures.
                      3. Store the configuration in a few tables inside an RDBMS and drop an .ini-style config file with just connection settings and the name of the config tables when things get complex.

                      As for user-typed values, I came to the conclusion that everything that isn’t an array or a hash should just be treated as a string. If you take user input, you cannot just assume that the type is correct and need to check or convert it anyway, so why even bother having different types at the format level?

                      I fully agree with this as well.

                    2. 23

                      Dhall is looking really good! Some highlights from the website:

                      • Dhall is a programmable configuration language that you can think of as: JSON + functions + types + imports
                      • You can also automatically remove all indirection in any Dhall code, converting the file to a logic-free normal form for non-programmers to understand.
                      • We take language security seriously so that your Dhall programs never fail, hang, crash, leak secrets, or compromise your system.
                      • The language aims to support safely importing and evaluating untrusted Dhall code, even code authored by malicious users.
                      • You can convert both ways between Dhall and JSON/YAML or read Dhall configuration files directly into a language that supports a native language binding.
                      1. 8

                        I don’t think the tooling should be underestimated, either. The dhall executable includes low-level plumbing tools (individual type checking, importing, normalization), a REPL, a code formatter, a code linter to help with language upgrades, and there’s full-blown LSP integration. I enjoy writing Dhall so much that for new projects I’m taking a more traditional split between a core “engine” and the logic pushed out into Dhall - then compiling it at load time into something the engine can work with. The last piece of the puzzle to me is probably bidirectional type inference.

                        1. 2

                          That looks beautiful! Can’t wait to give it a go on some future projects.

                          1. 2

                            Although the feature set is extensive, is it really necessary to have such complex functionality in a configuration language?

                            1. 4

                              It’s worth understanding what the complexity is. The abbreviated feature set is:

                              • Static types
                              • First class importing
                              • Function abstraction

                              Once I view it in this light, I find it easier to convince myself that these are necessary features.

                              • Static types enforce a schema on configuration files. There is almost always a schema on configuration, as something is ultimately trying to pull information out of it. Having this schema reified into types means that other tooling can make use of the schema - e.g., the VS Code LSP can give me feedback as I edit configuration files to make sure they are valid. I can also do validation in my CI to make sure my config is actually going to be accepted at runtime. This is all a win.

                              • Importing means that I’m not restricted to a single file. This gives me the advantage of being able to separate a configuration file into smaller files, which can help decompose a problem. It also means I can re-use bits of configuration without duplication - for example, maybe staging and production share a common configuration stanza - I can now factor that out into a separate file.

                              • Function abstraction gives me a way to keep my configuration DRY. For example, if I’m configuring nginx and multiple virtual hosts all need the same proxy settings, I can write that once, and abstract out my intention with a function that builds a virtual host. This avoids configuration drift, where one part is left stale and the rest of the configuration drifts away.

                              1. 1

                                That’s very interesting, I hadn’t thought of it like that. Do you mostly use Dhall itself as the configuration file, or do you use it to generate JSON/YAML configuration files?

                            2. 1

                              I finally need to implement a Dhall evaluator in Erlang for my projects. I <3 the ideas behind Dhall.

                            3. 5

                              I am not sure that there aren’t better options. I am probably biased as I work at Google, but I find Protocol Buffer syntax to be perfectly good, and the enforced schema is very handy. I work with Kubernetes as part of my job, and I regularly screw up the YAML or don’t really know what the YAML means, so I copy-paste from tutorials without actually understanding it.

                              1. 4

                                Using protobuf for config files sounds like a really strange idea, but I can’t find any arguments against it.
                                If it’s considered normal to use a serialisation format as human-readable config (XML, JSON, S-expressions etc), surely protobuf is fair game. (The idea of “compiled vs interpreted config file” is amusing though.)

                                1. 3

                                  I have experience with using protobuf to communicate configuration-like information between processes, and the schema that specifies the configurations, including (nested) structs/hashes and arrays, ended up really hacky. I forgot the details, but protobuf lacks one or more essential ingredients to nicely specify what we wanted it to specify. As soon as you give up and allow more dynamic messages, you’re of course back to having to check everything using custom code on both sides. If you do that, you may as well just go back to YAML. The enforced schema and multi-language support make it very convenient, but it’s no picnic.

                                  1. 2

                                    One issue here is that knowing how to interpret the config file’s bytes depends on having the protobuf definition it corresponds to available. (One could argue the same is true of any config file and what interprets it, but with human-readable formats it’s generally easier to glean the intention than with a packed binary structure.)

                                    1. 2

                                      At Google, at least 10 years ago, the protobuf text format was widely used as a config format. The binary format less so (but still done in some circumstances when the config file wouldn’t be modified by a person).

                                      1. 3

                                        TIL protobuf even had a text format. It sounds like it’s not interoperable between implementations/isn’t “fully portable”, and that proto3 has a JSON format that’s preferable.. but then we’re back to JSON.

                                2. 2

                                  JSON can be validated with a schema (lots of tools support it, including VSCode), and it’s possible to insert comments in unused fields of the object, e.g. comment or $comment.
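
                                  For example (a small sketch using the Python jsonschema package; the schema and field names are made up):

                                  # Validating a JSON config against a schema, with "$comment" used as an
                                  # in-band comment field. Schema and field names are made up.
                                  import json
                                  from jsonschema import validate  # pip install jsonschema

                                  schema = {
                                      "type": "object",
                                      "properties": {
                                          "listen_port": {"type": "integer"},
                                          "log_level": {"enum": ["debug", "info", "warn", "error"]},
                                      },
                                      "required": ["listen_port"],
                                      # unknown members such as "$comment" are allowed unless
                                      # "additionalProperties": False is set
                                  }

                                  config = json.loads("""
                                  {
                                      "$comment": "ports below 1024 need root",
                                      "listen_port": 8080,
                                      "log_level": "info"
                                  }
                                  """)

                                  validate(instance=config, schema=schema)  # raises ValidationError on mismatch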

                                  1. 17

                                    and it’s possible to insert comments in unused fields of the object, e.g. comment or $comment.

                                    I don’t like how this is essentially a hack, and not something designed into the spec.

                                    1. 2

                                      Those same tools (and often the system on the other end ingesting the configuration) often reject unknown fields, so this comment hack doesn’t really work.

                                      1. 8

                                        And not without good reason: if you don’t reject unknown fields it can be pretty difficult to catch misspellings of optional field names.

                                        1. 2

                                          I’ve also seen it become harder to add new fields when you don’t reject unknown fields: you don’t know who’s using that field name for their own purposes and sending it to you (intentionally or otherwise).

                                      2. 1

                                        Yes, JSON can be validated by schema. But in my experience, JSON Schema implementations diverge widely, and it’s easy to write schemas that only work in your particular parser.

                                      3. 1

                                        JSON is unusable (no comments, easy to make mistakes) as a config file format.

                                        JSON5 fixes this problem without falling prey to the issues in the article: https://json5.org/

                                        1. 2

                                          Yeah, and then you lose the main advantage of json, which is how ubiquitous it is.

                                          1. 1

                                            In the context of a config format, this isn’t really an advantage, because only one piece of code will ever be parsing it. But this could be true in other contexts.

                                            I typically find that in the places where YAML has been chosen over JSON, it’s usually for config formats where the ability to comment is crucial.

                                      1. 4

                                        If only there were a standardized binary format with JSON semantics and a canonical form.

                                        1. 5

                                          There’s CBOR (Concise Binary Object Representation), where the RFC does mention a canonical form.

                                          1. 2

                                            Unfortunately, like the rest of CBOR in my experience, the canonical form is neither as simple nor as rigorous as one might desire.

                                            1. 1

                                              Would you elaborate, please? I was under the impression that strict+canonical form following the recommendations in the RFC and normalizing unicode strings was safe.

                                              Do you have any counter example?

                                              1. 1

                                                The recommendations are recommendations, so a conforming implementation does not need to follow them. Following them is actually not trivial, given that it involves things like selecting sizes for integers that are far more granular than I care about, i.e., integers other than i32 or i64. So my implementation needs to be able to emit more complicated CBOR than it would otherwise find necessary. It says “integers must be as small as possible”, but is that as small as the value, or as small as the value’s possible domain? It’s unclear. It provides essentially no canonicalization for floats, just another suggestion or two. And normalizing Unicode strings, as the original post points out, is far from easy, though I suppose you’re going to have to do it somewhere no matter what.

                                                It’s not a canonical format; it’s a mixed bag of suggestions and directives for how to start designing a canonical format. It’s possible for two canonical parsers/emitters to disagree on the same data, with details such as float signaling bits, whether or not duplicates are okay, etc.

                                                …actually, now that I look at it, CBOR in general is very weird. It describes very precisely what the representation of a data item should look like based on what the data is, and not what the data may be. Maybe I’ve been programming strongly typed languages too long, but that’s almost never what I actually want. Hm, I should try looking at it again from the opposite angle.

                                            2. 2

                                              Exactly. And it’s apparently still not widely known. It also supports chunked transmission if anyone needed to stream some JSON-like data and needed a framing protocol.

                                            3. 4

                                              ASN.1?

                                              1. 1

                                                ASN.1 is an interface description language not an encoding format.
                                                There are many different encoding rules BER, CER, DER, and PER.
                                                ASN.1 encoder/decoders are complex and have a long history of parsing bugs.
                                                I wouldn’t use ASN.1 for a new application unless it is required for interoperability.

                                                If ASN.1 isn’t required for interoperability then I would probably use Protocol Buffers instead.
                                                It’s simpler, well documented, fuzzed, and widely used.

                                                1. 1

                                                  The problem with PB, as far as I know, is that there is nothing like CER, so there is no guarantee that 2 different implementations of PB will produce exactly the same output from the same data. What I would like to see instead is an “ASN.2” specification that would remove the bloat while still having the nice property of different encodings like BER, JER, XER, etc. Maybe one day I will find time to write such a spec down.

                                              2. 2

                                                My vote would be for canonical S-expressions: human-readable and efficient and canonical too.

                                                No support for unordered sets like maps, but a) lack of ordering is part of the problem and b) they can be emulated with ordered pairs.

                                                1. 2

                                                  You still need to canonicalize numbers (leading zeroes, for example) and Unicode strings.

                                                2. 1

                                                  BSON! (I am 101% joking.)

                                                  1. 1

                                                    There’s bencode, which is almost compatible with JSON but only has a canonical form. It is also extremely simple to write encoders and decoders for; I’ve done a few in several different languages, and it’s rarely more than a day or two of work. (My most recent, bendy, took ~1 week because I wanted it to be zero-copy while still presenting a DOM-like interface.) The main downside is that there is no boolean or floating point type.
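
                                                    To back up the “extremely simple” claim, the encoding half fits in a dozen lines. A quick Python sketch (not any particular library; a real implementation also needs a decoder and stricter error handling):

                                                    # Bencode encoder: integers, byte/text strings, lists and dicts.
                                                    # Dict keys are emitted in sorted order, which makes the output canonical.
                                                    def bencode(value):
                                                        if isinstance(value, int):
                                                            return b"i%de" % value
                                                        if isinstance(value, bytes):
                                                            return b"%d:%s" % (len(value), value)
                                                        if isinstance(value, str):
                                                            return bencode(value.encode("utf-8"))
                                                        if isinstance(value, list):
                                                            return b"l" + b"".join(bencode(v) for v in value) + b"e"
                                                        if isinstance(value, dict):
                                                            items = sorted(((k.encode("utf-8") if isinstance(k, str) else k, v)
                                                                            for k, v in value.items()), key=lambda kv: kv[0])
                                                            return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
                                                        raise TypeError("bencode has no boolean or floating point type")

                                                    print(bencode({"spam": ["a", "b"], "count": 3}))  # b'd5:counti3e4:spaml1:a1:bee'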

                                                  1. 2

                                                    I still don’t understand the necessity of centralized message brokers for IoT setups, especially for home IoT (as opposed to industrial/commercial). Even if the setup requires a central control node, I don’t understand why this node should run a message broker, with the processing node connected to that broker as a client.

                                                    If a sensor pushes data to the processing node, why should it push it via a message broker and not directly? Pull is even worse: the sensor has to get a request from the message queue and then reply to the same queue, which is worse than if the sensor were just a listening server.

                                                    A controllable device (e.g. a light switch) should receive a control message as quickly as possible, but it can get stuck thinking that it’s connected to the MQTT server while, on the server’s side, the TCP connection is closed. It will not receive that message until the next keepalive or something like that.

                                                    Even putting an NTP-like time protocol on top of MQTT is too quirky (and IoT devices always want to ask what time it is; they restart frequently, including when waking up from hibernation, and don’t have proper clocks).

                                                    Message queues have no data aggregation built in (unlike time-series systems, like all these monitoring systems). They do nothing interesting. Why are they better than a direct connection? I can understand message queues for “put a long task up for asynchronous processing”, i.e. for web apps, but what is their purpose in IoT?

                                                    1. 3

                                                      Consider sensors that are on battery: they’ll wake up every few seconds (for things that need to respond quickly) or every minute (for simple sensors). The control node might want to send a command to it, but if the sensor is just listening, the control node would need to keep trying to connect in hopes that it manages to catch the sensor in the 50ms or so that it’s online. What happens if the sensor gets a different IP the next time it tries to connect? It’s lost.

                                                      How about if the sensor connects to the control node? It may be that there are multiple systems interested in readings. For example, I have a separate process that logs time series next to the control node, because home-assistant does a crap job of tagging its influxdb time series. I may also want to send commands to the sensor outside of the control node, e.g., for firmware updates. Once again, the decoupling provided by MQTT helps.
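
                                                      As a concrete illustration of that decoupling, here is a small sketch assuming the paho-mqtt client (1.x API), a broker on localhost, and made-up topic names:

                                                      # One sensor publishes; any number of consumers (control node, time-series
                                                      # logger, ...) subscribe independently, without the sensor knowing about them.
                                                      import paho.mqtt.client as mqtt
                                                      import paho.mqtt.publish as publish

                                                      # Sensor side: wake up, fire one reading at the broker, go back to sleep.
                                                      def publish_reading(temperature):
                                                          publish.single("home/bedroom/temperature", str(temperature),
                                                                         hostname="localhost", qos=1)

                                                      # Consumer side: the control node and the logger each run something like this.
                                                      def run_logger():
                                                          def on_message(client, userdata, msg):
                                                              print(msg.topic, msg.payload.decode())

                                                          logger = mqtt.Client("timeseries-logger")  # paho-mqtt 1.x constructor
                                                          logger.on_message = on_message
                                                          logger.connect("localhost", 1883)
                                                          logger.subscribe("home/+/temperature")
                                                          logger.loop_forever()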

                                                    1. 3

                                                      as always, feel free to submit feedback, criticism or issues!

                                                      1. 3

                                                        Just some nitpicking on dependencies:

                                                        • When depending on a Git repository (as you do with your colored dependency), it is good practice to point to a particular commit or tag using the rev or tag parameter instead of the branch, as the branch’s HEAD can change but a commit or tag can only point to one specific state of the repository (see the example below).
                                                        • When publishing a binary (executable) crate, it is good practice to publish the Cargo.lock along with the crate. You can find the reasoning for why you should publish this file in Cargo’s FAQ.
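
                                                        For example (the URL, commit hash, and tag here are illustrative, not the project’s actual ones):

                                                        # Cargo.toml: pin the git dependency to one exact state of the repository
                                                        [dependencies]
                                                        colored = { git = "https://github.com/mackwic/colored", rev = "d2b0e92" }
                                                        # or pin to a tag instead:
                                                        # colored = { git = "https://github.com/mackwic/colored", tag = "v1.7.0" }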

                                                        I will try it later though! I have always complained that some prompt frameworks use scripting languages like Python or Ruby that have a slow spin-up time, so this project seems interesting and a cool way to customize my ugly and boring prompt.

                                                        1. 1

                                                          You kind of cover this but the Cargo.lock would capture the commit that the git dependency was at when the lock file was generated. So if the Cargo.lock was checked in everyone would build against the same commit.

                                                        2. 2

                                                          I already implemented a similar tool some months ago rusty-prompt, maybe you can get some inspiration out of it.

                                                          1. 1

                                                            sure! thanks for sharing!

                                                          2. 1

                                                            My bashes (both the one that comes with Mac OS and the latest 5.0.7 from brew) seem to cache PS1 somehow, making pista break quite a lot.

                                                            ➜  ~ /usr/local/bin/bash
                                                            bash-5.0$ PS1=$(pista)
                                                            ~
                                                            $ cd git
                                                            ~
                                                            $ PS1=$(pista)
                                                            ~/git
                                                            $ cd nomad
                                                            ~/git
                                                            $ PS1=$(pista)
                                                            ~/g/nomad master ·
                                                            $
                                                            
                                                            1. 2

                                                              Try PS1='$(pista)'. What’s happening is that pista is getting called once, when you set PS1, and then never again. The single quotes force PS1 to literally contain the expansion, which then gets expanded (and thereby call pista) each time the prompt is printed

                                                              1. 2

                                                                Ohhh, no :( Of course. I feel like I should step far away from the computer now.

                                                                1. 3

                                                                  looks like the installation instructions were faulty!

                                                                  1. 1

                                                                    Oh, whew, thanks for that. Now I feel slightly less stupid :)

                                                              2. 1

                                                                cant seem to replicate this, but it looks like a PROMPT_COMMAND thing.

                                                              3. 1

                                                                @hostname is nice to have if $SSH_CONNECTION is set.

                                                                1. 4

                                                                  i have plans to extend pista into a library, so you could build your own prompts with pista’s functions. maybe ill add a hostname function :^)

                                                              1. 2

                                                                Can anyone comment on what open-coding is? From context, it sounds like replacing the Forth definition with a reference to machine code, but I’m not 100% sure that’s right, and also can’t think through the details.

                                                                1. 3

                                                                  Open coding is basically inlining with specialization. I am mostly familiar with it in the context of Smalltalk and Lisp implementations, where if the compiler can know that a particular value will be a fixnum, it can, for example, replace a call to the addition primitive (which also needs to handle bigints) with an ADD instruction. This can also avoid tagging and untagging values at every step.

                                                                1. 3

                                                                  My prompt normally looks like this:

                                                                  thequux@peewee <~> 
                                                                  [0]$ 
                                                                  

                                                                  This basic prompt has stayed almost exactly the same for 15 years (there was a brief time where the second line looked like : 0 ;, but I ended up finding that intrusive) . However, it has gotten a couple of subtle features:

                                                                  If I am in a directory with a git repo, this changes to:

                                                                  thequux@peewee <~/path/to/git-repo> (git:master)
                                                                  [0]$ 
                                                                  

                                                                  There can be other tags that appear in the prompt as well: which python venv is activated, the name of the environment I’m chrooted into (if any), what version of Ruby is active through rvm, and any shell configuration files that have changed since the last time they were loaded.

                                                                  There are also chevrons that appear on the left side to let me know I’m in a nested shell:

                                                                  thequux@peewee <~> 
                                                                  [0]$ zsh
                                                                  thequux@peewee <~> 
                                                                  >[0]$ zsh
                                                                  thequux@peewee <~> 
                                                                  >>[0]$ zsh
                                                                  thequux@peewee <~> 
                                                                  >>>[0]$ 
                                                                  

                                                                  Also, if I am root, the second line in the prompt turns red. It is all designed to be unobtrusive and only show me information when it’s relevant, but also, no matter how much information is being displayed, to give me as much space as possible to type my command line.

                                                                  If anybody is interested, the complete configuration is at https://github.com/thequux/dotfiles/blob/master/conffiles/zshrc

                                                                  1. 3

                                                                    I used to run the UCLA LUG’s installfest back in the late ’00s. Aside from my role organizing the event, I would sort out all of the weird hardware that didn’t work after the bulk install, which was nearly always proprietary hardware. (To this day, I bear grudges against Broadcom and Sony for their wireless adapters and laptops, respectively.) I’d explain that I’d be able to get their hardware working for now, but because of those hardware vendors’ attitudes towards Linux, it would probably break the next time they upgraded Linux, so when they replaced their laptop, they should look for one with an Intel or Atheros network card, etc.

                                                                    I have no idea how many people continued using Linux after the installfest; most people needed it for their university coursework. However, the dozen or so people who came back with a cheap PC (out of ~100 or so people who had weird hardware that didn’t work) suggests that the strategy worked at least a little.

                                                                    1. 1

                                                                      I like how alternative git interfaces are their own little cottage industry.

                                                                      Why have none of them been blessed by the git project itself as the new standard or recommended interface?

                                                                      1. 2

                                                                        Near as I can tell, they all involve tradeoffs. I use stgit daily, as an adjunct to branch-per-feature. I like it a lot, but would hesitate to recommend it to most developers, because when things go wonky, I usually need to fall back on fairly low-level knowledge of git. I’d imagine the same is true of the other interfaces.

                                                                      1. 55

                                                                        Any tool proponent that flips the problem of tools into a problem about discipline or bad programmers is making a bad argument. Lack of discipline is a non-argument. Tools must always be subordinate to human intentions and capabilities.

                                                                        We need to move beyond the culture of genius and disciplined programmers.

                                                                        1. 20

                                                                          Indeed; this practice strikes me as being uncomfortably close to victim-blaming.

                                                                          1. 15

                                                                              That’s a good analogy. People like to think they’re good programmers who don’t write buggy code, so when faced with evidence to the contrary in others, they defensively blame the other programmer, because otherwise they’d need to admit the same thing would happen to them.

                                                                              I think these broken arguments persist because their psychology and internal logic force us to admit our own faults, which most people find displeasurable, so defensive thinking kicks in to avoid it.

                                                                            1. -5

                                                                              Victim blaming is not a problem. It takes two people to make a person a victim: a bad actor and the actor without adequate protection.

                                                                              1. 4

                                                                                This is a pretty gross saying, even if it’s nice-sounding and pithy.

                                                                                1. 0

                                                                                  Not really. Every victim chose at some point to be a victim. That is not to say the other party can be absolved of blame. Far from it, the other party is the guilty one.

                                                                                  Take software. If nobody chooses hard languages, unsafe languages, nobody will be victimized. Choosing those languages and then blaming the language leaves you responsible for your choices, even while the tool chain is at fault

                                                                                  1. 2

                                                                                    This is absolutely ridiculous. If I walk down the street and I’m mugged, how did “I choose to become a victim”? There’s many, many cases where someone becomes a victim randomly.

                                                                                    Your logic applies only if we have some sort of perfect foresight. That’s impossible.

                                                                                    1. -1

                                                                                      When the mugging starts, do you give up? Do you look for an exit? Or do you just hand over your dignity without a further thought? Did you not notice the people before they started mugging you?

                                                                                      1. 1

                                                                                        People who downvoted as troll, look at Active Self Protection on YouTube for examples of places people choose to be or choose to not be victims

                                                                                    2. 1

                                                                                      Person’s walking down the street. Hell, let’s make them heavily armed, far more “adequately protected” than most people would think is reasonable. A sniper from outside visible range shoots them in the back of the head. They chose to be a victim? Come on.

                                                                                      1. 0

                                                                                        An exception to prove the rule. Most victimizing isn’t as mismatched, nor as final, as the proposed scenario.

                                                                                  2. 2

                                                                                    People who downvoted because Troll.

                                                                                    Come on. This position is in good faith, and I only bring it because yes, saying a person is at fault for choosing their tools is indeed victim blaming. And victim blaming is not a problem.

                                                                                2. 20

                                                                                  We’re at a point where we already were in the 60’s with cars and in the 90’s with planes. Everything was “driver error”, “pilot error”. In that case, since the results were actual fatalities, at a certain point there was this group of people that basically said: “that’s always going to happen, we’re always going to have human error, get over yourselves - how do we prevent people dying regardless”?

                                                                                  And that’s how we got seat belts, airbags, telescopic steering wheels, etc. for cars and detailed checklists, redundant systems, etc. for airplanes. I think 2017 was the first year with 0 fatalities for air travel. So it can be done.

                                                                                  It’s a very difficult mindset issue.

                                                                                  1. 11

                                                                                    Tool authors should not blame the programmer, but programmers should not blame their tools. Discipline and responsibility are needed from both.

                                                                                    1. 8

                                                                                      If you’re writing a lot of code in a memory-unsafe language, and find yourself occasionally writing memory- or concurrency-related bugs, and you know there are languages out there which make such bugs impossible or hard to write with no loss in performance or productivity, is it not okay to blame the tool? When should a carpenter stop blaming themselves for using the hammer incorrectly, and just accept that they need a new hammer whose head doesn’t occasionally fly off the handle?

                                                                                      1. 3

                                                                                        Discipline and responsibility means using the most appropriate tools for the job, being aware of the limitations of those tools, and doing what is necessary to compensate for those limitations.

                                                                                        1. 3

                                                                                          I agree with that, I work as a C++ programmer specifically because it’s the right tool for the particular job I’m doing. However, we use a safer language (Go in our case, because that’s what we know) for stuff where C++ isn’t completely necessary. If performance and no GC was a higher concern, or if we were on weaker hardware, Rust instead of Go would’ve been very interesting for all the parts where we don’t have to interact with a huge C++ library (in our case WebRTC).

                                                                                        2. 2

                                                                                        When a carpenter loses a hand they don’t blame the bandsaw. That I see so many programmers blame dangerous tools for being dangerous means we’ve not even reached the level of a craft yet, let alone an engineering discipline.

                                                                                          1. 18

                                                                                          I appreciate the analogy, but it doesn’t really apply. First of all, there is no tool that can replace a bandsaw for what a bandsaw does well. However, any worker who uses a bandsaw recognizes that it’s a fundamentally dangerous machine, and they take safety precautions when using it. If the analogy really applied, it would be considered unthinkable to write C code without a full suite of static analyzers, valgrind test suites, rigorous (MISRA C-level) coding standards, etc. Second, and more importantly, the saying applies to the quality of the tool, but sometimes the tool is simply too dangerous to use: good carpenters will refuse to use a tablesaw without an anti-kickback guard or other safety features. Finally, when there was another tool that would do the job just as well, they’d use it.

                                                                                            1. 9

                                                                                              https://schneems.com/2016/08/16/sharp-tools.html

                                                                                              this is not how tools work, either in programming, or in carpentry

                                                                                              1. 3

                                                                                                This is a fantastic essay and you should submit it as a story.

                                                                                              2. 8

                                                                                                When a commercial carpentry shop has a carpenter lose a hand to a bandsaw, they are more or less forced to stop and try to make the bandsaw safer. This might be done by adding safety features to the tool, or by redesigning the process to use it less/differently, or by ensuring that the workers are trained in how to use it correctly without risk and are wearing proper safety equipment.

                                                                                                It’s not the carpenter’s fault, and it’s not the bandsaw’s fault, it’s the fault of the system that brings them together in a risky manner.

                                                                                                1. 4

                                                                                                  The company should not blame the employee or the bandsaw, and the employee should not blame the company’s training or procedures or the bandsaw. Discipline and responsibility are needed from both. That includes the discipline and responsibility needed to make the bandsaw, training, and procedures as safe as is possible, and to only certify employees who are capable of operating the device safely, and the employee’s discipline and responsibility to follow the training and procedures properly and to not defeat the safety measures.

                                                                                              Assigning blame is useless. The focus should be on identifying all root and proximal causes, and eliminating each one, with priority chosen based on the hierarchy of values.

                                                                                        1. 7

                                                                                          T for touch-tone (because there’s also P for pulse, a whole other story)

                                                                                          I’m old enough to remember when Touch-Tone (a trademark; {DTMF,tone}-dialing being the “more official” name) was a pay-for option. You could save some small amount of money every month by not having it. Hayes and ATDP saved me a whole like…$0.33 a month or something.

                                                                                          1. 6

                                                                                            On the other end of the timeline, a few years ago I had a free rotary telephone and an effectively free VOIP line with an FXO port that only supported DTMF (and sadly no SIP or H.323 access; hooray for consumer-oriented service). I ended up wiring a 3.5mm TR jack into the phone and hooking that up to my computer, with a small shell script to play the appropriate DTMF tones from my computer. It probably saved me a whole $10 that year.

                                                                                            1. 2

As a BBS operator, I found ditching touch tone a critical money-saver; those lines were never used to dial out. I think the feature cost more than a dollar a month.

                                                                                            1. 2

                                                                                              Others have hit on this problem, but I’ll make it a top-level comment: who is going to pay for this? I am pretty decent at Rust. I love working on infrastructure. I’m a senior engineer by most org charts. I’ve never seen any sort of job opportunity for this.

                                                                                              Tech’s biggest problem is too much money being involved.

                                                                                              1. 3

It’s stranger still that there’s a tremendous amount of money involved, but very little of it ever seems to make its way towards maintaining and improving core systems and applications. It all ends up going towards showing us more ads, and figuring out how to gather more of our personal information, so that companies can show us more ads.

                                                                                                1. 2

                                                                                                  Tech’s biggest problem is too much money being involved.

                                                                                                  Or not enough money, depending on how you look at it. When security is opt in, the market will opt out to save a buck.

                                                                                                  1. 1

                                                                                                    I seem to be working professionally on an infrastructure project in Rust. We got there because we needed to build a bare-metal version and I insisted on Rust. Near as I can tell, there are more than a few other companies doing infrastructure work in Rust. It’s still far from the majority, but this is to be expected from a language that’s only been stable for ~3 years.

                                                                                                    1. 1

                                                                                                      Looks like a great application for Rust! Best wishes to your project.

                                                                                                  1. 2

                                                                                                    Has anyone gotten a pinebook? I signed up a long time ago and haven’t heard anything back yet. I’d love to be able to try this stuff out, but the hype articles make me think of vaporware more than anything promising.

                                                                                                    1. 2

                                                                                                      Yeah, I’ve had mine for a year and a half now. They should still be shipping.

                                                                                                      1. 1

                                                                                                        I thought the pinebook started shipping a year ago? Are you somewhere in their queue? Still have login details?

                                                                                                        https://hackaday.com/2017/04/28/hands-on-with-the-pinebook/

                                                                                                      1. 3

                                                                                                        I think there’s an important distinction to be made between what I call casual and professional use of a program. Casual interfaces are designed to be used occasionally, but not to make up the majority of what you do with your computer, whereas you use professional interfaces for most of your work day. Obviously, what may be a casual tool for some people would be a professional tool for others.

                                                                                                        However, some classes of software are nearly only ever used professionally: CAD software, many development tools, business accounting packages, lots of enterprise software, etc. Essentially, the sort of thing where for most people who use it, it is literally their job to use that software.

                                                                                                        This is relevant because, like any investment, investing time in learning how to use a complex interface is only useful if you expect it to pay off in time saved over the many times you use an interface. I use a disk partitioning tool maybe once or twice a year; it should be simple and obvious how to make it do the sorts of things I generally want to do. On the other hand, I practically live in my shell and my text editor, so the week or two of time I’ve spent on each to learn how to use them to their full power has paid off handsomely.

So, to me, the interesting question is which of the tools you use frequently don’t have professional interfaces.

                                                                                                        1. 5

                                                                                                          I question the assertion that the interpreting slowdown is exponential; whether it takes a few dozen cycles per statement (as in a bytecode interpreter) or a few thousand (as in a pure AST interpreter), the same number of statements would need to be executed, and so the slowdown is roughly linear no matter how deep the call stack gets. The exception to that would be if the interpreter were itself getting interpreted for a subroutine call, and I can’t see any reason that an implementation would do such a thing.

                                                                                                          That said, one interesting place you might look for purely interpreted applications is in the Forth family; it’s fairly common to implement forth using some form of threaded code, which is effectively an AST modulo the fact that neither the S nor the T really apply to forth. You don’t see many programs written in forth directly (though apparently Chuck Moore has written a VLSI layout package), but postscript is fairly common and every implementation I’ve seen has been a pure interpreter.

                                                                                                          Another possibility, also outside the realm of traditional languages, is TeX, which seems to be completely uncompileable (even to bytecode); I’d definitely consider LaTeX to be a large application written entirely in TeX.
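
To make the first point above concrete, here is a toy pure-AST evaluator; it is only an illustrative sketch, not taken from any particular implementation. Each node costs a fixed amount of dispatch work, so interpretation is a constant factor on the number of nodes executed, independent of how deep the call stack gets.

```python
def evaluate(node, env):
    # One dispatch per node: the interpretation cost is a constant factor on
    # the number of nodes executed, regardless of call depth.
    op = node[0]
    if op == "num":
        return node[1]
    if op == "var":
        return env[node[1]]
    if op == "add":
        return evaluate(node[1], env) + evaluate(node[2], env)
    if op == "mul":
        return evaluate(node[1], env) * evaluate(node[2], env)
    raise ValueError(f"unknown node {op!r}")

# (x + 1) * 2 with x = 20 -> 42; five nodes, five dispatches.
ast = ("mul", ("add", ("var", "x"), ("num", 1)), ("num", 2))
print(evaluate(ast, {"x": 20}))
```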

                                                                                                          1. 2

                                                                                                            and so the slowdown is roughly linear no matter how deep the call stack gets.

                                                                                                            You’re the first one to say anything about this! I need to think about this again. Right now I think you’re right. Hm, so the slowdown isn’t from where I think it is in that case. I hope I didn’t just waste more than a month on a false premise.

Also, then I’m surprised there aren’t more examples of direct AST interpretation if it’s “just” a constant factor.

Edit: Ok, so I think the issue is that the slowdown scales exponentially with the height of the call stack, but that’s unaffected by whether bytecode or the AST is interpreted directly. I’m trying to push primitives (like loops) out of the host language and have them written in L (a bit like Forth, but more interpreted than compiled). Now, instead of calling a primitive D directly, I call A, which calls B, which calls C, which calls D; if each of those has ~10 instructions in L, then I get a 1000x slowdown. Is this reasoning right?

                                                                                                            The exception to that would be if the interpreter were itself getting interpreted for a subroutine call

                                                                                                            There are many nested layers but the interpreter itself isn’t interpreted (except possibly as a (very deliberate) test).

                                                                                                            Forth

I guess I’m thinking of Forth as basically already being bytecode, since it’s a stack machine like most bytecode VMs. The same goes for Postscript, although I don’t know the language.

                                                                                                            TeX and LaTeX

That’s an interesting example. I don’t know how it works internally; I’ve just heard it described as a macro expansion language. I’m not sure I’d say that’s interpreted, though, but it’s definitely not compiled to either assembly or bytecode. So it would be an example of what I’m looking for, although I don’t think I can actually make use of it.

                                                                                                            1. 4

I think the way to look at it is that, as you push the veil back and write more and more of your logic in L, the slowdown applies to more and more of your execution time. It’s still not exponential; if a mixed program takes time t to run, and you have an interpretation overhead of o, the slowest you can possibly get, by moving everything to interpretation, is t*o, and the fastest you can get, by compiling everything, is t/o. Those bounds could be tightened somewhat by considering the time spent in interpreted code vs. compiled code separately; take i to be the ratio of time spent in interpreted code. Then the worst-case runtime would be t*i+(1-i)*t*o and the best-case runtime would be t*(1-i)+i*t/o.

                                                                                                              These bounds are difficult to think about intuitively though, because they look at if you moved everything to compiled code or interpreted code. So, instead of looking at the possible performance improvement or loss, let’s look at the actual performance relative to the best case of everything being compiled. If the proportion of interpreted code in your program is i, the slowdown relative to the best possible case will simply be i*o+(1-i)

                                                                                                              It’s worth noting, though, that for the first few levels of pushing logic into your language, i grows (roughly) exponentially larger. The base has nothing to do with the interpretation overhead though; it’s simply the branching factor of your routines.
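
As a quick numeric illustration of that last formula (the overhead o = 30 and the proportions i below are made-up numbers, not measurements):

```python
def slowdown(i, o):
    # Slowdown relative to the all-compiled program, where i is the proportion
    # of (all-compiled) time that corresponds to interpreted code and o is the
    # interpretation overhead.
    return i * o + (1 - i)

for i in (0.01, 0.1, 0.5, 1.0):
    print(f"i={i:<4}  slowdown vs. all-compiled: {slowdown(i, o=30):.1f}x")
# 0.01 -> 1.3x, 0.1 -> 3.9x, 0.5 -> 15.5x, 1.0 -> 30x: linear in i, capped at o.
```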

                                                                                                              1. 1

                                                                                                                Thanks! Your comments here should be all the way at the top!

                                                                                                                If the proportion of interpreted code in your program is i, the slowdown relative to the best possible case will simply be i*o+(1-i)

Yes, I agree, and the all-compiled program is a better baseline. This means that I should get a factor of o at worst, even if it’s all AST-interpreted.

                                                                                                                It’s worth noting, though, that for the first few levels of pushing logic into your language, i grows (roughly) exponentially larger.

Hm, I’m wondering if this is what I’ve been seeing and what led me to the incorrect exponential conclusion.

Still, that means both direct AST interpretation and layering will result in “only” a constant-factor slowdown? And it’s also “only” a constant factor to go from bytecode to native assembly.

In general, is there somewhere I can run my reasoning (like this one) by some people to check for flaws? Other than posting on lobste.rs, that is. I’ve told some others and no one had seen the problem. While I’m re-doing the time-bound estimates, I’m also estimating how much time I lost because of this :(.

Ok, one more thing. If functions in level j only call functions in level j-1 (and level 0 is the host language’s primitives), then the time for a level-j call is exponential in j, right? (And this happens regardless of whether the levels are (all) interpreted bytecode or direct AST.)

And that means I should make calls to as low a level as possible (instead of j-1) to make things faster. Except this goes in the wrong direction in terms of the abstraction I’m trying to provide (and the potential polymorphism that would allow). For example, not using the object system would be faster than using it.
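
For what it’s worth, here is the layering arithmetic written out as a tiny sketch; the function and parameter names are mine, and b = 10 matches the A→B→C→D example above. The exponential comes from the branching factor per level, and an interpretation overhead only multiplies the total by a constant.

```python
def cost(j, b, o=1):
    # Primitive steps for one level-j call: each level-j routine makes b calls
    # into level j-1, so the count multiplies by b per level; an interpretation
    # overhead o only scales the whole thing by a constant factor.
    return o * b ** j

for j in range(5):
    print(j, cost(j, b=10))       # 1, 10, 100, 1000, 10000 -- the 1000x at j=3
```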

                                                                                                              2. 1

                                                                                                                Is this reasoning right?

                                                                                                                I don’t think so.

                                                                                                                Perhaps think of it like this. A function can have more than one call site. Those call sites may have different stack depths. When executing that function, the interpreter will do the same work irrespective of what is on the stack.

                                                                                                                You may be thinking of nested loops? But that isn’t anything to do with how the program is run, but rather the algorithm which is implemented.

                                                                                                                1. 1

                                                                                                                  Also, then I’m surprised there aren’t more examples of direct AST interpretation if its “just” a constant factor.

You need to take into consideration how few toy languages ever make it to production use, where that constant factor results in significant savings in time and money (fewer servers, less developer downtime while tests run, etc.).

                                                                                                                  1. 1

I guess I’m thinking of Forth as basically already being bytecode, since it’s a stack machine like most bytecode VMs. The same goes for Postscript, although I don’t know the language.

                                                                                                                    Hmm. Forths are typically pretty low level, and often run directly on hardware (though they often will use their own calling conventions, instead of the C one). The dynamism comes from the fact that words are “compiled” into a list of addresses to the other words, which can basically then be “jumped” to in succession. So, it’s actually slightly strange, and not really a VM with “a list of known operators that you switch on” in the traditional sense.
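
A loose Python analogy of that “list of addresses” idea — only a sketch of threaded code, not how any real Forth is implemented: a compiled word is just a sequence of references to other words, and the inner interpreter walks that sequence in succession, with no big switch over opcodes.

```python
stack = []

def lit(n):
    # A "literal": returns a word that pushes n onto the data stack when run.
    return lambda: stack.append(n)

def add():
    stack.append(stack.pop() + stack.pop())

def dup():
    stack.append(stack[-1])

def dot():
    print(stack.pop())

def word(*refs):
    # A compiled word is just a list of references to other words; the inner
    # interpreter "jumps" to each one in succession.
    def run():
        for w in refs:
            w()
    return run

double = word(dup, add)           # roughly : double  dup + ;
quad   = word(double, double)     # roughly : quad    double double ;

word(lit(3), quad, dot)()         # pushes 3, quadruples it, prints 12
```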

                                                                                                                    1. 2

Good point. Although it means Forths are even more compiled (and a worse example of what I’m looking for) than languages that stop at bytecode.

                                                                                                                1. 3

                                                                                                                  i am somewhere between appalled that he’d go so long with a labelless keyboard and ruefully aware that i would probably gripe but do nothing about it too :)

                                                                                                                  1. 2

                                                                                                                    It’s more amazing to me that you could go that long without developing muscle memory.

                                                                                                                    1. 5

                                                                                                                      I can touch-type, but I still look at the keyboard when I type complex passwords. I think my muscle memory is linked to sequences of key presses more than to specific characters. So I can type a sentence, but give me a random sequence of letters, numbers, and symbols and I’ll mess it up if I don’t glance at the keyboard.

                                                                                                                      1. 2

                                                                                                                        I have my keyboard arranged in alphabetical order, mostly to troll anybody else who sits down at my keyboard. Oddly enough, I still find it useful to look at the keyboard when typing a complex password, even though the letters on the keys have nothing to do with the character that they result in.

                                                                                                                  1. 4

                                                                                                                    My automata theory is incredibly rusty, but isn’t it the case that you typically convert an NFA to a DFA, so that, say, you don’t have to simultaneously take two paths explicitly? I guess I could dig out my copy of the dragon book…

                                                                                                                    1. 5

You could, but NFA to DFA conversion is susceptible to exponential blowup in the number of DFA states. This is why production regex engines use techniques to avoid this, e.g., by compiling the DFA lazily or by compiling many small DFAs.

                                                                                                                      1. 1

                                                                                                                        exponential blowup in the number of DFA states.

                                                                                                                        Yeah. That makes sense!

                                                                                                                      2. 7

Yes, you could. However, if you compute the DFA strictly, you’ll usually end up with a HUGE (exponential in the size of the NFA) DFA, which then needs to be minimized to be workable. On the other hand, Thompson’s algorithm lazily generates the DFA as it’s needed, allowing you to still run in linear time (constant work per input character).
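
For a feel of how lazy construction avoids paying for the worst case up front, here is a rough Python sketch; the NFA encoding and names are made up for illustration. The pattern is the classic pathological one, (a|b)*a(a|b)^(n-1), whose strict determinization needs on the order of 2^n states, but the lazy version only builds the DFA states the input actually reaches.

```python
# NFA for (a|b)*a(a|b)^(n-1): state 0 loops on 'a'/'b' and may advance on 'a';
# states 1..n-1 advance on either letter; state n accepts.
def make_nfa(n):
    delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}}
    for i in range(1, n):
        delta[(i, 'a')] = {i + 1}
        delta[(i, 'b')] = {i + 1}
    return delta, {0}, {n}

def lazy_dfa_match(delta, start, accept, text):
    # DFA states are sets of NFA states, built and cached only when the input
    # actually reaches them -- never the full, possibly exponential, table.
    cache = {}
    state = frozenset(start)
    for ch in text:
        key = (state, ch)
        if key not in cache:
            cache[key] = frozenset(t for s in state for t in delta.get((s, ch), ()))
        state = cache[key]
    return bool(state & accept), len({s for s, _ in cache})

delta, start, accept = make_nfa(12)
matched, built = lazy_dfa_match(delta, start, accept, "ab" * 50 + "a" + "b" * 11)
print(matched, built)   # True, and only a few dozen DFA states were ever built,
                        # versus the ~2**12 a strict subset construction needs
```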

                                                                                                                        1. 3

                                                                                                                          you’ll usually end up with a HUGE (exponential in the size of the NFA) DFA

That’s exaggerating a bit. There are pathological, worst-case regular expressions whose DFAs are exponential in the size of the NFA, but most of them aren’t, and I wouldn’t say it’s “usual.”

                                                                                                                          1. 1

Thanks for that! I’ll have to look at Thompson’s algorithm. I guess I’m even more rusty than I thought…

                                                                                                                        1. 3

The example there about edges on an array is something I’ve had to directly deal with myself when I implemented a multidimensional image-processing function for GNU Octave. The problem is to find connected components in a binary image, where voxels are either 0 or 1. You want to find all the islands of ones, and you want to be flexible about whether diagonals count as being connected or not.

The problem, of course, is that along the edges you don’t want to check for neighbours outside of the image boundary. I found no good way to do this for an n-dimensional image, after a few attempts at writing very complicated code. In the end, I padded the whole image with a boundary of zeros and iterated over the padded image with an if(current_voxel) conditional; the zero border itself was skipped by that conditional, and checking for lit neighbours at the original image’s boundaries simply found no neighbours in the padded zeros.

                                                                                                                          The code was cleaner, but I incurred a big realloc, because N-dimensional images in Octave are stored as contiguous Fortran-order (column-major) arrays. I’m still looking for a better solution to this problem.

                                                                                                                          So, how do you do this cleanly?
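
For what it’s worth, here is a rough numpy sketch of that pad-with-zeros approach for an n-dimensional binary image; the function name, the iterative flood fill, and the connectivity handling are my own choices for illustration, not the Octave implementation. The one-voxel zero border means neighbour lookups never need an explicit bounds check, at the cost of the copy you mention.

```python
import numpy as np
from itertools import product

def connected_components(img, diagonals=False):
    # Pad with a one-voxel border of zeros so neighbour lookups never fall
    # outside the array (this padding is the big copy mentioned above).
    padded = np.pad(img.astype(bool), 1)
    labels = np.zeros(padded.shape, dtype=int)
    # Neighbour offsets in n dimensions; diagonals are included on request.
    offsets = [d for d in product((-1, 0, 1), repeat=img.ndim)
               if any(d) and (diagonals or sum(map(abs, d)) == 1)]
    current = 0
    for idx in zip(*np.nonzero(padded)):
        if labels[idx]:
            continue
        current += 1
        labels[idx] = current
        stack = [idx]
        while stack:                      # iterative flood fill over one island
            p = stack.pop()
            for d in offsets:
                q = tuple(a + b for a, b in zip(p, d))
                if padded[q] and not labels[q]:
                    labels[q] = current
                    stack.append(q)
    inner = tuple(slice(1, -1) for _ in range(img.ndim))   # strip the border
    return labels[inner], current

img = np.zeros((4, 4, 4), dtype=bool)
img[0, 0, 0] = img[3, 3, 3] = True
print(connected_components(img)[1])       # 2 separate islands
```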

                                                                                                                          1. 2

                                                                                                                            I recently ran across a blog post that claims Julia’s ‘cartesian iterators’ make this less painful/ugly than normal (scroll down to “A multidimensional boxcar filter”). I haven’t put them to any serious use myself yet though, so I can’t really vouch for this either way.

                                                                                                                            1. 2

                                                                                                                              I’d use a union-find algorithm along one dimension at a time. Diagonals would simply be another set of axes.

                                                                                                                              1. 1

                                                                                                                                Can you explain this more? Note that iterating one dimension at a time is a bit tricky given that the thing is n-dimensional to begin with.

                                                                                                                            1. 3

                                                                                                                              I’d actually go a step further than the author: sometimes time spent not thinking about work can be the most productive possible use of your time. Programming is not an end unto itself, unless you’re writing development tools (and even then, your domain is usually not actually programming; instead, you’re writing tools to make writing some kind of software easier), and the projects that will be most useful to other people are likely to be the projects that you come up with while you aren’t actively programming.