1. 2

    How does OCR work, AI or automation?

    1. 2

      If you believe the thesis of the article, OCR is AI made necessary only by our civilization’s inability to digitize everything, while automation would do away with both papers and the need for OCR.

      You might be confused because AI is often seen as one kind of automation. I believe the article tries to distinguish them by correctness (without actually using that word). OCR and other “AI” can make mistakes and require human intervention, while “automation” can be proven correct.

      1. 3

I think it’s self-evident that a world without humans is easy to manage by shifting data to memory regions.

        We all know printers are devilspawn.

        Getting computers to work with us instead of us working with them, that’s the trick, and I’m not sure automation solves that problem. In fact I might be so bold as to say AI gets computers to work with us, and automation helps us work with computers.

      2. 2

        It’s a sliding scale. https://en.m.wikipedia.org/wiki/OCR-A

        1. 1

          I defined “AI” for the purpose of this as “using the term to mean some heuristic for learning equations from data or simulations rather than writing them out by hand”.

          So I think OCR fits as AI, unless there’s a way to do OCR without training data.

          Basically what axelsvensson said.

          1. 1

            A while ago I got this problem of the fuzziness of the term “AI” stuck in my head and thought about it a lot. I think “a computer program that takes on the world on its own terms (without the situation being arranged to suit the computer program)” is somewhat accurate. I think this matches your definition of “AI” and the parenthetical fits how you contrast it against “automation”.

        1. 18

          PyLint - W0102, dangerous-default-value

          Please use an IDE or lint your code to prevent this.

          1. 10

That doesn’t change the fact that it’s a recurring WTF. Either you’ve worked with Python for a while and have it internalized (or an IDE that flags it), or you haven’t; either way, blaming an actual shortcoming of the language on the developer isn’t helpful.

            1. 7

              To me most lint tools are overly aggressive, and good warnings like this one get drowned out by bad ones.

              FWIW the google python style guide disallows mutable default arguments: https://google.github.io/styleguide/pyguide.html#212-default-argument-values

But I think it would be better if there were some non-lint / non-IDE / non-google place where this knowledge is stored and linkable.

              Googling reveals a book: https://docs.python-guide.org/writing/gotchas/

People still rediscover it not just in 2021, but in 2018 too: https://florimond.dev/en/posts/2018/08/python-mutable-defaults-are-the-source-of-all-evil/ (I’ve known about this issue approximately since I started using Python, which was ~2003 or so)

              1. 7

                The first time I encountered this behavior was through a linter warning, which can be seen in my original comment. I did the research and I understood what was happening. Then I changed my code accordingly and never made that mistake again.

The only thing I “criticize” is that the developer learned about this well-known behavior of Python because of a production bug. I wanted to point out that there is another way.

                1. 3

                  How should the default-value parameter be assigned when the function is called? I see four options:

• it’s assigned the object obtained by evaluating the default-value expression when the function was initially defined (the current way),
                  • it’s assigned the object obtained by freshly evaluating the default-value expression, on every call,
                  • it’s assigned a deep copy of the object obtained when the function was initially defined,
                  • only known-immutable python values are allowed as default values.

                  They all have different drawbacks.
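
For concreteness, here’s a minimal sketch of the current behavior (option 1) and the usual None-sentinel workaround; the function names are made up for illustration:

def append_current(item, acc=[]):  # default list is created once, at def time
    acc.append(item)
    return acc

print(append_current(1))  # [1]
print(append_current(2))  # [1, 2] -- the same list object is reused

def append_fresh(item, acc=None):  # idiomatic workaround: sentinel default
    if acc is None:
        acc = []  # fresh list on every call
    acc.append(item)
    return acc

print(append_fresh(1))  # [1]
print(append_fresh(2))  # [2]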

                  1. 4

                    Naïvely, I would expect a default value to behave identically to having the expression written in at the call site.

                    With:

                    def f(a=expr)
                    

                    I would expect

                    f()
                    

                    To behave identically to

                    f(expr)
                    

                    Scoping issues aside (I would expect syms in expr to be bound at the definition site.)

                    So, 2. I think this is what most people expect, and why this decision is so surprising.

                    1. 2

                      Are there languages that do (3) and (4)? And any language other than Python that does (1)?

                      1. 2

                        Python is the only language I can think of right now where this is a caveat. Maybe C. But even there you deliberately hand in a reference if you want that to be mutable.

                    2. 2

PyLint is overly aggressive, and it doesn’t catch all instances. For example, see:

class MyC:
    def __init__(self):
        self.val = []

    def process(self, e):
        return self.val.append(e)

# MyC() is evaluated once, when the def statement runs, so every
# default call shares the same instance -- and W0102 doesn't flag it.
def processed(element, o=MyC()):
    o.process(element)
    return o.val

print(processed(42))  # [42]
print(processed(51))  # [42, 51]
                      
                    1. 1

                      Is it just me, or is unveil a terrible choice of name? It normally means “remove a veil”, “disclose” or “reveal”. Its function is almost exactly the opposite - it removes access to things! As the author says:

                      Let’s start with unveil. Initially a process has access to the whole file system with the usual restrictions. On the first call to unveil it’s immediately restricted to some subset of the tree.

                      Reading the first line of the man page I can see how it might make sense in some original context, but this is the opposite of the kind of naming you want for security functions…

                      1. 3

                        Is it just me, or is unveil a terrible choice of name? It normally means “remove a veil”, “disclose” or “reveal”. Its function is almost exactly the opposite - it removes access to things!

                        It explicitly grants access to a list of things, starting from the empty set. If it’s not called, everything is unveiled by default.

                        1. 3

                          I am not a native speaker, so I cannot comment if the verb itself is a good choice or not :)

As a programmer who uses unveil() in his own programs, the name makes total sense. You basically unveil selected paths to the program. If you then change your code to work with other files, you also have to unveil these files to your program.

                          1. 2

OK, I understand - it’s only the first call that actually restricts (while also unveiling its path); after that, each call just unveils more.

                          2. 2

                            “Veiling” is not a standard idea in capability theory, but borrowed from legal practice. A veiled fact or object is ambient, but access to it is still explicit and tamed. Ideally, filesystems would be veiled by default, and programs would have to statically register which paths they intend to access without further permission. (Dynamic access would be delegated by the user as usual.)

                            I think that the main problem is that pledges and unveiling are performed as syscalls after a process has started, but there is no corresponding phase before the process starts where pledges are loaded from the process’s binary and the filesystem is veiled.

                            1. 1

                              Doing it as part of normal execution implements separate phases of pledge/unveil boundaries in a flexible way. The article gives the example of opening a log file, and then pledging away your ability to open files, and it’s easy to imagine a similar process for, say, a file server unveiling only the public root directory in between loading its configuration and opening a listen socket.
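
To make that concrete, here is a minimal sketch of the pattern (OpenBSD-specific; the paths and the serving loop are placeholders): open the log file first, unveil only the public root, then pledge away everything but what the server still needs.

#include <err.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
	/* Open the log file while the whole filesystem is still visible. */
	FILE *log = fopen("/var/log/myserver.log", "a");
	if (log == NULL)
		err(1, "fopen");

	/* Expose only the public document root, read-only, then lock the list. */
	if (unveil("/var/www/htdocs", "r") == -1)
		err(1, "unveil");
	if (unveil(NULL, NULL) == -1)
		err(1, "unveil");

	/* Keep only what serving still needs: stdio, reads, sockets. */
	if (pledge("stdio rpath inet", NULL) == -1)
		err(1, "pledge");

	fprintf(log, "starting\n");
	/* ... open the listen socket and serve ... */
	return 0;
}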

                              1. 1

                                I think that the main problem is that pledges and unveiling are performed as syscalls after a process has started, but there is no corresponding phase before the process starts where pledges are loaded from the process’s binary and the filesystem is veiled.

Well, the process comes from somewhere. Having a chain-loader process/executable that sanitises the inherited environment and sets up for the next one fits well with the established execution model. It’s explicitly prepared for this in pledge(, execpromises).

                                1. 2

                                  You could put it in e.g. an elf header, or fs-level metadata (like suid). Which also fits well with the existing execution model.

                                  Suid is a good comparison, despite being such an abomination, because under that model the same mechanism can double as a sandbox.

                                  Chainloader approach is good, but complexity becomes harder to wrangle with explicit pledges if you want to do djb-style many communicating processes. On the other hand, file permissions are distant from the code, and do not have an answer for ‘I need to wait until runtime to figure out what permissions I need’.

                                  1. 1

Not going too far into the static/dynamic swamp shenanigans (say setting a different PT_INTERP and dlsym:ing out a __constructor pledge/unveil) - there are two immediate reasons why I’d prefer not to see it as a file-meta property.

1. Filesystem legacy is not pretty, and accidental stripping of metadata on a move to an incompatible filesystem would have a fail-silent-dangerous failure mode (stripping suid is not dangerous, whereas stripping a pledge setup is).
2. Pledge violations go kaboom; then you need to know that this is what happened (dmesg etc.), and you land in core_pattern-like setups. The chain-loader approach meanwhile takes on the responsibility of attribution/communication, so x11 gets its dialog or whatever, isatty() an fprintf, and others a syslog, and so on.
                              2. 1

                                Like Linux’s unshare

                              1. 34

                                return err is almost always the wrong thing to do. Instead of:

                                if err := foo(); err != nil {
                                	return err
                                }
                                

                                Write:

                                if err := foo(); err != nil {
                                	return fmt.Errorf("fooing: %w", err)
                                }
                                

                                Yes, this is even more verbose, but doing this is what makes error messages actually useful. Deciding what to put in the error message requires meaningful thought and cannot be adequately automated. Furthermore, stack traces are not adequate context for user-facing, non-programming errors. They are verbose, leak implementation details, are disrupted by any form of indirection or concurrency, etc.

Even with proper context, lots of error paths like this are potentially a code smell. It means you probably have broader error-strategy problems. I’d try to give some advice on how to improve the code the author provided, but it is too abstract to provide any useful insights.

                                1. 18

                                  I disagree on a higher level. What we really want is a stacktrace so we know where the error originated, not manually dispensed breadcrumbs…

                                  1. 32

maybe you do, but I prefer an error chain that was designed. A Go program rarely has just one stack, because every goroutine is its own stack; the trace of just that one stack isn’t really a statement about the program as a whole. Additionally, stack traces omit the parameters to the functions at each frame, which means that understanding the error means starting with your stack trace and then bouncing all over your code, reading it and running it in your head. This is even more annoying if you’re looking at an error several days later in a heterogeneous environment, where you may have the additional complication of figuring out which version of the code was running when that trace originated.

Or you could just have an error like “failed to create a room: unable to reserve room in database ‘database-name’: request timed out”. Additionally, hand-crafted error chains are often much easier to understand for people who operate but don’t author something; they may never have seen the code before, so understanding exactly what a stack trace means may be difficult for them, especially if they’re not familiar with the language.

                                    1. 6

                                      I dunno. Erlang and related languages give you back a stack trace (with parameters) in concurrently running processes no problem

                                      1. 5

It’s been ages since I wrote Erlang, but I remember that back then I rarely wanted a stack trace. My stacks were typically 1-2 levels deep: each process had a single function that dispatched messages and did a small amount of work in each one. The thing that I wanted was the state of the process that had sent the unexpected message. I ended up with some debugging modes that attached the PID of the sending process and some other information so that I could reconstruct the state at the point where the problem occurred. This is almost the same situation as Go, where you don’t want the stack trace of the goroutine; you want to capture a stack trace of the program at the point where a goroutine was created and inspect that at the point where the goroutine failed.

                                        This isn’t specific to concurrent programs, though it is more common there, it’s similar for anything written in a dataflow / pipeline style. For example, when I’m debugging something in clang’s IR generation I often wish I could go back and see what had caused that particular AST node to be constructed during parsing or semantic analysis. I can’t because all of the state associated with that stack is long gone.

                                    2. 10

                                      FWIW, I wrote a helper that adds tracing information.

                                      I sort of have two minds about this. On the one hand, yeah, computers are good at tracking stack traces, why are we adding them manually and sporadically? OTOH, it’s nice that you can decide if you want the traces or not and it gives you the ability to do higher level things like using errors as response codes and whatnot.

The thing that I have read about in Zig that I wish Go had is the error trace, which is different from the stack trace: it shows how the error was created, not how the error propagates back to the execution error boundary, which is not very interesting in most scenarios.

                                      1. 7

                                        The nice thing about those error traces is that they end where the stack trace begins, so it’s seamless to the point that you don’t even need to know that they are a thing, you just get exactly the information that otherwise you would be manually looking for.

                                      2. 8

                                        In a multiprocess system that’s exchanging messages: which stack?

                                        1. 2

                                          see: erlang

                                        2. 5

                                          You don’t want stack traces; you want to know what went wrong.

                                          A stack trace can suggest what may have gone wrong, but an error message that declares exactly what went wrong is far more valuable, no?

                                          1. 8

An error message is easy, we already have that: “i/o timeout”. A stack trace tells me the exact code path that led to that error. Building up a string of breadcrumbs that led to that timeout is just a poorly implemented, ad-hoc stack trace.

                                            1. 5

                                              Indeed and I wouldn’t argue with that. I love a good stack trace, but I find they’re often relied upon in lieu of useful error messages and I think that’s a problem.

                                              1. 2

                                                Building up a string of breadcrumbs that led to that timeout is just a poorly implemented, ad-hoc stack trace.

                                                That’s a bit of an over-generalization. A stack trace is inherently a story about the construction of the program that originated the error, while an error chain is a story about the events that led to an error. A stack trace can’t tell you what went wrong if you don’t have access to the program’s source code in the way that a hand crafted error chain can. A stack trace is more about where an error occurred, while an error chain is more about why an error occurred. I think they’re much more distinct than you are suggesting.

                                                and of course, if people are just bubbling up errors without wrapping them, yeah you’re going to have a bad time, but I think attacking that case is like suggesting that every language that has exceptions encourages Pokémon exception handling. That’s a bad exception-handling pattern, but I don’t think that the possibility of this pattern is a fair indictment of exceptions generally. Meanwhile you’re using examples of bad error handling practices that are not usually employed by Go programmers with more than a few weeks experience to indict the entire paradigm.

                                            2. 4

                                              Stack traces are expensive to compute and inappropriate to display to most users. Also, errors aren’t exceptions.

                                              1. 1

                                                That’s why Swift throws errors instead. Exceptions immediately abort the program.

                                              2. 3

                                                What really is the “origin” of an error? Isn’t that somewhat arbitrary? If the error comes from a system call, isn’t the origin deeper in the kernel somewhere? What if you call in to a remote, 3rd party service. Do you want the client to get the stack trace with references to the service’s private code? If you’re using an interface, presumably the purpose is to abstract over the specific implementation. Maybe the stack trace should be truncated at the boundary like a kernel call or API call?

                                                Stack traces are inherently an encapsulation violation. They can be useful for debugging your internals, but they are an anti-feature for your users debugging their own system. If your user sees a stack trace, that means your program is bugged, not theirs.

                                                1. 5

                                                  I get a line of logging output: error: i/o timeout. What do I do with that? With Ruby, I get a stack trace which tells me exactly where the timeout came from, giving me a huge lead on debugging the issue.

                                                  1. 5

                                                    I get a line of logging output: error: i/o timeout. What do I do with that?

                                                    Well, that’s a problem you fix by annotating your errors properly. You don’t need stack traces.

                                                    1. 3

                                                      When your Ruby service returns an HTTP 500, do you send me the stack trace in the response body? What do I do with that?

                                                      Go will produce stack traces on panics as well, but that’s precisely the point here: these are two different things. Panics capture stack traces as a “better than nothing” breadcrumb trail for when the programmer has failed to account for a possibility. They are for producers of code, not consumers of it.

                                                    2. 2

                                                      There’s definitely competing needs between different audiences and environments here.

                                                      A non-technical end user doesn’t want to see anything past “something went wrong on our end, but we’re aware of it”. Well, they don’t even want to see that.

                                                      A developer wants to see the entire stack trace, or at least have it available. They probably only care about frames in their own code at first, and maybe will want to delve into library code if the error truly doesn’t seem to come from their code or is hard to understand in the first place.

                                                      A technical end user might want to see something in-between: they don’t want to see “something was wrong”. They might not even want to see solely the outer error of “something went wrong while persisting data” if the root cause was “I couldn’t reach this host”, because the latter is something they could actually debug within their environment.

                                                  2. 9

This is one reason I haven’t gone back to Go since university - there’s no right way to do anything. I think I’ve seen a thousand different “right” ways to return errors.

                                                    1. 9

Lots of pundits say lots of stuff. One good way to learn good patterns (I won’t call them “right”) is to look at real code by experienced Go developers. For instance, if you look at https://github.com/tailscale/tailscale you’ll find pervasive use of fmt.Errorf. One thing you might not see – at least not without careful study – is how to handle code with lots of error paths. That is by its very nature harder to see, because you have to read and understand what the code is trying to do and what has to happen when something goes wrong in that specific situation.

                                                      1. 6

                                                        there is a right way to do most things; but it takes some context and understanding for why.

                                                        the mistake is thinking go is approachable for beginners; it’s not.

                                                        go is an ergonomic joy for people that spend a lot of time investing in it, or bring a ton of context from other languages.

                                                        for beginners with little context, it is definitely a mess.

                                                        1. 9

                                                          I thought Go was for beginners, because Rob Pike doesn’t trust programmers to be good.

                                                          1. 18

I’d assume that Rob Pike, an industry veteran, probably has excellent insight into precisely how good the average programmer at Google is, and what kind of language will enable them to be productive at the stuff Google makes. If this makes programming language connoisseurs sad, that’s not his problem.

                                                            1. 9

                                                              Here’s the actual quote:

                                                              The key point here is our programmers are Googlers, they’re not researchers. They’re typically, fairly young, fresh out of school, probably learned Java, maybe learned C or C++, probably learned Python. They’re not capable of understanding a brilliant language but we want to use them to build good software. So, the language that we give them has to be easy for them to understand and easy to adopt.

                                                              So I have to wonder who is capable of understanding a “brilliant language” …

                                                              1. 8

                                                                So I have to wonder who is capable of understanding a “brilliant language” …

                                                                Many people. They don’t work at Google at an entry-level capacity, that’s all.

There’s a subtle fallacy at work here - Google makes a lot of money, so Google can afford to employ smart people (like Rob Pike!). It does not follow that everyone who works at Google is smarter than the average programmer elsewhere.

                                                                (edited to include quote)

                                                                1. 8

                                                                  Let’s say concretely we are talking about OCaml. Surely entry-level Googlers are capable of understanding OCaml. Jane Street teaches it to all new hires (devs or not) in a two-week bootcamp. I’ve heard stories of people quickly becoming productive in Elm too.

                                                                  The real meaning of that quote is not ‘entry-level Googlers are not capable of it’, it’s ‘We don’t trust them with it’ and ‘We’re not willing to invest in training them in it’. They want people to start banging out code almost instantly, not take some time to ramp up.

                                                                  1. 8

                                                                    Let’s say concretely we are talking about OCaml. Surely entry-level Googlers are capable of understanding OCaml. Jane Street teaches it to all new hires (devs or not) in a two-week bootcamp.

I suspect that Jane Street’s hiring selects for people who are capable of understanding OCaml; I guarantee that the inverse happens too, and applicants interested in OCaml self-select for careers at Jane Street, just like Erlang-ers used to flock towards Ericsson.

                                                                    Google has two orders of magnitude more employees than Jane Street. It needs a much bigger funnel and is likely far less selective in hiring. Go is “the law of large numbers” manifest as a programming language. That’s not necessarily bad, just something that is important for a massive software company and far less important for small boutiques.

                                                                    1. 2

applicants interested in OCaml self-select for careers at Jane Street,

                                                                      As I said, they teach it to all hires, including non-devs.

                                                                      Google has two orders of magnitude more employees than Jane Street. It needs a much bigger funnel and is likely far less selective in hiring

Surely though, they are not so loose that they hire Tom, Dick, and Harry off the street. Why don’t we actually look at an actual listing and check? E.g. https://careers.google.com/jobs/results/115367821606560454-software-developer-intern-bachelors-summer-2022/

                                                                      Job title: Software Developer Intern, Bachelors, Summer 2022 (not exactly senior level)

                                                                      Minimum qualifications:

                                                                      Pursuing a Bachelor’s degree program or post secondary or training experience with a focus on subjects in software development or other technical related field. Experience in Software Development and coding in a general purpose programming language. Experience coding in two of C, C++, Java, JavaScript, Python or similar.

                                                                      I’m sorry but there’s no way I’m believing that these candidates would be capable of learning Go but not OCaml (e.g.). It’s not about their capability, it’s about what Google wants to invest in them. Another reply even openly admits this! https://lobste.rs/s/yjvmlh/go_ing_insane_part_one_endless_error#c_s3peh9

                                                                      1. 2

                                                                        And I remember when Google would require at minimum a Masters Degree before hiring.

                                                                        1. 1

                                                                          I had a master’s degree in engineering (though not in CS) and I couldn’t code my way out of a paper bag when I graduated. Thankfully no-one cared in Dot Com Bubble 1.0!

                                                                      2. 3

                                                                        They want people to start banging out code almost instantly, not take some time to ramp up.

                                                                        Yes, and? The commodification of software developers is a well-known trend (and goal) of most companies. When your assets are basically servers, intangible assets like software and patents, and the people required to keep the stuff running, you naturally try to lower the costs of hiring and paying salary, just like you try to have faster servers and more efficient code.

                                                                        People are mad at Rob Pike, but he just made a language for Google. It’s not his fault the rest of the industry thought “OMG this is the bee’s knees, let’s GO!” and adopted it widely.

                                                                        1. 1

Yes, I agree that the commodification of software developers is prevalent today. And we can all see the result: the profession is in dire straits - hard to hire because of bonkers interview practices, hard to keep people because management refuses to compensate them properly, and cranking out bugs like no tomorrow.

                                                                        2. 2

                                                                          on the contrary, google provides a ton of ramp up time for new hires because getting to grips with all the internal infrastructure takes a while (the language is the least part of it). indeed, when I joined a standard part of the orientation lecture was that whatever our experience level was, we should not expect to be productive any time soon.

                                                                          what go (which I do not use very much) might be optimising for is a certain straightforwardness and uniformity in the code base, so that engineers can move between projects without having to learn essentially a new DSL every time they do.

                                                                          1. 1

                                                                            You may have a misconception that good programming languages force people to ‘essentially learn a new DSL’ in every project. In any case, as you yourself said, the language is the least part of the ramp-up of a new project, so even if that bit were true, it’s still optimizing for the wrong thing.

                                                                            1. 1

no, you misunderstood what i was getting at. i was saying that go was optimising for straightforwardness and uniformity so that there would be less chance of complex projects evolving their own way of doing things, not that better languages would force people to invent their own DSLs per project.

                                                                              also the ramp-up time i was referring to was for new hires; a lot of google’s internal libraries and services are pretty consistently used across projects (and even languages via bindings and RPC) so changing teams requires a lot less ramp up than joining google in the first place.

                                                                              1. 1

i was saying that go was optimising for straightforwardness and uniformity so that there would be less chance of complex projects evolving their own way of doing things,

                                                                                Again, the chances of that happening are not really as great as the Go people seem to be afraid it is, provided we are talking about a reasonable, good language. So let’s say we leave out Haskell or Clojure. The fear of language-enabled complexity seems pretty overblown to me. Especially considering the effort put into the response, creating an entirely new language and surrounding ecosystem.

                                                                  2. 9

                                                                    No, Rob observed, correctly, that in an organization of 10,000 programmers, the skill level trends towards the mean. And so if you’re designing a language for this environment, you have to keep that in mind.

                                                                    1. 4

                                                                      it’s not just that. It’s a language that has to reconcile the reality that skill level trends toward the mean, with the fact that the way that google interviews incurs a selection/survival bias towards very junior programmers who think they are the shit, and thus are very dangerous with the wrong type of power.

                                                                      1. 4

                                                                        As I get older and become, presumably, a better programmer, it really does occur to me just how bad I was for how long. I think because I learned how to program as a second grader, I didn’t get how much of a factor “it’s neat he can do it all” was in my self-assessment. I was pretty bad, but since I was being compared to the other kids who did zero programming, it didn’t matter that objectively I was quite awful, and I thought I was hot shit.

                                                                      2. 4

                                                                        Right! But the cargo-cult mentality of the industry meant that a language designed to facilitate the commodification of software development for a huge, singular organization escaped and was inflicted on the rest of us.

                                                                        1. 4

                                                                          But let’s be real for a moment:

                                                                          a language designed to facilitate the commodification of software development

                                                                          This is what matters.

                                                                          It doesn’t matter if you work for a company of 12 or 120,000: if you are paid to program – that is, you are not a founder – the people who sign your paychecks are absolutely doing everything within their power to make you and your coworkers just cogs in the machine.

                                                                          So I don’t think this is a case of “the little fish copying what big bad Google does” as much as it is an essential quality of being a software developer.

                                                                          1. 1

                                                                            Thank you, yes. But also, the cargo cult mentality is real.

                                                                      3. 2

                                                                        Go is for compilers, because Google builds a billion lines a day.

                                                                  3. 2

                                                                    return errors.Wrapf(err, "fooing %s", bar) is a bit nicer.

                                                                    1. 13

                                                                      That uses the non-standard errors package and has been obsolete since 1.13: https://stackoverflow.com/questions/61933650/whats-the-difference-between-errors-wrapf-errors-errorf-and-fmt-errorf

                                                                      1. 1

                                                                        Thanks, that’s good to know.

                                                                      2. 8

                                                                        return fmt.Errorf("fooing %s %w", bar, err) is idiomatic.

                                                                        1. 8

                                                                          Very small tweak: normally you’d include a colon between the current message and the %w, to separate error messages in the chain, like so:

                                                                          return fmt.Errorf("fooing %s: %w", bar, err)
                                                                          
                                                                      3. 1

                                                                        It makes error messages useful but if it returns a modified err then I can’t catch it further up with if err == someErr, correct?

                                                                        1. 2

                                                                          You can use errors.Is to check wrapped errors - https://pkg.go.dev/errors#Is

                                                                          Is unwraps its first argument sequentially looking for an error that matches the second. It reports whether it finds a match. It should be used in preference to simple equality checks

                                                                          1. 2

                                                                            Thanks! I actually didn’t know about that.

                                                                          2. 2

                                                                            Yes, but you can use errors.Is and errors.As to solve that problem. These use errors.Unwrap under the hood. This error chaining mechanism was introduced in Go 1.13 after being incubated in the “errors” package for a long while before that. See https://go.dev/blog/go1.13-errors for details.
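
A minimal sketch of that in practice (the loadConfig helper and path are made up): os.Open’s error wraps fs.ErrNotExist, and errors.Is still matches it through the fmt.Errorf("…: %w", err) layer.

package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

func loadConfig(path string) error {
	if _, err := os.Open(path); err != nil {
		return fmt.Errorf("loading config %q: %w", path, err)
	}
	return nil
}

func main() {
	err := loadConfig("/nonexistent/config.toml")
	// errors.Is walks the wrap chain, so the sentinel still matches
	// even though extra context was added on the way up.
	if errors.Is(err, fs.ErrNotExist) {
		fmt.Println("config file missing:", err)
	}
}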

                                                                        1. 2

                                                                          Company: PingThings
                                                                          Company Site: https://pingthings.io
                                                                          Positions: DevOps, Senior Software Engineer
                                                                          Location: Remote, USA timezone preferred

Description: We’re building a data analytics platform for the power industry. We’ve got a custom high-performance time series database, deployed in a Kubernetes environment, and we’re looking for engineers. We’re hiring for multiple positions, but our top priority right now is devops. We’re looking for Linux system administration and Kubernetes experience to help us build and scale our deployments.

                                                                          Tech Stack: Go, Kubernetes.

                                                                          Contact: Email jobs@pingthings.io or ori@pingthings.io with your CV.

                                                                          1. 22

                                                                            Clang and GCC both support type-checking the parameters to printf and related functions; you just have to enable the right warning flag (which I don’t have handy right now on my lawn chair.) You can even enable this checking on your own functions that take format strings + varargs, by tagging them with an attribute.

                                                                            This has been around for more than 10 years. Given how easy it is to make mistakes with printf, every C or C++ programmer should enable these warnings (and -Werror) if their compiler supports them.

                                                                            (Yeah, C++ has its own formatted output library, and I sometimes use it, but it generates very inefficient code.)
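
For reference, a minimal sketch of that attribute on your own function (log_msg is a made-up name): with -Wall (which includes -Wformat), the commented-out call below gets flagged just like a bad printf would.

#include <stdarg.h>
#include <stdio.h>

/* Format string is parameter 2; the checked varargs start at parameter 3. */
__attribute__((format(printf, 2, 3)))
static void log_msg(int level, const char *fmt, ...)
{
	va_list ap;
	va_start(ap, fmt);
	fprintf(stderr, "[%d] ", level);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
}

int main(void)
{
	log_msg(1, "%s took %d ms\n", "startup", 42);  /* OK */
	/* log_msg(1, "%s took %d ms\n", 42, "startup");  -- -Wformat warns */
	return 0;
}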

                                                                            1. 8

                                                                              -Wall -Werror. And when developing, -fsanitize=memory.

                                                                              1. 10

                                                                                I have nothing against -Werror for CI, or for development, or for anything like that.

                                                                                But please, please don’t use -Werror by default if you’re working on an open-source project where you want others to come and compile your code. Inevitably, a new compiler version comes around which introduces a new warning or makes an existing warning trigger in more situations. There are so, so many cases where I’ve cloned a project, then followed the build instructions exactly, only to realize that my compiler is slightly different than the one the author is using, and the build fails due to -Werror. It’s never fun to have to dive deep into an unknown build system to disable -Werror when you just want to compile something.

                                                                                -Werror for CI and for development. -Wno-error for the default debug and release builds. Please.

                                                                                Even if you’re the only developer and user, -Werror is probably gonna bite you while bisecting. Your 2 year old commits probably contain code which your current compiler will warn about.
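
One way to get that split, as a minimal sketch assuming GNU Make (the WERROR knob is a made-up convention): warnings are always on, but promotion to errors is opt-in, so CI and developers pass WERROR=1 while a user’s plain make still builds.

# CI / development: make WERROR=1
# users:            make
WARNINGS := -Wall -Wextra
ifeq ($(WERROR),1)
WARNINGS += -Werror
endif
CFLAGS += $(WARNINGS)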

                                                                                1. 0

                                                                                  I mean you can just disable -Werror while you’re developing, no big deal.

                                                                                  1. 3

                                                                                    What do you mean? My main points are about the experience for a user who wants to compile your software, not the experience for the developer.

                                                                                    1. 0

                                                                                      I mean if they want to compile it then presumably they know their way around a Makefile or a config script. Users shouldn’t impose development practices on the project developers.

                                                                                      1. 4

                                                                                        What? The needs of your users shouldn’t affect development? That literally doesn’t make sense. You’re developing software for your users, no? Presumably all development should be influenced by the needs of your users, right?

                                                                                        If you make software where no user is going to compile your software from source, and where you’re not looking for contributors, and won’t ever ask people to make a build with debug symbols enabled to get a stack trace, and you release binaries for every conceivable system, then sure, do whatever you want. But most open source projects will have some users who want to compile from source, and you should probably adapt the software to the needs of your users. Most software acknowledges this by putting build instructions prominently in the readme.

                                                                                        You can at least do the slightest amount of effort to make your software nicer to use for your users. If you care at all, avoid intentionally breaking their build by default. Please.

                                                                                        1. 1

                                                                                          EDIT: btw, your point:

                                                                                          -Wno-error for the default debug and release builds.

                                                                                          Is super reasonable and I totally agree for release builds.

                                                                                          But for development…

                                                                                          The needs of your users shouldn’t affect development? That literally doesn’t make sense. You’re developing software for your users, no? Presumably all development should be influenced by the needs of your users, right?

                                                                                          The needs of users should influence the user experience of the software, not the development process of the software. If a user wants to compile from source it would surely be easier if the compilation disabled safety checks like say the typechecker phase. But it would also be massively inconvenient for the developers. Like I said, when it comes to development process, the needs of the developers outweigh the needs of users.

                                                                                          If you care at all, avoid intentionally breaking their build by default. Please.

                                                                                          Developers don’t intentionally break builds. At least, assuming a reasonable developer. What we don’t like is demands from users that would compromise our development practices. If a compiler warning breaks a build, the user of the open source software hopefully takes the time to file a ticket so the dev is aware and can fix it. Open source is a give-and-take, it’s not one side (developers) continuously caving to user demands.

                                                                                    2. 1

                                                                                      Use it for CI if you want it. Otherwise the point about compilers stands.

                                                                                      1. 1

                                                                                        No, it doesn’t, as I explained elsewhere in this thread. By default disabling safety checks during the normal development process just leads to more buggy and unmaintainable code over time.

                                                                                        1. 1

                                                                                          They’re warnings for a reason. If you want to ensure your checked-in code has no warnings, put them in CI. But don’t burden your users.

                                                                                          1. 1

                                                                                            Don’t optimize for the niche cases. Users typically don’t compile software themselves, they install from pre-compiled packages or download pre-built executables. In the normal course of things it’s much more likely that packagers will be the ones actually compiling/setting up builds. And if their packaging system is any good, they will lock down the versions of things like C compiler as much as possible, to ensure reproducibility.

                                                                                  2. 4

                                                                                    I would make that -Wall -Wextra -Werror=format -Werror=switch, plus -Werror and -fsanitize=… for the developer build.

                                                                                    -Wall is effectively the first warning level. There is also -Wextra and -Wpedantic.

                                                                                    I would reserve -Werror for the developer build, unless you want to break the ability to compile your current software with future compilers (since compiler writers come up with new warnings all the time). In particular, if you have users, you don’t want them to have to debug this.

                                                                                    The problem is of course false warnings – some warnings are pure noise in my opinion:

                                                                                    if(size == argc) // -Wsign-compare
                                                                                    

Lastly, if you care about specific warnings (who doesn’t?), it’s safer (safe enough, I suppose) to turn just those into errors, such as with -Werror=format (which catches this) and -Werror=switch (which catches unhandled switch cases). Those are some quality warnings off the top of my head that there’s no reason to tolerate violations of once your code is written to that standard.

                                                                                    Also, -Werror=sometimes-uninitialized for clang.
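
As an illustration of the switch one, a minimal sketch (names made up): with -Werror=switch the missing enum case below becomes a compile error instead of a silent fall-through to “unknown”.

enum color { RED, GREEN, BLUE };

const char *color_name(enum color c)
{
	switch (c) {  /* -Wswitch: enumeration value 'BLUE' not handled */
	case RED:   return "red";
	case GREEN: return "green";
	}
	return "unknown";
}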

                                                                                    1. 2

At least the clang policy is for -Wextra to include things that have a lot of false positives, whereas -Wall includes things that have very low false-positive rates. You should think very carefully about enabling it by default for two reasons:

                                                                                      • Every false positive a developer encounters for a warning makes them more likely to just do the thing that silences the warning, rather than fix the issue.
                                                                                      • Your codebase will include an increasing number of things that just silence warnings, which makes the code less readable and makes it easier for bugs to sneak in.
                                                                                      1. 1

                                                                                        Are you sure you don’t mean -Weverything? Apart from -Wsign-compare (which is in -Wall in g++ and -Wextra in gcc), which I would say that about, I find -Wextra and even -Wpedantic quite livable.

                                                                                        1. 4

                                                                                          At least the last time we tried to write a policy for this, the consensus was:

                                                                                          • -Wall is things that are unlikely to cause false positives and we’d like to recommend everyone uses.
                                                                                          • -Wextra is things with too high a false positive rate for -Wall.
                                                                                          • -Weverything is all warnings including experimental ones that may give completely nonsense results.

                                                                                          I wouldn’t recommend -Weverything for anyone who is not working on the compiler. As I recall, -Wsign-compare had quite a bit of discussion because it does have a fairly high false positive rate, but it also catches a lot of very subtle bugs and it encourages people to be more consistent about their use of types, so leads to better code overall.

                                                                                          1. 1

Ok. My take on -Wsign-compare is that it mixes up equality vs inequality comparisons. I wish it didn’t:

                                                                                            if(size < argc) // Bad: Result depends on signedness.
                                                                                            if(size == argc) // Ok: Result does not depend on signedness.
                                                                                            

                                                                                            Only one of these comparisons is problematic, yet -Wsign-compare complains about both! As such, half this warning is legit; half of it is never that, and the need to silence it is just detrimental.

                                                                                            You’re right, I wouldn’t use -Wextra: I would use -Wextra -Wno-sign-compare, because that’s the unreasonable part.

                                                                                            1. 2

The second is not, by itself, a problem, but it can depend on a cast that might be a problem (you must have constructed size from something, and what happens if that size is in the range that is negative for int? Is your logic still correct in this case?). It can also mask problems on some platforms because of the exciting integer promotion rules in C that are invoked here. Unfortunately, C and POSIX APIs are incredibly inconsistent about when they use signed and unsigned values (argc really should be a size_t, but that would break backwards compatibility[1]).

                                                                                              C also doesn’t have a good way of asking the question that you really want to ask, which is ‘are both of these values contained in the range that can be expressed in both types and, if so, are they identical?’. The integer promotion rules correctly handle that if there is a type that can express the range of both, but fails with annoying corner cases where there isn’t one. You really want a 31- or 63-bit unsigned type in a lot of cases so that you have something that can be unambiguously cast to a signed or unsigned 32- or 64-bit integer.

                                                                                              [1] I’d love for C++ to define a new entry point that provided a pair of iterators for the arguments and have the compiler synthesise the prologue in main, but that’s probably never going to happen.
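
                                                                                              For what it’s worth, C++20 did grow a way to ask exactly that question: the std::cmp_* helpers in <utility> compare the mathematical values of mixed signed/unsigned operands. A minimal sketch:

                                                                                              #include <cstdio>
                                                                                              #include <utility>
                                                                                              
                                                                                              int main() {
                                                                                                  int i = -1;
                                                                                                  unsigned u = 10;
                                                                                                  // Built-in comparison: -1 converts to UINT_MAX, so this prints 0.
                                                                                                  std::printf("%d\n", i < u ? 1 : 0);
                                                                                                  // std::cmp_less compares the actual values: -1 < 10, so this prints 1.
                                                                                                  std::printf("%d\n", std::cmp_less(i, u) ? 1 : 0);
                                                                                              }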

                                                                                          2. 4

                                                                                            -Wextra has warnings for a lot of things which don’t necessarily indicate a problem. The biggest example of this is the unused parameter warning, which warns when a function has an unused parameter.

                                                                                            Sometimes, an unused parameter indicates a problem, sure. But a lot of the time, the parameters have to exist because the function has to implement some particular signature to be used as a function pointer (or to override a virtual method in C++), but doesn’t need to use one of the arguments. In C++, a common pattern is to comment out the name of the parameter - something like, void foo(int /*count*/, int *whatever) { ... }, but that’s illegal syntax in C.
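
                                                                                            A quick sketch of that situation (the callback type and names here are hypothetical): the signature is fixed, but one argument goes unused, and commenting out its name keeps the warning quiet in C++:

                                                                                            #include <cstdio>
                                                                                            
                                                                                            // A fixed callback signature: both parameters must exist...
                                                                                            typedef void (*event_handler)(int event_id, void *context);
                                                                                            
                                                                                            // ...but this handler has no use for the context pointer.
                                                                                            static void log_event(int event_id, void * /*context*/) {
                                                                                                std::printf("event %d\n", event_id);
                                                                                            }
                                                                                            
                                                                                            int main() {
                                                                                                event_handler handler = log_event;
                                                                                                handler(42, nullptr);
                                                                                            }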

                                                                                            I find things like -Wsign-compare to be a bit annoying sometimes, but mostly reasonable. It could possibly catch errors, and it’s easy enough to do the cast (or use the right signedness in for loops). But it’s certainly another example of something which has a lot of false positives.

                                                                                            I often use -Wall -Wextra -Wpedantic -Wno-unused-parameter in my makefiles.

                                                                                            1. 3

                                                                                              You can avoid that warning in C too. I think the commonest way is to cast it to void:

                                                                                              void api(int unused) {
                                                                                                  (void)unused;
                                                                                              }
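
                                                                                              (In C++17, and later in C23, there’s also an attribute for this that keeps the parameter name intact:)

                                                                                              void api([[maybe_unused]] int unused) {
                                                                                                  // no cast needed; the warning is suppressed
                                                                                              }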
                                                                                              
                                                                                              1. 3

                                                                                                You can silence most of the warnings. The question is whether the extra noise in the source code, lowering the overall readability, is going to cause more bugs than the warning prevents.

                                                                                      2. 1

                                                                                        Except -Wall doesn’t turn on all warnings, just “a lot”, and in my experience omits some I consider important. (I’m sure this is compiler- and maybe version-dependent.)

                                                                                        IIRC, -Wpedantic turns on “too many” warnings in Clang but “about right” in GCC.

                                                                                        In Xcode, I use a curated xcconfig file that turns on nearly every warning; then in my own config file I turn off the ones that are so annoying I don’t use them, mostly super pedantic stuff like implicit sign conversions.

                                                                                      3. 1

                                                                                        (Yeah, C++ has its own formatted output library, and I sometimes use it, but it generates very inefficient code.)

                                                                                        Not to mention terrible ergonomics.

                                                                                        1. 6

                                                                                          In what way? The API is almost the same as printf, except that the format specifiers are in braces, the qualifiers can be a bit more readable, and it generates better code. It also provides a much richer API, allowing things like compiling the format string so that you get compile-time type checking even in cases where the string and the arguments are provided in different places. You can also provide formatters for your own types that get the same type-safety guarantees.
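
                                                                                          A minimal sketch of the braces syntax, using C++20’s std::format (the fmt library spells it fmt::format):

                                                                                          #include <format>
                                                                                          #include <iostream>
                                                                                          #include <string>
                                                                                          
                                                                                          int main() {
                                                                                              // Braces instead of %-qualifiers; the arguments are type-checked.
                                                                                              std::string s = std::format("{} warnings, {:.1f}% done", 3, 99.5);
                                                                                              std::cout << s << '\n';  // prints: 3 warnings, 99.5% done
                                                                                          }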

                                                                                          1. 2

                                                                                            Wow, I thought they were talking about ostream. I’m not up to date on C++20, the last I used was C++17. This is cool! It seems just like Rust’s format! macro.

                                                                                            1. 2

                                                                                              C++20 has std::format, which is basically a stable subset of the fmt APIs. It’s sufficiently easy to add fmt to a project that I’d only use it for things that are aggressively wanting to trim dependencies though.

                                                                                              Completely agreed on ostream. It’s not amenable to localisation, it’s hard to use in a multithreaded context, and the APIs for actually doing the stringification are horrible (depending on pushing state into the stream to control formatting). The core bits of iostream are actually quite nice: separating the buffer management from the I/O interface, for example, but the high-level bits are awful. They’re over 20 years old at this point (they’re inherited from STL, before C++98’s standard library), so also predate multithreading as a common pattern.
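
                                                                                              (The pushing-state problem in a nutshell: formatting flags like std::hex are sticky on the stream.)

                                                                                              #include <iostream>
                                                                                              
                                                                                              int main() {
                                                                                                  std::cout << std::hex << 255 << '\n';  // prints "ff"
                                                                                                  // The flag persists, so unrelated later output is affected too:
                                                                                                  std::cout << 255 << '\n';              // still prints "ff", not "255"
                                                                                              }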

                                                                                      1. 3

                                                                                        Forget about performance – making a habit of this prevents race conditions and TOCTOU security problems.

                                                                                        1. 8

                                                                                          Oh man, GNU autoconf. Is that still used in new projects? I feel like there’s a whole book to be written about all the weird things it tries to detect. It’s been 15 years since some of the operating systems it supports were last even booted other than as a curiosity. But if you still need BeOS compatibility, why, it’s got you covered.

                                                                                          1. 9

                                                                                            I’ve thankfully avoided autoconf for a long time, but I did encounter one project that didn’t include all of the default autoconf things and so had a simple autoconf script that checked about a dozen things and worked on all surviving systems around 2008.

                                                                                            I’ve used CMake for all new projects for a decade or so and it does a lot of the checks. Almost 20 years ago, phk pointed out that most of the autoconf checks boiled down to ‘is this Linux?’ or ‘is this *BSD?’ or ‘is this Windows?’ and CMake has done a pretty good job of memoising a lot of that and, unlike autoconf, works out of the box on Windows. I don’t know if it supports BeOS, but it supports Haiku out of the box. The memory allocator that I work on uses CMake and supports Haiku (allegedly - I’ve never tried it).

                                                                                            For all of the hate autoconf gets (and that libtool deserves: it is completely useless on all platforms except AIX, and not supporting AIX is a feature, not a bug), it is nowhere near as painful as imake. Moving from imake to autoconf was one of the motivations in the X.org fork from XFree86 and it was a big improvement.

                                                                                            1. 2

                                                                                              It’s tragically sad that phk’s post is nearly 20 years old.

                                                                                            2. 9

                                                                                              I tend to prefer autoconf over CMake for some reasons:

                                                                                              • It’s easier to debug when it goes wrong, which is often (CMake debugging is excruciating unless you’re at Kitware)
                                                                                              • It is actually smaller than CMake for sure (a gigantic >200 MB C++ blob versus a non-euclidean amount of M4), though I doubt this manifests as anything that matters in practice
                                                                                              • libtool and friends support AIX conventions better (it’s wacky and has fat libraries, weird exports, etc.), which is better for my dayjob (sorry David)

                                                                                              But other than those things, it is pretty gross and hideously slow on anything that can’t hide the realities of how much fork and stat suck. CMake has a better separation of concerns (i.e. you can actually target not-Unix and make IDE projects from it).

                                                                                              1. 4

                                                                                                Making IDE projects is really what sold me on CMake. While debugging the build system itself can be excruciating with CMake, debugging the resulting program on a new platform without IDE support tends to be similarly excruciating. Especially if you want to target non-Unix also. And autotools on non-Unix is as godawful to debug as CMake for build issues, IME. Plus, my debugging skills are better on Unix-y things…

                                                                                                CMake debugging has improved enough that it sucks less than it used to, and I don’t have to touch AIX lately, so it usually carries the day for me.

                                                                                                With all that said, if I regularly needed my programs to run on AIX, HP/UX or Solaris, and needed to build them with non-GNU or non-clang toolchains, I’d still choose autotools even now, though. CMake with GNU/Clang works pretty well for me now that I rarely need to worry about anything other than recent-ish Linux/BSD/Mac/Windows.

                                                                                                1. 2

                                                                                                  Making IDE projects is really what sold me on CMake.

                                                                                                  This is the only reason I use CMake: it supports Visual Studio and CLion well.

                                                                                                2. 3

                                                                                                  CMake debugging is excruciating unless you’re at Kitware

                                                                                                  This, so much this. The last time I needed to cross-compile, paths were getting mangled: /usr/bin became /path/to/cross/toolchain/usr/bin/usr/bin (note the doubled /usr/bin); I found myself running strace to figure out which of the wrong files it was including, and then setting breakpoints in the debugger in a custom debug build of the cmake binary to find out which of the cmake libraries was causing that include to get run.

                                                                                                  It took a week to find the right variables to set to make that house of cards work.

                                                                                                  1. 7

                                                                                                    Every time, I have to google -DCMAKE_BULLSHIT_FLAG and, for the project-specific ones, hope they documented them! With most autotools projects, I can at least run ./configure --help. I think CMake is easier for developers, but autotools is easier for integrators.

                                                                                                  2. 1

                                                                                                    The debugging experience is probably subjective. My experience debugging autotools is universally negative. I can usually find the place where it’s doing the stupid thing in the generated output, but trying to map that back to the input, and to whether it’s the autoconf, automake, or libtool input, is hard. Perhaps more familiarity with the tooling makes it easier. With CMake, there’s a single layer that does the translation and there are helpers to dump the properties for any object in the system, so I can do that and easily see where the input is coming from. Again, that’s partly due to familiarity with CMake.

                                                                                                    The only time that I’ve had a problem with debugging something in CMake was when I got the Objective-C runtime to build on Windows. I was slightly abusing CMake to say that Objective-C[++] files were C[++] so that I could reuse all of its built-in machinery. Unfortunately, when invoking cl.exe, CMake helpfully hard-coded the /TC or /TP (compile as C/C++) flags depending on the source language. When invoking clang-cl.exe, this forced it to try to compile Objective-C as C and Objective-C++ as C++, which then broke. I can’t really count that against CMake in a CMake vs autotools comparison, because autotools can’t target a Visual-Studio flavoured compiler or linker.

                                                                                                    I’m not sure where the >200MB number comes from. On FreeBSD, the entire cmake 3.21 package is 34 MiB, of which 3.9 MiB is the C++ bit and the rest is all CMake script. On Windows, where the official binaries include all dependencies, the total install size is 98 MiB for CMake 3.21.2. Note that the size isn’t an apples-to-apples comparison with autotools because CMake does not depend on bash (which doesn’t matter on *NIX platforms, but is incredibly important on Windows, where autotools requires something like mingw) and includes quite a few (large and useful) things that autotools doesn’t, for example:

                                                                                                    • A testing framework (CTest) that provides a simple way of running tests and integrates well with most CI systems.
                                                                                                    • A package-building system that can generate tarballs, RPMs, DEBs, FreeBSD packages, Windows installers, NuGet packages, and a bunch of other things.
                                                                                                    • Infrastructure for exporting and importing targets so that you can distribute modular components and import them easily (I think libtool was supposed to do this, but I’ve never seen it work for anything other than very tightly coupled projects).

                                                                                                    I’ve been fortunate enough never to have had AIX inflicted on me, so I can’t really speak to how well CMake works on AIX versus autotools, but apparently there are 3.18 packages. I have had to use Windows and autotools really suffers there:

                                                                                                    • It requires a UNIX-like shell and set of command-line tools, so you end up needing to install mingw or similar.
                                                                                                    • It can’t drive a Visual Studio toolchain, so you have a hard dependency on clang or gcc for your build system, even if your code would happily build with MSVC.
                                                                                                    • It can only generate GNU Make output, so now you have another tool that doesn’t really support Windows well in your dependency chain.

                                                                                                    For me, the biggest reasons to prefer CMake are:

                                                                                                    • It generates compile_commands.json automatically (I stick CMAKE_EXPORT_COMPILE_COMMANDS=true in my environment, otherwise you have to opt into this), so all of the non-build tooling works well.
                                                                                                    • It generates ninja files that are significantly faster. For LLVM, the autotools build system took 30 seconds to run ‘make’ on a tree with no changes. The CMake-based one takes a tiny fraction of a second. Ninja also does parallel builds better.
                                                                                                    • It can target Visual Studio (and use Ninja to build with the VS compiler and linker, which is typically much faster than a native VS build), which means that we can test our code with more compilers (we have clang, gcc, and MSVC in our CI matrix and each one provides warnings that the others don’t).
                                                                                                    • It can generate Visual Studio and XCode projects, for when I want to use an IDE’s debugger. I typically live in vim, but for some debugging tasks an environment where the debugger and editor are integrated is really nice.
                                                                                                    • CTest is really easy to use and we can trivially connect the output to CI reporting.
                                                                                                    • It integrates well with things like pkg-config and other mechanisms for finding other code. I’m increasingly using vcpkg for dependencies and it has fantastic CMake integration (it’s largely implemented in CMake) and it will fetch and compile all of my dependencies for when I want something that statically links everything.
                                                                                                    • The UI for users is much nicer. ccmake (*NIX) or cmake-gui.exe (Windows) give me a nice UI for exploring all of the options and let me expose typed options (e.g. booleans, simple enumerations where the user must pick one) and dependent options (some are hidden if they’re not required).
                                                                                                    • The packaging support works even for header-only C++ libraries. I’ve just modernised the snmalloc build system and now you can do a build of the library in the header-only configuration and it will generate CMake config files that you can import into another project and get all of the compile flags necessary to build.
                                                                                                    1. 2

                                                                                                      I’ve been fortunate enough never to have had AIX inflicted on me, so I can’t really speak to how well CMake works on AIX versus autotools, but apparently there are 3.18 packages

                                                                                                      This is the point where I say I oversimplified things: what I actually target is an AIX syscall emulator, and that environment tries to do away with the worst excesses of AIX. For example, it tries to match the normal soname version convention (libfoo.so.69) instead of the hellish AIX one that requires dynamically using ar (libfoo.a(libfoo.so.69)), which GNU also came up with based on how IBM kinda left libraries unversioned (there’s an autoconf switch to enable one or both conventions; most of this damage is inflicted on me by GNU), it only has 64-bit packages so no need for fat libraries, etc. - though I still need libtool because the .so files STILL have to be archives so ld won’t fuck up exports (primarily so it links with libfoo.so.69 instead of libfoo). I’m sure this can be taught to CMake instead of using libfool, but it still sucks man! Said RPM for 3.16 is 233 MB without any of the GUI stuff; it seems that’s because CMake’s binaries are statically linked, likely due to bad, dumb AIX linker stuff. Don’t you just love how diverse POSIX can be?

                                                                                                  3. 9

                                                                                                    I got a cool horror story about this. It’s probably mostly off-topic, unless we’re talking about how weird autotools are and how nobody has wanted to learn them for years, which is somewhat tangential. But I mostly want to tell it for the laughs ’cause I know people here will probably enjoy it.

                                                                                                    One of the “coolest” things I can say I’ve crossed off of a bucket list I never wanted in the first place during my experience in the corporate world is, uh, writing configure scripts by hand. I wish I were kidding but nope. For quite some time, I literally fixed dozens of bugs in a configure script, which I edited, by hand. Here’s what happened.

                                                                                                    We had this huge hunk of a codebase that was split into maybe 60 or so separate programs, all of them managed through a big blob of autotools magic. It was definitely a non-trivial thing. One day, way before my time, back when they were more like 15 programs or so, someone who did not want to learn autoconf & friends had to add another program to that list, and being entirely unsure how this whole thing even worked, they just copy-pasted the relevant bits of the configure script and replaced all occurrences of $whateverprogram with $theirprogram, and did more or less the same for config.h, Makefile.in and so on.

                                                                                                    Of course that wouldn’t quite make it past the mandated code review, so this person dutifully did what was asked and added the “right” autotools incantations in Makefile.am & co. They could never make it work, though, so they did the only logical thing: they committed everything that autotools coughed out – including the half-generated, half-handcrafted configure script, config.h, Makefile.in, everything – and changed the build script to skip the autogen step and go straight for running configure.

                                                                                                    Since the automated integration system was happy – it produced successful builds, after all! – and all the checkboxes raised during the review process had been ticked, there was no reason to delay this important piece of functionality anymore so the whole thing was merged.

                                                                                                    Fast forward like ten years and of course the whole autotools scaffolding was basically useless. Because, for reasons I am not at liberty to discuss, this whole thing had absolutely no documentation whatsoever, nobody quite knew exactly what boilerplate was and wasn’t required – they just replicated that guy’s ten year-old diff, including the incorrect incantations in Makefile.am & friends. None of it was relevant, of course. The build system didn’t use it to generate anything, and if it tried, it wouldn’t work anymore, anyway: the original commit didn’t work, and it had ten years’ worth of replicated junk on top of that. Only the configure script – most of which had become hand-rolled by then – mattered.

                                                                                                    By the time I got there I was more or less the only person on the team who’d been around Unices back when autotools were really common, so I was eventually asked for an estimate on fixing it. Upon pointing out that I didn’t really know autotools either, that, more importantly, I wasn’t too familiar with the system, and that it had like ten years of brain damage, so it’d probably take me a few weeks at best, it quickly got chalked up under “shit we’ll ask interns to do”, because we’re not going to throw full-time senior engineering money at that. But various build errors did eventually find their way to me, and I’d usually trace them to some bash copy-pasta that had been lifted off StackOverflow and plastered onto configure, which I’d fix as if this was a real thing.

                                                                                                    Needless to say, the damn thing never got fixed. By the time this was happening, most of the people who showed up for internship interviews hadn’t even heard of autotools, and learning all that, plus all the ins and outs of the system that got built, well enough to actually fix the whole thing, realistically took way longer than the internship period.

                                                                                                    Edit: I don’t think that gets used for new projects much, no. New projects don’t need it, they all run under Ubuntu LTS in Docker containers anyway :-P.

                                                                                                    1. 1

                                                                                                      Edit: I don’t think that gets used for new projects much, no. New projects don’t need it, they all run under Ubuntu LTS in Docker containers anyway :-P.

                                                                                                      You might be kidding. But the world thinks that’s the right choice of build system.

                                                                                                      1. 1

                                                                                                        I’m kidding somewhat :-).

                                                                                                  1. 2

                                                                                                    I never understand this attitude, at all. The code to do these things (persistence, API gateways, migrations, etc.) still needs to exist either way; you can either: 1) generate it mechanically, 2) write it by hand, or 3) use runtime tricks to achieve it if your environment is dynamic enough. Once you’ve got the parser or models in a format you can consume, #1 and #3 present a similar amount of work, with the main difference being that when you use #1 it’s a helluva lot easier to read the actual code being executed when it comes time to debug something. I mean, it’s also usually faster, but that’s not usually the deciding factor.

                                                                                                    1. 2

                                                                                                      That makes sense. I think there is a continuum between #1 and #3 where you can autogen code and only rarely have to look at it.

                                                                                                      1. 2

                                                                                                        The code to do these things (persistence, API gateways, migrations, etc) still needs to exist either way;

                                                                                                        In my experience, it very often doesn’t. Putting the articulation points of a system in the right place can eliminate huge amounts of complexity.

                                                                                                        1. 1

                                                                                                          The design and the heft of an industrial solution are far different and far heavier than something artisanal. I’m reminded of the recent post about the person who wrote their own 150-line query builder.

                                                                                                          I could use protoc to generate code (#1) or use a JSON API implementation (#3) … or I could Net::HTTP.get('example.com', '/index.html')

                                                                                                        1. 4

                                                                                                          Compute always wins. Imagine anything that was hard in the past and how hardware eventually overcame it. We don’t even notice that thing anymore.

                                                                                                          I understand the efficiency argument, but the horse has kind of left the barn. Bets against webtech aren’t good bets to me. Something like Tauri, Photino or Neutralinojs will do the efficiency angle, or we’ll have 512GB DDR5 sticks on desktop around 2022. Yes, it’s a shame, it’s a waste, and there are other platforms than desktop, but they’ll follow suit eventually.

                                                                                                          API and feature-wise, I can’t imagine something like Qt having feature parity or adoption force. I appreciate and love the Qt apps I use; they are lightweight and fast. I appreciate Qt and efficiency. I’ve just stopped betting against webtech. The OP’s 2nd point, that web tabs use memory too, is slightly weak, but I like the idea. You can close tabs and kind of manage it. You can’t do this in Slack, for example.

                                                                                                          I understand the debate and I’m happy to listen; I’m just trying to write down where my head is at about this. I hope you consider that compute always wins, even if that’s annoyingly generalized.

                                                                                                          1. 6

                                                                                                            This view is stuck in the 90s. CPU speeds and memory sizes have plateaued since the mid 2000s.

                                                                                                            1. 3

                                                                                                              Here’s a graph from 2014 to 2021 across two architectures: https://twitter.com/iancutress/status/1334549180967227393 One arch (Arm) is absolutely not plateaued (if you mean flat); the other is still seeing 10-20% IPC jumps per gen. Core counts have been increasing, and performance for parallel work is on a massively upward trend.

                                                                                                              Since the 2000s, the dollar cost per GB of RAM has been going down. I mentioned DDR5 stick density. We’ll see about price. Memory size is increasing, and has been for a while. https://jcmit.net/memoryprice.htm

                                                                                                              Surely computers have not been ~the same speed since the early 2000s? Can I read something you are thinking of?

                                                                                                              1. 3

                                                                                                                Surely computers have not been ~the same speed since the early 2000s? Can I read something you are thinking of?

                                                                                                                Computers in the early 2000s were nearly doubling in speed annually. You’re talking about low double digits. That’s an order of magnitude slower growth.

                                                                                                                Going from 1993 to 2003, my CPU went from 33 MHz to 2.4 GHz, a factor of about 70 in clock speed alone, and probably far more when counting IPC increases. My RAM went from 12 megs to 1 gig, a factor of about 85.

                                                                                                                Between 2011 and 2021, we’ve seen nothing like that, especially counting single core performance. We’ve probably roughly doubled single core CPU performance, and quadrupled memory sizes. I’d say that this is a plateau.

                                                                                                                (And, sure, core counts have gone up – but that requires programmers to spend time optimizing – and effective optimizations mean ensuring that you’re not getting false sharing, slowing things down due to cache line ping pong and lock contention, etc. It’s not a free lunch.)

                                                                                                                1. 1

                                                                                                                  Also, while performance capability has gone up, so have performance requirements: if you compare Windows XP to Windows 10, you’ll find a whole lot of stuff had to be removed from kernelspace and slowed down as a result, AIUI. It’s not enough to just run XP on a Ryzen chip these days.

                                                                                                          1. 1

                                                                                                            I personally am not interested in winning/losing, in who wins or who loses.

                                                                                                            1. 4

                                                                                                              I’m interested in what technology stacks I’m going to be forced to deal with, due to their ubiquity, over the coming decade.

                                                                                                              1. 2

                                                                                                                What and how would force you?

                                                                                                                1. 4

                                                                                                                  Work. Try to find a systems engineering job that doesn’t work with Linux.

                                                                                                                  They exist, but are rare and getting rarer.

                                                                                                                  1. 2

                                                                                                                    Speaking from a world far away from Unix: your work is going to be making Linux- or at least POSIX-specific programs work on your system, or setting up APIs so work can be moved to Docker containers as appropriate.

                                                                                                                    1. 1

                                                                                                                      Depends on what you understand by “work”. I personally am barely getting by, but at least I’m doing work I’m happy with rather than being forced to work on something that doesn’t fit me.

                                                                                                              1. 2

                                                                                                                This sounds like a fantastic argument for static linking.

                                                                                                                1. 4

                                                                                                                  Personally, I’ve never liked the advice that writing obvious comments is bad practice—probably because I write obvious comments all the time.

                                                                                                                  This is all fun and games while the comment is correct, but when you run across a comment that claims the opposite of what the code does, what now?

                                                                                                                  // Add a horizontal scroll bar
                                                                                                                  newScrollBar = new JScrollBar(scrollBar, VERTICAL);
                                                                                                                  

                                                                                                                  What’s the correct behaviour? Should you fix the comment? Should you fix the code? Should you leave it alone if you’re not directly affected by it? With just the code, the problem isn’t really there, since there is one source of truth. It might not be correct, but at least there are no contradictions, because the code does what the code says it does.

                                                                                                                  The problem with comments is that they’re non-executable: the person changing the code has to change the code, or the change won’t happen at all. But will they remember to change the comment? Maybe, maybe not. This isn’t even a hypothetical case; I’ve seen comments claim the exact opposite behaviour of what the code did.

                                                                                                                  1. 5

                                                                                                                    What’s the correct behaviour? Should you fix the comment? Should you fix the code? Should you leave it alone if you’re not directly affected by it? With just the code the problem isn’t really there since there is one source of truth. It might not be correct but at least there’s no contradictions because the code does what the code says it does.

                                                                                                                    I’ve found those cases incredibly useful, because they tell me the code diverged. Something changed; why? Does other code assume the scroll bar is horizontal? Was that code switched when this snippet was switched to vertical? Does this code/comment divergence help me track down the bug I’m searching for?

                                                                                                                    Without the comment I wouldn’t know that something interesting happened and wouldn’t even think to git blame it, but now I know that there’s historical information here.

                                                                                                                    1. 3

                                                                                                                      I’ve found those cases incredibly useful, because they tell me the code diverged. Something changed; why?

                                                                                                                      Usually, because I slipped as I was typing the comment. I use the wrong word all the time when speaking and writing, and need either some testing or a careful reader to notice.

                                                                                                                      There’s nothing insightful to be gained from my thinkos and brain farts: they’re also in my multi-paragraph expository comments, but at least there you usually have enough context to realize “oh, Ori is a sloppy writer, and clearly meant this”.

                                                                                                                      1. 2

                                                                                                                        The thing is: for the most part, in 99% of cases, this happens because the code was changed but the comment wasn’t (very much per Occam’s Razor), so what you get out of it is wasting your time on a needless investigation of something that’s not actually a problem, and then fixing the comment or just removing it outright.

                                                                                                                        1. 2

                                                                                                                          If you are certain the code is right and know the comment is wrong, then take 30 seconds to fix the comment to match the code and move on with your life. Not a big deal.

                                                                                                                          If you are uncertain whether the code is wrong or the comment is wrong, then that indicates a hole in your understanding. That hole would exist whether or not the comment was there. So that’s a useful indicator that you should investigate further.

                                                                                                                          1. 1

                                                                                                                            The comment makes understanding that line actively more difficult. If you’re reading the code to come to some initial understanding of a codebase, then you’re going in with holes in your understanding, and this inconsistent comment creates yet another hole.

                                                                                                                            You’re suggesting that if I go to a random Github repo I’m not familiar with, start trying to read the code, and find a comment that is blatantly and objectively inconsistent with the code it’s attached to, that’s not actually a problem because if I just understood the codebase I’d know which to trust.

                                                                                                                            You’re onto something there. That’s exactly the point: if context makes it clear which to trust, then you don’t need both, you just need the (correct) code. If we can make the assumption the misleading comment won’t matter because anyone reading it will know that the comment is wrong, then what is the comment doing there at all?

                                                                                                                    1. 4

                                                                                                                      Saying code comments are useless is a judgement from us as fluent readers of the code. It disregards the value comments have for someone less fluent than yourself who is either reading or writing code.

                                                                                                                      If you’re trying to teach people a language that they don’t understand, you’re going to need to do a LOT more explaining:

                                                                                                                      // Add a horizontal scroll bar
                                                                                                                      hScrollBar = new JScrollBar(scrollBar, HORIZONTAL);
                                                                                                                      add(hScrollBar, BorderLayout.SOUTH);
                                                                                                                      

                                                                                                                      May need to become:

                                                                                                                      // Add a new horizontal scroll bar.
                                                                                                                      // Horizontal scroll bars are what we call the GUI element that
                                                                                                                      // allow you to move a widget's visible frame side to side instead
                                                                                                                      // of vertically (see https://tutorial.com/diagram.jpg).
                                                                                                                      //
                                                                                                                      // 'new' is a keyword that indicates that the following word is a class
                                                                                                                      // name, and that the constructor `JScrollBar.JScrollBar()` is to be invoked.
                                                                                                                      // This is used to initialize the scrollbar object, and set it up as horizontal.
                                                                                                                      hScrollBar = new JScrollBar(scrollBar, HORIZONTAL);
                                                                                                                      
                                                                                                                      // The add() function is implicitly called on the `this` object, and takes both
                                                                                                                      // the hScrollBar we created in the line above, the current container object,
                                                                                                                      // and uses the BorderLayout.SOUTH constant to indicate that it goes on
                                                                                                                      // the bottom of the container.
                                                                                                                      add(hScrollBar, BorderLayout.SOUTH);
                                                                                                                      

                                                                                                                      The problem with these ‘useless comments’ is that they’re generally not sufficient to teach someone that is unfamiliar with reading code, and unhelpful for someone that is.

                                                                                                                      1. 1

                                                                                                                        Could your comment also mention what scrollBar is, why it is necessary, and how it differs from hScrollBar? I have no experience with Java and would like the explanation of everything directly in the code.

                                                                                                                      1. 5

                                                                                                                        If you want this, why not simply space-separated columns with consistent escaping of spaces? This would preserve the easy human readability and direct-manipulation benefits of the shell, without losing anything over json. Anything that can be represented with json could be represented as keypath/value columns.

                                                                                                                        And, unlike json, it’s streamable – you can start processing records before the last one is emitted. Json does not allow this: It requires you to buffer until the end to detect a json syntax error.
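
                                                                                                                        As a concrete strawman of mine (not the commenter’s spec), a nested json object like {"user": {"name": "ann", "ids": [1, 2]}} could be emitted as one keypath/value pair per line:

                                                                                                                        user.name ann
                                                                                                                        user.ids.0 1
                                                                                                                        user.ids.1 2

                                                                                                                        Each line stands alone, which is what makes the format streamable and greppable.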

                                                                                                                        1. 4

                                                                                                                          You can stream JSON. You just emit one object per line instead of using an array: http://ndjson.org/
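
                                                                                                                          For instance, instead of one enclosing array, the producer emits:

                                                                                                                          {"index": 0, "value": 5}
                                                                                                                          {"index": 1, "value": 6}

                                                                                                                          and a consumer can parse each line as soon as it arrives.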

                                                                                                                          1. 5

                                                                                                                            That’s not valid json. It’s a non-standard extension – and I think it’s inferior to space-separated columns with quoting to handle spaces.

                                                                                                                            1. 1

                                                                                                                              Inferior in what context? Streaming JSON readers are a dime a dozen in every programming language. The thing you’re proposing is probably fine too, but we don’t need two different ways to do it, and the existing way is fine.

                                                                                                                              1. 2

                                                                                                                                Inferior in the context of a shell:

                                                                                                                                • Streaming TSV libraries are a dime a dozen in every programming language
                                                                                                                                • It allows tools to output in one format for both humans and the machine, instead of doing it two different ways and forcing me to either grub through json output by hand, or guess at how the output translates.
                                                                                                                                • It’s simpler to parse with existing tools like awk, sed, etc, and if it becomes ubiquitous, it would be a smaller lift to convince maintainers that a simple columnar format should live directly in these tools
                                                                                                                                • It’s simpler to parse without existing tools when writing new code
                                                                                                                                • It’s simpler to parse without tools at all: just use your eyeballs
                                                                                                                                • It’s simpler to generate in a streaming format from existing tools
                                                                                                                                • It’s simpler to generate at all, especially if you allow yourself to use glibc’s register_printf_function to add a ‘%q’ format specifier that quotes (see the sketch below).

                                                                                                                                And finally, the only real argument for JSON isn’t really valid:

                                                                                                                                • The ‘existing way’ doesn’t actually exist: there are approximately no well established shell tools that produce json output natively. It’s a similar (or larger) lift to convert the shell ecosystem to use json, the transition period will be harder and longer, and the end result will not be as pleasant to use.
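
                                                                                                                                A sketch of the quoting idea mentioned above (my strawman using plain stdio, rather than an actual registered ‘%q’ handler, which is glibc-specific):

                                                                                                                                #include <cstdio>
                                                                                                                                
                                                                                                                                // Print one space-separated column, escaping anything that
                                                                                                                                // would be mistaken for a field or record separator.
                                                                                                                                static void print_field(const char *s) {
                                                                                                                                    for (; *s; ++s) {
                                                                                                                                        switch (*s) {
                                                                                                                                        case '\\': std::fputs("\\\\", stdout); break;
                                                                                                                                        case ' ':  std::fputs("\\ ", stdout);  break;
                                                                                                                                        case '\n': std::fputs("\\n", stdout);  break;
                                                                                                                                        default:   std::putchar(*s);
                                                                                                                                        }
                                                                                                                                    }
                                                                                                                                    std::putchar(' ');   // field separator
                                                                                                                                }
                                                                                                                                
                                                                                                                                int main() {
                                                                                                                                    print_field("user.name");
                                                                                                                                    print_field("Ann Example");  // comes out as "Ann\ Example"
                                                                                                                                    std::putchar('\n');          // record separator
                                                                                                                                }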
                                                                                                                                1. 2

                                                                                                                                  Streaming TSV libraries are a dime a dozen in every programming language

                                                                                                                                  You can’t count on two libraries handling newlines the same way, so it’s not practically composable. That’s the whole point of using JSON. It doesn’t have to be good. It just has to be a standard.

                                                                                                                                  It allows tools to output in one format for both humans and the machine, instead of doing it two different ways and forcing me to either grub through json output by hand, or guess at how the output translates.

                                                                                                                                  JSON is reasonably readable, and there are plenty of tools like gron to clean it up more. I don’t see how TSV is especially more readable, particularly once you take out newlines.

                                                                                                                                  It’s simpler to parse with existing tools like awk, sed, etc, and if it becomes ubiquitous, it would be a smaller lift to convince maintainers that a simple columnar format should live directly in these tools

                                                                                                                                  Awk and sed have had forty years to catch on, but they’ve been stalled for the last twenty or so. Good luck, I guess, but I give this project a 1% chance of really catching on, and an awk renaissance a 0.001% chance of happening.

                                                                                                                            2. 3

                                                                                                                              Doesn’t even have to be one per line. Just push into the parser until you get a whole object, then start over with the next byte. This is how jq works
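
                                                                                                                              You can watch it do this with a stream of concatenated values (assuming jq is installed):

                                                                                                                                  % printf '{"a": 1} {"b": 2}' | jq .

                                                                                                                              It parses and emits the first object, then starts over on the next value.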

                                                                                                                              1. 2

                                                                                                                                Interesting – so jq will return valid results from malformed json? That’s rather unexpected for a strict format.

                                                                                                                                1. 2

                                                                                                                                  Not malformed JSON, but it is a stream processor, so it can process a stream of valid json items, yes

                                                                                                                                  1. 3

                                                                                                                                    I’m not sure what you mean by ‘not malformed json’: will it produce output for something like:

                                                                                                                                    {"array of objects": [
                                                                                                                                    	{
                                                                                                                                    		"index": 0,
                                                                                                                                    		"index start at 5": 5
                                                                                                                                    	},
                                                                                                                                    	{
                                                                                                                                    		"index": 1,
                                                                                                                                    		"index start at 5": 6
                                                                                                                                    	},
                                                                                                                                    	{
                                                                                                                                    		"index": 2,
                                                                                                                                                  [[[[[[[[[[[[
                                                                                                                                                 AND IT GOES OFF THE RAILS!
                                                                                                                                                l1lk12j304580q1298wdafl;kjasc vlkawd f[0asfads
                                                                                                                                    

                                                                                                                                    If it streams, that means it will have produced output before hitting the botch. If it instead catches errors and only prints complete, valid json objects, it needs unbounded buffering.

                                                                                                                                    For loosely defined formats where a truncation of the format is still valid, I’d expect the former, but not for a self contained, validatable, delimited format like json.

                                                                                                                                    1. 1

                                                                                                                                      There is no valid object in your example, so it errors trying to pull the first item out of a stream. Here’s an example I think you were trying to go for:

                                                                                                                                      {
                                                                                                                                      	"a": 1,
                                                                                                                                      	"b": [1,2]
                                                                                                                                      }{barf,"
                                                                                                                                      

                                                                                                                                      This will produce output for the first object and a syntax error for the second
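
                                                                                                                                      A quick way to check (the exact error text will vary by jq version):

                                                                                                                                          % printf '{"a":1,"b":[1,2]}{barf,"' | jq .

                                                                                                                                      The first object comes out pretty-printed, followed by a parse error at the point where `{barf,"` begins.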

                                                                                                                                      1. 1

                                                                                                                                        There are multiple valid sub-objects. If no output is produced, then the elements can’t be processed incrementally.

                                                                                                                                        You can choose to restrict yourself if you want, but this seems unnecessary.

                                                                                                                            3. 3

                                                                                                                              Using tabular data makes it more annoying to work with anything that uses multiple properties of an object. For example, here’s a pipeline to get all of the interfaces with a valid global IPv6 address:

                                                                                                                              > ip -j addr | jq --raw-output '.[] | select(.addr_info | any(.family == "inet6" and .scope == "global")) | .ifname'
                                                                                                                              lxdbr0
                                                                                                                              

                                                                                                                              How would you do that if your input data was like

                                                                                                                              0.ifname lxdbr0
                                                                                                                              0.addr_info.0 inet global
                                                                                                                              0.addr_info.1 inet6 global
                                                                                                                              ...
                                                                                                                              

                                                                                                                              ?

                                                                                                                              e: Also, afaik none of the standard Unix string-manipulation tools deal well with escaped spaces (unless you replaced them with something else entirely, which would lose you a lot of the readability).

                                                                                                                              1. 2

                                                                                                                                if the format wasn’t a naive translation from json, but used the tabular format better:

                                                                                                                                0 ifname lxdbr0
                                                                                                                                0 addr_info inet global
                                                                                                                                0 addr_info inet6 global
                                                                                                                                

                                                                                                                                Then here’s a translation – presumably, in a world where this format took over, awk would be augmented to know about the column escaping:

                                                                                                                                    % awk '
                                                                                                                                    	$2=="ifname"{name[$1]=$3}
                                                                                                                                    	$2=="addr_info"&& $3=="inet6" { print name[$1] }
                                                                                                                                    '
                                                                                                                                

                                                                                                                                With this same format, where the ‘ifname’ applies to all subsequent attributes, and indentation is taken as an empty initial column (" foo bar" == "'' foo bar"), you could clean that output up further:

                                                                                                                                ifname lxdbr0
                                                                                                                                    addr_info inet global
                                                                                                                                    addr_info inet6 global
                                                                                                                                

                                                                                                                                and then to parse it, you could do this:

                                                                                                                                    % awk '
                                                                                                                                    	$1=="ifname"{name=$1}
                                                                                                                                    	$2=="addr_info"&& $3=="inet6" { print name }
                                                                                                                                    '
                                                                                                                                

                                                                                                                                which is just as rigorous as the json, but infinitely more readable: I don’t need tools if I want to interact with it directly, which means that tools no longer need output modes, and I no longer need to mentally translate between output modes when interacting with the tools.

                                                                                                                                The original can be done similarly; you’d just need to split the json-influenced path on ‘.’ to get the useful info:

                                                                                                                                    % awk '
                                                                                                                                    	$2=="ifname"{
                                                                                                                                    	     split($1, path, ".");
                                                                                                                                    	     name[path[0]]=$2
                                                                                                                                    	}
                                                                                                                                    	$2=="addr_info"&& $3=="inet6" {
                                                                                                                                    	    split($1, path, ".");
                                                                                                                                    	    print name[path[0]]
                                                                                                                                    	}
                                                                                                                                    '
                                                                                                                                

                                                                                                                                As far as standard unix utilities: I thought the discussion was between converting to json and converting to a simpler, friendlier format. That said, it does work better with existing tools, which would make the transition easier.

                                                                                                                                1. 1

                                                                                                                                  But now your code has a bug: if it has multiple ipv6 addresses, it’ll print out the interface name more than once. (I don’t know if this is sensible in this specific case, but if you adjust it to “an ipv4 or ipv6 global address” it clearly could happen).
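
                                                                                                                                  For what it’s worth, the duplication is fixable with one more clause; a sketch against the flat “index key value” form from upthread (the `!seen[...]++` idiom prints at most one line per interface):

                                                                                                                                      % awk '
                                                                                                                                      	$2=="ifname" { name[$1] = $3 }
                                                                                                                                      	$2=="addr_info" && $3=="inet6" && !seen[$1]++ { print name[$1] }
                                                                                                                                      '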

                                                                                                                                  And I think in general this sort of approach is going to be more sensitive to the order fields are printed out in than a JSON object, where the order of the fields is (generally) irrelevant.

                                                                                                                                  1. 1

                                                                                                                                    And I think in general this sort of approach is going to be more sensitive to the order fields are printed out in than a JSON object, where the order of the fields is (generally) irrelevant.

                                                                                                                                    Yes, taking advantage of the order is definitely a useful feature of this format.

                                                                                                                                    1. 2

                                                                                                                                      I don’t consider relying on the order, which seems more like an implementation detail to me, to be a feature.

                                                                                                                                      1. 2

                                                                                                                                        It’s an implementation choice, not an implementation detail. And it allows a great deal more expressiveness if you choose to use it effectively.

                                                                                                                            1. 8

                                                                                                                              It’s interesting to see people pushing for the concepts that Unix explicitly threw out early on. To quote ken:

                                                                                                                              Many familiar computing ‘concepts’ are missing from UNIX. Files have no records. There are no access methods. User programs contain no system buffers. There are no file types. These concepts fill a much-needed gap. I sincerely hope that when future systems are designed by manufacturers the value of some of these ingrained notions is reexamined. Like the politician and his ‘common man’, manufacturers have their ‘average user’.

                                                                                                                              1. 2

                                                                                                                                Ooh yes this! This is why I find it so off-putting when folks say “You want to change $X. That makes it no longer UNIX”

                                                                                                                                Balderdash! :) UNIX is a living organism, or should be. It can evolve. Because we all care about it we should shepherd its evolution with loving care but, over time, change it must or it will be overtaken by whatever the Next Big Thing is :)

                                                                                                                                (And yes I know it’s thrived for half a century already but even great dynasties must eventually evolve or fall.)

                                                                                                                              1. 7

                                                                                                                                How does git9 support staging only part of the changes to a file? From what I can tell it does not.

                                                                                                                                I would describe any version control system which doesn’t allow me to commit only some of the hunks in a file or edit the patch myself as “boneheaded.”

                                                                                                                                1. 9

                                                                                                                                  Can I quote you on the boneheaded bit? It seems like a great endorsement.

                                                                                                                                  Anyways – this doesn’t fit my workflow. I build and test my code before I commit, and incrementally building commits from hunks that have never been compiled in that configuration is error prone. It’s bad enough committing whole files separately – I’m constantly forgetting files and making broken commits as a result. I’ve been using git since 2006, and every time I’ve tried doing partial stages, I’ve found it more pain than it was worth.

                                                                                                                                  So, for me (and many others using this tool) this simply isn’t a feature that’s been missed.

                                                                                                                                  That said, it’s possible to build up a patch set incrementally, and commit it: there are tools like divergefs that provide a copy on write view of the files, so you can start from the last commit in .git/fs/HEAD/tree, and pull in the hunks from your working tree that you want to commit using idiff. That shadowed view will even give you something that you can test before committing.

                                                                                                                                  If someone wanted to provide a patch for automating this, I’d consider it.

                                                                                                                                  1. 5

                                                                                                                                    Thanks for this response - its a very clear argument for a kind of workflow where staging partial changes to a file doesn’t make sense.

                                                                                                                                    I work primarily as a data scientist using languages like R and Python which don’t have a compilation step and in which it is often the case that many features are developed concurrently and more or less independently (consider that my projects usually have a “utils” file which accumulates mostly independent trivia). In this workflow, I like to make git commits which touch on a single feature at a time and its relatively easy in most cases to select out hunks from individual files which tell that story.

                                                                                                                                  2. 5

                                                                                                                                    As somebody who near-exclusively uses hggit, and hence no index, I can answer this from experience. If you want to commit only some of your changes, that’s what you do. No need to go through an index.

                                                                                                                                    Commit only some of your changes?

                                                                                                                                        hg commit --interactive
                                                                                                                                        git commit --patch

                                                                                                                                    Add more changes to the commit you’re preparing?

                                                                                                                                        hg amend --interactive
                                                                                                                                        git commit --amend --patch

                                                                                                                                    Remove changes from the commit?

                                                                                                                                        hg uncommit --interactive
                                                                                                                                        git something-complicated --hopefully-this-flag-is-still-called-patch

                                                                                                                                    The main advantage this brings: because the commit-you’re-working-on is a normal commit, all the normal verbs apply. No need for special index-flavoured verbs/flags like reset or diff --staged. One less concept.

                                                                                                                                    If you want to be sure you won’t push it before you’re done, use hg commit --secret on that / those commits; then hg phase --draft when you’re ready.

                                                                                                                                    1. 2

                                                                                                                                      Actually sounds pretty good! Anyone know if such a thing is possible with fossil?

                                                                                                                                    2. 4

                                                                                                                                      You can do it like hg does with shelve - always commit what is on disk, but allow the user to shelve hunks. These can be restored after the commit is done. Sort of a reverse staging area.

                                                                                                                                      1. 3

                                                                                                                                        I haven’t tried git9, but it should still be possible to support committing parts of files in a world without a staging area. As I imagine it, the --patch option would just be on the commit command (instead of the add command).

                                                                                                                                        Same with all other functionality of git add/rm/mv – these commands wouldn’t exist. Just make them options of git commit. It doesn’t matter if the user makes a commit for each invocation (or uses --amend to avoid that): If you can squash, you don’t need a staging area for the purpose of accumulating changes.

                                                                                                                                        Proof of concept: You can already commit parts of files without using the index, and without touching the workspace: Just commit everything first, then split it interactively using git-revise (yes, you can edit the inbetween patch too). I even do that quite often. Splitting a commit is something you have to do sometimes anyway, so you might as well learn that instead. When you can do this – edit the commit boundaries after the fact, you no longer need to get it perfect on the first try, which is all that the staging area can help you with.
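
                                                                                                                                        A sketch of that flow (assuming the third-party git-revise tool, whose --cut flag splits a commit interactively):

                                                                                                                                            % git commit -am 'WIP: everything at once'
                                                                                                                                            % git revise --cut HEAD

                                                                                                                                        The first command records everything; the second carves the commit into smaller ones after the fact.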

                                                                                                                                        Rather than a staging area, I wish I could mark commits as “unfinished” (meaning that I don’t want to push them anywhere), and that I could refer to these unfinished commits by a never-changing id that didn’t change while working on them.

                                                                                                                                        1. 3

                                                                                                                                          This fits my mental model much better too. Any time I have files staged and am not in “the process of committing”, I probably messed something up. The next step is always to clear the index, or to add everything to the index and commit.

                                                                                                                                          1. 3

                                                                                                                                            -p is indeed available on commit. And also on stash.

                                                                                                                                          2. 2

                                                                                                                                            I feel the Plan 9 way would be to use a dedicated tool to help stash away parts of the working directory instead.

                                                                                                                                            1. 2

                                                                                                                                              I would describe any version control system which doesn’t allow me to commit only some of the hunks in a file or edit the patch myself as “boneheaded.”

                                                                                                                                              I would describe people wedded to the index in softer but similar terms.

                                                                                                                                              Here’s the thing: if you’re committing only part of your working tree, then you are, by definition, committing code that you have never run or even attempted to compile. You cannot have tested it, you cannot have even built it, because you can’t do any of those things against the staged hunks. You’re making a bet that any errors you make are going to be caught either by CI or by a reviewer. If you’ve got a good process, you’ve got good odds, but only that: good odds. Many situations where things can build and a reviewer might approve won’t work (e.g., missing something that’s invoked via reflection, missing a tweak to a data file, etc.).

                                                                                                                                              These aren’t hypotheticals; I’ve seen them. Many times. Even in shops with absolute top-tier best-practices.

                                                                                                                                              Remove-to-commit models (e.g. hg shelve, fossil stash, etc.) at least permit you not to go there. I can use pre-commit or pre-push hooks to ensure that the code at the very least builds and passes tests. I’ve even used pre-push hooks in this context to verify your build was up-to-date (by checking whether a make-like run would be a no-op or not), and rejected the push if not, telling the submitter they need to at least do a sanity check. And I have, again, seen this prevent actual issues in real-world usage.

                                                                                                                                              Neither of these models is perfect, both have mitigations and workarounds, and I will absolutely agree that git add -p is an incredibly seductive tool. But it’s an error-prone tool that by definition must lead to you submitting things you’ve never tested.

                                                                                                                                              I don’t think my rejection of that model is boneheaded.

                                                                                                                                              1. 6

                                                                                                                                                You cannot have tested it, you cannot have even built it, because you can’t do any of those things against the staged hunks.

                                                                                                                                                Sure you can, I do this all the time.

                                                                                                                                                When developing a feature, I’ll often implement the whole thing (or a big chunk of it) in one go, without really thinking about how to break that up into commits. Then when I have it implemented and working, I’ll go back and stage / commit individual bits of it.

                                                                                                                                                You can stage some hunks, stash the unstaged changes, and then run your tests.
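
                                                                                                                                                One common recipe (note the final pop can conflict, since the stash also holds the staged hunks):

                                                                                                                                                    % git add -p                   # stage just the hunks for this commit
                                                                                                                                                    % git stash push --keep-index  # set aside everything unstaged
                                                                                                                                                    % make test                    # test what will actually be committed
                                                                                                                                                    % git commit
                                                                                                                                                    % git stash pop                # bring back the rest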

                                                                                                                                                1. 5

                                                                                                                                                  Here’s the thing: if you’re committing only part of your working tree, then you are, by definition, committing code that you have never run or even attempted to compile.

                                                                                                                                                  I mean, sure. And there are many places where this matters.

                                                                                                                                                  Things like cleanly separating a bunch of changes to my .vimrc into logical commits, and similar sorts of activity, are… Not really among them.

                                                                                                                                                  1. 4

                                                                                                                                                    Here’s the thing: if you’re committing only part of your working tree, then you are, by definition, committing code that you have never run or even attempted to compile. You cannot have tested it, you cannot have even built it, because you can’t do any of those things against the staged hunks

                                                                                                                                                    While this is true, it isn’t quite as clear-cut as you make it seem. The most common case I have for this is fixing typos or other errors in comments or documentation that I fixed while adding comments / docs for the new feature. I don’t want to include those changes in an unrelated PR, so I pull them out into a separate commit and raise that as a separate (and trivial to review) PR. It doesn’t matter that I’ve never tried to build them because there are no changes in the code, so they won’t change the functionality at all.

                                                                                                                                                    Second, just because I haven’t compiled them when I commit doesn’t mean that I haven’t compiled them when I push. Again, my typical workflow here is to notice that there are some self-contained bits, commit them, stash everything else, test them, and then push them and raise a PR, before popping the stash and working on the next chunk. The thing that I push is tested locally, then tested by CI, and is then a small self-contained thing that is easy to review before merging.

                                                                                                                                                    But it’s an error-prone tool that by definition must lead to you submitting things you’ve never tested.

                                                                                                                                                    And yet, in my workflow, it doesn’t. It allows you to submit things that you’ve never tested, but so does any revision-control system that isn’t set up with pre-push hooks that check for tests (and if you’re relying on that, rather than pre-merge CI with a reasonable matrix of targets, as any kind of useful quality bar then you’re likely to end up with a load of code that ‘works on my machine’).

                                                                                                                                                    1. 3

                                                                                                                                                      I mentioned there are “mitigations and workarounds,” some of which you’re highlighting, but you’re not actually disagreeing with my points. Git is the only SCM I’ve ever encountered where make can work, git diff can show nothing, git commit won’t be a no-op, and the resulting commit can’t compile.

                                                                                                                                                      And the initial comment I’m responding to is that a position like mine is “boneheaded”. I’m just arguing it isn’t.

                                                                                                                                                    2. 1

                                                                                                                                                      Gotta admit, this is a very solid argument.

                                                                                                                                                    3. 1

                                                                                                                                                      That’s a good question. I imagine something like

                                                                                                                                                      @{cp file /tmp && bind /tmp/file file && ed file && git/commit file}
                                                                                                                                                      

                                                                                                                                                      should work.

                                                                                                                                                    1. 6

                                                                                                                                                      this looks cool, and kudos for using LMDB. The 35MB binary size leapt out at me, though — what contributes to that? (LMDB itself is only ~100KB.) Lots of stemming tables and stop-word lists?

                                                                                                                                                      1. 4

                                                                                                                                                        Rust isn’t entirely svelte either. It doesn’t take too many transitive dependencies before you’re at 10MB.

                                                                                                                                                        1. 3

                                                                                                                                                          How do you call yourself minimalist when you’re pulling in that many dependencies?

                                                                                                                                                          1. 11

                                                                                                                                                            To be clear, I’m not the author. But if I were, this would come off more as a personal dig than a real question. Be kind. :)

                                                                                                                                                            1. -2

                                                                                                                                                              Holy passive aggressiveness, batman :)

                                                                                                                                                              Remind me to avoid rhetorical questions in the future.

                                                                                                                                                            2. 7

                                                                                                                                                              It depends what you compare it to. An Elasticsearch x86-64 gzipped tarball is at > 340MB https://www.elastic.co/downloads/elasticsearch

                                                                                                                                                          2. 4

                                                                                                                            If anyone wants to figure this out, two tools to use are:

                                                                                                                                                            I am 0.3 certain that at least one significant component is serialization code: rust serialization is fast, but is rumored to inflate binaries quite a bit. I haven’t measured that directly, but I did observe compile time hits due to serialization.

                                                                                                                                                            1. 3

                                                                                                                                                              My guess is assets for the web UI are packed into the binary

                                                                                                                                                            1. 3

                                                                                                                                                              What git command line do I run to remove a maintainer?

                                                                                                                                                              1. 9
                                                                                                                                                                 git clone https://github.com/git/git
                                                                                                                                 find git -type f -exec sed -i "s/git/nit/g" {} \; # and some more refactoring
                                                                                                                                                                 git commit -am "Fork git to nit"
                                                                                                                                                                

                                                                                                                                                                Git is a dvcs after all.

                                                                                                                                                                1. 1

                                                                                                                                  Nah, you also need to update every shell script to use nit instead of git.

                                                                                                                                  My theory is that this is why CLI reforms such as gitless haven’t taken off.

                                                                                                                                                                2. 1

                                                                                                                                                                  No need to be authoritarian, you can always make your own fork. Or simply apply the patches.

                                                                                                                                                                  1. 13

                                                                                                                                                                    It’s not authoritarian to think that people who aren’t engaging in good faith shouldn’t be running core tooling projects used by the entire software industry. Applying a fork doesn’t solve the issue that a toxic person is leading a massive community effort.

                                                                                                                                                                    Furthermore, this isn’t about solving it for me – I know how to use git already. It’s about increasing accessibility for newcomers, who won’t know how to apply patches and recompile.

                                                                                                                                                                    1. 5

                                                                                                                                                                      So long as you only think so.

                                                                                                                                                                      Where can I see a single example of engaging in bad faith, or any toxicity for that matter?

                                                                                                                                                                      It could be argued that core tooling shouldn’t change at all, and a change like this would confuse the documentation, or break things. Though this has happened already with the master → main switch, as well as with some changes to the porcelain. git is rather bad from both viewpoints.

                                                                                                                                                                      1. 2

                                                                                                                                                                        It could be argued that core tooling shouldn’t change at all, and a change like this would confuse the documentation, or break things.

                                                                                                                                                                        Thank goodness neither of those points is relevant to the linked discussion. All of this stuff is backwards-compatible.

                                                                                                                                                                        1. 6

                                                                                                                                          This falls under confusing the documentation. The more ways there are to do something, the more confusing it gets. Changing terms anywhere would also be a source of confusion. I admit to not having read it in detail, but no miracle is possible.

                                                                                                                                                                          I mostly just scanned it, early on found out it literally lies (“everyone” means “people I agree with”), and figured out it’s just someone publicly moaning, so not worth the attention.

                                                                                                                                                                          And then there was this comment where someone disrespects other people’s work, of course.

                                                                                                                                                                      2. 3

                                                                                                                                                                        You’re free to fork and improve git. Or even implement your own from scratch.

                                                                                                                                                                        The more forks and independent implementations, the better the ecosystem – and maybe some ideas filter across.

                                                                                                                                                                        1. 4

                                                                                                                                                                          Furthermore, this isn’t about solving it for me – I know how to use git already. It’s about increasing accessibility for newcomers, who won’t know how to apply patches and recompile.

                                                                                                                                                                          1. 2

                                                                                                                                            Well, then you’d better put in the effort to get your fork into distribution repositories.

                                                                                                                                                                            1. 1

                                                                                                                                                                              Why would you be unable to provide downloads and packages?

                                                                                                                                                                              It’s not like X.org, Jenkins, LibreOffice, and other forks are significantly harder to install than the original.

                                                                                                                                                                          2. 2

                                                                                                                                            Yes it is; the way the term “shouldn’t” is used there presumes that it is, or should be, within your authority.

                                                                                                                                                                            1. 1

                                                                                                                                                                              Applying a fork doesn’t solve the issue that a toxic person is leading a massive community effort.

                                                                                                                                                                              Sure it does: if you do better, people switch projects, and the origin of the fork stops being a massive community effort. How many Hudson developers are there today? How many Jenkins developers are there today?

                                                                                                                                              How about Gogs vs Gitea?

                                                                                                                                                                        1. 13

                                                                                                                                                                          I am intrigued by the framing of the Sturm und Drang about the state of the web as being driven, to some significant degree, by politics internal to Google.

                                                                                                                                                                          1. 26

                                                                                                                                                                            As I stated earlier this week, promo packets are what’ll do in the web.

                                                                                                                            I think a lot of developers simply lack the interest or context needed to process the realpolitik that shapes and distorts the fabric of spacetime for our industry.

                                                                                                                            If you refuse to understand that Google’s whole business is threatened by adblockers, you would probably be confused by some of the changes to the WebExtensions webRequest API that make that work harder. If you don’t understand the desire to make SEO, crawling, and walled gardens easier, AMP probably seemed like a great return to roots.

                                                                                                                            Other companies do this too, of course. If you didn’t know about OS/2 Warp, some of the Windows APIs probably seemed weird. If you don’t know about Facebook trying to own everything you do, then the lack of email signup for Oculus probably seems strange. If you invested heavily into XNA, you probably got bit when internal shifts at Microsoft killed XNA off. If you don’t know about internal Canonical and RHEL shenanigans, systemd and other things probably come as a surprise.

                                                                                                                                                                            Developers need to pay as much attention to the business dependencies as the technical ones.

                                                                                                                                                                            1. 6

                                                                                                                                                                              What do you mean by promo packets? I’m not familiar with this term.

                                                                                                                                                                              1. 21

                                                                                                                                                                                When you’re doing a performance review at Google, you can request a promotion. If you do this, you put together a ‘packet’ including the impactful work you’ve done. New work is rewarded heavily, maintenance less so. For senior engineers, shipping major projects with an industry wide impact is a path to promotion.

                                                                                                                                                                                1. 30

                                                                                                                                                                                  Which means Google rewards doing something new for the sake of doing something new. It’s tremendously difficult to get promoted by improving older systems. Crucially, you often need to demonstrate impact with metrics. The easiest way to do that is sunset an old system and show the number of users who have migrated to your new system, voluntarily or otherwise.

                                                                                                                                                                                  1. 16

                                                                                                                                                                                    Ew. Thanks for the insight. But ew.

                                                                                                                                                                                  2. 1

                                                                                                                                                                                    Is there any material evidence suggesting that someone’s promotion is the reason Chrome will remove alert()? Obviously Google will push the web in the direction that juices profit, but an individual promotion? Seems like a red herring.

                                                                                                                                                                                    1. 6

                                                                                                                                                                                      It is often difficult to pick apart, as it’s rarely a single person or team. What happens in large organizations is that there is a high-level strategy, and different tactics spring from it. Then there are metrics scorecards, often based on a proxy, which support the tactics that deliver the strategy. This blurs the picture from the outside and means that rarely is one person to blame, or in singular control of the successes.

                                                                                                                                                                                      I haven’t followed the alert situation very closely, but someone familiar with large organizations can get a good read from the feature blurb. There is a strong hint from the language that they are carrying a metric around security, and possibly one around user experience. That translates into an opportunity for a team to go and fix the issue the metrics point at, since it’s quantifiable. The easiest way to start might be to work back from what moves the metric, but this is a very narrow perspective.
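
                                                                                                                                                                                      For concreteness, here is what that kind of removal means for page code. This is my own sketch, not anything from the blurb, assuming alert() may be dropped in some contexts (e.g. cross-origin iframes):

                                                                                                                                                                                      ```ts
                                                                                                                                                                                      // Defensive wrapper around alert(). Note the typeof check only catches
                                                                                                                                                                                      // outright removal; a silently ignored alert() can't be detected this way.
                                                                                                                                                                                      function notify(message: string): void {
                                                                                                                                                                                        if (typeof window.alert === "function") {
                                                                                                                                                                                          window.alert(message);
                                                                                                                                                                                          return;
                                                                                                                                                                                        }
                                                                                                                                                                                        // Fallback: render the message into the page instead.
                                                                                                                                                                                        const el = document.createElement("div");
                                                                                                                                                                                        el.setAttribute("role", "alert");
                                                                                                                                                                                        el.textContent = message;
                                                                                                                                                                                        document.body.appendChild(el);
                                                                                                                                                                                      }
                                                                                                                                                                                      ```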

                                                                                                                                                                                      Developers may know what the best things to work on are, having been a developer in that area for 10 years, but their impact tracks towards those top-level strategies. Management can’t justify a promotion when someone else is squarely focused on the metrics that drive the strategy.

                                                                                                                                                                                      In lots of places this is called alignment. Your boss may only support X amount of work on non-aligned projects if you do at least Y amount of work on aligned ones. A classic big-company alignment example is a talented person in a support department. If they can fix your biggest problem at the source, it’d be best to let them do that. However, the metrics incentivize assigning them to solve N support cases per week and hit other targets designed for lower-skilled individuals, instead of letting them fix root causes. Eventually they leave, unless you have smart management that takes calculated risks, manages the metrics at the team level so the team isn’t noticed working the way it wants, seeks out paths for talented people to work on the product, and so on.

                                                                                                                                                                                      1. 1

                                                                                                                                                                                        Many of us understand how metrics and incentives at tech companies work. I was just pointing out that it’s a bold claim to assume that Chrome is removing alert() because an individual sought a promotion.

                                                                                                                                                                                2. 3

                                                                                                                                                                                  I think about this in terms of my time at Apple – like, people ascribed all kinds of causes to various seemingly peculiar Apple decisions that to those of us on the inside were obvious cases of internal politicking leaking out.

                                                                                                                                                                                  1. 2

                                                                                                                                                                                    WHATWG is a consortium of multiple companies, so I’m curious why everyone is pointing the finger at Google here. Or is the assertion that Google has so much power over the WHATWG and Chrome at this point that other companies have no real ability to dissent? (And, I mean, we all know that the W3C lost and the WHATWG won, so a consortium of vendors is the web.)

                                                                                                                                                                                    1. 9

                                                                                                                                                                                      The multiple companies are Apple, Google, Microsoft, and Mozilla (https://whatwg.org/sg-agreement#steering-group-member, section 3.1b). Of the other three, only Apple develops a browser engine that is not majority funded by Google.

                                                                                                                                                                                      1. 4

                                                                                                                                                                                        I’m pretty sure Apple develops a browser engine that is majority funded by Google: https://www.theverge.com/2020/7/1/21310591/apple-google-search-engine-safari-iphone-deal-billions-regulation-antitrust

                                                                                                                                                                                        1. 5

                                                                                                                                                                                          That’s some pretty weird logic.

                                                                                                                                                                                          The browser engine Apple creates is used for a whole bunch of stuff across their platforms, besides Safari:

                                                                                                                                                                                          Mail, iMessage, Media Store fronts, App Store fronts. Those last two alone produce revenue about 4x what Google pays Apple to make Google the default search engine.

                                                                                                                                                                                          Do I wish they’d get more people using alternatives and pass on the Google money? Sure. Is there any realistic chance their ability to fund Safari and/or WebKit would be harmed by not taking the Google money? Seems pretty unlikely.

                                                                                                                                                                                          1. 1

                                                                                                                                                                                            I don’t think the stores use WebKit. They didn’t last time I investigated.

                                                                                                                                                                                          2. 4

                                                                                                                                                                                            It’s true-ish. But the most profitable company in the world probably doesn’t require that money and would be able to continue without it.

                                                                                                                                                                                            1. 3

                                                                                                                                                                                              You don’t become the most profitable company by turning down revenue.

                                                                                                                                                                                          3. 1

                                                                                                                                                                                            Right I was just wondering if folks think the WHATWG is run solely by Google at this point. Thanks for the clarification.

                                                                                                                                                                                          4. 5

                                                                                                                                                                                            The point is that many of those new APIs don’t happen in standards groups at all, exactly because standards would require more than one implementation.

                                                                                                                                                                                            1. 5

                                                                                                                                                                                              Yes, this. Google’s play here is less about controlling standards per se (ed: although they do plenty of that too) and more about getting everyone to treat Whatever Google Does as the standard.

                                                                                                                                                                                            2. 4

                                                                                                                                                                                              WHATWG was run at inception by a Googler and was created to give Google even more power over the standards process than the hopelessly broken W3C already gave them. That they strong-armed Mozilla into adding their name, or that Apple (who was using the same browser engine at the time) wanted to be able to give feedback to the org, doesn’t change the Googlish nature of its existence, IMO.

                                                                                                                                                                                          5. 12

                                                                                                                                                                                            Like it or not, Google is the www. It is the driving force behind the standards, the implementations (other than Safari), and the traffic that reaches websites.

                                                                                                                                                                                            It would be odd if Google’s internal politics didn’t leak into the medium.

                                                                                                                                                                                            1. 6

                                                                                                                                                                                              Right, it’s just … one of those things that is obvious in retrospect but that I would never be able to state.

                                                                                                                                                                                              1. 9

                                                                                                                                                                                                A lot of people seem to think that standards work is a bit like being in a university - people do it for the love of it and are generally only interested in doing what’s best for all.

                                                                                                                                                                                                In reality it’s a bunch of wealthy stakeholders who realize that they need to work together for their collective benefit - they’re not a monopoly, yet - but in the meantime it behooves them to grab every advantage they can get.

                                                                                                                                                                                                As mentioned in the post, standards work is hard and time-consuming, and if an organisation can assign a dedicated team to work on standards, that work will get implemented.

                                                                                                                                                                                                1. 3

                                                                                                                                                                                                  Universities work like that too now.

                                                                                                                                                                                                  1. 1

                                                                                                                                                                                                    This is sadly true.