Threads for eandre

  1. 5

    This is a great testing approach! Without going into details, the bulk of tests indeed should be just data, with just a smidge of driver code on top.

    Another important consideration is that the existence of a package for this is mostly irrelevant: you should just build it yourself (about 200 LOC for an MVP, I suppose?) if there isn’t one for your ecosystem. It’s not a hidden gem, it’s more of a leftpad (considering the total size of tests you’d write with this).

    One thing I would warn about here is that these specific tests, which work via process spawning, are quite slow. If you have hundreds/thousands of them, process spawning would be a major bottleneck. Unless the SUT spawns processes itself, you’d be better off running a bunch of tests in-process.

    I think the example from the post shows this problem:

    https://github.com/encoredev/encore/blob/main/parser/testdata/cron_job_definition.txt

    Rather than invoking a parse process via testscript and asserting stdout/stderr, this should have used just the txtar bit and the parse function directly, for an order-of-magnitude faster test.

    Some real world experience with this problem:

    • In Kotlin/Native, switching from many processes to a single process for tests reduced the overall time massively (I don’t remember the number).
    • The rustc testsuite is very slow due to this problem. However, as rustc build times are just atrocious due to a suboptimal bootstrapping setup (both the compiler and stdlib/runtime are magic code which needs to be bootstrapped; a better setup is one where the compiler is just a “normal Rust crate”), the testing time overall isn’t a pain point.
    • The cargo testsuite is very slow, but, as cargo is fundamentally about orchestrating processes, there isn’t much one can do there.
    1. 5

      Author of the post here. Note that “parse” here does invoke a function, not a binary. You have to specify “exec” to execute a subprocess.

      1. 2

        Ah, I see now. Yeah, then this is just the perfect setup!

      2. 2

        Another important consideration is that the existence of a package for this is mostly irrelevant: you should just build it yourself (about 200 LOC for an MVP, I suppose?) if there isn’t one for your ecosystem. It’s not a hidden gem, it’s more of a leftpad (considering the total size of tests you’d write with this).

        You seem to be saying (or implying) two different things here: (1) this is not a hidden gem: it’s like leftpad, and therefore you should write this yourself (the last clause is mine, but it seems—maybe?—implied by the first clause); (2) if this didn’t already exist, you could write it yourself. (2) seems fine, though—for the record—not true for many values of yourself. (I doubt I would write this myself, to be explicit.) (1) seems like a terrible case of NIH syndrome. This package is significantly more than 200 lines of code and tests. Why would I want to reinvent that?

        Finally my recollection is that left-pad was four lines, maybe less. There’s simply no comparison between the two projects. (I checked, and the original version of left-pad was 12 lines: https://github.com/left-pad/left-pad/commit/2d60a7fcca682656ae3d84cae8c6367b49a5e87c.)

        1. 1

          You seem to be saying (or implying) two different things here: (1)

          Tried to hedge against that reading with explicit “if there isn’t one for your ecosystem”, but apparently failed :-)

          I doubt I would write this myself, to be explicit

          This package is significantly more than 200 lines of code

          So this line of thinking is what I want to push back on a bit, and it’s the reason I think that part of my comment is useful. It did seem to me that this was being presented as a complicated black box which you are lucky to get access to, but whose implementation is out of scope. I do want to say that it is simpler than it seems, in the minimal configuration. The txtar part is splitting by a regex and collecting the result into a Map<Path, String>. The script part is splitting by lines, running each line as a subprocess, and comparing results. This is simple, direct programming using the basic instruments of the language/stdlib – no fancy algorithms, no deep technology stacks, no concurrency.
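
          To make that concrete, here is a rough sketch of the txtar half in Go. The header format only loosely mirrors the real golang.org/x/tools/txtar package, and all names here are made up for illustration:

          ```go
          package main

          import (
          	"fmt"
          	"strings"
          )

          // parseTxtar is a minimal sketch of the idea described above:
          // lines of the form "-- name --" start a new file; everything before
          // the first header is the comment/script section.
          func parseTxtar(archive string) (comment string, files map[string]string) {
          	files = map[string]string{}
          	cur := ""
          	var buf []string
          	flush := func() {
          		text := strings.Join(buf, "\n")
          		if cur == "" {
          			comment = text
          		} else {
          			files[cur] = text
          		}
          		buf = nil
          	}
          	for _, line := range strings.Split(archive, "\n") {
          		trimmed := strings.TrimSpace(line)
          		if strings.HasPrefix(trimmed, "-- ") && strings.HasSuffix(trimmed, " --") {
          			flush()
          			cur = strings.TrimSpace(trimmed[3 : len(trimmed)-3])
          			continue
          		}
          		buf = append(buf, line)
          	}
          	flush()
          	return comment, files
          }

          func main() {
          	archive := "run the test\n-- input.json --\n{\"a\": 1}\n-- expected.txt --\nok"
          	comment, files := parseTxtar(archive)
          	fmt.Println(comment)
          	fmt.Println(files["input.json"])
          	fmt.Println(files["expected.txt"])
          }
          ```

          That is the whole trick: a line loop and a map, no dependencies.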

          Now, if you make this the main driver for your testsuite, you’d want to add some fanciness (smarter diffs, colored output, custom commands). But such needs arising means your testsuite is using the tool very heavily. If you have 10_000 lines of tests, 1000 lines of a test driver are comparatively cheap. left-pad is small relative to its role in the application – padding some strings here and there. A test driver is small relative to its role in the application – powering the majority of tests at any moment, supporting the evolution of the test suite over time, and being part of every edit-compile-test cycle for every developer.

      1. 2

        There are lots of “let’s make infrastructure hosting simple again” companies (Render, Railway, etc), but they always seem to me to be in quite precarious situations: they can be great for small scale use, but as startups grow they will want access to the complete suite of services offered by major cloud providers. Their best customers are constantly leaving.

        Disclosure: I’m obviously biased; I’m one of the founders of Encore, where we’re working on decoupling the developer experience from the cloud provider. We’re bringing a Heroku-level (or rather, even better) developer experience to your own cloud account.

        It always seemed strange to me to couple infrastructure with developer experience, when the incentives aren’t aligned at all. Just like gym memberships, cloud providers have little incentive to tell you how much you could be saving by setting up something simpler that does the same trick. By separating the developer experience from the hosting provider this problem goes away.

        1. 2

          This is really interesting, but as far as I can tell from scanning the doc there’s no provision for composition. If I want to use actual subqueries or small chunks of queries in a few different places, is there any way to do that, or do I have to manually compose them in my .sql files?

          1. 2

            Yeah that’s a good point, and definitely something that would be nice to see.

              1. 1

                Yeah, we use squirrel at $JOB. I’m pretty happy with it.

            1. 1

              Shouldn’t type ID[T ResourceType] xid.ID be type ID[T ~ResourceType] xid.ID instead?

              Later in the article, ResourceType goes from concrete to an interface, which resolves the problem.
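
              For illustration, a minimal sketch of the phantom-type pattern under discussion, with an interface-based ResourceType. All names here are made up and may differ from the article’s actual definitions:

              ```go
              package main

              import "fmt"

              // ResourceType as an interface constraint: each resource is a marker
              // struct implementing it. (Assumed names, for illustration only.)
              type ResourceType interface{ resourceName() string }

              type User struct{}

              func (User) resourceName() string { return "user" }

              type App struct{}

              func (App) resourceName() string { return "app" }

              // ID is a phantom-typed identifier: the type parameter T exists only
              // at compile time, so an ID[User] cannot be passed where an ID[App]
              // is expected, yet the runtime representation is just [12]byte.
              type ID[T ResourceType] [12]byte

              func (id ID[T]) String() string {
              	var t T
              	return fmt.Sprintf("%s_%x", t.resourceName(), [12]byte(id))
              }

              func lookupUser(id ID[User]) string { return id.String() }

              func main() {
              	var uid ID[User]
              	uid[0] = 0xab
              	fmt.Println(lookupUser(uid))
              	// var aid ID[App]
              	// lookupUser(aid) // compile error: ID[App] is not ID[User]
              }
              ```

              The prefix in String() is derived from the type parameter, so the type tag never needs to be stored in the bytes themselves.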

              1. 1

                I started skeptical of this, but it’s actually a pretty good use of generics. I think in terms of pure efficiency, if they had defined an ID as an array (not slice!) of 24 bytes with the first 4 as a type code, that would actually be smaller and faster than using two machine words for an interface, but this is still quite good.

                1. 2

                  With monomorphization the approach with generics avoids the overhead from indirection through an interface, so the memory representation is just a [20]byte.

                  1. 1

                    Actually the memory representation is [12]byte. As André mentioned, the benefit of the generics is that we don’t have to carry around the type information at runtime within the data itself.

                    However, before generics, encoding the first couple of bytes as a type identifier would have been a neat way of doing it and a natural extension to how XID encodes data within the IDs.

                    We would have still gone for human readability of the resource type in wire formats (i.e. I’d like to see whether an ID is a trace, app, user, log line, etc).

                    Another advantage to encoding it as a type rather than in the byte array is that there’s no upper limit on how many types of resources we could create. (Although admittedly 4 bytes would give more than enough.)

              1. 3

                I frequently see “newtypes” (in Haskell-speak) and sometimes phantom type parameters for IDs, but it’s really not clear to me how much benefit this approach provides over a type alias. That is, for the classes of bugs that type ID string catches but type ID = string would not, how often do those actually occur? My experience is: not often.

                1. 1

                  A type alias of type ID = string would make all different types of identifiers equivalent from the type checker’s perspective. Since identifiers are so pervasive in backend development this was the main class of bugs we wanted to prevent, as it’s quite easy to call a function with the wrong type of identifier (especially after refactoring). But YMMV of course.

                  1. 3

                    My experience is that this class of bugs is relatively uncommon and easy to catch/fix.

                    Moreover, names seem to provide better bang-for-buck both in solving this problem and in making other readability improvements. In Go, this means using structs instead of positional arguments.

                    For example, instead of:

                    func AttachFile(db DB, emailID string, fileID string) (attachmentID string, err error) {
                      // ...
                    }
                    

                    Do:

                    type AttachFile struct {
                      EmailID string
                      FileID string
                    }
                    
                    func (args AttachFile) Do(db DB) (attachmentID string, err error) {
                      // ...
                    }
                    

                    Now it’s extremely unlikely you’ll accidentally swap the IDs because the call site will be much clearer. As a bonus, this makes it easier to address a laundry list of other concerns, such as logging, authorization, validation, etc.

                    1. 3

                      The fun thing is that if you squint, types are names that are checked by the computer. Granted, there are always trade offs involved, but for me, it makes it that little bit easier to make the integration reflect the problem I’m solving. And the error proofing is a bonus, too.

                      1. 2

                        Fun fact: Go actually calls “type Foo string” a NamedType in the spec.

                    2. 2

                      I avoid this in my codebase by making all the sequences (in test mode) increment by 200 per ID, and starting them at different offsets for each table (fewer than 200 tables). This ensures that no identifier can match a row in another table.

                  1. 2

                    I wish there was a great way of doing composition of presentation logic without having to pull in JavaScript. Not so much because modern JS is so bad, but because of the build system and the leaky abstractions that mostly work but then don’t, and it ends up costing me several hours to figure out what’s wrong.

                    I haven’t come across a good solution for this. NextJS is great and I use it all the time, but the static website still involves tons of JavaScript. Would be nice to find a way to do composition without bringing in all that runtime dependency.

                    1. 3

                      As a veteran of many battles with webpack configuration: the “build system” situation is getting better. esbuild has given me no cause for complaint so far; it’s simpler and hundreds of times faster.

                      1. 1

                        In theory what you’re asking for is what Astro does, but I haven’t used it yet.

                      1. 24

                        It works incredibly well in my experience. There are two main pain points.

                        First is related to certain projects not really following the rules: grpc-go and Kubernetes. grpc-go in particular has caused lots of headaches due to removing APIs in minor releases. Kubernetes has had a crazy build system that preceded Go modules, so maybe it will get better over time.

                        The second pain point is around managing multi-module repositories. It kinda works by using replace directives but it’s pretty hacky, to the point where I’m actively avoiding using multiple modules in projects even when it “should” be a good fit. There’s a proposal on the table to add workspace support to the go command that might make this work better.
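
                        For reference, the replace-directive workaround mentioned above typically looks something like this in a go.mod (module paths here are illustrative):

                        ```go
                        // go.mod for example.com/repo/app, pointing the sibling module
                        // at the local checkout instead of a published version.
                        module example.com/repo/app

                        go 1.21

                        require example.com/repo/lib v0.0.0

                        replace example.com/repo/lib => ../lib
                        ```

                        The hacky part is that every module consuming the sibling needs its own replace line, and those lines must not leak into published releases.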

                        Overall, despite these pain points, it’s easily the best package manager I’ve used. It’s simple and predictable.

                        1. 2

                          I love it! What did you use to create the svg annotations?

                          1. 1

                            Thanks! I used Illustrator and then spent an hour cleaning it up by hand :)

                          1. 8

                            Kudos on a cheat sheet that’s small and clear enough to actually be a cheat sheet! Very nicely done!

                            1. 1

                              Thank you, that was the idea!

                            1. 12

                              I grew tired of always looking up the magic invocation to add, remove, upgrade dependencies, so I assembled all the concepts you need to know to use Go modules effectively on a single page. Hopefully it’s helpful to others.
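
                              For anyone skimming, these are the standard go tool invocations in question:

                              ```shell
                              go get example.com/pkg@v1.2.3  # add a dependency, or move it to a specific version
                              go get -u ./...                # upgrade all dependencies
                              go get example.com/pkg@none    # remove a dependency
                              go mod tidy                    # prune unused requirements, add missing ones
                              ```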

                              1. 3

                                You did a great job at keeping it brief while still being usable.

                                Thanks!

                              1. 3

                                I find it quite interesting most people are suggesting surface-level changes to syntax. I wonder why that is?

                                1. 9

                                  Note: this observation is called Wadler’s law: https://wiki.haskell.org/Wadler's_Law

                                  1. 9

                                    It’s called bikeshedding, and it always happens with complicated problems/questions. It’s easiest to argue about surface-level stuff rather than the really difficult stuff.

                                    I think it’s an utterly fascinating aspect of human nature!

                                    1. 4

                                      A few reasons, I think:

                                      • It’s like a bike shed, it’s easy to have an opinion on how to paint it/change syntax, hard to have an opinion on how to improve the internal structure.
                                      • Change the internal structure too much and it’s no longer “rust”. Obviously many people prefer python to rust, but saying “clone python” isn’t an interesting answer.
                                      • Most of rust’s warts are, at least in my opinion, in half finished features. But like the above it’s uninteresting to say “well just hurry up and fix const generics, GATs, custom allocators, self referential structs, generators, making everything object safe, etc”. Those aren’t really “redesigns”, they’re just “well keep at it and we’ll see what we end up with”.
                                    1. 2

                                      Hey everyone, I’ve been working on this framework for close to 3 years now. It uses static analysis and code generation to simplify and improve large parts of the developer experience, like automatically generating API docs, instrumenting your app with tracing, and more.

                                      Would love your feedback :)

                                      1. 2

                                        Looks really polished and built with attention to detail, congrats. Makes me wish I knew go 😪

                                        1. 1

                                          Thank you! It’s an easy enough language to learn :)

                                      1. 8

                                        Can definitely relate. When faced with a ton of work it’s easy to fall into analysis paralysis to try to figure out the best place to start. The end result is usually that you never start at all. Usually you know deep down that what you really need to do is just to embrace the grind. There often are no shortcuts, and the optimal theoretical order of things is secondary to any order that results in actually getting it done.

                                        1. 2

                                          I was originally planning on working on open-sourcing my framework for building distributed systems, Encore. Unfortunately my MacBook display broke down (a bunch of purple lines across the screen), so I need to spend the weekend getting it repaired and setting up my dev environment on another machine. Fun times!

                                          1. 1

                                            I’m basically the least lucky guy on the planet and have computer issues constantly. The best thing I ever did was migrate to using a VM for a dev environment. I can get a more or less full dev-laptop experience, but it’s easier to back up, easier to replicate to other machines if my main one dies, and generally just pretty much the-universe’s-attempt-at-a-better-idiot-proof. Heartily recommended.

                                          1. 9

                                            Entertaining article, and whether the author intended it or not I definitely feel there is some truth to it! We can all use a “complexity check” every once in a while. A bit sad with the proliferation of paywalls on Medium, but what can you do :)

                                            1. 2

                                              ignore their cookies :)

                                              1. 4

                                                Yeah, it’s easy enough to get around; it’s more about what it represents. I’m curious to what degree the authors who post on Medium have a choice in the matter? If it’s entirely up to the author I’m perfectly fine with it, but I’ve been under the assumption that it’s Medium deciding and not the blog authors?

                                                1. 2

                                                  The author can decline the paywall, but then Medium won’t make it findable except by direct link.

                                                2. 1

                                                  Ctrl-shift-n ftw

                                              1. 2

                                                Working towards open sourcing www.encore.dev, the Go framework for rapid backend development I’m building. Many steps to go, but exciting nonetheless!

                                                1. 1

                                                  I’ve been thinking about this a lot in the context of building cloud-based software, as I’m working on a product specifically to drastically improve developer productivity in backend development.

                                                  There’s an almost endless amount of improvements we could make to the developer experience. I think the real question is: why haven’t we? I think the answer comes down to this: our tools are too general-purpose. The big players are cloud providers, which want to support any application regardless of how it’s built. As a result we end up with innovations like containers that further push us towards the “applications are black boxes” end of the spectrum.

                                                  I think we could 10x the developer experience if we started from the opposite end: if we built our developer experience around purpose-built tooling (backend development in my case), and we added constraints to how you write your application, we could infer so much more about the application than what is possible today.

                                                  For example, all of these things you should get for free, with no work needed other than writing your business logic:

                                                  • Build & deployment orchestration, with your whole app running “serverless”
                                                  • Setting up databases, managing connections, passwords, backups, and DB migrations
                                                  • Automatic API documentation based on static analysis of your API function declarations
                                                  • Generating type-safe client code for calling your API for any language
                                                  • Expressing API calls between backend services as function calls (that get compiled into real API calls), and getting compile-time validation
                                                  • Automatic distributed tracing of your whole application
                                                  • Automatic management of production & test environments, and preview environments (for each pull request)
                                                  • Run everything locally with no code changes needed
                                                  • Cross-service debugging
                                                  • Error monitoring, graphing & alerting (observability)

                                                  I could go on :)

                                                  1. 1

                                                    I’m with you on that one :)

                                                    https://m3o.com

                                                    1. 1

                                                      Very cool, I hadn’t seen that one! Will have a look, thanks for sharing!

                                                  1. 13

                                                    FWIW GitLab uses a gRPC based solution (Gitaly) for all Git repository interactions, including Git wire protocol traffic (the Git protocol data is treated as a raw stream). See the protocol definition at https://gitlab.com/gitlab-org/gitaly/tree/master/proto. This allows GitLab to abstract the storage of Git repositories behind a gRPC interface.

                                                    Fun fact: this is how Heptapod (https://foss.heptapod.net/heptapod/heptapod) - a fork of GitLab that supports Mercurial - works: they’ve taught Mercurial to answer the gRPC queries that Gitaly defines. GitLab issues gRPC requests and they are answered by Mercurial instead of Git. Most functionality “just works” and doesn’t care that Mercurial - not Git - is providing data. Abstractions and interfaces can be very powerful…

                                                    1. 2

                                                      That’s interesting! I’ve looked at doing remote Git operations by creating an abstraction over the object store. The benefit is that the interface is much smaller than what you linked to. I guess the downside is higher latency for operations that need several round-trips. Do you know if that has been explored?

                                                      1. 6

                                                        GitLab’s/Gitaly’s RPC protocol is massive. My understanding is they pretty much invent a specialized RPC method for every use case they have so they can avoid fragmented transactions, excessive round trips, etc. The RPC is completely internal to GitLab and doesn’t need to be backwards or forwards compatible over a long time horizon. So they can get away with high amounts of churn and experimentation. It’s a terrific solution for an internal RPC. That approach to protocol design won’t work for Git itself, however.

                                                    1. 2

                                                      Interesting article!

                                                      The article ponders the hypothetical implications of such a design, and states “There would be no distinction between JSON and html. (What would have been the downstream consequences to API design?)”

                                                      I think this conflates JSON and JavaScript. This would have no impact on JSON as I see it. And even if you had extended JSON to support this, I don’t think it would have changed API design much at all. API design is really about structured, semantic data. Since HTML is so presentation-centric I doubt it would have much impact on API design. There is a reason almost all serialization formats are effectively equivalent to one another.

                                                      In fact, if JSON had such a strong coupling to HTML it might have led to JSON not being widespread for APIs, and a new serialization format winning instead. That might in turn have reduced the popularity of JavaScript as a language.

                                                      1. 3

                                                        This weekend I was experimenting with building a gitremote-helper using gRPC, to align the authentication with the rest of my infrastructure that’s also built on gRPC. It worked really well! I really appreciate git’s overall architecture and how easy it makes plugging in different components (in this case a transport protocol).

                                                        This week I’m experimenting with building a web-based debugger using the Debug Adapter Protocol.