Threads for leighmcculloch

  1. 13

    The issues with library (“crate”) organization are already apparent, and unless something is done about it relatively soon I think we’ll see a fracturing of the Rust ecosystem within 5 years. IMO the fundamental problem is that crates.io is a flat namespace (similar to Hackage or PyPI).

    For example, the other day I needed to create+manipulate CPIO files from within a Rust tool. The library at https://crates.io/crates/cpio has no documentation and limited support for various CPIO flavors, but it still gets the top spot on crates.io just due to the name. There’s also https://crates.io/crates/cpio-archive, which is slightly better (some docs, supports the “portable ASCII” flavor) but it’s more difficult to find and the longer name makes it seem less “official”.

    If I wanted to write my own CPIO library for Rust, it wouldn’t be possible to publish it on crates.io as cpio. I would face the difficult choice between (1) giving it an obscure and opaque codename (like hound, the WAV codec, or bunt, the ANSI-color-capable text formatter) or (2) publishing it the C/C++ way, as a downloadable tarball[0] on my website or GitHub or whatever.

    Go has a much better story here, because libraries are identified by URL(-ish) import paths. I couldn’t publish the library as just cpio; it would be john-millikin.com/go/cpio or github.com/jmillikin/go-cpio/cpio or something like that. The tooling allows dependencies to be identified in a namespace controlled by the publisher. Maven has something similar (with Java-style package names anchored by a DNS domain). Even NPM provides limited namespacing via the @org/ syntax.

    [0] By the way, from what I can tell Cargo doesn’t support downloading tarballs from specified URLs at all. It allows dependencies to come from a crates.io-style package registry, or from Git, but you can’t say “fetch https://my-lib.dev/archive/my-lib-1.0.0.tar.gz”. So using this option limits the userbase to less common build tools such as Bazel.

    1. 3

      If I wanted to write my own CPIO library for Rust, it wouldn’t be possible to publish it on crates.io as cpio

      The library name doesn’t have to match the package name. You can publish jmillikin_cpio, and people could still write use cpio in their code.

      1. 8

        Yes, but I think the point was that if I, as someone who doesn’t know anything about the Rust ecosystem, was looking for a cpio package, I probably would not go beyond the “official” cpio.

        1. 7

          If you had to choose between jcreekmore/cpio and indygreg/cpio, you still wouldn’t know which is the better one.

          1. 12

            That’s the point though I think: it makes it more obvious that you actually need to answer that question, because both look equally (not) “official”/“authoritative”.

            1. 4

              I think you’re really stretching this example, because short of at_and_t_bell_labs/cpio there can’t be any official/authoritative cpio package. There may be a popular one, or ideally you should get a quality one. So to me this boils down to just search and ranking. crates.io has only text-based search without any quality metrics, so it brings up keyword spam instead of good packages.

              The voting/vouching for packages that @mtset suggests would be better implemented without publishing forks under other namespaces as votes. It could be an upvote/star button.

              1. 3

                If crates.io had organization namespaces, then an “official” CPIO library might have the package name @rust/cpio.

                This would indicate a CPIO package published by the Rust developers, which would be as close to “official” as putting it into the standard library.

                1. 2

                  That would be good for officialness, but I think it’s neither realistic nor useful.

                  We are approaching 100K crates. Rust-lang org already has more work than it can handle, and can’t be expected to maintain more than a drop in a bucket of the ecosystem. See what’s available on GitHub under rust-lang-nursery and rust-lang-deprecated.

                  And official doesn’t mean it’s a good choice. You’d have rust-lang/rustc-serialize inferior to dtolnay/serde. And rust-lang/mpsc that is slower and less flexible than taiki-e/crossbeam-channel. And rust-lang/tempdir instead of stebalien/tempfile, and rust-lang/lazy_static instead of matklad/once_cell.

                  1. 1

                    We are approaching 100K crates. Rust-lang org already has more work than it can handle, and can’t be expected to maintain more than a drop in a bucket of the ecosystem.

                    Yep! That’s true! In a healthy ecosystem, the number of official packages is extremely small as a percentage. Look at C++, C#, Java, Go – there might be a few dozen (at most) packages maintained by the developers of the language, compared to hundreds of thousands of third-party packages.

                    And official doesn’t mean it’s a good choice. You’d have rust-lang/rustc-serialize inferior to dtolnay/serde. And rust-lang/mpsc that is slower and less flexible than taiki-e/crossbeam-channel. And rust-lang/tempdir instead of stebalien/tempfile, and rust-lang/lazy_static instead of matklad/once_cell.

                    Also yep! And also (IMO) totally normal and healthy. The definition of “good choice” will vary between users. Just because a package is maintained by the language team doesn’t mean it will be appropriate for all use cases. That’s why Go’s flag package can co-exist with third-party libraries like github.com/jessevdk/go-flags or github.com/spf13/pflag.

                  2. 1

                    I wish you had put that at the top of your original post. :)

                    1. 2

                      I think it’s a minor point, to be honest. Even if crates.io never gets organizational namespaces, just being able to upload to a per-user namespace would be a sea-change improvement over current state.

              2. 1

                Personally, I think this is a great use case for a social-web system; we’ve already seen this with metalibraries like stdx and stdcli, though none have stood the test of time. I think a namespacing system with organizational reexports could really shine; I’d publish cpio (sticking with the same example) as mtset/cpio, and then it could be included in collections as stdy/cpio or embedded/cpio or whatever. Reviews and graph data would help in decisionmaking, too.

            2. 6

              There are some issues with that approach.

              First, I do want the package name to match the library name, or at least be ${namespace}${library_name} where ${namespace} is something clearly namespace-ish. If I did not have this requirement then I would name crates.io packages with a UUID. And to be honest, I don’t think anyone would do that remapping – people would type use jmillikin_cpio::whatever and grumble about the arrogance of someone who uses their own name in library identifiers.

              Second, a namespace provides access control. I’m the only person who can create Go libraries under the namespaces john-millikin.com/ or github.com/jmillikin/, but anyone in the world can create crates.io packages starting with jmillikin_. It’s just a prefix; it has no semantics other than implying something (ownership) to human viewers that may or may not be true.

              1. 4

                And to be honest, I don’t think anyone would do that remapping – people would type use jmillikin_cpio::whatever and grumble about the arrogance of someone who uses their own name in library identifiers.

                To clarify, it’s the author of the library who sets its default name. You can have the following in Cargo.toml of your cpio library:

                [package]
                name = "jmillikin_cpio"

                [lib]
                name = "cpio"
                

                Users would then use jmillikin_cpio in their Cargo.tomls, but in the code the name would be just cpio.

                This doesn’t solve the problem of access control, but it does solve the problem of names running out.

                1. 3

                  Yes, per my post I’m aware that’s possible, I just think it would be bad. What you propose would be semantically equivalent to using a UUID, since the package name and library name would no longer have any meaningful relationship.

                  In other words, I think your code example is semantically the same as this:

                  [package]
                  name = "c3f0eea3-72ab-4e79-a487-8b162153cfd1"
                    
                  [lib]
                  name = "cpio"
                  

                  Which I dislike, because I think that it should be possible to compute the library name from a package name mechanically (as is the idiom in Go).

              2. 4

                Using a prefix doesn’t provide access control, which is an important feature of namespacing. If there’s no access control, you don’t really have a namespace.

                For example, I might publish all my packages as leigh_<package> to avoid collisions with other people, but there’s nothing stopping someone else from publishing a package with a leigh_ prefix.

                This is a real problem, especially with the prevalent squatting going on on crates.io.

                For example, recently I was using a prefix for a series of crates at work, and a squatter published one of the crates just before I did. So now I have a series of crates with a prefix that attempts to act as a namespace, and yet one of the crates is spam.

                Most other ecosystems have proven that namespacing is an effective tool.

            1. 5

              Is there a VSCode extension that does this?

              1. 4

                Yes. There are 2 implementations with their own quirks. Edit: lmao I forgot how bad the quirks were. Don’t try to use them.

              1. 5

                These tools are very helpful and it’s great that we have so many tools to choose from in this space 👏.

                I’d also like to see cargo itself provide some better defaults in support of supply chain safety.

                Specifically, as a default I’d like to always be in control of the versions of dependencies I’m using. But developers using cargo, like developers using npm or Ruby’s Bundler, aren’t in control in at least a couple of situations:

                1. If you cargo install a tool or dep, the lock file for the tool/dep is ignored unless you pass --locked. This means if a dep releases an update containing malware, you get it immediately if you cargo install right after the release. It also means a CI build that holds secrets like API keys and runs a cargo install could expose those keys to a 5-minute-old code update, unless developers know to use --locked.

                2. The ecosystem norm is to not commit Cargo.lock files in libs. But the side effect is that any dev who does a fresh clone of the repo, say when joining a company or setting up a new computer, immediately gets the newest compatible versions, which could include an update released 5 minutes ago containing malware.

                It would be great to see the ecosystem lean into using lock files in libs and bin installs, maybe even re-evaluate Go’s minimal version selection, which would address both of these, and adopt automatic audit checks like npm’s.

                Definitely interested in hearing how other people mitigate these types of things.

                1. 1

                  Work: This week I’m learning about toolchains that compile to wasm, and writing a blog post, but currently got a mental block on the blog post.

                  Fun: Lately I’ve been learning Zig. The last thing I wrote for a little project was a generic iterator for reverse-iterating a slice, to simplify stepping through a slice visiting every second element. I found that somewhat tedious to code without an iterator, since a while loop over unsigned integers is the norm for array sizes and indexing. I think I finally have a better understanding of how to write generic types, and of how arrays, slices, and pointers interact and coerce.

                  1. 6

                    This blog post is absolutely great, and I use it as a starting point for a discussion about why not to use a custom testing interface (most of the time testify).

                    I feel very strongly about testify. Negatively. Its use is a red light for me and a sign that the author might not think the problem through but rather force solutions known from somewhere else.

                    1. 9

                      I feel very strongly about testify. Negatively. Its use is a red light for me and a sign that the author might not think the problem through but rather force solutions known from somewhere else.

                      This is a pretty strong statement regarding the usage of a test helper library.

                      But if you think someone writing their own WithinDuration test helper method, for each library they write, is a good use of their time, and is somehow a signal for overall project quality (or of somehow not thinking?!), then I guess more power to you.
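
                      For reference, here’s roughly what the hand-rolled version under discussion would look like; the helper name and message wording are mine, but testify’s assert.WithinDuration has the same shape:

                      package example

                      import (
                          "testing"
                          "time"
                      )

                      // withinDuration is a hypothetical hand-rolled stand-in for testify's
                      // assert.WithinDuration: it fails the test when actual is not within
                      // delta of expected.
                      func withinDuration(t *testing.T, expected, actual time.Time, delta time.Duration) {
                          t.Helper()
                          diff := expected.Sub(actual)
                          if diff < -delta || diff > delta {
                              t.Errorf("expected %v to be within %v of %v (off by %v)", actual, delta, expected, diff)
                          }
                      }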

                      1. 2

                        “A little copying is better than a little dependency”

                        The cost of writing a set of Assert helpers for each project you own/maintain is zero.

                        1. 13

                          Sure. If time is worth nothing, then writing a set of helpers for every project is fine.

                          Oh, but then maybe you could share it across projects to save a bit of time/effort for any new project you start.

                          Maybe even open source it? Other people might even be interested in using it!

                          Oh wait.. Now we are back to square one with it being bad?

                          That said, if you only need one (or a few, or several even) function, then sure. I agree that copying it around is better than adding a dependency. But if you ever reach a point where you have to update more than one project to add or fix a helper, then you are probably better off making it a dependency.

                          But use of a test helper library as somehow being a red light for overall project quality, sure seems dubious to me.

                          1. 2

                            Sure. If time is worth nothing, then writing a set of helpers for every project is fine.

                            The time it takes me to write those helpers is, without exaggeration, less than the time I spend waiting for VS Code to do whatever action in 1 day. It doesn’t enter into the cost accounting. The cost of a dependency, on the other hand, is real, and significant, and perpetual.

                            I once heard a good rule of thumb: never import anything you could write in an afternoon. Assert is well below that threshold.

                            But if you ever reach a point where you have to update more than one project to add or fix a helper, then you are probably better off making it a dependency.

                            The only reason to add a helper to a project is if you need it; the only reason to update a helper in a project is if it’s causing problems in that project. There’s no situation I can think of where you have a bunch of similar/identical helpers in a bunch of projects you own/maintain, and you need to update them all.

                            1. 17

                              I’ve been using Go since before 1.0 was released. I have a lot of experience using the reflect package. I’m pretty sure I couldn’t write a good set of assert helpers in an afternoon.

                              The funny thing here is that nobody seems to acknowledge that the assert helpers aren’t just about deleting some if statements. It’s also about the messages you get when a test fails. A good assert helper will print a nice diff for you between expected and actual values.
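
                              To make that concrete, a common way to get such a diff is a small wrapper around go-cmp; cmp.Diff is a real function from github.com/google/go-cmp, but the helper name and message format below are just a sketch:

                              package example

                              import (
                                  "testing"

                                  "github.com/google/go-cmp/cmp"
                              )

                              // assertEqual is a sketch of a diff-printing equality helper: on mismatch it
                              // fails the test and prints a structural diff of want vs. got using cmp.Diff.
                              func assertEqual(t *testing.T, want, got interface{}) {
                                  t.Helper()
                                  if diff := cmp.Diff(want, got); diff != "" {
                                      t.Errorf("mismatch (-want +got):\n%s", diff)
                                  }
                              }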

                              testify is pretty dang close to what I would write. And while some dependencies have a perpetual cost, I’ve not experienced that with testify specifically.

                              I usually like the “Go Way” of doing things, but this particular position is pretty Out There IMO.

                              1. 3

                                The funny thing here is that nobody seems to acknowledge that the assert helpers aren’t just about deleting some if statements. It’s also about the messages you get when a test fails. A good assert helper will print a nice diff for you between expected and actual values.

                                I don’t see much value in rich assertion failure messages, most of the time. Literally this and nothing more is totally sufficient for 80% of projects.

                                func Assertf(t *testing.T, b bool, format string, args ...interface{}) {
                                    t.Helper()
                                    if !b {
                                        t.Errorf(format, args...)
                                    }
                                }
                                
                                1. 7

                                  You’re going to have a hell of a time debugging that on CI when all you have is “foobar equality failed” with no indication of what the unexpected value was to help you puzzle out why it works on your machine but not the CI server.

                                  I mean, more power to you but I’m not out to make my job any harder than it has to be. “expected: “test string” received: “TODO set this value before pushing test config”” is too easy a win for me to ignore, and god help you when the strings are piles of JSON instead. Then you’re really going to want CI to give you that diff.

                                  1. 3

                                    You’re going to have a hell of a time debugging that on CI when all you have is “foobar equality failed” with no indication of what the unexpected value was to help you puzzle out why it works on your machine but not the CI server.

                                    I hear this often enough, but it’s just never been my experience; I guess I’m asserting at a relatively granular level compared to most people.

                                    But it’s moot, I think, because if you need that specificity, Assertf lets you provide it just fine by way of the format string.
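
                                    For example, with the Assertf helper quoted above, the call site can carry the relevant values itself. In this sketch, greet is a made-up function under test:

                                    func TestGreet(t *testing.T) {
                                        // greet is hypothetical; Assertf is the helper from earlier in the thread.
                                        // The failure message carries both values, so the CI log shows what broke.
                                        got := greet("world")
                                        Assertf(t, got == "hello, world", "greet: got %q, want %q", got, "hello, world")
                                    }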

                                    1. 6

                                      I think the distinction is we are all likely writing different types of tests, that trade off different things.

                                      In tests that I write asserting on simple values, sure simple ifs get the job done for me.

                                      In tests of JSON outputs, or large structures, I find it more helpful to test equality of the entire thing at once and get a diff. It’s faster to review, and I get greater context, and the test will break if things change in the value I’m not testing.
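
                                      As an illustration, assert.JSONEq from testify does exactly this for JSON; the response body below is a literal only to keep the example self-contained:

                                      package example

                                      import (
                                          "testing"

                                          "github.com/stretchr/testify/assert"
                                      )

                                      func TestCreateUserResponse(t *testing.T) {
                                          // In a real test this body would come from the system under test.
                                          body := `{"name": "leigh", "id": 1}`
                                          // JSONEq compares the two documents as parsed JSON, so key order and
                                          // whitespace don't matter, and it reports a diff of the whole structure.
                                          assert.JSONEq(t, `{"id": 1, "name": "leigh"}`, body)
                                      }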

                                      I find a lot of value in tests that operate at the top level of an application. Like tests that test the stdin and stderr/stdout of a CLI, or tests that test the raw request and response to an API. They catch more bugs and force me to think about the product from the perspective of the system interacting with it. I don’t think this is the only thing to test for though or only way to test.

                                      I know I find value in testify; it isn’t perfect, like any code, but I don’t think there’s a perfect practice about whether to use testify or not. It depends what you’re optimizing for and the type of assertions you’re making and inspecting.

                                      1. 5

                                        But it’s moot, I think, because if you need that specificity, Assertf lets you provide it just fine by way of the format string.

                                        It’s not moot, because usually by the time you realize you need it, you’re already looking at the failing test in CI. So now you need to roundtrip a patch to make your test more verbose.

                                    2. 8

                                      I don’t see much value in rich assertion failure messages, most of the time.

                                      Writing tests is part of my daily flow of programming, and so are failing tests. Not having to spend a bunch of time printf-ing values is a literal time saver.

                                      I’ve spent more years using plain go test than testify. We switched to testify at work a few years back and it paid for itself after a couple days.

                                      And I love how the goalposts have shifted here subtly. At first it was, “don’t reuse code that you could just write yourself in an afternoon.” But now it’s, “oh okay, so you can’t write it in an afternoon, but only because you value things that I don’t.” Like, have all the opinions you want, but “failure on test.go:123 is often totally sufficient” is just empirically wrong for me.

                                      Before testify, writing tests was a huge pain in the ass. And if it wasn’t a pain in the ass, it was a pain in the ass to read the error messages because the test didn’t print enough detailed information.

                                      Case in point, we’d have things like if !reflect.DeepEqual(x, y) { ... }, and when that failed, we’d be like, “oh what changed.” If x and y are big nested types, then printing out those values using the standard formatting specifiers is not that helpful. And I view the need for reflect.DeepEqual (or go-cmp) in tests as a shortcoming of the language. There’s a convention for defining Equal methods, which go-cmp thankfully reuses, but no other part of the language really recognizes the reality that, hey, maybe types want to define their own equality semantics independent of what Go does for you by default. And thus, Equal is not composable unless you go out of your way to recursively define it. Which, by the way, is an immediate footgun because it’s easy to forget to update that method when a new field is added. And it’s hard to write a good test for that.

                                      And don’t get me started on other shitty things. Like comparing values with time.Time in them somewhere. Or doing other things like, say, asserting that two slices have equivalent elements but not necessarily the same order. Oops. Gotta monomorphize that second one manually for each collection type you call for it. Or I could just use ElementsMatch and not think about it again.
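
                                      For anyone who hasn’t used it, that one looks like this (assert.ElementsMatch is a real assertion from github.com/stretchr/testify/assert; the slices are invented):

                                      func TestTagsInAnyOrder(t *testing.T) {
                                          got := []string{"b", "c", "a"}
                                          // Passes as long as both slices contain the same elements, ignoring order,
                                          // without hand-rolling a sort-and-compare for each element type.
                                          assert.ElementsMatch(t, []string{"a", "b", "c"}, got)
                                      }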

                                      These are all problems that have come up for us in practice that have cost us time. Your “unpopular opinion” is crap in my experience.

                                      1. 2

                                        “failure on test.go:123 is often totally sufficient” is just empirically wrong for me.

                                        That’s totally fine! This isn’t a competition, we’re just sharing experiences. I think?

                                        Your “unpopular opinion” is crap in my experience.

                                        This honestly made me feel bad; I’m sorry to have put you off.

                                        1. 5

                                          I’m sorry too. Your comments in this thread came off as pretty dismissive to me and I probably got too defensive.

                            2. 1

                              Repetition is one of the claims that testify users bring up. Irony is also often present, I believe to lend a bit more confidence.

                              I think whether you prefer to import as much external code as possible, or to first think about a problem and consider solving it without external code, says a lot about what kind of developer you are. I do not find it productive to argue about which approach is superior, because it often feels like beating a dead horse. I hope the right answer comes with experience.

                              I have no idea what WithinDuration does, so I had to check this. Isn’t this function solving a very specific problem? Using this logic, I could claim that testify is garbage because it does not provide a function to check if a date is B.C. and I must write the assertion manually.

                              It is easy to argue about abstract problems. Even easier if badly explained and with no context. Please notice that the blog post is very specific with examples and numbers.

                          1. 14

                            The reason I started using testify is because the error messages were better than the built in messages, and I didn’t like repeating the if statement condition in every error message. I’m not sure if this is still the case though.

                            One thing I don’t like about the testify lib is the lack of consistency on parameter order (actual and expected.)

                            1. 3

                              One thing I don’t like about the testify lib is the lack of consistency on parameter order (actual and expected.)

                              assert.Len(t, collection, length, message)
                              

                              bothers me a lot

                              1. 1

                                Isn’t that one correct? Collection is the “actual” and length is the “expected”.

                                1. 3

                                  I don’t know if “correct” is really the appropriate word to use here, but no, it is inconsistent with most other methods. For example: https://pkg.go.dev/github.com/stretchr/testify@v1.7.0/require#Equal
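
                                  A side-by-side illustration of the inconsistency; both are real testify assertions from the require package, and the variables are invented:

                                  func TestParameterOrder(t *testing.T) {
                                      gotUsers := []string{"ann", "bob", "cat"}
                                      // Equal takes the expected value first, then the actual value...
                                      require.Equal(t, "ann", gotUsers[0])
                                      // ...while Len takes the actual collection first, then the expected length.
                                      require.Len(t, gotUsers, 3)
                                  }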

                                  1. 2

                                    Oh haha my bad. I misread the parent comment as claiming “actual, expected” is the prominent order but it’s indeed the reverse.

                              2. 2

                                I like using libs like testify for the same reason, when a test fails, the output is helpful. Multiline strings are diffed. JSON inconsistencies are highlighted. Nested values with subtle differences are compared. It’s those features that make a huge difference.

                                I think testify has evolved over time in ways it shouldn’t have, like inconsistent arguments and functions that bloat the API, but it’s still great imo.

                                Out of my own desire to explore ideas I’ve been building my own assertion library, inspired by testify, minimalistic, but useful for the apps I build: https://4d63.com/test. I don’t expect to build something better, but to understand the tradeoffs, the decision-making process, and how this stuff works.

                              1. 1

                                Company: Stellar Development Foundation

                                Company site: https://stellar.org (open source: https://github.com/stellar)

                                Position(s):

                                  • Engineering
                                  • Ecosystem
                                  • Product
                                  • Business Development
                                  • Legal & Policy

                                Location: SF, NY, Asia-Pacific, or Remote

                                Description: Stellar is a decentralized, fast, scalable, and uniquely sustainable network for financial products and services. It is both a cross-currency transaction system and a platform for digital asset issuance, designed to connect the world’s financial infrastructure.

                                The Stellar Development Foundation (SDF) is a non-profit organization that supports the development and growth of Stellar. The Foundation helps maintain Stellar’s codebase, supports the technical and business communities building on the network, and serves as a voice to regulators and institutions. The Foundation seeks to create equitable access to the global financial system, using the Stellar network to unlock the world’s economic potential through blockchain technology.

                                  The Foundation’s work is open sourced under the Apache 2.0 license: https://github.com/stellar

                                Tech stack:

                                • C++ 11, Go, TypeScript, JavaScript
                                • Postgres, BigQuery
                                • Kubernetes

                                Contact: Apply here

                                1. 2

                                  I tried to read the whitepaper a while back, correct me if I’m wrong here:

                                  • Stellar is a fork of ripple?
                                  • Stellar uses voting within subsets and has many overlapping subsets, the consensus is some convergence? I’d appreciate if you could explain this a bit.

                                  Another thing I’m curious about:

                                  • What does it take to become a validator?
                                  • How are new tokens minted / distributed?
                                  • How do you do oracles?
                                  • How do you make sure that these “stable assets” (pegs to fiats or other things) track what they are pegging?

                                  I was like 99.9999% sure this project was a scam when I looked at the paper, it tries so hard to obfuscate how it works and the ratio of claims to explanations is very bad. Seeing it here on lobsters is very surprising to me.

                                  1. 2

                                    I was like 99.9999% sure this project was a scam when I looked at the paper, it tries so hard to obfuscate how it works and the ratio of claims to explanations is very bad. Seeing it here on lobsters is very surprising to me.

                                    I can’t comment on the project directly but I’ve met one of the people involved in it socially not knowing we both worked in tech and later briefly discussed Stellar. I don’t believe it’s a scam or that they would have anything to do with it if it were.

                                    1. 2

                                      A scam in the sense of whether they are actually a byzantine fault tolerant decentralized system in the first place (something they claim to be).

                                        The paper does not feel clear but rather obfuscated. The questions I asked are pretty important prerequisites for deciding if you want to throw your lot in with these people. Whatever people think of blockchains, it is pretty clear that byzantine fault tolerance is an important property that is desirable in any political / economic system, and exploring how it can be achieved is worthwhile research. However, due to the perverse incentives there are all sorts of outrageous claims.

                                        Another problem that would be desirable to avoid is the tragedy of the commons (the cause of climate change), which proof of stake is vulnerable to (as is at least a large class of voting protocols, if not all). The exploit would require coordination that seems implausible with current social networking technology (however, I believe the exploit would come in the form of something not itself vulnerable to the tragedy of the commons, though I have yet to prove that).

                                      I am not saying this because I am fond of proof of work, quite the opposite, the environmental consequences mean we should be trying to find a better solution. However we should be truthful in that pursuit and respect mathematical facts (like PoS being vulnerable to ToC) - they have a tendency to predict the future.

                                    2. 2

                                        I think they misrepresent themselves as achieving the same distributed consensus that projects like Bitcoin do. It’s ultimately obfuscated federated control.

                                        That said, I think many blockchain projects are scams in that they misrepresent distributed consensus as being the solution to problems when it really isn’t.

                                      1. 1

                                        @ilmu Great questions, thanks!

                                        The old Stellar network that launched in 2014 was using software (stellard) that was a modified fork of the Ripple node software (rippled). In 2015 the current Stellar network was launched with a new consensus model (SCP) and new stellar-core software that was written fresh. There are some blog posts about the network upgrade that go into more detail: 1, 2.

                                        For details about how the Stellar Consensus Protocol (SCP) works, I recommend these resources. The video talks about how the voting works.

                                        1. 1

                                            Thank you for answering; I only just saw that you did, thanks to the reply feature being back. I’ve looked at these resources a bit (not all the way through yet), and the path you have chosen looks interesting, to me at least; the model I am trying to figure out is similar in direction to yours. However, I don’t think you can claim byzantine fault tolerance as-is.