1. 7

    Using polymorphic variants you can strengthen the return type. For example:

    match foo 2 with 
      | `Y y -> y 
      | `Z -> 0
    

    will continue to compile with both

    let foo y = if y = 0 then `Z else `Y (y + 1)
    

    and the strengthened

    let foo y = `Y (y + 1)
    
    1. 6

      You know someone is over-hyping Rust (or is just misinformed) when you see statements like

      Which means there’s no risk of concurrency errors no matter what data sharing mechanism you chose to use

      The borrow checker prevents data races which are involved in only a subset of concurrency errors. Race conditions are still very possible, and not going away any time soon. This blog post does a good job explaining the difference.

      Additionally, I have my worries about async/await in a language that is also intended to be used in places that need control over low-level details. One library decides to use raw blocking I/O syscalls on some unlikely path (like error logging) and, whoops, there goes your event loop. Bounded thread pools don’t solve this (what happens if you hit the max? It’s equivalent to a limited semaphore), virtual dispatch becomes more of a hazard (are you sure every implementation knows about the event loop? How can you be sure as a library author?), competing runtime environments are a problem (see twisted/gevent/asyncio/etc. in the Python community; this may arguably be worse in Rust given its focus on programmer control), and the list goes on. In Go, you literally never have to worry about this, and it’s the greatest feature of the language.
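
      To make the Go contrast concrete, here is a rough toy sketch (my own simplified example, nothing from the linked posts): one goroutine blocks forever on a read, and the rest of the program keeps making progress because the runtime simply parks it and schedules other goroutines.

      package main

      import (
          "fmt"
          "os"
          "time"
      )

      func main() {
          // A pipe that nothing ever writes to, so the read below blocks forever.
          r, w, err := os.Pipe()
          if err != nil {
              panic(err)
          }
          defer w.Close() // keep the write end open so the read never sees EOF

          go func() {
              buf := make([]byte, 1)
              r.Read(buf) // blocks here; the runtime parks this goroutine (or at worst ties up one thread)
          }()

          // The rest of the program is unaffected by the blocked goroutine.
          for i := 0; i < 3; i++ {
              fmt.Println("still running:", i)
              time.Sleep(100 * time.Millisecond)
          }
      }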

      1. 1

        You know someone is over-hyping Rust (or is just misinformed) when you see statements like

        It doesn’t help that they state (or did state until recently) on their website that Rust was basically immune to any kind of concurrency error.

        1. -1

          That definition of “race condition - data race” essentially refers to an operational logic error on the programmer’s side. As in, there’s no way to catch race conditions that aren’t data races via a compiler, unless you have a magical business-logic-aware compiler, at which point, you wouldn’t need a programmer.

          As far as the issues with async I/O go… well, yes. Asyncio wouldn’t solve everything. But asyncio also wouldn’t necessarily have to be single threaded. It could just mean that a multi-threaded networking application will now spend fewer resources on context-switching between threads. But the parallelism of threads > cpu_count still comes in useful for various blocking operations which may appear here and there.

          As far as Go’s solution goes, their solution to the performance issue isn’t that good, since goroutines have significant overhead: much less than a native thread, but still considerably more overhead than something like MIO.

          The issue you mentioned as an example, a hidden sync I/O syscall by some library, can happen inside a goroutine just as well; the end result will essentially be an OS-native thread being blocked, much like in Rust. At least, as far as my understanding of goroutines goes, that seems to be the case.

          Granted, working with a “pool” of event loops representing multiple threads might be harder than just using goroutines, but I don’t see it as being that difficult.

          1. 5

            That definition is the accurate, correct definition. It’s important to state that Rust helps with data races, and not race conditions in general. Even the rustonomicon makes this distinction clear.

            The discussion around multiple threads seems like a non-sequitur to me. I’m fully aware that async/await works fine with multiple threads. I also don’t understand why the performance considerations of goroutines were brought into the picture. I’m not making any claims about performance, just ease of use and programmer model. (Though, I do think it’s important to respond that goroutines are very much low enough overhead for many common tasks. It also makes no sense to talk about performance and overhead outside of the context of a problem. Maybe a few nanoseconds per operation is important, and maybe it isn’t.)

            The issue I mentioned does not happen in Go: all of the syscalls/locks/potentially blocking operations go through the runtime, and so it’s able to deschedule the goroutine and let others run. This is another great article about the topic.

            It’s great that you’re optimistic about the future direction Rust is taking with its async story. I’m optimistic too, but that’s because I have great faith in the leadership and technical design skills of the Rust community to solve these problems. I’m just pointing out that they ARE problems that need to be solved, and the solution is not going to be better than Go’s solution in every dimension.

            1. 0

              The issue I mentioned does not happen in Go: all of the syscalls/locks/potentially blocking operations go through the runtime, and so it’s able to deschedule the goroutine and let others run.

              Ok, maybe I’m mistaken here but:

              “Descheduling a goroutine”: when a function call is blocking, descheduling a goroutine has the exact same cost as descheduling a thread, which is huge.

              Secondly, Go is only using a non-blocking syscall under the hood for networking I/O calls at the moment. So if I want to wait for an operation on any random file or wait for an asynchronous prefetch call, I will be unable to do so; I have to actually block the underlying thread that the goroutine is using.

              I haven’t seen any mention of “all blocking syscall operations” being treated in an async manner; they go through the runtime, yes, but the runtime may just decide that it can do nothing about it other than let the thread be de-scheduled as usual. And, as far as I know, the runtime is only “smart” about networking I/O syscalls atm; the rest are treated like a blocking operation.

              Please correct me if this is wrong.

              1. 2

                descheduling a goroutine has the exact same cost as descheduling a thread, which is huge.

                A goroutine being descheduled means it yields the processor and calls into the runtime scheduler, nothing more. What happens to the underlying OS threads is another matter entirely. This can happen at various points where things could block (e.g. chan send / recv, entering mutexes, network I/O, even regular function calls), but not at every such site.
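
                As a toy illustration of that point (my own sketch, not anything from the runtime docs): yielding is just a call into the scheduler, which you can even do explicitly with runtime.Gosched, and a channel receive is itself one of those scheduling points.

                package main

                import (
                    "fmt"
                    "runtime"
                )

                func main() {
                    done := make(chan struct{})

                    go func() {
                        fmt.Println("other goroutine ran")
                        close(done)
                    }()

                    runtime.Gosched() // explicit yield: park this goroutine and run others
                    <-done            // a channel receive is itself a scheduling point
                    fmt.Println("main resumes")
                }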

                the runtime is only “smart” about networking I/O syscalls atm

                Yes, sockets and pipes are handled by the poller, but what else could it be smarter about? The situation may well be different on other operating systems, but at least on Linux, files on disk are always ready as far as epoll is concerned, so there is no need to go through the scheduler and poller for those. In that case, I/O blocks both the goroutine and the thread, which is fine for Go. For reference, in this situation, node.js uses a thread pool that it runs file I/O operations on, to avoid blocking the event loop. Go doesn’t really need to do this under the covers, though, because it doesn’t have the concept of a central event loop that must never be blocked waiting for I/O.

                1. 2

                  Descheduling a goroutine is much cheaper than descheduling a thread. Goroutines are cooperative with the runtime, so they ensure that there is minimal state to save when descheduling (no registers, for example). It’s on the order of nanoseconds vs microseconds. Preemptive scheduling helps in a number of ways, but typically causes context switching to be more expensive: you have to be able to stop/start at any moment.

                  Go has an async I/O loop, yes, but it runs in a separate managed thread by the runtime. When a goroutine would wait for async I/O, it parks itself with the runtime, and the thread the goroutine was running on can be used for other goroutines.

                  While the other syscalls do in fact take up a thread, critically, the runtime is aware when a goroutine is going to enter a syscall, and so it can know that the thread will be blocked, and allow other goroutines to run. Without that information, you would block up a thread and waste that extra capacity.

                  The runtime manages a thread pool and ensures that GOMAXPROCS threads are always running your code, no matter what syscalls or I/O operations you’re doing. This is only possible if the runtime is aware of every syscall or I/O operation, which requires a language/standard library designed to provide that. Rust’s isn’t, for good reasons: it has tradeoffs with respect to FFI speed, control, zero overhead, etc. They are different languages with different goals, and one isn’t objectively better than the other.
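
                  Here is a rough, Unix-only sketch of that hand-off (my own simplified example): with GOMAXPROCS set to 1, a goroutine stuck in a raw blocking read(2) ties up one OS thread, but because the runtime saw the syscall entry, it releases the processor and the remaining goroutines keep running.

                  package main

                  import (
                      "fmt"
                      "runtime"
                      "syscall"
                      "time"
                  )

                  func main() {
                      runtime.GOMAXPROCS(1)

                      // A raw pipe with no writer: reading its fd blocks the calling thread.
                      var fds [2]int
                      if err := syscall.Pipe(fds[:]); err != nil {
                          panic(err)
                      }

                      go func() {
                          buf := make([]byte, 1)
                          syscall.Read(fds[0], buf) // the OS thread blocks in the kernel
                      }()

                      // The runtime hands the processor back, so these keep getting scheduled.
                      for i := 0; i < 3; i++ {
                          fmt.Println("still scheduled:", i)
                          time.Sleep(100 * time.Millisecond)
                      }
                  }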

                  1. 2

                    And, as far as I know, the runtime is only “smart” about networking I/O syscalls atm; the rest are treated like a blocking operation.

                    Pretty much everything that could block goes through sockets and pipes though. The only real exception is file I/O, and file I/O being unable to be epolled in a reasonable way is a kernel problem not a Go problem.

            1. 8

              As someone who is a total stranger to Elm, its dev and its community, but was interested for a long time in learning this language, I wonder if this opinion reflects the feeling of the “great number” or not.

              1. 21

                I have to say that I personally can very much see where he’s coming from. GitHub contributions are dealt with in a very frustrating way (IMO they’d do better not allowing issues and PRs at all). There’s a bit of a religious vibe to the community; the inner circle knows what’s good for you.

                That said, they may very well be successful with their approach by a number of metrics. Does it hurt to lose a few technically minded independent thinkers if the language becomes more accessible to beginners?

                Where I see the largest dissonance is in how Elm is marketed: If the language is sold as competitive to established frameworks, you’re asking people to invest in this technology. Then turning around and saying your native modules are gone and you shouldn’t complain because no one said the language was ready feels a bit wrong.

                1. 7

                  Yeah when I look at the home page, it does seem like it is over-marketed: http://elm-lang.org/

                  At the very least, the FAQ should probably contain a disclaimer about breaking changes: http://faq.elm-community.org/

                  Ctrl-F “compatibility” doesn’t find anything.

                  It’s perhaps true that pre-1.0 software is free to break, but it seems like there is a huge misunderstanding in the community about compatibility. The version number doesn’t really mean much in my book – it’s more a matter of how many people actually rely on the software for production use, and how difficult their upgrade path is. (Python 3 flouted this, but it got by.)

                  I think a lot of the conflict could be solved by making fewer promises and providing some straightforward, factual documentation with disclaimers.

                  I watched the “What is Success?” talk a couple nights ago and it seemed like there is a lot of unnecessary conflict and pain in this project. It sounds like there is a lot to learn from Elm though – I have done some stuff with MUV and I like it a lot. (Although the types and purity probably help, you can do this in any language.)

                  1. 4

                    I watched the “What is Success?” talk a couple nights ago and it seemed like there is a lot of unnecessary conflict and pain in this project

                    I watched the talk also, after another… Lobster(?)… posted it in another thread. My biggest takeaway was that Evan really doesn’t want to deal with an online community. People at IRL meetups, yes. Students in college, yes. People/companies online trying to use the language? No. His leading example of online criticism he doesn’t want to deal with was literally “Elm is wrong” (he quoted it without any context, which isn’t that helpful. But maybe that was all of it.)

                    That’s fine. He’s the inventor of the language, and the lead engineer. He probably does have better things to do. But as an outsider it seems to me that someone has to engage more productively with the wider community. Or, just come out and say you don’t care what they think, you’ll get what you’re given, and you can use it if you choose. But either way, communicate more clearly what’s going on, and what to expect.

                2. 13

                  I’ve shipped multiple production applications in Elm and attempted to engage with the community and I can say that their characterization perfectly matches mine.

                  The removal of native modules in particular has caused me to decide not to use Elm in the future. I was always ok with dealing with any breakage a native module might cause every release, and I’m even ok with not allowing them to be published for external consumption, but disallowing them completely is unreasonable. I’m sure a number of people feel the same way as I do, but it feels impossible to provide meaningful feedback.

                  1. 9

                    I work for a company that began using Elm for all new projects about a year and a half ago. That stopped recently. There are several reasons that people stopped using Elm. Some simply don’t like the language. And others, like the author of this post, want to like the language but are put off by the culture. That includes me. This article closely resembles several conversations I’ve had at work in the past year.

                  1. 0

                    This is ill-advised.

                    You cannot define 1/0 and still have a field. There’s no value that works. Even when you do things like the extended real numbers where x/0 = infinity, you’re really just doing a kind of shorthand and you acknowledge that the result isn’t a field.

                    You can of course define any other algebraic structure you want and then say that operating on the expression 1/0 is all invalid because you didn’t define anything else and no other theorem applies, but this is not very helpful. You can make bad definitions that don’t generalise, sure, definitions that aren’t fields. But to paraphrase a famous mathematician, the difficulty lies not in the proofs but in knowing what to prove. The statement “1/0 = 0 and nothing else can be deduced from this” isn’t very interesting.

                    1. 1

                      Could you explain why, formally, defining 1/0=0 means you no longer have a field?

                      1. 7

                        I want to make an attempt to clarify the discussion here because I think there is some substance I found interesting. I don’t have a strong opinion about this.

                        The article actually defines an algebraic structure with three operators: (S, +, *, /) with some axioms. It happens that these axioms make it so that (S, +, *) is a field (just like how the definition of a field makes (S, +) a group).

                        The article is right in saying that these axioms do not lead to a contradiction. And there are many non-trivial such structures.

                        However, the (potential) issue is that we don’t know nearly as much about these structures as we do about fields, because any theorem about fields only applies to (S, +, *) instead of (S, +, *, /). So all the work would need to be redone. It could be said that the purpose of choosing a field in the first place is to benefit from existing knowledge and familiar expectations (which are no longer guaranteed).

                        I guess formally adding an operator means you should call it something else? (Just like how we don’t call a field a group, even though it could be seen as a group with an added * operator.)

                        This has no bearing on the 1/0 = 0 question however, which still works from what’s discussed in the article.
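
                        To spell the definition out (my own phrasing of it, as a sketch): division is made total, and it agrees with multiplication by the inverse whenever the divisor is nonzero:

                        a / b = a * b^-1   when b is not 0
                        a / b = 0          when b is 0

                        Nothing in this assigns a multiplicative inverse to 0; only the notation a / b is extended.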

                        1. 1

                          As I understand it, you’ve only defined the expression 1/0 but you are saying that /0 isn’t shorthand for the multiplicative inverse of 0 as is normally the case for /x being x^-1, by definition. Instead, /0 is some other kind of magical non-invertible operation that maps 1 into 0 (and who knows what /0 maps everything else into). Kind of curious what it has to do with 0 at all.

                          So I guess you can do this, but then you haven’t defined division by zero at all, you’ve just added some notation that looks like division by zero but instead just defined some arbitrary function for some elements of your field.

                          If you do mean that /0 is division by zero, then 1/0 has to be, by definition, shorthand for 1*0^-1 and the arguments that you’ve already dismissed apply.

                          1. 4

                            The definition of a field makes no statements about the multiplicative inverse of the additive identity (https://en.wikipedia.org/wiki/Field_(math)#Classic_definition). Defining it in a sound way does not invalidate any of the axioms required by the field, and, in fact, does define division by zero (tautologically). You end up with a field and some other stuff, which is still a field, in the same way that adding a multiplication operator on a group with the appropriate properties leaves you with a group and some other stuff.

                            The definition of the notation “a / b => a * b^-1” assumes that b is not zero. Thus, you may define the case when b is 0 to mean whatever you want.

                            That people want to hold on to some algebraic “identities” like multiplying by the denominator cancels it doesn’t change this. For that to work, you need the assumption that the denominator is not zero to begin with.

                            1. 1

                              In what way is whatever it is you defined /0 to be considered a “division”? What is division? Kindly define it.

                              1. 3

                                Division, a / b, is equal to a * b^-1 when b is not zero.

                                1. 2

                                  And when b is zero, what is division? That’s the whole point of this argument. What properties does an operation need to have in order to be worthy of being called a division?

                                  1. 3

                                    Indeed, it is the whole point. For a field, it doesn’t have to say anything about when you divide by zero. It is undefined. That doesn’t mean that you can’t work with and define a different, but still consistent, structure where it is defined. In fact, you can add the definition such that you still have the same field, and more.

                                    edit: Note that this doesn’t mean that you’re defining a multiplicative inverse of zero. That can’t exist and still be a field.

                                    1. 1

                                      In what way is it consistent? Consistent with what? As I understand it, you’re still saying that the expression 1/0 is an exception to every other theorem. What use is that? You still have to write a bunch of preconditions, even in Coq, saying how the denominator isn’t zero. What’s the point of such a definition?

                                      It seems to me that all of this nonsense is about not wanting to get an exception when you encounter division by zero, but you’re just delaying the problem by having to get an exception whenever you try to reason with the expression 1/0.

                                      1. 3

                                        I mean that the resulting structure is consistent with the field axioms. The conditions on dividing by zero never go away, correct. And yes, this is all about avoiding exceptions in the stack unwinding, programming language sense. The article is a response to the statements that defining division by zero in this way causes the structure to not be a field, or that it makes no mathematical sense. I am also just trying to respond to your statements that you can’t define it and maintain a field.

                                        1. 1

                                          It really doesn’t make mathematical sense. You’re just giving the /0 expression some arbitrary value so that your computer doesn’t raise an exception, but what you’re defining there isn’t division except notationally. It doesn’t behave like a division at all. Make your computer do whatever you want, but it’s not division.

                                          1. 5

                                            Mathematical sense depends on the set of axioms you choose. If a set of axioms is consistent, then it makes mathematical sense. You can disagree with the choices as much as you would like, but that has no bearing on the meaning. Do you have a proof that the resulting system is inconsistent, or even weaker, not a field?

                                            1. 1

                                              I don’t even know what the resulting system is. Is it, shall we say, the field axioms? In short, a set on which two abelian operations are defined, with two distinct identities, one for each abelian operation, such that one operation distributes over the other? And you define an additional operation on the distributing operation that to each element maps its inverse, except for the identity, which instead is mapped to the identity of the distributed-over operation?

                                              1. 2

                                                It’s a field where the definition of division is augmented to include a definition when the divisor is zero. It adds no new elements, and all of the same theorems apply.

                                                1. 1

                                                  I’m bailing out, this isn’t a productive conversation for either of us. Sorry.

                                                  1. 1

                                                    You are correct. The field axioms are all still true, even if we extend / to be defined on 0.

                                                    The reason for this is that the axioms never “look at” any of the values x/0. They never speak of them. So they all hold regardless of what x/0 is.

                                                    That said, even though you can define x/0 without violating axioms it doesn’t mean you should. In fact it seems like a very bad idea to me.

                              2. 1

                                That doesn’t make it not a field; you don’t have to have a division operator at all to be a field, let alone a division operator that is defined to be multiplication by the multiplicative inverse.

                                1. 1

                                  What is division?

                                  1. 1

                                    zeebo gave the same answer I would give: a / b is a multiplied by the multiplicative inverse of b when b is not zero. This article is all about how a / 0 is not defined and so, from an engineering perspective, you can define it to be whatever you want without losing the property that your number representation forms a field. You claimed that defining a / 0 = 0 means that your numbers aren’t a field, and all I’m saying is that the definition of the division operator is 100% completely orthogonal to whether or not your numbers form a field, because the definition of a field has nothing to say about division.

                                    1. 1

                                      What is an engineering perspective?

                                      Also, this whole “a field definition doesn’t talk about division” is a bit of a misunderstanding of mathematical idioms. The field definition does talk about division, since “division” is just shorthand for “multiplicative inverse”. The reason the definition is written the way it is (excluding 0 from having a multiplicative inverse) is that giving zero a multiplicative inverse results in contradictions. When you say “ha! I won’t let that stop me! I’m going to define it anyway!” well, okay, but then either (1) you’re not defining a multiplicative inverse, i.e. you’re not defining division, or (2) you are defining a multiplicative inverse and you’re creating a contradiction.

                                      1. 1

                                        (I had a whole comment here, but zeebo is expressing themselves better than I am, and there’s no point in litigating this twice, especially when I feel like I’m just quoting TFA)

                                        1. 1

                                          Me too, I’m tapping out.

                          1. 18

                            I suppose I know why, but I hate that D is always left out of discussions like this.

                            1. 9

                              and Ada, heck D has it easy compared to Ada :)

                              1. 5

                                Don’t forget Nim!

                              2. 3

                                Yeah, me too. I really love D. Its metaprogramming alone is worth it.

                                For example, here is a compile-time parser generator:

                                https://github.com/PhilippeSigaud/Pegged

                                1. 4

                                   This is a good point. I had to edit out a part about how a language without major adoption is less suitable, since it may not get the resources it needs to stay current on all platforms. You could have the perfect language, but if it somehow failed to gain momentum, it turns into somewhat of a risk anyhow.

                                  1. 4

                                    That’s true. If I were running a software team and were picking a language, I’d pick one that appeared to have some staying power. With all that said, though, I very much believe D has that.

                                  2. 3

                                    And OCaml!

                                    1. 10

                                     In my opinion, until OCaml gets rid of its GIL, which they are working on, I don’t think it belongs in this category. A major selling point of Go, D, and Rust is their ability to easily do concurrency.

                                      1. 6

                                        Both https://github.com/janestreet/async and https://github.com/ocsigen/lwt allow concurrent programming in OCaml. Parallelism is what you’re talking about, and I think there are plenty of domains where single process parallelism is not very important.

                                        1. 2

                                          You are right. There is Multicore OCaml, though: https://github.com/ocamllabs/ocaml-multicore

                                      2. 1

                                       I’ve always just written off D because of the problems with what parts of the compiler are and are not FOSS. Maybe it’s more straightforward now, but it’s not something I’m incredibly interested in investigating, and I suspect I’m not the only one.

                                      1. 1

                                        The section on compatibility seems unfair. In the maximal (Cargo) approach, it is posited that since CI runs on everything, you have strong evidence that they work together. But in the minimal (modules) approach, no such consideration is given. The same argument applies there, but better: in the maximal approach, you have to rerun CI for every library that transitively depends on you when you release a new version, invalidating all previous runs. But, in the minimal approach, every CI run stays valid until someone explicitly updates to the new version AND pushes a new tagged version of their own library. Even then, only that new tagged version requires a CI run. No other libraries are affected. I think it’s reasonable to assume people won’t publish new tagged versions of their libraries with broken dependencies, and so you’re much more likely to get a compatible set.

                                        In other words, assuming authors test their releases, the only way to get an untested configuration in the minimal world is if you combine multiple libraries together that share a transitive dependency. Even then, you know that at least one subset of libraries is a tested combination. The maximal world contains this failure mode and more: every time a package is published, every transitive dependency may get a broken combination, and you aren’t guaranteed that any of your libraries have been tested in combination.
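
                                         To make the “minimal” rule concrete, here is a toy sketch of my own (not the real go tool): every requirement is a minimum, and the selected version is the largest of those minimums, never something newer that nobody has asked for and tested against.

                                         package main

                                         import "fmt"

                                         // selectMinimal mimics the core of minimal version selection: take the
                                         // highest of the declared minimum requirements (versions are plain ints
                                         // here to keep the sketch short).
                                         func selectMinimal(minimums []int) int {
                                             selected := 0
                                             for _, v := range minimums {
                                                 if v > selected {
                                                     selected = v
                                                 }
                                             }
                                             return selected
                                         }

                                         func main() {
                                             // Our module asks for dep >= v1.2; a library we import was released
                                             // (and tested) against dep >= v1.4. Even if v1.7 exists, v1.4 wins.
                                             fmt.Println("selected minor version:", selectMinimal([]int{2, 4}))
                                         }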

                                        1. 2

                                           Sharing transitive dependencies is pretty common, at least in the Rust world. As you pointed out, this cancels most of the “you get the configuration tested by the author” advantage.

                                          There is a tradeoff between getting configuration tested by author and getting configuration tested by ecosystem. Another tradeoff is between silently getting new bugs and silently getting new bugfixes.

                                          1. 1

                                            I don’t believe it cancels out most of the advantage. The versions in the transitive dependencies also have to be different, which I believe will be rare due to how upgrades work. As for testing by the author vs testing by the ecosystem, all it takes is one library in the ecosystem depending on the same two libraries as you, and you get just as much ecosystem testing.

                                             I agree with the bug tradeoff. Personally, I prefer stability, but I can understand that others may have a different preference. I think in the minimal world, people who want updates can explicitly ask for that, and in the maximal world, it seems people are starting to add flags to allow the other direction (--minimal-versions).

                                            1. 1

                                               Sharing transitive dependencies with different versions is also common in Rust. Instead of making assertions, I probably should whip up a script to count and publish statistics, but exa/Cargo.lock should illustrate my point. exa is a rather popular Rust command-line application.

                                               How to read Cargo.lock: the top section is a serialization of the dependency graph, so it is hard to read. The bottom section is checksums, sorted by package name and version. exa transitively depends on both num_traits 0.1 and 0.2, and winapi 0.2 and 0.3. This is typical.

                                              1. 1

                                                num-traits 0.1 actually depends on 0.2. None of the transitive dependencies there actually require 0.1 in a way that excludes 0.2 (only datetime requires ^0.1.35) as far as I can tell, so I see no reason it needs to be included in the build. Perhaps it’s included in the checksum for some other reason?

                                                edit: I have since learned that ^ on v0 dependencies only allows patch level updates. So ^0.1.35 means >=0.1.35, <0.2.

                                                Winapi 0.2 and 0.3 do appear to both be required in the build. This is due to the term_size crate using a ~0.2 constraint. While I do not have a windows machine to test right now, this commit bumped the version to 0.3. It was only a reorganization of imports, and I believe that all of the pub use entries in 0.3 would cover all of the old imports. I will test this out on a windows machine later tonight.

                                                None of the dependencies on v1 or greater require multiple versions. People tend to attempt to respect semver, so this is expected. Also note that out of 55 transitive dependencies, only those two libraries had multiple versions, and only one would possibly require any changes, and it was a v0 dependency. I believe this is also typical, and I have surveyed a large corpus of Go packages that use dep and had the same findings. Even if the tools allow stricter constraints, typically they weren’t needed.

                                                edit: Here’s a link to my analysis of these types of issues in the Go community: https://github.com/zeebo/dep-analysis

                                                edit edit: To be clear, currently in the Go community there is no easy or supported way to include multiple versions of the same package into your library or binary. The Rust community can, and so perhaps some norms around what types of code transformations are possible differ, causing it to happen more often. I think the fact that Go has been getting along fine without multiple versions is evidence that Go doesn’t need that feature as much, but does not imply that for Rust. I don’t mean to argue that minimal selection would be a fit for Rust, but I don’t think it has the problems that the post describes in the context of Go.

                                                1. 1

                                                   Oops, you are right. num-traits is using the so-called semver trick, which explains it better than I ever can. For crates using the semver trick, it is indeed normal for num-traits 0.1 to depend on num-traits 0.2. A good way to think about it is that the post-0.2 release of 0.1.x deletes the 0.1 implementation and instead provides a 0.1-interface-compatible shim around the 0.2 implementation.

                                                   lalrpop/Cargo.lock is probably a better example. LALRPOP is a parser generator and transitively depends on regex-syntax 0.4, 0.5, and 0.6, without the semver trick. I admit this is not typical, but it is also not rare. Support for multiple versions has been in Cargo since forever.

                                                  Your dep analysis is fascinating. Thanks a lot for letting us know.

                                                  1. 1

                                                    Thanks for another example. In this case, the Cargo.lock is shared for a number of workspaces, so many of the duplications are not actually present in the same artifact. Additionally, many of the different versions are present only in build-dependencies for some artifact. I analyzed the dependencies and reduced the set of real duplications down to these:

                                                    | workspace      | dependency   | duplicated    |
                                                    |----------------|--------------|---------------|
                                                    | lalrpop        | regex        | v0.8 and v1   |
                                                    | lalrpop        | regex-syntax | v0.5 and v0.6 |
                                                    | lalrpop        | unreachable  | v0.1 and v1   |
                                                    | lalrpop-snap   | regex-syntax | v0.4 and v0.6 |
                                                    | lalrpop-snap   | unreachable  | v0.1 and v1   |
                                                    

                                                     The first two duplications were fixed by upgrading docopt from 0.8 to 1.0. No code changes were required, the tests passed, and this would happen automatically with MVS. The third was fixed by upgrading string_cache from 0.7.1 to 0.7.3. Again, no code changes were required, the tests passed, and this would happen automatically. This also fixed the fifth duplication. The fourth duplication is the only one that caused any problems, as there were significant changes to regex-syntax between 0.4 and 0.5, and it directly depends on 0.4.

                                                    So in this case, there was only one dependency issue that would not have been solved by just picking the higher version out of about 70 dependencies, and the one failure was in a v0 dependency. So, while I agree they exist, I just don’t think they will be frequent, nor a significant source of pain.

                                                     In fact, the only times duplications happened were when “breaking” changes happened. The default version selector in Cargo exacerbates this by considering any minor version change in a v0 crate to be “breaking”. In only one example was it actually breaking, and in every other example, just using the largest version worked. In the MVS world, breaking changes require updating the major version, which will allow both copies to exist in the same artifact. So while sharing transitive dependencies is frequent, sharing transitive dependencies that do not respect semver is infrequent, and sharing transitive dependencies with incompatibilities within compatible semver versions is also infrequent, causing this to not be a problem in practice.

                                                    1. 1

                                                      and sharing transitive dependencies with incompatibilities within compatible semver versions is also infrequent, causing this to not be a problem in practice.

                                                      I don’t think you can reach this conclusion. If someone were to do this analysis, time is a critical dimension that must be accounted for. I also think you aren’t doing a correct treatment of semver. Namely, if I were in the Go world, regex-syntax would be at v6 rather than v0.6, to communicate its breaking changes. Each one of those minor version bumps had breaking changes. It simply may be the case that some breaking changes are bigger than others, and therefore, some dependents may not be affected.

                                                      With respect to time, there is often a period of time after which a core crate has a new semver release where large parts of ecosystem depend on both the new version (because some folks are eager to upgrade) and also the older version. For example, there was a period of time a ~year ago where some projects were building both regex 0.1 and regex 0.2, even when there were significant breaking changes in the 0.2 release. You wouldn’t observe this now because people have moved on and upgraded. So the collection of evidence to support your viewpoint is quite a bit more subtle than just analyzing a particular snapshot in time.

                                                      (To comment on the larger issue, my personal inclination is that I’d probably like a world with minimal version selection better just because it suits my sensibilities, but that I’m also quite happy with Cargo’s approach, and really haven’t experienced much if any pain with Cargo that could be attributed to maximal version selection.)

                                                      1. 1

                                                        Thanks for explaining the v6 vs v0.6 distinction better than I was able to. I was trying to get at that with the “breaking” paragraph. Cargo implicitly treats all minor version changes in the v0 major range as “breaking” by making the valid limit only in the minor range, in the same way it treats major versions in the v1 and above range as “breaking”. I think this is a great idea, but muddies the waters a bit on comparing ecosystems with respect to multiple versions of transitive dependencies. Like you said, in a Go world, it would be regex-syntax at v4 and v6, which would both be allowed in the binary at the same time.

                                                        About your point on talking about time, in a Go world, those would be regex v1 and regex v2, again, not causing any issues. I am claiming that it is rare that multiple versions of some package need to exist in the same artifact when they are within the same compatible semver range. For example, if both v1.2 and v1.3 are required in the binary at the same time. I agree an analysis through time is valuable, but rarity also depends on time, so sampling any snapshot will help estimate how often it happens.

                                                        In order to get an estimate for how often multiple semver compatible dependencies occur, I went through the git history of the above projects and their Cargo.locks, but only counting duplicates if they are of the form v0.X.Y and v0.X.Z or vX.Y.Z and vX.S.T. Again, v0 gets this special consideration because of the way that Cargo applies the default constraint. In order to make sure that the authors of these libraries weren’t pinning to some possibly older but semver compatible range, I checked their Cargo.tomls for any constraints that were not of the default form.

                                                        • LALRPOP had no such conflicts in 15 revisions back to 2015. Every constraint was of the default form.
                                                        • exa had no such conflicts in 115 revisions back to 2014. Every constraint was either default or "*".

                                                        There is no evidence in either of these repositories that at any time Cargo had to do anything other than pick the highest compatible semver version for any shared transitive dependencies.

                                                         This discussion has helped me understand better that the v0 range is going to be problematic for the Go modules system if people treat it as I expect and as is encouraged: as a spot for breaking changes and experimentation. Cargo handles this gracefully by allowing breakage to be signaled in the minor version, but Go has no such design consideration. I hope that either a change is made to make this easier, or guidance is given in the community to avoid the problems.

                                                        1. 2

                                                          I am claiming that it is rare that multiple versions of some package need to exist in the same artifact when they are within the same compatible semver range.

                                                          Oh interesting, OK. I think I missed this! I think I would indeed say that this is consistent with my experience in the ecosystem. While I can definitely remember many instances at which two semver incompatible releases are compiled into the same binary, I can’t remember any scenario in which two semver compatible releases were compiled into the same binary. I imagine Cargo probably tries pretty hard to avoid that from ever happening, although truthfully, I can’t say that I know whether that’s a hard constraint or not!

                                                           This discussion has helped me understand better that the v0 range is going to be problematic for the Go modules system if people treat it as I expect and as is encouraged: as a spot for breaking changes and experimentation. Cargo handles this gracefully by allowing breakage to be signaled in the minor version, but Go has no such design consideration. I hope that either a change is made to make this easier, or guidance is given in the community to avoid the problems.

                                                          Yeah that’s a good point. I can’t think of any libraries I’ve ever published (aside from maybe a few niche ones that nobody uses) that haven’t had to go through some kind of breaking changes before I was ready to declare an API as “stable.” Usually they only happen because other people start to actually use it. The Go ecosystem could technically just reform their conventions around what v1 means. IIRC, the npm ecosystem kind of pushes toward this by starting folks at 1.0.0 by default I think? But that may be tricky to pull off!

                                        1. 3

                                           I recently tried out Futhark on a whim, and it has been the most accessible way to get into GPGPU programming that I’ve found. I had a 40x speedup on one of my little programs with 40 lines of code. It’s still a v0, so a little rough around the edges sometimes, but in my experience the authors are super responsive on GitHub if you run into any bugs.

                                          It really is a gem, and I really hope it continues. I know I’ll be using it for the foreseeable future.

                                          1. 1

                                            I recently found https://github.com/rs/jplot which isn’t limited to using characters, but is limited to iTerm2. You also might want to check out the https://github.com/gizak/termui library for inspiration on how to do higher resolution plots.

                                            1. 1

                                               I wonder if there’s a roadmap for this. There are some bits that still need work sprinkled throughout the source, and I’m not seeing a great description of the in-memory and on-disk data structures to the point where one could do capacity planning (is it just protobuf dumped to disk?). I see benchmarks in the tests as well, but nothing to put the times and MB/sec in enough context to do capacity planning if I were to run them in a target environment.

                                              1. 1

                                                 Thanks for the comments. I don’t really have a roadmap for it, but I think the most work remaining is fleshing out the UI and allowing for more kinds of queries (zooming in on specific quantiles or histograms, etc.). The specific TODO you linked there is only an issue for the shutdown code: how long do you wait when doing a best-effort dump to disk on exit? In general, I also tend to use TODO comments more loosely, to remind me about design tradeoffs in specific locations in case those decisions aren’t optimal.

                                                Capacity planning is discussed here, but I agree that documentation around the disk layout would be good, as well as making that more prominent. The database tends to be very disk light in terms of I/O due to it only having to write on 10 minute (configurable) intervals, and only writing a small amount of data (around ~300 bytes per metric, regardless of number of observations). I added an issue to keep track of what you’ve identified.