1. 14

I’m all for Rust, but let’s not shit on Go like that. Lack of deterministic destruction may make resource management a bit harder/more manual, but calling it “unacceptably crippled” is hyperbole.

    1. 6

      Yeah, that last paragraph really ruined the article for me. Not only does it make me question the author’s character, it also casts doubt on the earlier technical parts; I’d recommend they delete it.

      1. 6

        I don’t think he’s talking about Go’s GC. I think he’s talking about how Spartan and locked down the language is (by design). I wouldn’t describe it as unacceptably crippled, in general, but for some use cases it really is.

        1. 5

          What use cases in particular? I can’t think of any that wouldn’t also lead you to say that the vast majority of languages are “unacceptably crippled”, which makes the phrasing less technically useful and more emotionally charged for no good reason.

          1. 2

            Fair point. I almost certainly would not have chosen those words, myself. Most languages would be disqualified for the use cases that Go is not a good choice for.

          2. 1

            Yeah there’s other ways in which Go is deliberately crippled besides the lack of deterministic destruction. Whether or not that crippling counts as unacceptable depends on what you want to do and what you’re used to, but I personally agree with the author on that point, and I hope they don’t take it out of the blog post in response to criticism.

        1. 6

          Why does anyone trust Keybase? They’ve been untrustworthy since they originally suggested uploading private GPG keys for convenience

          1. 4

            Exactly! Like hell I’m giving them my private key!

IIRC, I created a key pair just for Keybase. Signed it with my key pair. That worked fine since nobody on Keybase checked it, as far as I remember. That just inspires more confidence, right? ;)

            1. 3

              It’s the same as uploading an encrypted key to Dropbox, Google drive, etc. Yes, in theory you lose a tiny bit of security, but realistically your attacker needs to break AES to use your key, and such attacker capabilities usually aren’t included in most threat models.

              1. 1

                The keys weren’t encrypted with a passphrase for the web stuff to work seamlessly.

                1. 1

IIRC the web stuff connects to the keybase service on your computer to work

                  1. 5

                    It originally didn’t at the launch of Keybase. You had the option of cli tools (secure, you control the key) or uploading to their web servers for convenience

                  2. 1

                    Odd. The web app does scrypt (even says that on the login button) on the password, I’d be surprised if the derived key wasn’t used to encrypt the keys used for messaging.

                    1. 2

                      Unless you have a time machine you won’t be able to see what they used to do with uploaded GPG keys

                      1. 1

                        Indeed, because the backend is closed source.

                        1. 2

And even if it were open, because you can’t know that’s what they were actually running. (This is why E2E encryption and an open client are important, and an open backend is a security red herring.)

                2. 3

This is one of those situations where, if you’re a hardcore crypto-head and have been managing your own PGP/GPG keys for years, you probably shouldn’t use it; but then it’s not FOR you.

                  It’s for people who want a reasonably secure, convenient way to use crypto to send/receive email, store files, and chat.

                  There’s no requirement that you upload your existing keys to them, you can always have them generate a fresh key and use it that way.

                  1. 1

Yes, true, but it is misleading to non-technical users. Compromise of the Keybase servers meant compromise of their private keys, and as there was no forward secrecy in use…

                    1. 3

I disagree. I don’t think they ever claimed that users’ keys wouldn’t be compromised if they (Keybase) were.

                      This is a perfect example of the perfect being the enemy of the good.

                1. 1

                  They say that as operators of the server, they can’t get your ssh keys. That would be true if they didn’t also build and distribute the binaries for their open source client.

                  I’m not saying they’d do something nefarious, yadda yadda, what if a bad actor takes over their build infrastructure, look at node.js, yadda yadda. I think this side of the topic is well understood.

                  1. 1

                    I guess there is no way around that, if you want to do what they set out to do. There’s always a risk, and there is no mitigation against certain types of risk if you can’t or don’t want to compromise on features (they can’t, the feature is their service in this case). All you can do is make the risk understood (which I think they do a good job of, I certainly don’t feel they are not honest about that).

                    Ubuntu has their client packaged btw, as far as I know, I’m sure other distributions have too.

I’d be interested in what you would suggest as a viable alternative approach?

                    1. 2

                      Ubuntu builds the client from source, then? (One would hope they’d always do that..) Let’s see… Hm, I can’t find a keybase package in Ubuntu. https://packages.ubuntu.com/search?keywords=keybase&searchon=names&suite=all&section=all

                      But yes, if/when distros build and publish the keybase client and put it under their respective security umbrellas, AND if some independent analysts declare that the client’s E2E works as advertised (server can’t read ssh keys nor messages etc), then that would solve some of my complaints. You see, I do not particularly worry about the upstream author of bash putting a back door in, because distros build it and put their reputations on the line. Same deal.

                      frkl, in general, keybase just needs to release their source code for the server and whatever related source code. That would solve all of my complaints eventually, as the community evaluates the code, breaks it, improves it, etc. (Or maybe we’d decide the hype wasn’t deserved. How should I know, yet?) If it’s legit, then I’m confident that most users will use the official instance anyway. I’m sure I would.

                      I just want them to pull that left hand out from behind their back. This is a security product. Show me your hands. Duh.

I might even be willing to accept keybase if they (or somebody) releases a free software alternative server that the client can connect to. Even if it has fewer features? (Am I negotiating with capitalists?) I guess this is called a reference implementation.

                      I just will NOT tell all my friends and family to use a closed source non-free communications platform again. (I brought big G a hundred innocent souls before I realized what was going on there… My heart is broken. Never again.)

                      1. 2

How does releasing the server source help audit the security of the system? You can’t be sure that the source they release is the source they’re running, and the whole point of E2E encryption is that you only need to audit the source of the endpoints, and in this case, that source is open. Additionally there has been a (paid for by keybase) 3rd party security audit: https://keybase.io/docs-assets/blog/NCC_Group_Keybase_KB2018_Public_Report_2019-02-27_v1.3.pdf

                        I do agree that it would help with the “keybase shuts down” scenario, though.

                        1. 1

                          Releasing the server source would be an act of good will, it would be morally correct (to me), and it would be ethically correct (among certain reputable segments of the internet population).

                          If we have faith in E2E, I guess we don’t have to trust the servers? Yes, that’s the whole point.

                          Of course, what if that’s not the source code that’s actually running?

                          Well, see https://signal.org/blog/private-contact-discovery/, specifically under the heading Trust But Verify. As you can see, the problem isn’t ‘solved’ but at least one group has taken a whack at it.

The signal server is open source, btw. But as you know I’m sure, keybase is far more ambitious. kbfs makes me daydream…

                          1. 1

                            Economic and moral arguments are fine. I don’t intend to argue with those.

                            I’ve read that signal blog before, and it leaves me with some questions. For example, even if you’re sure you’re contacting the secure enclave, and that it’s running the software it’s claiming it is, the enclave has to have some sort of key to decrypt the messages you’re sending it, otherwise anyone could read them. How do you know someone else doesn’t have that key? Indeed, it seems like those first assumptions about correctly contacting the enclave are dubious: how do you know that they don’t just have the “secure enclave’s” key and are pretending to be one? Have you physically inspected their hardware and agree that the packets are being routed to the correct server? I’d love to learn why these things aren’t actually problems.

But until then, it seems like it has just put up some smoke and mirrors to distract you from the fact that you’re still just trusting them. It probably helps against attackers compromising their servers, but it shouldn’t be possible for that to be a threat in the first place. Even with the source open it’s fundamentally worse, because a design where the server source is irrelevant is strictly better. I can run keybase with zero trust in their servers, but I’ll always have to have some amount of trust in signal.

                            1. 1

                              Okay. I don’t know this part… What is the difference between the Keybase client and the Signal client that makes server trust unnecessary for one and necessary for the other?

                              1. 1

Keeping the contacts you send to signal disguised requires you to trust their correct implementation of the secure enclave.

                        2. 1

                          Ubuntu builds the client from source, then? (One would hope they’d always do that..) Let’s see… Hm, I can’t find a keybase package in Ubuntu. https://packages.ubuntu.com/search?keywords=keybase&searchon=names&suite=all&section=all

                          Forget I said that, I’m an idiot. I did an ‘apt search’ on one of my machines and forgot I installed their ppa…

                          And yes, fair points…

                    1. 6

                      I am considering switching to other languages for my next project or migrating slowly. Will have to wait and see I suppose.

                      The main strengths of go (from my point of view) are good libraries and no annoying sync/async code barrier.

The main weaknesses are a runtime that makes using C libraries hard, and some feeling of kludginess because users can’t define things like try or whatever themselves. ‘Go 2’ doesn’t really change anything.

                      1. 4

                        I consider myself relatively neutral when it comes to Go as a language. What really keeps me from investing much time or attention in it is how its primary implementation isolates itself so completely as if to compel almost every library to be rewritten in Go. In the short term this means it will boost the growth of a vibrant ecosystem but I fear a longer term world where the only reasonable way to interoperate between new languages and systems which don’t fit into Go’s model is to open a socket.

I don’t think we need to be alarmist about bloated Electron apps, but in general we’re talking about many orders of magnitude of cost increase for language interoperation. This is not the direction we should be going, and I fear Go has set a bad precedent with its answer to this problem. Languages will evolve and systems will too, but if we have to climb higher and higher walls every time we want to try something new, we’ll eventually be stuck in some local optimum.

                        I’d like to see more investment in languages, systems, and runtimes sitting between them that can respond to new ideas in the future w/o involving entirely new revisions of a language with specific features responding to specific problems. Perhaps some version of Go 2 will get there but at the moment it seems almost stuck on optimizing for today’s problems rather than looking at where things are going. Somewhere in there is a better balance and I hope they find it.

                        1. 4

                          Yeah - I really want to use either GNU guile or Janet to write http handlers for Go, with the current system it is not really possible to do it well.

                          There are multiple implementations of Lua in Go for the same reasons, poor interop if you aren’t written in Go and want two way calls.

                          1. 3

                            A crucial part of this is that go was explicitly, deliberately created as a language to write network servers in.

                            In that context, of course the obvious way to interop with a go program is to open a socket.

                            1. 2

Sure. Priorities make RPC look like their main goal, but the cost of an RPC call is on an entirely different level than a function call, and it comes with a lot of complexity: an accidental distributed system is now required just to call some logic written in another language.

At a company where everything is already big and complex, this may seem like a small price, but it’s becoming a cost today, so we see people opting to write pure Go libraries and pass on shareable libraries, or duplicate effort. In many cases this becomes a driver that kills the diversity in technical choices I talk about in my original comment above.

It’s an obvious problem but the Go team would rather drive people away from systems level interoperability for Go’s short term gains. They claim that it’d be too hard to support a real FFI option or that they are short on resources but other runtimes do a better job of this so it’s possible and secondarily, Go supposedly isn’t a Google project but a community, yet we see it clearly being managed from one side of this coin.

                              1. 1

                                In my experience, it’s the quality of the golang tools driving this.

                                For instance: I found it easier to port a (smallish) library to go and cross-compile the resulting go code, than to cross-compile the original c library.

                                I initially considered porting to rust, which is imo a delightful language, but even though cross-compilation is much easier in rust than in c (thanks to rustup), it doesn’t compare to go.

                                The process for c:

                                • For each target arch, research the available compiler implementations; packages are often unavailable or broken, so you’ll be trying to build at least a few from source, probably on an unsupported platform.

                                The process for rust:

                                • For each target arch, ask rustup to fetch the toolchain. It’ll tell you to install a bunch of stuff yourself first, but at least it tends to work after you do that.

                                The process for go:

• Set the GOOS/GOARCH environment variables (e.g. GOOS=linux GOARCH=arm64) before running the compiler.
                                1. 1

                                  … so we see people opting to write pure Go libraries and pass on shareable libraries or duplicating effort. In many cases this becomes a driver to kill diversity in technical choices that I talk about in my original comment above.

                                  It’s unclear to me why having another implementation of something instead of reusing a library reduces diversity rather than increasing it.

They claim that it’d be too hard to support a real FFI option or that they are short on resources but other runtimes do a better job of this so it’s possible

I’m personally a maintainer of one of the most used openssl bindings for Go, and I’ve found the FFI to be a very real option. That said, every runtime has its own constraints and difficulties. Are you aware of any ways to do the FFI better that would work in the context of the Go runtime? If not, can you explain why not? If the answer to both of those is no, then your statements are just unfounded implications and fear mongering.

                                  Go supposedly isn’t a Google project but a community, yet we see it clearly being managed from one side of this coin.

                                  And yet, I’m able to have changes included in the compiler and standard library, provide feedback on proposals, and have stewards of the language directly engage with my suggestions. My perspective is that they do a great job of listening to the community. Of course they don’t always agree with everything I say, and sometimes I feel like that’s the unattainable bar that people hold them to in order to say that it’s a community project.

                                  1. 1

The specific issues with Go FFI interop are usually around dealing with structured data rather than buffers of bytes and integers. Data layout ABI options would be a big plus. Pinning shared data would also help tremendously in avoiding the extra copying or marshaling that is required in many of these cases. On the other side, calling into Go could be made faster in a number of ways, particularly in being able to cache thread-local contexts for threads Go doesn’t manage (these are currently set up and torn down for every call in this direction).

                                    There are also plenty of cases where construction of movable types could be supported with proper callbacks provided but instead Go opts to disallow sharing any of these data types entirely.
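For concreteness, here is roughly the kind of cgo call that works smoothly today (byte buffers and integers; the C function is just a made-up example), with the pointer-passing caveat noted in a comment:

    package main

    /*
    #include <stddef.h>
    #include <stdint.h>

    // Hypothetical C function: sums a buffer of bytes.
    static int64_t sum_bytes(const uint8_t *buf, size_t n) {
        int64_t total = 0;
        for (size_t i = 0; i < n; i++) total += buf[i];
        return total;
    }
    */
    import "C"

    import (
        "fmt"
        "unsafe"
    )

    func main() {
        data := []byte{1, 2, 3, 4}
        // A Go pointer may be passed to C for the duration of the call, but C
        // must not retain it afterwards; this is the cgo rule that pushes you
        // toward copying or marshaling longer-lived structured data.
        total := C.sum_bytes((*C.uint8_t)(unsafe.Pointer(&data[0])), C.size_t(len(data)))
        fmt.Println(int64(total))
    }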

                                  2. 1

They claim that it’d be too hard to support a real FFI option or that they are short on resources but other runtimes do a better job of this so it’s possible and secondarily,

It’s been a while since I actively used and followed Go, but isn’t the problem that they would have to forgo the ‘run a gazillion goroutines’ model if they wanted to support a real FFI? To support an extremely large number of goroutines, they need small but growable stacks, which means that they have to do stack switching when calling C code. Plus, it doesn’t use the same calling conventions.

In many respects they have designed themselves into a corner that is hard to get out of without upsetting users and/or breaking backwards compat. Of course, they may be happy with the corner that they are in.

                                    That said, Go is not alone here, e.g. native function calls in Java are also expensive. It seems that someone has made an FFI benchmark ;):

                                    https://github.com/dyu/ffi-overhead

                                2. 2

I generally sympathize with your main argument (personally, I also miss easier C interop, esp. given that it was advertised as one of the initial goals of the language) - but on the other hand, I don’t think you’re doing justice to the language in this regard.

                                  Specifically, AFAIK the situation with Go is not really much different from other languages with a garbage collector - e.g. Java, C#, OCaml, etc, etc. Every one of them has some kind of a (more or less tricky to use) FFI interface to C; in case of Go it’s just called cgo. Based on your claim, I would currently assume you don’t plan to invest much time in any other GCed language either, is that right?

                                  1. 2

I can’t speak to modern JVMs, but OCaml and C# (.NET Core and Mono) both have much better FFIs, both in support for passing data around and in terms of performance costs. It’s hard to overstate this, but cgo is terribly slow compared to other managed-language interop systems, and it is getting slower, not faster, over time.

                                    I’ll let folks draw their own conclusions on whether this is intentional or just a limitation of resources but the outcome is a very serious problem for long term investments in a language.

                                    1. 1

                                      I think it’s important to quantify what “terribly slow” is. It’s on the order of ~100ns. That is more than sufficient for a large variety of applications.

It also appears you’re implying that it’s intentionally being kept slow. Do you have any engineering evidence that this is happening? In other words, are you aware of any ways to make the FFI go faster?

                                      1. 1

Not in my experience. Other than a trivial call with no args that returns nothing, it is closer to 1 microsecond for Go calling out in many cases because of how argument handling has to be done, and around 10 microseconds for non-Go code calling Go.

                                        1. 1

                                          It is indeed slower to call from C into Go for various reasons. Go to C calls can also be slower depending on how many arguments contain pointers because it has safety checks to ensure that you’re handling garbage collected memory correctly (these checks can be disabled). I don’t think I’ve ever seen any benchmarks place it at the microsecond level, though, and I’d be interested if you could provide one. There’s a lot of evidence on the issue tracker (here or here for example) that show that there is interest in making cgo faster, and that good benchmarks would be happily accepted.
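If anyone wants to produce such a benchmark, a rough sketch of the shape it would take (package and file names are made up; note that cgo can’t be used directly in _test.go files, so the wrapper lives in a regular file in the package):

    // ffi.go
    package ffibench

    /*
    static int noop(void) { return 0; }
    */
    import "C"

    // callNoop is a thin wrapper so the benchmark file doesn't need cgo.
    func callNoop() int { return int(C.noop()) }

    // ffi_test.go
    package ffibench

    import "testing"

    func BenchmarkCgoNoop(b *testing.B) {
        for i := 0; i < b.N; i++ {
            callNoop()
        }
    }

Run with go test -bench=. and add arguments/pointers to the C function to see how the per-call overhead moves.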

                                    2. 2

                                      Every one of them has some kind of a (more or less tricky to use) FFI interface to C; in case of Go it’s just called cgo. Based on your claim, I would currently assume you don’t plan to invest much time in any other GCed language either, is that right?

                                      LuaJIT C function calls are apparently as fast as from C (and under some circumstances faster):

                                      https://nullprogram.com/blog/2018/05/27/

                                1. 6

                                  Mostly the same old same old, but I do have to agree the inability to reflect on non-public fields can lead to some pretty stupid serialization code.

                                  1. 3

                                    you can reflect on unexported fields, but it’s generally a bad idea. The json package doesn’t do it as a design choice for the json package, but it’s not a limitation of the language in general.

                                    Here’s an example of reflecting on unexported struct fields, which I do not recommend doing for anything other than debug logging: https://play.golang.org/p/swNXd266OVL
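In case the playground link rots, a minimal sketch along the same lines (reading, not setting, an unexported field):

    package main

    import (
        "fmt"
        "reflect"
    )

    type secretive struct {
        hidden string
    }

    func main() {
        f := reflect.ValueOf(secretive{hidden: "shh"}).Field(0)

        // Kind-specific getters work even on unexported fields.
        fmt.Println(f.String()) // "shh"

        // But the field is read-only through reflection:
        fmt.Println(f.CanSet()) // false
        // f.Interface() would panic with "cannot return value obtained
        // from unexported field or method".
    }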

                                    If the json marshaler serialized unexported fields, then any consumers of your library that serialized your data would see their serialized data change when you’ve made changes to unexported fields. That breaks the design philosophy of unexported vs exported identifiers: you pledge to those that import your code that exported identifiers will continue to exist and be supported, and you retain for yourself the right to alter any unexported identifiers as you see fit, without having to coordinate with people who import your code. The json marshaler serializing unexported fields would break the second principle.

                                    In fact I had a struct with 74 private fields that I wanted to serialize with json and was forced to make all fields public and update all uses throughout the large app.

                                    Look I’m gonna go out on a limb here and guess that a struct type with 74 fields is not a particularly great design to begin with, or that such code would be easy to work with in any language.

                                    1. 1

                                      The consumers of my library don’t serialize its data; they ask it to serialize its own data in a few well-specified ways. Is it actually an expected contract that I can take any arbitrary struct returned from any library, serialize/deserialize it myself, and expect it to work? That seems…a bit too much to expect.

                                      Conflating code visibility with serialization visibility seems like a non sequitur to me. If I have a library function func (*thing) GenerateJSON() string that’s supposed to generate the JSON for a thing, and that function uses a struct whose only purpose is to drive the JSON library to generate the right JSON, there’s no Go code that needs source-level visibility into that struct. So it just seems goofy that I have to mark those fields “public”.

                                      1. 1

                                        Is it actually an expected contract that I can take any arbitrary struct returned from any library, serialize/deserialize it myself, and expect it to work? That seems…a bit too much to expect.

                                        are you suggesting that json.Marshal, when called from package A, should be incapable of serializing a value of a type defined in package B?

                                        Conflating code visibility with serialization visibility seems like a non sequitur to me.

                                        If package A exports A.Thing, and package B imports package A and uses an A.Thing and wants to serialize that to json for whatever reason (let’s say package B defines a type that has a field of type A.Thing in an API response, for example), the author of package A should be able to rename the unexported fields of A.Thing without the author of package B having the output of their program changed. If you want to change the json name of a field you can do so with a struct tag, and if you want to mark an exported field as being hidden from the json, you can also do that with a struct tag. The behavior of json.Marshal is such that a struct defined with no json tags at all will serialize all of its exported fields and none of its unexported fields. The default has to be something, and that seems like a reasonable default. I think it would be more confusing if unexported fields were serialized by default.
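To make the tag options concrete, a tiny sketch:

    package example

    type Thing struct {
        Name     string `json:"name"` // exported, renamed in the JSON output
        Internal string `json:"-"`    // exported, but explicitly hidden from the JSON
        secret   string               // unexported: never serialized by encoding/json
    }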

                                        If I have a library function func (*thing) GenerateJSON() string that’s supposed to generate the JSON for a thing

                                        I’m not sure where you’re going with this. encoding/json defines the json.Marshaler interface for this purpose. If the default doesn’t suit the output that you want, you can readily change the output to be any arbitrary output you want by satisfying the json.Marshaler interface.

                                        please stop saying public. They’re exported or unexported identifiers. The exported/unexported concept in Go is different than the public/private concept in Java/C++/etc, and the naming reflects that.

                                        1. 1

                                          First, to get a terminology problem out of the way, s/public/exported/g in what I said.

                                          The problem is not with a “default”. It would be fine if the default was to ignore unexported fields. As far as I can tell, it’s impossible to serialize unexported fields using the json package. You just get “struct field xxx has json tag but is not exported”. [0]

                                          So when marshaling/unmarshaling is not simply a matter of copying exported fields, to use json internally you end up making another unexported struct with exported fields and copying data into that, or some such workaround.
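The workaround looks something like this (a sketch with made-up field names):

    package main

    import (
        "encoding/json"
        "fmt"
    )

    type thing struct {
        name  string
        count int
    }

    // Marshal via a throwaway shadow struct whose exported fields mirror
    // the unexported ones.
    func (t thing) MarshalJSON() ([]byte, error) {
        return json.Marshal(struct {
            Name  string `json:"name"`
            Count int    `json:"count"`
        }{t.name, t.count})
    }

    func main() {
        b, _ := json.Marshal(thing{name: "x", count: 2})
        fmt.Println(string(b)) // {"name":"x","count":2}
    }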

                                          [0] https://play.golang.org/p/FHvX6x8_61k

                                          1. 1

                                            and if you’re serializing something to json, you’re doing so … why? To make the data available externally. The very act of serializing the data to json is an act of exporting; to make it available beyond the confines of the current binary, perhaps to other programs written by yourself, things written by other people, or to even future versions of the same application. That’s the fundamental theory of what exported fields are: things that are a part of public APIs.

                                            You either have an exported or an unexported struct type. If it’s an unexported struct type, what difference does it make whether the fields are exported or not? If it’s an exported struct type, why would you want to serialize an unexported field? It’s not going to show up in docs for your package, your consumers will be totally confused as to where it comes from, and again, once it gets serialized, you’re no longer free to make unannounced changes to that field.

                                            This really seems like grasping at straws. I have literally never found this to actually impede my work in seven years of using Go.

                                            1. 2

                                              This is kind of getting into the weeds so it should probably move to some esoteric mailing list. :)

                                              My library writes JSON that only it reads. The JSON itself isn’t a public API; its contents are not documented. The fact my code wants to reveal fields to itself through JSON doesn’t mean it wants to reveal those fields to other Go code. I’m totally free to make changes to the JSON and the code, as long as they’re backward/forward compatible to my code.

                                              “Exported” does not have an identical meaning in code and in serialization. Consider that I can serialize a gzip.Writer into JSON, because it has some exported fields. I’m pretty sure that is purely accidental as far as the authors of the gzip package are concerned, and obviously if you unmarshal that JSON you don’t get a working gzip.Writer. I don’t think what they meant by exporting the Header field was “please serialize this”; they just meant it’s OK to use it directly in your code at runtime.

                                              Anyway, I didn’t say this was a deal-breaker, just that it’s goofy. An unexported struct type with exported fields — to me the only non-goofy reason for that would be when the struct is implementing an interface with those fields in it. Instead, I’m doing it because otherwise it would be arbitrarily restricted from being serialized in the most convenient way, that is, by using the json package.

                                    2. 1

                                      Just in case you want to upset anyone who looks at your code, you can use https://github.com/zeebo/sudo to remove the read-only restriction on a reflect value.

                                    1. 1

                                      I think that this looks quite nice. I am annoyed by chat options that are not truly multi-device. I do like that the client is open-source.

                                      Why can’t old chats be synced to new devices, though? You could sync them via other devices, couldn’t you?

                                      1. 1

                                        Non-ephemeral chats are synced to new devices as described in the FAQ:

                                        Non-ephemeral messages persist until the user explicitly deletes them and are E2E synced to new devices. This produces a Slack-like experience, only encrypted! So when you add someone to a team, or add a new device for yourself, the messages are unlocked.

                                        1. 1

But why only non-ephemeral ones? The article mentioned forward secrecy but it makes no real sense to me. You could use new forward-secure keys for the new communication…

                                          1. 2

                                            I thought that don’t-sync-to-new-devices was a feature. I thought the ephemeral option was for those messages that you want to send but not keep. Messages that don’t have value in the future (or do have liability in the future).

                                            1. 1

                                              ah, alright. that does make sense. For me, a new trusted device is a relatively arbitrary boundary to bounce ephemeral messages on, though.

                                      1. 5

                                        Cross comment from HN:

                                        Show us the server: https://github.com/keybase/client/issues/6374

                                        1. 7

                                          this is not really relevant to the security claims in the article IMO. It is important separately of course.

                                          1. 6

                                            I would love it if drive-by passive aggression like that comment from HN stayed on HN.

                                            1. 12

                                              That was painful to read. Such entitled attitudes.

                                              1. 2

                                                yep, and unfortunately some of them come from lobsters.

                                              2. 7

                                                This is important, IMHO. At this point keybase is yet another walled garden. Investing in them means that you are subjected 100% to their whims and future success (or lack thereof). It’s painful when your communication is shut down because a company decided to go do something else, or close up shop.

                                                1. 4

It isn’t. If the claims are true, that things are encrypted on the devices, which they seem to be (and the source is open), then it doesn’t matter what happens on the server from a security perspective.

                                                  1. 1

                                                    Nope, it definitely is. You may be able to recover your data, but you’ll then be searching quite urgently for a service to replace it since the proprietary server stuff is unavailable.

                                                2. 2

                                                  How about you spend 8 hours a day and make a great library that Keybase will really want to use in their backend, and make your library use GPLv3. Then they will have to open source.

                                                  That’s not true. If the backend is not distributed to anyone then it would not need to be open source.

                                                  1. 1

                                                    Is this also true of the AGPL?

                                                  2. 0

I don’t have a clear understanding of how the Keybase server works. Can someone provide details?

                                                    1. 2

                                                      https://keybase.io/docs/server_security has details about what the server is responsible for, and what clients trust and verify from them.

                                                  1. 4

                                                    I’ve always found myself overly interested in the busy beaver function: simple formulation, but it outpaces any computable function, no matter how many factorials and exponents you throw in!

                                                    Scott Aaronson has a great write-up: https://www.scottaaronson.com/writings/bignumbers.html

                                                    1. 2

                                                      I totally agree. Having a M.S. in math, the things that would interest me the most about CS would be things like the halting problem, busy beaver function, etc. For example, another Scott Aaronson link (https://www.scottaaronson.com/busybeaver.pdf) gives explicit descriptions of small Turing machines that cannot be proven to halt using ZFC (the axioms underlying most of modern mathematics), or proving they halt is equivalent in difficulty to proving Goldbach’s conjecture or the Riemann hypothesis.

                                                    1. 9

                                                      Bit too late for me. Doing wonderfully with Keybase git for private repos.

                                                      1. 5

                                                        Indeed. I don’t understand the appeal for private repos that aren’t also private to the host. I won’t be using them even though they’re free.

                                                        1. 1

                                                          Are such repos as easy for people to start using as the gitlab way?

                                                          1. 1

It’s really easy to use, but does not compare to gitlab in terms of features. But since I just need a secure, distributed place to access my repos, it’s a good match.

                                                        1. 7

                                                          Using polymorphic variants you can strengthen the return type. For example:

                                                          match foo 2 with 
                                                            | `Y y -> y 
                                                            | `Z -> 0
                                                          

                                                          will continue to compile with both

                                                          let foo y = if y == 0 then `Z else `Y (y + 1)
                                                          

                                                          and the strengthened

                                                          let foo y = `Y (y + 1)
                                                          
                                                          1. 6

                                                            You know someone is over-hyping Rust (or is just misinformed) when you see statements like

                                                            Which means there’s no risk of concurrency errors no matter what data sharing mechanism you chose to use

                                                            The borrow checker prevents data races which are involved in only a subset of concurrency errors. Race conditions are still very possible, and not going away any time soon. This blog post does a good job explaining the difference.
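A classic illustration, sketched here in Go only because it’s short (the same shape is expressible in safe Rust behind a Mutex): every access is synchronized, so there is no data race, yet the check-then-act sequence still races.

    package main

    import (
        "fmt"
        "sync"
    )

    type account struct {
        mu      sync.Mutex
        balance int
    }

    // No data race: every access to balance holds the mutex. Still a race
    // condition: two withdrawals can both pass the check before either
    // subtracts, overdrawing the account.
    func (a *account) withdraw(amount int) bool {
        a.mu.Lock()
        ok := a.balance >= amount
        a.mu.Unlock()
        if !ok {
            return false
        }
        a.mu.Lock()
        a.balance -= amount
        a.mu.Unlock()
        return true
    }

    func main() {
        a := &account{balance: 100}
        var wg sync.WaitGroup
        for i := 0; i < 2; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                a.withdraw(100)
            }()
        }
        wg.Wait()
        fmt.Println(a.balance) // sometimes -100
    }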

Additionally, I have my worries about async/await in a language that is also intended to be used in places that need control over low level details. One library that decides to use raw I/O syscalls on some unlikely task (like error logging) and, whoops, there goes your event loop. Bounded thread pools don’t solve this (what happens if you hit the max? It’s equivalent to a limited semaphore), virtual dispatch becomes more of a hazard (are you sure every implementation knows about the event loop? How can you be sure as a library author?), what if you have competing runtime environments (see twisted/gevent/asyncio/etc. in the Python community; this may arguably be more of a problem in Rust given its focus on programmer control), and the list goes on. In Go, you literally never have to worry about this, and it’s the greatest feature of the language.

                                                            1. 1

                                                              You know someone is over-hyping Rust (or is just misinformed) when you see statements like

                                                              It doesn’t help that they state (or did state until recently) on their website that Rust was basically immune to any kind of concurrency error.

                                                              1. -1

That definition of “race conditions minus data races” essentially refers to an operational logic error on the programmer’s side. As in, there’s no way to catch race conditions that aren’t data races via a compiler, unless you have a magical business-logic-aware compiler, at which point, you wouldn’t need a programmer.

As far as the issues with async I/O go… well, yes. Asyncio wouldn’t solve everything. But asyncio also wouldn’t necessarily have to be single threaded. It could just mean that a multi-threaded networking application will now spend fewer resources on context-switching between threads. But the parallelism of threads > cpu_count still comes in useful for various blocking operations which may appear here and there.

As far as Go’s solution goes, their solution to the performance issue isn’t that good, since goroutines have significant overhead. Much less than a native thread, but still considerably more overhead than something like MIO.

The issue you mentioned as an example, a hidden sync I/O syscall by some library, can just as well happen in a function run on a goroutine; the end result of that will essentially be a native OS thread being blocked, much like in Rust. At least, as far as my understanding of goroutines goes, that seems to be the case.

                                                                Granted, working with a “pool” of event loops representing multiple threads might be harder than just using goroutines, but I don’t see it as being that difficult.

                                                                1. 5

                                                                  That definition is the accurate, correct definition. It’s important to state that Rust helps with data races, and not race conditions in general. Even the rustonomicon makes this distinction clear.

                                                                  The discussion around multiple threads seems like a non-sequitur to me. I’m fully aware that async/await works fine with multiple threads. I also don’t understand why the performance considerations of goroutines were brought into the picture. I’m not making any claims about performance, just ease of use and programmer model. (Though, I do think it’s important to respond that goroutines are very much low enough overhead for many common tasks. It also makes no sense to talk about performance and overhead outside of the context of a problem. Maybe a few nanoseconds per operation is important, and maybe it isn’t.)

The issue I mentioned does not happen in Go: all of the syscalls/locks/potentially blocking operations go through the runtime, and so it’s able to deschedule the goroutine and let others run. This article is another great resource on the topic.

It’s great that you’re optimistic about the future direction Rust is taking with its async story. I’m optimistic too, but that’s because I have great faith in the leadership and technical design skills of the Rust community to solve these problems. I’m just pointing out that they ARE problems that need to be solved, and the solution is not going to be better than Go’s solution in every dimension.

                                                                  1. 0

                                                                    The issue I mentioned does not happen in Go: all of the syscalls/locks/potentially blocking operations go through the runtime, and so it’s able to deschedule the goroutine and let others run.

                                                                    Ok, maybe I’m mistaken here but:

“Descheduling a goroutine”: when a function call is blocking, descheduling a goroutine has the exact same cost as descheduling a thread, which is huge.

Secondly, Go is only using a non-blocking syscall under the hood for networking I/O calls at the moment. So if I want to wait for an operation on any random file, or wait for an asynchronous prefetch call, I will be unable to do so; I have to actually block the underlying thread that the goroutine is using.

I haven’t seen any mention of “all blocking syscall operations” being treated in an async manner; they go through the runtime, yes, but the runtime may just decide that it can do nothing about it other than let the thread be descheduled as usual. And, as far as I know, the runtime is only “smart” about networking I/O syscalls atm, the rest are treated like a blocking operation.

                                                                    Please correct me if this is wrong.

                                                                    1. 2

                                                                      descheduling a goroutine has the exact same cost as descheduling a thread, which is huge.

                                                                      A goroutine being descheduled means it yields the processor and calls into the runtime scheduler, nothing more. What happens to the underlying OS threads is another matter entirely. This can happen at various points where things could block (e.g. chan send / recv, entering mutexes, network I/O, even regular function calls), but not at every such site.

                                                                      the runtime is only “smart” about networking I/O syscalls atm

                                                                      Yes, sockets and pipes are handled by the poller, but what else could it be smarter about? The situation may well be different on other operating systems, but at least on Linux, files on disk are always ready as far as epoll is concerned, so there is no need to go through the scheduler and poller for those. In that case, I/O blocks both the goroutine and the thread, which is fine for Go. For reference, in this situation, node.js uses a thread pool that it runs file I/O operations on, to avoid blocking the event loop. Go doesn’t really need to do this under the covers, though, because it doesn’t have the concept of a central event loop that must never be blocked waiting for I/O.

                                                                      1. 2

                                                                        Descheduling a goroutine is much cheaper than descheduling a thread. Goroutines are cooperative with the runtime, so they ensure that there is minimal state to save when descheduling (no registers, for example). It’s on the order of nanoseconds vs microseconds. Preemptive scheduling helps in a number of ways, but typically causes context switching to be more expensive: you have to be able to stop/start at any moment.

                                                                        Go has an async I/O loop, yes, but it runs in a separate managed thread by the runtime. When a goroutine would wait for async I/O, it parks itself with the runtime, and the thread the goroutine was running on can be used for other goroutines.

                                                                        While the other syscalls do in fact take up a thread, critically, the runtime is aware when a goroutine is going to enter a syscall, and so it can know that the thread will be blocked, and allow other goroutines to run. Without that information, you would block up a thread and waste that extra capacity.

The runtime manages a threadpool and ensures that GOMAXPROCS threads are always running your code, no matter what syscalls or I/O operations you’re doing. This is only possible if the runtime is aware of every syscall or I/O operation, which is not possible if your language/standard library are not designed to provide that awareness. Which Rust’s doesn’t, for good reasons. It has tradeoffs with respect to FFI speed, control, zero overhead, etc. They are different languages with different goals, and one isn’t objectively better than the other.

                                                                        1. 2

And, as far as I know, the runtime is only “smart” about networking I/O syscalls atm, the rest are treated like a blocking operation.

                                                                          Pretty much everything that could block goes through sockets and pipes though. The only real exception is file I/O, and file I/O being unable to be epolled in a reasonable way is a kernel problem not a Go problem.

                                                                  1. 8

                                                                    As someone who is a total stranger to Elm, its dev and its community, but was interested for a long time in learning this language, I wonder if this opinion reflects the feeling of the “great number” or not.

                                                                    1. 21

                                                                      I have to say that I personally can very much see where he’s coming from. GitHub contributions are dealt with in a very frustrating way (IMO they’d do better not allowing issues and PRs at all). There’s a bit of a religious vibe to the community; the inner circle knows what’s good for you.

That said, they may very well be successful with their approach by a number of metrics. Does it hurt to lose a few technically minded independent thinkers if the language becomes more accessible to beginners?

                                                                      Where I see the largest dissonance is in how Elm is marketed: If the language is sold as competitive to established frameworks, you’re asking people to invest in this technology. Then turning around and saying your native modules are gone and you shouldn’t complain because no one said the language was ready feels a bit wrong.

                                                                      1. 7

                                                                        Yeah when I look at the home page, it does seem like it is over-marketed: http://elm-lang.org/

                                                                        At the very least, the FAQ should probably contain a disclaimer about breaking changes: http://faq.elm-community.org/

                                                                        Ctrl-F “compatibility” doesn’t find anything.

It’s perhaps true that pre-1.0 software is free to break, but it seems like there is a huge misunderstanding in the community about compatibility. The version number doesn’t really mean much in my book – it’s more a matter of how many people actually rely on the software for production use, and how difficult their upgrade path is. (Python 3 flouted this, but it got by.)

                                                                        I think a lot of the conflict could be solved by making fewer promises and providing some straightforward, factual documentation with disclaimers.

I watched the “What is Success?” talk a couple nights ago and it seemed like there is a lot of unnecessary conflict and pain in this project. It sounds like there is a lot to learn from Elm though – I have done some stuff with MUV and I like it a lot. (Although the types and purity probably help, you can do this in any language.)

                                                                        1. 4

                                                                          I watched the “What is Success?” talk a couple nights ago and it seemed like there is a lot of unnecessary conflict and pain in this project

I watched the talk also, after another… Lobster(?)… posted it in another thread. My biggest takeaway was that Evan really doesn’t want to deal with an online community. People at IRL meetups, yes. Students in college, yes. People/companies online trying to use the language? No. His leading example of online criticism he doesn’t want to deal with was literally “Elm is wrong” (he quoted it without any context, which isn’t that helpful, but maybe that was all of it).

That’s fine. He’s the inventor of the language, and the lead engineer. He probably does have better things to do. But as an outsider it seems to me that someone has to engage more productively with the wider community. Or just come out and say you don’t care what they think, you’ll get what you’re given, and you can use it if you choose. But either way, communicate more clearly what’s going on, and what to expect.

                                                                      2. 14

                                                                        I’ve shipped multiple production applications in Elm and attempted to engage with the community and I can say that their characterization perfectly matches mine.

                                                                        Native modules being removed in particular has caused me to no longer use Elm in the future. I was always ok with dealing with any breakage a native module might cause every release, and I’m even ok with not allowing them to be published for external consumption, but to disallow them completely is unreasonable. I’m sure a number of people feel the same way as I do, but it feels impossible to provide meaningful feedback.

                                                                        1. 10

                                                                          I work for a company that began using Elm for all new projects about a year and a half ago. That stopped recently. There are several reasons that people stopped using Elm. Some simply don’t like the language. And others, like the author of this post, want to like the language but are put off by the culture. That includes me. This article closely resembles several conversations I’ve had at work in the past year.

                                                                        1. 0

                                                                          This is ill-advised.

                                                                          You cannot define 1/0 and still have a field. There’s no value that works. Even when you do things like the extended real numbers where x/0 = infinity, you’re really just doing a kind of shorthand and you acknowledge that the result isn’t a field.

                                                                          You can of course define any other algebraic structure you want and then say that operating on the expression 1/0 is all invalid because you didn’t define anything else and no other theorem applies, but this is not very helpful. You can make bad definitions that don’t generalise, sure, definitions that aren’t fields. But to paraphrase a famous mathematician, the difficulty lies not in the proofs but in knowing what to prove. The statement “1/0 = 0 and nothing else can be deduced from this” isn’t very interesting.

                                                                          1. 1

                                                                            Could you explain why, formally, defining 1/0=0 means you no longer have a field?

                                                                            1. 7

                                                                              I want to make an attempt to clarify the discussion here because I think there is some substance I found interesting. I don’t have a strong opinion about this.

                                                                              The article actually defines an algebraic structure with three operators, (S, +, *, /), with some axioms. It happens that these axioms make (S, +, *) a field (just like how the definition of a field makes (S, +) a group).

                                                                              The article is right in saying that these axioms do not lead to a contradiction. And there are many non-trivial such structures.

                                                                              However, the (potential) issue is that we don’t know nearly as much about these structures as we do about fields, because any theorem about fields only applies to (S, +, *) rather than (S, +, *, /). So all of that work would need to be redone. It could be said that the purpose of choosing a field in the first place is to benefit from existing knowledge and familiar expectations (which are no longer guaranteed).

                                                                              I guess formally adding an operator means you should call the result something else? (Just like how we don’t call a field a group, even though it can be seen as a group with an added * operator.)

                                                                              This has no bearing on the 1/0 = 0 question however, which still works from what’s discussed in the article.
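
                                                                              For concreteness, here is one way to write down the augmented division (the notation is mine, not taken from the article):

                                                                              ```latex
                                                                              % One way to write the augmented division; needs amsmath for "cases".
                                                                              \[
                                                                                a / b =
                                                                                \begin{cases}
                                                                                  a \cdot b^{-1} & \text{if } b \neq 0,\\
                                                                                  0              & \text{if } b = 0.
                                                                                \end{cases}
                                                                              \]
                                                                              % None of the field axioms mention "/" (or 0^{-1}), so (S, +, *) is still
                                                                              % a field and every field theorem about (S, +, *) applies unchanged.
                                                                              ```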

                                                                              1. 1

                                                                                As I understand it, you’ve only defined the expression 1/0, but you are saying that /0 isn’t shorthand for multiplying by the multiplicative inverse of 0, the way /x is, by definition, shorthand for multiplying by x^-1. Instead, /0 is some other kind of magical non-invertible operation that maps 1 to 0 (and who knows what /0 maps everything else to). It’s kind of curious what it has to do with 0 at all.

                                                                                So I guess you can do this, but then you haven’t defined division by zero at all, you’ve just added some notation that looks like division by zero but instead just defined some arbitrary function for some elements of your field.

                                                                                If you do mean that /0 is division by zero, then 1/0 has to be, by definition, shorthand for 1*0^-1 and the arguments that you’ve already dismissed apply.

                                                                                1. 4

                                                                                  The definition of a field makes no statements about the multiplicative inverse of the additive identity (https://en.wikipedia.org/wiki/Field_(math)#Classic_definition). Defining it in a sound way does not invalidate any of the axioms required by the field, and, in fact, does define division by zero (tautologically). You end up with a field and some other stuff, which is still a field, in the same way that adding a multiplication operator on a group with the appropriate properties leaves you with a group and some other stuff.

                                                                                  The definition of the notation “a / b => a * b^-1” assumes that b is not zero. Thus, you may define the case when b is 0 to mean whatever you want.

                                                                                  That people want to hold on to some algebraic “identities”, like “multiplying by the denominator cancels it”, doesn’t change this. For that to work, you need the assumption that the denominator is not zero to begin with.

                                                                                  1. 1

                                                                                    In what way is whatever you defined /0 to be considered a “division”? What is division? Kindly define it.

                                                                                    1. 3

                                                                                      Division, a / b, is equal to a * b^-1 when b is not zero.

                                                                                      1. 2

                                                                                        And when b is zero, what is division? That’s the whole point of this argument. What properties does an operation need to have in order to be worthy of being called a division?

                                                                                        1. 3

                                                                                          Indeed, it is the whole point. The definition of a field doesn’t have to say anything about what happens when you divide by zero; it is undefined. That doesn’t mean that you can’t define and work with a different, but still consistent, structure where it is defined. In fact, you can add the definition in such a way that you still have the same field, and more.

                                                                                          edit: Note that this doesn’t mean that you’re defining a multiplicative inverse of zero. That can’t exist and still be a field.

                                                                                          1. 1

                                                                                            In what way is it consistent? Consistent with what? As I understand it, you’re still saying that the expression 1/0 is an exception to every other theorem. What use is that? You still have to write a bunch of preconditions, even in Coq, saying that the denominator isn’t zero. What’s the point of such a definition?

                                                                                            It seems to me that all of this nonsense is about not wanting to get an exception when you encounter division by zero, but you’re just delaying the problem by having to get an exception whenever you try to reason with the expression 1/0.

                                                                                            1. 3

                                                                                              I mean that the resulting structure is consistent with the field axioms. The conditions on dividing by zero never go away, correct. And yes, this is all about avoiding exceptions in the stack unwinding, programming language sense. The article is a response to the statements that defining division by zero in this way causes the structure to not be a field, or that it makes no mathematical sense. I am also just trying to respond to your statements that you can’t define it and maintain a field.

                                                                                              1. 1

                                                                                                It really doesn’t make mathematical sense. You’re just giving the /0 expression some arbitrary value so that your computer doesn’t raise an exception, but what you’re defining there isn’t division except notationally. It doesn’t behave like a division at all. Make your computer do whatever you want, but it’s not division.

                                                                                                1. 5

                                                                                                  Mathematical sense depends on the set of axioms you choose. If a set of axioms is consistent, then it makes mathematical sense. You can disagree with the choices as much as you would like, but that has no bearing on the meaning. Do you have a proof that the resulting system is inconsistent, or even weaker, not a field?

                                                                                                  1. 1

                                                                                                    I don’t even know what the resulting system is. Is it, shall we say, the field axioms? In short, a set with two abelian operations, each with its own identity (the two identities being distinct), such that one operation distributes over the other? And on top of that you define an additional operation that maps each element to its inverse under the distributing operation, except for the identity of the distributed-over operation, which is instead mapped to itself?

                                                                                                    1. 2

                                                                                                      It’s a field where the definition of division is augmented to include a definition when the divisor is zero. It adds no new elements, and all of the same theorems apply.

                                                                                                      1. 1

                                                                                                        I’m bailing out, this isn’t a productive conversation for either of us. Sorry.

                                                                                                        1. 1

                                                                                                          You are correct. The field axioms are all still true, even if we extend / to be defined on 0.

                                                                                                          The reason for this is that the axioms never “look at” any of the values x/0. They never speak of them. So they all hold regardless of what x/0 is.
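
                                                                                                          To spell out the step doing the work here: the only field axiom that mentions inverses at all is the usual

                                                                                                          ```latex
                                                                                                          \[
                                                                                                            \forall x \in S,\; x \neq 0 \;\Rightarrow\; \exists\, y \in S :\; x \cdot y = 1,
                                                                                                          \]
                                                                                                          ```

                                                                                                          and it quantifies only over nonzero x, so no choice of value for x/0 can falsify it; the remaining axioms never mention division or inverses at all.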

                                                                                                          That said, even though you can define x/0 without violating axioms it doesn’t mean you should. In fact it seems like a very bad idea to me.

                                                                                    2. 1

                                                                                      That doesn’t make it not a field; you don’t have to have a division operator at all to be a field, let alone a division operator that is defined to be multiplication by the multiplicative inverse.

                                                                                      1. 1

                                                                                        What is division?

                                                                                        1. 1

                                                                                          zeebo gave the same answer I would give: a / b is a multiplied by the multiplicative inverse of b when b is not zero. This article is all about how a / 0 is not defined and so, from an engineering perspective, you can define it to be whatever you want without losing the property that your number representation forms a field. You claimed that defining a / 0 = 0 means that your numbers aren’t a field, and all I’m saying is that the definition of the division operator is completely orthogonal to whether or not your numbers form a field, because the definition of a field has nothing to say about division.

                                                                                          1. 1

                                                                                            What is an engineering perspective?

                                                                                            Also, this whole “a field definition doesn’t talk about division” business is a bit of a misunderstanding of mathematical idioms. The field definition does talk about division, since “division” is just shorthand for “multiplying by the multiplicative inverse”. The reason the definition is written the way it is (excluding 0 from having a multiplicative inverse) is that giving zero a multiplicative inverse results in contradictions. When you say “ha! I won’t let that stop me! I’m going to define it anyway!”, well, okay, but then either (1) you’re not defining a multiplicative inverse, i.e. you’re not defining division, or (2) you are defining a multiplicative inverse and you’re creating a contradiction.

                                                                                            1. 1

                                                                                              (I had a whole comment here, but zeebo is expressing themselves better than I am, and there’s no point in litigating this twice, especially when I feel like I’m just quoting TFA)

                                                                                              1. 1

                                                                                                Me too, I’m tapping out.

                                                                                1. 18

                                                                                  I suppose I know why, but I hate that D is always left out of discussions like this.

                                                                                  1. 9

                                                                                    and Ada, heck D has it easy compared to Ada :)

                                                                                    1. 5

                                                                                      Don’t forget Nim!

                                                                                    2. 3

                                                                                      Yeah, me too. I really love D. Its metaprogramming alone is worth it.

                                                                                      For example, here is a compile-time parser generator:

                                                                                      https://github.com/PhilippeSigaud/Pegged

                                                                                      1. 4

                                                                                        This is a good point. I had to edit out a part about how a language without major adoption is less suitable, since it may not get the resources it needs to stay current on all platforms. You could have the perfect language, but if it somehow failed to gain momentum, it would still be somewhat of a risk.

                                                                                        1. 4

                                                                                          That’s true. If I were running a software team and were picking a language, I’d pick one that appeared to have some staying power. With all that said, though, I very much believe D has that.

                                                                                        2. 3

                                                                                          And OCaml!

                                                                                          1. 10

                                                                                            In my opinion, until OCaml gets rid of its GIL, which they are working on, I don’t think it belongs in this category. A major selling point of Go, D, and Rust is their ability to easily do concurrency.

                                                                                            1. 6

                                                                                              Both https://github.com/janestreet/async and https://github.com/ocsigen/lwt allow concurrent programming in OCaml. Parallelism is what you’re talking about, and I think there are plenty of domains where single process parallelism is not very important.

                                                                                              1. 2

                                                                                                You are right. There is Multicore OCaml, though: https://github.com/ocamllabs/ocaml-multicore

                                                                                            2. 1

                                                                                              I’ve always just written off D because of the problems with what parts of the compiler are and are not FOSS. Maybe it’s more straightforward now, but it’s not something I’m incredibly interested in investigating, and I suspect I’m not the only one.

                                                                                              1. 14
                                                                                            1. 1

                                                                                              The section on compatibility seems unfair. In the maximal (Cargo) approach, it is posited that since CI runs on everything, you have strong evidence that they work together. But in the minimal (modules) approach, no such consideration is given. The same argument applies there, but better: in the maximal approach, you have to rerun CI for every library that transitively depends on you when you release a new version, invalidating all previous runs. But, in the minimal approach, every CI run stays valid until someone explicitly updates to the new version AND pushes a new tagged version of their own library. Even then, only that new tagged version requires a CI run. No other libraries are affected. I think it’s reasonable to assume people won’t publish new tagged versions of their libraries with broken dependencies, and so you’re much more likely to get a compatible set.

                                                                                              In other words, assuming authors test their releases, the only way to get an untested configuration in the minimal world is if you combine multiple libraries together that share a transitive dependency. Even then, you know that at least one subset of libraries is a tested combination. The maximal world contains this failure mode and more: every time a package is published, every transitive dependency may get a broken combination, and you aren’t guaranteed that any of your libraries have been tested in combination.
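
                                                                                              For readers who haven’t seen how the minimal approach resolves versions, here is a toy sketch of the idea (this is not the real implementation, and every module name and version below is made up): each module states the minimum versions it was built against, and the build uses, for each module, the highest of those stated minimums, so nothing newer is pulled in until someone explicitly asks for it.

                                                                                              ```go
                                                                                              // Toy sketch of minimal version selection over a hand-written requirement
                                                                                              // graph. Module names and versions below are invented for illustration.
                                                                                              package main

                                                                                              import (
                                                                                              	"fmt"
                                                                                              	"strings"

                                                                                              	"golang.org/x/mod/semver"
                                                                                              )

                                                                                              // requires maps "module@version" to the minimum versions it declares.
                                                                                              var requires = map[string][]string{
                                                                                              	"app@v1.0.0":  {"liba@v1.2.0", "libb@v1.1.0"},
                                                                                              	"liba@v1.2.0": {"libc@v1.0.0"},
                                                                                              	"libb@v1.1.0": {"libc@v1.3.0"},
                                                                                              }

                                                                                              // buildList gathers every module@version reachable from root and keeps,
                                                                                              // for each module, the highest version that anyone asked for.
                                                                                              func buildList(root string) map[string]string {
                                                                                              	seen := map[string]bool{}
                                                                                              	chosen := map[string]string{}
                                                                                              	var visit func(node string)
                                                                                              	visit = func(node string) {
                                                                                              		if seen[node] {
                                                                                              			return
                                                                                              		}
                                                                                              		seen[node] = true
                                                                                              		for _, dep := range requires[node] {
                                                                                              			parts := strings.SplitN(dep, "@", 2)
                                                                                              			mod, ver := parts[0], parts[1]
                                                                                              			if cur, ok := chosen[mod]; !ok || semver.Compare(ver, cur) > 0 {
                                                                                              				chosen[mod] = ver
                                                                                              			}
                                                                                              			visit(dep)
                                                                                              		}
                                                                                              	}
                                                                                              	visit(root)
                                                                                              	return chosen
                                                                                              }

                                                                                              func main() {
                                                                                              	// Prints liba v1.2.0, libb v1.1.0, libc v1.3.0: the lowest versions
                                                                                              	// that satisfy every stated requirement, and nothing newer.
                                                                                              	for mod, ver := range buildList("app@v1.0.0") {
                                                                                              		fmt.Println(mod, ver)
                                                                                              	}
                                                                                              }
                                                                                              ```

                                                                                              The part relevant to CI is that this result only changes when some module in the graph explicitly raises a stated minimum, which is what keeps previously tested configurations valid.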

                                                                                              1. 2

                                                                                                Sharing transitive dependencies is pretty common, at least in the Rust world. As you pointed out, this cancels most of the “you get the configuration tested by the author” advantage.

                                                                                                There is a tradeoff between getting the configuration tested by the author and getting it tested by the ecosystem. Another tradeoff is between silently getting new bugs and silently getting new bugfixes.

                                                                                                1. 1

                                                                                                  I don’t believe it cancels out most of the advantage. The versions in the transitive dependencies also have to be different, which I believe will be rare due to how upgrades work. As for testing by the author vs testing by the ecosystem, all it takes is one library in the ecosystem depending on the same two libraries as you, and you get just as much ecosystem testing.

                                                                                                  I agree with the bug tradeoff. Personally, I prefer stability, but I can understand that others may have a different preference. I think in the minimal world, people who want updates can explicitly ask for that, and in the maximal world, it seems people are starting to add flags to allow the other direction (--minimal-versions).

                                                                                                  1. 1

                                                                                                    Sharing transitive dependencies with different versions is also common in Rust. Instead of making assertions, I probably should whip up a script to count and publish statistics, but exa/Cargo.lock should illustrate my point. exa is a rather popular Rust command-line application.

                                                                                                    How to read Cargo.lock: the top section is a serialization of the dependency graph, so it’s hard to read. The bottom section is checksums, sorted by package name and version. exa transitively depends on both num_traits 0.1 and 0.2, and on winapi 0.2 and 0.3. This is typical.

                                                                                                    1. 1

                                                                                                      num-traits 0.1 actually depends on 0.2. None of the transitive dependencies there actually require 0.1 in a way that excludes 0.2 (only datetime requires ^0.1.35) as far as I can tell, so I see no reason it needs to be included in the build. Perhaps it’s included in the checksum for some other reason?

                                                                                                      edit: I have since learned that ^ on v0 dependencies only allows patch level updates. So ^0.1.35 means >=0.1.35, <0.2.

                                                                                                      Winapi 0.2 and 0.3 do appear to both be required in the build. This is due to the term_size crate using a ~0.2 constraint. While I do not have a windows machine to test right now, this commit bumped the version to 0.3. It was only a reorganization of imports, and I believe that all of the pub use entries in 0.3 would cover all of the old imports. I will test this out on a windows machine later tonight.

                                                                                                      None of the dependencies on v1 or greater require multiple versions. People tend to attempt to respect semver, so this is expected. Also note that out of 55 transitive dependencies, only those two libraries had multiple versions, and only one would possibly require any changes, and it was a v0 dependency. I believe this is also typical, and I have surveyed a large corpus of Go packages that use dep and had the same findings. Even if the tools allow stricter constraints, typically they weren’t needed.

                                                                                                      edit: Here’s a link to my analysis of these types of issues in the Go community: https://github.com/zeebo/dep-analysis

                                                                                                      edit edit: To be clear, currently in the Go community there is no easy or supported way to include multiple versions of the same package into your library or binary. The Rust community can, and so perhaps some norms around what types of code transformations are possible differ, causing it to happen more often. I think the fact that Go has been getting along fine without multiple versions is evidence that Go doesn’t need that feature as much, but does not imply that for Rust. I don’t mean to argue that minimal selection would be a fit for Rust, but I don’t think it has the problems that the post describes in the context of Go.

                                                                                                      1. 1

                                                                                                        Oops, you are right. num-traits is using the so-called semver trick, which explains it better than I ever can. For crates using the semver trick, it is indeed normal for num-traits 0.1 to depend on num-traits 0.2. A good way to think about it is that, after the 0.2 release, the 0.1.x releases delete the 0.1 implementation and instead provide a 0.1-interface-compatible shim around the 0.2 implementation.

                                                                                                        lalrpop/Cargo.lock is probably a better example. LALRPOP is a parser generator and transitively depends on regex-syntax 0.4, 0.5, and 0.6, without the semver trick. I admit this is not typical, but it is also not rare. Support for multiple versions has been in Cargo since forever.

                                                                                                        Your dep analysis is fascinating. Thanks a lot for letting us know.

                                                                                                        1. 1

                                                                                                          Thanks for another example. In this case, the Cargo.lock is shared for a number of workspaces, so many of the duplications are not actually present in the same artifact. Additionally, many of the different versions are present only in build-dependencies for some artifact. I analyzed the dependencies and reduced the set of real duplications down to these:

                                                                                                          | workspace      | dependency   | duplicated    |
                                                                                                          |----------------|--------------|---------------|
                                                                                                          | lalrpop        | regex        | v0.8 and v1   |
                                                                                                          | lalrpop        | regex-syntax | v0.5 and v0.6 |
                                                                                                          | lalrpop        | unreachable  | v0.1 and v1   |
                                                                                                          | lalrpop-snap   | regex-syntax | v0.4 and v0.6 |
                                                                                                          | lalrpop-snap   | unreachable  | v0.1 and v1   |
                                                                                                          

                                                                                                          The first two duplications were fixed by upgrading docopt from 0.8 to 1.0. No code changes were required, the tests passed, and this would happen automatically with MVS. The third was fixed by upgrading string_cache from 0.7.1 to 0.7.3. Again, no code changes were required, the tests passed, and this would happen automatically. This also fixed the fifth duplication. The fourth duplication is the only one that caused any problems, as there were significant changes to regex-syntax between 0.4 and 0.5, and 0.4 is depended on directly.

                                                                                                          So in this case, there was only one dependency issue that would not have been solved by just picking the higher version out of about 70 dependencies, and the one failure was in a v0 dependency. So, while I agree they exist, I just don’t think they will be frequent, nor a significant source of pain.

                                                                                                          In fact, the only times duplications happened were when “breaking” changes happened. The default version selector in Cargo exacerbates this by considering any minor version change in a v0 crate to be “breaking”. In only one example was it actually breaking, and in every other example, just using the largest version worked. In the MVS world, breaking changes require updating the major version, which will allow both copies to exist in the same artifact. So while sharing transitive dependencies is frequent, sharing transitive dependencies that do not respect semver is infrequent, and sharing transitive dependencies with incompatibilities within compatible semver versions is also infrequent, causing this to not be a problem in practice.

                                                                                                          1. 1

                                                                                                            and sharing transitive dependencies with incompatibilities within compatible semver versions is also infrequent, causing this to not be a problem in practice.

                                                                                                            I don’t think you can reach this conclusion. If someone were to do this analysis, time is a critical dimension that must be accounted for. I also think you aren’t doing a correct treatment of semver. Namely, if I were in the Go world, regex-syntax would be at v6 rather than v0.6, to communicate its breaking changes. Each one of those minor version bumps had breaking changes. It simply may be the case that some breaking changes are bigger than others, and therefore, some dependents may not be affected.

                                                                                                            With respect to time, there is often a period after a core crate puts out a new semver-incompatible release where large parts of the ecosystem depend on both the new version (because some folks are eager to upgrade) and the older version. For example, there was a period a ~year ago where some projects were building both regex 0.1 and regex 0.2, even though there were significant breaking changes in the 0.2 release. You wouldn’t observe this now because people have moved on and upgraded. So collecting evidence to support your viewpoint is quite a bit more subtle than just analyzing a particular snapshot in time.

                                                                                                            (To comment on the larger issue, my personal inclination is that I’d probably like a world with minimal version selection better just because it suits my sensibilities, but that I’m also quite happy with Cargo’s approach, and really haven’t experienced much if any pain with Cargo that could be attributed to maximal version selection.)

                                                                                                            1. 1

                                                                                                              Thanks for explaining the v6 vs v0.6 distinction better than I was able to. I was trying to get at that with the “breaking” paragraph. Cargo implicitly treats every minor version change in the v0 major range as “breaking” by limiting the default constraint to a single minor version, in the same way it treats major version changes in the v1-and-above range as “breaking”. I think this is a great idea, but it muddies the waters a bit when comparing ecosystems with respect to multiple versions of transitive dependencies. Like you said, in a Go world, it would be regex-syntax at v4 and v6, which would both be allowed in the binary at the same time.

                                                                                                              On your point about time: in a Go world, those would be regex v1 and regex v2, again not causing any issues. I am claiming that it is rare that multiple versions of some package need to exist in the same artifact when they are within the same compatible semver range. For example: both v1.2 and v1.3 required in the binary at the same time. I agree an analysis through time is valuable, but rarity also depends on time, so sampling any snapshot will help estimate how often it happens.
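
                                                                                                              (For anyone who hasn’t seen the mechanism: in Go modules, major versions 2 and up get a distinct module path, so two major versions of the same module can be required and built into one binary at the same time. A hypothetical go.mod, with made-up module paths, looks something like this:)

                                                                                                              ```
                                                                                                              // go.mod (hypothetical; module paths are made up)
                                                                                                              module example.com/app

                                                                                                              require (
                                                                                                                  example.com/regex    v1.0.3
                                                                                                                  example.com/regex/v2 v2.1.0
                                                                                                              )
                                                                                                              ```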

                                                                                                              In order to estimate how often duplicates of semver-compatible versions occur, I went through the git history of the above projects and their Cargo.locks, counting duplicates only if they are of the form v0.X.Y and v0.X.Z, or vX.Y.Z and vX.S.T. Again, v0 gets this special consideration because of the way that Cargo applies the default constraint. To make sure that the authors of these libraries weren’t pinning to some possibly older but semver-compatible range, I also checked their Cargo.tomls for any constraints that were not of the default form.
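
                                                                                                              Roughly, that counting rule in code (a sketch only, to make the criterion precise; it is not the script that was actually run, and it uses the golang.org/x/mod/semver package):

                                                                                                              ```go
                                                                                                              // Sketch of the "same compatible semver range" check described above. Two
                                                                                                              // duplicate versions only count if picking the higher one should have been
                                                                                                              // possible: same major for v1+, or same major and minor for v0 (where the
                                                                                                              // default Cargo constraint treats minor bumps as breaking).
                                                                                                              package main

                                                                                                              import (
                                                                                                              	"fmt"

                                                                                                              	"golang.org/x/mod/semver"
                                                                                                              )

                                                                                                              func compatibleDuplicate(v, w string) bool {
                                                                                                              	if !semver.IsValid(v) || !semver.IsValid(w) || v == w {
                                                                                                              		return false
                                                                                                              	}
                                                                                                              	if semver.Major(v) != semver.Major(w) {
                                                                                                              		return false // different majors are allowed to coexist
                                                                                                              	}
                                                                                                              	if semver.Major(v) == "v0" {
                                                                                                              		return semver.MajorMinor(v) == semver.MajorMinor(w)
                                                                                                              	}
                                                                                                              	return true
                                                                                                              }

                                                                                                              func main() {
                                                                                                              	fmt.Println(compatibleDuplicate("v0.4.0", "v0.6.2"))   // false
                                                                                                              	fmt.Println(compatibleDuplicate("v0.1.35", "v0.1.40")) // true
                                                                                                              	fmt.Println(compatibleDuplicate("v1.2.0", "v1.3.1"))   // true
                                                                                                              	fmt.Println(compatibleDuplicate("v1.2.0", "v2.0.0"))   // false
                                                                                                              }
                                                                                                              ```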

                                                                                                              • LALRPOP had no such conflicts in 15 revisions back to 2015. Every constraint was of the default form.
                                                                                                              • exa had no such conflicts in 115 revisions back to 2014. Every constraint was either default or "*".

                                                                                                              There is no evidence in either of these repositories that at any time Cargo had to do anything other than pick the highest compatible semver version for any shared transitive dependencies.

                                                                                                              This discussion has helped me understand better that the v0 range is going to be problematic for the Go modules system if people treat it as I expect and as is encouraged: as a spot for breaking changes and experimentation. Cargo handles this gracefully by allowing breakage to be signaled in the minor version, but Go has no such design consideration. I hope that either a change is made to make this easier, or guidance emerges in the community to avoid the problems.

                                                                                                              1. 2

                                                                                                                I am claiming that it is rare that multiple versions of some package need to exist in the same artifact when they are within the same compatible semver range.

                                                                                                                Oh interesting, OK. I think I missed this! I would indeed say that this is consistent with my experience in the ecosystem. While I can definitely remember many instances in which two semver-incompatible releases were compiled into the same binary, I can’t remember any scenario in which two semver-compatible releases were compiled into the same binary. I imagine Cargo probably tries pretty hard to keep that from ever happening, although truthfully, I can’t say I know whether that’s a hard constraint or not!

                                                                                                                This discussion has helped me understand better that the v0 range is going to be problematic for the Go modules system if people treat it as I expect and as is encouraged: as a spot for breaking changes and experimentation. Cargo handles this gracefully by allowing breakage to be signaled in the minor version, but Go has no such design consideration. I hope that either a change is made to make this easier, or guidance emerges in the community to avoid the problems.

                                                                                                                Yeah, that’s a good point. I can’t think of any libraries I’ve ever published (aside from maybe a few niche ones that nobody uses) that haven’t had to go through some kind of breaking change before I was ready to declare the API “stable.” Usually those only happen because other people start to actually use it. The Go ecosystem could technically just reform its conventions around what v1 means. IIRC, the npm ecosystem kind of pushes toward this by starting folks at 1.0.0 by default? But that may be tricky to pull off!

                                                                                              1. 3

                                                                                                I recently tried out Futhark on a whim, and it has been the most accessible way to get into GPGPU programming that I’ve found. I had a 40x speedup on one of my little programs with 40 lines of code. It’s still a v0, so a little rough around the edges sometimes, but in my experience the authors are super responsive on GitHub if you run into any bugs.

                                                                                                It really is a gem, and I really hope it continues. I know I’ll be using it for the foreseeable future.

                                                                                                1. 1

                                                                                                  I recently found https://github.com/rs/jplot which isn’t limited to using characters, but is limited to iTerm2. You also might want to check out the https://github.com/gizak/termui library for inspiration on how to do higher resolution plots.

                                                                                                  1. 1

                                                                                                    I wonder if there’s a roadmap for this. There are some bits that still need work sprinkled throughout the source, and I’m not seeing a great description of the in-memory and on-disk data structures, to the point where one could do capacity planning (is it just protobuf dumped to disk?). I see benchmarks in the tests as well, but nothing that puts the times and MB/sec in enough context to do capacity planning if I were to run them in a target environment.

                                                                                                    1. 1

                                                                                                      Thanks for the comments. I don’t really have a roadmap for it, but I think the most work remaining is fleshing out the UI and allowing for more kinds of queries (zooming in on specific quantiles or histograms, etc). The specific TODO you linked there is only an issue for the shutdown code: how long do you wait when doing a best effort dump to disk on exit? In general, I also tend to use the TODO comment more loosely to remind me about design tradeoffs in specific locations, in case those decisions aren’t optimal.

                                                                                                      Capacity planning is discussed here, but I agree that documentation around the disk layout would be good, as well as making it more prominent. The database tends to be very light on disk I/O, since it only has to write on a 10 minute (configurable) interval, and it only writes a small amount of data (around ~300 bytes per metric, regardless of the number of observations). I added an issue to keep track of what you’ve identified.
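
                                                                                                      As a rough back-of-the-envelope from those numbers (the metric count below is a made-up example, not a measurement from any real deployment):

                                                                                                      ```go
                                                                                                      // Back-of-the-envelope write volume from ~300 bytes per metric, written
                                                                                                      // once per 10 minute interval. The metric count is a hypothetical input.
                                                                                                      package main

                                                                                                      import "fmt"

                                                                                                      func main() {
                                                                                                      	const (
                                                                                                      		bytesPerMetric = 300.0
                                                                                                      		writesPerDay   = 24 * 60 / 10 // one flush every 10 minutes
                                                                                                      		exampleMetrics = 100000       // hypothetical number of metrics
                                                                                                      	)
                                                                                                      	perDay := bytesPerMetric * writesPerDay * exampleMetrics
                                                                                                      	fmt.Printf("~%.1f GB written per day\n", perDay/1e9) // ~4.3 GB/day
                                                                                                      }
                                                                                                      ```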