1. 6

    @andyc Congratulations! One of the main reasons I stopped using osh was because of the performance (especially for tab-completion), so it’s great to hear that’s mostly been solved. I might try osh again soon :)

    I do hope that OSH gets better support for job control; when I stopped contributing it seemed like I was the only one using it, so no one noticed when it broke. There are also quite a few issues that have been open for a while. I know it’s hairy but I do find it really useful.

    RE “writing it in Rust would be too much boilerplate” - I’m actually currently in the process of rewriting my parser in Rust, so I’ll let you know how it goes. I plan to use pratt parsing, not recursive descent, which should cut down on the amount of code: so far the most boilerplate by far has been the pretty printing (about 200 lines of code that could probably have been autogenerated). I think this would have been similarly long in any language, although I challenge others to prove me wrong. This of course will change as I make more progress with the parser. Right now I’ve only implemented binary expressions and postfix operators, but the hardest bit is parsing typedefs and function pointer declarations.
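
    For anyone following along, here is a minimal sketch of the Pratt approach for exactly that subset (binary expressions and postfix operators). It is not the parser described above: the Expr type, the single-character tokens, the binding powers, and the postfix ! operator are all assumptions made up for illustration.

```rust
// Hypothetical AST for a toy expression language.
#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Binary(Box<Expr>, char, Box<Expr>),
    Postfix(Box<Expr>, char),
}

struct Parser {
    tokens: Vec<char>,
    pos: usize,
}

impl Parser {
    fn new(src: &str) -> Self {
        // Tokens are single characters; whitespace is dropped.
        Parser { tokens: src.chars().filter(|c| !c.is_whitespace()).collect(), pos: 0 }
    }

    fn peek(&self) -> Option<char> {
        self.tokens.get(self.pos).copied()
    }

    // (left, right) binding powers; right > left gives left associativity.
    fn infix_power(op: char) -> Option<(u8, u8)> {
        match op {
            '+' | '-' => Some((1, 2)),
            '*' | '/' => Some((3, 4)),
            _ => None,
        }
    }

    fn parse_expr(&mut self, min_bp: u8) -> Option<Expr> {
        // An operand is a single decimal digit in this sketch.
        let digit = self.peek()?.to_digit(10)?;
        self.pos += 1;
        let mut lhs = Expr::Num(digit as i64);
        loop {
            let op = match self.peek() {
                Some(op) => op,
                None => break,
            };
            if op == '!' {
                // Postfix binds tighter than any infix operator here.
                if 5 < min_bp {
                    break;
                }
                self.pos += 1;
                lhs = Expr::Postfix(Box::new(lhs), op);
                continue;
            }
            let (l_bp, r_bp) = match Self::infix_power(op) {
                Some(powers) => powers,
                None => break,
            };
            if l_bp < min_bp {
                break;
            }
            self.pos += 1;
            let rhs = self.parse_expr(r_bp)?;
            lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs));
        }
        Some(lhs)
    }
}

// Parse a whole string; None on a syntax error or leftover input.
fn parse(src: &str) -> Option<Expr> {
    let mut parser = Parser::new(src);
    let expr = parser.parse_expr(0)?;
    if parser.pos == parser.tokens.len() { Some(expr) } else { None }
}

// Render as an s-expression so the grouping is visible.
fn show(e: &Expr) -> String {
    match e {
        Expr::Num(n) => n.to_string(),
        Expr::Binary(l, op, r) => format!("({} {} {})", op, show(l), show(r)),
        Expr::Postfix(inner, op) => format!("({} {})", op, show(inner)),
    }
}

fn main() {
    println!("{}", show(&parse("1 + 2 * 3").expect("valid expression")));
}
```

    The whole precedence table lives in infix_power, which is where the code savings over one recursive-descent function per precedence level would come from.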

    1. 2

      I plan to use pratt parsing

      I agree that Pratt parsing is nice for expressions, but is it any better than plain recursive descent for statements?

      but the hardest bit is parsing typedefs and function pointer declarations

      I struggled with that, too. The Right Left Rule might be useful for you: http://cseweb.ucsd.edu/~ricko/rt_lt.rule.html

      1. 1

        Yes I’m optimistic Oil will be fast. So far we’ve translated 16K lines of code but that doesn’t include tab completion at the moment. For a variety of reasons like using yield that might be the last thing translated, but we can talk about it.

        I remember you had a problem with job control but I can’t find the bug right now. I know there are some other unresolved bugs like:

        https://github.com/oilshell/oil/issues/500

        Basic job control works for me – I just tested the latest 0.8.pre2 release with vim -> Ctrl-Z -> fg. But there are other hairy parts that aren’t implemented, and probably won’t be without help because I’m a tmux user :-/ But I also may not have encouraged help there because I knew we were going to translate to C++. The code right now is best viewed as a prototype for a production quality shell. I expect it will be 3-5K lines of hand-written C++ and 30-50K lines of translated C++.


        We can talk about it on Zulip maybe, but I don’t think Pratt parsing is great for most full languages like Python or JS, only for the “operator grammar” subset with precedence. And that’s despite writing a whole series on it!

        http://www.oilshell.org/blog/2017/03/31.html

        If the language is “normal” I don’t think Rust is a bad idea – after all plenty of parsers are written in Rust. Shell is an especially large language syntactically. It’s much smaller than Python and C++ in general, but the syntax is much larger and requires a big parser.

      1. 1

        Switches must be exhaustive. Because one of my main personal projects deals with strings, I’m often dealing with matching against them. With IRC, you only have a small number of message types you probably want to match on, but Rust forces you to cover all cases.

        How does this work in Go? Will Go allow you to write a match statement that only encompasses a finite number of exact strings? What happens if the value to match on is a string that isn’t in the set, does it crash?

        Idiomatic Rust would suggest creating an enum type to represent every IRC command you care about, converting the raw string to that type early on (or failing if the input string isn’t a valid IRC command), and then using that type in the rest of the code.
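
        A rough sketch of what that could look like. The Command enum, the three commands chosen, and the describe helper are hypothetical, just enough to show the shape: convert once at the edge, then match exhaustively without a wildcard arm.

```rust
use std::str::FromStr;

// Hypothetical subset of IRC commands this program cares about.
#[derive(Debug, PartialEq)]
enum Command {
    Ping,
    Privmsg,
    Join,
}

impl FromStr for Command {
    type Err = String;

    // Convert the raw command word once, at the edge of the program.
    fn from_str(raw: &str) -> Result<Self, Self::Err> {
        match raw {
            "PING" => Ok(Command::Ping),
            "PRIVMSG" => Ok(Command::Privmsg),
            "JOIN" => Ok(Command::Join),
            // Strings are an open set, so a fallback arm is required here.
            other => Err(format!("unsupported IRC command: {}", other)),
        }
    }
}

// No wildcard arm: the compiler checks that every variant is handled.
fn describe(cmd: &Command) -> &'static str {
    match cmd {
        Command::Ping => "keepalive",
        Command::Privmsg => "message",
        Command::Join => "channel join",
    }
}

fn main() {
    let cmd: Command = "PING".parse().expect("known command");
    println!("{}", describe(&cmd));
}
```

        Adding a fourth variant later makes describe fail to compile until the new case is handled, which is the point of doing the conversion early.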

        1. 1

          Switches in Go have a ‘default’ case (which is optional / ‘noop if not specified’).

          There are type switches too, but I don’t think you could use subtyping with string enums the way you could in rust (I could be wrong, but I’ve done quite a bit of go and have never seen that sort of technique used).

          1. 6

            The reason why Rust cannot have an implicit default case is that match in Rust is an expression, while switch is a statement in Go. That means that it is possible to do something like

            let foo = match value {
              Foo => 1,
              Bar => 2,
            };
            

            In Go this would need to be written:

            var foo int
            
            switch value {
            case "foo":
              foo = 1
            case "bar":
              foo = 2
            }
            
            1. 4

              in Rust is an expression while it is statement in Go.

              That’s an interesting observation. I’m about to expand it; this is mostly for my own understanding, I don’t think I’m about to write anything you don’t already realize.

              Go’s switch constructs are imperative and each branch contains statements, which means every branch’s type is effectively Any-with-side-effects-on-the-scope.

              Rust’s match constructs are expressions, which means (in Rust) that every match arm is also an expression, and all must have the same type T-with-no-side-effects-on-the-scope.

              (Both languages are free to perform side effects on the world inside their match arms.)

              Then, if I understand what you’re getting at, statement blocks have an ‘obvious’ null/default value of ‘no-op, do nothing’, which is why Go’s compiler can automatically add a default handler ‘do nothing if no match’. If the programmer knows that is the wrong default action, they must explicitly specify a different one.

              Types, on the other hand, have no notion of a default value. Which is why the Rust programmer must match exhaustively, and specify a correct value for each match arm. The compiler can’t add ‘_ => XXX_of_type_T’, because it cannot know what XXX should be for any type T.
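
              To make that concrete for myself, a small sketch (the Value enum and both function names are made up for illustration). Matching on a closed enum needs no fallback, while matching on strings forces the programmer to say what the fallback value is:

```rust
// Hypothetical closed set of values.
#[derive(Clone, Copy)]
enum Value {
    Foo,
    Bar,
}

// Every variant is covered, so the match always yields an i32.
fn from_enum(value: Value) -> i32 {
    match value {
        Value::Foo => 1,
        Value::Bar => 2,
    }
}

// Strings are an open set: the compiler rejects this match without the
// `_` arm, and only the programmer can decide what it should produce.
fn from_str(value: &str) -> Option<i32> {
    match value {
        "foo" => Some(1),
        "bar" => Some(2),
        _ => None,
    }
}

fn main() {
    // `match` is an expression, so its result can be bound directly.
    let foo = from_enum(Value::Foo);
    println!("{} {} {:?}", foo, from_enum(Value::Bar), from_str("baz"));
}
```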

              1. 3

                Yes, in theory it could use Default::default() if defined, but it is not defined for all types, so it would be confusing for users. Also, forcing exhaustive matching reduces the number of bugs. In the end you can always write:

                match value {
                  1 => foo(),
                  2 => bar(),
                  _ => unreachable!()
                }
                

                This tells the compiler that value should be 1 or 2, so it can optimise the rest assuming that this is true (and will produce a meaningful error otherwise). unreachable!() returns the bottom type !, which matches any type (as it is meant for functions that do not return at all).

                1. 3

                  Small nit: unreachable!() doesn’t allow for optimizations, that’s unreachable_unchecked. On the other hand, _unchecked can cause undefined behavior if your expression can actually be reached.
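
                  A sketch of the difference, with hypothetical function names. The checked version panics with a message if the arm is ever hit; the unchecked version uses std::hint::unreachable_unchecked and is only sound if the caller really does guarantee the precondition:

```rust
use std::hint::unreachable_unchecked;

// Safe: reaching the last arm panics with a readable message.
fn checked(value: u8) -> &'static str {
    match value {
        1 => "foo",
        2 => "bar",
        // `unreachable!()` has type `!`, so it fits an arm of any type.
        _ => unreachable!("value must be 1 or 2, got {}", value),
    }
}

/// # Safety
/// The caller must guarantee that `value` is 1 or 2; any other value
/// is undefined behavior.
unsafe fn unchecked(value: u8) -> &'static str {
    match value {
        1 => "foo",
        2 => "bar",
        // The compiler may assume this arm is never taken.
        _ => unsafe { unreachable_unchecked() },
    }
}

fn main() {
    println!("{}", checked(1));
    // Sound only because 2 satisfies the documented precondition.
    println!("{}", unsafe { unchecked(2) });
}
```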

        1. 2

          Very informative article. I have a question about this though!

          + ?Sized - The size of B can be unknown at compile time. This isn’t relevant for our use case, but it means that trait objects may be used with the Cow type.

          But isn’t B a str which doesn’t have a size known at compile time?

          1. 2

            Yep.

            1. 2

              str doesn’t have a known size, but &str does: it’s the size of a (ptr, len) pair.

              1. 3

                Yes, but that’s &B. The statement quoted is that the “B: ?Sized” bound doesn’t matter for the thing described. It does, as B is str.
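
                A small sketch of why the bound matters. The Borrowed wrapper and strip_spaces helper are hypothetical, and the size check reflects how current compilers lay out &str rather than a documented guarantee:

```rust
use std::borrow::Cow;
use std::mem::size_of;

// Hypothetical wrapper: `Borrowed<str>` only compiles because of `?Sized`.
// Without the bound, `B` would have to be `Sized`, and `str` is not.
struct Borrowed<'a, B: ?Sized>(&'a B);

// Hypothetical helper: borrow when nothing changes, allocate otherwise.
fn strip_spaces(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', ""))
    } else {
        Cow::Borrowed(input)
    }
}

fn main() {
    // `str` has no compile-time size, but `&str` is a (ptr, len) pair.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    let wrapped: Borrowed<str> = Borrowed("hello");
    println!("{}", wrapped.0);

    // Here `B` is `str`, so `Cow<str>` itself relies on `B: ?Sized`.
    println!("{} {}", strip_spaces("a b c"), strip_spaces("abc"));
}
```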

            1. 13

              I wouldn’t say that strong typing removes the need for unit tests. You can have a well-typed function and a language that enforces that the function is only called with well-typed inputs, and still have that function perform the wrong business logic for your application without a unit test checking this.

              Can all the historical versions of all the events be deserialized correctly from the event store, getting converted to newer versions (or to completely different event types) as needed?

              Let’s assume that your event store stores values in some very generic format, like a list of bytes, and that your event type is some kind of complicated enum. Your deserialization function is then something like List byte -> Optional EventType. It returns Optional, of course, because not every possible list of bytes will correspond to a valid event value in your program. The bytes comprising the ASCII-encoded Declaration of Independence are a well-typed input to this function, just as the actual bytes in your event store are. So you still need some way to check that you’re doing the business logic of decoding your bytes the right way. A unit test seems perfectly appropriate here. You might even want to have the ASCII-encoded version of the Declaration of Independence in your unit test, to be sure that your function returns None instead of trying to parse this as an event in your system for some reason.
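
              As a sketch, in Rust terms that function is &[u8] -> Option<Event>. The Event enum and the wire format below (one tag byte followed by a 4-byte big-endian id) are invented for illustration; the type checker accepts any byte slice, and only a test tells you the decoding logic is right:

```rust
// Hypothetical event type and wire format: [tag, id as 4 big-endian bytes].
#[derive(Debug, PartialEq)]
enum Event {
    Created { id: u32 },
    Deleted { id: u32 },
}

fn decode_event(bytes: &[u8]) -> Option<Event> {
    let (tag, rest) = bytes.split_first()?;
    // Anything other than exactly four payload bytes is not an event.
    if rest.len() != 4 {
        return None;
    }
    let id = u32::from_be_bytes([rest[0], rest[1], rest[2], rest[3]]);
    match *tag {
        1 => Some(Event::Created { id }),
        2 => Some(Event::Deleted { id }),
        _ => None,
    }
}

fn main() {
    println!("{:?}", decode_event(&[1, 0, 0, 0, 42]));
    // Well-typed input that is not an event in this system.
    println!("{:?}", decode_event(b"When in the Course of human events"));
}
```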

              1. 4

                So, I do agree that type systems can never fully replace unit/integration tests; they can catch type errors but not logic errors.

                However, a good type system can turn logic errors into type errors. A great example of this is returning a reference to a local variable in a language without memory management: in C or C++ it’s completely legal (maybe with a warning), in Rust it’s a compile error. This isn’t unique to memory management: in C, char + float is a float; in Haskell (and most other functional languages, including Rust), adding Char to Double is a type error. One last example: I’m writing a Rust wrapper for a C library. Every C function I call can return PARAM_INVALID if the allocated buffer is null. The Rust function doesn’t even document the error because it’s not possible to have a null reference in Rust’s type system (also not unique to Rust, C++ has references too).
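
                A sketch of that last example. The real C library isn’t shown here, so c_fill below is a stand-in written in Rust with an invented signature and status codes; the point is only that the safe wrapper takes a slice, so a null buffer cannot be expressed by its callers:

```rust
const OK: i32 = 0;
const PARAM_INVALID: i32 = -1;

// Stand-in for a C function (hypothetical): fills `len` bytes at `buf`.
unsafe fn c_fill(buf: *mut u8, len: usize, byte: u8) -> i32 {
    if buf.is_null() {
        return PARAM_INVALID;
    }
    unsafe { std::ptr::write_bytes(buf, byte, len) };
    OK
}

// Safe wrapper: `&mut [u8]` is never null, so PARAM_INVALID cannot
// happen and the wrapper has no error to return or document.
fn fill(buf: &mut [u8], byte: u8) {
    let status = unsafe { c_fill(buf.as_mut_ptr(), buf.len(), byte) };
    debug_assert_eq!(status, OK);
}

fn main() {
    let mut buf = [0u8; 4];
    fill(&mut buf, 7);
    println!("{:?}", buf);
}
```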

                My long-winded point is that even though you always need tests, if you have a good type system, there are fewer things to test.

                1. 4

                  Curry-Howard Correspondence says that types ARE logic, so they definitely DO catch logic errors.

                  1. 3

                    That requires a powerful enough type system to represent a proof. Theoretically this is possible, and there definitely is value in using dependent typing and formal verification tools. But at the moment, with typical programming languages, only limited parts of the business logic can be represented with types.

                    Even today, with a bit of discipline, it is possible to make certain states impossible to represent in the program. This allows you to eliminate some unit tests, which is definitely a good thing, but we’re still far from the point of proving all of a program’s logic with the type system.
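
                    A minimal sketch of what I mean, using a made-up Connection type. With a connected: bool field next to an optional session id, “connected but no session” would be representable and would need a test; with the enum it cannot be constructed at all:

```rust
// Hypothetical example: the session id exists only in the Connected state.
enum Connection {
    Disconnected,
    Connected { session_id: u32 },
}

fn session(conn: &Connection) -> Option<u32> {
    match conn {
        Connection::Disconnected => None,
        Connection::Connected { session_id } => Some(*session_id),
    }
}

fn main() {
    let conn = Connection::Connected { session_id: 7 };
    println!("{:?} {:?}", session(&conn), session(&Connection::Disconnected));
}
```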

                    1. 1

                      I understand they don’t catch all errors because there are some you can’t encode in your logic, I’m just pointing out that “type errors are not logic errors” is totally incorrect!!

                    2. 1

                      I’m not terribly familiar with Curry-Howard, but Wikipedia says it states that

                      a proof is a program, and the formula it proves is the type for the program

                      I don’t see how that means that types can catch logic errors: the type is a theorem, not a proof. Furthermore, just because you can formalize code as a proof doesn’t mean it’s correct; mathematicians find wrong proofs all the time.

                      1. 3

                        If you declare some argument is a String and your code compiles, then the type checker has proven that the argument can only possibly be a String. Passing a non-String argument, as you could do in a dynamic language, is a logic error. You violate a precondition of your function, which only works on strings.

                        1. 1

                          Type checkers are logic checkers so you can’t really screw up your proof, only the theorems. Yes, this happens sometimes, but it IS a logic system.

                          1. 3

                            I think a better phrasing of skepticism is to ask what there is that can check whether you proved the right things.

                            Whether it’s done with tests or with types, at some point you are relying on the programmer to provide a sufficiently-correct formal specification of the problem to a machine, and if you declare that it should be done via types because the programmer is fallible and types catch things the programmer will mess up, you potentially trigger an infinite regression of types that check the code, and meta-types that check the types, and meta-meta-types that check the meta-types, and so on.

                            (of course, this is all very well-trod ground in some fields, and is ultimately just a fancier version of the old “Pray, Mr. Babbage…”, but still a question worth thinking about)

                      2. 2

                        See also https://www.destroyallsoftware.com/talks/ideology which goes much more into depth on this.

                    1. 2

                      Some Java GUI IDEs have something like this: you can create a component, add various event handlers, then copy-paste the whole component into another menu and it will recreate all the handlers and layout. Excel will also let you copy-paste rows/columns/grids within a spreadsheet. It would be really cool to make that a standardized format so you can use it across applications.

                      1. 5

                        I’m looking forward to the rest in the series as I’m a fan of the author and everything they’ve done for Rust, however with only the first article out thus far which merely discusses components that may cause slow compilation it leads the reader in an overly negative direction, IMO.

                        Rust compile times aren’t great, but I don’t believe they’re as bad as the author is letting on thus far. Unless your dev-cycle relies on CI and full test suite runs (which require full rebuilds), the compile times aren’t too bad. A project I was responsible for at work used to take ~3-5ish minutes for a full build if I remember correctly. By removing some unnecessary generics, feature gating some derived impls, feature gating esoteric functionality, and re-working some macros as well as our build script, we got the compile times down to around a minute, which meant partial builds were mere seconds. That, along with test filtering, meant the dev-test-repeat cycle was very quick. Now, it could also be argued that feature gates increase test path complexity, but that’s what our full test suite and CI are for.

                        Granted, I know our particular anecdote isn’t indicative of all workloads, or even representative of large Servo style projects, but for your average medium sized project I don’t feel Rust compile times hurt productivity all that much.

                        …now for full re-builds or CI reliant workloads, yes I’m very grateful for every iota of compile time improvements!

                        1. 7

                          It is also subjective. For a C++ developer 5 minutes feels ok. If you are used to Go or D, then a single minute feels slow.

                          1. 4

                            Personally, slow compile times are one of my biggest concerns about Rust. This is bad enough for a normal edit/compile/run cycle, but it’s twice as bad for integration tests (cargo test --tests) which have to link a new binary for each test.

                            Of course, this is partly because I have a slow computer (I have a laptop with an HDD), but I don’t think I should need the latest and greatest technology just to get work done without being frustrated. Anecdotally, my project with ~90 dependencies takes ~8 seconds for an incremental rebuild, ~30 seconds just to build the integration tests incrementally, and over 5 minutes for a full build.

                          1. 7

                            It’s not a majority opinion, but I believe some things should just be kept forever.

                            Sometimes they were deprecated in Python 2 like using Threading.is_alive in favour of Threading.isAlive to be removed in Python 3

                            Like this, for example. Is changing the spelling of something really worth breaking this for everyone?

                            1. 6

                              Yeah or just provide an alias and in the docs note that the snake case version is preferred or something.

                              I really really want to like Python. From a sys admin perspective it’s a fantastic language. One-file scripts where you don’t have to deal with virtualenv and it’s not gonna change much.

                              From a DevOps perspective (package creation and management, versioning, virtualenv, C based packages not playing well with cloud-oriented distros, stuff like this in doing language version upgrades) I’ve always found it to be a nightmare, and this kind of thing is just another example of that. I tried to install Ansible and am basically unable to do it on a Mac because no version of Python or Pip can agree that I have installed it and it should be in my path.

                              I don’t begrudge anyone who uses it or think it’s a bad language, that would be pretty obtuse, but I always avoid it when I can personally.

                              1. 7

                                This is what we do in Mercurial. Aliases stay forever but are removed from the documentation.

                                Git does this too. git diff --cached is now a perpetual alias for git diff --staged because the staging area has been variously called the cache and the index. Who cares. Aliases are cheap. Just keep them forever.

                                1. 2

                                  I didn’t even realize --cached was deprecated, I use that all the time.

                                  1. 3

                                    And that’s how it should be. You shouldn’t even notice it changed.

                                2. 4

                                  Yeah or just provide an alias and in the docs note that the snake case version is preferred or something.

                                  That’s exactly what was done: https://docs.python.org/2.7/library/threading.html - unfortunately, not everybody spots this stuff, and people often ignore deprecation notices.

                                  The real problem is that it’s difficult to write a reliable tool to flag and fix this stuff if you can’t reliably do type inference to figure out the provenance of stuff.

                                  I tried to install Ansible and am basically unable to do it on a Mac because no version of Python or Pip can agree that I have installed it and it should be in my path.

                                  Some of the things Homebrew does make that difficult. You’re right: it’s a real pain. :-( I’ve used pipx in the past to manage this a bit better, but upgrades of Python can break the symlinks, sometimes for no good reason at all.

                                  As far as Ansible goes, I stick to the version installed by Homebrew and avoid any dependencies on any of Ansible’s internals.

                                  1. 1

                                    From someone who uses Ansible on a daily basis: the best way to use it is by creating a virtualenv for your Ansible project and keeping everything you need in there. My ‘infra’ project has a requirements.txt and a boostrap.sh that creates the virtualenv and installs all dependencies. If you try to install Ansible at the system level, you are going to hate your life.

                                  2. 6

                                    Yeah, or at least bring in some compatibility libraries. Deprecating and removing things that aren’t actually harmful seems like churn for the sake of it.

                                    1. 3

                                      That (removing cruft, even if not harmful) was basically the reason for Python 3. And everyone agreed with it 10 years ago. And most people using the language now came to it probably after all these decisions have been made, and the need to upgrade to Python 3 was talked about for all this time. Now is simply not the time to question it. Also, making most of those fixes is easy (yes, even if you have to formally fork a library to do a sed over it).

                                      1. 1

                                        Those breaking changes came with a major version bump. Why not just wait until Python 4 to remove the cruft?

                                        1. 3

                                          There should ideally never be a Python 4: none of those deprecated bits are meant to be used in Python 3 code. They were only present to ease transition in the short term, and it’s been a decade.

                                          1. 2

                                            While there are people who prefer a semver-esque approach of bumping the major every time a deprecation cycle finishes, AFAIK Python has no plans to adopt such a process, and intends to just continue doing long deprecation cycles where something gets marked for deprecation with a target release (years in the future) for actually removing it.

                                            1. 1

                                              Python’s releases are never perfectly backwards compatible. Like most programming languages, old crufty things that have been deprecated for years are sometimes removed.

                                              1. 1

                                                That’s a shame. A lot of languages provide some mechanism for backwards compatibility, either by preserving it at the source or ABI level, or by allowing some kind of indication as to what language version or features the code expects. It’s nice to be able to pick up a library from years ago without having to worry about bit rot.

                                                1. 2

                                                  It’s a library compatibility issue, not a language compatibility issue. It’s been deprecated for a decade; honestly, there’s been plenty of time to fix it.

                                                  1. 1

                                                    This particular library is part of the language. A decade is an awfully short time for a language.

                                                    1. 2

                                                      Python has never promised that it will eternally support every thing that’s ever been in the language or the standard library. Aside from the extended support period of Python 2.7, it’s never even promised to maintain something as-is on a time scale of a decade.

                                                      Python remains a popular and widely-adopted language despite this, which suggests that while you personally may find it a turn-off that the language deprecates things and removes them over time, there are other people who either do not, or are willing to put up with it for sake of having access to a supported version of the language.

                                                      This is, incidentally, the opposite of what happens in, say, Java, where the extreme backward-compatibility policy and glacial pace of adding even backwards-compatible new features tends to split people exactly the same way.

                                                      1. 2

                                                        In a semver-esque world, anything deprecated in 2 is fair game for removal in 3, of course (and if this particular thing was, then I concede). In that way Python 3 is a different language to Python 2, which I believe is how most folks consider it anyway. It’s just a shame that, apparently, you can’t write a Python 3 program and expect it to work with Python 3 in 10 years with no programmatic way of specifying which Python 3 it works in. Nobody would be any worse off if they just waited for Python 4 to clean up.

                                                        1. 2

                                                          If I write something today, that raises deprecation warnings today, I don’t expect to be able to walk away from it for ten years and then have it continue to work on the latest version. I expect that those deprecation warnings really do mean “this is going to change in the future”, and that I either should clean it up now to get rid of the warnings, or I’ll have to clean it up later if I want it to keep working.

                                                          That’s the kind of situation being discussed here – people who wrote code that already raised deprecation warnings at time of writing, and are surprised to find out that those warnings really did mean “this is going to change in the future”.

                                                        2. 1

                                                          Everyone who wants a Python that doesn’t change has been (and probably still is) using Python 2.7. I expect that we will see more pushback in the community against these sorts of changes as that becomes less tenable.

                                            2. 4

                                              It would be nice if Python had an equivalent of go fix. It’s just a pity things like that are difficult with dynamic languages.

                                            1. 10

                                              Objective reasons:

                                              • nulls are checked
                                              • resources are safe(r)
                                              • strongly typed
                                              • reasonably efficient in my hands, incredibly efficient in skilled hands
                                              • cargo is awesome, along with the other dev tools (is this subjective? I objectively don’t think so)
                                              • small runtime
                                              • wasm

                                              Subjective reasons:

                                              • it’s not C++
                                              • it’s not Haskell
                                              • I like the community
                                              • I like the lobster
                                              1. 2

                                                One more for the list: Result instead of exceptions makes it easier to see what can go wrong in a function. Compare that to Python or even Java, where any function can throw any exception and your only hope is to read the documentation.

                                                1. 1

                                                  Java does have checked exceptions though, which should have the same benefit. They’re not mandatory though.

                                                  1. 2

                                                    Java has checked exceptions but they’re annoying to use. I see a lot of try { return Integer.parseInt(s); } catch (NumberFormatException e) { throw new RuntimeException(e); }, especially in prototyping. In Rust errors are much easier to work with: you can call methods on Result like .unwrap_or() or .and_then().
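
                                                    For comparison, a quick sketch of the Rust side. parse_or_zero and parse_and_halve are hypothetical names; str::parse, unwrap_or, map_err, and and_then are the standard library methods:

```rust
use std::num::ParseIntError;

// Fall back to a default instead of wrapping and rethrowing.
fn parse_or_zero(s: &str) -> i32 {
    s.parse::<i32>().unwrap_or(0)
}

// Chain a second fallible step; both failure modes show up in the signature.
fn parse_and_halve(s: &str) -> Result<i32, String> {
    s.parse::<i32>()
        .map_err(|e: ParseIntError| e.to_string())
        .and_then(|n| {
            if n % 2 == 0 {
                Ok(n / 2)
            } else {
                Err(format!("{} is odd", n))
            }
        })
}

fn main() {
    println!("{} {}", parse_or_zero("42"), parse_or_zero("abc"));
    println!("{:?} {:?}", parse_and_halve("10"), parse_and_halve("7"));
}
```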

                                                1. 5
                                                  1. 3

                                                    Lol he forgot case 6:

                                                            case 5:
                                                                //Demo over
                                                                advancetext = true;
                                                                hascontrol = false;
                                                                state = 6;
                                                                break;
                                                            case 7:
                                                    

                                                    Seriously though I can’t imagine writing 4099 cases.

                                                    1. 2

                                                      It skips tons of numbers all over the place. E.g., it goes from 2514 to 3000. Seems like much of it was intentional? Either way, there are way fewer than 4099 cases... not that that makes it much better :).

                                                    2. 3

                                                      If that’s not “Doing Things The Hard Way”, I don’t know what is :-)

                                                      1. 3

                                                        Haha, reminds me of the thousand-case switch statement in Undertale’s source. Game code really can be scary – it seems like that’s especially true for 2D games for some reason…

                                                    1. 1

                                                      I’m working on course materials for a new IoT course at my university, as well as hacking on side projects when I have the time.

                                                      I’m also trying to get back into a good work/sleep schedule for 2020, which is difficult, given that some of my most productive work hours seem to occur after midnight.

                                                      1. 1

                                                        Hey Philip! Let me know when you get those resources together. Charles said the project is going to be a webserver implemented in assembly, is that right? Sounds like a lot of fun.

                                                        I’ve found that my most productive time is either way early in the morning (before 8) or after dinner (after 8). During the day I don’t seem to get as much done, I couldn’t tell you why.

                                                      1. 1

                                                        Two projects, both in Rust:

                                                        For work, implementing a Rust wrapper around a NoSQL database API. The primary API is in C, so I’m learning a lot about unsafe and FFI. If anyone wants to try it out, I’m looking for feedback on how easy it is to use :)

                                                        For fun, I’ve been working on docs.rs. I tracked down a bug that’s been bugging me ;) for a month and a half, and it only took me an hour of relearning SQL!

                                                        1. 16

                                                          Even though I love Rust, I am terrified every time I look at the dependency graph of a typical Rust library: I usually see dozens of transitive dependencies written by Internet randos whom I have zero reason to trust. Vetting all those dependencies takes far too much time, which is why I’m much less productive in Rust than Go.

                                                          I try to also use the same level of scrutiny when bringing in dependencies in Rust. It can be a challenge and definitely uses up time. This is why the crev project exists, so that the effort can be distributed through a web of trust. I don’t think it has picked up a critical mass yet, but I’m hopeful.

                                                          Some projects (including my own) have also been taking dependencies more seriously, and in particular, by providing feature flags to turn off things to decrease the dependency count.

                                                          1. 9

                                                            Also more direct tools like lichking which can help you search for deps with licenses you don’t like.

                                                            1. 2

                                                              Indeed. I regularly use that on my projects with more than a handful of dependencies as a sanity check that there is zero copyleft in my tree.

                                                            2. 2

                                                              Some projects (including my own) have also been taking dependencies more seriously

                                                              One of my biggest pet peeves in Rust is duplicate dependencies. Docs.rs is the worst offender I build regularly (we currently compile 4 different versions of syn!) but it’s a problem throughout the ecosystem. I’ve opened a few different bugs but it usually gets marked as low priority or ‘nice to have’.

                                                              Part of the problem is that so few crates are 1.0 (looking at you, rand), but another part is that IMO people aren’t very aware of their dependency tree. I regularly see crates with 150+ dependencies and it boggles my mind.

                                                              Hopefully tools like cargo tree, cargo audit, and cargo outdated will help but there still has to be some effort from the maintainers.

                                                              1. 3

                                                                Works fine now.

                                                                1. 1

                                                                  Maybe it’s blocked by geolocation? Your profile says you’re from Russia :/

                                                                  1. 0

                                                                    And so? Why should Microsoft block Russians?

                                                                    1. 1

                                                                      Not saying they should, just that they may have decided to.

                                                                      1. 1

                                                                        Still a weird first assumption. Also, the post prior to yours was from a Russian saying it works now.

                                                                    1. 1

                                                                      That’s weird. Works for me, both that website and the linked PDF.

                                                                      1. 1

                                                                        Works here too.

                                                                      1. 3

                                                                        A more interesting question is, does the test code shown in the article (comparison of an uninitialized value to itself) invoke undefined behavior?

                                                                        1. 1

                                                                          yes it does. in this case it’s dereferencing an uninitialized pointer. if we would make the array global/static it would be dereferencing a null pointer.

                                                                          1. 1

                                                                            This is not correct. Uninitialized static arrays have every element initialized to zero, it’s fine to access any element of them: http://port70.net/~nsz/c/c99/n1256.html#6.7.8p10

                                                                            10 If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then: …

                                                                            • if it is an aggregate, every member is initialized (recursively) according to these rules;

                                                                            and furthermore

                                                                            21 If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

                                                                            Aggregate means struct or array (6.2.5, http://port70.net/~nsz/c/c99/n1256.html#6.2.5p21):

                                                                            21 Arithmetic types and pointer types are collectively called scalar types. Array and structure types are collectively called aggregate types.
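
                                                                            For what it’s worth, here is a minimal sketch of the two quoted rules. The names (zeroed, all_zero, and so on) are made up for illustration and are not from the article. It only reads objects whose values the standard guarantees; an automatic array with no initializer at all is deliberately left out, because reading it is the problematic case.

                                                                            ```c
                                                                            #include <stddef.h>

                                                                            #define N 4

                                                                            /* Static storage duration, no initializer (6.7.8p10):
                                                                               every element starts out as 0. */
                                                                            static int zeroed[N];

                                                                            /* Returns 1 if every element of arr[0..n) is zero, else 0. */
                                                                            static int all_zero(const int *arr, size_t n) {
                                                                                for (size_t i = 0; i < n; i++) {
                                                                                    if (arr[i] != 0) {
                                                                                        return 0;
                                                                                    }
                                                                                }
                                                                                return 1;
                                                                            }

                                                                            int static_array_is_zero(void) {
                                                                                return all_zero(zeroed, N);
                                                                            }

                                                                            /* Automatic array with a partial initializer (6.7.8p21):
                                                                               the remaining elements are initialized as if static. */
                                                                            int partial_init_rest_is_zero(void) {
                                                                                int a[N] = { 7 };
                                                                                return a[0] == 7 && all_zero(a + 1, N - 1);
                                                                            }
                                                                            ```

                                                                            Both functions should return 1 on a conforming implementation.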

                                                                            1. 3

                                                                              The example code shows a non-static (automatic) array with a declaration and no initializer. You’ll note that the sections you quoted refer to what happens when an array (or other aggregate) has fewer items specified in their initializer than the aggregate has members. In this case, there is no initializer. Indeed, this is undefined behavior.

                                                                              All the best,

                                                                              -HG

                                                                              1. 1

                                                                                You appear to only have read the 2nd quote in my comment. Here’s the first one again:

                                                                                If an object that has static storage duration is not initialized explicitly, then: …

                                                                                • if it is an aggregate, every member is initialized (recursively) according to these rules;

                                                                                I agree this is UB for an array with automatic storage, though.

                                                                                1. 1

                                                                                  @jyn514, the array in the example code is not declared static.

                                                                              2. 2

                                                                                I stand corrected. in the case that the array has static duration, all elements are indeed initialized to zero. thank you.

                                                                                edit: and my whole sentence is wrong. a, the pointer to the head of the array is actually initialized, it is only the array elements that aren’t initialized in the case of automatic duration. so the UB is caused by using the elements and not by dereferencing.

                                                                              3. 1

                                                                                It has nothing to do with pointers. I’m talking about the uninitialized memory. Would the following code invoke UB ?

                                                                                int x;
                                                                                if (x == x)
                                                                                    ...
                                                                                

                                                                                I do not master the definition of UB, but it seems that if “int” can be initialized with a trap representation (highly unlikely in current implementations), it would be reasonable for this code to crash.

                                                                                1. 1

                                                                                  yes this is UB, as far as the standard is concerned the value of x is indeterminate, and using that value in any form is undefined behaviour.

                                                                                  from http://port70.net/~nsz/c/c99/n1256.html#J.2:

                                                                                  The value of an object with automatic storage duration is used while it is indeterminate

                                                                            1. 7

                                                                              People are all very quick to point out how great reducing friction is when they’re using a faster compiler, faster tests, when they’ve achieved fluency in vim or emacs or whatever, but everyone always jumps to tell you to just think harder if you say typing fast matters. I don’t get it.

                                                                              I would like to hear an “it doesn’t matter” post from someone who went from slow typing to fast.

                                                                              1. 3

                                                                                But all the ones you mention above are challenged! The Rust community has tons of people that hold the opinion that compiler speed doesn’t hold them back!

                                                                                And the point is also not “slow to fast”, the point is “if you are already mediocre at this, should you invest more time”?

                                                                                1. 1

                                                                                  The Rust community has tons of people that hold the opinion that compiler speed doesn’t hold them back!

                                                                                  I’m surprised to hear that, it’s been my experience that the slow compile times are very frustrating for rapid iteration because running the tests takes so long. I’m talking specifically about cargo test and similar, things that cargo check doesn’t help with because it only detects compile time errors.

                                                                                  I mentioned this a while back when cargo -Z timings came out, a surprising amount of the compile time after the first initial build comes from link times: https://internals.rust-lang.org/t/exploring-crate-graph-build-times-with-cargo-build-ztimings/10975/7

                                                                                  1. 2

                                                                                    Yep link time is non-trivial. lld can help here if you’re on a supported platform. E.g. here’s a 50% reduction in debug build time on a game. Here’s an example by the Embark game studio for how they speed up compile times with lld on Windows.

                                                                                    1. 1

                                                                                      Some people just don’t care about rapid iteration that much. If you do, yes, it’s a problem.

                                                                                  2. 1

                                                                                    This is an excellent point that I wish I had detailed more clearly in the post. It certainly feels odd that people will optimize so many of these other areas of friction, but then say that improving typing is not valuable.

                                                                                    1. 2

                                                                                      I’d rewrite it with those things in mind. There’s tangible benefits to improving typing and it is a valuable skill, but giving some guidance on what should trigger you to improve on typing would be great!

                                                                                  1. 13

                                                                                    Fascinating, I never would have suspected that the backslash would take “precedence” over the single-line comment. The trigraphs are an interesting historical factoid, but I’d have been just as stumped by a comment ending with a regular backslash.

                                                                                    1. 12

                                                                                      If that’s really true, then this is the real “WAT?” in my opinion, and making this an article about trigraphs only clouds the message - I kinda didn’t catch the comment-extension issue, because all the time I was thinking only “meh, who sane would even enable trigraphs in the first place?” (Kinda similar level of a practical joke as #define while if in my opinion. I.e. “yes, sure, theoretically there could be some extreme reason to use it in some code; now please revert this change, take your chair to the corner of the room, and write 100 times in your notebook ‘I will never do this again’.”.)

                                                                                      1. 8

                                                                                        #define while if

                                                                                        I would like to thank you for this advice.

                                                                                        1. 5

                                                                                          This is also a lot of fun when someone has a Makefile with either a lot of CFLAGS or uses environment variables for CFLAGS, because then a simple -Dwhile=if usually breaks everything.

                                                                                          1. 1

                                                                                            I like your style.

                                                                                      2. 6

                                                                                        Wikipedia mentions another interesting example:

                                                                                        /??/
                                                                                        * A comment *??/
                                                                                        /
                                                                                        

                                                                                        This actually is a valid block comment.

                                                                                        1. 5

                                                                                          Yes, deleting backslashes before newlines happens very early in the compiler, even before preprocessing: http://port70.net/~nsz/c/c99/n1256.html#5.1.1.2

                                                                                          The only thing that happens earlier is trigraph substitutions.
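
                                                                                          A small sketch of that ordering, with no trigraphs involved. The names splice_demo and ADD are just for illustration. As far as I know GCC and Clang warn about the multi-line comment under -Wall, but they still compile it.

                                                                                          ```c
                                                                                          /* Backslash-newline is also what lets a macro body span lines. */
                                                                                          #define ADD(a, b) \
                                                                                              ((a) + (b))

                                                                                          /* The backslash-newline at the end of the `//` comment is deleted
                                                                                             before comments are recognized, so the comment swallows the
                                                                                             following source line. */
                                                                                          int splice_demo(void) {
                                                                                              int x = 1;
                                                                                              // is the next line still code? \
                                                                                              x = 2;
                                                                                              return x; /* still 1: `x = 2;` became part of the comment */
                                                                                          }
                                                                                          ```

                                                                                          Calling splice_demo() gives 1, not 2, and ADD(2, 3) expands to a single parenthesized expression.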

                                                                                        1. 3

                                                                                          Great article! I noticed you mentioned that f(sleep(7)) is legal but not why. This is described in 6.5.2.2: http://port70.net/~nsz/c/c99/n1256.html#6.5.2.2p6

                                                                                          If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions. If the number of arguments does not equal the number of parameters, the behavior is undefined.

                                                                                          Notice how it never mentions how to determine the number of parameters (clearly you can’t find it from the prototype). This is a concession to pre-C89 code, which did not have function prototypes at all! I actually tried to compile some of my professor’s code from 1989 which was using this feature a while back.

                                                                                          Some more discussion of this misfeature here: https://github.com/jyn514/rcc/issues/61
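
                                                                                          Since calling unprototyped functions is hard to demonstrate portably, here is a sketch that shows the same default argument promotions through a variadic function instead (6.5.2.2p7 applies them to the trailing arguments). The function names are hypothetical, not from the article or from rcc.

                                                                                          ```c
                                                                                          #include <stdarg.h>

                                                                                          /* Sums `count` variadic arguments, reading each one as double.
                                                                                             A float argument arrives as double because of the default
                                                                                             argument promotions, so va_arg must ask for double, not float. */
                                                                                          double sum_promoted(int count, ...) {
                                                                                              va_list ap;
                                                                                              double total = 0.0;
                                                                                              va_start(ap, count);
                                                                                              for (int i = 0; i < count; i++) {
                                                                                                  total += va_arg(ap, double);
                                                                                              }
                                                                                              va_end(ap);
                                                                                              return total;
                                                                                          }

                                                                                          /* Reads the first variadic argument as int; char and short
                                                                                             arguments arrive as int after the integer promotions. */
                                                                                          int first_as_int(int count, ...) {
                                                                                              va_list ap;
                                                                                              va_start(ap, count);
                                                                                              int value = va_arg(ap, int);
                                                                                              va_end(ap);
                                                                                              return value;
                                                                                          }
                                                                                          ```

                                                                                          For example, sum_promoted(2, 1.5f, 2.5f) reads both float arguments back as double, and first_as_int(1, c) with a char c reads it back as an int.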

                                                                                          1. 2

                                                                                            Wow, what an amazing contribution, thank you!

                                                                                            Does an integer promotion then also mean that the argument is automatically executed?

                                                                                            1. 3

                                                                                              I am not sure what you mean by automatically executed. The arguments to a function are evaluated in an unspecified order before the function is called, but integer promotion means something different: types are promoted to int or unsigned int before being passed, as per 6.3.1.1 (http://port70.net/~nsz/c/c99/n1256.html#6.3.1.1p2):

                                                                                              If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.48) All other types are unchanged by the integer promotions.
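
                                                                                              A quick sketch of what that means in practice (the function names are made up for the example): arithmetic on small integer types is done in int, so the result is not truncated to the width of the operands.

                                                                                              ```c
                                                                                              /* Both operands are promoted to int before the addition, so the
                                                                                                 sum is not wrapped to the range of unsigned char. */
                                                                                              int add_bytes(unsigned char a, unsigned char b) {
                                                                                                  return a + b;
                                                                                              }

                                                                                              /* The expression `c + c` has type int even though c is a char,
                                                                                                 which sizeof can observe without evaluating it. */
                                                                                              int promoted_size_matches_int(void) {
                                                                                                  char c = 1;
                                                                                                  return sizeof(c + c) == sizeof(int);
                                                                                              }
                                                                                              ```

                                                                                              So add_bytes(200, 100) is 300 rather than 44, and promoted_size_matches_int() returns 1.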

                                                                                              1. 2

                                                                                                The arguments to a function are evaluated in an unspecified order before the function is called, […]

                                                                                                I was actually referring to this when I said executed. Thank you!

                                                                                          1. -1

                                                                                            According to section 6.2.5.12, integers are arithmetic types. This, in combination with the second rule, makes that i will now be 0.

                                                                                            But the text you cite says: “If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:”. I fail to see how i; is equivalent to static i;. As far as I can tell, i; being initialized to zero just so happened to be done by the compiler and/or resident memory, conveniently, but there’s no actual guarantee of that.

                                                                                            Plus i is implicitly (signed) int, so --i; is signed integer overflow, and also well into undefined behavior territory. I’d imagine a compiler would be well within its rights to just optimize the entire function to nothing because UB occurs first thing, since once you hit UB, all bets are off.

                                                                                            1. 6

                                                                                              i has external linkage, and static storage duration.

                                                                                              C99 6.2.2p5

                                                                                              If the declaration of an identifier for an object has file scope and no storage-class specifier, its linkage is external.

                                                                                              C99 6.2.4p3

                                                                                              An object whose identifier is declared with external or internal linkage, or with the storage-class specifier static has static storage duration. Its lifetime is the entire execution of the program and its stored value is initialized only once, prior to program startup.

                                                                                              The static storage-class specifier means that the linkage is internal or none, depending on whether the declaration is at file scope or block scope, and the storage duration is static. Objects with external linkage (for example extern int i;), or no linkage (for example static int i; at block scope), also have static storage duration.
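
                                                                                              A minimal sketch of both cases, assuming nothing beyond the paragraphs quoted above (next_id and initial_i are illustrative names only):

                                                                                              ```c
                                                                                              /* File scope, no storage-class specifier: external linkage and
                                                                                                 static storage duration, so it starts at 0. */
                                                                                              int i;

                                                                                              int initial_i(void) {
                                                                                                  return i;
                                                                                              }

                                                                                              /* Block-scope static: no linkage, but still static storage
                                                                                                 duration. It is initialized once, to 0, and keeps its value
                                                                                                 between calls. */
                                                                                              int next_id(void) {
                                                                                                  static int counter;
                                                                                                  return ++counter;
                                                                                              }
                                                                                              ```

                                                                                              initial_i() returns 0 as long as nothing has assigned to i, and successive calls to next_id() return 1, 2, and so on.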

                                                                                              1. 3

                                                                                                Thank you so much for taking the time to look up the relevant pieces in the standard!

                                                                                                1. 5

                                                                                                  No problem :) I spent a long time studying these pieces of the standard when writing cproc, and I know how tricky they are.

                                                                                                2. 2

                                                                                                  Drats, I actually got out-language-lawyered. Learned something new, cheers!

                                                                                                  1. 2

                                                                                                    Great link to an HTML version of the standard, I’ve been using the PDF and it’s much harder to navigate. I’m very impressed by your compiler too, it’s much further along than mine: https://github.com/jyn514/rcc.

                                                                                                    I noticed your compiler is a little inconsistent about functions without prototypes:

                                                                                                    $ ./cproc-qbe
                                                                                                    int f() { return 0; }
                                                                                                    int main() { f(1); }
                                                                                                    export
                                                                                                    function w $f() {
                                                                                                    @start.1
                                                                                                    @body.2
                                                                                                    	ret 0
                                                                                                    }
                                                                                                    <stdin>:2:17: error: too many arguments for function call
                                                                                                    $ ./cproc-qbe
                                                                                                    int f();
                                                                                                    int main() { return f(1); }
                                                                                                    export
                                                                                                    function w $main() {
                                                                                                    @start.1
                                                                                                    @body.2
                                                                                                    	%.1 =w call $f(w 1)
                                                                                                    	ret %.1
                                                                                                    }
                                                                                                    
                                                                                                    1. 3

                                                                                                      The difference between

                                                                                                      int f();
                                                                                                      

                                                                                                      and

                                                                                                      int f() { return 0; }
                                                                                                      

                                                                                                      is that the first declaration specifies no information about the parameters, and the second specifies that the function has no parameters. When calling a function, the number of parameters must match the number of arguments, so I believe the error message is correct here.

                                                                                                      C99 6.7.5.3p14

                                                                                                      An identifier list declares only the identifiers of the parameters of the function. An empty list in a function declarator that is part of a definition of that function specifies that the function has no parameters. The empty list in a function declarator that is not part of a definition of that function specifies that no information about the number or types of the parameters is supplied.

                                                                                                      C99 6.5.2.2p6

                                                                                                      If the number of arguments does not equal the number of parameters, the behavior is undefined.

                                                                                                      I’m very glad C2X is removing function definitions with identifier lists (n2432). So int f() { return 0; } will actually be the same thing as int f(void) { return 0; }.

                                                                                                      1. 3

                                                                                                        I missed 6.7, thank you! That makes things easier for me to implement I think :)

                                                                                                  2. 4

                                                                                                    I am always afraid to answer to detailed questions like these, because I am still learning a lot about C and not always sure. However, I believe that i actually has a static storage duration, because it is defined outside of the scope of a function or whatsoever. This means that the variable will be persistent throughout the whole program.

                                                                                                    Edit: I said that i would have an automatic storage duration, but the arguments are for a static storage duration. This is what I actually meant to write. My apologies for the inconvenience.

                                                                                                    The second point is a good one, about which I responded earlier in a comment under my post. You depend on your compiler for that indeed. This line was written for gcc specifically however.

                                                                                                    1. 0

                                                                                                      I am always afraid to answer to detailed questions like these, because I am still learning a lot about C and not always sure.

                                                                                                      Very few people actually know C. Given that writing it is actually just an extremely elaborate exercise in language lawyering, it truly is a language only a lawyer could love. The worst that could happen is that you get corrected if you give a wrong response—meaning that you’d learn something from it.

                                                                                                      However, I believe that i actually has an automatic storage duration, because it is defined outside of the scope of a function or whatsoever. This means that the variable will be persistent throughout the whole program.

                                                                                                      Indeed so, that’s my understanding as well. But that also means that its initial value is indeterminate: Automatic storage duration is mutually exclusive with static storage duration. Therefore, you cannot actually get the zero-initialization you’d get from static storage duration, and instead the value is indeterminate because it has automatic storage duration.

                                                                                                      1. 6

                                                                                                        Every object declared at file scope has static storage duration. The only objects that have automatic storage duration are those declared at block scope (inside a function) that don’t have the storage-class specifier static or extern.

                                                                                                        1. 3

                                                                                                          The worst that could happen is that you get corrected if you give a wrong response—meaning that you’d learn something from it.

                                                                                                          I agree, which is why I always answer indeed. Thank you.

                                                                                                          Indeed so, that’s my understanding as well. But that also means that its initial value is indeterminate: automatic storage duration is mutually exclusive with static storage duration.

                                                                                                          I changed my post above. I made a mistake while writing it, due to lack of time. The fact that i is defined outside the scope of any function means that it is static, as mcf described.

                                                                                                    1. 3

                                                                                                      Interesting way of using a crate to publish blog posts.

                                                                                                      1. 3

                                                                                                        There have been even more inventive things in the past. Docs.rs allows arbitrary JavaScript (which is fine because it doesn’t have any authentication), so you end up with pages like https://docs.rs/pwnies/0.0.13/pwnies/

                                                                                                        1. 5

                                                                                                          I’m not convinced it’s that harmless. E.g. the pwnies crate could overlay a convincing fake docs.rs UI, and get you to download compromised source code if you follow manipulated links.

                                                                                                          1. 4

                                                                                                            Unfortunately it’s pretty hard to prevent such cases.

                                                                                                            Due to the nature of Rust builds (build scripts and proc macros can execute arbitrary code) all the HTML generated by rustdoc has to be treated as untrusted. Making things worse, rustdoc uses inline scripts and styles, so adding a CSP is probably not something we’ll be able to do. Even if rustdoc is tweaked to avoid emitting inline stuff, all the documentation generated in the past still uses those, and rebuilding everything from scratch is not really feasible anymore.

                                                                                                            We’re still trying to think of ways to prevent the issue, but we haven’t come up with anything good yet. In the meantime, if you find something malicious hosted on docs.rs, just contact the security team and we’ll remove it ASAP.

                                                                                                            1. 3

                                                                                                              There are crates like ammonia that will parse and sanitize HTML. This could be used on included HTML files and output from the markdown formatter.

                                                                                                              There are probably a few more holes in rustdoc from naive text-in-html concatenation, but these can be fixed by escaping.

                                                                                                              1. 1

                                                                                                                The problem is we can’t trust the output of rustdoc at all, as there are ways to bypass it completely if someone really wants to.

                                                                                                              2. 1

                                                                                                                Put it in an iframe that’s (invisibly) hosted on a subdomain of a sandbox domain, e.g. crate-name.sandbox-for-docs.rs

                                                                                                                Or use stuff like ammonia, bleach (python), dompurify (js) to sanitize bad stuff but keep “normal” html.

                                                                                                                1. 1

                                                                                                                  Put it in an iframe that’s (invisibly) hosted on a subdomain of a sandbox domain, e.g. crate-name.sandbox-for-docs.rs

                                                                                                                  That was actually an idea I had a few weeks ago, but there are still a lot of open questions about UX and SEO we need to figure out before fully considering it.

                                                                                                                  Or use stuff like ammonia, bleach (python), dompurify (js) to sanitize bad stuff but keep “normal” html.

                                                                                                                  We can’t trust rustdoc to sanitize stuff.

                                                                                                              3. 2

                                                                                                                Well, downloading the source code from docs.rs is neither the easiest nor the safest way to get code … I’m not sure how likely that is in practice.

                                                                                                          1. 6

                                                                                                            My compiler had the craziest bug for a while. If you passed an array to a function and then indexed it specifically using *(p + offset), the resulting binary would segfault. Every other use case worked fine: using it in the original function or using p[offset] was no problem, and it was just this specific form that broke. The craziest thing is they’re handled by the same function in the compiler!

                                                                                                            It turned out that I had like 3 different errors that all cancelled each other out except in this exact case. I couldn’t believe it for a while, every time I fixed a bug it would break everything else. The way I figured it out was just writing down all the steps and convincing myself every step along the way was accurate, then going from there. That finally let me figure out what the original bug was - I was treating *(p+offset) the same as &p[offset], not the same as p[offset]!
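
                                                                                                            For reference, a small C sketch of the three forms involved (the function names are hypothetical, not taken from the compiler). C defines p[offset] as *(p + offset), so the first two load the same element, while &p[offset] only computes the address p + offset and loads nothing.

                                                                                                            ```c
                                                                                                            #include <stddef.h>

                                                                                                            /* Loads an element through explicit pointer arithmetic. */
                                                                                                            int load_deref(const int *p, size_t offset) {
                                                                                                                return *(p + offset);
                                                                                                            }

                                                                                                            /* Loads the same element; p[offset] is defined as *(p + offset). */
                                                                                                            int load_index(const int *p, size_t offset) {
                                                                                                                return p[offset];
                                                                                                            }

                                                                                                            /* Computes the element's address without loading it; equal to p + offset. */
                                                                                                            const int *address_of(const int *p, size_t offset) {
                                                                                                                return &p[offset];
                                                                                                            }
                                                                                                            ```

                                                                                                            Given int a[3] = {10, 20, 30}, both load_deref(a, 2) and load_index(a, 2) yield 30, whereas address_of(a, 2) yields the pointer a + 2. Treating the dereference like the address-of form skips the load entirely.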

                                                                                                            I don’t want to say how many hours of my life this cost me but you can kind of see it from the bug report: https://github.com/jyn514/rcc/issues/123

                                                                                                            1. 4

                                                                                                              Another bug I fixed by writing things down: For one of my classes I had to write a MIPS program to calculate the square root of a number in fixed point. I had the sqrt algorithm itself working, but the number was stored in hexadecimal format and for the life of me I couldn’t figure out how to convert it to decimal. We’d done something similar previously, but it only worked for integers.

                                                                                                              I had the idea to convert one part at a time: convert the integer part of the number to decimal, then convert the fractional part, then do the proper shifts and OR them together. This works well when the fraction is small but it breaks down when the fraction gets too big: 0xfffff (the largest possible fraction in our format) is more than 5 decimal digits, so when I OR’ed it it would overwrite parts of the integer.

                                                                                                              The professor gave us a different algorithm, but I had no idea how it worked. It went like this:

                                                                                                              • Multiply value by 10^5 (100000)
                                                                                                              • Shift 13 bits to the right
                                                                                                              • If bit 0 is set then add 2
                                                                                                              • Shift one bit to the right
                                                                                                              • Use the algorithm from the last lab

                                                                                                              I tried implementing this and it gave me the wrong answer every time; I had no clue what was going wrong. I wrote up a very lengthy email that looked something like this:

                                                                                                              I’ve tried multiplying by 100,000 both as fixed point (shifted left 14 bits) and without shifting, but neither works. Do you have any idea what I’m doing wrong? I know that it’s going wrong somewhere between here and bin2dec because $a0 is 0x0003a120 and it should be hex(5_00000) == 0x7a120. Actually, now that I look at it, it seems to only be missing the highest digit?

                                                                                                              I never sent that email. The second I compared the actual value to the expected I saw the problem - the highest bits after the multiplication were getting cut off! I drew some diagrams and after that the lab was easy :)
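
                                                                                                              The professor’s steps and the cut-off bits can be sketched in C. This assumes the 14 fractional bits mentioned in the email; the function names are hypothetical, and a uint64_t stands in for the MIPS hi/lo register pair. The first function keeps the full product, the second keeps only the low 32 bits, which is roughly the mistake described above.

                                                                                                              ```c
                                                                                                              #include <stdint.h>

                                                                                                              #define FRAC_BITS 14  /* assumption: 14 fractional bits, as in the email */

                                                                                                              /* Returns the fixed-point value times 10^5, rounded to the nearest integer,
                                                                                                                 using the full 64-bit product (hi and lo together). */
                                                                                                              uint64_t to_scaled_decimal(uint32_t fixed) {
                                                                                                                  uint64_t product = (uint64_t)fixed * 100000u;  /* multiply by 10^5 */
                                                                                                                  product >>= FRAC_BITS - 1;                     /* shift 13 bits to the right */
                                                                                                                  if (product & 1u) {
                                                                                                                      product += 2;                              /* bit 0 set: round up */
                                                                                                                  }
                                                                                                                  return product >> 1;                           /* shift one more bit */
                                                                                                              }

                                                                                                              /* Same steps, but only the low 32 bits of the product survive (lo without hi). */
                                                                                                              uint32_t to_scaled_decimal_truncated(uint32_t fixed) {
                                                                                                                  uint32_t product = fixed * 100000u;            /* high bits are lost here */
                                                                                                                  product >>= FRAC_BITS - 1;
                                                                                                                  if (product & 1u) {
                                                                                                                      product += 2;
                                                                                                                  }
                                                                                                                  return product >> 1;
                                                                                                              }
                                                                                                              ```

                                                                                                              With 5.0 encoded as 5 << 14, to_scaled_decimal returns 500000 (0x7a120), while to_scaled_decimal_truncated returns 0x3a120, the value seen in $a0.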

                                                                                                              Here are the diagrams:

                                                                                                              # keep the overflow!
                                                                                                              # |     hi (t2)     |       |     lo (t1)      |
                                                                                                              # |_________________|       |__________________|
                                                                                                              #   18 - O  | 14 - I         4-I| 14-P | 13 - E
                                                                                                              #            --------------------------          <-- this is what we've currently got in $v0
                                                                                                              # We've kept I and P, but since we're shifting right 13 bits, we need to get those overflow bits out of hi
                                                                                                              # |     hi (t2)     |       |     lo (t1)       |
                                                                                                              # |_________________|       |___________________|
                                                                                                              #  14-Z | 5-X |13 - O          18 - I  | 14 - P
                                                                                                              #              -------------------------- <-- this is what we want to keep (1 bit of P)
                                                                                                              # Z - zeros
                                                                                                              # X - discarded
                                                                                                              # O - overflow
                                                                                                              # I - integer
                                                                                                              # P - original precision
                                                                                                              # E - extra precision