1. 4

    Here is mine. This is Emacs using my own custom theme and using a cool mode called purpose that ensures that when I open text files, they go on the left, and the magit, compilation, and shell windows stay where they are.

    1. 5

      Put this in your .emacs and then replace Item 5 with “Use C-x n to get a new scratch buffer at any time. Careful, scratch buffers aren’t auto-saved or backed up, nor recovered after a power loss (unlike regular file buffers).”

      (defun create-new-scratch-buffer ()
        "Create a new scratch buffer to work in (named *scratch* - *scratchN*)."
        (interactive)
        (let ((n 0) bufname)
          (while (progn
                   (setq bufname (concat "*scratch"
                                         (if (= n 0) "" (int-to-string n))
                                         "*"))
                   (setq n (1+ n))
                   (get-buffer bufname)))
          (switch-to-buffer (get-buffer-create bufname))
          (if (= n 1) (funcall initial-major-mode)))) ; 1, because n was incremented
      (global-set-key (kbd "C-x n") 'create-new-scratch-buffer)
      1. 7

        Careful, C-x n will shadow the narrow functions. If you use those, you may want to pick a different shortcut.

        1. 2

          This is a good point that is often ignored when people share or recommend customizations. I use a lot of the functionality of the standard keybindings and got frustrated with trying to shift things around. I eventually gave up and went with a custom keymap reachable via the mode-specific-map, which is always there.

          (defvar woz-map (make-sparse-keymap))
          (define-key mode-specific-map (kbd "a") woz-map)
          (define-key woz-map (kbd "n") 'create-new-scratch-buffer)

          Now all the extra functions I want on keys are reachable via C-c a, which is pretty easy to type.

          1. 2

            One thing I do to avoid shadowing is to map my custom stuff to Hyper combos, and remap Caps Lock to Hyper.

            1. 1

              Aha! Here is a method to avoid this in the future:

              C-x n F1

              In fact, any prefix followed by F1 will show the members of the prefix family. Thanks, #emacs!

              1. 1

                Hm, C-x n isn’t bound in my emacs -Q… I’m using GNU Emacs 26.0.50 (build 4, x86_64-w64-mingw32) of 2017-08-07.

                OH WHAT? C-x n n is bound! TIL: https://www.emacswiki.org/emacs/Narrowing

                1. 2

                  Yeah, C-x n is the prefix for a number of keybindings:

                  Global Bindings Starting With C-x n:
                  key             binding
                  ---             -------
                  C-x n d         narrow-to-defun
                  C-x n n         narrow-to-region
                  C-x n p         narrow-to-page
                  C-x n w         widen

                  I use narrow-to-defun and widen daily.

              2. 3

                This is pretty self-explanatory, and I was shocked at how little time it took for this hotkey to become habitual. I actually use this key when I can’t remember some other custom keybinding, because, well.. you’ll see:

                ;; open init file (.emacs) with F7
                (defun edit-init-file ()
                  "Open the user init file."
                  (interactive)
                  (find-file user-init-file))
                (global-set-key (kbd "<f7>") 'edit-init-file)
                1. 2

                  Thanks, I love it! 👍👍

                  1. 1

                    I have this snippet from around 2009 but I’m going to try yours.

                    ;; Drew Adams's suggestion for a fast scratch buffer
                    (defun create-scratch-buffer nil
                      "Create a scratch buffer."
                      (interactive)
                      (switch-to-buffer (get-buffer-create "*scratch*")))
                  1. 16

                    This change takes a concise and clear functional-style implementation and updates it to a long, complex, and apparently buggy imperative-style.

                    True, but the numbers in the PR indicate that the new version was faster as n grew. I don’t know how often str::repeat is used, but — and this ties in with the point you make about values — when I use standard library functions in Rust, I expect them to be as performant as possible.

                    What does the community think of the story of this bug? Is there more to it?

                    Every software project has stories like this. In fact, I believe that one of the remote holes in the OpenBSD install was exactly this, an integer overflow in a calloc. What’s important is that the project react quickly and responsibly; I feel that the Rust team did that here.

                    When are optimizations which reduce readability justified?

                    In the standard library, I think they are almost always justified, because that code will be executed much more often than it’ll be read. If not, you might end up like OCaml with 2-3 competing standard libraries.

                    It’s a shame that there was an issue with str::repeat, but for Rust, this is an uncommon scenario so far. How many functions were rewritten from a simple, but slower, style to a more complex, but faster, style without problems? Hopefully, the story of str::repeat is the zebra.
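                    For flavor, here is a sketch of the kind of overflow check at issue (my own simplified version, not the actual std::str code): the str::repeat bug came down to a capacity computation that could overflow, and checked arithmetic is the standard guard.

```rust
// Hypothetical, simplified repeat with an overflow-checked capacity
// computation. This is a sketch of the idea, not the real std fix.
fn checked_repeat(s: &str, n: usize) -> Option<String> {
    // checked_mul returns None on overflow instead of silently wrapping.
    let cap = s.len().checked_mul(n)?;
    let mut out = String::with_capacity(cap);
    for _ in 0..n {
        out.push_str(s);
    }
    Some(out)
}
```

                    The readable version is barely longer than the naive one; the cost of the real optimization was in the buffer-filling strategy, not the check itself.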

                    What values should a programming language’s standard library prioritize?

                    That depends on the programming language. In Rust, I think that safety and speed are the qualities that should be optimized. This has led the design of Rust for many years — before even 0.1 was released, Rust was GC’ed, but that was removed in favor of the current ownership system, because a GC did not meet the performance expectations that people had of Rust.

                    1. 7

                      In fact, I believe that one of the remote holes in the OpenBSD install was exactly this, an integer overflow in a calloc.

                      Kind of, but the opposite: the caller in sshd was hand-multiplying numbers to pass to malloc instead of using calloc.

                      1. 1

                        you might end up like OCaml with 2-3 competing standard libraries.

                        Is that the story of how the split happened? Inertia due to readability concerns?

                        1. 6

                          The Jane Street standard library partly came about because common list functions (e.g., map or filter) in the OCaml standard library are not tail-recursive. For many functions, straight recursion is simpler than tail-recursion, but the folks at Jane Street felt that avoiding stack overflow errors was justification enough to have a slightly more complex implementation.
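                          The trade-off is easy to sketch (in Rust here, purely for illustration; the Jane Street code is OCaml): the straight recursive map spends a stack frame per element, while the accumulator/loop shape runs in constant stack space at the cost of a slightly busier definition.

```rust
// Straight recursion: one stack frame per element, so a long enough
// input will overflow the stack.
fn map_rec(xs: &[i32], f: &dyn Fn(i32) -> i32) -> Vec<i32> {
    match xs.split_first() {
        None => Vec::new(),
        Some((&x, rest)) => {
            let mut v = vec![f(x)];
            v.extend(map_rec(rest, f));
            v
        }
    }
}

// The "slightly more complex" shape: an explicit accumulator,
// constant stack space -- the moral equivalent of tail recursion.
fn map_acc(xs: &[i32], f: &dyn Fn(i32) -> i32) -> Vec<i32> {
    let mut acc = Vec::with_capacity(xs.len());
    for &x in xs {
        acc.push(f(x));
    }
    acc
}
```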

                          1. 1


                      1. 12

                        For the C version, this is a binary search tree

                        Rust (interestingly) doesn’t offer a binary search tree — and it is instead implemented with a BTreeSet

                        So in the end, this compares a particular C implementation of an algorithm and a particular Rust implementation of a different algorithm ?

                        1. 22

                          I think the implicit principle you’re appealing to has consequences you might not appreciate. You can write unsafe code in Java, you can use object pools, you can roll your own string class, you can avoid ever triggering garbage collection, you can write custom bytecode. You can solve a problem “in Haskell” by writing Haskell code that compiles a custom DSL to machine code. All these things are sometimes reasonable, and it is important that you know about them in order to understand the options that you have when trying to meet some performance target.

                          However, performance is almost always an economic consideration, rather than a question of what is theoretically possible. A language may be better suited to achieving certain performance because it is easier to do in that language, even though it is possible in another.

                          Having put it in my words, let me quote extensively from Brian’s post, because I can’t help but think that he already answered you.

                          To the contrary, I think that it is a reasonable assumption that, for any task, a lower-level language can always be made to outperform a higher-level one.

                          Indeed, it might be tempting to conclude that, because a significant fraction of the delta here is the difference in data structures (i.e., BST vs. B-tree), the difference in language (i.e., C vs. Rust) doesn’t matter at all.

                          But that would be overlooking something important: part of the reason that using a BST (and in particular, an AVL tree) was easy for me is because we have an AVL tree implementation built as an intrusive data structure. This is a pattern we use a bunch in C….Implementing a B-tree this way, however, would be a mess. The value of a B-tree is in the contiguity of nodes — that is, it is the allocation that is a core part of the win of the data structure. I’m sure it isn’t impossible to implement an intrusive B-tree in C, but it would require so much more caller cooperation (and therefore a more complicated and more error-prone interface) that I do imagine that it would have you questioning life choices quite a bit along the way.

                          Contrast this to Rust: intrusive data structures are possible in Rust, but they are essentially an anti-pattern….All of this adds up to the existential win of Rust: powerful abstractions without sacrificing performance. Does this mean that Rust will always outperform C? No, of course not. But it does mean that you shouldn’t be surprised when it does — and that if you care about performance and you are implementing new software, it is probably past time to give Rust a very serious look!

                          1. 3

                            I’m missing a step in his argument. He goes from “it is convenient to use intrusive data structures to implement AVL trees in C” (paraphrase) to “it would not be convenient to implement a B-tree using intrusive data structures in C” to “Rust is better because it discourages use of intrusive data structures”. But then, assuming he is correct, why do we have a requirement that the C B-tree use intrusive data structures?

                            1. 2

                              I think he’s implying that C lends itself to intrusive data structures in these cases, but I don’t know enough to have my own opinion about whether that’s the case.

                              My point is really just that I don’t think he’s making an obvious mistake about how to compare languages, like the OP suggested.

                              1. 0

                                Well, he writes:

                                I’m sure it isn’t impossible to implement an intrusive B-tree in C, but it would require so much more caller cooperation …

                                I think he had a C AVL tree and a Rust B-tree and then came up with an excuse why comparing the two was reasonable.

                            2. 2

                              Thank you for the clarifications. It looks like there are different ways to read the same title. The actual content is interesting and not as ambiguous.

                              1. 3

                                I think you’re right that the title suggests a comparison that’s not really in the post.

                                I don’t know what a better one would be. We could go 19th century: “Some considerations on the relative performance of C and Rust programs when written naturally, and how the language might contribute to those differences”, but it’s a mouthful.

                                1. 2

                                  “Two programs with differing performance”

                            3. 1

                              I’m with you. Comparisons of languages talking about performance should be done with the same algorithms where possible. Otherwise, it’s harder to say which differences are due to algorithm differences instead of the language. Essentially, we want to eliminate variables, but the author introduced them. The explanation I saw in a quick skim is this:

                              “For the C version, this is a binary search tree (an AVL tree), but Rust (interestingly) doesn’t offer a binary search tree — and it is instead implemented with a BTreeSet, which implements a B-tree.”

                              The author used a B-tree, not an AVL tree, for Rust, since Rust didn’t have an AVL tree. If comparing apples to apples, the more obvious route is to use one of the many B-tree implementations in C to compare against Rust’s B-tree. One might also look at structure to see if they’re similar in implementation to further eliminate the variable of algorithmic differences. That would be my performance comparison. Then, I’d look at developer effort of implementation or its maintenance (esp. readability), possibly comparing some variations in C and Rust along that spectrum and assessing the effect on performance.

                              “I reimplemented a body of C software in Rust, and it performed better for the same task; what’s going on? And is there anything broader we can say about these results?” (my emphasis)

                              I wouldn’t believe anything broader based on this single case study. All it tells me is that (a) a program ran fast at a specific task in Rust and (b) it was faster than a C program with a different algorithm using a specific compiler configuration. That’s literally all it proves. This case study is useful if you’re doing something very similar, know Rust’s benefits, and want to assess whether your app could perform well if written in Rust.

                              On a good note, it also has some tool mentions and deep analysis that people might enjoy reading and/or find useful. Those seem to be the practical benefits of reading this article.

                              1. -2

                                Ugh. I got 1/4 into the post to this point and shouted out loud, “well, you can’t compare that!”

                                The title should be “the relative performance of avltree and btree for a particular application”

                                1. 20

                                  The sad thing is that this is an interesting blog post, yet almost all the discussion across several sites is over two words in the damn title.

                                  Don’t discuss the data structures themselves, the cache effects, the large difference in split loads/stores, the use of DTrace, the demonstration of active benchmarking, or anything remotely technical. :(

                                  Blog post bike-shedding.

                                  1. 19

                                    Of course you can; in fact, I much prefer those kind of comparisons, because they are more telling!

                                    It is way more interesting to know how well a program performs when you write it as it’s meant to be written in the implementation language, rather than twisting and deforming it, because you want to replicate exactly what you did in another language (which Bryan tried to do if you read his previous post). Rust provides generic data structures out of the box; the right way to write programs in Rust is to use those data structures, not to tangle yourself in a mess of intrusive pointers. Bryan wrote a program that is (a) idiomatic Rust, (b) faster than idiomatic C, and that’s very interesting to me. If you’re still pissed that the implementations are different, his code is open source: go implement the Hashmap and/or the BTree in C.

                                    1. [Comment removed by author]

                                      1. 9

                                        Why not? He’s comparing the time it takes for two programs to perform the same task. That’s performance evaluation.

                                1. 4

                                  I think his new design looks great! I’m also pretty happy that he does not impose a font on me; instead, he just declares the font-family to be Monospace (with a fallback to Courier) and that means that his website is displayed on my end with my monospace font of choice. (At the moment, that happens to be Input Mono.)

                                  1. 13

                                    You might find this interesting: it’s an attempt to predict bugs by language features. Unsuccessful, but still interesting enough for me to finish.


                                    1. 5

                                      edit: Hey, that is actually really cool and interesting (the point about Clojure is interesting too). It is also a pretty smart way to gather data in a situation where it is normally extremely hard to do so.

                                      Something I just read today too - less about bugs, but more about robustness


                                      1. 3

                                        Thanks! Good link too.

                                        Speaking of which, I highly recommend learning Haskell. It’s a lot of work, but it’s really changed how I think about programming. I would absolutely go back and do it again. It really makes the easy things hard (tiny scripts) but the hard things easy. Very much worth learning in my mind.

                                      2. 1

                                        While Tail Call Optimization would certainly be nice to have in Go to improve performance, in practice it’s not a cause of defects because people just use iteration instead of recursion to accomplish the same thing. It doesn’t look as “nice” but you don’t get stack overflows.

                                        1. 1

                                          Arguably that could be said of all the things on that list. Every programming language community has idioms to best use the available feature set.

                                          Specifically for recursion, I was assuming that the mental shuffle to convert something from recursion (often more elegant and simple) to iteration would cause issues. Since the whole model doesn’t work very well, I clearly was wrong in multiple places, and this very well could be one.

                                          1. 5

                                            Specifically for recursion, I was assuming that the mental shuffle to convert something from recursion (often more elegant and simple) to iteration would cause issues.

                                            I could be wrong, but I suspect most developers find iterative algorithms more straightforward to write iteratively, not recursively, and consider writing them recursively a mental shuffle.

                                            It wouldn’t surprise me if comfort with recursive algorithms is a predictor of developer proficiency, though.

                                            1. 2

                                              You are probably right, but I’d guess now that is more because most developers work in languages that don’t support recursion. Originally, I was also going for the idea that it offers a way for the developer to make a mistake without realizing it. In this case, they don’t realize that recursion isn’t tail optimized, since the language allows it without warning. But since I have yet to see anyone use recursion unless they are used to languages with immutability (and even then they probably just use fold), it probably doesn’t come up much.

                                              As such, it probably makes sense to remove that item, which doesn’t change much, just slightly raises the “c-style” languages and lowers the “lisp-style”.

                                              1. 2

                                                but I’d guess now that is more because most developers work in languages that don’t support recursion.

                                                Most people think about problem solving in an iterative way. They’ll do this, then this, maybe this conditionally, and so on. Imperative. Iterative. Few people think of their problems in a recursive way without being taught to do so. That’s probably why most prefer iterative algorithms in programming languages.

                                                1. 3

                                                  To fully shave the yak, I’d argue this is entirely a product of how programmers are taught. Human thinking doesn’t map perfectly to either format. Recursion is just saying, “now do it again but with these new values”, and iteration requires mutations and “storing state”. Neither are intuitive - both need to be learned. No one starts off thinking in loops mutating state.

                                                  Considering most programmers learn in languages without safe recursion, most programmers have written far more iterative loops and so are most skilled with them. That’s all, and this isn’t a bad thing.

                                                  1. 3

                                                    Neither might be very intuitive. Yet educational experience shows most students pick up iteration quickly but have a hard time with recursion. That’s people who are learning to program for the first time. That indicates the imperative, iterative style is closer to people’s normal way of thinking, or just more intuitive on average.

                                                    Glad we had this tangent, though, because I found an experimental study that took the analysis further than usual. I’ll submit it Saturday.

                                                    1. 1
                                                    2. 2

                                                      I agree. And I think there’s a lot that just isn’t possible with that mindset.

                                                2. 3

                                                  Specifically for recursion, I was assuming that the mental shuffle to convert something from recursion (often more elegant and simple) to iteration would cause issues.

                                                  I think it really depends on the algorithm. To my understanding, mapping and filtering is a lot easier recursively, but reducing and invariants tend to be easier iteratively.
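                                                  A tiny sketch of the same reduction written both ways (my example, not from the thread); which one feels like the “mental shuffle” mostly depends on which you reach for first:

```rust
// Recursive sum: close to the mathematical definition, but each
// element costs a stack frame (Rust does not guarantee tail calls).
fn sum_rec(xs: &[u64]) -> u64 {
    match xs.split_first() {
        None => 0,
        Some((&x, rest)) => x + sum_rec(rest),
    }
}

// Iterative sum: an explicit accumulator, constant stack space.
fn sum_iter(xs: &[u64]) -> u64 {
    let mut total = 0;
    for &x in xs {
        total += x;
    }
    total
}
```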

                                                3. 1

                                                  I think I remember reading Russ Cox doesn’t like tail recursion because you lose the debug information in the stack traces.

                                                  1. 2

                                                    This is a big pet peeve of mine: because many languages use pointers in stack traces, you can’t see what the values were at that time. I think storing the value instead of just the pointers would be expensive, but it sure would be useful.

                                                    1. 1

                                                      What information would you lose?

                                                      1. 2

                                                        I think that in this example, you’d think that initial directly called final:

                                                        def initial():
                                                            return intermediate()

                                                        def intermediate():
                                                            return final()

                                                        def final():
                                                            raise Exception("boom")

                                                        This could make it extremely hard to debug if intermediate happened to modify state and it was the reason why final was failing.

                                                        1. 1

                                                          I think the call stack may be convenient for this purpose, but not necessary. I’m sure there are other (potentially better & more flexible) ways to trace program execution.

                                                1. 8

                                                  Some other factors to consider:

                                                  1. Ease of using UTF-8 strings; I do not use OCaml much anymore, and this is one of the reasons. My brother’s name is Jérôme, not JÃ©rÃ´me.
                                                  2. Ease of concurrency and parallelism; many (most?) programs start off sequential, and at some point, someone wants to extract more out of the machine by making the program parallel.
                                                  3. Ease of error handling: I guess this is related to security, but if error handling is difficult or too verbose, there’s a very strong tendency to avoid doing it.
                                                  4. Ease of extracting more performance: some languages offer good performance by default, but with C, C++, and Rust you can actually go and get more performance by optimizing your data structures for cache.
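                                                  To make point 4 concrete, here is a toy sketch (mine, with arbitrary field sizes) of one cache-aware layout trick: keep the hot keys separate from the cold payloads, so a scan walks dense integers instead of dragging every record through the cache.

```rust
// "Struct of arrays" layout: keys are contiguous u64s, so scanning
// them touches far less memory than scanning full records would.
// The payload size here is made up for illustration.
struct Records {
    keys: Vec<u64>,
    payloads: Vec<[u8; 56]>, // cold data, only touched on a hit
}

fn count_matches(r: &Records, needle: u64) -> usize {
    r.keys.iter().filter(|&&k| k == needle).count()
}
```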
                                                  1. 2

                                                    That surprises me about OCaml since it was originally created in France.

                                                    1. 6

                                                      They use Latin-1, which has all the accented letters we use in French (à, â, é, è, ê, ë, î, ï, ù, û, ç).

                                                  1. 5

                                                    I’m slowly, painfully beating this over-generalization habit out of my system. It’s not easy, but I’d much rather think “why did I not generalize this?” when it turns out that there are other similar cases than think “OMG why did I generalize this!?” when there is only the one case.

                                                    An easy first step is to follow the Rule of Three: do not abstract/generalize until you have seen three different instances of the same problem.

                                                    1. 10

                                                      This might be a stupid question, but why not less instead of cat? Cat is not really meant to display files but to concatenate them. I’d definitely like a better pager, but neither w3m nor vi in view mode worked for me, so I’m still using less.

                                                      1. 5

                                                        Cat is still cat. Bat is like the kitchen sink that just clobbers more? Yeah, I don’t quite understand why this tool is positioning itself relative to cat.

                                                        It is definitely not a clone. But I am all for new, more usable terminal-based tools that use what we have at our disposal: more cores, more RAM, SSD read/write speeds.

                                                        I’d really like a tool that built an n-gram inverted index of all files below my current dir and allowed me to jump to anything, that showed similar tokens/phrases/lines, etc. All terminal-driven, with an option to load a rich GUI over a local HTTP connection.
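                                                        The core data structure of that wished-for tool fits in a few lines; a toy character-trigram index as a sketch (nothing like the full directory-walking, GUI-serving tool described):

```rust
use std::collections::HashMap;

// Map each character trigram to the line numbers it occurs on.
// A real tool would index whole files under a directory and
// persist the result; this only shows the inverted-index core.
fn trigram_index(lines: &[&str]) -> HashMap<String, Vec<usize>> {
    let mut index: HashMap<String, Vec<usize>> = HashMap::new();
    for (lineno, line) in lines.iter().enumerate() {
        let chars: Vec<char> = line.chars().collect();
        for w in chars.windows(3) {
            let gram: String = w.iter().collect();
            let entry = index.entry(gram).or_default();
            // Record each line at most once per trigram.
            if entry.last() != Some(&lineno) {
                entry.push(lineno);
            }
        }
    }
    index
}
```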

                                                        1. 3

                                                          Although I agree with you, I can see why this would be positioned as an alternative to cat.

                                                          Quite a lot of people use cat to preview files, and that’s what bat does. I know less and more exist, but for some reason I still find myself using cat. Perhaps other people do the same.

                                                          1. 4

                                                            I use cat because, if I’m lucky, the output will fit in my terminal, and I’m ready for my next command; with a pager, I need to press a key to exit out of the pager before I can issue my next command, and the output disappears (in the case of less) as if the pager never ran, so I can’t keep context around.

                                                            1. 11

                                                              By the way, less can be configured to act well in these circumstances. I have more as an alias for less -FX. Those two options to less are:

                                                                     -F or --quit-if-one-screen
                                                                            Causes less to automatically exit if the entire file can be displayed on the first screen.
                                                                     -X or --no-init
                                                                            Disables sending the termcap initialization and deinitialization strings to the terminal.  This is sometimes desirable if the deinitialization string does something unnecessary, like clearing the screen.

                                                              I also define $PAGER to be less -FX, so manpages and the like don’t clear the screen once the pager quits.

                                                              1. 5

                                                                I second this. -c would be helpful in $PAGER as well so that everything above on your screen stays untouched.

                                                                Personally, I’ve been rolling with this:

                                                                $ type le
                                                                le is an alias for 'less -FcmNqX --follow-name'
                                                              2. 2

                                                                the output disappears (in the case of less) as if the pager never ran, so I can’t keep context around.

                                                                If you want to get rid of this sort of behaviour globally, disable “alternate screen” in your terminal.

                                                                In tmux, set-window-option -g alternate-screen off. In putty there’s Disable switching to alternate terminal screen under Terminal → Features.

                                                                1. 1

                                                                  Just tested bat, and the output doesn’t disappear. When you need a specific section from a file (not the whole thing), using bat over cat (and not less) makes sense. Neat.

                                                              3. 2

                                                                Bat will act like cat if it determines there is no tty. Thus, bat is like less when used interactively and like cat when scripting.

                                                                Like someone else said, people use cat to dump contents to the terminal so they can refer to it while continuing work.

                                                                1. 2


                                                                  Oh.. you can also use it to concatenate files 😉. Whenever bat detects a non-interactive terminal, it will fall back to printing the plain file contents.

                                                              1. 19

                                                                Interesting, though I expected a manifesto for minimalists to be more minimal. :)

                                                                1. 15

                                                                  1.4MB for a minimal manifesto page :) LOL!

                                                                  1. 8

                                                                    That manifesto needs to follow this manifesto: http://brandon.invergo.net/news/2013-03-10-Anti-web-design-Manifesto.html

                                                                    1. 3

                                                                      I clicked on one of the topics at the top, the page scrolled to the topic in question, but I couldn’t use the back button to go back up. How about <a href="#fight-for-patero">?

                                                                    1. 8

                                                                      It took me many years, but I’ve finally accepted the rule of three; I’ve burnt myself repeatedly on my poorly thought-out abstractions that support a single actual use case.

                                                                      1. 1

                                                                        This is exactly how I work, and I went through a similar process to the article writer in coming to learn that premature abstraction fails for the same reasons premature optimisation does: before you have enough data to analyse (in the form of code that executes slowly or obvious refactoring candidates), you’re working blind, and it’s sheer luck if what you decided to optimise or abstract would’ve been a hot path or good model. Having only two similar things isn’t enough to predict what the “pivot points” of an abstraction over them and future things might be.

                                                                      1. 4

                                                                        Hey there, I’m Vincent. My writing output is not very stable — I can go months without a post — but here is my small blog: https://vfoley.xyz

                                                                        1. 4

                                                                          As someone who never used Rust I want to ask: does the section about crates imply that all third-party libraries are recompiled every time you rebuild the project?

                                                                          1. 6

                                                                            Good question! They are not; dependencies are only built on the first compilation, and they are cached in subsequent compilations unless you explicitly clean the cache.

                                                                            1. 2

                                                                              I would assume dependencies are still parsed and type checked though? Or is anything cached there in a similar way to precompiled headers in C++?

                                                                              1. 10

                                                                                A Rust library includes the actual compiled functions like you’d expect, but it also contains a serialized copy of the compiler’s metadata about that library, giving function prototypes and data structure layouts and generics and so forth. That way, Rust can provide all the benefits of precompiled headers without the hassle of having to write things twice.

                                                                                Of course, the downside is that Rust’s ABI effectively depends on accidental details of the compiler’s internal data structures and serialization system, which is why Rust is not getting a stable ABI any time soon.

                                                                                1. 4

                                                                                  Rust has a proper module system, so as far as I know it doesn’t need hacks like that. The price for this awesomeness is that the module system is a bit awkward/different when you’re starting out.

                                                                                2. 1

                                                                                  Ok, then I can’t see why the article needs to mention it. Perhaps I should try it myself rather than just read about its type system.

                                                                                  It made me think it suffers from the same problem as MLton.

                                                                                  1. 4

                                                                                    I should’ve been more clear. Rust will not recompile third-party crates most of the time. It will if you run cargo clean, if you change compile options (e.g., activate or deactivate LTO), or if you upgrade the compiler, but during regular development, it won’t happen too much. However, there is a build for cargo check, and a build for cargo test, and yet another build for cargo build, so you might end up still compiling your project three times.

                                                                                    I mentioned keeping crates under control, because it takes our C.I. system at work ~20 minutes to build one of my projects. About 5 minutes is spent building the project a first time to run the unit tests, then another 10 minutes to compile the release build; the other 5 minutes is spent fetching, building, and uploading a Docker image for the application. The C.I. always starts from a clean slate, so I always pay the compilation price, and it slows me down if I test a container in a staging environment, realize there’s a bug, fix the bug, and repeat.

                                                                                    One way to make sure that your build doesn’t take longer than needed is to be selective in your choice of third-party crates (I have found that the quality of crates varies a lot) and to make sure that each crate pays for itself. serde and rayon are two great libraries that I’m happy to include in my project; on the other hand, env_logger brings in a few transitive dependencies for coloring the log it generates. However, neither journalctl nor docker container logs show colors, so I am paying a cost without getting any benefit.

                                                                                    1. 2

                                                                                      Compiling all of the code, including dependencies, can make some types of optimizations and inlining possible, though.

                                                                                      1. 4

                                                                                        Definitely, this is why MLton is doing it, it’s a whole program optimizing compiler. The compilation speed tradeoff is so severe that its users usually resort to using another SML implementation for actual development and debugging and only use MLton for release builds. If we can figure out how to make whole program optimization detect which already compiled bits can be reused between builds, that may make the idea more viable.

                                                                                        1. 2

                                                                                          In the last discussion, I argued for a multi-staged process that improved developer productivity, especially keeping the mind flowing. The final result is as optimized as possible. No wait times, though. You always have something to use.

                                                                                          1. 1

                                                                                            Exactly. I think developing with something like smlnj, then compiling the final result with mlton is a relatively good workflow. Testing individual functions is faster with Common Lisp and SLIME, and testing entire programs is faster with Go, though.

                                                                                            1. 2

                                                                                              Interesting you mentioned that; Chris Cannam has a build setup for this workflow: https://bitbucket.org/cannam/sml-buildscripts/

                                                                                  1. 1

                                                                                    The dynamically scoped global context made me happy. No other language except Common Lisp has that as far as I know. I am excited to use this.

                                                                                    1. 2

                                                                                      Won’t that make optimizations extremely hard? I haven’t watched the video, so I don’t know the details (and the Jai language primer makes no mentions of contexts), but if you can’t tell statically what’s in scope, it seems to me that most analyses will have to conservatively assume that the universe is in scope, no?

                                                                                      1. 3

                                                                                        Things may have changed from the last demo I saw of Jai contexts, but this seems to be something intended to be used sparingly, or at least the context should contain only a few root object pointers. Functions that use context simply desugar to context-passing-style. The really interesting problem is what to do about higher-order code.

                                                                                        One other thing that makes this easier: Jai is focused on fast full compilation, so it doesn’t suffer from the usual restrictions imposed by separate compilation. It would be possible to do conservative global analysis (very cheaply!) to compute which functions need which partitions of the whole context.

                                                                                        1. 1

                                                                                          Scope and optimization here are separate questions and I don’t see how they’re related. Regarding scope, I don’t know the full details but I would assume you have to declare the global variables beforehand, so it’s not like you can introduce arbitrary variables into the context. The compiler knows exactly which static addresses are accessible and which are not. Perhaps that answers your question?

                                                                                      1. 9

                                                                                        Have you heard of vi? It’s a “visual” mode for ed. A truly amazing innovation. It lets you see the file while entering ed commands, and changes get reflected immediately.

                                                                                        1. 2

                                                                                          ex is not ed. i have often wished for ve instead of vi though

                                                                                          1. 2

                                                                                            Isn’t that, mostly, sam?

                                                                                          2. 1

                                                                                            vi, vi, vi - the editor of the beast

                                                                                            vi, vi, vi - the one for you and me

                                                                                          1. 12

                                                                                            Output should be simple to parse and compose

                                                                                            No JSON, please.

                                                                                            Yes, every tool should have a custom format that needs a badly cobbled together parser (in awk or whatever) that will break once the format is changed slightly or the output accidentally contains a space. No, jq doesn’t exist, can’t be fitted into Unix pipelines and we will be stuck with sed and awk until the end of times, occasionally trying to solve the worst failures with find -print0 and xargs -0.

                                                                                            1. 11

                                                                                              JSON replaces these problems with different ones. Different tools will use different constructs inside JSON (named lists, unnamed ones, different layouts and nesting strategies).

                                                                                              In a JSON shell tool world you will have to spend time parsing and re-arranging JSON data between tools; as well as constructing it manually as inputs. I think that would end up being just as hacky as the horrid stuff we do today (let’s not mention IFS and quoting abuse :D).

                                                                                              Sidestory: several months back I had a co-worker who wanted me to make some code that parsed his data stream and did something with it (I think it was plotting related IIRC).

                                                                                              Me: “Could I have these numbers in one-record-per-row plaintext format please?”

                                                                                              Co: “Can I send them to you in JSON instead?”

                                                                                              Me: “Sure. What will be the format inside the JSON?”

                                                                                              Co: “…. it’ll just be JSON.”

                                                                                              Me: “But in what form? Will there be a list? Names of the elements inside it?”

                                                                                              Co: “…”

                                                                                              Me: “Can you write me an example JSON message and send it to me, that might be easier.”

                                                                                              Co: “Why do you need that, it’ll be in JSON?”

                                                                                              Grrr :P

                                                                                              Anyway, JSON is a format, but you still need a format inside this format. Element names, overall structures. Using JSON does not make every tool use the same format, that’s strictly impossible. One tool’s stage1.input-file is different to another tool’s output-file.[5].filename; especially if those tools are for different tasks.
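
                                                                                              To make that concrete, here is a small hypothetical sketch (tool names and key layouts invented for illustration) of the glue you still end up writing between two JSON-speaking tools:

```python
import json

# Hypothetical output of "tool A": a top-level list of file objects.
tool_a_output = '[{"input-file": "a.txt"}, {"input-file": "b.txt"}]'

# Hypothetical input expected by "tool B": a nested object.
# Same information, completely different shape.
def to_tool_b_input(a_json):
    files = [obj["input-file"] for obj in json.loads(a_json)]
    return json.dumps({"stage1": {"inputs": files}})

print(to_tool_b_input(tool_a_output))
# Both sides are "just JSON", yet a translation step is still required.
```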

                                                                                              1. 3

                                                                                                I think that would end up being just as hacky as the horrid stuff we do today (let’s not mention IFS and quoting abuse :D).

                                                                                                Except that standardized, popular formats like JSON come with ecosystems of tools that solve most of the problems they bring: autogenerators, transformers, and so on follow naturally when it’s a data format. We usually don’t get this when random people create formats for their own use; we have to fully customize the part handling the format rather than adapt an existing one.

                                                                                                1. 2

                                                                                                  Still, even XML, which had the best tooling I have used so far for a general-purpose format (XSLT and XSD in primis), was unable to handle partial results.

                                                                                                  The issue is probably due to their history, as a representation of a complete document / data structure.

                                                                                                  Even s-expressions (the simplest format of the family) have the same issue.

                                                                                                  Now we should also note that pipelines can be created on the fly, even from binary data manipulations. So a single dictated format would probably pose too many restrictions, if you want the system to actually enforce and validate it.

                                                                                                  1. 2

                                                                                                    “Still, even XML”

                                                                                                    XML and its ecosystem were extremely complex. I used s-expressions with partial results in the past. You just have to structure the data to make it easy to get a piece at a time. I can’t recall the details right now. Another format I used, trying to balance efficiency, flexibility, and complexity, was XDR. Too bad it didn’t get more attention.

                                                                                                    “So a single dictated format would probably pose too restrictions, if you want the system to actually enforce and validate it.”

                                                                                                    The L4 family usually handles that by standardizing on an interface, description language with all of it auto-generated. Works well enough for them. Camkes is an example.

                                                                                                    1. 3

                                                                                                      XML and its ecosystem were extremely complex.

                                                                                                      It is coherent, powerful and flexible.

                                                                                                      One might argue that it’s too flexible or too powerful, so that you can solve any of the problems it solves with simpler custom languages. And I would agree to a large extent.

                                                                                                      But, for example, XHTML was a perfect use case. Indeed, to do what I did back then with XSLT, people now use Javascript, which is less coherent and way more powerful, and in no way simpler.

                                                                                                      The L4 family usually handles that by standardizing on an interface, description language with all of it auto-generated.

                                                                                                      Yes but they generate OS modules that are composed at build time.

                                                                                                      Pipelines are integrated on the fly.

                                                                                                      I really like strongly typed and standard formats but the tradeoff here is about composability.

                                                                                                      UNIX turned every communication into byte streams.

                                                                                                      Bytes byte at times, but they are standard, after all! Their interpretation is not, but that’s what provides the flexibility.

                                                                                                      1. 4

                                                                                                        Indeed, to do what I did back then with XSLT, people now use Javascript, which is less coherent and way more powerful, and in no way simpler.

                                                                                                        While I am definitely not a proponent of JavaScript, computations in XSLT are incredibly verbose and convoluted, mainly because XSLT for some reason needs to be XML and XML is just a poor syntax for actual programming.

                                                                                                        That, and the fact that my transformations worked fine with xsltproc but did just nothing in browsers, without any decent way to debug the problem, made me put away XSLT as an esolang: lots of fun for an afternoon, not what I would use to actually get things done.

                                                                                                        That said, I’d take XML output from Unix tools and some kind of jq-like processor any day over manually parsing text out of byte streams.

                                                                                                        1. 2

                                                                                                          I loved it when I did HTML wanting something more flexible that machines could handle. XHTML was my use case as well. Once I was a better programmer, I realized it was probably an overkill standard that could’ve been something simpler with a series of tools each doing their little job. Maybe even different formats for different kinds of things. W3C ended up creating a bunch of those anyway.

                                                                                                          “Pipelines are integrated on the fly.”

                                                                                                          Maybe put it in the OS like a JIT. As far as byte streams go, that’s mostly what XDR did: they were just minimally-structured byte streams. Just tie the data types, layouts, and so on to whatever language the OS or platform uses the most.

                                                                                                  2. 3

                                                                                                    JSON replaces these problems with different ones. Different tools will use different constructs inside JSON (named lists, unnamed ones, different layouts and nesting strategies).

                                                                                                    This is true, but it does not mean that having some kind of common interchange format does not improve things. So yes, it does not tell you what the data will contain (but “custom text format, possibly tab separated” is, again, not better). I know the problem, since I often work with JSON that contains or misses things. But the solution is not to avoid JSON but rather to have specifications. JSON has a number of possible schema formats, which puts it at a big advantage over most custom formats.

                                                                                                    The other alternative is of course something like ProtoBuf, because it forces the use of proto files, which is at least some kind of specification. That throws away the human readability, which I didn’t want to suggest to a Unix crowd.

                                                                                                    Thinking about it, an established binary interchange format with schemas and a transport is in some ways reminiscent of COM & CORBA in the nineties.

                                                                                                  3. 7

                                                                                                    will break once the format is changed slighly

                                                                                                    Doesn’t this happen with JSON too?
                                                                                                    A slight change in the key names, or turning a string into a list of strings, and the recipient won’t be able to handle the input anyway.

                                                                                                    the output accidentally contains a space.

                                                                                                    Or the output accidentally contains a comma: depending on the parser, the behaviour will change.

                                                                                                    No, jq doesn’t exis…

                                                                                                    Jq is great, but I would not say JSON should be the default output when you want composable programs.

                                                                                                    For example JSON root is always a whole object and this won’t work for streams that get produced slowly.
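
                                                                                                    For what it’s worth, the usual workaround is newline-delimited JSON, where each line is a complete document; a rough sketch of the difference:

```python
import json

# A single JSON document can't be used until it is complete:
truncated = '{"records": [{"n": 1}, {"n": 2}'   # producer still running
try:
    json.loads(truncated)
except json.JSONDecodeError:
    print("whole-document JSON: unusable until the stream ends")

# Newline-delimited JSON: each line is a complete document,
# so a consumer can process records as they arrive.
stream = '{"n": 1}\n{"n": 2}\n'
for line in stream.splitlines():
    print(json.loads(line)["n"])
```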

                                                                                                    1. 5

                                                                                                      will break once the format is changed slighly

                                                                                                      Doesn’t this happens with json too?

                                                                                                      Using a whitespace-separated table such as suggested in the article is somewhat vulnerable to continuing to appear to work after the format has changed while actually misinterpreting the data (e.g. if you inserted a new column at the beginning, your pipeline could happily continue, since all it needs is at least two columns with numbers in). JSON is more likely to either continue working correctly and ignore the new column or fail with an error. Arguably it is the key-value aspect that’s helpful here, not specifically JSON. As you point out, there are other issues with using JSON in a pipeline.
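
                                                                                                      A quick sketch of that failure mode (data invented for illustration): the positional pipeline misbehaves when a column is inserted, while the keyed version simply ignores the extra field:

```python
import json

old_row = "42 137"            # columns: count, size
new_row = "eth0 42 137"       # a name column was inserted at the front

def sum_first_two(row):
    cols = row.split()
    return int(cols[0]) + int(cols[1])

print(sum_first_two(old_row))  # 179, as intended
# sum_first_two(new_row) raises ValueError only because "eth0" isn't
# a number; had the new column been numeric, this would silently
# return the wrong answer.

keyed = json.loads('{"name": "eth0", "count": 42, "size": 137}')
print(keyed["count"] + keyed["size"])  # 179: the extra key is ignored
```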

                                                                                                    2. 3

                                                                                                      On the other hand, most Unix tools use tabular format or key value format. I do agree though that the lack of guidelines makes it annoying to compose.

                                                                                                      1. 2

                                                                                                        Hands up everybody that has to write parsers for zpool status and its load-bearing whitespaces to do ZFS health monitoring.

                                                                                                        1. 2

                                                                                                          In my day-to-day work, there are times when I wish some tools would produce JSON and other times when I wish a JSON output was just textual (as recommended in the article). Ideally, tools should be able to produce different kinds of outputs, and I find libxo (mentioned by @apy) very interesting.

                                                                                                          1. 2

                                                                                                            I spent very little time thinking about this after reading your comment and wonder how, for example, the core utils would look like if they accepted/returned JSON as well as plain text.

                                                                                                            A priori we have this awful problem of making everyone understand every one else’s input and output schemas, but that might not be necessary. For any tool that expects a file as input, we make it accept any JSON object that contains the key-value pair "file": "something". For tools that expect multiple files, have them take an array of such objects. Tools that return files, like ls for example, can then return whatever they want in their JSON objects, as long as those objects contain "file": "something". Then we should get to keep chaining pipes of stuff together without having to write ungodly amounts of jq between them.

                                                                                                            I have no idea how much people have tried doing this or anything similar. Is there prior art?
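
                                                                                                            As a toy sketch of the convention proposed above (all names hypothetical; nothing here is an existing tool):

```python
import json

# Proposed convention: any tool that takes a file accepts any JSON
# object carrying a "file" key, and ignores everything else.
def take_file(obj):
    return obj["file"]

# An "ls"-like tool is free to attach whatever extra metadata it wants:
ls_output = json.loads(
    '[{"file": "a.txt", "size": 120}, {"file": "b.txt", "size": 64}]'
)

# A downstream "wc"-like tool only looks at the "file" key:
print([take_file(entry) for entry in ls_output])  # ['a.txt', 'b.txt']
```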

                                                                                                            1. 9

                                                                                                              In FreeBSD we have libxo which a lot of the CLI programs are getting support for. This lets the program print its output and it can be translated to JSON, HTML, or other output forms automatically. So that would allow people to experiment with various formats (although it doesn’t handle reading in the output).

                                                                                                              But as @Shamar points out, one problem with JSON is that you need to parse the whole thing before you can do much with it. One can hack around it but then they are kind of abusing JSON.

                                                                                                              1. 2

                                                                                                                That looks like a fantastic tool, thanks for writing about it. Is there a concerted effort in FreeBSD (or other communities) to use libxo more?

                                                                                                                1. 1

                                                                                                                  FreeBSD definitely has a concerted effort to use it, I’m not sure about elsewhere. For a simple example, you can check out wc:

                                                                                                                  apy@bsdell ~> wc -l --libxo=dtrt dmesg.log
                                                                                                                       238 dmesg.log
                                                                                                                  apy@bsdell ~> wc -l --libxo=json dmesg.log
                                                                                                                  {"wc": {"file": [{"lines":238,"filename":"dmesg.log"}]}}
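
                                                                                                                  That JSON form is easy to consume from a script; for example, pulling the line count back out with Python’s standard json module (the string below is a copy of the wc output above):

```python
import json

# The --libxo=json output from wc, copied as a string.
wc_json = '{"wc": {"file": [{"lines": 238, "filename": "dmesg.log"}]}}'

data = json.loads(wc_json)
for entry in data["wc"]["file"]:
    print(entry["filename"], entry["lines"])  # dmesg.log 238
```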
                                                                                                            2. 1

                                                                                                              powershell uses objects for its pipelines; i think it even runs on linux nowadays.

                                                                                                              i like json, but for shell pipelining it’s not ideal:

                                                                                                              • the unstructured nature of the classic output is a core feature. you can easily mangle it in ways the program’s author never assumed, and that makes it powerful.

                                                                                                              • with line-based records you can parse incomplete data (as in, the process hasn’t finished yet) more easily: you just have to split after a newline. with json, technically you can’t begin using the data until a (sub)object is completely parsed. using half-parsed objects seems unwise.

                                                                                                              • if you output json, you probably have to keep the structure of the object tree you are generating in memory, like “currently i’m in a list in an object in a list”. that’s not ideal sometimes (you don’t have to use real serialization all the time, but it’s nicer than just printing the correct tokens at the right places).

                                                                                                              • json is “JavaScript Object Notation”. not everything is ideally represented as an object. that’s why relational databases are still in use.
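The second point is easy to make concrete. A small sketch (the sample records are made up) showing that newline-delimited records are usable mid-stream, while a truncated JSON document is not:

```python
import json

# Made-up sample: a line-based stream where the producer hasn't
# finished writing the last record yet.
partial_lines = "238 dmesg.log\n512 messages\n97 auth"
ready = partial_lines.split("\n")[:-1]  # complete records, usable right now
print(ready)  # ['238 dmesg.log', '512 messages']

# The same idea as a truncated JSON document: nothing is usable yet.
partial_json = '{"files": [{"lines": 238, "name": "dmesg.log"}, {"lines": 512'
try:
    json.loads(partial_json)
except json.JSONDecodeError:
    print("incomplete JSON: cannot parse yet")
```

This is essentially why newline-delimited formats like JSON Lines exist: one complete JSON value per line gets the split-on-newline property back.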

                                                                                                              edit: be nicer ;)

                                                                                                            1. 0

                                                                                                              Oops, too late :D

                                                                                                              1. 3

                                                                                                                Salut mec! I did my undergrad at UdeM and I don’t recall needing a book for this class. I did the class with Marc Feeley and it was sufficient to attend the lectures and to make sure to preview and review the slides. The projects were good, and if you apply yourself and do them, I think you’ll get a lot out of this class without spending $100. TAPL is a cool book, but that would be more useful for Stephan’s grad-level class or a class by Brigitte Pientka at McGill.

                                                                                                                Good luck!

                                                                                                                1. 2

                                                                                                                  I’m mostly looking into this for my personal benefit. This is the only class I’m taking this semester (as I also work full time), and I’d like to get really involved in the subject. I will probably take other classes in the same vein over the next few semesters.

                                                                                                                  And spending way too much money on books is my guilty pleasure.

                                                                                                                  1. 1

                                                                                                                    Then I heartily recommend TAPL; it’s a great book, even though it takes dedication to get through it all.

                                                                                                                    Also, www.alibris.com sometimes has pretty good deals on used CS books; I got the Dragon Book (compiler design) for $1 + shipping. There’s no reason why a guilty pleasure should break your wallet :D

                                                                                                                    1. 1

                                                                                                                      Nice, I didn’t know about that website.

                                                                                                                1. 1

                                                                                                                  Possible typo?

                                                                                                                  A quote mark is (1, 0, 2)

                                                                                                                  Shouldn’t it be (1, 0, 1) since the double-quote character in state 2 takes you back to state 1?

                                                                                                                  1. 0

                                                                                                                    Don’t reach for a profiler, don’t try to set a global variable, or do start/stop timing in code. Don’t even start figuring out how to configure a logger to print timestamps, and use a log output format.

                                                                                                                    Then when you’ve done all this, rerun the command and pass it through a command that attaches timestamps to every output line.

                                                                                                                    That’s some terrible writing. Cool tool, though.

                                                                                                                    1. 1

                                                                                                                      Thank you for reading.