Threads for Ambroisie

    1. 3

      I was poking vtables this week so if anyone’s interested in their layout, it currently goes: pointer to the drop function, two usizes for size and alignment, then n pointers to the object’s methods.

      I believe that with the new upcasting feature, the super-trait’s function pointers come first followed by the sub-trait’s function pointers. That way you can treat the vtable as matching either trait.
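A minimal model of the layout described above, using Python's `ctypes` only to compute field offsets. The field names (`drop_in_place`, `size`, `align`, `methods`) and the helper `make_vtable_type` are made up for illustration. This models the comment's description for the single super-trait case; it is not a documented or stable Rust ABI.

```python
import ctypes

WORD = ctypes.sizeof(ctypes.c_void_p)  # pointer size on this machine


def make_vtable_type(n_methods):
    """Build a struct type mirroring the described layout (hypothetical names)."""

    class VTable(ctypes.Structure):
        _fields_ = [
            ("drop_in_place", ctypes.c_void_p),        # pointer to the drop function
            ("size", ctypes.c_size_t),                 # size of the concrete type
            ("align", ctypes.c_size_t),                # alignment of the concrete type
            ("methods", ctypes.c_void_p * n_methods),  # n method pointers
        ]

    return VTable


def method_offset(index):
    """Byte offset of the index-th method pointer under this layout."""
    return 3 * WORD + index * WORD


# Single super-trait case from the comment: the super-trait's methods come
# first, so a sub-trait vtable begins with something shaped like a
# super-trait vtable.
SuperVTable = make_vtable_type(2)  # an assumed super-trait with 2 methods
SubVTable = make_vtable_type(3)    # the same 2 methods plus 1 of its own
```

Under this model the first two method slots sit at the same offsets in both structs, which is what lets one table be read as either trait. It says nothing about the multiple super-trait case.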

      1. 1

        How does upcasting work (layout-wise) when a trait has multiple super-traits? I’d assume some redundancy is necessary in that case.

        1.  

          Hmm, I’m not sure actually. There’s a proposed layout in the original RFC proposal in which some of a super-trait’s methods can come after the sub-trait’s, and sometimes a pointer to a new vtable is required.

      2. 7

        @tuxes do you have an Atom/RSS feed I can subscribe to for the next post?

        1. 4

          I’ve been meaning to. For now my site is just a collection of HTML files. I’ll reply to your comment when I post the next one, or when I add an RSS feed.

            1. 1

              Mildly unrelated, but are you just using an RSS reader app? I’m big into the personal blog space, so I’m always looking for the best way to interact with content.

              1. 4

                Yeah, I use a self-hosted reader to be able to read my feeds from my phone or a computer without dealing with sync issues. I used to use FreshRSS years ago, then transitioned to Miniflux (when FreshRSS wasn’t packaged/a module on NixOS) and that’s what I’m using to this day.

            2. 25

              Nice article, I’m actually surprised to see nixpkgs being over 90% reproducible already!

              Also @JulienMalka, could you please add an Atom/RSS feed to your blog? :-)

                1. 1

                  I’ll be subscribing to that as well! Thank you :)

                2. 7

                  the cat picture factory

                  I feel like I’m missing something. Usually I would assume this is simply a light-hearted nickname for “the Internet”, but they go on to say “This time, I was working there, and decided there would not be a repeat. The entire company’s time infrastructure would be adjusted”, implying this is a specific company. Reddit? Imgur?

                    1. 19

                      Ah, now the final paragraph finally makes sense too.

                      1. 5

                        This is like Haldane’s On Being the Right Size where it seems like a technical essay, but really the point is the political message at the end. My fear is that she’s wrong, and Facebook isn’t so much lurching into the past as the future.

                        1. 7

                          It took your comment for me to realize this is about their new content policy and not about some facebook outage I can’t find anything on..

                          1. 3

                            I couldn’t find any relevant-looking recent news about Facebook. What is this even about?

                            1. 41

                              In the new moderation guidelines, anti-trans, anti-gay, anti-man, anti-enby, and anti-woman bigotry are no longer forbidden, and various bigoted sentences (anti-trans, anti-gay, and anti-woman) are given as examples of explicitly allowed speech.

                              Here is a Platformer article on Facebook’s new moderation guidelines. Content warning for hate speech.

                              1. 3

                                They have made some controversial changes to moderation policy in recent days.

                            2. 1

                              Ah thank you, I was having trouble with that.

                            3. 3

                              You weren’t alone. There are so many sites that could be considered “the cat picture factory” that I couldn’t figure it out until the end, and facebook was barely in my top 10 contenders. I guess that place seemed very different to people on the inside than outside.

                            4. 13

                              If anybody from the fish team reads these comments, could I please ask for an Atom/RSS feed for the blog? Thank you :-)

                              1. 21

                                One of our co-maintainers figured it out; here you are: https://fishshell.com/blog/feed.xml

                                1. 4

                                  It’s a static html site, source code is on GitHub. If you want to manually write an atom feed for the entries and we just have to remember to add to it each time, I guess you could try opening a PR?

                                  https://github.com/fish-shell/fish-site

                                  1. 3

                                    Could be a script to generate it in a pre-push hook. Will think about it if I pop the yak stack for it.
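A rough sketch of what such a generator script could look like, using only the Python standard library. The function name `build_atom_feed`, the shape of the post dicts, and every URL in the usage are assumptions for illustration; nothing here reflects how the fish site is actually built.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_atom_feed(feed_url, title, author, posts):
    """Return an Atom document (as a string) for a non-empty list of posts.

    Each post is a dict with "url", "title" and "updated". Timestamps are
    assumed to be RFC 3339 strings in UTC ("...Z"), so they sort as text.
    """
    ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace

    def tag(name):
        return f"{{{ATOM_NS}}}{name}"

    feed = ET.Element(tag("feed"))
    ET.SubElement(feed, tag("id")).text = feed_url
    ET.SubElement(feed, tag("title")).text = title
    # Use the newest post's timestamp as the feed-level "updated" value.
    ET.SubElement(feed, tag("updated")).text = max(p["updated"] for p in posts)
    author_el = ET.SubElement(feed, tag("author"))
    ET.SubElement(author_el, tag("name")).text = author

    for post in posts:
        entry = ET.SubElement(feed, tag("entry"))
        ET.SubElement(entry, tag("id")).text = post["url"]
        ET.SubElement(entry, tag("title")).text = post["title"]
        ET.SubElement(entry, tag("updated")).text = post["updated"]
        ET.SubElement(entry, tag("link"), href=post["url"])

    return ET.tostring(feed, encoding="unicode")
```

A pre-push hook could call this with the list of posts and write the result to a file such as `feed.xml`; how the list of posts is collected is left open here.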

                                    1. 6

                                      It’s a Jekyll site so you could use a plugin like jekyll-feed.

                                2. 7

                                  You can also use the absorb feature in git with https://github.com/tummychow/git-absorb. I’m using it a lot at work and it’s super nice to have when jujutsu is off the table.

                                  1. 4

                                    Does git absorb work past merge commits (i.e. can it absorb into different ancestor commits that aren’t ancestors of each other)? It’s not clear from the docs.

                                    1. 4

                                      IIRC it stops at merge commits.

                                  2. 25

                                    This has essentially been my main motivation for getting interested in Nix: its potential as a build system for a heterogeneous monorepo. Unfortunately, using Nix as a build system has been immensely painful for many dozens of different reasons. Some big ones include the poor documentation and organization of nixpkgs, the lack of static types, the frustrating nix language (even after many years, I still find myself straining myself to visually parse its syntax and remember how it behaves in not-entirely-obscure cases). You set out to build some simple program or another, but then you realize that one of the dependencies is some obscure C dependency with its own bespoke autotools build system, so you try to figure out how other autotools things are built in nixpkgs but they all use cryptic, undocumented libraries to package autotools things and it’s completely unclear how to apply those libraries to your case, so you spend weeks trolling around forums and Discord channels trying to figure out how to make it work.

                                    It very much feels like the Nix maintainers are not seriously interested in making Nix usable–which is fine, but anyone who is expecting to be able to use it for their own projects should be prepared for an unpredictable amount of frustration when it comes to using Nix as a build system (or at least that has been my experience after picking Nix up a couple times a year for the last ~decade). I don’t want to berate Nix, I really think the problems it purports to solve are valuable problems, and its high-level approach is absolutely correct–but so many of the implementation decisions inhibit broad adoption and from the outside there doesn’t seem to be much interest in addressing those problems.

                                    1. 18

                                      Autotools is literally the easiest build system to package in nixpkgs, it’s the one that has shaped the design of mkDerivation and most of the build vocabulary in nixpkgs.

                                      1. 8

                                        If that’s so, then why did the parent comment author have so much difficulty with it? From my own experience with Nix documentation I won’t believe the answer is because they are incompetent. If the alleged easiest build system to package in nixpkgs is giving people trouble that is a problem, not a dismissal.

                                        1. 3

                                          I think the sibling comment about it being “like Autotools but not Autotools” is why.

                                          I do recognize that Nix’s learning curve is steep, but Autotools really is turn key.

                                        2. 1

                                          I read “bespoke autotools” as “something that serves the same purpose as autotools, but is a bespoke solution and not actually autotools”.

                                          1. 3

                                            My guess is that it was (derived from) actual Autotools but changed in some way that made it confusingly incompatible with Nix’s default Autotools support, but I haven’t enough data to say for sure.

                                            1. 1

                                              That’s an even better theory.

                                        3. 10

                                          Nix and NixOS have been the single greatest investment of my time in the last couple years. I still don’t feel like I’m “good at it”, and given my limited time with it, maybe it’s too early to be good at it – but I think I could be better if it was simply different.

                                          And similar to you, I think a large factor in that feeling is “I still find myself straining myself to visually parse its syntax and remember how it behaves in not-entirely-obscure cases”.

                                          I think a lot of my difficulty with nixlang has to do with any moderately complex code looking like a pyramid of doom which, for me, is mentally taxing. And so I shy away from complexity.

                                          Nix is a delight. Its documentation is not. And the language pains me. But I’ll happily continue using it because the results are fantastic.

                                          1. 2

                                            What’s missing in the documentation? I ask because I feel like Nix has excellent documentation. I’m a newcomer, 10 months in now, and the documentation has been excellent. And being able to use nixpkgs as a reference, I haven’t found anything difficult to do. Just nix build and inspect the result till everything works. If nix can’t build, add traces into the derivation.

                                            I think the hardest thing would be adding support for a new language tool chain, aka rust cargo, or something of that nature, but haven’t had to do that.

                                            1. 1

                                              IMO the documentation tends to assume you already know what you’re doing - for instance, the docs for setting up a server tend to assume you already know how to set up an Ubuntu server and want to learn how to port it over to NixOS, so they effectively try to provide a diff, and not a cleansheet tutorial.

                                          2. 3

                                            I’m curious if you’ve looked at Guix. Scheme’s syntax claims to be easier, but I’ve had a hard time understanding/debugging error messages. And the (build-system gnu-build-system) magic might be just as opaque as Nix.

                                            1. 2

                                              the Nix maintainers are not seriously interested in making Nix usable

                                              We are on Nx which does pretty much what Nix does but for the special case of NPM. It’s a great masterclass of how a tool like this should work.

                                              If I wanted to try out a generic build system I’d go for Bazel or Buck2 (probably buck2).

                                              1. 2

                                                Maybe people like different kinds of things. I tried out Nx and found I needed to read the source of every Nx build template to understand wtf it was going to do or how to work around its (incorrect) assumptions. I would rather write my own typescript build orchestrator than need to read 3 packages deep to understand how to use someone else’s.

                                                I have the same problem with Nix in my brief forays: most build rules are buried deep in abstractions that are either under-documented or are documented but stacked so high I can’t grok what’s going on.

                                                1. 1

                                                  I tried out Nx and found I needed to read the source of every Nx build template to understand wtf it was going to do or how to work around its (incorrect) assumptions.

                                                  For our Typescript projects it kinda just works and the documentation is not even that bad anymore. I agree it’s somewhat byzantine and the jumble of nx.json and project.json files is off-putting as hell, but still… it works.

                                              2. 2

                                                feels like the Nix maintainers are not seriously interested in making Nix usable

                                                Some of my experiences with the project make me think the same, and that has been a motivating factor for our work at Determinate, and specifically Determinate Nix. For example, the lack of clarity around flakes, the fact that Nix itself is usually treated as “plumbing” and not end user software, etc.

                                                Even the day zero experience of “install Nix” has so much work to be done. Work we’ve done almost entirely in OSS, but the upstream project just hasn’t felt motivated enough to adopt our installer. We’ve bent over backwards to make it work 100% and yet.

                                              3. 15

                                                I love fish…except that it doesn’t have ctrl-O (execute line then select the next command from history) and I don’t know what the workaround is. I switched anyway, but every time I have to do something repeatedly that takes more than one command, I feel like I must be missing something. (Am I?) I have ctrl-R, blah, ctrl-O ctrl-O ctrl-O in my fingers from years of bash/zsh.

                                                  1. 2

                                                    Using Unix for > 25 years now. Never heard of C-O before. Thanks!!

                                                    1. 2

                                                      Execute which line? The current input? Then once the executed command terminates, what do you mean by select? Putting it in the prompt input?

                                                      Why is this practical? Why do you want to run a command and then the one you have executed before?

                                                      Either way, sounds like something you can implement in 3 to 5 lines of code.

                                                      1. 1

                                                        I’ve never seen this before. Is this like !! ?

                                                        1. 16

                                                          In bash/zsh, after you find a command in history with c-R, you can go up and down in history with c-P and c-N, and you can execute a command and immediately select the following command with c-O. So if you have a sequence of commands to repeat, you c-R back to the first one, then repeatedly do c-O to execute the sequence. (This is assuming emacs keybindings.)

                                                          Fuzzy history search in fish doesn’t seem to be related to the chronological history list. It just pulls out a single command with no context.

                                                          So I’m hoping that now c-R is incremental in fish that it can do the bash/zsh thing, but I haven’t looked at the beta yet.
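A toy model of the c-O behaviour described above, treating history as a plain chronological list. The function names `operate_and_get_next` and `replay` are chosen for this sketch, and it only models which commands get run, not how bash, zsh or fish implement the binding.

```python
def operate_and_get_next(history, index):
    """Model one c-O press: run history[index], then select the next entry.

    Returns (command_to_run, next_index); next_index is None at the end.
    """
    command = history[index]
    next_index = index + 1 if index + 1 < len(history) else None
    return command, next_index


def replay(history, start, presses):
    """Commands that `presses` consecutive c-O presses would run from `start`."""
    ran = []
    index = start
    for _ in range(presses):
        if index is None:  # ran off the end of the history list
            break
        command, index = operate_and_get_next(history, index)
        ran.append(command)
    return ran
```

The point of the model is that the selection after each run depends on position in the chronological list, which is why a search that returns a single command with no position cannot support it.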

                                                          1. 5

                                                            …I really need to read more documentation. What a time saver I never heard of.

                                                            1. 4

                                                              cross posting to say: I am one of today’s lucky 10000..

                                                              1. 2

                                                                Ooooooh. Neat, yeah I can see the appeal.

                                                            2. 1

                                                              I’m glad I’m not alone in this! It’s extremely rare to see someone mentioning this feature.

                                                              In zsh it’s also possible to select multiple completion candidates with Ctrl+o. I miss it even more than executing multiple history lines. There is an open issue about this, but it’s pretty much dead. https://github.com/fish-shell/fish-shell/issues/1898

                                                            3. 4

                                                              As a NixOS user I wish it were possible to get fish inside nix-shell.

                                                              1. 4

                                                                I can say that with Direnv, I’ve rarely needed raw nix-shell. What kind of use case do you have in mind where that wouldn’t work?

                                                                Also I assume you’ve heard of any-nix-shell?

                                                                1. 3

                                                                  Direnv

                                                                  I tried direnv today at your suggestion

                                                                  any-nix-shell

                                                                  I hadn’t heard of it, thanks! Will give it a try.

                                                                  Although, it gives me pause that it’s not enabled by default when the config includes programs.fish.enable = true;. Presumably there’s a good reason for that.

                                                                  update: trying it now, seems to work great. thanks again for the tip!

                                                                  final update: very happy with my setup now

                                                                  1. 1

                                                                    Works fine until your coworker starts injecting Bash-only functions into the dev shell for some reason 🤦

                                                                    1. 2

                                                                      Functions wouldn’t work in bash with direnv anyway: the underlying issue is a difference in workflow with that colleague (who seemingly doesn’t use direnv). The solution is to use scripts instead of functions.

                                                                  2. 2

                                                                    Is this just a matter of nix providing some script that can be sourced by fish setting up certain global state and maybe chrooted into a certain path? Are there things fish doesn’t support that stops it from being an option or is it just lack of an upstream fish version of the script?

                                                                    1. 3

                                                                      nix-shell is also used for debugging package builds, with genericBuild, build phases, etc… Those won’t work on anything but Bash due to the builder being “highly advanced” bash code.

                                                                      If it’s only to get a few environment variables in your shell, direnv works well.

                                                                  3. 2

                                                                    Lovely to see the changes and improvements. I keep eyeing Fish to replace my Zsh setup, but worry about losing out on years of muscle memory for POSIX-ish shell syntax.

                                                                    In other news, I wish the blog had an RSS feed, I’d like to keep up to date with new releases to read about features etc…

                                                                    1. 9

                                                                      We’ve made great progress in supporting posix syntax and features that don’t clash outright with fish. You should give it another try!

                                                                      1. 3

                                                                        As a prime example of this, a while back they added the ability to do FOO="blah" some-command instead of the previous version where you need to prefix with env. This alone resolved something like 90% of the ways my daily use of fish diverged from what I would have written in bash or zsh.

                                                                      2. 2

                                                                        A colleague of mine recently switched. That surprised me because he had quite some zsh setup and as far as I know fish does not really offer more feature-wise. He told me that he likes fish because it is more “snappy”.

                                                                        1. 6

                                                                          Out of the box, Fish has far more features than Zsh. It doesn’t offer anything else feature-wise if you install all the Zsh modules that were created to implement Fish features, but they’re fiddly and often just don’t work as well. If you want Zsh like Fish, just use Fish.

                                                                          1. 1

                                                                            I agree. The point is if you already invested the time to set up zsh with all kinds of modules, then switching to fish is not much of an improvement. So I don’t recommend fish to zsh power users.

                                                                            That said, I have now the anecdotal evidence from one person that fish was still worth switching.

                                                                            1. 4

                                                                              It’s far less janky than zsh, mostly because you have less user-authored code (users in general, not you specifically). I don’t touch my config for months on end and things just keep humming along well.

                                                                              1. 1

                                                                                I don’t touch my config for months on end and things just keep humming along well.

                                                                                I don’t touch my Zsh config for years on end and likewise. On the other hand, I imagine my Zsh config took longer to write than your Fish config, though.

                                                                              2. 1

                                                                                it is still an improvement because you end up with less shell code

                                                                        2. 3

                                                                          This allows rr to work with perf_event_paranoid == 2 starting with 6.10, where rr previously required perf_event_paranoid == 1.

                                                                          That’s quite nice, I remember having to fight my school’s sysadmins because of this requirement when begging them to install rr on the students’ machines.

                                                                          1. 3

                                                                            Is there an RSS feed for Ladybird’s blog/announcements?

                                                                              1. 4

                                                                                As the name implies, that tool sends RSS via email. Is there an email2rss you had in mind and meant to link to?

                                                                              2. 1

                                                                                From the headers in the HTML file: https://ladybird.org/posts.xml

                                                                                Any decent feed reader should be able to extract that if you submit the web page.
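A minimal sketch of the kind of feed autodiscovery a reader might do when given a web page: scan the HTML for `<link rel="alternate">` tags with a feed MIME type. The class and function names are made up for this sketch, and the sample markup in the usage note is hypothetical, not copied from any real site.

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate"> tags that point at feeds."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            self.feeds.append(href)


def find_feeds(html_text):
    """Return feed URLs advertised in the given HTML text, in document order."""
    finder = FeedLinkFinder()
    finder.feed(html_text)
    return finder.feeds
```

For example, `find_feeds('<head><link rel="alternate" type="application/rss+xml" href="/posts.xml"></head>')` yields the advertised `href`. Whether that feed actually contains the posts you want is a separate question.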

                                                                                1. 5

                                                                                  that doesn’t include these newsletter posts though, which they seem to treat as a separate thing from the main “news”

                                                                                  1. 2

                                                                                    Ah, sorry I didn’t bother to actually check what the XML file contained. I expected it to be a standard feed, but it’s probably just some vestigial remnant generated by the CMS they use.

                                                                                  2. 2

                                                                                    Yeah, unfortunately this article isn’t in the feed.

                                                                                2. 10

                                                                                  Everything in this list makes sense to me except zellij. I’ve tried it a couple times but found the default shortcuts interfere significantly with the applications I wish to use, meaning I’d need to remap a non trivial number of them. This to me means a lot of configuration changes, not zero.

                                                                                  1. 4

                                                                                    I’ve been happy with the tmux for mere mortals configuration for a while now

                                                                                    1. 3

                                                                                      What’s wrong with default tmux? I’m not a power user. I just have a few sessions on my linode box for things I run and that’s it. But I don’t really feel that I’m missing anything. Am I?

                                                                                      1. 5

                                                                                        The most ironic thing about tmux was that it was a “better screen” and yet the #1 complaint people had about screen was that it stole the ctrl-a binding (beginning-of-line in readline/emacs); tmux made the same mistake by stealing the ctrl-b binding (back one character).

                                                                                        The default modeline is meh, but everything else about tmux is basically perfect out of the box except that binding; they had a chance to fix it and they blew it. (I still use and love tmux, but never with the default config)

                                                                                        1. 2

                                                                                          I’m curious; I always found their choice of a leader key to make sense (I use Ctrl-a much more often in Vim than Ctrl-b in readline shells). What do you use instead?

                                                                                          1. 5

                                                                                            I configured screen to use ^Z instead, within the first week of trying it (in the 90s), and kept that when I switched to tmux. I’ve never understood why they didn’t use ^Z from the start – it’s the sort-of-obvious key binding they make redundant.

                                                                                            1. 4

                                                                                              ^Z is probably the shortcut that I use the most: ^Z to send my Helix editor to the background, run some commands in the terminal, then get back to Helix with fg.

                                                                                              1. 4

                                                                                                That is a great choice. The whole point of tmux/etc was to eliminate the need for job control. I need to switch to this so it forces me to unlearn that bad habit.

                                                                                                1. 2

                                                                                                  I also used C-z for my screen escape, and kept it when using tmux.

                                                                                                2. 1

                                                                                                  Many years ago, I went through all 32 control characters, and found that ^^ was the only one that I never used anywhere for anything, so that became my leader key. Its only downside is that a couple of terminal programs don’t allow you to type it via Ctrl+6, instead requiring an explicit Ctrl+Shift+6.

                                                                                                    1. 2

                                                                                                      True. mosh didn’t exist when I chose it. And even so, it’s still the least problematic control key (at least for an Emacs user), by a significant margin.

                                                                                                    2. 1

                                                                                                      ^^

                                                                                                      What does this mean? Ctrl, Ctrl?

                                                                                                      1. 1

                                                                                                        Sorry, it means Ctrl-caret.

                                                                                                        In addition to the 26 letters, the 6 symbols @ [ ^ ] \ _ also map to control characters. You can’t reliably use non-standard control characters (like Ctrl+1 et al.) if you want to use tmux etc. via a remote terminal. All of them are already used either at the terminal or in Emacs except Ctrl-caret.

                                                                                                        (Technically ^? also counts as a control character, but it’s basically the same thing as Backspace, so it’s only worth mentioning to forestall someone telling me I forgot it.)
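The mapping described above can be written as one line of arithmetic: in ASCII, the control code for a key is its character code with bit 0x40 flipped. A small sketch, with the helper name `ctrl` chosen for illustration:

```python
def ctrl(key):
    """ASCII control code for ^key, where key is '@', a letter, one of
    '[', '\\', ']', '^', '_', or '?' (which maps to DEL)."""
    code = ord(key.upper())
    if not 0x3F <= code <= 0x5F:
        raise ValueError(f"no control character for {key!r}")
    # Flipping bit 0x40 maps '@'..'_' onto 0..31 and '?' onto 127 (DEL).
    return code ^ 0x40
```

So `ctrl("^")` is the code sent by Ctrl-caret, and the 26 letters plus the 6 symbols account for all 32 codes from 0 to 31.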

                                                                                                        1. 1

                                                                                                          Sorry, it means Ctrl-caret.

                                                                                                          OIC. Still slightly appalled, TBH, but OK.

                                                                                                          (Is this Emacs notation? I am not an Emacs user and its notation is just one of many reasons.)

                                                                                                          Does that not mean Ctrl + Shift + 6 though, a rather tricky 3-key combo?

                                                                                                          1. 1

                                                                                                            No, it’s not Emacs notation. (That would be C-^.) It’s the same notation as ^C for Ctrl-C.

                                                                                                            As I mentioned in passing, nearly all terminal programs will accept simply Ctrl + 6 to mean ^^, just as they accept Ctrl + 2 to mean ^@. (It doesn’t work on some older terminal programs, which is a downside.)

                                                                                                        2. 1

                                                                                                          I read it as “control-circumflex”.

                                                                                                          1. 2

                                                                                                            CLEARLY it’s bunny ears.

                                                                                                            1. 1

                                                                                                              … gosh. Both at the interpretation – which I am not faulting, BTW – and at the choice of a command key.

                                                                                                              Could be!

                                                                                                    3. 2

                                                                                                      I use tmux for local terminal multiplexing (not really for session persistence) and having to double-chord keybinds I use constantly is annoying. Take a look at the keymap I linked above, it’s intuitive and nicer IMO.

                                                                                                    4. 1

                                                                                                      This is amazing. It interferes with my custom sway config a bit, but it’s a good start on making this, which I’ve used for years, much more useful.

                                                                                                      1. 1

                                                                                                        This tmux config also played a role in obviating my need for tiling window managers - most tiling I needed was for terminals, but now a single fullscreen terminal works as my terminal “app”. Most of the time I want to use the remaining programs fullscreen, so presto, less need for a TWM.

                                                                                                    5. 2

                                                                                                      Yeah, I just tried it and half of the functions require the alt key, but the alt key doesn’t even work. For example, the “new pane” command is Alt + <n>, but that combo is used to put the ~ above letters like ñ. It seems like this is a really common problem, and the FAQ points you to a stack exchange post which tells you to change your operating system’s keybindings, which is just insane. Why would you ship something that is broken by default?

                                                                                                      1. 1

                                                                                                        Let me guess, you’re a Mac user?

                                                                                                        1. 1

                                                                                                          Yes, but I’m sure users of other platforms also prefer software that isn’t broken by default. :)

                                                                                                          1. 1

                                                                                                            “broken by default” hey, it works for me. Just remap your iterm2 so the alt key actually functions.

                                                                                                            1. 1

                                                                                                              Yep, I fully believe it’s a platform specific bug, and that it can be worked around with keybindings. I’m just pointing out that it doesn’t work by default. And it’s not an iTerm2 bug or a broken alt key because Terminal.app and VS Code’s terminal also exhibit native macOS alt key behavior. Zellij is just broken by default (and presumably unwilling to fix their bug, since their FAQ just documents workarounds), at least on some platforms.

                                                                                                      2. 2

                                                                                                        Zellij seems much more powerful and reasonable to configure than tmux, but the defaults are not made for “CLI natives”. They waste precious space and time, and get in the way of almost every other CLI tool, AFAICT. What I actually wanted was “as close to tmux as possible, but in Rust, and with better defaults and some extra possibilities”.

                                                                                                        1. 2

                                                                                                          You often need to optimize your software for various, sometimes contradictory or mutually exclusive goals, and handling configs becomes a multi-criteria optimization.

                                                                                                          Configuration simplicity is one of the factors and if it goes hand in hand with other priorities, that’s great. But it doesn’t always.

                                                                                                          One of the more absurd examples is that JVM defaults make absolutely zero sense for containers, so you end up tweaking JVM memory flags from day one. I’m not a fan of Java microservices and it’s one of many reasons.

                                                                                                          1. 1

                                                                                                            Have had similar experiences with Zellij - not necessarily the most “zero config” tool I’ve used so far. Also have never tried LazyGit, as I tend to use Fork more. Would be interested to see whether it’s a better choice than Fork for my work.

                                                                                                            1. 2

                                                                                                              Also have never tried LazyGit, as I tend to use Fork more.

                                                                                                              I use Gitup without which I can’t make sense of any big Git repo. But I think comparing a GUI app and a TUI is not fair.

                                                                                                              Reminds me that if I ever want to seriously move over to jj, then I’d need to port Gitup to it.

                                                                                                              1. 2

                                                                                                                Gitup seems to be a very keyboard based application - is that the case?

                                                                                                                1. 2

                                                                                                                  I mostly use it with the mouse but I think most things can be done with the keyboard as well.

                                                                                                            2. 1

                                                                                                              I love my heavily-configured Zellij. You can get around a lot of keybinding conflicts with the ctrl-g lock mode, but it’s better if they never conflict to start with. I ended up reconfiguring all the bindings to match my muscle memory and use it as “nicer tmux.”

                                                                                                            3. 2

                                                                                                              Data types

                                                                                                              The new Arrow-backed types in pandas are a great improvement and we’ll leave it at that

                                                                                                              I wonder what’s implied here.

                                                                                                              1. 3

                                                                                                                This caught my eye because I recently saw an in-progress testing library use the same technique: https://github.com/joeldrapper/quickdraw

                                                                                                                You can pass a custom failure message as a block. Using blocks for the failure messages means we don’t waste time constructing them unless the test fails. You don’t need to worry about expensive failure messages slowing down your tests.
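
                                                                                                                A minimal sketch of that idea, not Quickdraw’s actual API: the check helper and the AssertionFailed class below are hypothetical. Because the message is passed as a block, it is only built when the check fails.

                                                                                                                ```ruby
                                                                                                                # Hypothetical error class and helper, for illustration only.
                                                                                                                class AssertionFailed < StandardError; end

                                                                                                                def check(value, &message)
                                                                                                                  return true if value

                                                                                                                  # The block is called only on failure, so a costly message is free on success.
                                                                                                                  raise AssertionFailed, (message ? message.call : "check failed")
                                                                                                                end

                                                                                                                built = 0
                                                                                                                check(1 + 1 == 2) { built += 1; "never constructed" }

                                                                                                                begin
                                                                                                                  check(1 + 1 == 3) { built += 1; "expected 1 + 1 to equal 3" }
                                                                                                                rescue AssertionFailed => e
                                                                                                                  puts e.message
                                                                                                                end

                                                                                                                puts built # the message block ran once, for the failing check only
                                                                                                                ```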

                                                                                                                1. 1

                                                                                                                  Joel Drapper does some high quality Ruby’n. He’s also responsible for the Phlex view component library.

                                                                                                                    1. 2

                                                                                                                      I am not fluent in Ruby, but to my eyes this looks fine?

                                                                                                                      1. 7

                                                                                                                        Well, it’s not, for multiple reasons.

                                                                                                                        This is using Object#hash (equivalent to Java’s Object.hashCode() or C#’s GetHashCode) as a global cache key for caching HTML attributes rendering. e.g. {class: "foo"} becomes 'class="foo"'.

                                                                                                                        The problem is that this hash code is only meant for hash tables, hence it uses SipHash, so you are not meant to use it as the sole key, as it’s susceptible to collisions. When two hash codes match, you are supposed to additionally compare the original values that produced the hash codes to handle hash collisions. This code doesn’t do it. It assumes two objects with the same hash code are identical.

                                                                                                                        So this code can sometimes leak HTML attributes from one call to another and silently return the wrong result. Is it very likely? No, it requires a lot of bad luck, but I can’t possibly qualify this as “high quality”.
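
                                                                                                                        A small sketch of that failure mode. This is not the Phlex code: the Key type and both render helpers are hypothetical, and Key#hash is forced to collide so the result is deterministic (real collisions are rare).

                                                                                                                        ```ruby
                                                                                                                        # Hypothetical key type whose #hash always collides.
                                                                                                                        Key = Struct.new(:name) do
                                                                                                                          def hash
                                                                                                                            42
                                                                                                                          end
                                                                                                                        end

                                                                                                                        # Unsafe: the cache key is the hash code alone.
                                                                                                                        def render_by_code(cache, attrs)
                                                                                                                          cache[attrs.hash] ||= "rendered #{attrs.name}"
                                                                                                                        end

                                                                                                                        # Safer: the key is the object itself; Hash checks #hash, then #eql?.
                                                                                                                        def render_by_object(cache, attrs)
                                                                                                                          cache[attrs] ||= "rendered #{attrs.name}"
                                                                                                                        end

                                                                                                                        a = Key.new("class")
                                                                                                                        b = Key.new("id")

                                                                                                                        codes = {}
                                                                                                                        render_by_code(codes, a)
                                                                                                                        puts render_by_code(codes, b)     # "rendered class": the wrong entry leaks

                                                                                                                        objects = {}
                                                                                                                        render_by_object(objects, a)
                                                                                                                        puts render_by_object(objects, b) # "rendered id"
                                                                                                                        ```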

                                                                                                                        Then this cache is a synchronized FIFO with a fixed 4MiB size and no way to resize it.

                                                                                                                        FIFO is bad here because this cache exists to avoid regenerating the HTML for static calls, e.g. h1 class: "foo", but since the size is fixed, and you are constantly querying it with dynamic data, you are evicting the actually useful keys. So it should be an LRU or similar.
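
                                                                                                                        A toy comparison of the two eviction policies, as a sketch only: FifoCache, LruCache and static_misses are hypothetical names, the capacity of 2 and the access pattern are invented, and neither class is the Phlex implementation. One static key ("h1") is interleaved with ever-changing dynamic keys.

                                                                                                                        ```ruby
                                                                                                                        # Bounded caches built on Ruby's insertion-ordered Hash (hypothetical classes).
                                                                                                                        class FifoCache
                                                                                                                          def initialize(capacity)
                                                                                                                            @capacity = capacity
                                                                                                                            @store = {}
                                                                                                                          end

                                                                                                                          def fetch(key)
                                                                                                                            return @store[key] if @store.key?(key)

                                                                                                                            @store.shift if @store.size >= @capacity # evict the oldest insertion
                                                                                                                            @store[key] = yield
                                                                                                                          end
                                                                                                                        end

                                                                                                                        class LruCache < FifoCache
                                                                                                                          def fetch(key)
                                                                                                                            # On a hit, re-insert the key so it becomes the most recently used.
                                                                                                                            @store[key] = @store.delete(key) if @store.key?(key)
                                                                                                                            super
                                                                                                                          end
                                                                                                                        end

                                                                                                                        # Counts how often the static "h1" key has to be regenerated.
                                                                                                                        def static_misses(cache)
                                                                                                                          misses = 0
                                                                                                                          %w[h1 user-1 h1 user-2 h1 user-3 h1].each do |key|
                                                                                                                            cache.fetch(key) { misses += 1 if key == "h1"; key.upcase }
                                                                                                                          end
                                                                                                                          misses
                                                                                                                        end

                                                                                                                        puts static_misses(FifoCache.new(2)) # 2: dynamic keys push "h1" out
                                                                                                                        puts static_misses(LruCache.new(2))  # 1: "h1" stays because it keeps being used
                                                                                                                        ```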

                                                                                                                        And finally, while it doesn’t really matter on MRI because of the GVL, on Ruby implementations with free threading, you are constantly contending on a global mutex whenever you need to generate an HTML tag. Again it works, and for most users it will never be a problem, but can’t be qualified as “high quality”.

                                                                                                                        1. 2

                                                                                                                          Thanks for the detailed answer, I appreciate you taking the time for it.

                                                                                                                          I’d missed the use of .cache rather than keying on the object itself, that’s unfortunate. Thank you for providing the additional context.

                                                                                                                          1. 1

                                                                                                                            Why not submit an issue or patch? These seem like areas that can be improved rather than quality problems. Phlex isn’t as mature or battle-hardened, it’s nice to see there are places that can be improved.

                                                                                                                            Thread stuff is tricky, and as-mentioned probably not a problem for the majority of users here. As far as performance it also outperforms rails partials and other approaches. https://github.com/KonnorRogers/view-layer-benchmarks . Maybe a multi-threaded benchmark could expose issues not seen there.

                                                                                                                            This meets my bar for quality - I’ve used it and it worked well for me in practice. I suppose the label of quality is arbitrary and it’s easy to point to things that violate our own sensibilities of quality. Most things can be improved, so for me I’m ok with seeing these sorts of things.

                                                                                                                            On a personal note, I remain blocked by you elsewhere for defending RSpec code that worked for over a decade which Rails then broke as not-crappy. I have seen you make a lot of valuable contributions to rails - also a work-in-progress that has needs for improvement - and then question the quality of other peoples’ projects. It seems so unnecessary. I hope you don’t block me here as well for saying so.

                                                                                                                            edit: Oh, some recent changes appeared since this comment. https://github.com/phlex-ruby/phlex/commit/6ac15fd63a8396daeeab30abbf1eb8ae9d2207d5

                                                                                                                            1. 3

                                                                                                                              As far as performance it also outperforms rails partials and other approaches. https://github.com/KonnorRogers/view-layer-benchmarks .

                                                                                                                              No it doesn’t. These benchmarks are flawed, as mentioned here: https://github.com/KonnorRogers/view-layer-benchmarks/issues/10. The linked PR is gone because the author deleted the repo, but it explains that Phlex is several times slower than ERB at rendering, and only wins out on these benchmarks because the benchmark does almost no rendering but spends a lot of time in the partial resolution code. E.g. when you do render "foo", Rails has to search for that partial, which is slower than Phlex, which just references a constant.

                                                                                                                              I don’t quite remember the numbers, and anyway it’s been a while now, but at the time of that issue Phlex was something like 4 times slower than ERB.

                                                                                                                              Why not submit an issue or patch?

                                                                                                                              I did back in the day, but the author doesn’t take feedback well, as evidenced by how my PR showing that the benchmark was flawed simply led to all benchmarks being deleted, whereas during early development there were daily posts on how Phlex was X times faster than Rails. So now I keep my feedback to myself.

                                                                                                                              for defending RSpec code that worked for over a decade which Rails then broke as not-crappy.

                                                                                                                              Like it or not, the code on the RSpec side had a bug. The code in RSpec wasn’t doing what the author thought it was doing: https://ruby.social/@byroot/110035443532020856. It’s fine, pretty much every piece of software has bugs.

                                                                                                                              Also you weren’t just “defending” RSpec, I don’t see why I shouldn’t block authors of posts framed like this.

                                                                                                                              rails - also a work-in-progress that has needs for improvement

                                                                                                                              Lol

                                                                                                                              and then question the quality of other peoples’ projects.

                                                                                                                              I’m being factual. If someone pointing out a defect in some code makes you feel bad, it’s best you continue not to see my posts. Joel has Tweeted/Tooted hundreds of times about how X or Y in Rails is bad (most of the time misguided), and how Phlex is X times faster than Rails, and I shouldn’t be allowed to factually point out defects in his library?

                                                                                                                              Oh, some recent changes appeared since this comment.

                                                                                                                              Oh boy…

                                                                                                                              1. 1

                                                                                                                                All those ages ago I admitted that was snarky, and that I regretted sending that. It was a similar situation, and I am not taking the same approach.

                                                                                                                                Here, the only defects pointed out were quickly fixed. Does that address your quality concerns?

                                                                                                                                1. 1

                                                                                                                                  Does that address your quality concerns?

                                                                                                                                  No, the FIFO is still a bad idea, and only looks good on synthetic benchmarks where attributes are static. In a real-world use case, HTML attributes can vary a lot (e.g. id="user-4223") so this FIFO will keep growing / evicting.

                                                                                                                                  Hence dismissing the contention issue is misguided, even more so because not locking on read means bad time on JRuby / TruffleRuby, and now the FIFO isn’t locked at all anymore, so the project became MRI only (but it’s fine to be MRI only if that’s your decision).

                                                                                                                                  And now the FIFO is capped at 1M entries instead of 4MB. Given that each entry is an array (40B) that contains the original Hash (minimum 160B) plus the serialized attributes (minimum 40B), that’s at least 240B per entry, so at least 240MB of cache plus ~33MB for the FIFO Hash itself; in total, more than half the memory provided by Heroku & co. So I expect users to have a nasty surprise when they upgrade.
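
                                                                                                                                  The estimate above, spelled out as a short Ruby calculation. The byte sizes are the minimums quoted in this comment, not measurements, and the constant names are made up for the sketch:

                                                                                                                                  ```ruby
                                                                                                                                  # Minimum sizes as quoted in the comment (assumed, not measured here).
                                                                                                                                  ARRAY_BYTES   = 40        # the per-entry array
                                                                                                                                  HASH_BYTES    = 160       # the original attributes Hash
                                                                                                                                  STRING_BYTES  = 40        # the serialized attributes
                                                                                                                                  ENTRIES       = 1_000_000 # the new cap
                                                                                                                                  FIFO_TABLE_MB = 33        # approximate size of the FIFO Hash itself

                                                                                                                                  PER_ENTRY_BYTES = ARRAY_BYTES + HASH_BYTES + STRING_BYTES
                                                                                                                                  ENTRIES_MB      = PER_ENTRY_BYTES * ENTRIES / 1_000_000
                                                                                                                                  TOTAL_MB        = ENTRIES_MB + FIFO_TABLE_MB

                                                                                                                                  puts PER_ENTRY_BYTES # 240
                                                                                                                                  puts ENTRIES_MB      # 240
                                                                                                                                  puts TOTAL_MB        # 273
                                                                                                                                  ```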

                                                                                                                                  Ultimately the problem is that while the Phlex API is nice, it’s impossible to make it perform in the same ballpark as ERB or HAML, hence these sorts of hacks to look OK on benchmarks (or so I guess, because if the benchmarks are still public I can’t find them).

                                                                                                                                  It also suggests that the project is predominantly optimized against benchmarks rather than actual production profiles, or at least load testing of real apps. Benchmarks are nice, but they are meant to represent a use case observed in production, not to be purely synthetic.

                                                                                                                                  To address my quality concerns (not that it’s necessary) one would have to remove that cache entirely and be upfront about the expected performance in the doc. It’s fine to not be fast, just be upfront about it to make sure not to mislead your users. I keep hearing people like you who think it’s faster than ERB, and that annoys me. I have some projects with some very hard tradeoffs, and I do my best to make sure my users don’t think it’s a silver bullet and know exactly what to expect.

                                                                                                                                  1. 1

                                                                                                                                    I think it’s wrong to dismiss partials as irrelevant for benchmarks since I have worked on more than one project wherein partials had to be “unrolled” and/or aggressively cached when it was found it had a profound effect on performance through profiling. ERB implementations had to design around this problem, which is a bummer.

                                                                                                                                    That benchmark repo is not owned by Joel, I don’t know what happened in that fork. I also understand how misleading benchmarks can be irksome. I remember being at a ruby conf in SLC where Ryan Davis claimed benchmarks showed RSpec to be something like o(n^2) slower than minitest, sharing graphs but did not share the code. I highly suspect he found a pathological case where failures were generating large diffs, which is avoidable, and also not representative of how people experience it. I will never know, even though I asked him for it. That’s all to say, I get how benchmarks that you think are misleading are annoying.

                                                                                                                                    I know I mentioned those benchmarks in dispute, and I didn’t look at them super closely. I’m not sure they’re completely unfair, though probably are focused on contrived/targeted performance aspects rather than real world ones. That’s not unusual for me to see in benchmarks, and I’d certainly be interested in more diverse benchmark examples. If my bringing up that benchmark touched a sore spot, apologies.

                                                                                                                                    As far as FIFO vs LRU, I think the case you describe with 1M unique entries possibly stemming from ids the LRU would probably perform worse because of tradeoffs. It’s hard to say, but at the very least I wouldn’t dismiss the idea of using a FIFO cache. These are also defaults, which can change based on analysis, like how Rails changed puma threadcounts. What makes the most sense for the most people is up for discussion.

                                                                                                                                    I think I agree that Phlex’s design fundamentally gives it different performance characteristics from ERB, and what’s important to me is whether it’s fast enough in practice, and for me it is. Ruby’s value proposition isn’t absolute speed. Other languages blow it out of the water. It’s expressiveness and productivity.

                                                                                                                                    1. 1

                                                                                                                                      I think it’s wrong to dismiss partials as irrelevant for benchmarks

                                                                                                                                      That’s not what I did… I’m saying the benchmark had a ridiculous rendering / partial inclusion ratio, see: https://github.com/casperisfine/view-layer-benchmarks/commit/b0c7b7d65e5392b5f1d5e9d824540034460752c2

                                                                                                                                      That benchmark repo is not owned by Joel

                                                                                                                                      It was owned by him at the time, or at least he was maintaining an active fork of it, hence why the PR was opened there.

                                                                                                                                      the case you describe with 1M unique entries possibly stemming from ids the LRU would probably perform worse because of tradeoffs.

                                                                                                                                      In both cases they’ll only evict once full, so you’re going to sit on multiple hundreds of MB of cache in each of your processes, stressing the GC. Worse, they’ll be promoted to the old generation, so you’ll need a major GC to trigger for that memory to be reclaimed. Can’t call that good quality however you slice it. The FIFO vs LRU question is really secondary; the mere existence of this cache is a problem.

                                                                                                                                      Ruby’s value proposition isn’t absolute speed.

                                                                                                                                      As I said, what bothers me isn’t that it is slow, it’s that it’s slow while it was advertised as fast.

                                                                                                                                      And yes Ruby’s value isn’t absolute speed, yet Ruby is constantly attacked for its performance and me and my team are working hard to make performance gains on the entire stack from YJIT to Rails, so seeing that people are flocking to a much slower view layer while thinking they’re optimizing annoys me greatly.

                                                                                                                                      1. 1

                                                                                                                                        I did a small amount of finagling in order to get the benchmark you linked to with extra concatenation running. Here are the results on my computer.

                                                                                                                                        ➜  view-layer-benchmarks git:(b0c7b7d) ✗ be rake benchmark
                                                                                                                                        /Users/bradleyschaefer/.asdf/installs/ruby/3.3.1/bin/ruby ./benchmark.rb
                                                                                                                                        DEPRECATION WARNING: `Rails.application.secrets` is deprecated in favor of `Rails.application.credentials` and will be removed in Rails 7.2. (called from <class:TestApp> at ./benchmark.rb:18)
                                                                                                                                        ⚠️ [DEPRECATION] Defining the `template` method on a Phlex component will not be supported in Phlex 2.0. Please rename the method to `view_template` instead.
                                                                                                                                        ⚠️ [DEPRECATION] Defining the `template` method on a Phlex component will not be supported in Phlex 2.0. Please rename the method to `view_template` instead.
                                                                                                                                        Rendering 15025 bytes
                                                                                                                                        ruby 3.3.1 (2024-04-23 revision c56cd86388) [arm64-darwin23]
                                                                                                                                        Warming up --------------------------------------
                                                                                                                                              view_component   493.000 i/100ms
                                                                                                                                                    partials   305.000 i/100ms
                                                                                                                                                       cells   310.000 i/100ms
                                                                                                                                                       phlex   408.000 i/100ms
                                                                                                                                        Calculating -------------------------------------
                                                                                                                                              view_component      4.869k (± 1.6%) i/s -     48.807k in  10.026114s
                                                                                                                                                    partials      2.978k (± 1.4%) i/s -     29.890k in  10.038679s
                                                                                                                                                       cells      3.108k (± 1.5%) i/s -     31.310k in  10.075911s
                                                                                                                                                       phlex      4.183k (± 1.1%) i/s -     42.024k in  10.048360s
                                                                                                                                        
                                                                                                                                        Comparison:
                                                                                                                                              view_component:     4869.2 i/s
                                                                                                                                                       phlex:     4182.7 i/s - 1.16x  slower
                                                                                                                                                       cells:     3108.2 i/s - 1.57x  slower
                                                                                                                                                    partials:     2978.1 i/s - 1.63x  slower
                                                                                                                                        

                                                                                                                                        I commented out dry-view because it wasn’t working, and it looks abandoned-ish anyway. Improving benchmarks is a good idea. Performance does change as you predicted. 10x isn’t sufficient to flip the benchmark into poor performance. Trying 100x as many concatenations, you do see a bigger difference:

                                                                                                                                        ➜  view-layer-benchmarks git:(b0c7b7d) ✗ be rake benchmark
                                                                                                                                        /Users/bradleyschaefer/.asdf/installs/ruby/3.3.1/bin/ruby ./benchmark.rb
                                                                                                                                        DEPRECATION WARNING: `Rails.application.secrets` is deprecated in favor of `Rails.application.credentials` and will be removed in Rails 7.2. (called from <class:TestApp> at ./benchmark.rb:18)
                                                                                                                                        ⚠️ [DEPRECATION] Defining the `template` method on a Phlex component will not be supported in Phlex 2.0. Please rename the method to `view_template` instead.
                                                                                                                                        ⚠️ [DEPRECATION] Defining the `template` method on a Phlex component will not be supported in Phlex 2.0. Please rename the method to `view_template` instead.
                                                                                                                                        Rendering 150025 bytes
                                                                                                                                        ruby 3.3.1 (2024-04-23 revision c56cd86388) [arm64-darwin23]
                                                                                                                                        Warming up --------------------------------------
                                                                                                                                              view_component   103.000 i/100ms
                                                                                                                                                    partials    90.000 i/100ms
                                                                                                                                                       cells    71.000 i/100ms
                                                                                                                                                       phlex    45.000 i/100ms
                                                                                                                                        Calculating -------------------------------------
                                                                                                                                              view_component      1.025k (± 2.6%) i/s -     10.300k in  10.058886s
                                                                                                                                                    partials    902.426 (± 2.0%) i/s -      9.090k in  10.076938s
                                                                                                                                                       cells    723.895 (± 2.6%) i/s -      7.242k in  10.011378s
                                                                                                                                                       phlex    475.138 (± 3.4%) i/s -      4.770k in  10.051123s
                                                                                                                                        
                                                                                                                                        Comparison:
                                                                                                                                              view_component:     1024.7 i/s
                                                                                                                                                    partials:      902.4 i/s - 1.14x  slower
                                                                                                                                                       cells:      723.9 i/s - 1.42x  slower
                                                                                                                                                       phlex:      475.1 i/s - 2.16x  slower
                                                                                                                                        

                                                                                                                                        Using the numbers from the 100x concatenations, Phlex takes roughly 0.002s rendering, and ERB partials 0.001s. I’d love to see benchmarks over less-contrived examples, but there you have it. FWIW ViewComponent claims to be 10x faster on their website :D
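
                                                                                                                                        For anyone checking the arithmetic, here is a minimal Ruby sketch that derives the per-render times from the i/s figures in the 100x run above. Only the four rates are taken from the output; the variable names are my own.

```ruby
# Iterations per second, copied from the "Comparison" block of the 100x run.
ips = {
  "view_component" => 1024.7,
  "partials"       => 902.4,
  "cells"          => 723.9,
  "phlex"          => 475.1,
}

# Time per render is the reciprocal of iterations per second.
seconds_per_render = ips.transform_values { |rate| 1.0 / rate }

seconds_per_render.each do |name, seconds|
  puts format("%-15s %.2f ms per render", name, seconds * 1000)
end
```

                                                                                                                                        That works out to roughly 2.1 ms per render for Phlex and 1.1 ms for partials, which is where the rounded 0.002s and 0.001s come from.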

                                                                                                                                        1. 1

                                                                                                                                          Rendering 15025 bytes

                                                                                                                                          My PR bumped it to 25k, why did you reduce it to 15k? What else did you modify?

                                                                                                                                          Also the numbers on my machine are noticeably different: https://github.com/KonnorRogers/view-layer-benchmarks/commit/a378799762ebedc4c4b456fdd9ededd3f32b68ba

                                                                                                                                          In passing, the golden rule of benchmarking is that benchmark results shared without the code used to produce them are always bullshit…

                                                                                                                                          So yeah, Phlex is still 2x slower on what is still very much a micro-benchmark. I don’t remember what the difference was at the time; maybe it has progressed.

                                                                                                                                          Phlex takes roughly 0.002s rendering, and ERB partials 0.001s

                                                                                                                                          Yes, because it’s a micro-benchmark. Real-world templates do much more work than that, and it’s not rare to spend upwards of 10 or 20ms rendering ERB in production. That is a meaningful amount of time when you are trying to fit within some SLO.

                                                                                                                                            1. 1

                                                                                                                                              Ah you based yourself off the very first commit of the branch, not the tip which also included https://github.com/KonnorRogers/view-layer-benchmarks/commit/2697cedc0e39737ecdc5c43ac4ccfdb5ae4b970d, specifically to outweigh that attribute cache.

                                                                                                                                              And really, that PR was just the minimal amount of changes so Joel would stop wrongfully bashing Action View on Twitter. There are still many biases, etc., and it’s still far from something one would consider a realistic benchmark.

                                                                                                                                              In just 5 minutes of perfectly legitimate changes (more diverse tags, more attributes, more interpolated values), without changing the output size I can deepen the gap further: https://github.com/KonnorRogers/view-layer-benchmarks/commit/80a3ffd90d5d08b0361a875dfd7d01517476039a

                                                                                                                                              1. 1

                                                                                                                                                I don’t agree with the benchmark changes to outweigh the attribute cache as being representative. That almost ensures cache misses by using rand, which I would not expect to be the real-world experience of any small/medium app.
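
                                                                                                                                                To make the cache-miss point concrete, here is a minimal Ruby sketch. It is not the benchmark’s code: the `hit_rate` helper, the key names, and the counts are all made up for illustration. It compares a memoizing hash fed a small set of repeated attribute values against one fed `rand`-style unique values.

```ruby
# Hypothetical helper: fraction of lookups that find the key already cached.
def hit_rate(keys)
  cache = {}
  hits = 0
  keys.each do |key|
    if cache.key?(key)
      hits += 1
    else
      cache[key] = true # first sighting: a miss, then remember it
    end
  end
  hits.fdiv(keys.size)
end

rng = Random.new(42) # seeded so the run is repeatable

# Assumed small-app shape: 20 distinct attribute values, reused constantly.
static_keys = Array.new(10_000) { |i| "class-#{i % 20}" }

# Assumed rand-style shape: nearly every value is unique.
random_keys = Array.new(10_000) { "id-#{rng.rand(1_000_000_000)}" }

puts "static attributes: #{hit_rate(static_keys)}"
puts "random attributes: #{hit_rate(random_keys)}"
```

                                                                                                                                                With repeated values almost every lookup is a hit; with effectively unique values almost every lookup is a miss, so the cache only adds overhead.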

                                                                                                                                                Large apps may be closer to that. Most of us are not Shopify, though it’s a use case that is worth considering as a separate benchmark.

                                                                                                                                                It’s valid to have a variety of benchmarks to understand behavior under different conditions. I can see where you’re coming from in terms of the contrivance of these being misleading without including other ones that show tradeoffs rather than declaring victory.

                                                                                                                                                From the repo’s readme

                                                                                                                                                Benchmarks arent representative of real life and just render nested components / partials. Take all numbers with a grain of salt.

                                                                                                                                                I came in with the statement that Phlex outperforms Rails erb partials and others based on that link. I take your point and won’t continue spreading that.

                                                                                                                                                Performance optimization has never been the main thing about Phlex for me, and I think we ended up talking about it because several quality judgements were around optimizations. I think Phlex has a lot of optimization opportunities, and it’s also very usable as it is today. I don’t mark it as poor quality for where it is today.

                                                                                                                                                1. 2

                                                                                                                                                  not locking on read means bad time on JRuby / TruffleRuby

                                                                                                                                                  What kind of bad time are we talking about? My understanding is that you could either get the old value or the new value because reading a value from a Hash is atomic. See https://github.com/jruby/jruby/wiki/Concurrency-in-jruby

                                                                                                                                                  The worst case here is two threads end up calculating the same value and writing the same thing to the key. Not a problem.

                                                                                                                                                  All this stuff about LRU caches is complete nonsense. LRU caches need to write on every read. They are significantly slower.

                                                                                                                                                  In my experience about 99% of HTML attributes are completely static. Sometimes you get the odd unique id or value on a form input.

                                                                                                                                                  The cache size was temporarily hard-coded to 1_000_000 in main, but that was never shipped. This was temporary while we figured out how to auto-size the cache in a reasonable way so we didn’t have to introduce a config.

                                                                                                                                                  FIFO is so much faster, and keys will be hit thousands of times before they expire and need to be recalculated. Our FIFO cache now also has an upper limit for individual values in the cache. This is so a particularly large attribute, such as serialised JSON, won’t have to evict hundreds of other keys. Large JSON strings are actually pretty fast on their own so there’s little to no advantage to caching them.
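
                                                                                                                                                  A minimal sketch of the idea, not Phlex’s actual implementation: the class name, the byte-budget sizing, and both limits are assumptions chosen for illustration. Reads are a plain hash lookup with no bookkeeping, writes take a lock, eviction follows insertion order, and oversized values are never stored.

```ruby
# Hypothetical FIFO cache; names and limits are illustrative only.
class FIFOCache
  def initialize(max_bytes:, max_value_bytes:)
    @max_bytes = max_bytes             # total budget for cached values
    @max_value_bytes = max_value_bytes # per-value upper limit
    @bytes = 0
    @store = {}                        # Ruby hashes preserve insertion order
    @mutex = Mutex.new
  end

  # Reads never write: no recency tracking, unlike an LRU.
  def [](key)
    @store[key]
  end

  def []=(key, value)
    # Skip values over the per-value limit so one large string
    # cannot push out many small entries.
    return if value.bytesize > @max_value_bytes

    @mutex.synchronize do
      # Two threads may compute the same value; keep the first write.
      next if @store.key?(key)

      @store[key] = value
      @bytes += value.bytesize

      # Evict oldest entries (insertion order) until back under budget.
      while @bytes > @max_bytes
        _oldest_key, oldest_value = @store.shift
        @bytes -= oldest_value.bytesize
      end
    end
  end

  # Return the cached value, or compute, store, and return it.
  def fetch(key)
    @store[key] || (self[key] = yield)
  end

  def size
    @store.size
  end
end

cache = FIFOCache.new(max_bytes: 10, max_value_bytes: 6)
cache["a"] = "aaaa"
cache["b"] = "bbbb"
cache["a"]          # reading "a" does not protect it from eviction
cache["c"] = "cccc" # over budget, so the oldest entry ("a") is evicted
```

                                                                                                                                                  The trade-off against an LRU is visible in `[]`: a hot key can still be evicted once it becomes the oldest entry, but no read ever has to touch shared state beyond the lookup itself.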

                                                                                                                  1. 1

                                                                                                                    I don’t understand the issue with the (production/personal/non-test) database being ephemeral. The entire point of a database is to have long term storage, not to be ephemeral.

                                                                                                                    Specifically, what does the author mean when they say:

                                                                                                                    We could setup a separate, more permanent Postgres instance to use for my personal bookmarks but that comes with its own set of issues.

                                                                                                                    1. 1

                                                                                                                      Administering and maintaining a permanent Postgres instance does come with its own issues. Keeping dev and prod deployment strategies aligned has benefits, even if it means using Postgres instances that get torn down on deploy or whatever.

                                                                                                                    2. 7

                                                                                                                      Having used Gerrit, what is different about stacking compared to having atomic commits and reviewing them commit-by-commit? I’ll admit that most non-Gerrit forges are more-or-less bad at making individual commit reviews work (GitHub is especially bad, I remember being pleasantly surprised at GitLab having a similar “compare to revision X” button).

                                                                                                                      Is it just that GitHub and GitHub-likes are so bad at the “review each commit individually” workflow that people need to write tools on top of that? I honestly feel like I’m missing something, as the way I currently structure commits and branches doesn’t feel any different from what’s described here.

                                                                                                                      1. 9

                                                                                                                        You’re not missing anything. They really are that bad. The PR is the top level “unit of review” and you need a few clicks to step through individual commits.

                                                                                                                        I’ve never used Gerrit, but I’ve landed on plenty of diffs there and the thing that stands out to me is the sheer information density of a proposed change compared to a PR. It’s a power tool, clearly.

                                                                                                                        What else distinguishes it?

                                                                                                                        1. 10

                                                                                                                          A few things I’d highlight about Gerrit’s UI/UX:

                                                                                                                          • Patch based workflow, similar in spirit to the email workflow.
                                                                                                                          • Information dense/simple UI.
                                                                                                                          • Being able to tell “what changed since the last review”.
                                                                                                                          • Being able to review the commit message.
                                                                                                                          • Patch approval can be customized with fine grained rules. It used to be programmable in Prolog, but I think that’s been deprecated? For example, you can allow trivial rebases without losing +2 votes (merge approval).
                                                                                                                          1. 6

                                                                                                                            Also patches in a stack can be merged all at once, individually, or any subset of the history. This is a game changer in my book. A medium feature could be a 10 commit PR, and if there’s an objection to any part of that, the entire PR is blocked.

                                                                                                                            In Gerrit, if the first 6 commits are good and done but the next 4 still need some work, you can trivially just merge the first 6 and keep working on the rest! This makes it so much easier to collaborate on WIP code.

                                                                                                                            1. 2

                                                                                                                              That’s awesome, having a several-commit PR stall because someone doesn’t like commit 5 of 7 is super annoying

                                                                                                                              1. 2

                                                                                                                                I am sorry, I feel like the buzzkill here and I am surely missing something, but I don’t get why this seems to be praised as a unique feature of stacks.

                                                                                                                                Git already allows you to apply individual commits (and commit ranges) from a branch to any other branch. This enables you to apply parts of branches to your main development branch. And since GitHub PRs are simply web-exposed branches, the same can be done with GitHub PRs. In fact, I’ve seen it quite a few times where individual commits from a PR were cherry-picked into the main branch.

                                                                                                                                1. 3

                                                                                                                                  Of course! But in Gerrit you can merge a bunch of commits and simultaneously remove them from the stack, without messing up any review comments, and if the stack is up to date with main it doesn’t even have to change the commit hashes, unlike a cherry-pick.

                                                                                                                                  Yes, you can apply subsets of a PR in GitHub, but Gerrit makes doing so a first-class citizen. There’s no overhead in it at all, so you can continually roll a stack, merging earlier finished commits while you add new WIP ones to the top. In GitHub this would either mean creating new PRs, or removing commits from the PR after they’ve been merged, which will not be even remotely clear in GitHub’s view of the PR history.

                                                                                                                          2. 8

                                                                                                                            Is it just that GitHub and GitHub-likes are so bad at the “review each commit individually” workflow that people need to write tools on top of that?

                                                                                                                            Yes. It’s not a supported workflow, so it doesn’t work if you try to do it. It can work with a bunch of effort to manually do what gerrit does already, but doing that sucks, hence the tooling to make it not suck.

                                                                                                                            “the unit of review is a commit” vs “the unit of review is a branch” are just fundamentally different models.

                                                                                                                            1. 1

                                                                                                                              I often work on stacks of dependent changes like this:

                                                                                                                              1. Refactor/cleanups
                                                                                                                              2. Implement new feature
                                                                                                                              3. Implement other feature

                                                                                                                              I started doing (2), then pulled out (1) into its own change, then worked on (3), which depends on (2) or will have conflicts if I don’t stack it. I send them all out for code review (possibly by different people) and address feedback on them all individually. This would be difficult to do with 3 commits in a single PR, since I would have to keep force-pushing amended commits to keep the 1-2-3 sequence, and I’d lose the ability to see older versions of those commits. With a stacked change, e.g. (1), I can see each version of (1) as I amended it to address feedback (Gerrit calls them “patch sets”), with all comments preserved.

                                                                                                                            2. 3

                                                                                                                              It’s interesting that you’ve chosen to demo this with the markdown treesitter parser, and carefully skip over the contortions required due to their decision to implement it as two parsers (block and inline). I mean it makes sense you don’t complicate your examples with that, since it would detract from your actual program.

                                                                                                                              I recently did my first foray into TreeSitter (via a neovim plugin I’m writing) and I almost immediately tripped over the block-versus-inline parser problem.

                                                                                                                              There is at least one other markdown treesitter parser, which has a unified parser, but I have not tried it (and it looks like the two-phase one is the blessed one)

                                                                                                                              1. 4

                                                                                                                                Yeah TreeSitter really isn’t good for many languages, because its model is context-free grammars + ad hoc lexer tricks.

                                                                                                                                So it’s not surprising to me at all that it’s hard to write a Markdown parser within treesitter’s model.

                                                                                                                                I have looked at the treesitter-bash plugin, and it has similar issues, though not 2 complete parsers. It’s just very ad hoc without much notion of correctness – it’s 200 or 1,000 bugs fixed over ~7 years


                                                                                                                                IMO the underlying conflict is the “lexer modes” problem, or the HTML/JS/CSS “interleaved languages” problem. Shell is kind of like HTML/JS/CSS – many languages in one file – and I’m not sure that you can implement HTML/JS/CSS in any reasonable way in Treesitter either.


                                                                                                                                I wonder if anyone has defined or written some kind of “Markdown subset” that’s easier to parse?

                                                                                                                                I have like 9+ years of Markdown files to test on … I kinda want to “discover” the subset I use. It’s definitely not all the fiddly rules

                                                                                                                                I read over

                                                                                                                                but I don’t like that they got rid of indented code blocks in favor of backtick code blocks. There is a principled reason for it – the “composability rule” – but it also violates the “prime directive” of the text being readable without rendering.

                                                                                                                                1. 2

                                                                                                                                  IMO the underlying conflict is the “lexer modes” problem, or the HTML/JS/CSS “interleaved languages” problem. Shell is kind of like HTML/JS/CSS – many languages in one file – and I’m not sure that you can implement HTML/JS/CSS in any reasonable way in Treesitter either.

                                                                                                                                  I think this has worked fine in my experience. You can “inject” languages in other languages, which works really well with HTML/JS/CSS.


                                                                                                                                  it also violates the “prime directive” of the text being readable without rendering.

                                                                                                                                  I have the exact opposite opinion: I find triple-backticked code blocks much more readable.

                                                                                                                                  1. 1

                                                                                                                                    I think this has worked fine in my experience. You can “inject” languages in other languages, which works really well with HTML/JS/CSS.

                                                                                                                                    Hm are there some examples of that? which applications do it?

                                                                                                                                    1. 1

                                                                                                                                      That’s the default experience in Neovim with nvim-treesitter.

                                                                                                                                      1. 1

                                                                                                                                        OK hm, then I guess a possible solution is to write 4-5 treesitter plugins for shell / YSH :-/

                                                                                                                                        I hadn’t really considered that! Probably because treesitter-bash is 1 grammar, and it was actually started by the author of treesitter …


                                                                                                                                        Thinking about it more, there might be a little difference in that shell is arbitrarily interleaved, but HTML/CSS/JS are more limited:

                                                                                                                                        • HTML contains CSS and JavaScript - <style> <script>
                                                                                                                                        • CSS doesn’t really contain HTML? e.g. a <style> tag
                                                                                                                                        • JavaScript doesn’t contain (unquoted) HTML either, unless you’re using JSX literals, in which case you can have JSX literals that also contain CSS and JavaScript

                                                                                                                                        So it is a similar problem, but not identical …

                                                                                                                                        I guess it depends on how well nvim-treesitter handles HTML containing JS containing JSX containing JS, etc.

                                                                                                                                        If the “injection” can be recursive as well as non-recursive

                                                                                                                                        1. 2

                                                                                                                                          I replied below with some details about injections.

                                                                                                                                          Injections can nest arbitrarily. The big restriction is that they need to be proper syntax tree nodes.

                                                                                                                                          That means that the “parent” language needs to find the end of the injected language on its own (a tree-sitter parser is really not aware of the context of injections). That usually works quite well (script tags are easy to scan for, for example), but I am not sure if that applies to your case.
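
                                                                                                                                          To make the “parent finds the end” point concrete, here is a minimal sketch in plain Python. It is not tree-sitter code: a regex stands in for the HTML grammar, and `script_ranges` is a hypothetical helper name. It only shows that the outer language picks the injected range without understanding the inner one.

```python
import re

# Hypothetical stand-in for the parent (HTML) grammar: it locates <script>
# bodies purely by scanning for the closing tag, knowing nothing about JS.
SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)


def script_ranges(html: str) -> list[tuple[int, int]]:
    """Return (start, end) character offsets of every <script> body."""
    return [m.span(1) for m in SCRIPT_RE.finditer(html)]


html = "<p>hi</p><script>let x = 1;</script><p>bye</p>"
ranges = script_ranges(html)
injected = [html[s:e] for s, e in ranges]  # what a JS parser would receive

# The parent decides where the injection ends, so a closing tag inside a
# JS string literal still terminates the injected range.
tricky = '<script>var s = "</script>";</script>'
tricky_bodies = [tricky[s:e] for s, e in script_ranges(tricky)]
```

                                                                                                                                          In the second input the injected text is cut off inside the string literal, which is the kind of case where the parent language cannot find the true end on its own.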

                                                                                                                                          1. 2

                                                                                                                                            Yep, whether Tree-sitter supports indefinite recursion of grammars is the remaining question. Tree-sitter’s language injection docs don’t seem to address that question, so you might need to experiment.

                                                                                                                                            1. 4

                                                                                                                                              Tree sitter injections do not exist in tree-sitter the library as a concept. You can only limit the parser to a list of ranges that will be included.

                                                                                                                                              The concept of injections is built on top of that by each editor on their own. In theory there is the official TS highlighter in Rust, but it’s fragile and not very good.

                                                                                                                                              In helix we started out endorsing this highlighter and had to heavily modify it to make incremental parsing work acceptably well. The TS highlighter is also entirely overcomplicated, slow and buggy. We are currently rewriting the highlighter entirely for helix.

                                                                                                                                              Nvim has their own highlighter and doesn’t use the official highlighter at all.

                                                                                                                                              So really, whether infinite recursion is supported depends on the editor in question. But I think for all real-world implementations the answer is yes in theory, but it requires an injection in the root node of the grammar. Nobody would practically write an injection query like that where you get truly infinite recursion.

                                                                                                                                              But maybe what you mean is whether, for example, rust -> markdown -> rust -> markdown -> rust works.

                                                                                                                                              The answer to that is yes (in helix and nvim at least).

                                                                                                                                              1. 1

                                                                                                                                                Hm what is the “official TS highlighter in Rust” ? You mean a Treesitter highlighter FOR Rust, or something else?

                                                                                                                                                Do you have a link to the repo?

                                                                                                                                                In any case, I am not surprised there is an overcomplicated / slow / buggy TreeSitter highlighter, and that it takes multiple rewrites to get it right …

                                                                                                                                                The metalanguage does not make it easy to express many languages IMO. They basically give you context-free grammars plus a few lexer tricks, and IMO that’s not enough


                                                                                                                                                I’d be interested to see how the rust -> markdown -> rust -> markdown works in Helix / nvim

                                                                                                                                                1. 1

                                                                                                                                                  No I am not talking about a parser.

                                                                                                                                                  What I meant was the crate that wraps the core tree sitter library and provides an actual stream of syntax highlighting spans for a given file. It’s part of the main tree sitter repo. This is not generated code.

                                                                                                                                                  https://github.com/tree-sitter/tree-sitter/blob/master/highlight/src/lib.rs

                                                                                                                                                  Many concepts that are considered part of tree-sitter are really not defined in the library at all but in this highlighter implementation (including injections). Nvim has its own highlighter that works quite differently. That also means that there are divergences in the query format across editors. We try to coordinate with nvim to keep things somewhat consistent.

                                                                                                                                                  Making the highlighter practical for an editor requires writing your own highlighter (since it’s not usable out of the box). In our case it’s an adaptation of the upstream TS highlighter (not my crime), which made the code even worse and harder to maintain, so this is what we have in helix right now:

                                                                                                                                                  https://github.com/helix-editor/helix/blob/master/helix-core/src/syntax.rs

                                                                                                                                                  I am doing a full rewrite of our highlighter.

                                                                                                                                                  The rust -> markdown -> rust -> markdown case is not merged into helix yet because it revealed bugs in the highlighter that I am fixing with this rewrite (it’s a bit of a special case).

                                                                                                                                                  The injections are implemented in this PR: https://github.com/helix-editor/helix/pull/9695/files. There is some noise in there since that PR needed to update some other things. The only important change is the added injection capture in injections.scm. This instructs the highlighter to parse syntax nodes that match this pattern as markdown (in this case doc comments).

                                                                                                                                                  The multiple nesting is simply achieved by then having rust code blocks inside markdown, where rust gets injected, which then again has markdown comments. The injection is really the same as running the TS markdown parser on a markdown file. Nothing special is needed to make this nested case work.
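
                                                                                                                                                  A toy model of that nesting, in plain Python rather than tree-sitter: each layer just runs the “other” parser on the text it extracted, so no special recursion support is needed. The names `rust_doc_markdown` and `injection_layers` are hypothetical, and string extraction stands in for what a real highlighter does with node ranges.

```python
import re

FENCE = "`" * 3  # built indirectly so this example nests cleanly in Markdown
FENCE_RE = re.compile(rf"^{FENCE}rust\n(.*?)^{FENCE}", re.DOTALL | re.MULTILINE)


def rust_doc_markdown(src: str) -> str:
    """Collect the bodies of `///` doc comments as one Markdown string."""
    bodies = []
    for line in src.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("///"):
            bodies.append(stripped[3:].removeprefix(" "))
    return "\n".join(bodies)


def injection_layers(text: str, lang: str = "rust", depth: int = 0) -> list[tuple[int, str]]:
    """List (depth, language) for every layer reached by following injections."""
    layers = [(depth, lang)]
    if lang == "rust":
        markdown = rust_doc_markdown(text)
        if markdown:  # Rust injects Markdown through its doc comments
            layers += injection_layers(markdown, "markdown", depth + 1)
    else:
        for body in FENCE_RE.findall(text):  # Markdown injects Rust via fences
            layers += injection_layers(body, "rust", depth + 1)
    return layers


source = "\n".join([
    "/// Outer docs.",
    f"/// {FENCE}rust",
    "/// /// Inner docs.",
    "/// fn inner() {}",
    f"/// {FENCE}",
    "fn outer() {}",
])
layers = injection_layers(source)
```

                                                                                                                                                  The recursion ends when a layer contains nothing left to inject, which is why truly infinite nesting does not come up for ordinary files.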

                                                                                                                                                  Another interesting feature is that you can enable the injection.combined flag. This makes tree sitter parse all syntax nodes that match the injection pattern as a single markdown file instead of treating each separate node as a separate file. There are some creative users of that feature. For example, in this case it’s necessary since each doc comment line has a /// at the start of the line. Only the comment body can be parsed as markdown, and it needs to be combined across multiple lines.
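
                                                                                                                                                  A rough illustration of the difference, again in plain Python and under the assumption that joining strings is an acceptable stand-in for parsing several ranges as one document. `doc_comment_bodies` is a hypothetical helper, not part of any highlighter.

```python
source = """\
/// # Title
/// Some *emphasised*
/// text.
fn main() {}
"""


def doc_comment_bodies(src: str) -> list[str]:
    """Strip the `///` marker (and one following space) from each doc line."""
    bodies = []
    for line in src.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("///"):
            bodies.append(stripped[3:].removeprefix(" "))
    return bodies


bodies = doc_comment_bodies(source)

# Without combining: every comment line is its own tiny Markdown document,
# so a paragraph that spans two lines is seen as two unrelated fragments.
separate_documents = bodies

# With combining: all comment bodies form one Markdown document, so the
# heading and the two-line paragraph keep their structure.
combined_document = "\n".join(bodies)
```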

                                                                                                                                                  1. 1

                                                                                                                                                    OK I understand now! Yeah this is a hairy problem

                                                                                                                                                    Concretely I am thinking about how to write CORRECT highlighters for https://www.oilshell.org/

                                                                                                                                                    • bash/OSH
                                                                                                                                                      • Vim syntax highlighting is impressive for what it is, but you can break it
                                                                                                                                                      • the tree-sitter-bash plugin seems to have a whole bunch of bugs, as mentioned. Also there’s way too much C there
                                                                                                                                                    • YSH
                                                                                                                                                      • actually I want to DESIGN the language so it’s easy to highlight. Rust -> Markdown -> Rust is not easy to highlight!

                                                                                                                                                    So yeah this info is useful … It does make sense that the “injections” are not in TreeSitter, but in the highlighter library / glue that goes with the editor

                                                                                                                                                    And it is interesting that different editors do different things, AND there are options!

                                                                                                                                                    Oof

                                                                                                                                                    I always wanted to make a better metalanguage for parsing and syntax highlighting based on my shell experiences, which can cleanly highlight shell, and HTML/CSS/JS, and nested markdown, etc.

                                                                                                                                                    Not sure what that would look like, but it’s an interesting problem

                                                                                                                                              2. 1

                                                                                                                                                Yeah, I think we may pursue the strategy of YSH-only syntax highlighters.

                                                                                                                                                YSH has a much simpler syntax than bash/OSH – it’s similar to Python with the addition of unquoted words. It’s basically a Command/Word/Expression language.

                                                                                                                                                So the “syntax highlighting problem” and the pretty printing problem has actually led me to be much more strict about what’s “allowed” in YSH!

                                                                                                                                                Correct shell syntax highlighting under treesitter almost seems like a lost cause …

                                                                                                                                                Vim syntax also has big problems, though I find them very useful in practice. So not picking on treesitter too much specifically, but many people don’t seem to be aware of its limitations …


                                                                                                                                                Also I have a separate “coarse lexing/parsing” idea that I think could “fit real languages” more so than Vim/Treesitter … though there are certainly some unresolved issues to work out / implement

                                                                                                                                    2. 1

                                                                                                                                      IIRC: NeoVim (or rather, nvim-treesitter) used to use a unified parser, which led to many conflicts in the grammar that are very hard to fix properly.

                                                                                                                                      1. 1

                                                                                                                                        I presume they decided the drawbacks of the older one outweigh those of the newer one. I have no direct experience of the older one: I stumbled across it during research but haven’t tried using it. (Since what I want to do is write a neovim plugin and since 0.10.0 neovim bundles the newer one my hands are tied anyway)

                                                                                                                                      2. 1

                                                                                                                                        It’s interesting that you’ve chosen to demo this with the markdown treesitter parser, and carefully skip over the contortions required due to their decision to implement it as two parsers (block and inline). I mean it makes sense you don’t complicate your examples with that, since it would detract from your actual program.

                                                                                                                                        i don’t know what this refers to. i am not well versed with the parser implementation in tree-sitter-markdown.

                                                                                                                                        1. 2

                                                                                                                                          Because there are two parsers, you have artificial (and opaque) “inline” objects in the outer block parser. Since you’re just fetching their text value in your example, that doesn’t matter; but in a markdown document the stuff represented by the inline objects has further structure (e.g., an emphasised section, hyperlinks, etc)

                                                                                                                                          To get tree sitter objects for the inline bits, one needs to re-parse the fragment represented by the inline node from the block parser, this time with the markdown_inline parser.
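
                                                                                                                                          As a sketch of that two-pass shape, here is a toy version in plain Python. The `Node` class, `parse_blocks` and `parse_inline` are hypothetical and far simpler than the real markdown and markdown_inline grammars; the point is only that the first pass leaves an opaque inline node whose text has to be parsed again.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    text: str
    children: list["Node"] = field(default_factory=list)


def parse_blocks(doc: str) -> list[Node]:
    """First pass: paragraphs, each holding one opaque 'inline' node."""
    blocks = []
    for chunk in re.split(r"\n\s*\n", doc.strip()):
        blocks.append(Node("paragraph", chunk, [Node("inline", chunk)]))
    return blocks


INLINE_RE = re.compile(r"\*(?P<emphasis>[^*]+)\*|\[(?P<link>[^\]]+)\]\([^)]*\)")


def parse_inline(text: str) -> list[Node]:
    """Second pass: re-parse an inline node's text for emphasis and links."""
    nodes = []
    for m in INLINE_RE.finditer(text):
        kind = m.lastgroup  # name of the alternative that matched
        nodes.append(Node(kind, m.group(kind)))
    return nodes


doc = "Some *emphasised* text.\n\nA [link](https://example.com) here."
blocks = parse_blocks(doc)
for block in blocks:
    inline = block.children[0]
    inline.children = parse_inline(inline.text)  # structure appears only now
```

                                                                                                                                          Until the second pass runs, a query over the block tree alone has nothing to match inside the inline nodes.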

                                                                                                                                          Depending on how you interface with the parser(s), this might not be a big deal. I believe the parsers provide a unified api (distinct from the treesitter one) to smooth this over. But in some contexts, such as (as far as I can see) from the other side of neovim, it’s not possible to eg write a treesitter query that will match a mixture of block and inline parser objects.

                                                                                                                                          1. 2

                                                                                                                                            understood. from my understanding, the treesitter term for this operation is “grammar injection”, where a node is reparsed with a separate parser.

                                                                                                                                            but as i have mentioned elsewhere in the comments, this is not relevant to tbsp. if the underlying parser provides a neat CST, the corresponding tbsp program is written neatly. if the underlying tree-sitter grammar is a flat list of nodes (as is the case with, say, the community maintained tree-sitter-cobol), tbsp can only be so powerful.

                                                                                                                                      3. 3

                                                                                                                                        Chromebooks and Chromeboxes are stupid fast. From power button to login screen it’s less than the time to open Chrome on many computers.

                                                                                                                                        Now, once I log in, and restore the 300 applications (tabs) I have restored on every boot…. That takes a hot minute, but less time than loading 300 applications on anything else in the deep past of computing.

                                                                                                                                        I think the biggest problem is the input latency for me. Bluetooth to a election terminal to a remote host that my company “security” software is mitm’ing. Now that’s some end to end latency right there.

                                                                                                                                        1. 1

                                                                                                                                          Having probably used the exact same setup for over a year now, I can say that the only time I notice the lag is when the wireless network acts up in the office and the ping latency increases (about once a month, for an hour or so). If using mosh were more compatible with the agent forwarding etc… I would probably never notice anything.