Threads for knl

    1. 8

      This feels like a good chance to ask: has anybody else been completely unable to use DTrace (or anything built on DTrace, such as dtruss) on M2 Pro/Max? I’m surprised that nobody seems to be talking about it anywhere, but these (in my opinion) rather critical tools appear completely broken. Running any dtrace program whatsoever instantly causes my entire system to freeze until I forcibly power it off. There’s a thread about it here where an Apple developer replied, but they seem to have been wrong about the scope of the issue, which a number of people pointed out in the comments, and there hasn’t been an update in quite a while.

      At least it led to me discovering the wonderful world of eBPF on Linux.

      1. 7

        Yup, all of DTrace has been broken since the M-series hardware. I reported it as FB12061147 three years ago and it basically just got a “yes, we know” comment. I have no idea how Apple devs can internally debug anything without it. There’s just no way of doing system-wide syscall profiles anymore, as far as I can tell.

        1. 3

          I find it amusing that most of the devs I know use their MacBooks to develop on a remote machine, or for Linux, or for VM-powered languages. Almost no one develops directly on a MacBook for Darwin, as debugging is impossible (at least with the standard Unix tools). DTrace has not been working properly for more than a decade, requiring one to disable SIP, among other inconveniences. Logging became so convoluted as well.

          1. 2

            Yup, it really sucks (M1 Max for me). I find Instruments kind of confusing, but it at least doesn’t completely hang my system if I’ve done the incredibly uncommon operation of putting my system to sleep even once since boot.

            1. 2

              Yeah it’s not worked for me at all.

              Real (native) development is ignored on macOS. They’re always breaking stuff in the toolchain.

            2. 3

              Is it possible to use Tailwind without a build system of any kind? I see some utility in using it to make simple websites, but to keep them simple I would prefer to have just plain HTML/CSS.

              1. 3

                I’ve not used Tailwind, but I’ve used https://tachyons.io/ this way (I guess Tachyons is like a lightweight Tailwind).
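
                For the record, using Tachyons with no build step is just a stylesheet link (the version pin below is illustrative, check the Tachyons docs):

                <link rel="stylesheet" href="https://unpkg.com/tachyons@4/css/tachyons.min.css">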

              2. 13

                I have read some of the discussions here. You ask pushcx to respect the decisions made by the Brave owners; don’t you respect the decisions made by this site’s owner? He can block whatever he wants, it’s his site in the end, and he has given enough explanation of why. You are under no obligation to participate; are you paying for the service somehow, or do you feel entitled to a certain SLA? It could just as well be a discussion about off-topic blocks and end up the same way.

                1. 3

                  I respect this, but I would have preferred to know upfront and have time to find alternatives or mitigations, rather than being forced all of a sudden to change my tools and habits to suit the site owner’s mood. I have yet to find other popular websites or privacy/freedom defenders that block Brave. Indeed, I would not pay for lobste.rs anymore, because they prefer to make their content not accessible to everyone. At least there are 32bit.cafe, HN, reddit, etc. I can survive without lobste.rs and will likely close my profile soon, like many users.

                  1. 3

                    whaaaat? Where do you pay for lobste.rs, I’d like to do the same! :)

                2. 2

                  Pretty cool! It will certainly make sharing easier. Do these OCI images get “extracted” into zfs filesystems?

                  1. 3

                    Podman has a ZFS snapshotter. This implements the OCI snapshot model by starting from an empty dataset, extracting the base layer, snapshotting it, and then for each subsequent layer, cloning the snapshot of the layer below and extracting the new layer on top, then snapshotting it. When you create a container instance, you get a new clone of the image as a read-write FS.
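
                    If it helps intuition, here is a rough sketch of that flow with raw zfs commands (pool/dataset names made up):

                    zfs create -p tank/img/base                    # start from an empty dataset
                    tar -xf layer0.tar -C /tank/img/base           # extract the base layer
                    zfs snapshot tank/img/base@l0

                    zfs clone tank/img/base@l0 tank/img/l1         # clone the snapshot of the layer below
                    tar -xf layer1.tar -C /tank/img/l1             # extract the next layer on top
                    zfs snapshot tank/img/l1@l1

                    zfs clone tank/img/l1@l1 tank/ct/mycontainer   # container = read-write clone of the image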

                  2. 3

                    For those who, like me, didn’t get it by just reading the abstract: basically, the assumption is that the last three bits of the mantissa are almost always 000. This covers a lot of values, including all integer-valued floats below 2^50 + 1. If we assign the tag 000 to unboxed floats, then all these numbers already have the correct tag right within their representation, without any additional bits or allocations. What do we do with all the other floats? Well, those have to be boxed, i.e. allocated on the heap and accessed via pointer.

                    From the paper:

                    This section describes self-tagging, a new tagging technique that exploits the fact that some values naturally contain the appropriate tag corresponding to their type, at the correct location, in their bit arrangement.

                    Self-tagging exploits such occurrences where the tag bits of a pointer appear in its value. Such objects can be unboxed, making them tagged values instead of heap allocated value. However, since only 1/8 of all floats can be unboxed in such a way, a second tag must be reserved for the remaining floats, which still need to be represented as heap allocated values with either tagged or generic pointers.
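
                    A quick way to see which floats would qualify for the 000 tag (my own Python sketch, not from the paper):

                    import struct

                    def low_bits(x: float) -> int:
                        # reinterpret the IEEE-754 double as a 64-bit integer
                        # and keep the 3 least significant mantissa bits
                        (bits,) = struct.unpack("<Q", struct.pack("<d", x))
                        return bits & 0b111

                    print(low_bits(42.0))  # 0 -> already carries the 000 tag, could be unboxed
                    print(low_bits(0.1))   # 2 -> would have to stay boxed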

                    1. 2

                      I’m wondering if this could be made dynamic. I’m dealing with integers representing time in nanoseconds from the epoch, and I want those to be fast. I don’t care about floats at all. Could a language adapt to the domain quickly?

                      1. 1

                        I think this is maybe feasible with a JIT, but even then likely not worth it.

                        1. This would be very costly for the interpreter as instead of an instruction or two you need to read config out of memory and make a decision based on both the “pointer” and the config.
                        2. The config itself for what tags are in use needs to be computed.
                        3. Changing the config would likely be necessary (it takes some time to figure out what the best tag patterns are) and would be difficult (basically equivalent to a precise moving GC I think, you need to rewrite all existing pointers)
                    2. 6

                      I run 20+ containers on my home lab server. It used to be docker-compose, but it never worked properly; I had to SSH in to restart a container once a month or something.

                      Nowadays it is systemd+podman. It took me a while to find a good unit file template, but I haven’t had any problems with it since. The best part is that I can put mount dependencies in my unit files, so things still work reliably if the ZFS pool fails to mount after a power crash (the only real source of instability I’m having).
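
                      The mount dependency part is just a couple of lines in the unit; roughly like this (paths and image name are made up):

                      [Unit]
                      RequiresMountsFor=/tank/appdata

                      [Service]
                      ExecStart=/usr/bin/podman run --rm --name myapp -v /tank/appdata:/data docker.io/library/myapp:latest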

                      1. 2

                        I do this as well, it’s been a very robust and low-maintenance solution for several years. I generate systemd service files with podman generate systemd. I also enable the podman-auto-update service which keeps my containers (marked with the io.containers.autoupdate label) up-to-date automatically.

                        On Debian all of this is available by simply installing podman from the standard repositories.

                        I might check out Quadlet though, only because I find the generated service files rather verbose.

                        1. 1

                          I didn’t know about podman-auto-update; extremely useful. I prefer not to roll most of my containers, but I can see a use for it for a few.

                          I was considering Quadlet/systemd templates but decided to standardize my infra around Apple’s Pkl. We’ll see…

                        2. 1

                          I’ve also been looking for a good unit file template! Could you give any pointers or a link to it?

                            1. 4

                              statically generating systemd unit files is deprecated:

                              DEPRECATED: Note: podman generate systemd is deprecated. We recommend using Quadlet files when running Podman containers or pods under systemd. – https://docs.podman.io/en/latest/markdown/podman-generate-systemd.1.html

                              dynamically generating systemd unit files is where it’s at these days. drop some declarative (no tedious ExecStart=podman … anymore!) .container files in /etc/containers/systemd/ and it’s good to go. this approach was named ‘quadlet’ before it became part of podman proper; details in docs
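
                              e.g. something like this (names and image are just examples):

                              # /etc/containers/systemd/web.container
                              [Unit]
                              Description=web server container

                              [Container]
                              Image=docker.io/library/nginx:latest
                              PublishPort=8080:80
                              AutoUpdate=registry

                              [Service]
                              Restart=always

                              [Install]
                              WantedBy=default.target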

                        3. 3

                          I posted this as I’m curious whether the results hold in practice. Most of the pandas code I’ve seen is assignment-heavy, with modifications in place, yet the benchmarks exercise only a small set of operations. If anyone has tried this in practice, what improvements do you see? I’m quite happy with Polars; the API is way nicer than pandas, speed aside.

                          1. 4

                            “x% faster” or “n times faster” always makes me question “faster for what?”.

                            1. 2

                              I don’t trust their benchmarks. I ran their benchmark source locally on my machine at TPC-H scale 10. Polars was orders of magnitude faster and didn’t SIGABRT at query 10 (I wasn’t OOM).

                              (.venv) [fireducks]  ritchie46 /home/ritchie46/Downloads/deleteme/polars-tpch[SIGINT] $ SCALE_FACTOR=10.0 make run-polars
                              .venv/bin/python -m queries.polars
                              {"scale_factor":10.0,"paths":{"answers":"data/answers","tables":"data/tables","timings":"output/run","timings_filename":"timings.csv","plots":"output/plot"},"plot":{"show":false,"n_queries":7,"y_limit":null},"run":{"io_type":"parquet","log_timings":false,"show_results":false,"check_results":false,"polars_show_plan":false,"polars_eager":false,"polars_streaming":false,"modin_memory":8000000000,"spark_driver_memory":"2g","spark_executor_memory":"1g","spark_log_level":"ERROR","include_io":true},"dataset_base_dir":"data/tables/scale-10.0"}
                              Code block 'Run polars query 1' took: 1.47103 s
                              Code block 'Run polars query 2' took: 0.09870 s
                              Code block 'Run polars query 3' took: 0.53556 s
                              Code block 'Run polars query 4' took: 0.38394 s
                              Code block 'Run polars query 5' took: 0.69058 s
                              Code block 'Run polars query 6' took: 0.25951 s
                              Code block 'Run polars query 7' took: 0.79158 s
                              Code block 'Run polars query 8' took: 0.82241 s
                              Code block 'Run polars query 9' took: 1.67873 s
                              Code block 'Run polars query 10' took: 0.74836 s
                              Code block 'Run polars query 11' took: 0.18197 s
                              Code block 'Run polars query 12' took: 0.63084 s
                              Code block 'Run polars query 13' took: 1.26718 s
                              Code block 'Run polars query 14' took: 0.94258 s
                              Code block 'Run polars query 15' took: 0.97508 s
                              Code block 'Run polars query 16' took: 0.25226 s
                              Code block 'Run polars query 17' took: 2.21445 s
                              Code block 'Run polars query 18' took: 3.67558 s
                              Code block 'Run polars query 19' took: 1.77616 s
                              Code block 'Run polars query 20' took: 1.96116 s
                              Code block 'Run polars query 21' took: 6.76098 s
                              Code block 'Run polars query 22' took: 0.32596 s
                              Code block 'Overall execution of ALL polars queries' took: 34.74840 s
                              (.venv) [fireducks]  ritchie46 /home/ritchie46/Downloads/deleteme/polars-tpch$ SCALE_FACTOR=10.0 make run-fireducks
                              .venv/bin/python -m queries.fireducks
                              {"scale_factor":10.0,"paths":{"answers":"data/answers","tables":"data/tables","timings":"output/run","timings_filename":"timings.csv","plots":"output/plot"},"plot":{"show":false,"n_queries":7,"y_limit":null},"run":{"io_type":"parquet","log_timings":false,"show_results":false,"check_results":false,"polars_show_plan":false,"polars_eager":false,"polars_streaming":false,"modin_memory":8000000000,"spark_driver_memory":"2g","spark_executor_memory":"1g","spark_log_level":"ERROR","include_io":true},"dataset_base_dir":"data/tables/scale-10.0"}
                              Code block 'Run fireducks query 1' took: 5.35801 s
                              Code block 'Run fireducks query 2' took: 8.51291 s
                              Code block 'Run fireducks query 3' took: 7.04319 s
                              Code block 'Run fireducks query 4' took: 19.60374 s
                              Code block 'Run fireducks query 5' took: 28.53868 s
                              Code block 'Run fireducks query 6' took: 4.86551 s
                              Code block 'Run fireducks query 7' took: 28.03717 s
                              Code block 'Run fireducks query 8' took: 52.17197 s
                              Code block 'Run fireducks query 9' took: 58.59863 s
                              terminate called after throwing an instance of 'std::length_error'
                                what():  vector::_M_default_append
                              Code block 'Overall execution of ALL fireducks queries' took: 249.06256 s
                              Traceback (most recent call last):
                                File "/home/ritchie46/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
                                  return _run_code(code, main_globals, None,
                                File "/home/ritchie46/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
                                  exec(code, run_globals)
                                File "/home/ritchie46/Downloads/deleteme/polars-tpch/queries/fireducks/__main__.py", line 39, in <module>
                                  execute_all("fireducks")
                                File "/home/ritchie46/Downloads/deleteme/polars-tpch/queries/fireducks/__main__.py", line 22, in execute_all
                                  run(
                                File "/home/ritchie46/miniconda3/lib/python3.10/subprocess.py", line 526, in run
                                  raise CalledProcessError(retcode, process.args,
                              subprocess.CalledProcessError: Command '['/home/ritchie46/Downloads/deleteme/polars-tpch/.venv/bin/python', '-m', 'fireducks.imhook', 'queries/fireducks/q10.py']' died with <Signals.SIGABRT: 6>.
                              
                            2. 2

                              How would I apply that to a Moonlander?

                              1. 1

                                I mapped one key to NumLock (as I never used that key) and then in Karabiner I have the following:

                                "simple_modifications": [
                                    {
                                        "from": { "key_code": "keypad_num_lock" },
                                        "to": [{ "apple_vendor_keyboard_key_code": "function" }]
                                    }
                                ]
                                
                                1. 1

                                  I think you would have to build its firmware in the QMK repo like in the article, with the snippets added. I don’t think it’s currently possible to achieve this in the Oryx GUI.

                                  1. 1

                                    Ok, so this is actually easy: in the Oryx GUI, there’s a “Download Source” link, which contains the relevant files and instructions on how to build the layout. So the process above works for the Moonlander. Well, almost: Globe+Q and friends work, Ctrl+Globe+F/C work, but the tiling shortcuts with arrows don’t :-(

                                  2. 3

                                    With jj, it seems like you constantly have to shuffle around these logical commit IDs that have no meaning at all.

                                    If I rebase a stacked series of branches with --update-refs, I just identify the fixup line, move it up after its corresponding commit (identified by its description, of course), and change it to f for fixup. Because jj doesn’t have interactive rebasing, it seems like you can’t do this?
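
                                    For illustration, that edit is just moving one line up in the todo list and changing pick to f (hashes and messages made up):

                                    pick 1a2b3c4 parser: handle nested blocks
                                    f 9f8e7d6 fix off-by-one in parser
                                    pick 5d6e7f8 printer: add color output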

                                    The interactive rebase is like a GUI, it shows me things I don’t need all the time. Jujutsu seems like it insists on me shuffling around its logical IDs. But I’d rather just have my own good names, i.e. branch names.

                                    1. 6

                                      You may’ve already, but if not I’d say give it a legit try. Maybe for a few weeks.

                                      I’ve fully switched over at this point. And, FWIW (n=1, here), I don’t feel like I’m “constantly having to shuffle around” change IDs.

                                      If I wanted, I could use bookmark (ie branch) names. But, really, for short-lived branches (which is nearly all I need), it just feels unnecessary in jj.

                                      The workflow, once I broke free from the git garbage (poison, one might argue) in my muscle memory, feels streamlined and productive and simpler, while remaining at least as functional.

                                      1. 2

                                        I gave jj a try and found it really nice to use on clean, well-established projects. But on my personal, experimental projects it was quite a hassle. I don’t think it’s jj’s fault, as it was not built with such use cases in mind: namely, a lot of uncommitted files, half-written tests that will eventually be merged, code that is used in the repo but must never be committed to it, and so on.

                                        1. 4

                                          I do similar stuff by using a squash workflow: I have a “local” change that has a lot of random stuff I don’t really want to push, and run a lot of code from there. When I’m ready to push, I create a change before the local, and squash the things I actually want to add to the repo into that change, and push that as a PR.

                                          The benefit of this over just untracked files is that I still get the revision history, so if I end up wanting to revert one of the changes to my local files it’s easy to do so.

                                          1. 3

                                            Sounds like a nice workflow.

                                            But the split command needed to selectively take pieces of a change is IMO such a subpar experience compared to git add -p, at least out of the box, where it ended up launching emacs-ediff, which is extremely clunky (for me).

                                            Would be great if something incremental like git add -p eventually landed in jj.

                                            1. 3

                                              I don’t use split, because I don’t love the UI. Instead, I use squash: I create an empty Change before the one I’m working on, and squash the files I want into the (new) parent. And jj squash --interactive lets me select specific sections of files if I need to.
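
                                              Roughly like this, from memory (flag spellings may vary across jj versions):

                                              jj new -B @ -m "feat: the part to publish"  # insert an empty change before the working copy
                                              jj next --edit                              # hop back onto the original change, now its child
                                              jj squash src/lib.rs                        # move whole files into the new parent
                                              jj squash -i                                # or pick hunks interactively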

                                              Definitely agree it would be nice to have something with the UI of git add -p.

                                              1. 4

                                                If you have specific feedback about how the built-in jj split UI could be improved vs git add -p, you could leave it in Issues or Discussions for the scm-record repo.

                                                1. 2

                                                  Speaking personally, I think you already captured some of my issues in your README (discoverability, predictability). But more than that, it’s the fact that, being used to git add -p, I want something like that, instead of a worse version of magit (worse mainly because of the different key-bindings compared to my editor, I must say).

                                                  Just my 2 cents :)

                                          2. 3

                                            You can disable auto-add of new files as of a few weeks ago.

                                            1. 1

                                              Specifically, it looks like the way to disable auto-track is to configure snapshot.auto-track to be none(). Then you would use jj file track to track files.
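
                                              i.e. something like this in the jj config:

                                              # ~/.config/jj/config.toml
                                              [snapshot]
                                              auto-track = "none()"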

                                              If you only want to ignore files in certain directories or files with predictable names, you can use a .gitignore file instead.

                                          3. 2

                                            Out of curiosity, in your example you move a fixup to the proper place and change the type to f for fixup. Are you aware of git rebase -i --autosquash which does it automatically or am I missing something? Of course, with the interactive rebase you can do much more than that but I was wondering.
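
                                            For reference, the flow is just this (sha is a placeholder):

                                            git commit --fixup=abc1234        # records a commit titled "fixup! <subject of abc1234>"
                                            git rebase -i --autosquash main   # todo list opens pre-sorted, fixup lines already in place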

                                            1. 2

                                              I know about it but it’s not a major burden so I haven’t looked into it. And people are telling me to just learn Jujutsu so maybe I shouldn’t be learning new Git features?

                                              1. 1

                                                I’m not the person you asked, but I am an example of someone who knows about --autosquash yet still prefers the workflow of git rebase -i and manually typing the f. The main reason is that, for my project at work, I don’t like how the required “fixup! ” prefix makes the commit stand out in git log:

                                                1. SOMEPROJ-1234: update dependency
                                                2. fixup! SOMEPROJ-1234: fix failing test
                                                3. SOMEPROJ-1234: add support for foo

                                                Given this project’s requirement to mention a ticket in each commit, the prefix looks too out of place. So I prefer to write “(squash)” after the ticket name in the message and then search for lines with that phrase by eye in my git rebase -i editor:

                                                1. SOMEPROJ-1234: update dependency
                                                2. SOMEPROJ-1234: (squash) fix failing test
                                                3. SOMEPROJ-1234: add support for foo
                                                1. 1

                                                  Why include the ticket number in the fixup commit’s message? It gets absorbed into the previous commit; your history will not include it post rebase.

                                                  1. 1

                                                    Good question – I had to think a bit to remember. It’s because my workflow sometimes includes pushing the fixup commits as-is to the remote branch. I do that when I’ve been pairing with another dev on my computer and want to make it easy for them to see the history of the work we accomplished that day, which would be harder if I squashed before pushing. When working on large changesets like that, I only squash (and then force-push) after my coworker and I think we won’t need more changes and are ready to decide on the final breakdown of commits.

                                            2. 12

                                              Meh. This is partly a joke (only partly), but it comes out of a belief in some parts of academia that writing bad code is the norm there, because “it’s only a prototype”. I think that writing bad code is generally a bad idea, including in academia, and that it does not actually come from a core truth about prototyping or scientific research, but is an instance of a general decrease in research quality suffered under publish-or-perish pressure. If you keep asking people to produce more papers (which include code), you are going to get more code, of worse quality. You are also going to get worse (less reliable) mathematical proofs, worse benchmarks, worse measurements, etc.

                                              I’m not sure that there are differences between research fields. But in the context where this was originally written (PL academia), I believe this is mostly a bad idea, coming out of wrong institutional incentives. I think of the CRAPL or similar semi-jokes as an attempt to normalize or provide alternative justifications for what is, I think, just lower-quality work.

                                              1. 7

                                                Idk, it takes a long time designing and writing software as a full-time job to get actually good at writing maintainable, robust software, and even then it can be a fractal of complexity, with practices and techniques from one domain not translating well to others.

                                                Researchers might write a lot of code, but it’s under an unbelievably different set of circumstances, and certainly not the only thing they do full time. I think it is actually reasonable for their code to be kinda shit from the perspective of professional software engineering.

                                                That said, I do agree that the publish-or-perish model is fucked up and leads to all sorts of perverse incentives. I just don’t think that shaming researchers for their crap code will help make anything better.

                                                1. 6

                                                  I recognize that many different practices exist, but here are some intuitions I have against this idea that “bad-practices code” is somehow a natural outcome of scientific prototyping:

                                                  1. Prototyping is not at all unique to research; people prototype in industry or as hobbyists all the time. All the reasons you can think of to write shit code in academia also exist somewhere in industry, many times worse. So the idea that academic code would somehow be special in how shittily it is written doesn’t sound very convincing to me. (One thing special about research code is that it tends to be solving difficult problems. I don’t see a relation between the fact that the problem domain is technical/difficult and the idea of giving up on good implementation practices.)

                                                  2. You can save a lot of work in a research environment due to the fact that your code does not have users, or you are the only user. A decent user interface, error messages, etc. take a lot of time in real life (if you aim to produce good software), and you can get rid of them. We often say that there is an 80/20 principle where 20% of the software is the core logic and 80% is interface layers around it; a research prototyping environment lets you focus on the 20% and mostly ignore the 80%. I think that there are other properties of research environments that let us cut corners, and note that this is completely orthogonal to code quality / implementation practices.

                                                  1. 7

                                                    There are several things to keep in mind when discussing software coming from academia:

                                                    • If PhD students had chosen industry instead, they would have started as juniors
                                                    • most PhD advisors are actually not great coders nor have industry experience
                                                    • the goal of PhD studies is to explore a topic in depth, in a limited time, and publish something others have not. This translates to very little overlap (kudos to advisors who can carve out pieces of a bigger project for different students), which further translates to very little reviewing and collaboration on the code; teams are small.

                                                    No one will jeopardize the quality of results or make code that spits out wrong results, be it benchmarking or whatnot. But the incentives are not there to produce maintainable or understandable code, nor does the system reward it.

                                                    The argument that “academics can produce good code” is quite similar to “one can write memory-safe code in C”. Given enough time and support, yes. But there is simply no structure to bring that quality front and center.

                                                    Finally, this license is a pretty crappy one, and the joke is not even that funny.

                                                    1. 4

                                                      As someone who worked (briefly) in a role supporting academic software, what you’re describing is what I experienced.

                                                      Essentially, there were a lot of people who were very skilled at their particular field of research, but would have been junior developers in the software industry - often despite having worked for many decades writing software for academia. A lot of them were aware that their software was bad, but more because they knew that they were inexperienced than because they knew what specifically was wrong. I remember one project where a reviewer had demanded that the code used in a paper be rewritten because it was so bad. The team kept on pointing to things like the lack of tests or not following naming conventions, but fundamentally the whole thing was a chaotic mess, right down to the architectural roots. It worked (probably), but it was written by people who were just trying to get it to work at each step, and didn’t understand how to think more globally.

                                                      One of the reasons I ended up leaving relatively quickly (<1y, iirc?) was that our team, supporting these academics, just had no idea what support to offer. Fundamentally, the people we were supporting needed either to spend several years working on real software with experienced developers, or to have someone write their software for them, neither of which was a realistic option. Instead, we offered a lot of courses in how to use git, and how to run pylint+black - both useful skills, but both essentially putting lipstick on a pig.

                                                      In fairness, I have not worked with academics in software research, and I would hope things are a little better there. In the rest of academia, though, programming is essentially just a tool you use to beat the HPC system until it gives you results that look right. And there’s no reason or real means to improve your software development skills, because like you say, there aren’t the incentives, nor the culture or structure that could support such an improvement. And I don’t think something like this license adds to that at all.

                                                      1. 2

                                                        Or maybe it’s also a matter of culture and shared values, and we can actually teach this to students (and arrange, for example, for people to review each other’s code), and emphasize that it matters for good professional practice and helps further research. Encouraging people to use a CRAPL license is another way to influence a culture and transmit values; my point is that this move probably goes in the wrong direction.

                                                        1. 2

                                                          Yes, absolutely agree on both points. The point I was trying to make is that there is no such widespread culture, nor incentive structure that would surface good programming practices.

                                                        2. 2

                                                          I mostly agree, but it’s worth remembering that there are a lot of outliers.

                                                          I’ve worked with a lot of PhD students who either had a pile of open-source experience prior to their start or took a few years between undergrad and PhD in industry. The latter is often a good idea because a bit of time in industry gives you a better understanding of the interesting unsolved problems. Without that, it’s easy to spend your entire PhD solving a problem that doesn’t really exist.

                                                          Even without that, it’s common for PhD students to do one or more three-month internships with engineering groups. This gives them an intensive crash course in how to build real things.

                                                          Beyond that, academics are increasingly measured on ‘impact’ and, in applied sciences, that means ‘do people actually use the results of your work?’. It’s common for people who supervise PhDs to spend some time consulting and doing other forms of technology-transfer work. Internships for their students can form part of this (send the person who did the work to a company for a bit to work with them on getting it into production). Releasing code in a way that makes it easy for people to pick it up and build on it is great for this. One colleague recommended putting grant numbers in license headers so that you can just do a GitHub search and find all of the places your code for a particular project has ended up when you go to ask the same funding body for more money.

                                                          There are incentives to produce readable and maintainable code, they just aren’t the only incentives. The fact that (in computer science) there’s often one top-tier place to publish research and it has a single deadline per year is the biggest counter. More rolling deadlines would help a lot. If missing a deadline meant submitting to the same place a couple of months later, that would reduce the pressure to make something that works right now a lot.

                                                        3. 1

                                                          A thing you are not mentioning is the academic definition of novelty which means that a lot of improving code for the benefit of future improvements cannot pay off within academia (not even for a different research group). Industry is free to put the code quality expectations into the grant agreements they propose.

                                                          And when the problem is hard, it does add the costs for deviating from whatever shape the authors conceptualise the code in, which is hopefully somewhat similar to the manuscript structure, but not whatever structure people call good practices this year. Especially given that the best practices people are talking about are about the mode of maintainability that is not a relevant consideration anyway.

                                                          Also, just the amount of hardcoding of inputs/parameters that is justified for a one-off experiment is already enough to call the code unacceptably low-quality for a deployment with multiple users…

                                                          There is of course yet another part of the problem, although its strength varies across fields, where the review practices push some subfields towards a shared narrative of «applications» of their theoretical research which is completely disconnected from, and with a bit of bad luck can be directionally opposite to, what would be worth rewriting into high quality code from an actually applied point of view. This is just sad, but true. I guess product development side of industrial software development employs enough PhDs to approach conferences and offer a handful of developer-days (with education in the relevant domains) per paper for reviewing from their perspective.

                                                          worse benchmarks, worse measurements

                                                          The way to improve it is to show what is used now, so that «well, we could also do X for the same effort as Y» can be discussed usefully.

                                                          Anyway, overall the only way to change incentives is to (a) do the cheap things that are clear small positives, and (b) make clear that any other improvements require resource investment into the change of incentives. Yes, (b) can be called normalising the drawbacks. I would say that CRAPL is a part of both (a) and (b), making an effort to scope the code publication in a way that makes it cheap and encourages it, while stressing that other desirable things are expensive.

                                                          1. 3

                                                            A thing you are not mentioning is the academic definition of novelty which means that a lot of improving code for the benefit of future improvements cannot pay off within academia.

                                                            I don’t agree. I think that what you mean is that people cannot reuse their code later, because if they did it would not be novel. But this is false: it is pretty common to reuse something and then extend it with something new, or to use something in a new way that warrants publication. For example, specifically in PL research (which is the scientific context in which the CRAPL was written), there has been an explosion in the last decade or so of work building on Iris, a framework to build program logics on top of separation logic that is mechanized in the Coq/Rocq proof assistant. Iris is a freely available library, and people have written dozens of papers reusing it, extending it, etc. This is an example of a sub-sub-domain being created by the distribution and reuse of good-quality research code, and there are many similar examples – for example the work on top of the egg library.

                                                            Industry is free to put the code quality expectations into the grant agreements they propose.

                                                            I don’t know which form of research you are familiar with, but in our world “industry” does not offer research grants, the vast majority of research funding is of public origin. (Industrial companies fire their own research or research-and-development subgroups, except if they do machine-learning, and hope that buying startups regularly will suffice to keep innovating.)

                                                            when the problem is hard, it does add the costs for deviating from whatever shape the authors conceptualise the code in, [..] but not whatever structure people call good practices this year.

                                                            The sort of code that could want to use the CRAPL is the code that really is “crap”, as the acronym says. We’re not talking about following good practices that are outdated. We are talking about:

                                                            • code that doesn’t compile with the current HEAD
                                                            • there are no tests
                                                            • variable names are shit
                                                            • there are no comments
                                                            • there is no packaging information, no information about what the dependencies might be, etc.

                                                            This sort of things.

                                                            I guess product development side of industrial software development employs enough PhDs to approach conferences and offer a handful of developer-days (with education in the relevant domains) per paper for reviewing from their perspective.

                                                            Something not entirely different happens in my field, called “artifact evaluation”. If a paper at a SIGPLAN conference is accepted, the authors are offered the possibility to upload their “artifacts” (software code, mechanized proofs, benchmarks scripts), and there is an “artifact evaluation committee” (AEC) that reviews them. AEC reviewers check that they can also build the code locally, that the proofs are valid, that the benchmark results on their machine are consistent with the paper’s qualitative claims, etc. (This review step is optional and does not affect acceptance of the paper.) See for example the POPL 2024 AEC pages. Artifact reviewers are typically PhD students, post-docs, and people who semi-recently migrated from academia to industry.

                                                            Note that this is a change that was decided within the research community, by researchers, who are trying to move the needle in common expectations in the community. To my knowledge there was no involvement of grant funding agencies or the industry in shaping this process to improve things.

                                                            I would say that CRAPL [makes] an effort to scope the code publication in a way that makes it cheap and encourages it.

                                                            I fail to see how CRAPL is better than just telling people: “Releasing your code is easy, just slap the MIT license on it and then make a public repo on {github,gitlab}. Oh, and write a three-paragraph README that summarizes the context of this code and clarifies that it is un-maintained.”

                                                            while stressing that other desirable things are expensive.

                                                            Are they, though? It’s not clear to me that writing shit code lets you do good research better than if you write good code. The cost of following semi-decent practices is not very high, and it’s easy to make result-invalidating mistakes or to find yourself blocked in your explorations if your code is really bad.

                                                            1. 1

                                                              I think that what you mean is that people cannot reuse their code later, because if they did it would not be novel.

                                                              In itself, this is not blocking. But for a lot of advances, things which make sense to build on top of the code of an article will get labelled «incremental advancements» and pushed far enough down the conference ranking to be not worth starting in the first place. This is often a problem: there are quite a few follow-up papers in different fields that would get cited but won’t get published high enough to be worth the effort — but better code won’t solve that.

                                                              Sometimes people manage to structure the research project in a way compatible with adding parts to a framework. If it is Coq/Rocq at INRIA it can even work out better than SageMath…

                                                              I don’t know which form of research you are familiar with, but in our world “industry” does not offer research grants, the vast majority of research funding is of public origin.

                                                              They do create research collaborations with universities, but rarely. And as long as they do not do it more often, their interests are not academia’s interests; that’s my point exactly.

                                                              Note that this is a change that was decided within the research community, by researchers, who are trying to move the needle in common expectations in the community. To my knowledge there was no involvement of grant funding agencies or the industry in shaping this process to improve things.

                                                              Also note that the current artifact evaluation is exactly «can we reproduce the exact same result without looking into the code», not the code quality under the hood. Which is indeed useful for reproducibility / comparisons (but reuse is not anywhere near the priorities).

                                                              code that doesn’t compile with the current HEAD

                                                              there is no packaging information, no information about what the dependencies might be, etc.

                                                              Usually the former is because of the latter. Yeah, artifact evaluation has kind of solved specifying the dependency versions. Usually at least one dependency is pinned to an obsolete version, though, because without a good chance of stable resource investment there is no point in upgrading just to demonstrate that the algorithm works.

                                                              there are no tests

                                                              The test suite is literally generating what is used in the article. More tests could be useful for refactoring, but the chances were evaluated as low.

                                                              Comprehensive test suites tend to be more code than the thing tested, so they need a lot of code evolution to pay off.

                                                              variable names are shit

                                                              there are no comments

                                                              Like half of the article is usually more or less de-facto the comments for the algorithm. Variable names often are similar, too. Long formulas with long variable names would be less readable in the article, so nope.

                                                              The cost of following semi-decent practices is not very high, and it’s easy to make result-invalidating mistakes or to find yourself blocked in your explorations if your code is really bad.

                                                              You know when you switch from explore to exploit, which allows you to cut a ton of corners. The code most thoughtfully structured for reuse that I have seen is very prone to hiding result-invalidating mistakes of some kinds (because of all the extra connections between the well-defined and well-isolated parts).

                                                              I fail to see how CRAPL is better than just telling people:

                                                              Because people are more likely to go «wait what» about a non-standard license than finish reading a README.

                                                      2. 4

                                                        I remember years ago trying to use some academic code for image processing that turned out to have an enormous memory leak, to the point that my little work laptop could not run it successfully and the only answer was “get a bigger machine” because trying to fix this mess of spaghetti C++ was not worth the time.

                                                        1. 3

                                                          Is it normalization of deviance, or addressing an elephant in the room? There seems to be an unspoken assumption that sloppy and ad-hoc methods (including but hardly limited to code artifacts) may nonetheless support good (or at least adequate) science. This is clearly true in some cases (as pathfinding is not highway engineering!) and clearly false in others. There has been a long ongoing “reproduction crisis” in the experimental sciences at large, and I’m not aware of it getting any better.

                                                          Given that engineering standards are sometimes in fact so low in research, regardless of the cause or any proposed remedy, is it not better to at least allow one’s work to be made public and inspected, rather than hidden for shame or fear of having to support it? That’s the spirit that I read in this. I’m not sure how realistic it is, as a proposal, but it’s at least a gesture.

                                                          1. 7

                                                            It varies a lot between research groups, but generally ‘research quality’ code will cut corners in places that won’t affect the experiment, or which can be explained in the evaluation. For example, there were some things in CHERI Clang where you’d get a compiler crash instead of an error message. We also just disabled autovectorisation entirely (which made no difference on our early prototypes because they didn’t have vector units). These were known limitations but things that would need fixing before a real release (the Arm folks fixed a lot of these things for Morello).

                                                            That said, I discovered that there were some product groups at Microsoft who had lower standards for production code than I did for research-quality code, so there’s probably more variation between individual teams than there is between academic and industrial code.

                                                        2. 12

                                                          To me, the SQL version is a step backward. I’ve done a few projects with Polars, including one with about 2k SLOC of Polars + Plotly with a hint of pandas, because Plotly didn’t support Polars at the time (2022). That Polars code, while seemingly cumbersome for someone unfamiliar with the API, is well written. It’s composable and testable. The pl.col("something") calls produce objects that can sit in variables or be built in a function. One example I vaguely remember from that 2022 project was a filter we put behind a method, so that the final selection was something like

                                                          # from memory; lf is the project’s LazyFrame
                                                          sal_exp_filter = response_within_salary_and_experience_range(
                                                              salary=range(Facts.wage_min, Facts.wage_annual_by_hourly(75)),
                                                              experience=range(8, 15))
                                                          return lf.filter(sal_exp_filter).select(Columns.salary, Columns.years_of_experience).collect()
                                                          

                                                          We’d also put all of the columns we were working with into a class like

                                                          class Columns:
                                                              salary = pl.col('salary')
                                                              years_of_experience = pl.col('years_of_experience')
                                                          

                                                          so we got nice completions and renames and usage tracking in our IDEs. You can’t get that (easily at least) with a SQL string, nor can you get the easy composability of functions and variables that Polars affords.

                                                          In the end, you gotta use what makes you productive. Polars’ API took some learning but it ended up being the right thing for our team working independently, asynchronously, and with varying skill levels.

                                                          1. 4

                                                            I can’t agree more with everything you said, and so well put :)

                                                            The main benefit of Polars that I’ve seen is composability and reusability, something one can’t get from SQL. The API is so well thought out: I worked for several years with pandas and some things were just hard to get right on the first try; with Polars it just flows naturally.

                                                            1. 1

                                                              Out of curiosity, how did you learn Polars? You’ve used it more than I have, probably by a couple orders of magnitude. I can muddle my way through writing some Polars code, but it doesn’t quite feel intuitive yet.

                                                              SQL reminds me of Latin or English: it’s old and has plenty of warts, but it’s the common language, and that’s huge. Actually, human languages are a good analogy. Why is there a * in COUNT(*)? Is it valid in other aggregate functions? I have no clue, in the same way I barely know what a gerund or infinitive is. But I’ve typed the characters COUNT(*) at every place I’ve worked over the last decade, and I’ve been using verbs for even longer. I don’t truly understand what I’m writing. But I can definitely tell you how many rows are in that table, or what the average salary is, or any other question you’d like to ask about the data.

                                                              Contrast with Polars. It’s clear even to me that pl.col("salary") must be a Python object. I can say that confidently, without worrying that I’m conjugating it incorrectly, or that it’s a prepositional phrase masquerading as a Python object. No! It’s a Python object, full stop. The upside is I can build helper functions and classes, and everything works as I would expect. The downside (for me) is the speed/fluency penalty of not having all the idioms for manipulating Polars data.
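
                                                              For example, a toy sketch of the kind of helper I mean (names made up):

                                                              import polars as pl

                                                              df = pl.DataFrame({"salary": [40_000, 75_000, 130_000]})

                                                              # expressions are ordinary Python objects, so helpers compose naturally
                                                              def within(col: pl.Expr, lo: int, hi: int) -> pl.Expr:
                                                                  return col.is_between(lo, hi)

                                                              salary = pl.col("salary")
                                                              print(df.filter(within(salary, 50_000, 120_000)))  # keeps only the 75_000 row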

                                                              1. 4

                                                                I learned Polars the old fashioned way: editor, terminal, and docs open on the screen. Read the API docs for a method, use it, play around a bit, write a test, make code pass the test, and move to the next step. Every now and then, I’d poke around in some discussion groups. If I was really stuck, I’d find out how to do something in pandas and then I’d be more likely to find the terminology in the Polars docs or in some examples.

                                                                My constant problem with SQL is that I want to know types a lot of the time and I want composability. SQL doesn’t (easily) let you care about types and it’s not natively composable.

                                                                Probably the hardest part of moving from a SQL mindset to a DataFrame mindset is really understanding the tools in the toolbox. You have to play with them, see what they can do, and maybe get a little hurt misusing them sometime in order to learn proper use and application.

                                                                I’ve had the (mis)fortune of having had very little SQL in my career. I’ve always worked with document databases, pure CRUD APIs where the SQL is highly abstracted away, or DataFrame-based systems like Spark, Polars, and pandas. SQL of more than about 20 lines, properly formatted, makes my eyes bleed. I loved Quill when I was building a webapp in Scala, because I could write DataFrame-like Scala code and it would generate the SQL at macro time and show you the effective SQL in your IDE.

                                                                1. 4

                                                                  SQL doesn’t (easily) let you care about types

                                                                  It depends somewhat on the database. Standard SQL and Postgres are fairly strongly statically typed. MySQL and SQLite less so. The problem from a software engineering point of view is that the type declarations are in the table definitions, which are often not close at hand.

                                                                  1. 2

                                                                    Indeed, thank you for that specificity. That’s precisely what I mean.

                                                                    In my particular work area, it takes extra steps for me to go look up the types of a table whenever I’m accessing Hive through Spark SQL. I don’t control the table definitions, or sometimes the tables are created in a completely different process that I do control, but it’s not part of the same code base. That’s an inherited architecture problem, an elephant that I’m slowly eating.

                                                            2. 12

                                                              This seems to follow a model similar to Visual Studio Code’s, where a server binary is uploaded to the remote host and communication goes through it, as opposed to e.g. Emacs’ TRAMP mode, where you access the files directly. This comes from being oriented more towards remote development than towards accessing remote files for e.g. sysadmin tasks.

                                                              This does have a different performance profile, especially once you have plenty of extensions running (language servers, file update checkers, etc.). On the other hand, I regularly ran into system issues with this on actual remote servers (i.e. not Docker containers) running Linux, as the per-user file notification resources ran out (mostly due to way too many extensions each monitoring the whole project, and some of them buggy enough to include dependencies like node_modules).

                                                              Note that right now, Zed itself doesn’t seem to support remote extensions anyway, but I’m sure that’s coming.

                                                              1. 16

                                                                Trying to use Tramp over an 80ms connection caused me to give up Emacs entirely :(

                                                                I think the remote server approach is the only one that works. You do have to bump up inotify limits and such but honestly this is a Linux distro issue – they should be shipping much higher inotify limits than is currently the case. Alternatively I wish more tools had adopted watchman, which acts as a clearinghouse for fs notifications, and which to my knowledge is still the only completely correct implementation of recursive file watching. (I’ve been looking at the Rust notify library recently, and I’ve sadly found that it isn’t resilient to TOCTOU races in all sorts of ways.)

                                                                Edit: sorry, while drafting this post I accidentally dropped a disclaimer that I personally worked on making watchman correct wrt recursive file watching. That’s how I know it’s a difficult problem, but it is doable.
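
                                                                For reference, checking the current limits on Linux is a quick procfs read (sketch below; raising them persistently is a root-level sysctl change):

                                                                    from pathlib import Path

                                                                    # Standard Linux procfs locations for the per-user inotify limits.
                                                                    for knob in ("max_user_watches", "max_user_instances", "max_queued_events"):
                                                                        value = Path("/proc/sys/fs/inotify", knob).read_text().strip()
                                                                        print(f"fs.inotify.{knob} = {value}")

                                                                    # To raise them persistently (as root), add e.g.
                                                                    #   fs.inotify.max_user_watches = 1048576
                                                                    # to a file under /etc/sysctl.d/ and reload with `sysctl --system`.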

                                                                1. 6

                                                                  Notify maintainer here and you’re right.

                                                                  If I had to pick a reason, it is simply how damn hard it is to get inotify right - and how specific that can be to the use case you’re dealing with.

                                                                  To give an example: the Linux man page warns that, realistically, you won’t be getting all events (which some people have verified). And with the way inotify is designed, you basically don’t know that you’re missing something. You have to subscribe to each and every file you want to watch, so missing a “create” event for a folder makes you blind to everything inside it. Now try to stay low-overhead by not regularly re-scanning (what even is a good interval for that?) and you will fail. This doesn’t yet account for editor-specific behavior, for trying to get an API right across multiple OSes, or for the fact that network mounts (WSL) won’t give you any events.

                                                                  All of this to say: projects like watchman go to great lengths to give you reliable file events, and people would complain about resource usage if you had multiple of these running. Sorry Linux, but Windows gets it right (mostly).
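
                                                                  To make the failure mode concrete, here is a minimal sketch with Python’s third-party watchdog library (which wraps inotify on Linux); the comments mark where events silently go missing:

                                                                      import time

                                                                      from watchdog.events import FileSystemEventHandler
                                                                      from watchdog.observers import Observer

                                                                      class LoggingHandler(FileSystemEventHandler):
                                                                          def on_created(self, event):
                                                                              # If the kernel queue overflowed, this "created" event may never
                                                                              # arrive. For a directory, missing it means everything created
                                                                              # inside it afterwards is invisible until a full re-scan.
                                                                              print("created:", event.src_path)

                                                                      observer = Observer()
                                                                      # recursive=True hides the fact that, on inotify, a separate watch
                                                                      # has to be registered for every directory in the tree (and counts
                                                                      # against fs.inotify.max_user_watches).
                                                                      observer.schedule(LoggingHandler(), "./src", recursive=True)
                                                                      observer.start()
                                                                      try:
                                                                          time.sleep(60)
                                                                      finally:
                                                                          observer.stop()
                                                                          observer.join()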

                                                                  1. 5

                                                                    Oh hi! Was meaning to reach out to you. Thank you for maintaining notify and I hope I didn’t sound too harsh!

                                                                    There is some overhead in making notify correct, resilient to overflows and such but it isn’t too bad. I do think notify would have to provide a way to warn devs about inotify limits being too low, and ask them to bump their limits up.

                                                                    My question is, would you be willing to take a patch series that fixed notify? I’m concerned that as usage of notify increases a lot of tools will be broken in subtle and not-so-subtle ways.

                                                                    (The situation you mentioned is real, FWIW — there is no escape from having to lstat all the files within new directories, and you have to do that after you set up the watch so that you don’t miss any notifications in between. You basically have to treat it as a very complicated distributed system with chaotic event sources.)
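
                                                                    In sketch form (add_watch and process are hypothetical stand-ins for the real inotify plumbing):

                                                                        import os

                                                                        def on_directory_created(path: str) -> None:
                                                                            # 1. Register the watch FIRST, so nothing created from here on is missed.
                                                                            add_watch(path)  # hypothetical wrapper around inotify_add_watch
                                                                            # 2. THEN scan: anything created between the mkdir and the watch
                                                                            #    registration is picked up here; anything created after the watch
                                                                            #    arrives as an event. Entries can show up in both places, so
                                                                            #    processing must be idempotent.
                                                                            with os.scandir(path) as entries:
                                                                                for entry in entries:
                                                                                    if entry.is_dir(follow_symlinks=False):
                                                                                        on_directory_created(entry.path)
                                                                                    else:
                                                                                        process(entry.path)  # hypothetical; must tolerate duplicates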

                                                                    1. 1

                                                                      Patches are always welcome - the hard part is getting them ironed out when crossing OS behavior boundaries.

                                                                      I sadly have very little time for any FOSS since starting my current job. And while I managed to get dfaust on board as a maintainer, he is also pretty busy. And that is 3 years after announcing my intent to “EOL” as primary maintainer without any replacement ;)

                                                                    2. 2

                                                                      Honestly inotify sounds so difficult to use that I do wonder if it would be easier to write a FUSE filesystem that fires callbacks for operations so that listening would be cheap and easy - but then, using FUSE would probably make reading/scanning files way more expensive, so that’s not a win for performance.

                                                                      1. 2

                                                                        I just had an idea for bypassing FUSE most of the time that might speed this up (the FUSE half is sketched just after this list):

                                                                        • notification events are written to a shared memory buffer
                                                                        • ./src/ (the monitored path) is a FUSE mount
                                                                        • ./real_src/ (not monitored) contains actual source code
                                                                        • the FUSE daemon forwards reads and writes to ./real_src/ and emits notifications as needed. Most processes never actually hit this, because…
                                                                        • LD_PRELOAD wrapper for libc calls like open() and openat(). In any process that doesn’t bypass libc to open files, attempts to read or write files in ./src/ will be redirected to the correct file in ./real_src/ and also run the same “emit notification if needed” code that the FUSE daemon would
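
                                                                        Here’s a rough sketch of the FUSE half, using the third-party fusepy bindings: a bare passthrough that forwards I/O to ./real_src/ and announces every write (the shared-memory buffer and the LD_PRELOAD shim are omitted):

                                                                            import os

                                                                            from fuse import FUSE, Operations  # third-party fusepy

                                                                            class NotifyingPassthrough(Operations):
                                                                                """Forwards I/O on ./src/ to ./real_src/, announcing writes."""

                                                                                def __init__(self, real_root):
                                                                                    self.real_root = real_root

                                                                                def _real(self, path):
                                                                                    return os.path.join(self.real_root, path.lstrip("/"))

                                                                                def getattr(self, path, fh=None):
                                                                                    st = os.lstat(self._real(path))
                                                                                    keys = ("st_mode", "st_nlink", "st_size", "st_uid",
                                                                                            "st_gid", "st_atime", "st_mtime", "st_ctime")
                                                                                    return {k: getattr(st, k) for k in keys}

                                                                                def readdir(self, path, fh):
                                                                                    return [".", ".."] + os.listdir(self._real(path))

                                                                                def open(self, path, flags):
                                                                                    return os.open(self._real(path), flags)

                                                                                def read(self, path, size, offset, fh):
                                                                                    os.lseek(fh, offset, os.SEEK_SET)
                                                                                    return os.read(fh, size)

                                                                                def write(self, path, data, offset, fh):
                                                                                    os.lseek(fh, offset, os.SEEK_SET)
                                                                                    written = os.write(fh, data)
                                                                                    print("changed:", path)  # stand-in for the shared-memory event buffer
                                                                                    return written

                                                                                def release(self, path, fh):
                                                                                    return os.close(fh)

                                                                            # Read/write only; create/unlink/etc. are left out of the sketch.
                                                                            FUSE(NotifyingPassthrough("./real_src"), "./src", foreground=True)
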
                                                                        1. 2

                                                                          For context: it’s been a few years since I’ve done serious kernel-level hacking for desktop OSes but day-to-day I work on relatively low-resource embedded systems (I’ve got like 1MB of RAM, which is quite spacious compared to lots of microcontrollers). Our stuff runs soft-real-time and we have to really carefully consider queue lengths and load shedding and things like that.

                                                                          In the FUSE case, what would you expect to happen if the consumer of a callback got slow for whatever reason? If you’re going for guaranteed event delivery to the callback it seems like the only thing you could really do in that case would be to block the actual filesystem operation? Maybe you have a bounded queue so that a single slow callback doesn’t result in filesystem blocking, but eventually you’re either going to run out of RAM queuing events, blocking the filesystem because the notification receiver is being slow, or dropping queued events (resulting in lost notifications).

                                                                          1. 1

                                                                            In the FUSE case, what would you expect to happen if the consumer of a callback got slow for whatever reason

                                                                            eh, probably backpressure. entire system slows down so writing to the FUSE FS gets slow.

                                                                            edit: that said, I think with development workloads all you actually care about is “do I need to reload the dev server y/n?” and that doesn’t require precise tracking of events at all. you just need to know whether or not any file was written since the last time you restarted it.

                                                                            1. 2

                                                                              Yeah thinking about this a little without having looked at the underlying API at all for inotify and friends… it almost seems like this could be level-triggered instead of edge-triggered/event-driven. Something similar to select/poll/epoll where you give it a list of files you want to be notified about.

                                                                              1. 2

                                                                                Thanks! “dev tools actually want level-triggered instead of edge-triggered” is a good framing.

                                                                                For restarting dev servers or unit tests, I’m thinking:

                                                                                • it’s not okay to miss events: if one or more writes occurred then you need at least one notification generated
                                                                                • but it’s totally okay to coalesce notifications (if I save a file from vi twice in quick succession, I don’t mind if the dev server only restarts after the second one.)
                                                                                • and it’s actually okay to have spurious notifications too (a dev server restarter can check if files really changed).
                                                                                • it’s okay for a notification to come in a little late (I desire that my dev server restarts in <1s but don’t need it to be <1ms)
                                                                                • and it may still be useful even if it’s somewhat vague about exactly which files changed (a dev server restarter can examine all the files in a directory or something).
                                                                                • it would be enormously nicer if you only needed to establish one “watch” for an entire directory tree instead of one per file or directory in that tree (because it’s much faster and avoids having resource limit pitfalls)

                                                                                I think for dev servers the ideal API call might be like “please notify me if any file anywhere under ./src/ has an mtime greater than 23456789”.

                                                                                edit: an idea to avoid memory exhaustion when there are too many events is to have adaptive imprecision. if I change src/engine/a.cpp, then src/engine/b.cpp in quick succession: if the events are being consumed promptly then emit both events individually, but if consumers are lagging then emit only a single “some things changed in src/engine” event.
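
                                                                                A minimal polling sketch of that “anything under ./src/ newer than T” check (level-triggered, coalesces bursts for free, no inotify anywhere):

                                                                                    import os
                                                                                    import time

                                                                                    def changed_since(root: str, timestamp: float) -> bool:
                                                                                        """Level-triggered check: any file under root newer than timestamp?"""
                                                                                        for dirpath, _dirnames, filenames in os.walk(root):
                                                                                            for name in filenames:
                                                                                                try:
                                                                                                    if os.stat(os.path.join(dirpath, name)).st_mtime > timestamp:
                                                                                                        return True
                                                                                                except FileNotFoundError:
                                                                                                    pass  # file vanished mid-scan; fine for this use case
                                                                                        return False

                                                                                    last_restart = time.time()
                                                                                    while True:
                                                                                        time.sleep(1)  # polling interval; spurious wake-ups are acceptable
                                                                                        if changed_since("./src", last_restart):
                                                                                            last_restart = time.time()
                                                                                            print("restart the dev server")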

                                                                          2. 1

                                                                            Could be. The kernel man page even says it’s kind of expected that you won’t be able to receive all events in time to react without overrunning buffers. I get issues filed by people with thousands of files who are losing events, so it’s not even about “not missing a folder”.

                                                                            I’m honestly pretty tired of it.

                                                                        2. 2

                                                                          this is a Linux distro issue – they should be shipping much higher inotify limits

                                                                          Yeah, it’s not a hard problem. Sadly, the few times where I actually wanted or needed to develop on a remote system, I couldn’t do it because the local hardware was too anemic or locked down (WSL? Docker? Nope, you’ll get our standard office drone image plus an IntelliJ license, that’s it). And would you know, that also sometimes meant the dev servers couldn’t be easily modified (“I’d need to make sure that all the umpteen developers and test systems aren’t affected by this!”).

                                                                          Almost makes me miss MFC dev work.

                                                                          1. 2

                                                                            Trying to use Tramp over an 80ms connection caused me to give up Emacs entirely :(

                                                                            Wait, what? On a connection with 80ms of network latency if I pay close attention I can certainly notice the lag in an interactive shell session over SSH, but it’s still way below the threshold of being problematic for general use. But that’s in a setting where you’re paying the round-trip cost on the terminal echo of every keystroke. With TRAMP you’re only going to do that on file saves and buffer clean->dirty transitions (for metadata checks), right? I routinely use TRAMP between systems with network RTTs in that range without any difficulty whatsoever; I’m genuinely curious how it ended up being so much of a problem for you.

                                                                            1. 1

                                                                              If I load or save a file, it blocking the UI for more than 80ms is well over my threshold of acceptability.

                                                                            2. 2

                                                                              Mosh + Emacs running on the remote server works well for me on bad connections.

                                                                              The Mosh website features a remote Emacs running Org, so I imagine this is a common usecase.

                                                                              1. 2

                                                                                Yeah I tried that out for a while. I used to ride the Facebook bus 2-3 hours a day with rather flaky mobile internet – and mosh was a lot better, but still not great. At least back in the day, security folks had serious concerns about it too.

                                                                              2. 1

                                                                                Watchman is great!

                                                                                My only problem with it is that it’s too hard to build. Nixpkgs follows updates of most packages, providing updates with a short delay; however, for watchman they are still on 2024.03 (the latest is 2024.11). Even the release page doesn’t have binary packages for most distributions.

                                                                                1. 1

                                                                                  Yeah you’re definitely not the only person to bring this up! Stay tuned.

                                                                                2. 1

                                                                                  I don’t think either approach is universally applicable. Sometimes you want to remotely edit some files on a small machine without deploying an extra editor binary to it (might not even have the disk space or memory to do so), sometimes the remote machine is more powerful and you want as much work as possible pushed there. Sometimes both in the same project (although I’m not sure I’ve seen a setup that can do that well yet).

                                                                                3. 2

                                                                                  The one issue is that the server binary it deploys needs to support your platform. Not a problem for most people, but it is for me at my dayjob. I think the TRAMP style can also break down if the remote system differs from what it usually expects.

                                                                                  1. 1

                                                                                    Yeah, same, though I’ve been working on porting Zed to illumos (wasmtime main now experimentally supports illumos :) )

                                                                                    1. 1

                                                                                      Yeah, it broke a lot for me because of bad prompts etc., but the universality was a big advantage of this setup. You heard that a lot in the vim/emacs arguments: you couldn’t just install a modern vim with all the plugins on the target system, whereas you always had your “home” Emacs.

                                                                                      But, well, that’s more the sysadmin perspective of the days of yore, I think. These days, it’s less likely that you also have to edit that /etc/sendmail.cf on that one weird Apollo Domain/OS server. And more likely that “remote” is on your own system or in a LAN. A lot more control, but also a lot more needs.

                                                                                      Thinking about it, didn’t Plan 9’s sam do that, too? I remember spreading some editor parts on various Unices in the early millennial years…

                                                                                    2. 2

                                                                                      I get a pretty good experience doing remote dev stuff with plan9port Acme and sshfs.

                                                                                      I mount the remote folder I need on /mnt and then open it in Acme. I also open an Acme terminal with Win, in which I ssh to the remote server and cd to the mounted folder. From there, any relative reference to a file inside the mounted folder can be opened in Acme through my local plan9port Plumber with a simple right click on its name.

                                                                                      I can also use stuff like grep and find directly on the sshfs mount point in Acme, but that can be rather slow, so I do it through the ssh connection in the Win window.

                                                                                      At some point I’d like to build an Acme program that takes a host and path, mounts that with sshfs on a mount point, and proxies every command it gets through an ssh tunnel to the targeted host; that way I don’t need to keep an extra Win around.

                                                                                    3. 4

                                                                                      What is the jj equivalent of git pull --rebase? I could not find it in any tutorial on jj, yet it’s something that I believe many people use in cases where jj would fit well.

                                                                                      1. 5

                                                                                        jj git fetch && jj rebase -d xyz@origin.

                                                                                      2. 3

                                                                                        I would like to switch my main config repo to jj. As I use it to set up new computers, I would like minimal effort beforehand: just install Nix, check out the repo, and make home manager do its magic. Can I check out a jj-controlled repo with git, as git is now present in every OS?

                                                                                        1. 7

                                                                                          jj interacts with normal git repos. You can use jj with any git host and repository.

                                                                                          Any jj repo that was started with jj git init ... is a git repo you can push into any git forge. Any repo you start by cloning a git repo using jj (jj git clone ...) is also a git repo upstream.

                                                                                          Internally, all jj repos contain a git repo at this point in time, but that may change in the future if jj ever decides to develop a custom storage mechanism.

                                                                                          1. 1

                                                                                            Yeah, I wanted to change the title to ‘git compatible replacement’, but I thought the post title was getting too long. I apologize about that.

                                                                                          2. 7

                                                                                            Yes. While there are theoretically plans to add a jj-native backend, at the moment, every jj repo is also a git repo.

                                                                                            You might need to do colocation, though, or else the .git dir will be in a nonstandard location, and I’m not sure if/how that would affect remote clones. Generally, I recommend colocation anyway, I haven’t seen any downsides to it yet.

                                                                                            1. 2

                                                                                              Thanks! So if I do git checkout … and then set up the new system so it brings jj into my environment, things should work? I can then just start using jj in that repo without any additional steps? Will try it over the weekend.

                                                                                                1. 3

                                                                                                  git checkout …

                                                                                                  Your CVS/SVN heritage is showing :3 (cf. git clone).

                                                                                                  You can also jj git clone to start it from that side, optionally with --colocate to keep the .git directory in the root of the repo instead of under .jj/ somewhere (and therefore able to use commands from both at once).

                                                                                            2. 5

                                                                                              And the solutions, like scanning a QR code with a separate device, are cumbersome and alien to most users.

                                                                                              Scanning QR codes is not alien or cumbersome for users.

                                                                                              1. 14

                                                                                                Hard disagree. As a Xennial, I would call it a toss-up whether someone of my generation understands QR codes. Gen X and above: not at all. If you find one who knows what a QR code is, it is hard to find someone who can actually scan them (at least with any reasonable speed). I can’t make claims for the generations after me.

                                                                                                I personally still don’t understand passkeys. I would much prefer it just said “Log in with my iPhone” or “Log in with my Samsung phone”. Passkeys limited to whatever computer you’re using at the time are a nightmare. I remain completely unclear on whether I should let Proton Pass hold my passkeys, a Titan key, or my iPhone. I think PP is the right choice, but security isn’t supposed to be a guessing game.

                                                                                                I think passkeys are not going to pass muster and will go away.

                                                                                                1. 2

                                                                                                  I feel like WhatsApp has proven that “scan a QR code for auth” is a pretty viable option.

                                                                                                2. 7

                                                                                                  I think the emphasis here is on “with a separate device”.

                                                                                                  1. 5

                                                                                                    It’s kind of a pain to do if the other device is another desktop or laptop computer rather than a phone with a camera though.

                                                                                                    1. 3

                                                                                                      You can invert the flow there and have the phone scan the QR code from the computer screen.

                                                                                                      1. 2

                                                                                                        yeah… not sure how many dedicated PCs even have a webcam tbh.

                                                                                                      2. 1

                                                                                                        I have never scanned a QR code and don’t see a reason to ever do so.

                                                                                                        1. 1

                                                                                                          The reason is as-stated. To authorize a device with a passkey.

                                                                                                          1. 1

                                                                                                            Especially now that YouTube has changed the canonical URL for “Never Gonna Give You Up”, invalidating all the existing QR codes pointing to it.

                                                                                                        2. 1

                                                                                                          nix-direnv and flakes already cache. Is there another type of caching I’m not aware of?

                                                                                                          1. 1

                                                                                                            Did you read the article? It answers your question…

                                                                                                            1. 4

                                                                                                              I might be wrong, but I’ve noticed a lot of hostility towards the devenv project, especially on the Nix Discourse. Every release post is met with a bunch of questions challenging various aspects of the project. I don’t know why that is, but I applaud the team for bringing improvements in each release! I hope that I’ll soon manage to use devenv to bring Nix to my colleagues :)

                                                                                                              1. 4

                                                                                                                I did read the article, but I don’t get what caches are present. I’m not trying to be hostile; I might be a little resistant to another tool that bolts over what I already have.

                                                                                                                So I need direnv, nix-direnv, and now devenv. It’s just another thing to learn and install, so I want to know what it’s doing for others.

                                                                                                                Somehow devenv is caching things in SQLite, but I wonder why, because I can cd into a nix-direnv directory that’s cached in 0m0.001s. So I’m curious what the cache is really doing.

                                                                                                            2. 8

                                                                                                              I currently use direnv and flakes for my development environments. I do find the flake syntax tedious, but it’s also pretty well-documented. I don’t feel like I’m going to hit a wall with what it can do. I tried devenv once and felt like it was adding considerable complexity, and when I wanted to do something else nix-ish it felt to me like it was kind of in the way. What do other people make of it?

                                                                                                              1. 6

                                                                                                                I’m not using flakes, but direnv instantly clicked with me. Almost all my projects are now handled by direnv, and I really appreciate how easy it is to list all the tools I need. At work, I pair direnv with GitLab CI, so direnv provides the scripts, and then GitLab CI just does direnv shell run ....

                                                                                                                  1. 7

                                                                                                                    🤦🏻‍♂️ devenv… joys of writing late in the night, let me delete that code I wrote at the same time…

                                                                                                                1. 2

                                                                                                                  I’m curious to hear what got in your way; devenv is pretty much Nix all the way.

                                                                                                                  All supported options are documented at https://devenv.sh/reference/options/

                                                                                                                  We’ve also recently added a documentation page for each language and service, for example https://devenv.sh/supported-services/postgres/

                                                                                                                  1. 1

                                                                                                                    I’ll give it another try and report back to you.

                                                                                                                2. 6

                                                                                                                  Company: IMC

                                                                                                                  Company site: https://www.imc.com

                                                                                                                  Position(s): Various, both at junior and senior level

                                                                                                                  Location: ONSITE: Amsterdam, Chicago, London, Mumbai, Sydney, Zug

                                                                                                                  Description: We are a high-frequency, low-latency trading company that specializes in options market making. There are several open positions at different seniority levels (well, two levels, as we have a pretty flat hierarchy): SREs, Network & DC engineers, Dev Productivity, C++/Java/Python engineers, Trading Support.

                                                                                                                  Tech stack: The usual suspects for a trading company: C++, Java, FPGA, running on bare metal Linux, with supporting infra running on k8s. Analysts use mostly Python. Huge amounts of data from various sources

                                                                                                                  Compensation: Depends on the location, but quite competitive. Salary+yearly bonus, there is no equity. Pretty great work/life balance and great company culture.

                                                                                                                  Contact: IMC careers page, feel free to DM also.

                                                                                                                    1. 16

                                                                                                                      Unlike other “falsehoods” articles, this one really lacks detailed explanations for each item.

                                                                                                                      1. 17

                                                                                                                        Going back to the original that started it all, many “falsehoods” articles don’t bother to explain why the things they list are wrong, let alone give you any idea of what you could do instead that the author would think is better. It’s why the whole genre borders on useless.

                                                                                                                        1. 3

                                                                                                                          If you refer to this one as the original, most of the bullet points are self-explanatory, or it’s easy to work out why they’re false if you think about them for more than 5 seconds.

                                                                                                                          My favorite being:

                                                                                                                          Two-digit years should be somewhere in the range 1900-2099.

                                                                                                                          1. 9

                                                                                                                            I had always understood this to be the original.

                                                                                                                        2. 2

                                                                                                                          So many of them have been missing explanations.