Overall, there was agreement that the original motivations for a large, “batteries-included” standard library no longer held up to scrutiny. “In the good old days,” Ned Deily reminisced, “We said ‘batteries-included’ because we didn’t have a good story for third-party installation.” But in 2023, installing third-party packages from PyPI is much easier.
I’m pretty unconvinced about this for the script case. For development projects, setting up a virtualenv or whatever is fine, but for scripts, the dependency story is basically “just install it into the global environment” (and be prepared for users to force the install and then break other stuff on their system), which still kinda sucks.
If we are moving towards more things requiring files being installed, I would very much like an out of the box solution for “this script uses pathlib and dataclasses, so please run it in an environment where this is available”. A solution that doesn’t involve polluting the current working directory (though that might just be some dot file). If a solution to this exists, I’m much happier about moving forward with a “slimmer” standard library.
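For what it’s worth, a minimal sketch of what such a declaration could look like, assuming a PEP 723-style inline metadata header and some runner that understands it (the package name below is hypothetical, and none of this is an established out-of-the-box feature today):

# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "dataclasses-backport",  # hypothetical package standing in for a de-bundled stdlib module
# ]
# ///
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Report:
    source: Path
    lines: int

A runner that understood the header could build a throwaway environment itself, which would avoid polluting both the global interpreter and the current working directory.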
Really I’m a fan of the huge standard library because there’s a bunch of serious things that work well (argparse is great, and it’s great that it’s in the standard library, and it’s not clear to me that it would be there in a new version of the standard library). But if the contributors are all for a slimmer one, it’s hard for me to really say it’s not a good idea.
I don’t think Python would be nearly as successful without its current iteration of the stdlib. Installing packages, as you allude to, is still atrocious despite their optimism that it’s so easy now.
Hey if it were up to me we would keep growing the stdlib. I do think there’s value in tracking deps internally tho
Do me a favor. On a device of your choice that has a Python interpreter available, open a terminal and run:
python -m venv lobstercomment
cd lobstercomment/
source bin/activate
python -m pip install requests
And tell me what happens. If you’re on Windows, run py instead of python, and on line three change to running the appropriate bin/ script for your Windows terminal type (.bat for cmd.exe or .ps1 for PowerShell).
What’s parody about it?
Many language package managers, even ones that get widespread praise, have some sort of “start new project” command, and require you to be in the project directory to work with dependencies. Literally the only thing that’s extra about the Python approach is “activating” the isolated environment, and the differences in command names, which are not part of Python packaging (every example of doing command-line stuff in Python, not just packaging, has to account for the fact that on Windows the executable is named py).
The way I see it, the average Python user isn’t necessarily a programmer. Maybe they code on the side, writing small scripts, experimenting with what Python can do. They just run python myscript.py and it works everywhere. The details you gave are just one complication out of the many many problems that package managers introduce, and would affect usability for those users (e.g. requiring a project.toml or equivalent, version collisions between packages, only works when there’s network access, etc.).
Btw I’m not against getting rid of some of the less used packages, but personally I’d prefer to grow the standard library. Python is already a huge download, what’s a few more scripts, if it helps usability?
P.S. You don’t find it odd that if I switch between Windows and Linux, I have to remember if it’s bin or Scripts, if it’s .bat or .ps1, etc.? And also, I use Windows and I always use python, never used py. So many pointless details…
Again: I don’t see how you can argue for Python being uniquely bad among language package managers when it seems your position is that language package managers are all inherently bad and wrong due to not meeting the needs of non-“programmers”.
I never said Python is uniquely bad. The opposite, right now it is uniquely good, and I think it shouldn’t try to be more like node.
Someone said:
Installing packages, as you allude to, is still atrocious
I gave a sample of starting a project and installing a package into it. You replied:
This is parody, right? Because you’re proving his point with flying colors
But now you are saying Python is “uniquely good” on this?
I do not understand.
Yes, package management is generally bad in most languages. It is a little bit worse in Python unfortunately, but not that much.
But because Python comes with a large and high-quality std lib, you don’t have to deal with package management as much, and sometimes users can avoid it for a long time. Which is why it is uniquely good.
On one machine I get:
/usr/bin/python: No module named venv
On another I get a long error message starting with:
The virtual environment was not created successfully because ensurepip is not available.
On yet another it worked, but only if I changed python -> python3 in your commands.
Using Python 2, three+ years after it EOL’d and 13 years after Python 3 came out, is its own pile of problems that I can’t help you with.
Python 3 is installed on all of the above machines. I’m not sure what you expected to get out of your experiment if you’re so dismissive of someone trying to help.
For the last one, yep – the package python-is-python3 symlinks /usr/bin/python to python3. I have no idea what the deal is with the ensurepip error. It seems Debian-like systems need some additional package installed.
FYI I don’t think pathlib, dataclasses or argparse are under threat. At least they are not in the list in PEP 594 – Removing dead batteries from the standard library.
You’re missing out! pathlib is much easier to work with than os for file management, and dataclasses is great for defining typed, well-rounded classes. Most of my classes nowadays are dataclasses.
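To illustrate with plain standard-library code (nothing project-specific assumed here):

from dataclasses import dataclass
from pathlib import Path

@dataclass
class LogFile:
    path: Path
    size: int  # bytes

def collect_logs(root: Path) -> list[LogFile]:
    # joining, globbing and stat() all hang off the Path object itself,
    # instead of being spread across os, os.path and glob
    return [LogFile(p, p.stat().st_size) for p in root.rglob("*.log")]

print(collect_logs(Path.cwd()))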
I guess you are the exception. How do you work with paths, classes or command line interfaces instead? (I know that there are alternatives, but these are a great start)
What did get deprecated that is about paths and classes? The few CLI modules that were being deprecated (before the decision was reverted) have modern alternative modules in the stdlib.
I was referring to the list in the PEP, but for the sake of argument, os.path, class, and sys.argv all work perfectly fine.
Pathlib is so much better than os.path, and sys.argv doesn’t handle flags, subcommands, or default values.
(I know you’re just saying it for the sake of argument, but I’m still here to shill the value of pathlib)
With pattern matching, you barely need argparse anymore! /s (just pass in all the args in exactly the right order every time)
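For the record, the things sys.argv alone doesn’t give you take only a few lines of stdlib argparse (a generic sketch, not tied to anything in this thread):

import argparse

parser = argparse.ArgumentParser(prog="tool")
sub = parser.add_subparsers(dest="command", required=True)  # subcommands

build = sub.add_parser("build")
build.add_argument("--jobs", type=int, default=4)  # flag with a default value
build.add_argument("target")                       # positional argument

args = parser.parse_args(["build", "--jobs", "8", "app"])
print(args.command, args.jobs, args.target)        # prints: build 8 app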
If we are moving towards more things requiring files being installed, I would very much like an out of the box solution for “this script uses pathlib and dataclasses, so please run it in an environment where this is available”. A solution that doesn’t involve polluting the current working directory (though that might just be some dot file). If a solution to this exists, I’m much happier about moving forward with a “slimmer” standard library.
In principle I agree, but this wouldn’t actually help me much. I’ve done a lot of work in environments where an ambient Python interpreter (it might be 10 years old) is table stakes, but installing anything third-party has such a high bar as to be effectively impossible. And I don’t even really disagree with policies like this: trusting Python is much easier than trusting everyone with a PyPI account.
So in these cases, we’d all write Python. People who would have run a hundred miles to use Perl instead wrote Python. Having more stuff in the standard library just made it objectively better for us almost all of the time. Making it easier to install untrusted code wouldn’t have helped.
Not ruby, but the best prior art on a shell with a general purpose language is https://xon.sh/
Xonsh is really fun, and I used it for many years. My main gripe was that I really wanted a way to make binaries that would “speak xonsh” as its own interface, and that didn’t really feel in the cards.
There are of course way too many tools that use bash scripts or w/e to work, and xonsh would require me to mess around with my own stuff to get it to work, but it was generally a fun experience. Nothing like just being able to write shell script functions with logic and not have to worry about escaping issues. It’s a very good design.
One thing you can do here is have clicking expand a modal, but Ctrl/shift-click does a new page. [0]
Implementation wise you send a header depending on the action, and get either a snippet or a full page.
Reloading page state or toggling between tabs when you want to just peek at something is also unfriendly UX! But I think that you can sort of tackle this problem in a pretty principled way to get the best of both worlds.
I like documents and simple pages, but on the occasions that we can offer better UX pretty easily, it feels like it’s worth trying out!
[0] please don’t be pedantic about mobile devices. Something should happen there of course
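If it helps make that concrete, here’s a minimal server-side sketch, assuming Flask and a made-up X-Fragment request header (both are my assumptions, not something from the comment above):

from flask import Flask, request

app = Flask(__name__)

@app.route("/items/<int:item_id>")
def item(item_id: int):
    # a normal click sends the header and gets just the snippet for the modal;
    # Ctrl/shift-click or a direct visit gets the full page
    if request.headers.get("X-Fragment"):
        return f"<div class='modal-body'>Item {item_id} details…</div>"
    return f"<html><body><h1>Item {item_id}</h1><p>Full page.</p></body></html>"

The same URL serves both, so the “peek in a modal” and “open in a new tab” cases stay in sync.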
I have definitely had a very good experience with KDE over the past year. It definitely does not look as nice as GNOME, but I can do the stuff I need to do with it, and the plasma desktop environment is mostly things that make sense (screenshot tool in KDE is quite nice and functional, if pretty ugly. GNOME’s one looks way nicer and improves on every release, of course)
We had a very unfortunate thing happen with desktop environments, with both major DEs going through very painful transitions at the same time. Gnome is doing a lot of good work improving things, but so is KDE.
We had a very unfortunate thing happen with desktop environments,
It wasn’t just the DEs, either. “Desktop Linux” just generally seemed to get worse for a long time after 2010-ish. And by “worse” I’m not talking about nerdy philosophical holy wars (systemd, etc), but I just mean getting things to work quasi-correctly. Some of it was “growing pains”, like when I had to do arcane rituals to simultaneously appease PulseAudio apps and ALSA apps, but some of it was just too much stuff all breaking in a short timespan. We had NetworkManager always doing something funny, Xorg changes + Wayland stuff, libinput with weird acceleration, polkit, udev, PAM, AppArmor, etc, etc, all changing or being rewritten or replacing each other.
Desktop Linux was nuts for a while. And, honestly, with the whole Snap vs Flatpak vs whatever ecosystems, I feel like we’re still doomed. Mostly because all of the app-containerizations absolutely suck. Badly. They all take up way more space, eat up way more resources when running, take longer to launch, etc. I know and understand that maintaining a massive repository for all supported software for an OS is kind of crazy and seems unsustainable, but these technologies are just not the answer. When will software devs learn that “computers are fast” is actually a lie used to justify lazy programming? </pedestal>
It wasn’t just the DEs, either. “Desktop Linux” just generally seemed to get worse for a long time after 2010-ish. And by “worse” I’m not talking about nerdy philosophical holy wars (systemd, etc), but I just mean getting things to work quasi-correctly.
Some two years ago, when I was still going through my inner “fuck it, I’m getting a Mac – holy fuck I’m not spending that kind of money on a computer!!!! – but I should really… – LOOK AT THE PRICE TAG MAN!!” debates, a friend of mine pointed out that, you know, we all look at the past through rose-tinted glasses, lots of things broke all the time way back, too.
A few days afterwards we were digging through my shovelware CDs and, just for shits and giggles, I produced a Slackware 10 CD, which we proceeded to install on an old x86 PC I keep for nostalgia reasons. Slackware 10 shipped with KDE 3.2.3, which was still pretty buggy and not quite the “golden” 3.5 standards yet.
Man, it’s not all rose-tinted glasses, that thing was pretty solid. Two years ago I could still break Plasma desktop just by staring at it menacingly – like, you could fiddle with the widgets on the panel for a bit and have it crash or resize them incorrectly, drag a network-mounted folder to the panel to iconify it and then get it to freeze at login by unplugging the network cable, or get System Settings and/or Kwin to grind to a halt or outright crash just by installing a handful of window decoration themes.
Then again, the tech stack underneath all that has grown tremendously since then. Plasma 5 has the same goals on paper but it takes a lot more work to achieve them than it took back in 2004 or whatever.
I love this anecdote, and I’ve had similar experiences.
I’m a software dev these days, myself, and I’ve always been a free software fan/advocate, so I don’t want to shit on anyone’s hard work–especially when they are mostly doing it for free and releasing it to the world for free. But, I do wonder where things went wrong in the Desktop Linux world.
Is it that the “modern” underlying technologies (wayland, libinput, systemd, auth/security systems, etc) are harder to work with than the older stuff?
Is it that modern hardware is harder to work with (different sleep levels, proprietary driver APIs, etc)?
Is it just that there’s so much MORE of both of the above to support, and therefore the maintenance burden increases monotonically over time?
Or is it just the age-old software problem of trying to include the kitchen sink while never breaking backwards compatibility so that everyone is happy (which usually ends up with nobody happy)?
Again, I appreciate the work the KDE devs do, and I’m really glad that KDE and Plasma exist and that many people use their stuff and are happy with it… But… I will state my uninformed speculation as a fellow software dev: I suspect that the vast majority of bugs in Plasma today are a direct result of trying to make the desktop too modular and too configurable. The truth is that the desktop pieces generally need to know about each other, so that the desktop can avoid being configured into a bad state, or so that widgets can adapt themselves when something else changes, e.g., containing panel resizes, screen size changes, etc. Obviously Plasma does have mechanisms in place for these things, and I don’t know what those mechanisms are (other than that it probably uses DBUS to publish event messages), so this is just speculation, but I imagine that the system for coordinating changes and alerting all of the different desktop parts is simultaneously more complex and more limited than it would be if the whole desktop were more tightly integrated. I strongly suspect that Plasma architected itself with a kind of traditional, Alan Kay-ish, “object oriented” philosophy: everything is an independent actor that communicates via asynchronous messages, and can be added and removed dynamically at runtime. I’m sure that the idea was to maximize flexibility and extensibility, but I also think that the cost to that approach is more complexity and that it’s basically impossible to figure out what will actually happen in response to a change. Not to mention that most of this stuff is (or was the last time I checked, at least) written in C++, which is not the easiest language to do dynamic stuff in.
I suspect that the vast majority of bugs in Plasma today are a direct result of trying to make the desktop too modular and too configurable.
I hear this a lot but, looking back, I really don’t think it’s the case. KDE 3.x-era was surprisingly close to modern Plasma and KDE Applications releases in terms of features and configurability – not on the same level but also not barebones at all, and was developed by fewer people over a much shorter period of time. A lot of it got rewritten from the ground up – there was a lot of architecture astronautics in the 4.x series, so a couple of Plasma components actually lost some features on the way. And this was all happening back when the whole KDE series was a big unhappy bunch of naked C++ – it happened way before QtQuick & co.
IMHO it’s just a symptom of too few eyes looking over code that uses technology developed primarily for other purposes. Back in the early ‘00s there was money to be made in the desktop space, so all the cool kids were writing window managers and whatnot, and there was substantial (by FOSS standards of the age) commercial backing for the development of commercially-viable solutions, paying customers and all. This is no longer the case. Most developers in the current generations are interested in other things, and even the big players in the desktop space are mostly looking elsewhere. Much of the modern Linux tech stack has been developed for things other than desktops, too, so there’s a lot of effort to be duplicated at the desktop end (eh, Wayland?), and modern hardware is itself a lot more complex, so it just takes a lot more effort to do the same things well.
Some of the loss in quality is just inherent to looking the wrong way for inspiration – people in FOSS love to sneer at closed platforms, but they seek to emulate them without much discrimination, including the bad parts (app stores, ineffective UX).
But I think most of it is just the result of too few smart people having to do too much work. FOSS platforms were deliberately written without any care for backwards compatibility, so we can’t even reap the benefits of 20+ years of maintenance and application development the way Windows (and, to some extent, macOS) can.
I hear this a lot but, looking back, I really don’t think it’s the case. KDE 3.x-era was surprisingly close to modern Plasma and KDE Applications releases in terms of features and configurability
It was very configurable, yes. But, I was speaking less from the lens of the user of the product, and more from the software architecture (as I came to understand it from blog posts, etc). I don’t know what the KDE 3.x code was like, but my impression for KDE/Plasma 4+ was that the code architecture was totally reorganized for maximum modularity.
Here’s a small example of what I mean from some KDE 4 documentation page: https://techbase.kde.org/Development/Architecture/KDE4/KParts. This idea of writing the terminal, text editor, etc as modular components that could be embedded into other stuff is an example of that kind of thinking, IMO. It sounds awesome, but there’s always something that ends up either constraining the component’s functionality in order to stay embeddable, or causing the component to not work quite right when embedded into something the author didn’t expect to be embedded in.
Back in the early ‘00s there was money to be made in the desktop space, so all the cool kids were writing window managers and whatnot, and there was substantial (by FOSS standards of the age) commercial backing for the development of commercially-viable solutions, paying customers and all. This is no longer the case.
Is that correct? My understanding was that a good chunk of the GNOME leadership were employed by Red Hat. Is that no longer the case? I don’t know the history of KDE and its stewardship, but if Novell or SUSE were contributing financially to it and now no longer are, I could see how that would hurt the person-power of the project.
Some of the loss in quality is just inherent to looking the wrong way for inspiration – people in FOSS love to sneer at closed platforms, but they seek to emulate them without much discrimination, including the bad parts (app stores, ineffective UX).
I definitely agree with this. That’s actually one reason why I tune out the GNOME Shell haters. It’s not that I don’t have some of my own criticisms about the UI/UX of it, but I really appreciate that they tried something different. Aside: And as someone who has worked on Macs for 10 years, it blows my mind when people say that GNOME Shell is at all mac-like; the workflow and UX has almost nothing in common with macOS except for the app-oriented super-tab switcher.
Here’s a small example of what I mean from some KDE 4 documentation page: https://techbase.kde.org/Development/Architecture/KDE4/KParts. This idea of writing the terminal, text editor, etc as modular components that could be embedded into other stuff is an example of that kind of thinking, IMO.
Uhh… it’s been a while so I don’t remember the details very well but KDE 3 was definitely very modular as well. In fact KParts dates from the 3.x series, not 4.x: https://techbase.kde.org/Development/Architecture/KDE3/KParts . KDE 4.x introduced a whole bunch of new things that, uh, didn’t work out well for a while, like Nepomuk, and changed the desktop shell model pretty radically (IIRC that’s when (what would eventually become) Plasma Shell came up). Some frameworks and applications probably went through some rewrites, some were abandoned, and things like DCOP were buried, but the overall approach to designing reusable frameworks definitely stayed.
Is that correct? My understanding was that a good chunk of the GNOME leadership were employed by Red Hat. Is that no longer the case? I don’t know the history of KDE and its stewardship, but if Novell or SUSE were contributing financially to it and now no longer are, I could see how that would hurt the person-power of the project.
I think Red Hat still employs some Gnome developers. But Canonical no longer has a desktop team IIRC, Ximian is pretty much gone, Nokia isn’t pouring money into desktop/mobile Linux technologies, etc. Pretty much all the big Linux players are mostly working on server-side technologies or embedded deployments.
I definitely agree with this. That’s actually one reason why I tune out the GNOME Shell haters.
I don’t really mind Gnome Shell, Linux always had all sorts of whacky “desktop shell” thingies. However, I really started to hate my Linux boxes starting with GTK3.
I dropped most of the GTK3 applications I was using and took a trip down memory lane compiling Emacs with the Lucid toolkit. But it wasn’t really avoidable on account of Firefox. That meant I had to deal with its asinine file finding dialog, the touch-sized widgets on a non-touch screen, and that awful font rendering on a daily basis. Not having to deal with that more than justifies the money I spent on my Mac, hell, I’d pay twice that money just to never see those barely-readable Adwaita-themed windows again *sigh*.
Uhh… it’s been a while so I don’t remember the details very well but KDE 3 was definitely very modular as well.
Fair enough. I definitely used KDE 3 a bit back in the day, but I don’t remember knowing anything about the development side of it. I could very well be mistaken about KDE 4 being a significant push toward modularity.
Oof, I echo all of this so much.
I was a desktop linux user from about 2003 until 2010 or so, going through a variety of distros (Slackware, Gentoo, Arch) and sticking with Ubuntu since 2006ish.
At one point I got tired of how much worse things were getting, especially for laptop users, and switched to a Mac. I’ve used Macs for my main working computers since then and mostly only used Linux in servers/RPis, etc.
About 3 years back, before the new Arm based macs came out I was a bit fed up with my work computer at the time, an Intel based Mac, being so sluggish, so I decided to try out desktop Linux again (with whatever the current Ubuntu was at the time) on my Windows Desktop PC which is mostly just a gaming PC.
I could replicate my usual workflow, especially because I never depended too much on Mac-specific apps, but then even on a desktop machine with two screens, the overall experience was just… not great.
The number one thing that irked me was dealing with my two screens which have different resolutions and different DPIs. The desktop UI for it just literally didn’t work and I had to deal with xrandr commands that ran on desktop start and apparently this is “normal” and everyone accepted this as it being ok. And even then I could never get it exactly right and sometimes it would just mess up to a point that the whole display server would need a restart.
Other than that, the way many of these modern web based desktop apps just have all sorts of issues with different DPIs and font rendering.
I thought, how were all of these things still such a massive issue? Especially the whole screen thing, with laptops being the norm over the last 15 years and people often using external screens that probably have a different DPI from their laptop anyway?
Last year I decided to acquire a personal laptop again (for many years I only had work laptops) and I thought I’d have a go at a Framework laptop, and this time I thought I’d start with Kubuntu and KDE, as I’d also briefly tried modern KDE on an Asahi Linux installation and loved it.
KDE seems to handle the whole multiple display/DPI thing a lot better but still not perfectly. The web based desktop app and font rendering issues were somehow still there but not as bad (I did read about some Electron bugs that got fixed in the meanwhile).
And then I dove into the whole Snap/Flatpak thing, which I was kind of unfamiliar with, not having used desktop Linux for so many years. And what a mess! At multiple points I had multiple instances of Firefox running and it took me a while to understand why. Some apps would open the system Firefox, others would go for the containerized one.
I get why these containerized app ecosystems exist, but in my limited experience with it the interoperability between these apps seems terrible and it makes for a terrible user experience. It feels like a major step back for all the improvements and ease of use Desktop Linux made over the years.
I did also briefly try the latest Ubuntu with Gnome and the whole dual screen DPI situation was just as bad as before, I’m guessing related to the whole fractional scaling thing. Running stuff at 100% was fine but too small, 200% fine but too big, 150%? A blurry mess. KDE deals fine with all those in-between scales.
My other spicy opinion on breakage is that Ubuntu not doing rolling releases holds back everything, because bug fixes take too long to get in front of users.
“I don’t want updates to break things” OK well now every bug fix takes at least 6 months to get released. And is bundled with 10 other ones.
I understand the trade offs being made but imagine a world in which bug fixes show up “immediately”
I understand the trade offs being made but imagine a world in which bug fixes show up “immediately”
… that was the world back in the day.
“Oh, $SHIT is broken. I see the patch landed last night. I’ll just grab the source and rebuild.”
And it still is, depending on the distro (obviously, you can manually compile/manage packages with any distro, but some distros make that an officially supported approach).
I agree that Ubuntu’s release philosophy isn’t great, but in its defense, bug fixes are not blocked from being pushed out as regular updates in between major releases.
What I do think is the big problem with Ubuntu’s releases is that there used to be no real distinction between “system” stuff and “user applications”. It’s one thing to say “Ubuntu version X.Y has bash version A.B, so write your scripts targeting that version.” It’s another to say “Ubuntu version X.Y has Firefox version ABC.” Why the hell wouldn’t “apps” always just be rolling release style? I do understand that the line between a “system thing” and a “user-space thing” is blurry and somewhat arbitrary, but that doesn’t mean that giving up is the right call.
To be fair, I guess that Ubuntu’s push toward “snap” packages for these things does kind of solve the issue, since I think snaps can update independently.
It wasn’t just the DEs, either. “Desktop Linux” just generally seemed to get worse for a long time after 2010-ish.
That’s part of why I landed on StumpWM and stayed. It’s small, simple, works well for my use cases, and hasn’t experienced the sort of churn and CADT rewrites that have plagued others.
Moving to FreeBSD as my daily driver also helped there, because it allowed me to nope out of a lot of the general desktop Linux churn.
I switched to Plasma from GNOME because I was tired of my customizations getting obliterated all the time. I also like the fact I can mess with the key combinations in many more apps, since my muscle memory uses Command, not Control. Combined with a couple add-ons and global menu, I’ve never looked back.
We had a very unfortunate thing happen with desktop environments, with both major DEs going through very painful transitions at the same time.
The reasons were entirely clear, but people’s memories are short, and there is a tonne of politics.
Microsoft threatened to sue. Red Hat and Ubuntu (2 of the bigger GNOME backers) refused to cooperate (including with each other) and built new desktops.
SUSE and Linspire (2 of the biggest KDE backers) cooperated.
I detailed it all here.
https://liam-on-linux.dreamwidth.org/85359.html
Someone stuck it on HN and senior GNOME folk denied everything. I don’t believe them. Of course they deny it, but it was no accident.
This is all a matter of historical record.
My hot take is that domain names were a mistake. Phone numbers existed, saying “you use this number to access X” and have your own address book. Bank website not in address book? Well that’s weird isn’t it!
Granted, there’s obviously a lot of things to be done to improve security for browsing, just all this chasing down of unicode lookalikes seems like an interminably hard problem to resolve in a universal way.
Trust is hard enough as it is!
My hot take is that domain names were a mistake. Phone numbers existed, saying “you use this number to access X” and have your own address book.
Sounds a lot like CompuServe used to be.
G Suite. I tried fastmail way in the past and there’s enough small things (like spam filtering not being great, and some general futziness I can’t recall) where I just didn’t feel like I was getting a useful experience.
Every time Ruff comes up, the fact that it’s super-hyper-mega-fast is promoted as making up for the fact that it’s basically not extensible by a Python programmer (while the tools it wants to replace all easily are extensible in Python).
But the speed benefit only shows up when checking a huge number of files in a single run. And even on a large codebase:
So I’m still not sure why I should give up easy extensibility for speed that seems to offer me no actual practical benefit.
A bunch of thoughts here!
For these kinds of tools performance (both speed and memory usage) matters a lot, because codebases are effectively unbounded in size, and because for interactive use, latency budgets are pretty tight. There’s also Sorbet’s observation that performance unlocks new features. “Why would you watchexec this on the whole code base? Because I now can”.
Now, if we speak strictly about syntax-based formatting and linting, you can get quite a bit of performance from the embarrassingly parallel nature of the task. But of course you want to do cross-file analysis, type inference, duplicate detection and what not.
The amount of things you can do with a good static analysis base is effectively unbounded. At this point, maybe Java and C# are coming to the point of saturation, but everything else feels like a decade behind. The three primary limiting factors in delivering these kinds of tools are:
This is a high-investment, high-value thing, which requires a great foundation. And I would actually call that, rather than today’s raw performance, the most important feature of Ruff. We can start from fast linting, and then move to import analysis, type inference, a full LSP and what not.
From my point of view, Python’s attempt to self-host all dev tools is a strategic blunder. Python really doesn’t have the performance characteristics to move beyond per-file linting, so it’s not surprising that, eg, pyright does its own thing rather than re-use the existing ecosystem.
All that being said, extensibility is important! And Python is a fine language for that. Long term, I see Ruff exposing a Python scripting interface for this. If slow Python scripting sits on top of a fast native core that does 90% of the CPU work, that should be fine!
For these kinds of tools performance (both speed and memory usage) matters a lot, because codebases are effectively unbounded in size, and because for interactive use, latency budgets are pretty tight.
Yet as I keep pointing out, my actual practical use cases for linting do not involve constantly re-running the linter over a million files in a tight loop – they involve linting the file I’m editing, linting the files in a changeset, etc. and the current Python linting ecosystem is more than fast enough for that case.
There’s also Sorbet’s observation that performance unlocks new features. “Why would you watchexec this on the whole code base? Because I now can”.
But what’s the gain from doing that? Remember: the real question is why I should give up forever on being able to extend/customize the linter in exchange for all this speed. Even if the speed unlocks entirely new categories of use cases, it still is useless to me if I can’t then go implement those use cases because the tool became orders of magnitude less extensible/customizable as the cost of the speed.
Long term, I see Ruff exposing a Python scripting interface for this. If slow Python scripting sits on top of a fast native core that does 90% of the CPU work, that should be fine!
I think the instant that interface is allowed, you’re going to find that the OMGFAST argument disappears, because there is no way a ruleset written in Python is going to maintain the speed that is the sole and only selling point of Ruff. But by then all the other tools will have been bullied out of existence, so I guess Ruff will just win by default at that point.
they involve linting the file I’m editing, linting the files in a changeset, etc.
Importantly, they also involve only context free linting. Something like “is this function unused in the project?” wouldn’t work in this paradigm. My point is not that you, personally, could benefit from extra speed for your current workflow. It’s rather that there are people who would benefit, and that there are more powerful workflows (eg, typechecking on every keypress) which would become possible.
But what’s the gain from doing that?
At minimum, simplicity. I’d much rather just run $ foo than futz with git & xargs to figure out how to run it only on the changed files. Shaving off 10 seconds from the first CI check is also pretty valuable.
I think the instant that interface is allowed, you’re going to find that the OMGFAST argument disappears,
If you do this in the stupidest possible way, then, sure, it’ll probably be even slower than pure Python due to switching back and forth between Python and native. But it seems to me that custom linting is amenable to proper slicing into a CPU-heavy part and scripting on top:
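To make that split concrete, here is a toy sketch where the stdlib ast module stands in for the hypothetical native core (this is not an actual Ruff scripting API, just an illustration of “fast core hands back parsed structure, slow Python expresses the rule”):

import ast

def find_print_calls(tree: ast.AST):
    # the parsing/walking is the CPU-heavy part a native core would own;
    # the Python side only expresses the rule logic
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == "print":
            yield node.lineno, "use logging instead of print()"

source = "print('debug')\n"
for lineno, message in find_print_calls(ast.parse(source)):
    print(f"{lineno}: {message}")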
Importantly, they also involve only context free linting. Something like “is this function unused in the project?” wouldn’t work in this paradigm.
There are already flake8 plugins that detect that sort of thing.
At minimum, simplicity. I’d much rather just run $ foo than futz with git & xargs to figure out how to run it only on the changed files. Shaving off 10 seconds from the first CI check is also pretty valuable.
All the existing tools have a “copy/paste this into your pre-commit config” snippet and then it Just Works. If you are indeed rolling your own solution to run only on the changed files, then I think you should probably pause and familiarize yourself with the current state of the art prior to telling everyone else to abandon it.
Sorry if my comments read as if I am pushing anyone to use Ruff, that definitely wasn’t my intention! Rather, I wanted to share my experience as implementer of similar tools, as that might be an interesting perspective for some.
That being said, I think I want to register a formal prediction that, in five years or so, something of Ruff’s shape (Python code analysis as a CLI implemented in a faster language, not necessarily Ruff specifically, and not counting the already existing PyCharm) would meaningfully eat into Python’s dev tool “market”.
I think Ruff will gain significant “market share”, but for the wrong reasons – not because of any technical superiority or improved user experience, but simply because its hype cycle means people will be pushed into adopting it whether they gain from it or not. I’m already dreading the day someone will inevitably file a “bug” against one of my projects claiming that it’s broken because it hasn’t adopted Ruff yet.
The “not extensible by a $lang programmer” argument was a reason for not pursuing faster tooling in better-suited languages in the web ecosystem, and everything was painfully slow.
In my experience, esbuild (Go) and swc (Rust) are a massive improvement and will trade extensibility for the speed boost every time.
I’ve been using Ruff’s flake8-annotations checks to get a very quick list of missing annotations as I port a codebase. In a watchexec loop it’s substantially faster than getting the same information from MyPy or Pyright.
Likewise, in another codebase ruff --fix has already replaced isort (and flake8 and friends).
I’ve never needed the extensibility, though. I’m curious, what do you do with it?
In a watchexec loop it’s substantially faster than getting the same information from MyPy or Pyright.
I’m not sure why you’d need to run it over the entire codebase in a loop, though. Isn’t that the kind of thing where you generate a report once, and then you only incrementally need to check a file or two at a time as you fix them up?
Likewise, in another codebase ruff --fix has already replaced isort
Again, I don’t get it: isort will fix up imports for you, and my editor is set to do it automatically on file save and if I somehow miss that I have a pre-commit hook running it too. So I’m never in a situation where I need to run it and apply fixes across thousands of files (or if I was, it’d be a one-time thing, not an every-edit thing). So why do I need to switch to another tool?
I’ve never needed the extensibility, though. I’m curious, what do you do with it?
There are lots of popular plugins. For example, pylint on a Django codebase is next to unusable without a plugin to “teach” pylint how some of Django’s metaprogramming works. As far as I can tell, Ruff does not have parity with that. Same for the extremely popular pytest testing framework; without a plugin, pylint gets very confused at some of the dependency-injection “magic” pytest does.
Even without bringing pylint into it, flake8 has a lot of popular plugins for both general purpose and specific library/framework cases, and Ruff has to implement all the rules from those plugins. Which is why it has to have a huge library of built-in rules and explicitly list which flake8 plugins it’s achieved parity with.
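For a sense of what that extensibility looks like in practice, this is roughly the shape of a tree-type flake8 plugin (registration via the flake8.extension entry point is omitted, and details can differ between flake8 versions):

import ast

class NoEvalChecker:
    """Flags calls to eval() under a custom error code."""
    name = "flake8-no-eval-example"  # example plugin name
    version = "0.1.0"

    def __init__(self, tree: ast.AST):
        self.tree = tree

    def run(self):
        for node in ast.walk(self.tree):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id == "eval"):
                yield node.lineno, node.col_offset, "XNE001 avoid eval()", type(self)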
I’m not sure why you’d need to run it over the entire codebase in a loop, though. Isn’t that the kind of thing where you generate a report once, and then you only incrementally need to check a file or two at a time as you fix them up?
I like to work from a live list as autoformatting causes line numbers to shift around as annotations increase line length. Really I should set up ruff-lsp.
Again, I don’t get it: isort will fix up imports for you, and my editor is set to do it automatically on file save and if I somehow miss that I have a pre-commit hook running it too. So I’m never in a situation where I need to run it and apply fixes across thousands of files (or if I was, it’d be a one-time thing, not an every-edit thing). So why do I need to switch to another tool?
I don’t use pre-commit because it’s excruciatingly slow. These things are really noticeable to me — maybe you have a faster machine?
I don’t use pre-commit because it’s excruciatingly slow. These things are really noticeable to me — maybe you have a faster machine?
Can you quantify “excruciatingly slow”? Like, “n milliseconds to run when k files staged for commit” quantification?
Because I’ve personally never noticed it slowing me down. I work on codebases of various sizes, doing changesets of various sizes, on a few different laptops (all Macs, of varying vintages). Maybe it’s just that I zone out a bit while I’m mentally composing the commit message, but I’ve never found myself waiting for pre-commit to finish before being able to start typing the message (fwiw my workflow is in Emacs, using magit as the git interface and an Emacs buffer to draft and edit the commit message, so actually writing the message is always the last part of the process for me).
I gave it another try and it looks like it’s not so bad after the first time. The way it’s intermittently slow (anytime the checkers change) is frustrating, but probably tolerable given the benefits.
I think my impression of slowness came from Twisted where it is used to run the lint over all files. This is very slow.
Thanks for prompting me to give it another look!
The way it’s intermittently slow (anytime the checkers change) is frustrating
My experience is that the list of configured checks changes relatively rarely – I get the set of them that I want, and leave it except for the occasional version bump of a linter/formatter. But it’s also not really pre-commit’s fault that changing the set of checks is slow, because changing it involves, under the hood, doing a git clone and then pip install (from the cloned repo) of the new hook. How fast or slow that is depends on your network connection and the particular repo the hook lives in.
I’ve never needed the extensibility, though. I’m curious, what do you do with it?
Write bespoke lints for codebase specific usage issues.
Most of them should probably be semgrep rules, but semgrep is not on the CI, it’s no speed demon either, and last I checked it has pretty sharp edges where it’s pretty easy to create rules which don’t work in complex cases.
PyLint is a lot more work, but lints are pretty easy to test, and while the API is ill-documented it’s quite workable and works well once you’ve gotten it nailed down.
Ah, so you and ubernostrum are optimizing workflows on a (single?) (large?) codebase, and you’re after a Pylint, rather than a pyflakes/flake8.
I’m coming at this from an OSS-style many-small-repos perspective. I prefer a minimally-configurable tool so that the DX is aligned across repositories. I don’t install and configure many flake8 plugins because that increases per-repo maintenance burden (e.g., with flake8 alone W503/W504 debacle caused a bunch of churn as the style rules changed — thank goodness we now have Black!). Thus, I’m happy to drop additional tools like isort
. So to me Ruff adds to the immediately-available capabilities without increasing overhead — seems like a great deal!
It seems like Ruff might slot into your workflow as a flake8 replacement, but you get a lot from Pylint, so I’d keep using the latter. You could disable all the style stuff and use Pylint in a slower loop like a type checker.
Ah, so you and ubernostrum are optimizing workflows on a (single?) (large?) codebase, and you’re after a Pylint, rather than a pyflakes/flake8.
I have both large and small codebases. I do use pylint in addition to flake8 – my usual approach is flake8 in pre-commit because it’s a decent quick check, and pylint in CI because it’s comprehensive. I’ve written up my approach to “code quality” tooling and checks in detail, and you can also see an example repository using that approach.
pylint is, in practice, very memory hungry and frankly slow.
Now I can’t go from there to recommending ruff for the simple fact that ruff is not checking nearly enough stuff to be considered a replacement IMO. Not yet at least. But I’ll be happy to see better stuff happening in this space (disclaimer: I’m writing a rust-based pylint drop-in replacement. Mostly for practice but also because I really suffered under pylint’s perf issues in a past life).
My admiration for ruff comes from the fact that I now have a single tool and a single configuration place. I don’t have to chase how to configure 10 different tools to do linting and ensuring that my python project has some guardrails. For example, my big annoyance with flake8 is that I can’t add its config in pyproject.toml, it has to be a separate file. I really, really, just want to flip the switch and have various checks done on the codebase, and not scour the internet on how to configure these tools, since each has its own (quite valid) interpretation of what’s the right way to do things. I just want to stay away from ever creating setup.py and all those other things I never understood why they are needed to package some interpreted code (my dislike for python’s packaging is leaking here :)).
I’m curious, what do you need to change in the tools replaced by ruff? What additional checks do you need to implement?
I personally do not care about the config file thing, and I wish people would stop bullying the flake8 dude about it. Way too many people, back when pyproject.toml was introduced for a completely different purpose than this, still treated its existence as meaning “all prior config approaches are now illegal, harass everyone you can find until they give up or give in”. Which is what people have basically tried to do to flake8, and I respect the fact that the maintainer laid out clear criteria for a switch to pyproject.toml and then just aggressively locked and ignored every request that doesn’t meet those criteria.
I’m curious, what do you need to change in the tools replaced by ruff? What additional checks do you need to implement?
I already gave a reply to someone else talking about the whole ecosystem of plugins out there for flake8 and pylint, and Ruff is not at parity with them. So even if I wanted to switch to Ruff I would not be able to – it lacks the checks I rely on, and lacks the ability for me to go implement those checks.
I’ve been slowly but surely giving up on Python for some time, and I’ve often struggled to articulate the reasons why. But having just read some of the flake8 pyproject stuff, it’s hit me that most of it could be described as bullying at some level or other.
Python itself arguably bullies its users, with things like the async -> ensure_future change, sum’s special case for str because the devs don’t like it, blah blah. (I want to say something about the packaging situation here, and how much of a pain in the ass it is to maintain a Python project to popular opinion standards in 202x, but I recognise that not all of this is deliberate.) Black’s founding principle is that bludgeoning people into accepting a standard is better than wasting time letting them have preferences. Long ago, when I frequented #python, SOP whenever anyone wanted to use sockets was to bully them into using an external dependency instead. And longer ago, when I frequented python-ideas, ideas that the in-group didn’t take to were made, along with their champions, to run ridiculous gauntlets of nitpicking and whataboutism.
Of course none of the exponents of this behaviour identify it as bullying, but then who would? The results are indistinguishable whether they’re being a dick, evangelizing best practices or just trying to save everyone’s time.
In short I think that, if you don’t want to be bullied into adopting a solution that doesn’t really work for you, you are in the wrong ecosystem.
Some of us use pyflakes on its own, and are thus used to the zero-configuration experience. The configurability of pylint is a net negative for me; it leads to bikeshedding over linter configuration.
This is entirely reasonable. In my case, I started a new job and new project, and I’m not invested heavily in the existing python based toolchain, so ruff was the right choice for us. I don’t like the way these sorts of minor differences get amplified up into existential crises anyway. And no, I’m not new on the internet, just tired of it all.
A thousand times this. $CUSTOMER had wildly divergent coding styles in their repos, and the project to streamline it meant configuring these traditional small tools and their plugins to conform to how PyCharm did things because it’s quite opinionated. And popular among people who are tired of it all.
The tooling included darker, which is fine, though I personally do not like all of black’s choices.
Eventually the whole codebase was blackened all at once and ruff replaced everything else.
The pre-commit is fast and my preferences aside, good arguments can be made for those two tools.
It is what it is, a business decision, and the right way to deal with it is to not elevate it to an existential crisis.
Outside the business world, if I had popular projects, I’d dislike ending up with PRs in the black style if the project wasn’t like that. Or having to set up glue for reformatting.
This is probably how all monopolization happens; people become rightfully tired of being ever-vigilant, and inevitably something bad will come out of the monopoly.
Like not getting OSS contributions because of the project’s formatting guidelines.
100% agree. This reminds me of the old “You have a problem and decide to use regexes to solve it. Now you have two problems.” Yes, your linting is faster but now “it’s basically not extensible by a Python programmer”, which means it’s more difficult for people to maintain their own tools.
What is this hex0 program that they are talking about? I don’t understand how that is the starting point, could someone expand?
The program is here: https://github.com/oriansj/bootstrap-seeds/blob/master/POSIX/x86/hex0_x86.hex0
It’s a program that reads ASCII hex bytes from one file and outputs their binary form to the second file.
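A rough Python rendering of that behavior, just to make it concrete (the real hex0 is a tiny self-hosting program; comment handling and error checking are simplified here):

import sys

def hex0(src_path: str, out_path: str) -> None:
    digits = []
    with open(src_path) as src:
        for line in src:
            code = line.split(";")[0].split("#")[0]  # drop ; and # comments
            digits.extend(c for c in code if c in "0123456789abcdefABCDEF")
    with open(out_path, "wb") as out:
        out.write(bytes.fromhex("".join(digits)))  # assumes whole bytes, i.e. an even digit count

if __name__ == "__main__":
    hex0(sys.argv[1], sys.argv[2])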
Yeah, I think this is pretty confusing unless you’re already very guix-savvy; it claims to be fully bootstrapped from source, but then in the middle of the article it says:
There are still some daunting tasks ahead. For example, what about the Linux kernel?
So what it is that was bootstrapped if it doesn’t include Linux? Is this a feature that only works for like … Hurd users or something?
They bootstrapped the userspace only, and with the caveat that the bootstrap is driven by Guix itself, which requires a Guile binary much larger than the bootstrap seeds, and there are still many escape hatches used for stuff like GHC.
Reading the hex0 thing, it looks like this means that if you are on a Linux system, then you could build all of your packages with this bootstrapped thing, and you … basically just need to show up with an assembler for this hex0 file?
One thing about this is that hex0 calls out to a syscall to open() a file. Ultimately in a bootstrappable system you still likely have some sort of spec around file reading/writing that needs to be conformed to, and likely drivers to do it. There’s no magic to cross the gap of system drivers IMO.
Hex0 is a language specification (like brainf#ck but more useful)
no, you don’t even need an assembler.
hex0.hex0 is an example of a self-hosting hex0 implementation.
hex0 can be approximated with: sed 's/[;#].*$//g' $input_file | xxd -r -p > $output_file
There are versions written in C, assembly, and various shells, and as it is only 255 bytes it is something that can be hand-toggled into memory or created directly in several text editors or even via BootOS.
It exists for POSIX, UEFI, DOS, BIOS and bare metal.
I have no existing insight, but it looks like https://bootstrapping.miraheze.org/wiki/Stage0 at least tries to shed some light on this :)
This article is linking to this online event on Tuesday, with some interesting looking topics! A shame the site doesn’t link to the actual lightning talk titles (stuff like “How to create the nastiest test inputs ever.”), but it sounds right up the alley of lots of people here.
There was a lot of discussion/drama on reddit & twitter about this - mainly focusing on new restrictions that community members feel blindsided by.
Meeting Notes from Feb 14th say the board reviewed the policy:
The board reviewed the current final draft of the trademark policy and considered it broadly acceptable, with a query on the wording “We will likely consider using the Marks […] for a software program written in the Rust language to be an infringement of our Marks”, which seemed unintentionally strict and on which Ms. Rumbul would seek clarification from counsel.
So there seems to be a disconnect between the Rust community & foundation (which is not too surprising). Notably this document was released with a Google Form for comments, and the foundation will be responding next Monday, hopefully saying they hear the community and will be updating the policy accordingly.
Some of the outrage on Twitter is, imo, too much, though there are some fresh memes. In the future, I think the Rust community needs to have a better reaction to decisions the foundation is attempting to make, and obviously the foundation needs to be ultra-responsive to the community too.
I think the Rust community needs to have a better reaction to decisions the foundation is attempting to make, and obviously the foundation needs to be ultra-responsive to the community too.
To be honest, I am disappointed with the reaction. Not the reaction to the proposal itself (a lot of good points have been made there) but to the anger and conspiracy theories directed at real people. Like can we not strongly object to something without ascribing the worst possible motives?
I think having these kinds of discussions publicly is really great for an open source project. But I really wish it didn’t come with this cost. The Foundation too has lots of room to improve its communication here. Just dumping the proposal on people’s laps and saying “comment on this” maybe isn’t the best RFC process.
Maybe I just haven’t dug deep enough, but… I can’t say that I’ve seen people attack individuals? There’s a ton of deserved anger directed at the Foundation and the Project, because the Foundation and the Project are clearly hostile towards the community if this is the direction they want to move in (even if the policy they end up with will be a watered down version with less insane restrictions), but that’s it.
Yeah? If there was no desire to disallow the community from using the words “rust” and “cargo” and the logo, this draft would never have been written and published in its current state. It’s either extreme incompetence and unbelievable stupidity, or an expression of hostile intent. I will give them the benefit of the doubt and assume the latter.
Erm, can you really not imagine any other options?
If it helps, this proposal comes following a survey asking the community what they want from the trademark policy. The results of which were given to legal counsel to prepare a policy, which in turn was sent back to the community for comment (where we’re at now). This continued community involvement is not hostility.
And the proposal was designed to allow fighting off things like embrace, extend, extinguish threats or hate groups using the logo. It was not intended to be strictly enforced against the community. The Foundation has no desire to be litigious.
You’re right. I can not imagine other options. The proposed policy is extremely damaging to the community, which is either a sign of hostility or incompetence.
Another option from https://blog.rust-lang.org/inside-rust/2023/04/12/trademark-policy-draft-feedback.html:
Fundamentally however, the question at hand is whether we want Rust to be a trademark or not. If we want to be able to defend Rust as the brand it is today, trademark law fundamentally constrains how permissible we can be, especially in public guidelines.
Our answer to the question of whether Rust should be a trademark has been “yes”, just as it has been since before Rust 1.0. Furthermore, our goal is to make a policy that is as permissive as it can be without substantially giving up our right to define what Rust is and is not in the future. Not all open source projects have retained that right.
The way I understand it:
If you want trademark policy to prevent someone from releasing an incompatible dialect of Rust and calling that Rust, the policy should also prevent (or require approval for) many benign usages, otherwise the law does not compile. That is, given the current laws, there’s a binary choice between “allowing anything” and “forbidding anything”.
And Rust is the only language in the history of programming languages which encounters this issue… how? Even Oracle is less restrictive with Java, are they not?
I can see two explanations for this:
Can’t comment on the first one, but the second one seems plausible:
Python, Java, C do not try to prevent dialects. Android using a language called Java does seem like evidence that Java’s trademark does not prevent forks.
It also is true that a sizable portion of Rust community wants to actively prevent incompatible dialects of Rust.
I was honestly very shocked at people’s nasty reactions. Like just purely forgetting there are people on the other side (that, in theory, you would like to convince not to do a thing!)
An insane amount of snark that’s… honestly, kinda unprofessional? The material criticisms themselves are very valid IMO (did the Rust Foundation talk to any outside party in a more private manner for an outside review before this? It doesn’t feel like it) but Twitter has broken people’s brains on this stuff.
What’s wrong with not donating to a foundation which is clearly actively hostile against a community you care about? The Rust Foundation has made its wishes clear, and it’s obvious that it’s run by people I wouldn’t want to support either.
These meetings are always at somewhat unusual times for me, so I have a hard time joining them live, but watching the recordings afterwards has been nice. As with so many things, doing stuff with other people really does make it harder to bail on it, and even if the information density is low, it’s still a nice way to reinforce emacs knowledge IMO
Like Lua, Roc’s automatic memory management doesn’t require a virtual machine, and it’s possible to call Roc functions directly from any language that can call C functions
Was reading up on Roc and saw this, does anyone know what this refers to in practice?
Roc uses reference counting for memory reclamation, which is a local technique, instead of a tracing garbage collector, which is a global technique (“requires a virtual machine”).
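To make the “local” part concrete, here is a tiny sketch using CPython, which also reclaims most memory by reference counting (this is an analogy for the technique, not a claim about how Roc is implemented): the object is freed the instant its last reference disappears, using only its own count, with no heap-wide tracing pass.

import weakref

class Node:
    pass

n = Node()
finalizer = weakref.finalize(n, print, "reclaimed on the spot")
del n                     # last reference dropped: the refcount hits zero and
                          # the object is freed right here, deterministically
print(finalizer.alive)    # False -- no pause to trace the whole heap was needed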
which is a local technique
I’m curious what you mean by this? Are you referring to something more like newlisp, which ensures local memory is freed immediately after use? Or did you have something else in mind?
which is a global technique (“requires a virtual machine”)
Nothing about tracing garbage collection requires a virtual machine. A VM does make it easier to discover “roots” and be more precise, but as a counterexample, the Boehm-Demers-Weiser GC just scans the stack for anything that might look like a pointer, and all you, as a programmer, have to do is call GC_{malloc,realloc,free} instead of the typical malloc, realloc, free. It’s incremental, generational, but not precise. It can miss things. (This is a very simplified explanation of Boehm; a ton more details here.)
Tracing garbage collectors do not require a virtual machine, that statement (and not just that statement) is confused.
Feedback: it would be better (and I know it takes time) to explain why it’s confused instead of pointing it out and leaving it at that.
For a deployment failure, immediately reverting is always “Plan A”, and we definitely considered this right off. But, dear Redditor… Kubernetes has no supported downgrade procedure. Because a number of schema and data migrations are performed automatically by Kubernetes during an upgrade, there’s no reverse path defined. Downgrades thus require a restore from a backup and state reload!
I am so glad when tooling goes out of its way to work well for reverts. I think a lot of operational tooling doesn’t when it totally should (looking at you, Postgres). Obviously it’s hard work and hard to test, but when you have software where you can run one version behind and the current version next to each other, suddenly this sort of stuff feels a lot less scary.
It is possible to perform an incremental update of your Kubernetes control plane nodes, i.e. running multiple minor versions in parallel. I don’t know how officially supported it is, but it’s normal to have at least a small window of mixed versions when you update them in order.
I think it was Audacity that added telemetry… in the form of Sentry bug collection. People got really pissed off and I was honestly a bit flummoxed. Surely bug reports are reasonable at some level?
It does feel like the best kind of telemetry is the opt-in kind. “Tell us about this bug?” Stuff like Steam has user surveys that are opt-in. It’s annoying to get a pop-up, but it’s at least respectful of people’s privacy. I have huge reservations about the “opt in to telemetry” checkbox that we see in a lot of installers nowadays, but am very comfortable with “do you want to send this specific set of info to the developers” after the fact.
IIRC, Steam also shows you the data that it has collected for upload and gets your confirmation before sending it.
I also appreciate that they reciprocate by sharing the aggregated results of the survey. It feels much more like a two-way sharing, which I think really improves the psychological dynamic.
Unfortunately, bug reports are just a single facet of product improvement that seems to get glossed over. If you can collect telemetry and see that a feature is never used, then you have signals that it could be removed in the future, or that it lacks user education. And automatic crash reporting can indicate that a rollout has gone wrong and remediation can happen quicker. Finally, bug reports require users to put in the effort, which itself can be off-putting, resulting in lost useful data points.
If you can collect telemetry and see that a feature is never used, then you have signals that it could be removed in the future, or that it lacks user education.
But it can be very tricky to deduce why users use or do not use a feature. Usually it cannot be deduced by guessing from the data. That’s why I think surveys with free-form answers, or just having some form of channel like a forum, tend to be better for that.
A problem with both opt-in and opt-out is that your data will have biases. Whether a feature is used by the people who opted in is not the same question as whether people (all of them, or the ones who pay you) make use of it. And you still won’t know why, so…
There tends to be a huge shock when people make all sorts of assumptions, try them, watch them still fail, and only then start talking to users, where they are hugely surprised by things they never thought of.
Even with multiple-choice surveys it’s actually not that easy. I am sure people who participate in technology surveys know how it feels when the data is presented with wrong assumptions baked into its interpretation.
It’s not so easy, and this is not meant to be anti-survey, but to say that surveys aren’t necessarily the solution on their own. It makes sense (as with all sorts of metrics) to compare them with actual, non-abstract/generic questions, otherwise you end up implementing a feature and investing time and money only to completely misinterpret the results.
And always back things up by also talking to users, enough of them to actually matter.
But it can be very tricky to deduce why users use or do not use a feature. Usually it cannot be deduced by guessing from the data. That’s why I think surveys with free-form answers, or just having some form of channel like a forum, tend to be better for that.
Asking users why they do/don’t use every feature is extremely time consuming. If you have metrics on how often some feature is getting used, and it is used less than you expect, you can prepare better survey questions which are easier for users to answer. Telemetry isn’t meant to be everything that you know about user interactions, but instead a kick-off point for further investigations.
I agree. However, that means you need both, and that means you cannot deduce a lot of things simply by running some telemetry system.
Also, I am thinking more of a situation where you make a survey and add (optional) text fields to provide context. That means you will see things that you didn’t know or think about, which is the whole point of having a survey in the first place.
That’s something I’m not so sure about either though. I don’t really have a problem with anonymous usage statistics like how often I click certain buttons or use certain features. But if a bug report includes context with PII I’m less keen on that. Stack variables or global configuration data make sense to include with a bug report, but could easily have PII unless it’s carefully scrubbed.
Does MacOS do stuff to help with cleaning up orphan processes that, for example, Linux won’t do? Over the years I’ve used multiple tools where I would end up with process leaks but only on my Linux machines. Meanwhile Mac-using coworkers would not have issues (this would be of the “file change means we should kill process and restart it but for some reason on Linux I’d end up with multiple process restarts”).
I think so, but I’m not at all qualified to answer. I know that a lot of the low level process stuff is Mach based, and I’m pretty sure there are hierarchies where the death of a parent also kills all the children. (There is or used to be a ‘loginwindow’ process that handles GUI login, and if that process ever crashed, your whole GUI session went kablooie.) And I believe the messages relating to child processes are sent with Mach messages, not signals.
Thanks to this blog I finally understood the GC difficulties people always mention.
I am very excited for when we can get good support for suspension.
I was following the stack working group, which is basically going to help with stuff like call/cc or, more generally, “async-y code that has a sync interface” (think blocking IO in Python). Looks like the last 2 meetings were cancelled due to lack of an agenda which… kinda worries me, but fingers crossed!
Yeah one interesting thing is the impossibility of stack scanning with a Harvard architecture!! (separate code and data space, as opposed to Von Neumann unified architecture) He didn’t use that term in the post, but I remember it being used in the WASM docs.
I’m not a fan of imprecise scanning, in C or C++ at least. So I guess that whole technique is out the window in WASM, and you need the compiler to determine roots, not determine them at runtime.
What I landed on, and have seen on many projects, is having util packages within other packages. You don’t get a top-level util, but util things will be landing places for helper functions that are useful in a specific domain without being “about” something.
I’ve never seen these grow too big, as generally there’s a willingness to pull out bigger chunks into their own files just out of pragmatism.
What I landed on, and have seen on many projects, is having util packages within other packages. You don’t get a top-level util, but util things will be landing places for helper functions that are useful in a specific domain without being “about” something.
This is what I’ve done too. I move common code within a package down into util, which might be exported and, if usable, documented, or into internal, which holds implementation details and should never be used.
Kind of surprised about a lot of these points, in particular the fact that they have design decisions that seem to fall over given the bimodal nature of discord servers.
The static buckets for timing and using a database with cheap writes and more expensive reads is like… for most systems you can get away with this (and they are getting away with it for the most part IMO). But given this is their core system, it feels like by now adding a different sort of indexing system for scrollbacks that allows for certain discords to have different buckets seems very important.
EDIT: honestly it looks like the work is almost there. Bucket sizes being channel dependent seems like an easy win, and maybe you have two bucket fields just built-in so you can have auto-migration to different bucket sizes and re-compact data over time, depending on activity.
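For context, here is a rough Python sketch of how a static time bucket can be derived from a snowflake-style message ID. The 22-bit shift matches the common snowflake layout, but the 10-day window and the partition-key shape are illustrative assumptions, not necessarily Discord’s exact scheme.

TEN_DAYS_MS = 10 * 24 * 60 * 60 * 1000

def bucket_for(snowflake_id: int, window_ms: int = TEN_DAYS_MS) -> int:
    timestamp_ms = snowflake_id >> 22   # snowflakes pack a millisecond timestamp in the upper bits
    return timestamp_ms // window_ms    # fixed-size window -> static bucket

# Rows would then be partitioned by something like (channel_id, bucket_for(message_id)),
# so reading recent history only touches the newest bucket or two.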
I don’t know about Cassandra’s storage mechanisms, but I do know that a lot of people with multitenant systems with Postgres get bit by how the data is stored on disk (namely, you make a query and you have to end up fetching a lot of stuff on disk that is mostly data you don’t need). It feels so essential for Discord for data to properly be close together as much as possible.
I’m also surprised that they would bias for writes. My intuition is that chat messages are always written exactly once, and are read at least once (by the writer), usually many times (e.g. average channel membership), and with no upper bound. That would seem to be a better match for a read-biased DB. But I’m probably missing something!
It’s a great question and the answer isn’t straightforward. Theoretically, B-tree based storage is better than LSM for read-heavy workloads, but almost all distributed DBs (Cassandra, BigTable, Spanner, CockroachDB) use LSM-based storage.
Some explanations why LSM has been preferred for distributed DBs:
I’m assuming they are broadcasting the message immediately to everyone online in the channel, and you only read from the database when you either scroll back far enough for the cache to be empty, or when you open a channel you haven’t opened in a while. That would avoid costly reads except for when you need bulk reads from the DB.
Yes, we have realtime messaging via our Elixir services to distribute messages: https://discord.com/blog/how-discord-scaled-elixir-to-5-000-000-concurrent-users
I’d be very surprised if the realtime broadcast of a message represented more than a tiny fraction of its total reads. I’d expect almost all reads to come from a database (or cache) — but who knows!
It’s a chatroom and people don’t scroll way up super often. They only need to check the last 50 messages in the channel unless the user deliberately wants to see more. There might be a cache to help that? But you can stop caching at a relatively small upper bound. That said I am curious how this interacts with search.
Kind of surprising to me that JS engines would be sticking to UTF-16 despite so much content being UTF-8. I wonder if it would be a worthwhile change in practice to do that kind of migration?
JavaScript used UTF-16 because it wanted to look like (and interoperate with) Java, which used UTF-16. Java used UTF-16 because a lot of the designers worked on the OpenStep specification, which used UTF-16. OpenStep used UTF-16 because they added a unichar type back when 16 bits was sufficient for all of Unicode and couldn’t change it without an ABI break. The same story as the Windows APIs.
There are good reasons for UTF-16 (e.g. better cache usage on CJK character sets than UTF-8 or UTF-32), but none of them apply in typical JavaScript.
JSON has the interesting property that it is encoding agnostic without a byte-order mark. The first character, as I recall, must be either a { or [, and these have different byte sequences in UTF-8, UTF-16, or UTF-32, in either byte order. Apparently recent versions of the spec require UTF-8, but earlier versions had an optional BOM, and some tools added one. Fortunately, the BOM also has an unambiguous encoding and so you can easily detect it and detect encoding with or without it.
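Here is a minimal sketch (in Python, BOM handling left out for brevity) of that detection trick from RFC 4627 section 3: because the first character has to be ASCII, the pattern of zero bytes around it gives away the encoding.

def sniff_json_encoding(data: bytes) -> str:
    # Look at where the zero bytes fall around the first ASCII character.
    if len(data) >= 4 and data[:3] == b"\x00\x00\x00":
        return "utf-32-be"                     # 00 00 00 xx
    if len(data) >= 4 and data[1:4] == b"\x00\x00\x00":
        return "utf-32-le"                     # xx 00 00 00
    if len(data) >= 2 and data[0] == 0:
        return "utf-16-be"                     # 00 xx
    if len(data) >= 2 and data[1] == 0:
        return "utf-16-le"                     # xx 00
    return "utf-8"

print(sniff_json_encoding('["hi"]'.encode("utf-16-le")))   # utf-16-le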
0, true, and "foo" are also valid JSONs. I think some parsers reject it without a flag passed, but AFAICT, it’s spec-compliant.
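Python’s json module, for one, takes the RFC 8259 reading and accepts scalar top-level values:

import json
print(json.loads("0"), json.loads("true"), json.loads('"foo"'))   # 0 True foo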
This may have changed in more recent versions, but it was not last time I read the spec (>10 years ago). If parsers reject it unless you pass an extra flag, that’s usually a hint that it’s a non-standard extension.
See https://datatracker.ietf.org/doc/html/rfc8259#section-2
A JSON text is a serialized value. Note that certain previous specifications of JSON constrained a JSON text to be an object or an array. Implementations that generate only objects or arrays where a JSON text is called for will be interoperable in the sense that all implementations will accept these as conforming JSON texts.
JSON has always supported raw literals - the “exceptions” are due to JS properties that people think are literals: undefined, Infinity and NaN. Because the original JSON parser was actually just “validating” (via regex) strings downloaded from the internet and throwing them to eval. This was such a significant part of the net for such a long time that I made JSC’s eval implementation first try to parse the input as a JSON string before throwing it at the interpreter. Because non-JSON tokens are hit fairly rapidly in non-JSON scripts, the cost is negligible, but the win when someone is actually trying to parse giant amounts of JSON is orders of magnitude, both in CPU time and memory usage.
I found the original JSON RFC proposed by Crockford, and even the “implementation” he includes in that supports top-level values. The reality is that most of the problems in JSON’s syntax boil down to Crockford wanting to avoid actually parsing anything and just pass strings to eval, e.g. the aforementioned Infinity, NaN, and undefined “keywords” not being supported, the lack of comments, etc.
Switching JS strings to UTF-8 would have the effect of making all JS ever written very subtly wrong. It would probably take a decade for everyone to successfully migrate.
One language I’m aware of that successfully made the switch is Swift, but it was still relatively early in its life (5 years after initial release): https://www.swift.org/blog/utf8-string/ It was part of a large release with many desirable features, and the compiler for years (still, I think) supported both a Swift 5 and a Swift 4 mode with per-module granularity.
JavaScript strings aren’t utf16, they’re ucs-2. Web pages display the content of such strings as if they were utf16, but the JS string representation from the PoV of the language is not. There’s a semi-joke spec WTF-16 that’s used to describe how browsers have to interpret things.
The core issue is that JS strings are exposed to the language as sequences of unrelated 16bit values, which means they can (and do) contain invalid utf16 sequences. Because of that there’s no way to robustly go back and forth between the ucs2 data and utf-16 without potentially losing data, and from there you can see why you also can’t go to utf-8. Note that this wouldn’t have been avoided by character iterators rather JS’s historical indexing because the iterators would have been over a sequence of 16bit “characters” as well \o/
Now all that aside, the major production level JS engines (at this point just SM, JSC, and V8) all do a dual encoding such that any string that doesn’t contain a character value greater than 127 is stored and processed as 1 byte per character. The performance sensitivity of this is such that (via the wonders of templates) JSC at least functionally has 2 complete copies of the JS parser (technically there are 4 because the JSC parser has validating vs AST construction modes but the codegen for the validating mode is so much smaller than when building an AST that AST side is the important bit). Similarly the regex engines will compile multiple versions of the regex to handle 8bit vs 16bit strings.
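Two small Python illustrations of the points above. First, Python’s str has a similar quirk (it is a sequence of code points, so an unpaired surrogate is representable), which makes the lossiness easy to demonstrate by analogy; none of this is JS itself:

s = "ab" + "\ud800"              # a lone surrogate: fine as code points, not well-formed text
try:
    s.encode("utf-8")
except UnicodeEncodeError as err:
    print("no lossless UTF-8 for this string:", err)

blob = s.encode("utf-8", "surrogatepass")           # WTF-8-style escape hatch
assert blob.decode("utf-8", "surrogatepass") == s   # round-trips, but strict decoders reject these bytes

Second, the dual narrow/wide storage trick has a CPython analogue too, the flexible string representation from PEP 393: strings whose characters all fit in one byte are stored at one byte per character, wider ones at two or four. Easy to observe:

import sys
narrow = "a" * 1000        # Latin-1-compatible: one byte per character
wide = "\u0100" * 1000     # needs two bytes per character
print(sys.getsizeof(narrow), sys.getsizeof(wide))   # the second is roughly twice the first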
The amount of time I spent during my career helping people with Git or SQL issues is mind boggling. IMHO, this is not the fault of Git or SQL, it’s the fault of our industry.
If a carpenter didn’t know how to use a saw bench or a nailer, we would consider them a bad carpenter. But if a developer doesn’t want to learn the standard tools and just wants to get their work done without learning how to master anything, this is somehow acceptable…
I dunno, it seems like a bit of both … Both git and SQL could be designed in a way that retains all their power but with a user interface that makes more sense, and is easier to learn.
And easier to remember – I definitely find myself forgetting SQL stuff I learned, simply because I use 10 other tools 10x more than SQL. git has less of that problem because I just write down all the commands in shell scripts, and mostly forget them.
Good design critique of SQL: https://www.scattered-thoughts.net/writing/against-sql/
Also software is pretty young, so old software systems often have “scars” of things we wouldn’t do today. We could make them better with modern techniques (at least in theory). SQL is powerful but also has a lot of mistakes and bad evolution.
git has a good core, but also a lot of mistakes. Unix shell too.
Just to clarify a bit, if someone said
“It’s the fault of our industry that programmers are resistant to engaging with the tech they work with”
“It’s the fault of our industry that common tools are underinvested in, and we collectively settled on git”
I would agree somewhat with both things. (Although I think there are actually many and drastically worse possible outcomes than git; overall I’m not upset with it compared to e.g. SQL)
I think the problem is mainly that software is just a hodge podge of different crap at every job, which keeps changing.
So people are reluctant to invest any time in one thing. They do the minimum to fix stuff (kinda) and move on. That’s pretty much rational.
There are some things worth investing time in, but they don’t know what those things are. SQL is probably one of those, but it’s often covered up with many leaky abstractions, which are largely due to its inherent flaws (see Against SQL post)
Kubernetes is another conundrum … do I actually invest time in this, or do I just copy and paste some YAML and hope it goes away? (Ironically k8s itself seems to encourage sloppy and superficial usage; maybe part of its appeal is that it doesn’t demand any kind of technical excellence)
My opinion is that Kubernetes is a flawed implementation of a flawed architecture, and it’s worse than its predecessors
https://lobste.rs/s/yovh5e/they_re_rebuilding_death_star_complexity#c_mx7tff
People may argue the “industry is settling on it”, but either way, it’s definitely not because it’s not possible to do better, or because those are state-of-the-art ideas. There were/are many competing systems, like those from CoreOS, Mesos, and Hashicorp, but we got the one that’s free and has a lot of hype.
We are living through the worst era of software development IMO. Every investor knows that the first company to nail a market will get a nearly impenetrable 50% of the market and the next 5 companies will share the next 49%. So opportunity costs are all that matter. This has always been true to some extent. However, in the 1970-2000 era we were still figuring out through trial and error how to forecast software development and how to invest in it, and we were limited by firmer hardware timelines. So there was enough time to do things right, or to go back and fix hacks while waiting for another part of the org to catch up. Since the 2000-2001 dot-com bubble burst, businesses and investors have figured out how to match investment to forecasting and to scale software development nearly infinitely. That is to say: while a given project might not benefit from a staffing increase, the big software companies have rarely found it to be the case that hiring another developer would not lead to a marginal profit increase. The result is booming salaries, but also an insistence that no one ever take more time than absolutely necessary for any given task, because the way to back those high salaries is to make sure the company wins enough of these races to market. Personally I’m looking forward to when the frontiers shrink and we focus on carrying costs in addition to opportunity costs, even if that means lower salaries.
Heck, next time I onboard a developer I’m planning to start the process with “well, can you touch type?”… I think that’s part of the reason people hate being asked to live code in interviews. Many can’t even use a keyboard properly.
As a developer with cerebral palsy, I hate that question. I never look at the keyboard when I type, but I’ve been repeatedly told that I don’t “touch type” because I don’t put the “correct” finger on each key. The fact that the bones in my arm are fused and my hand won’t move that way doesn’t matter - I was told that coding is like carpentry and manual dexterity is more important than intellectual curiosity.
Thankfully, my present employer never even asked that question.
Then you’re touch typing, and pretending otherwise to your face is being kind of an asshole. Maybe you could improve your technique, or use a specially crafted layout or something (Dvorak himself did design one-handed layouts for instance), but in my experience the most important part of touch typing is removing the need to nod constantly.
Now if your condition prevents you from typing faster than, say, 30 words per minute, I’d say that would count as a slight disability, and discriminating against you on that basis would be wrong, possibly even illegal. That being said, being able to type fast enough remains relevant if it was a prelude to live coding: we ought to give more slack to people who can’t type fast, or else we’d just be discriminating on typing speed, which probably wasn’t the goal to begin with.
Same. I know HOW to touch type with f and j and home row. I choose not to, on some vague perception of avoiding carpal tunnel by just having my hands somewhere on the keyboard instead of a rigid location.
I get about 90wpm without looking at a keyboard at all. Though if I’m not paying attention at all, I can type near-total gibberish.
Rgus gaooebs wgeb U tgubj U;n guttubg sine jets vbyut U;n actyakky guttubg ybrekated keys but then eventually I hit a key near the edge of the keyboard and my hands re-align to the keyboard correctly.
I do not relate to my colleagues who will be spartan to the point of incomprehensibility because they can’t type fast enough to get their questions into a group chat. Some people have a hard time communicating remotely, and I blame the inability to type fast enough.
Wow, I’m so sorry to hear that. Touch typing is a very important and valuable skill at this point in time, but “using the right finger” hasn’t been relevant in my whole lifetime.
I agree that we should have higher expectations for people learning the tools of the craft, but there are also hobbyist wood workers in this world who can make things without knowing all their tools. A nail gun is a huge improvement on a hammer, so why shouldn’t we strive to make a better version control? We have the capability to build powerful tools that allow both power users and hobbyists to contribute code.
I’ve been using git for over 10 years myself. I teach git to semi technical university students and I’ll admit I dread trying to show them some of the things mentioned in this post.
You hit the nail on the head with that.
Here’s to hoping some group takes up the mantle to provide a Pijul web client that uses simple technologies a la cgit & SourceHut. There’s a lot the VCS tool does right, but I’m skeptical of the web rewrite for “the edge”, along with how easy it would be to self-host and how truly decoupled from Cloudflare’s platform it will be in practice.
Pijul already has a comparable system to cgit or SourceHut. It is called Nest.
I believe the parent comment was referring to the “edge” Nest rewrite using Cloudflare Workers and TypeScript: https://pijul.org/posts/2023-05-23-nest-a-new-hope/
Right. Well you can still run the new version of Nest on one of the host-your-own cloudflare workers runtimes, which seems easier than setting up the 3-4 services you need for each sourcehut component, but whatever.
Easier compared to what? If you’re self-hosting, that’s probably a lot more difficult to set up and administer. I believe there’s a NixOS module for SourceHut that makes it as easy as enable = true and some minor config to get running. I’m skeptical of how this whole edge thing is going to work out in the long run. Businesses are competing in incompatible ways, and it’s all “free” for now til it isn’t (see Heroku). However, there should be nothing stopping some entity from writing a different front-end for the system.
Sure, it’s gonna help them. But they better learn how to use it before they start, or are we going to blame Bosch (or whoever made the nail gun) when the poor hobbyist nails their foot to the floor accidentally?
If git had half the ergonomics and safety features of a hobbyist nail gun, there’d be a lot less complaining about it.
If a nail gun was as complicated to use and reason with as git is, maybe we would.
I absolutely agree with your general point, but the metaphor is a little problematic when it comes to SQL. Unfortunately for software developers, there are 20 companies that make some form of a “nail gun” and they all work pretty differently, are appropriate for different situations, involve different trade-offs, and some of them are expensive so they aren’t taught in schools. To make matters worse, if you use some of them the way you would use others, you’ll shoot yourself in the foot.
I think the problem is not necessarily “not-knowing”, but perhaps more a reluctance to engage with things.
I wouldn’t expect somebody to know PostgreSQL specifics because there’s a solid chance they might not have interacted with PostgreSQL yet in their careers. However I do expect a curiosity around the tools in use in any given team.
I’ve not personally worked anywhere where database management is outside of the remit of the team who primarily uses the database - even when we’ve had DBAs. As a result, even if an ORM is in use, to perform our jobs well, the teams I’ve worked in have always needed to develop database specific knowledge.
Additionally, more of the important principles are transferable than not. For example, using query planners to estimate the cost of queries before landing them in the code (see the sketch just below), or understanding what an ORM is generating so we don’t inadvertently ship something that has odd query patterns. These things get you 80% of the way to sensible database usage, IMO.
I have seen, at times, people express that they shouldn’t have to know these things. I think that attitude really complicates the work and social dynamics of teams.
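A small, concrete version of that “ask the planner first” habit: even SQLite, via Python’s built-in sqlite3, can show what it intends to do before you ship the query. The table and index names here are made up for illustration; on Postgres you’d reach for EXPLAIN / EXPLAIN ANALYZE instead.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
con.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Ask for the plan rather than running the query
for row in con.execute("EXPLAIN QUERY PLAN SELECT total FROM orders WHERE customer_id = ?", (42,)):
    print(row)   # the detail column should mention idx_orders_customer rather than a full scan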
But carpenters aren’t expected to know every kind of ‘nail gun’; they are mostly expected to have a broad knowledge of the principles and a deep knowledge of one or two specific types which form part of their tool belt. Of course they should have the willingness to learn more if a specialized job calls for it, but usually they can get by with the tools in their toolkit.
If nail guns fired forwards 90% of the time, but in some situations would fire only when you didn’t press the trigger, or would fire at 90 degrees to the angle you’re aiming them, then carpenters wouldn’t adopt them. In contrast, developers happily adopt such tools and blame the user when they take someone’s hand off.
When does git fire backward?
Git is very simple: it’s a data model of 3 entities (blobs, trees and references) plus a CLI to create/read/update/delete these entities locally and synchronize them with a remote server (a toy sketch of that model follows below). In my analogy, git was the nail gun, and SQL was the table saw.
Very often, when I help fellow developers with their git problem, the first issue they have is that they don’t even know what they are trying to achieve themselves. They just want it “to work.” So even if we redesigned git from the ground up to be as user-friendly as possible, we would still be facing the same issue: software engineers don’t want to learn the tool, regardless of how simple the tool is.
For example, how many times have you seen a well-documented, well-designed API being misused because the user of the API was just in a hurry? I’ve seen it hundreds of times over my career, personally.
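The toy sketch mentioned above, assuming nothing beyond “content-addressed objects plus named pointers” (it deliberately ignores commits and all of git’s real on-disk formats):

import hashlib

objects = {}   # content-addressed store: hash -> bytes (blobs and trees)
refs = {}      # references: human-readable, movable names -> hashes

def put(content: bytes) -> str:
    oid = hashlib.sha1(content).hexdigest()
    objects[oid] = content
    return oid

blob = put(b"hello world\n")                        # a blob: file contents
tree = put(f"100644 hello.txt {blob}\n".encode())   # a tree: names pointing at blobs/trees
refs["refs/heads/main"] = tree                      # a reference pointing at the tree

# Synchronizing with a remote is then "send the objects the other side is missing,
# then update its refs" -- conceptually, at least.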
Git-the-concept is indeed as simple as you described, but git-the-CLI is undeniably far from it. The core paradox with git is that it’s simultaneously a thin wrapper around its data model and plumbing, yet porcelain operations rarely intuitively map to operations on the model.
Every year when we onboard new juniors there are a ton of things I see them struggle with, despite their best intentions. Some of my favourites:

- git add differs depending on whether the path is a submodule or not. git add some/path and git add some/path/ are equivalent, unless some is a submodule, in which case git add some/path adds the submodule but git add some/path/ loads all the files in the submodule in your repository (which is something that shouldn’t even be allowed IMHO but anyway…)
- git add is offloaded to the shell, so git add . doesn’t actually mean “stage all changes in current directory”, it means “stage all changes to every existing entry in the current directory”. If an entry has been deleted, it won’t be added. Forget juniors, I screw up every other refactoring commit this way just because I forget.
- …git rebase --continue at this point and pop your stash back right away you get to meet git fsck.

Y’all like crafts analogies, take it from someone who was an electrical engineer at one point: if a salesperson showed me a soldering iron that works like this, they’d either have to set up an embarrassing appointment with a proctologist or they’d unlock a new fetish but either way, I’d be shoving it up their butts.
Then again, I’ve never seen a soldering iron that claims to rework a board where someone else came in and soldered different components to the pads you were working on while you weren’t looking… and might reasonably expect that one that did was significantly more complex than one that didn’t.
I really dislike SQL’s failure to be eminently obvious to people who learn about data structures. This is likely super unpopular, but I’m convinced that a version of SQL that doesn’t have query optimizations and is just “write out the query plan” (which is a tree structure! People understand trees! They understand the concept of indices) would be much easier for developers to deal with.
Data analysts are like… I understand the theory but I’ve never in practice seen firsthand a situation where companies don’t end up having programmers write the SQL. I know these people exist! Just never seen it myself
Tho “just write the query plan”… might end up totally messing up portability, so this is tricky. In practice loads of people have implementation-specific code for table creation and sometimes for querying, but this model would probably make things tougher on that front.
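To make the “just write out the query plan” idea concrete, here is a toy sketch in Python: each plan node is a generator over rows, and the “query” is the tree you compose by hand, with no optimizer in between. Purely illustrative, not any real engine’s API.

users = [
    {"id": 1, "name": "ada", "active": True},
    {"id": 2, "name": "bob", "active": False},
]

def seq_scan(table):                 # leaf node: read every row
    yield from table

def filter_(pred, child):            # interior node: keep matching rows
    return (row for row in child if pred(row))

def project(cols, child):            # interior node: keep selected columns
    return ({c: row[c] for c in cols} for row in child)

# Hand-written plan for: SELECT name FROM users WHERE active
plan = project(["name"], filter_(lambda r: r["active"], seq_scan(users)))
print(list(plan))   # [{'name': 'ada'}]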