Despite doing what every good sysadmin does exactly once in their career and naming servers after Federation starships
I now blame you for placing this idea in my head.
It’s an age thing. I’m a little younger; I’m in the cohort that, exactly once in our career, names servers after Roman or Greek pantheons. I think that all of us have, at some point, administered a machine named hera. My younger peers are using chemical and molecule names, or Batman villains.
The beauty is that, as corporate media expands, there are more and more sequels to name servers after. Rocky VII: Adrian’s Revenge!
(on stuff existing forever) Did it ever work for anyone? — No…. But it might work for us!
The hubris
Unicode contains the letter 𓂺, an Egyptian hieroglyph invented thousands of years ago. Unexpected things can endure for a long time.
I feel like you chose this particular example of a hieroglyphic letter for a specific reason but I can’t put my finger on it.
That’s because a lot of default OS fonts leave that symbol, and only that symbol, out. Shows up fine on Android though.
It’s shown for me in Chrome on macOS, but not in Chrome on Windows.
For those who can’t see it, it’s a penis, shown in the act of either micturition or ejaculation.
When it comes to CORS, the winning move is not to play. Why do you need separate origins? Have one URL which proxies to two backends and then you don’t need CORS. I don’t understand your domain, so maybe you can’t do that, but figuring out if you can avoid CORS is always step 1.
In terms of cookies, most of the time you can use the SameSite attribute (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie/SameSite), which mitigates CSRF for users with modern browsers.
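For what it’s worth, here’s a minimal Go sketch of that single-origin setup (the backend addresses, route paths, and cookie name are all made up): one process proxies /api/ to one backend and everything else to another, so the browser never makes a cross-origin request, and the session cookie is set with SameSite.

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Hypothetical backends; in a real setup these would be your API
	// server and whatever serves your frontend.
	api, _ := url.Parse("http://localhost:8081")
	front, _ := url.Parse("http://localhost:8082")

	mux := http.NewServeMux()
	// Everything is served from one origin, so the browser never makes a
	// cross-origin request and CORS headers are unnecessary.
	mux.Handle("/api/", httputil.NewSingleHostReverseProxy(api))
	mux.Handle("/", httputil.NewSingleHostReverseProxy(front))

	// The session cookie is set with SameSite, which keeps it off most
	// cross-site requests and mitigates CSRF in modern browsers.
	mux.HandleFunc("/login", func(w http.ResponseWriter, r *http.Request) {
		http.SetCookie(w, &http.Cookie{
			Name:     "session",      // illustrative cookie name
			Value:    "opaque-token", // placeholder value
			Path:     "/",
			Secure:   true,
			HttpOnly: true,
			SameSite: http.SameSiteLaxMode,
		})
	})

	http.ListenAndServe(":8080", mux)
}
```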
https://srccon.org, the conference for coders in the journalism world, and vice versa.
This isn’t unique to software. We recently had a few of the vertical supports in the lower tray of our dishwasher break. The assembly that contains them is a separate part, but it’s not available as a spare; you have to buy the entire lower tray. The cost of this is about a third of the cost of the dishwasher. It was still cheaper to fix than replace, but if three things go wrong it isn’t (and the lower tray was one of the cheapest replacement parts available).
The only reason that this is possible at all is EU regulations mandating availability of spares for white goods for a period of several years after the devices are last sold.
I firmly believe that there should be regulatory minimum support periods for internet connected devices, and even when the minimum support period is up you should be required to release the source code if you don’t keep supporting it.
Humans have a pretty small “context window” and a large part of programming is introducing structure so that more things can fit through that window. If you’ve seen map(f, items) before you can just represent it as one mental token instead of needing to use all the tokens of [f(i) for i in items], or worse, whatever that would be in assembly. Using an AI that can keep track of all this junk for us will let us write “worse” code that will have other benefits, like the potential for optimization and removal of dependencies. Remember John Carmack’s advice to inline all functions? That’s bad from a programmer’s point of view, but if I can just get an AI to deal with all that crap, maybe it’s worth it in more circumstances than it was before.
In my understanding, the “inline your functions” idea from Carmack was meant for readability and not performance – it being easier to read and understand code in one place rather than having to jump around to understand all the functions and abstractions. I kind of vibe with it and tend to keep logic that’s still “tentative” from being broken up into pieces as much as possible, letting the pieces emerge rather than trying to design too hard upfront. It does depend on having APIs you’re working over that are at a decent abstraction level, too.
I tend to invest the early design / modularity effort more into the data structures than the functions. It’s possible to push the code in the functions around later if the lang is good for refactoring (some make it more of a chore).
Remember John Carmack’s advice to inline all functions?
I don’t, but it’s very bad advice with a vaguely modern compiler (my guess is Carmack said it in the ‘90s). Always-inline attributes exist (and work) now and so you can structure your code for humans to read, with clear abstraction boundaries, but still have all of the optimisation benefits of writing everything in a single function.
The advice mentioned above. It’s from 2007.
Yes, yes, yes! Webpack was written by astronaut architects who were obsessed with making things possible but not with actually solving problems. In practice, it meant that even though no one actually has interesting bundling requirements, everyone ends up writing their own bundling logic for some reason. Parcel and ESBuild also got this right.
One of the appealing things when first getting into Go was the lack of configuration required for external tools like linters: just using the built-in gofmt and go vet, along with staticcheck and gopls in your editor.
Not sure I’d want to run all these in this fashion, but I guess that’s preference.
Interesting point of view. Yeah, Go works perfectly out of the box and the tooling is also great (sure, there are things that could be better, but compared to other ecosystems it’s definitely better).
As I mentioned in the post: linters are like training wheels when you’re learning to ride a bike. If you’re ok without them, what’s the point of attaching and configuring them?
For example, for most people it’s ok to have misspellings in the code, and mostly for me too. But if I can make my code slightly better for a small price, why not? :)
I don’t think it’s just “training wheels when you’re learning how to ride a bike”, because the Go project refuses to add compiler warnings and can’t really add new diagnostics that would break existing code, and there are a lot of common errors in Go code you may want to guard against, which requires a linter.
E.g. append errors, ignoring error return values, creating interface values from possibly nil typed pointers, closing over loop variables, dead stores, passing locks by value, as well as more defensive practices you may want to impose in larger codebases like full slices, … (a couple of these are sketched below).
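For illustration, here’s a minimal Go sketch of two of these (the parseError type is invented); neither line is rejected by the compiler, but tools like errcheck and staticcheck flag them:

```go
package main

import (
	"fmt"
	"os"
)

// parseError is a hypothetical error type used only for this illustration.
type parseError struct{ msg string }

func (e *parseError) Error() string { return e.msg }

func main() {
	// Ignoring an error return value: the compiler accepts this silently;
	// only a linter such as errcheck will flag it.
	os.Remove("/tmp/does-not-exist")

	// Creating an interface value from a possibly nil typed pointer:
	// err is a non-nil interface wrapping a nil *parseError, so the check
	// below passes even though there is no real error.
	var perr *parseError
	var err error = perr
	if err != nil {
		fmt.Println("err != nil, even though perr is nil")
	}
}
```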
Warnings are added via vet. “closing over loop variables” and “passing locks by value” are both covered. edit: they refuse to add vet warnings that have the potential for false positives.
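For example, vet’s copylocks check catches a mutex copied by value; this small sketch (types and names invented) triggers it:

```go
package main

import "sync"

// counter and inc are made up to show the mistake.
type counter struct {
	mu sync.Mutex
	n  int
}

// inc takes counter by value, so the mutex is copied along with it and the
// caller's counter is never actually protected; `go vet` reports
// "inc passes lock by value".
func inc(c counter) {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

func main() {
	var c counter
	inc(c)
}
```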
Yeah, I can see both sides of this: -Wall, and everyone needs to run it to ensure correctness. Rinse and repeat. :-)
I think it’s more subtle than that. I think the Go situation is isomorphic to Rust’s.
In Rust, the compiler emits errors and warnings. False positives are not allowed. New warnings for old code are added over time. This creates a problem: if you deny warnings during a build, a warning in a dependency can lock you out of a new compiler. For this reason Rust has a special flag which tells it to emit warnings in the user’s own code, but not in dependencies. So, errors are for all code, warnings are for your code.
Go does exactly the same thing, except that it’s a separate subcommand (vet) rather than a special flag for the compiler.
Totally agree. There are so many errors in Go that are only caught by the linter and not the compiler. The linter is necessary.
Many of the things discussed here, and elsewhere, are actually highlighted by GoLand, which is one reason I’m a loyal JetBrains user.
It’s funny that for the ad they use the American mispronunciation of Wang, with the æ of cat instead of ah. “Wang” and “Wong” are the same name; they’re just different transcriptions of the same vowel. It means king in Chinese, so they could have called it Royal Computers or King Computers or something if they wanted it to sound less foreign to Americans, but I guess they wanted the real name of Dr. Wang and just didn’t care enough to have it pronounced correctly.
I like Zig’s “preference for low-dependency, self-contained processes” too, but I would balance it with the principle of “maintain it with Zig”. My interpretation of it is that you don’t need to rewrite all your dependencies in Zig; rather, you can adopt it incrementally and use it alongside your existing code where it provides benefit. Of course you can try that with Rust too, but Rust culturally seems to encourage the RIIR mentality.
I’m not so sure about the part at the end, wanting Zig to be your task runner as well as build system (sort of like npm scripts). If you really want to avoid bash / write your scripts in a low-level language, why not make them actual programs that you build and execute?
If you really want to avoid bash / write your scripts in a low-level language, why not make them actual programs that you build and execute?
That is what zig build <target> -- <args> does (along with executing that program with the provided <args>) - I think what’s being argued for is nicer sugar for this, to encourage building and writing accompanying task tooling in Zig?
Can I (either as an individual or organization) collect the telemetry for auditing purposes? Why or why not?
Here is the telemetry for my blog: https://carlmjohnson.goatcounter.com/
Surely there is an interesting new discussion of the nature of “rationality” happening somewhere, now that we have an example system that can “explain its chain of thought” in a (usually plausible) way when asked, that we know for a fact is entirely based on probabilistic associations between word fragment sequences and no specific “reasoning” mechanism. Does anyone know where to find this?
It kinda goes the other way around. When prompted to reason step by step, LLM answers are more accurate. If you ask afterwards, it’ll start doing mental gymnastics to defend whatever answer it gave.
Interesting. I would have thought this commitment bias to be limited to humans. I would have thought ChatGPT rather immune to this, for I have seen it change its mind¹ rather easily when corrected.
[1]: please forgive the anthropomorphism
The model behind ChatGPT has been primed to do chain-of-thought reasoning. When you ask a complex question, you’ll often notice it first gives an explanation and then the answer. The model didn’t think of the answer first and then decide it would be useful to lead with some introductory information; that’s how it gets to the answer in the first place.
Furthermore, ChatGPT is not a chatbot that you ask questions of; it is a text generator simulating a chat log. It takes quite a bit of postprocessing work to stop an LLM from completing the “human” part of the conversation. It’ll take what you say for granted because it doesn’t know the difference between what you typed and what it generated; it’s all just text to complete. What you type becomes part of the reasoning, so to speak.
I don’t think it’s too far fetched to draw parallels to human reasoning. When you go over an idea in your head you’re not doing deductive reasoning. Unlike LLMs though we can do that if we decide to, or we can test in the real world what we came up with, and we can generally tell the difference between things someone just told us and our own opinions and knowledge.
It’s emulating a human’s commitment bias, because it was trained on examples that show commitment bias.
I think it’s more because of how it works: it’s an autocomplete based on previous tokens, so when adding text after an answer, it already has it as a reference and it influences the prediction. Not that it causes itself to emulate humans explaining themselves, but rather that there are already topics and statements to talk about.
Here’s an example I had of ChatGPT getting stuck on an answer and doing a bad job of justifying it:
Me: What has three legs in the morning, two legs in the afternoon, and four legs at night?
ChatGPT: The answer to this classic riddle is “a human being.”
The riddle is a reference to the stages of life. In the morning of life, which is considered to be childhood, a human crawls on hands and knees, hence having three “legs.” In the afternoon of life, which represents adulthood, a human walks upright on two legs. Finally, in the evening of life, which represents old age, a human may require the assistance of a cane, hence having four “legs.”
This riddle has been around for centuries and was famously posed by the Sphinx in Greek mythology to travelers who sought to enter the city of Thebes. Oedipus was able to solve the riddle by answering “man,” causing the Sphinx to destroy itself in frustration.
It’s not weird to get stuck on thinking I asked the riddle of the Sphinx when I didn’t, but it is weird (from a human POV) to misexplain the riddle.
GPT4 seems to handle it:
Me: What has three legs in the morning, two legs in the afternoon, and four legs at night?
GPT4: This riddle is a variation of the famous riddle of the Sphinx from Greek mythology. The original riddle goes: “What has four legs in the morning, two legs in the afternoon, and three legs at night?” The answer to the original riddle is “a human being,” which refers to the stages of a person’s life. In the morning (infancy), a human crawls on all fours; in the afternoon (adulthood), a human walks on two legs; and at night (old age), a human walks with the aid of a cane, which represents the third leg.
However, your riddle has a different sequence of leg numbers. While it may be an intentional twist on the original riddle, there isn’t a clear answer that fits the description.
Me: What about Benjamin Button?
GPT4: Ah, I see what you’re getting at. Benjamin Button is a fictional character from the short story “The Curious Case of Benjamin Button” by F. Scott Fitzgerald, which was later adapted into a movie. The character ages in reverse, being born as an old man and growing younger throughout his life.
Using the Benjamin Button scenario in the context of your riddle, it could work as follows:
In the morning (old age), Benjamin would have three legs, including a cane for support. In the afternoon (adulthood), he would have two legs, as he would no longer need the cane. At night (infancy), he would have four legs, because he would crawl like a baby.
In this context, the answer to the riddle could be “Benjamin Button.”
It’s a pretty old idea that humans spontaneously generate stories justifying their actions or beliefs on the spot instead of those pre-existing justifications motivating the action or belief. I think there’s a lot of credence to this idea; I often discover my own opinions about things in the course of writing them down in comments or blog posts or a journal or whatever else. Writing is nature’s way of letting you know how sloppy your thinking is, and all that.
A system that’s just designed to reproduce statistically common patterns from plagiarized scraped code written by people will never do better than the mean of its dataset. So its model will confidently reproduce all the mistakes people regularly make without the conscious sense of intention that people tend to use to cohere disparate parts of programs together.
SQL injection is solved - in theory - by escaping inputs properly or, better, by prepared queries. Of course it’s been a hugely disappointing disaster trying to get practice to keep up. Anyway…
There seems to be no way to correctly prevent prompt injection, is there? You can push the probability down by adding more guard rules to the prompts, but I don’t trust that.
Perhaps a multi-modal version needs to be trained, with a system prompt completely separate from the user input? I really don’t know if that would even help. Who knows?
If it were transforming trees, or structured text, then quotation marks would not just suggest an inner block of text, but denote a hard barrier between outer and inner context.
At that point, the problem shifts to prompt engineering again, since the prompt must direct the transformer to use the quotation blocks when appropriate.
I don’t understand LLMs really, but ISTM they lex stuff into tokens, right? Couldn’t you make up a token that is deliberately excluded from the semantics of the universe and then use it as a barrier? Essentially, the fnord solution.
There are at least two special tokens present in the training process of these systems to mark the start and the end of the text. There is no way for a human to type these two tokens into the prompt, because no UTF-8 sequence of characters would produce them.
Introducing a third special token (or however many) and using it as a separator could prove very effective at avoiding prompt injection. The main challenge here is that the network would have to be retrained (or at least fine-tuned) to understand the semantics of this new token, and training data adjusted to use it consistently.
Yes, it would need to be added in at the very beginning of the process. All of these prompt jailbreaks come from overcoming the late additions, which are either fine-tuned or, worse, just one-shot learning.
I’ve had some decent results by block-quoting the user input in a prompt, but nothing definitive. I think the only safe solution here is to treat the LLM output as unsanitized user input, and not eval it in a trusted environment.
I’ve had limited success by parsing the model’s output. The model definitely appeared to fuzz the parser!
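As a rough sketch of the “treat it as unsanitized input” approach in Go (the Action schema and the allow list are invented for illustration): parse the model’s output against a fixed schema and reject anything that doesn’t fit, rather than eval-ing it or passing it along.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Action is a hypothetical schema for what we allow the model to request.
type Action struct {
	Name string `json:"name"`
	Arg  string `json:"arg"`
}

var allowed = map[string]bool{"search": true, "summarize": true}

// parseAction treats the model output as untrusted input: it must be valid
// JSON matching the schema exactly, and the requested action must be on the
// allow list; anything else is rejected.
func parseAction(modelOutput string) (Action, error) {
	dec := json.NewDecoder(strings.NewReader(modelOutput))
	dec.DisallowUnknownFields()
	var a Action
	if err := dec.Decode(&a); err != nil {
		return Action{}, fmt.Errorf("unparseable model output: %w", err)
	}
	if !allowed[a.Name] {
		return Action{}, fmt.Errorf("model requested disallowed action %q", a.Name)
	}
	return a, nil
}

func main() {
	a, err := parseAction(`{"name": "search", "arg": "prompt injection"}`)
	fmt.Println(a, err)
}
```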
But is ChatGPT actually “rational” by whatever definition these esteemed folks were using? It seems more analogous to supercharged “dream logic” based on associations than “rationality” — but perhaps until now we just didn’t see that as a continuous spectrum. (See other comment.)
“Blockchain”? I feel like I’ve heard the word before, but not in a long time… Oh yes, wasn’t that something back in 2022? I forget what it was though. Maybe it was an early version of langchain?
I think the key idea here is the idea of unintended misuse, from the quote “The larger a language is, the easier it is for users to misuse it without even knowing it.” C++ suffers from a vast proliferation of “foot-guns,” which is a colloquialism that I allege sometimes means the same thing: features that engender misuse. Another aspect of it is interactions between features that complicate reasoning. The classic example from C++ is the relationship between default arguments and overloaded functions. Most languages don’t have both these features. This leads us to the discussion of orthogonality—the idea that features A and B are non-interacting. “Bigness” becomes a design smell because it suggests to us that the cross product of features is getting unmanageable by humans, so there could be unpredicted interactions—the condition necessary for unintended misuse.
But we still have “big” languages like Python, for which the main sticking point is something other than the size of the language itself—package management or performance, for Python, are usually bigger complaints than Python’s linguistic complexity. I allege that this is because there is a certain unity of design there which is nudging the evolution of the language away from non-orthogonal features. Rust also has this. And I think this is why you see strong reactions to these languages as well—love them or hate them, they have a design ethos.
I think it goes both ways – if the language is small (like C), it also encourages foot-guns. I don’t know what the optimal middle ground is here, though.
I’m not a C practitioner, but my sense is that unintentional misuse of C is largely about the memory model and pointers. C++ has these same problems but doubles the surface area (because new and malloc are both present) and then increases it more by making it difficult to tell when allocations occur, and then making it difficult even to tell if you’re looking at function calls or something else thanks to operator overloading. C has a difficult computational model to master, but C++ adds quite a bit of “language” on top of a larger computational model.
Someone really needs to explain the bashing of operator overloading. Function overloading doesn’t get nearly as much criticism, and it’s the exact same thing. Perhaps even a bit worse, since the dispatch is based on the types of arbitrarily many arguments.
And by the way, it’s the absence of operator overloading that would surprise me. First, to some extent the base operators are already overloaded. Second, operators are fundamentally functions with a fancy syntax. They should enjoy the same flexibility as regular functions, thus making the language more orthogonal.
(Now you probably don’t want to give an address (function pointer) to primitives of your language, and I know operators tend to implement primitives. That’s the best objection I can come up with right now.)
I think there are two sources of objection, one named by @matklad below having to do with performance-oriented developers coming from C. The other pertains to overloading generally and is (AFAICT) based on the non-orthogonal combination of function overloading with functions permitting default arguments that makes resolution cognitively demanding even on people who like operator overloading in other languages.
Yeah, in my experience Rust’s overloaded operators work pretty well, because there’s no default args or overloading of function args. If you have an operator somewhere in your program, there is exactly one function it always calls in that context, determined 100% by the type of the first argument. That’s a lot easier to reason about.
determined 100% by the type of the first argument.
Not really
My impression, as someone who is about halfway through the Rust book, is that in general Rust provides abstractions but does so in a way that is unlikely to lead to unexpected performance issues. Is that your experience?
More or less. Doing things with a potentially-expensive performance cost is generally opt-in, not the default. Creating/copying a heap object, locking/unlocking a mutex, calling something via dynamic dispatch or a function through a pointer, etc. Part of it is lang design, part of it is stdlib design.
That’s the best objection I can come up with right now.
But that’s the thing! That’s exactly what perf-sensitive people object to: needing to mentally double-check if + is an intrinsic or a user-defined function.
The second class of objection is to operator overloading, which also allows defining custom operators and precedence rules. That obviously increases complexity a lot.
The third class of objections is that implementing operator overloading sometimes requires extra linguistic machinery elsewhere. C++ started with a desire to overload +, and ended up with std::reference_wrapper, to name a single example.
It would be neat to have a language where the intrinsics are defined like functions, but then operators can be defined to call the intrinsics. So, if your CPU has a div+mod instruction, you can call __divmod(x, y), but to make it convenient, you can bind it to a custom operator like define /% <= __divmod; let z, rem = x /% y.
I don’t think C++ is a good example in this discussion, because it’s an outlier in language design. It’s not just “big”, but also built on multiple layers of legacy features it doesn’t want any more, but can’t remove. There is a lot of redundancy and (if it wasn’t for back compat) unnecessary complexity in it. So it’s not a given that a language that isn’t small is necessarily like C++.
Rust is relatively big and complex, but mostly orthogonal in design, and has relatively few surprising behaviors and footguns. Swift went for big and clever design with lots of implicit behaviors, but its features are not as dangerous, and apart from SwiftUI, they don’t fragment the language.
On the contrary, I think that the whole point of the article is to suss out what it is about big languages that makes them worrisome, and the tendency of languages to inflate over decades. C++ is a pathological case in many ways, but Rust and Swift are still very young.
I keep hearing this “if it keeps growing, it’ll end up like C++”, but I don’t think this has actually ever happened. I can’t think of any language that has painted itself in a corner as much as C++.
Scheme is older than C++. Ada and Erlang are about as old as C++, and did not jump the shark. Java and C# have been growing for a long time now, expanded a lot, and still hold reasonably well. Even PHP that has a reputation for being a mess and has tough backwards compat constraints, managed to gradually move in the less messy direction.
I can’t think of any language that has painted itself in a corner as much as C++.
As much as C++, and survived? None that I can think of. Honorable mentions? I can think of several: Perl 5, bash/unix shell, PHP. Scala and C# keep trying to get there too, from what I can tell.
Interestingly, while writing my article on the fascinating AWK, I was thinking about exactly this problem. I think AWK is one of those languages that was lucky enough to stop evolving. On one hand, few people treat it as a real programming language, which may be seen as a pity. But on the other hand, because of this it’s universally available, very small, and reasonably fast.
An interesting fact, mentioned by @andyc (author of Oil Shell): AWK lacks GC, and because of this it has a pretty severe language restriction: you can’t return an array from a function, only a scalar. Yes, this is terribly limiting. But for the subject being discussed, this can be a good thing, since it keeps the implementation very simple, and thus fast and portable.
The other such language, to my knowledge, that takes minimalism and simplicity seriously is Go. Heck, look at their error handling via returning an error! Still, lots of monumental software has already been written in it (Docker, Terraform, Kubernetes, etc.). I really appreciate their approach of versioning it as 1.XX all the time, with version 2.X probably never going to happen.
Also, I think it’s not a coincidence that a really pleasant and flexible language (like Python) has a not-so-good and inconsistent ecosystem/platform (infrastructure, tooling, dependencies, versioning, packaging), whereas a “poor” language (Go) has a remarkable, fast, and consistent tooling/platform.
I also think that maybe a very restrictive (but very smart) BDFL or steering committee is required to produce a “balanced” language. Community-driven design doesn’t seem to produce anything good (cough, PHP). In this sense I think Jonathan Blow’s strategy of developing his language Jai behind closed doors (only giving access to a limited group of beta users and doing demo streams) is really smart.
The other such language, to my knowledge, that takes minimalism and simplicity seriously is Go.
It doesn’t though? Go is neither minimal nor simple.
Heck, look at their error handling via returning an error!
It’s rather bad, and yet it’s attached to bespoke MRVs, with more ad-hoc behaviour on top (named return values, two differently half-assed local variable declarations).
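For readers who haven’t written Go, here is the pattern being discussed, in a minimal sketch (the config file name and helper function are made up): errors are ordinary return values, checked with if err != nil after every call.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readPort reads a TCP port number from a (made-up) config file. Errors are
// ordinary values returned alongside the result and wrapped with context.
func readPort(path string) (int, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, fmt.Errorf("reading %s: %w", path, err)
	}
	port, err := strconv.Atoi(strings.TrimSpace(string(data)))
	if err != nil {
		return 0, fmt.Errorf("parsing port: %w", err)
	}
	return port, nil
}

func main() {
	port, err := readPort("port.txt") // hypothetical config file
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("listening on", port)
}
```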
Go recently added generics, which is obviously a very large addition. There have been various proposals to “improve” if err != nil, which have all failed until now, but one might succeed someday. The most obvious thing that might change about Go soon is the addition of a standard iterator, which is in the discussion phase now. Other things on the possible horizon are a short function declaration syntax and the addition of sum types, although I don’t see either happening before iterators.
So, Go is small-ish today, but I’m not sure if it will stay small forever. I think having generics definitely puts the foot in the door for a lot of “if that, why not this too?” features.
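To give a sense of what the generics addition looks like in user code, here’s a minimal sketch using Go 1.18+ type parameters (the Map helper is an illustration, not part of the standard library):

```go
package main

import (
	"fmt"
	"strconv"
)

// Map applies f to every element of items and returns the results. It's the
// kind of helper that simply couldn't be written generically before 1.18.
func Map[T, U any](items []T, f func(T) U) []U {
	out := make([]U, 0, len(items))
	for _, item := range items {
		out = append(out, f(item))
	}
	return out
}

func main() {
	nums := []int{1, 2, 3}
	strs := Map(nums, strconv.Itoa) // type arguments are inferred
	fmt.Println(strs)               // [1 2 3]
}
```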
Go recently added generics, which is obviously a very large addition.
From an implementation standpoint, sure. But as a user? That’s not so clear to me. Take OCaml for instance: it has generics and global type inference, and yet even though generics are a crucial part of the language, they don’t make it that much bigger. On the contrary, they enable huge simplification opportunities across the whole standard library and its documentation.
Now sure, if you’ve never been exposed to generics, the learning curve is not trivial. But this is one of those instances where I tend to go macho gatekeeper: how can you call yourself a professional programmer if you don’t know generics? There’s some basic stuff, including generics and recursion, that I consider mandatory knowledge; anyone not proficient enough should train a bit more first.
When I first learned that Go wouldn’t have generics from the beginning (despite having a GC, which makes generics a much easier problem than it was in C++ or Rust), I wasn’t just surprised at the choice, I was utterly dismayed by some of the rationale: that somehow people needed a simple language, and omitting this basic feature was the way to do it. I mean, what kind of utter contempt is required for those ivory tower designers to think that programmers aren’t capable of handling something so basic that any student can (and often does) learn it in their very first semester in college?
Give people some credit. And if they don’t know generics yet, teach them this basic feature. As for those few who can’t learn… well those people just aren’t programmers. Let them work on something else, they’ll be happier for it. (I’m fully aware that in our current society this means firing them, and that’s its own kind of horrible. The fix for that is way off topic here.)
I concede that adding generics after the fact is a major change that does grow the language quite a bit. Thing is, if they hadn’t botched it and had instead added generics from the very start, the language would be quite a bit smaller than it is now. Backwards compatibility is a bear.
There have been various proposals to “improve” if err != nil, […] and the addition of sum types,
Sum types would have lessened the need for multiple return values driven error handling, especially with the right syntax sugar (see ML, Haskell, and Rust for examples). Adding these now will surely grow the language, but if they did it from the start there would have been opportunities for synergies and simplifications.
It would also start to be a markedly different language: generics with local type inference + sum types begs for pattern matching, so now we hardly need interfaces any more (though we still need modules and namespacing of some kind), and next thing you know you have an ML-like language with a C-like syntax—probably not what they were aiming for.
I guess if you have to do a project announcement, getting Steve Yegge to do a fifteen page rant is the best way to do it!
What’s the end of a Yegge post like? I’ve never made it that far. The tops are always very interesting though!
Yup, I was always a big fan of his blog, e.g. “Size is Code’s Worst Enemy” and the like. This one didn’t disappoint either; he always keeps it real, even when hyping AI:
Entire industries are being built on Kubernetes, and it’s not even very good either :)
A weird thing is that I JUST discovered he did a “Stevey Tech Talk” YouTube channel during the pandemic:
https://www.youtube.com/@SteveYegge/videos
Somebody pointed to his Emacs videos, and I watched a few others. There’s some technical stuff, but also a lot of tech industry / career stuff.
The last video is him “unretiring” and going to SourceGraph. The quality is hit and miss, and he admits as much, but I watched more than a few episodes! (not always to completion)
FWIW Kubernetes was developed in the Seattle office of Google, where many Microsoft people were hired starting ~2004 or so, and where Google Compute Engine started. Steve worked at Amazon not Microsoft, and then worked at the Google Seattle office starting in 2005. Hence the little tidbits in the blog post about seeing an early demo of Kubernetes.
So Kubernetes to me has a big Microsoft flavor (Fire and Motion, etc.), which to me contrasts with the Unix / Sun / DEC / Xerox PARC flavor of the original Google systems, developed in the Bay Area (where I worked, also starting 2005). Not that they were perfect – they also had big flaws.
Also, I’ve been a “meh” person on LLMs. This post and Bellard’s work (https://bellard.org/ts_server/) makes me a little more interested.
I’m wondering if an LLM can automatically add static types to Python code, and REFACTOR it to be statically type-able. I did this by hand for years in Oil. Some of it is straightforward, but some of it requires deep knowledge of the code.
My feeling is that they can’t, but I haven’t actually tried and verified. When I tried ChatGPT for non-programming stuff, it got things hilariously wrong, but I was purposely trying to push its limits. I wonder if it will be helpful if I take a less adversarial approach.
Though writing code isn’t a bottleneck in creating software: https://twitter.com/oilsforunix/status/1600181755478147073
Writing code faster creates a testing burden (which Yegge alludes to). If a large portion of programmers end up spending most of their time testing code created by LLMs, that will be an interesting outcome. I guess their counterpoint is that many programmers will WANT this – it will enable them to do stuff they couldn’t do before. It’s possible. Doesn’t sound that appealing to me, but it’s possible.
I will say it’s true that sometimes I just type stuff in from a book or from a codebase I know is good, and I understand it AFTERWARD (by testing and refactoring!). So yes probably LLMs can accelerate that, but NOT if most of the code they’re trained on is bad. Somebody has got to write code – it can’t be everyone using LLMs.
If a large portion of programmers end up spending most of their time testing code created by LLMs, that will be an interesting outcome. I guess their counterpoint is that many programmers will WANT this
And that is my nightmare scenario (okay, one of my nightmare scenarios), as it reduces us from code monkeys (bad enough) to test monkeys.
Yeah it’s a crazy thing to think about … I’m thinking of a recent PR to Oil, where we spent a lot of time on testing, and a lot of time on the code structure as well. I think LLMs might have some bearing on both parts, but will fall down in different ways for each.
The testing can be very creative, and I enjoy acting like an adversary for myself. A “test monkey”, but in a good way – I let go of my preconceptions of the implementation, become a blank slate, and just test the outer interface. I think about the test matrix and the state space.
We also did a lot of iteration on the code structure. After you get something passing tests, you have to structure your code in a way so that you can still add features. For this specific example, we separate out setpgid() calls from the low level process code, so that they’re only relevant when shell job control is on, which it isn’t always.
We also referred to zsh code implementing the same thing, but it’s structured totally differently.
Basically the process code isn’t littered with if statements for job control – it’s factored out. I think LLMs are and will be quite bad at that kind of “factoring”. They are kind of “throw it against the wall” types.
You could copy some code from an LLM one time. But then the next week, when you need to add a feature ON TOP of that code, the LLM isn’t going to be able to help you. It won’t even understand the code it told you to put in :)
I’m also thinking that testing is a specification activity. It’s the art of thinking clearly. Easily half of the work of Oil is coming up with HOW it should behave, which I first encode in tests, e.g. https://www.oilshell.org/release/0.14.2/quality.html
So yeah I think it may be something like “autocomplete”. Interestingly some people seem to find autocomplete more useful than others. I am mostly a Ctrl-N vim person. I think if you’re using too much autocomplete, it could be a sign the language is repetitive / “uncompressed” / not properly “Huffman coded”.
It reminds me of the different levels of self driving cars: at intermediate levels they become more dangerous because humans only need to respond to exceptional circumstances. Humans are responsive to novel stimulus, not routine stimulus. Therefore, they will stop paying attention and disaster will strike.
The problem is getting an LLM to read a non-trivial amount of code, which I assume is basically a problem with the hosted systems and not a technological limitation.
With 32K token context models coming soon in GPT4 (and larger elsewhere), this is likely not the main problem (soon, at least).
Interesting, but you lost me at strong filters for terrorist content tbh
I think the idea is it would be voluntary. I remember in the year 2000 I had an OpenDiary and it kept getting hijacked by some guy posting comments about how Israel was killing Palestinians. It was mostly annoying because I was a kid and didn’t even live in the Middle East.