I was a tad disappointed about some of the points in this article, but I think that’s just because I had unrealistic expectations. It’s actually really well-written and, as far as I can tell, it tries to stay away from making statements that are too bold. It’s definitely something that I want to read over the weekend, when I have more time!
However, some of the claims it makes are… either a little generous, or a little too conservative to be useful. For example:
There are no reliable techniques for steering the behavior of LLMs.
This is kind of a given. The really interesting question is whether such reliable techniques can even exist. Intuitively, based on what I remember from studying loosely related problems years ago (e.g. model order reduction), I suspect they can't, and that it is inherently impossible to develop models that behave reliably and unsurprisingly under general conditions through means that fall significantly short of enumerating their behaviours (i.e. building the whole model) in the first place. I don’t have a theory for it – it’s not even junk science, it’s basically hoodoo.
[As proof that LLMs often appear to learn and use representations of the outside world, ] models’ internal representations of color words closely mirror objective facts about human color perception (Abdou et al., 2021; Patel & Pavlick, 2022; Søgaard, 2023).
I haven’t read Søgaard’s paper but I have skimmed Abdou et al.’s, and with the caveat that it’s definitely way over my head and I’m probably not grasping it fully, this is not a very fair conclusion. Abdou’s findings are essentially about some correlations between colour space and colour naming. A model trained on long lists of names for things may well simply preserve these correlations because they already exist in its training set, not because it has developed them on its own.
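To make "correlations between colour space and colour naming" a bit more concrete, the probing papers roughly check whether colour names that sit close together in the model's embedding space also sit close together in a perceptual colour space. Below is a minimal sketch of that idea; the RGB values are approximate and the embeddings are random stand-ins, none of it is taken from Abdou et al., it only illustrates the shape of the comparison.

```python
# Rough sketch of a colour-space vs. colour-name correlation check.
# The RGB values are approximate and the "embeddings" are random stand-ins;
# in the actual papers they come from the language model being probed.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

colours = {
    "red":    (255, 0, 0),
    "green":  (0, 128, 0),
    "blue":   (0, 0, 255),
    "yellow": (255, 255, 0),
    "orange": (255, 165, 0),
    "purple": (128, 0, 128),
}

rgb = np.array(list(colours.values()), dtype=float)

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(colours), 768))  # placeholder for model vectors

# Pairwise distances within each space, then a rank correlation between the two
# distance structures: a high correlation means colour names that are close in
# embedding space are also close in colour space.
rho, p = spearmanr(pdist(rgb), pdist(embeddings))
print(f"Spearman rho = {rho:.2f} (p = {p:.2f})")
```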
Granted, this isn’t a failure of this article per se. I doubt it was intended as a critique of any of the papers it cites.
Human performance on a task isn’t an upper bound on LLM performance
While this claim may be generally true of any program, the article unfortunately operates with a somewhat self-referential definition of “performance” and “on a task”: “Concretely, LLMs appear to be much better than humans at their pretraining task of predicting which word is most likely to appear after some seed piece of text (Shlegeris et al., 2022), and humans can teach LLMs to do some simple tasks more accurately than the humans themselves (Stiennon et al., 2020)”.
It’s probably not surprising that LLMs are much better than humans at building completion lists for a sequence of tokens, just like it’s not surprising that the ENIAC outcalculated pretty much everyone at arithmetic. But the leap from that to tasks in general seems a little bold to me, as it effectively assumes that any task can be adequately represented in terms of token completion candidates. That claim is simultaneously hard to conclusively refute and hard to prove.
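For reference, this is roughly what that pretraining task looks like in code – a minimal sketch using the Hugging Face transformers library, with gpt2 and the seed sentence as arbitrary stand-ins rather than anything from the cited papers:

```python
# Minimal sketch of the pretraining task being compared against humans:
# given some seed text, produce a distribution over the next token.
# gpt2 and the seed sentence are arbitrary stand-ins.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

seed = "The chemical symbol for gold is"
inputs = tokenizer(seed, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (batch, seq_len, vocab_size)

# Probability distribution over the vocabulary for the *next* token.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}\t{prob.item():.3f}")
```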
This is a really really good concise article for getting up to speed from zero on what’s currently known about LLMs - with every claim supported by many good sources.
Agree! This is highly informed, balanced and to the point. A clear signal in the sea of surrounding noise.
I really like the terms in section 4: sycophancy and sandbagging :)
It perfectly describes what I experienced with ChatGPT and why it annoys me
i.e. it confidently says wrong things, then you correct it, and it apologizes and claims the opposite
“Brief interactions with LLMs are often misleading”
This one is so important! I keep seeing examples of people whose opinions I trust and respect trying an LLM for the first time, having it respond in a WILDLY inaccurate way, and writing the whole field off as hype.
Anyone who’s spent significant time with these tools knows that there are things they get wrong consistently, and things they get wrong occasionally, and things they usually get right - and often there are ways you can phrase prompts that give you much better results for the things that they initially fail at.
My favourite example of this is still the way ChatGPT looks like it can read the content of a URL but actually can’t, and hallucinates the content instead. I see this catch out new users all the time - they ask for a summary of a URL and assume the whole thing is garbage when it comes up with a blatantly inaccurate result.
I wrote about that one here: https://simonwillison.net/2023/Mar/10/chatgpt-internet-access/
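For anyone wondering what the fix looks like in practice, here is a rough sketch of the difference between asking a model to summarise a bare URL and actually fetching the page first. requests and BeautifulSoup are real libraries; summarize_with_llm() is a placeholder for whatever LLM API you happen to use, not a real function:

```python
# Sketch of the difference between "summarise this URL" as a bare prompt (the
# model only ever sees the URL string and has to guess) and a setup that
# actually fetches the page first. summarize_with_llm() is a placeholder.
import requests
from bs4 import BeautifulSoup


def summarize_with_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your preferred LLM API here")


def summarize_url_naively(url: str) -> str:
    # Whatever comes back here is reconstructed from the words in the URL
    # alone, which is why it can look plausible and be completely wrong.
    return summarize_with_llm(f"Summarize the content of {url}")


def summarize_url_with_fetch(url: str) -> str:
    # Fetch the page ourselves and hand the actual text to the model.
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
    return summarize_with_llm(f"Summarize the following page:\n\n{text[:4000]}")
```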
Another good article by the author.
https://wp.nyu.edu/arg/why-ai-safety/