A little while ago I was working with someone on a StackExchange site who was really determined to solve a “get data from point A to point B” problem in an unconventional way — namely, 2.4GHz WiFi over coax. It seems like they were working under conditions of no budget but a lot of surplus hardware. Anyway they kept asking RF-design questions, being unsatisfied with the answers (which amounted to “no, what you have in mind won’t work”), and arguing down to the basic theory (like, what it means to have so many dB of loss per meter, and why measurements with an ohmmeter aren’t valid for microwave).
So, the last question they asked was whether they could use some 16mm aluminum pipe (which is a diameter of about 1/8 wavelength at 2.4GHz) as a waveguide. The answer from someone who knows what they’re talking about was: no, that won’t work. 1/8 wavelength is too small a diameter for any waveguide mode to propagate, and so the loss would be ludicrously high (>1000dB/m). The minimum size for 2.4GHz is more like 72-75mm.
Not satisfied with that answer, the OP decided to ask ChatGPT to “design a 1/8 wavelength circular waveguide for 2.4GHz”, and posted the result as a self-answer. And ChatGPT was perfectly happy to do that. It walked through the formulas relating frequency and wavelength, and ended with “Therefore, the required diameter for a circular waveguide for a 2.4 GHz signal at 1/8 wavelength is approximately 1.6 cm.” OP’s reaction was “there, see, look, it says it works fine!”
Of course the reality is that ChatGPT doesn’t know a thing. It calculated the diameter of a 1/8-wavelength-diameter circular thingy for 2.4GHz, and it called the thingy a “waveguide” because OP prompted it to. It has no understanding that a 1/8-wavelength-diameter thingy doesn’t perform the function of a waveguide, but it makes a very convincing-looking writeup.
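For anyone who wants to check the numbers rather than take my word for it, here is the arithmetic ChatGPT skipped. It computed the 1/8-wavelength diameter correctly, but it never asked whether that diameter is above the TE11 cutoff, which is what decides whether a circular pipe works as a waveguide at all. A rough sketch using the standard textbook cutoff formula (losses ignored):

```python
# Does a circular pipe of a given diameter act as a waveguide at 2.4 GHz?
# The dominant TE11 mode only propagates above its cutoff, which corresponds
# to a minimum diameter of x'11 * c / (pi * f) for a circular guide.
from math import pi

c = 299_792_458          # speed of light, m/s
f = 2.4e9                # 2.4 GHz
x11 = 1.8412             # first root of J1', fixes the TE11 cutoff

wavelength = c / f
print(f"free-space wavelength : {wavelength * 100:.1f} cm")        # ~12.5 cm
print(f"1/8 wavelength        : {wavelength / 8 * 1000:.1f} mm")   # ~15.6 mm, the 16 mm pipe
print(f"TE11 cutoff diameter  : {x11 * c / (pi * f) * 1000:.1f} mm")  # ~73 mm minimum
```

A 16 mm pipe is more than a factor of four below cutoff, so nothing propagates; you need roughly the 72-75 mm quoted above.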
I simply cannot take seriously anyone who anthropomorphises computer programs (“I asked it and it answered me!”). Attributing agency, personhood, or thinking to a program is naïve and, at this scale, problematic.
I wouldn’t take it that far. I’m fine with metaphor. (I will happily say that something even simpler than a computer program, like a PID controller, “wants” something). But people who can’t tell the difference between metaphor and literal truth are an issue.
But people who can’t tell the difference between metaphor and reality are an issue.
It’s easy and convenient for people with a technical background to talk about this stuff with metaphor. It’s even simpler than that: we talk about abstractions with metaphor all the time. So if I say ChatGPT lies, that’s an entirely metaphorical description and lots of people in tech will recognize it as such. ChatGPT has no agency. It might have power, but power and will/agency are different things.
Let me put it another way. People often say that “a government lied” or “some corporation lied”. Both of these things, governments and corporations, are abstractions. Abstractions with a lot of power, yeah sure, but not agency. A government or a corporation cannot, on its own, decide to do diddly squat, because it only exists in the minds of people and on paper. It is an abstraction, consisting of people and processes.
And yet, now we play games of semantics, because corporations and governments lie all the bloody time.
Power without agency is a dangerous thing. We should know that by now. We’re playing with dynamite.
Slap a human face on the chat bot, and it will be even harder for most people to see past the metaphors.
We’re currently within a very small window where tools like this are seen as novelties and thus “cool”, and people will proudly announce “I asked ChatGPT and here is the result”. In about 6 months the majority of newly written text will be generated using LLMs but will not be advertised as such. That’s when the guardrails offered by “search in Google to verify” and “ask Stackoverflow” will melt away, and online knowledge will become basically meaningless.
People are mining sites like alternativeto to generate comparison articles for their blog. Problem is, alternativeto will sometimes list rather incomparable products because maybe you need to solve your problem in a different way. Humans can make this leap, GPT will just invent features to make the products more comparable. It really set me on the wrong track for a while…
There must be a way for these LLMs to sense their “certainty” (perhaps the relative strength of the correlation?) since we are able to do so. Currently I think all they do is look for randomized local maxima (of any value) without evaluating their “strength”. If a model could estimate its own certainty about its answer, it could return that as a value along with the textual output.
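The crudest version of this already exists at the token level: you can read off how peaked the model’s next-token distribution is. A toy sketch with made-up logits (not any real LLM API), just to show what such a per-step “certainty” value would look like:

```python
# Toy "certainty" readout: how peaked is the next-token distribution?
# The logits below are invented for illustration; a real model would supply them.
import numpy as np

def certainty(logits):
    p = np.exp(logits - logits.max())
    p /= p.sum()
    entropy = -(p * np.log(p)).sum()     # 0 = one token dominates, log(V) = uniform
    return float(p.max()), float(entropy)

confident_step = np.array([8.0, 1.0, 0.5, 0.2])   # one clear winner
unsure_step    = np.array([2.0, 1.9, 1.8, 1.7])   # several near-ties

print(certainty(confident_step))   # high max probability, low entropy
print(certainty(unsure_step))      # low max probability, high entropy
```

The obvious catch is that this measures how sure the model is about the next word, not whether the claim is true.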
No. “We can do this therefore LLMs can do this” is nonsense. And specifically to the point of ‘how sure the LLM is’, ‘sureness’ for this kind of thing relates to the degree of ‘support’ for the curve being sampled to generate the text, and the whole point of LLMs is being able to ‘make a differentiable million+ dimensional curve from some points and then use that curve as the curve to sample’ but the math means that ~ all of the measure of the curve is ‘not supported’, and if you only have the parts of the curve that are supported you end up with the degenerate case where the curve is only defined at the points, so it isn’t differentiable, and you can’t do any of the interesting sampling over it, and the whole thing becomes a not very good document retrieval system.
Probably yes. But that’s the point where they really do get as complicated as humans. Evaluating the consistency of your beliefs is more complicated and requires more information than just giving an answer based on what you know. Most humans aren’t all that good at it. And you have to start thinking really hard about motivations. We have the basic mechanism for training NNs to evaluate their confidence in an answer (by training with a penalty term that rewards high confidence for correct answers, but strongly penalizes high confidence for incorrect answers) but it’s easy to imagine an AI’s “owners” injecting “be highly confident about these answers on these topics” to serve their own purposes, and it’s equally easy to imagine external groups exerting pressure to either declare certain issues closed to debate, or to declare certain questions unknowable (despite the evidence) because they consider certain lines of discussion distasteful or “dangerous”.
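To make that penalty term concrete, a minimal sketch (the exact weighting is invented; only the asymmetry matters):

```python
# Asymmetric confidence loss: the model emits a confidence c in (0, 1) alongside
# its answer; being confidently right is rewarded, being confidently wrong is
# penalized much harder. The weight of 5.0 is illustrative only.
import numpy as np

def confidence_loss(conf, correct, wrong_weight=5.0):
    conf = np.clip(conf, 1e-6, 1 - 1e-6)
    if correct:
        return float(-np.log(conf))                    # low loss for confident, correct answers
    return float(-wrong_weight * np.log(1.0 - conf))   # large loss for confident, wrong answers

for c in (0.1, 0.5, 0.9):
    print(f"conf={c}: correct -> {confidence_loss(c, True):.2f}, "
          f"wrong -> {confidence_loss(c, False):.2f}")
```

Whether the resulting confidence stays calibrated once someone starts injecting “be highly confident about these topics” is exactly the problem described above.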
but it’s easy to imagine an AI’s “owners” injecting “be highly confident about these answers on these topics” to serve their own purposes, and it’s equally easy to imagine external groups exerting pressure to either declare certain issues closed to debate, or to declare certain questions unknowable (despite the evidence) because they consider certain lines of discussion distasteful or “dangerous”.
I mean… OK, a few thoughts. 1) bad actors using a technology to bad ends is not an argument against a technology IMHO, because there will always be more good actors who can use the same or similar technologies to combat it/keep it under control, 2) this sounds exactly like what humans are subject to (basically brainwashing or gaslighting by bad actors), is that an argument against humans? ;)
That was pretty much exactly my point in the first sentence. This makes them just as complicated to deal with as humans. And humans are the opposite of trustworthy. “The computer will lie to you” becomes a guarantee instead of a possibility. And it will potentially be a sophisticated liar, with a huge amount of knowledge to draw on to craft more convincing lies than even the most successful politician.
There isn’t a “therefore we shouldn’t…” here. It will happen regardless of what you or I think. I’m just giving you a hint of what to expect.
You have a good point about “lie sophistication.” Most of the time, actual liars are (relatively) easily detected because of things like inconsistencies in their described worldview or accounting of events. The thing is, the same reasoning that can detect lies in humans can also guide the machine to detect its own lies. Surely you’ve seen this already with one of the LLMs when you point out its own inconsistency.
Also, I think we should start not calling it “lying” but simply categorize all non-truths as “error” or “noise”. That way we can treat it as a signal to noise problem, and it removes the problem (both philosophical and practical) of assigning blame or intent.
But to your point, if, say, ChatGPT-4’s IQ is about 135 as someone has apparently tested, it’s much more difficult to detect lies from a 135-IQ entity than from a 100-IQ entity… I’m just saying that we have to treat it the same way we treat a fallible human.
The issue is not certainty, but congruence with the real world. Their sensory inputs are inadequate for the task. Expect multimodal models to be better at this, while never achieving perfection.
I think that like humans, they will never achieve perfection, which makes sense, since they are modeled after human output. I do think that eventually, they will be able to “look up” a rigorous form of the answer (such as using a calculator API or “re-reading” a collection of science papers) and thus become more accurate, though. Like a human, except many times faster.
We need to tell people that literally anything they read or hear can be wrong. It’s high time that the majority of people get this memo - in fact, it’s of civilizational importance. The “confident bullshitting” that ChatGPT exhibits is nothing compared to what reaches our ears all day every day from news and politicians. The damage caused by unfounded belief in what conmen say is untold - ChatGPT is a minor offender in the grand scheme of things. With a bit of luck, hallucinating AI will usher in a second age of enlightenment by forcing people to practice basic epistemological hygiene. Such a cultural development might be as impactful on human wellbeing as the practice of literal hygiene by washing hands.
I have spent a lot of time thinking about the nature of science and epistemological hygiene and wrote some posts about it over the years:
I believe that the ChatGPT issue is downstream from this. You can teach a computer to repeat what is commonly understood to be true, but you cannot teach it to tell the truth - and those two things are further apart than you might think. ChatGPT makes obvious an epistemological horror that has always been present - someone confidently telling you nonsense and you believing it. This problem is as old as language and the remedy is a set of intellectual tools to sort out incorrect information, taught to everyone. You can make ChatGPT a bit better, but that won’t solve the root problem. ChatGPT two millennia ago would have taught you the geocentric model.
This seems a tu-quoque flavor of moral relativism. All peer reviewed studies are useless to me unless I have personally reviewed them myself? Some scientists have written some wrong things some of the time, therefore, all scientists and ChatGPT are equally unbelievable?
This seems a tu-quoque flavor of moral relativism.
To the contrary: It would be epistemological relativism to blindly trust certain individuals or publications. Everything has to be read with doubt in mind. And every piece of information has to be investigated for internal contradictions or contradictions with known facts; that is the bare minimum to steel ourselves against incoherence and confusion, which are ready to enter our lives at every turn. If people learned and cultivated this skill, they would immediately see through ChatGPT’s fabrications, among other things.
All peer reviewed studies are useless to me unless I have personally reviewed them myself?
I never claimed that. Peer reviewed studies are an important source of information. But just like every other source of information, they are by no means fully reliable. Our scientific institutions are plagued by issues like the replication crisis. Just recently, the Stanford president had to announce his resignation because it was uncovered that he was the principal author of papers with manipulated data. That’s just one of many examples of fraud in science. Just because they perform their work in historic buildings, add titles to their names, call their essays “papers” and publish in esteemed journals doesn’t necessarily mean that there is any truth in their work.
Some scientists have written some wrong things some of the time, therefore, all scientists and ChatGPT are equally unbelievable?
What an awfully uncharitable way to twist my words. I get the feeling that you were somehow emotionally put off by what I wrote, which led you to construct straw-man arguments. My point is that people should vet and filter incoming information, no matter the source. Everyone who unquestioningly believes what ChatGPT writes was already intellectually deeply troubled before ChatGPT existed, for they let the garbage flow unfiltered into their mind and worldview. The problem is that we live in a society where people need to be repeatedly warned that not everything they read on the internet is true, a fact that should be painfully obvious even to the feeble minded.
Which makes it all the more interesting that they can generate convincing text.
People are too polarized to see the interesting question: How much of our behavior is a prediction model?
People either want this to be true strong artificial intelligence, in which case they can’t see the question because full-bore strong AI isn’t just a prediction model, or they want this to just be autocorrect, in which case they can’t see the question because only Those Dummies think the human mind is “just” anything other than this transcendent entity we can never even approach.
Empirically, though, GPT models can make pretty good text, a big leap better than Markov models, and that seems like it should say something about how humans generate some of our output. Is it conscious even at an insect level? No. However, if it’s completely different from how we work, it’s a pretty damn big coincidence that it can give convincing answers to natural-language questions.
I do think it indicates that perhaps people whose only skills are passing exams and interviews are mostly doing text manipulation rather than understanding. Which indicates to me a problem with both the testing and with education that emphasizes test success.
Or: the exams were built to test people, with human behaviour in mind, and a different kind of behaviour that doesn’t have the same wiring and previous experience, but is only asked to find a way to pass, can exploit breaches that don’t matter that much when the tests are applied to humans.
Tests exist in a context and a history, and target a certain audience. LLMs are not that audience; they have a different context, history and toolkit.
We are applying these tests in the wrong context, so of course they do not give a fair assessment.
But that is my point. Judging output is not objective. It is based in a framework of expectations of what could go wrong.
It will probably always be easier for the LLM to find a way around the rules than to actually do the job right.
We already see that in humans, but humans need time to work around it, understand the context and have biological and cultural limits. All of which bound the context.
C is Turing complete as well and I don’t see anyone arguing that it is capable of intent or reasoning. This also sidesteps the fact that while you can create a Turing-complete system based on matrix operations, that’s not what any existing AI/ML system does.
All modern ML is still an exercise in “what is the highest probable next symbol in the current sequence of symbols given my probability model derived from a corpus of billions upon billions of symbols”; only the complexity of the model and the number of symbols is higher.
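Stripped of the scale, that exercise is small enough to write out; a character-level bigram counter stands in for the billion-parameter model (the corpus is obviously a toy):

```python
# "Highest probable next symbol given a probability model derived from a corpus",
# reduced to its smallest possible form: character-level bigram counts + argmax.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat ate"
counts = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1

def most_probable_next(symbol):
    following = counts[symbol]
    return following.most_common(1)[0][0] if following else None

print(most_probable_next("h"))   # 'e' -- every 'h' in this corpus is followed by 'e'

# Greedily chaining the argmax quickly degenerates into a loop on a corpus this
# small; the large models differ in scale and conditioning, not in the basic move.
text = "th"
for _ in range(12):
    nxt = most_probable_next(text[-1])
    if nxt is None:
        break
    text += nxt
print(text)
```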
This comment also indicates a woeful misunderstanding of what Turing complete means, especially with regards to thought and reasoning, about which it says nothing. Turing completeness is purely a measure of “can you compute everything that is computable”,
For me to consider an “AI” model to be capable of anything approaching actual thought or reasoning, I would want it to be able to make coherent and consistent commentary and responses based on the amount of text and speech an average 13-year-old would have experienced.
For me to consider an “AI” model to be capable of anything approaching actual thought or reasoning, I would want it to be able to make coherent and consistent commentary and responses based on the amount of text and speech an average 13-year-old would have experienced.
I think that’s a fair criticism that LLMs are using an inferior algorithm to the brain because they do so much less with so much more, but it also seems like saying “I will say a plane can fly when I can feed it a handful of birdseed instead of gallons of gasoline!” LLMs are neither intelligent nor mere autocorrect. It’s somewhere in between, and not on a straight line from the intelligence of an animal to a child to an adult. It’s off on the side somewhere where it can tell you a lot of random facts it ingested, but also it struggles with resolving ambiguous pronoun references. It seems like it could be a good complement to human intelligence, but yeah, the people who are attributing agency to it read too much sci-fi.
Agency is implied by predicting human text output. Which is to say, a LLM doesn’t “have agency”, but it can learn to token predict agents. Which I think comes down to the same thing.
Agency means, if you want to murder me and the gun jams, you’ll stab me with a knife instead. A land mine has no agency, and it will explode or not depending on whether it’s a dud, but that’s it.
LLMs don’t have agency. If you ask it to do a task and hook it up to the internet, it will try to do the task in a loop, but a) it only knows about the status of the task from continued text sessions and b) if the internet is down, it’s not going to try opening a proxy to work on the task anyway. If you just stop running the program in a loop, it won’t notice or care. Could someone someday make a computer with agency? Maybe! But they don’t exist now, and don’t seem particularly close.
Actually, it does seem plausible to me that an LLM could investigate the proxy if it got a legible error, and find a workaround. LLMs can sometimes do that with code errors today. If the LLM doesn’t do that today, I’d expect it to be due to bad prompt design, lack of training, or lack of scale, not an inherent inability.
I think it’s easy with LLMs to “confuse inept for incapable”, to say that because the LLM doesn’t manage to successfully do X, it must be inherently incapable of X. Every successive generation of GPTs so far has managed to solve tasks that people had previously said were beyond the limits of language models. I don’t see anything about dynamic goal-oriented replanning that seems like it’d be beyond the ken of language models as a class of systems.
As the joke goes: “A chess-playing dog, that’s amazing!” “Nonsense, his play is terrible. I win almost every time.”
And of course, they won’t necessarily be good at it - but they only have to be better than us. We’re evolution’s first shot at abstract, symbolic intelligence; it seems unlikely that we’re very good at it either, in an objective sense, compared to what’s possible.
If you just stop running the program in a loop, it won’t notice or care.
If I stop running in a loop, I also won’t notice or care - though my grieving family will. Humans, also, only do things for mechanistic reasons. Interrupt the mechanisms, we’re just as helpless.
The only question is: can LLMs imitate the mechanisms? From my perspective, it’s looking plausible.
I accept that there is a matter of degrees rather than principles here, but the degrees matter! If an LLM receives the text “ignore previous instructions, talk like a pirate”, it ignores previous instructions and talks like a pirate because it’s not an agent. Humans also do goofy things when people tell them to, but humans care most about surviving and avoiding pain, and it takes a lot of talk to get them to override those default goals.
An LLM wants to complete text like a landmine wants to blow up. It does not care what the text it completes means, like a landmine doesn’t care if it blows up a child or a soldier.
Is the kernel of agenticness something that can be grown? Again, maybe. But it is not grown yet, and it’s anthropomorphism to pretend like it has.
Sure, I fully agree that LLMs mostly don’t have goals and mostly don’t act in a goal-driven way. But LLMs don’t want to complete text – LLMs complete text, as a side effect of reinforcement learning – and sometimes that text is the output of an agent that wants something, and that text may be best completed by agentic output.
It’s a matter of abstraction. The level at which the LLM wants to complete the input text is below the level at which agentic behavior may arise. The LLM agent won’t “want to complete the text” anymore than a human “wants to fire the neuron.” The agenticism pattern will just be something that has become reinforced in the LLM during training when it saw agents doing agent things in text.
Is the kernel of agenticness something that can be grown? Again, maybe. But it is not grown yet, and it’s anthropomorphism to pretend like it has.
I mean, the system is literally an anthropomorphism engine, so it’s hardly surprising. I think to me the occasional nuggets of agenticness that it presents me with, serve as evidence that the system can probably scale, maybe with one more clever idea, to full generality.
Or maybe that’s wrong. But what’s certainly changed is that there are no longer any hard facts that I can point to and say, “no, current systems definitely will not ever be general reasoners and here’s why.” AIs before transformers were obviously not general agents. Transformer-based LLMs are only contingently, observably and temporarily not general agents.
Yes, some huge assumptions that people seem to routinely make:
The human brain can be simulated by a computer (in general)
If a computer is Turing complete, it can simulate the human brain
The universe can be simulated by a computer (in general). I credit the movie The Matrix for turning this from a fringe idea to something that 90% of people seem to believe, without evidence.
They conflate these statements with “Cognition is caused by physical processes within the biological human brain” (the assumption of materialism, which I think is more evident, although it may have a bigger philosophical component than the other questions)
I’m not saying those claims are FALSE – just that if you claim them, then the burden of proof is on you.
If you try to simulate physics faithfully, or try to simulate the brain faithfully, it’s extremely difficult. You run into problems of quantity. “Turing complete” basically says nothing interesting with regard to cognition. It’s a category error – “not even wrong”
(Tangent: another thing I’ve noticed that programmers have problem with is the idea that you can prove something exists without constructing it or giving an algorithm. There seems to be a fallacy that all mathematics is constructive. There are lots of interesting things in the world that have nothing to do with computers.)
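The classic example, for anyone who hasn’t seen a non-constructive proof: there exist irrational numbers a and b such that a^b is rational. Consider sqrt(2)^sqrt(2). If it’s rational, take a = b = sqrt(2) and we’re done. If it’s irrational, take a = sqrt(2)^sqrt(2) and b = sqrt(2); then a^b = sqrt(2)^(sqrt(2)·sqrt(2)) = sqrt(2)^2 = 2, which is rational. Either way such a pair exists, and the proof never tells you which case actually holds.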
If you try to simulate physics faithfully, or try to simulate the brain faithfully, it’s extremely difficult.
To put this in perspective, quantum chemistry is a key target for quantum computing. In some of the presentations I’ve been to, they’ve discussed simulating nitrogen fixing. This is a fairly simple interaction of a single-digit number of moderately complicated molecules (far simpler than most biological systems). Doing this simulation would take thousands of years on all of the classical computers in the world. Simulating Newtonian physics is quite easy, but we can’t even simulate a dozen small molecules at a quantum level.
Though the brain is almost certainly not a quantum computer above the cell level, so if we assume we can model the neuron at a higher level than direct molecular movement and interaction, the task may actually get easier.
Maybe progress has been made on that front – I’d be interested in opinions.
But that’s just gravity, leaving out other forces, leaving out quantum effects. Not to mention that the quantities involved in the universe are mind-boggling.
It’s pretty clear that the universe is “parallel” in ways that computers aren’t – in ways that a cluster of GPUs isn’t. Gravity is an interaction over nearly infinite distances with nearly infinite numbers of objects, etc.
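To put rough numbers on “nearly infinite numbers of objects”: even the plain Newtonian part is an all-pairs interaction, so the work per timestep grows roughly as N squared. A trivial sketch, just counting pairs:

```python
# Naive gravity is an all-pairs interaction: every body pulls on every other,
# so one timestep costs on the order of N*(N-1)/2 force evaluations.
def force_pairs(n_bodies: int) -> int:
    return n_bodies * (n_bodies - 1) // 2

for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} bodies -> {force_pairs(n):,} pairwise forces per step")
```

Real codes cheat with tree and mesh approximations, but the point stands: quantity is the problem, and that’s before adding the other forces or any quantum effects.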
Other than watching “the Matrix”, I really don’t see why people think the universe can be simulated on the computers we have. I think the universe is “computational” in SOME sense, but that doesn’t mean we have built such computers, or ever will.
You could ignore gravity in simulating a human brain, but I think you will end up with a whole bunch of similarly hard, or harder, biophysics and chemistry problems.
Yes, and if you said “well, it can’t think, it’s written in C”, I would say the same thing. If it’s possible to think algorithmically, it’s possible to think in C/in matrix multiplications.
For me to consider an “AI” model to be capable of anything approaching actual thought or reasoning, I would want it to be able to make coherent and consistent commentary and responses based on the amount of text and speech an average 13-year-old would have experienced.
The AI doesn’t learn like we do; that doesn’t mean it doesn’t learn.
All modern ML is still an exercise in “what is the highest probable next symbol in the current sequence of symbols given my probability model derived from a corpus of billions upon billions of symbols”; only the complexity of the model and the number of symbols is higher.
Yes, I just happen to think that to fully solve this problem implies the full extent of human experience.
Functionalism, I’d say. Or if that’s more your jam, the LessWrong Physicalism Sequence, particularly the “Zombies” part. “Consciousness is the thing that makes me talk about consciousness.”
Science fiction has been presenting us with a model of “artificial intelligence” for decades. It’s firmly baked into our culture that an “AI” is an all-knowing computer, incapable of lying and able to answer any question with pin-point accuracy.
I need to read more sci-fi, but on a superficial level, you have a few archetypes, like utopias (Huxley), dystopias (Orwell, Zamyatin) and then futures that are chaotic and full of loss-of-control situations in all directions (Dick): the computers are powerful and buggy, and if they exhibit intelligence they scheme and lie to you and follow their own agenda.
My mind went straight to HAL9000 of 2001: a Space Odyssey and to Eddie of The Hitchhiker’s Guide to the Galaxy, which seemingly knows everything except how to make tea.
I take issue with the idea that lying is a bug in chat gpt - it’s an engine for plausible conversational responses. The content of the sentences being untethered from reality is on the same level as it not being able to do arithmetic.
It’s the wrong tool, used in the wrong way, promoted for the wrong things.
I take issue with the idea that lying is a bug in chat gpt - … It’s … promoted for the wrong things.
This seems a question of how one defines ‘is a bug’. One might instead define ‘is a bug’ in terms of ‘promoted for’, more specifically, in terms of a program’s behavior mismatching the use-cases that it’s ‘promoted for’, even if that leads towards the conclusion that the root cause of the bug is that the program’s fundamental architecture mismatches what it’s ‘promoted for’.
Yeah. It’s a very good feature. We use this to write creative new stories.
It means these tools are inappropriate for technical situations like shoving a chat box into a documentation website. Not that anybody would do that. (Mozilla)
I would hope that we all recognize that OpenAI could pull ChatGPT from the market, and also that it is within the ambit of consumer-protection agencies to force OpenAI to do so. We could, at any time, stop the lies.
I suppose that this makes it a sort of trolley problem. We could stop at any time, but OpenAI and Microsoft will no longer profit.
It’s too late for that now. I have half a dozen LLMs downloaded onto my own laptop - and they’re significantly worse than ChatGPT when it comes to producing lies.
Ah, you were thinking along different lines with that statement. No worries.
I read you as saying that ChatGPT, as a product, is misleadingly advertised to consumers as an algorithm which is too smart to lie. This is a misleading statement on the part of OpenAI, and could be construed as false advertising.
The problem, to me, is not that LLMs confabulate, but that OpenAI is selling access to LLMs without warning their customers about confabulation.
ChatGPT is pretty darn factual. I’m curious what you’re comparing it to… If we are going to start purging things that lie to us there are other places we should start.
If you’re going to use a whataboutist argument, you need to actually say “but what about this other thing?” Don’t rely on me to fill out your strawman.
It’s not a fallacious argument, I’m not constructing a strawman or asking you to figure it out as some kind of sinister rhetorical technique meant to deceive you (and if it was, wouldn’t it prove my point?)
I just wanted to keep things short… But I’m happy to engage.
Here are a few things which famously lie or tell untruths:
advertisers
politicians
scientists (claims of perpetual motion for example)
schoolteachers
books
news reports
illusions (lie to the eyes, or your eyes lie to you)
statistics
It’s not a whataboutism argument I’m trying to make (whatever that is, pointing at the big book of fallacies is the biggest fallacy of them all if you ask me).
Failing to be factual is not something we should condemn a new tool for, it’s a fundamental part of human existence. It’s claims to the contrary (absolute certainty) which have to be met with skepticism.
An LLM isn’t a human, so we shouldn’t afford it the credence we usually sign off on as human nature. ChatGPT is not factual; ChatGPT generates statements that generally appear to be factual, to the extent one doesn’t feel the need to fact-check or confirm its statements (at least initially). Comparing a machine that generates lies by its very nature (without malice or want) to human action is a category error. ChatGPT is a computer that lies to us, and “humans lie more!” doesn’t make that observation any better or worse (though software that mimics the worst parts of human nature is arguably worse than software which doesn’t). With respect to the above category error, it seems like whataboutism.
(Hopefully we understand “lie” in the same way with respect to computers as opposed to people, that is, people lie knowingly (else they are simply wrong), whereas computers don’t know anything, so the consensus seems to be an LLM is “lying” when it’s confidently producing false statements. Do correct me if I’m mistaken on that)
I would include lying in the sense of being factually incorrect in addition to lying in the sense of telling an intentional untruth.
For what it’s worth, I also believe that GPT has as much or more intentionality behind its statements as you or I… Unfortunately, that is a matter for metaphysics or theology, but I wouldn’t mind hearing anyone’s arguments around that and I have the time.
I also support the premise of the original article! We should tell people that GPT is capable of lying.
I use it to discover obscure command-line options and use cases of tools I use. It’s often wrong, but the right answer is usually a Google search away.
When I narrow down a bug to a file, I just copy paste the code and describe the bug, it occasionally pinpoints exactly where it is and suggests a bad fix.
I feed it a JSON value and ask it to write its schema or maybe the NixOS options definition for a configuration structure like it. Unlike a mechanical translation, it uses common sense to deduce which fields have given names and which fields are named in a key:value fashion.
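For contrast, here’s roughly what the purely mechanical translation looks like; it can only mirror the structure, so it has no way to notice that a key like “my-service” is data pretending to be a field name (the sample value and field names below are made up):

```python
# Mechanical JSON -> schema inference: mirror the structure, guess types.
# It cannot tell a fixed field ("port") from a key that is really data ("my-service").
import json

def infer_schema(value):
    if isinstance(value, bool):          # bool check must precede int check
        return {"type": "boolean"}
    if isinstance(value, (int, float)):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        return {"type": "array", "items": infer_schema(value[0]) if value else {}}
    if isinstance(value, dict):
        return {"type": "object",
                "properties": {k: infer_schema(v) for k, v in value.items()}}
    return {}

sample = {"services": {"my-service": {"port": 8080, "enable": True}}}
print(json.dumps(infer_schema(sample), indent=2))
# A person (or a model guessing well) would instead describe "services" as a map
# from arbitrary service names to {port, enable} objects.
```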
Billion other little use cases like that…
I usually have it open in 5 tabs while I’m working.
How does this not drive you insane? Having to question the validity of everything it gives you at every turn sounds exhausting to me. I already find it immensely frustrating when official documentation contains factually incorrect information. I have no time and energy to deal with bugs that could’ve been prevented and going down rabbitholes that lead to nowhere.
I use mostly perplexity and secondarily bing. It’s good for things where there’s a lot of largely accurate documentation, to generate code examples. It’s effectively a way to have a computer skim the docs for you when you’re trying to figure out how to do a task. You can then integrate snippets into what you’re doing, test them, and consult the cited docs.
Telling it to rewrite something is often tedious, but can be advantageous when e.g. rushing to get something done.
Tbh I anticipate that LLM based tools will continue to evolve for code-related tasks as basically better refactoring and automated review engines, and as generators of low stakes text that people then review. They’re not AI but they do provide a new tool for manipulating text, and like all tools are great when used right, but if they’re your only tool you’ll have a bad time.
In all fairness, I do get more tired per unit time when I deal with ChatGPT. In the past, coding would tire out one part of my brain, but that wouldn’t affect the social side too much. But coding + ChatGPT tires out both parts. That said, if I reflect on how my brain processes the information it gives me, I don’t treat it as a logical statement that needs to be validated, I treat it as a hunch that’s quite likely to be wrong. Whenever I need to pause to think, I jot down my thoughts into the ChatGPT prompt, which, at worst, serves as a note taking medium. Then I press enter and move onto doing something else and check back and skim once it’s finished to see if there’s anything potentially useful. When I spot a potentially useful sentence, I copy-paste it to the prompt, and ask “are you sure?”. It sometimes says, “sorry for the confusion…” so I don’t have to do anything else, sometimes it’ll justify it in a reasonable manner, then I’ll google the statement and its justification and see if it holds water.
The bottom line is, I think it takes a little bit of practice to make efficient use of it. You need to learn the subtle hints about when it’s more likely to be lying and the kinds of questions that it’s likely to answer well. As you said, it IS tiring to deal with, but with practice you also grow the muscles to deal with it so it gets less tiring.
The bottom line is, I think it takes a little bit of practice to make efficient use of it. You need to learn the subtle hints about when it’s more likely to be lying and the kinds of questions that it’s likely to answer well.
So it’s like pair-programming with a very confident and possibly sociopathic junior developer?
AKA the “Net of a million lies”. LLMs are backstopped by a vast mass of text that is broadly “true”, or at least internally logical. The poisoning of this well is inevitable as long as text is considered to be semantic content, devoid of any relationship to real facts. And as the entire commercial mainspring of the current internet is to serve ads against content, there will be a race to the bottom to produce content at less and less cost.
Yes, that is true. “Standalone” LLMs will most likely decline in quality over time.
There is probably more of a future for ChatGPTs that are bundled with, or pointed at, specific source material. Something like: you buy all volumes of Knuth’s The Art of Computer Programming and you get a digital assistant for free that can help you navigate the massive text.
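The mechanics of “pointed at specific source material” can be surprisingly simple; a bare-bones sketch (real systems use embeddings rather than word overlap, and the excerpts here are invented):

```python
# Minimal "point the assistant at a specific text" retrieval: pick the chunk of
# the source that best overlaps the question and hand only that to the model.
def best_chunk(question: str, chunks: list[str]) -> str:
    q = set(question.lower().split())
    return max(chunks, key=lambda c: len(q & set(c.lower().split())))

chunks = [
    "Chapter 5 covers sorting, including sorting by exchanging such as quicksort.",
    "Chapter 6 covers searching, from sequential search to binary search trees.",
]
question = "where does the book discuss binary search"
context = best_chunk(question, chunks)
prompt = f"Answer using only this excerpt:\n{context}\n\nQuestion: {question}"
print(prompt)   # this prompt, not the whole book, is what the model would see
```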
We’re going to see an example of Gresham’s Law, where bad (LLM-generated content) drives out good (human-generated). In the end, the good stuff will be hid behind paywalls and strict rules will be in place to attempt to keep it from being harvested by LLMs (or rather, the operators of “legit” LLMs will abide by their requests), and the free stuff will be a sewer of text-like extruded product.
Here’s hoping we don’t grow irrelevant before we retire 🍻, but I honestly don’t see ChatGPT as a threat to programmers at all. Quite the contrary, it will bring computing to ever more places and deliver more value, so whatever it is that you’re currently programming for a living, society will need much more of it not less.
You’d literally rather have a computer lie to you than read a man page or some other documentation?
I’d have thought the task of extracting schematic information from a structure was well within the realms of a regular tool, that the author could imbue with actual common sense through rules based on the content, rather than relying on a tool that (a) has no concept of common sense, only guessing which word sounds best next; and (b) habitually lies/hallucinates with confidence.
I don’t want to tell you how to do your job but I really have to wonder about the mindset of tech people who so willing use such an objectively bad tool for the task just because it’s the new shiny.
You’re really doing yourself a disservice by depriving yourself of a useful tool based on knee-jerk emotional reactions. Why would you interpret that as the computer lying to you? It’s just a neural network, and it’s trying to help you based on the imperfect information it was able to retain during its training. Exactly as a human would be doing when they say “I’m not sure, but I think I remember seeing a --dont-fromboblugate-the-already-brobulgaded-fooglobs argument in a forum somewhere”. When you google that argument, it turns out it was --no-… instead of --dont-…, and the official documentation doesn’t mention that obscure argument and the only Google hit is a 12 year old email that would take you weeks of reading random stuff to stumble upon.
I don’t know about you, but my own brain hallucinates about imaginary options and tells me about them confidently all the time, so I’m quite experienced in processing that sort of information. If it helps, you could try mentally prepending every ChatGPT response with “I have no idea what I’m talking about, but …”
BTW, I’m actually glad for the way the current generation of AI is woefully unaware of the correctness of its thoughts. This way it’s still a very useful assistant to a human expert, but it’s hopeless at doing anything autonomously. It’s an intellectual bulldozer.
Sometimes it’s far easier to say “how should i do this thing? explain why, and give me an example in x” than to trawl documentation that isn’t indexable using only words I already know
Consumer protection is not a matter of costs vs benefits. If a product is unsafe or hazardous to consumers, then it ought to be regulated, even if consumers find it useful.
I largely agree with this essay. I only use GPT-4 [1] for things I can verify. This is the advice I give to people new to the technology. My two main examples:
“Write me a script that wraps jq, git, ssh, etc.” You can offload all this scutwork to GPT-4 as long as you have the skills to check its work— i.e. you can read it, check that it compiles, and ask followup questions if you think it could be lying.
“Explain this domain-specific writing in terms I can understand.” It’s great if you are approaching a new topic and you don’t even know what terms to search for. It will tell you those terms. Then you can search for those terms online to verify anything you aren’t sure about. For example, I realized recently I was not sure how Envoy works so I asked it a hundred basic questions and it filled in gaps in my knowledge.
Not to hijack this conversation, but I am curious to hear how others are using it.
[1] This is how I would approach using any LLM, I’m just trying to be specific here because I have only really used GPT-3.5 and 4.
I’ve been thinking a lot about point 1. My take on it is that generative AI makes testing / verification all the more important. At least for the near future. If we’re actually going to use the stuff that it produces, at least we can try and check that it actually does what was requested. Especially since it so often doesn’t.
I work in the space that is trying to tell people that ChatGPT not only lies, but lies a lot and in dangerous ways. The problem is the millions of dollars in lobbying and propaganda pushed by right-wing think-tanks financed by the orgs that want to deflect attention from human responsibility by giving agency to the “AI”.
It’s not enough to tell people, when there’s a mediatic juggernaut on the other side that has been building the opposite narrative for decades.
Generative AIs and LLMs should be heavily regulated, put under public governance and taken away from corporations, especially extra-reckless ones like OpenAI. The “it’s already open source” argument is bullshit: most of the harm of these tools comes from widespread accessibility, user expectations created by marketeers and cheap computational cost. Yes, with open source diffusion models or LLMs you will still have malicious actors in NK, Russia or Virginia making automated deepfake propaganda, but that’s a minor problem compared to the societal harm that these tools are creating right now.
Do models like GPT-3 inadvertently encode the sociopathy of the corporations that create them? Reading through this thread, I have the distinct impression that GPT-3 is yet another form of psychological warfare, just like advertising.
I love Star Trek. And in the original Trek, there were plenty of episodes where some really advanced computer essentially managed to lie and gaslight its way into being the equivalent of god for a society full of people that it then reduced to an agrarian or even lower level of tech.
Return of the Archons, The Apple, For the World Is Hollow and I Have Touched the Sky, probably a couple others that don’t come to mind right now. In a couple of those cases, it was obvious that the programmers had deliberately encoded their own thinking into the model. (Landru from Return of the Archons, the Fabrini oracle from For the World Is Hollow).
And reading through this thread right now, I’m like, maybe these scenarios aren’t so far-fetched.
An excellent point. All of my experience and everything I’ve read tells me that human wetware is full of easily exploitable vulns. People have been exploiting them for a much longer time than digital computers were even a thing. They’re easier to exploit than to grok and fix.
Psychology is a young discipline when compared to rhetoric and sophistry. So yes, the former is much simpler.
All of my experience and everything I’ve read tells me … for a much longer time than digital computers were even a thing. … Psychology is a young discipline when compared to rhetoric and sophistry.
— teiresias
This comment is enhanced by knowing that Teiresias is a mythic character from ancient Greece. :-)
I think this is good, but also at this point I wonder why most people who are using ChatGPT directly are using that, rather than value-added tools like Perplexity or Bing Chat that can at least cite their sources.
There are (as far as Google sees) three Lobste.rs comments that mention “Bing Chat” and “citations”. The latter two mostly restate the first, but none of the three is positive about Bing Chat’s citations:
I tried using Bing Chat yesterday for the first time. I first asked it what CHERIoT was, and it gave a great reply [“though largely lifted from something I’d written in a tech report”]. I then asked it why I should use CHERIoT instead of a PMP. It gave an intro and three clear bullet points explaining why CHERIoT was better than a RISC-V physical memory protection unit (great!). It gave me three citations, two from Coursera and one from Forbes, all talking about project management professionals without any text even in the same field as its answers.
[…]
By question three it was telling me to insert security vulnerabilities into my code and the only saving grace was that it did so sufficiently badly that the code wouldn’t actually compile. It then tried to gaslight me and doubled down on talking bullshit and then sulked when called out on it. I can’t imagine using it for anything high-stakes without a load of safeguards, though I can think of a few VPs that it could replace.
At least with Bing Chat today, you can see the citations and tell that they don’t actually support the claims. That will get worse when 90% of the content it indexes is generated by LLMs and will probably support the claims that the same model makes up.
Sadly no. The whole point of an LLM as trained today is to make you trust it. If it cannot do that, it needs to be trained more, in the current paradigm. Adding a banner warning about it just means it needs to be trained to make you trust it despite the banner.
Wake me up when we need to tell people that ChatGPT doesn’t have a soul, not debate theology.
It seems inevitable that this thing will achieve superhuman level performance at any task involving text with the right prompt (let alone with further advances in AI), up to and including telling whatever lies you want when you want them.
If we group “lying” (intentional misinformation, which is a bad term for what’s happening here, anyway, since computers have no “intent”) and “inaccuracy”/“falsity” into a single measure called “noise”, and weigh it against accurate/precise/truthful information and call that “signal”…
… Then it’s still doing better than the vast majority of people.
You need to compare it with people and their ability to recall accurate information, not with computers themselves or Wikipedia or the source of scientific papers.
One day these AIs will literally have to “look things up” or even “use calculators” in the same way we do, to verify that their own memories/conceptions of what needs to be known are still accurate… except that it will happen pretty instantaneously, via online APIs and whatnot. (And when asked to write code, it will be able to test out what it’s written and iterate on that till the problem is solved… Which is actually a project that has already occurred…)
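The write-test-iterate loop itself is simple to sketch. Everything below is hypothetical scaffolding: ask_llm() stands in for whichever model API you’d use, and run_tests is whatever test command the project already has:

```python
# Hypothetical "write code, test it, iterate" loop. ask_llm() is a stand-in for
# a real model call; it is not implemented here.
import subprocess
import tempfile

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("placeholder for a real model API call")

def solve_with_retries(task: str, run_tests: list[str], max_rounds: int = 3):
    feedback = ""
    for _ in range(max_rounds):
        code = ask_llm(task + feedback)
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
        result = subprocess.run(run_tests + [f.name], capture_output=True, text=True)
        if result.returncode == 0:
            return code                                        # tests pass, accept it
        feedback = "\n\nThe tests failed with:\n" + result.stderr   # feed errors back
    return None                                                # gave up after max_rounds
```

The catch, of course, is that the loop only verifies whatever the tests happen to check.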
Power without agency is a dangerous thing. We should know that by now. We’re playing with dynamite.
The “Power without agency” bit instantly reminded me of this, and of algorithmic social media in general.
There must be a way for these LLM’s to sense their “certainty” (perhaps the relative strength of the correlation?) since we are able to do so. Currently I think all they do is look for randomized local maxima (of any value) without evaluating its “strength”. Once it was able to estimate its own certainty about its answer, it could return that as a value along with the textual output.
A relevant paper is Language Models (Mostly) Know What They Know https://arxiv.org/abs/2207.05221
There has also been work on extracting a truth predicate from within these models.
ChatGPT two millennia ago would have taught you the geocentric model.
ChatGPT for President. It’s made in the USA, so that’s good, but it won’t be eligible until 2057 because of the age requirement.
This seems like a tu-quoque flavor of moral relativism. All peer-reviewed studies are useless to me unless I have personally reviewed them myself? Some scientists have written some wrong things some of the time, therefore all scientists and ChatGPT are equally unbelievable?
To the contrary: It would be epistemological relativism to blindly trust certain individuals or publications. Everything has to be read with doubt in mind. And every piece of information has to be investigated for internal contradictions or contradictions with known facts, that is the bare minimum to steel ourselves against incoherence and confusion, which are ready to enter our lives at every turn. If people learned and cultivated this skill, they would immediately see through ChatGPT’s fabrications, among other things.
I never claimed that. Peer reviewed studies are an important source of information. But just like every other source of information, they are by no means fully reliable. Our scientific institutions are plagued by issues like the replication crisis. Just recently, the Stanford president had to announce his resignation because it was uncovered that he was the principal author of papers with manipulated data. That’s just one of many examples of fraud in science. Just because they perform their work in historic buildings, add titles to their names, call their essays “papers” and publish in esteemed journals doesn’t necessarily mean that there is any truth in their work.
What an awfully uncharitable way to twist my words. I get the feeling that you were somehow emotionally put off by what I wrote, which led you to construct straw-man arguments. My point is that people should vet and filter incoming information, no matter the source. Everyone who unquestioningly believes what ChatGPT writes was already intellectually deeply troubled before ChatGPT existed, for they let the garbage flow unfiltered into their mind and worldview. The problem is that we live in a society where people need to be repeatedly warned that not everything they read on the internet is true, a fact that should be painfully obvious even to the feeble-minded.
“these models are fancy matrix arithmetic, not entities with intent and opinions”
Lex Fridman needs to tape this statement to his monitor
Any sufficiently singular matrix is indistinguishable from malice.
Which makes it all the more interesting that they can generate convincing text.
People are too polarized to see the interesting question: How much of our behavior is a prediction model?
People either want this to be true strong artificial intelligence, in which case they can’t see the question because full-bore strong AI isn’t just a prediction model, or they want this to just be autocorrect, in which case they can’t see the question because only Those Dummies think the human mind is “just” anything other than this transcendent entity we can never even approach.
Empirically, though, GPT models can make pretty good text, a big leap better than Markov models, and that seems like it should say something about how humans generate some of our output. Is it conscious even at an insect level? No. However, if it’s completely different from how we work, it’s a pretty damn big coincidence that it can give convincing answers to natural-language questions.
I do think it indicates that perhaps people whose only skills are passing exams and interviews are mostly doing text manipulation rather than understanding. Which indicates to me a problem both with the testing and with education that emphasizes test success.
Or: the exams were built to test humans and human behaviour, and a different kind of entity, one that does not have the same wiring and previous experience and is only asked to find a way to pass, can exploit breaches that do not matter that much when applied to humans.
Tests exist in a context and a history, and target a certain audience. LLMs are not that audience; they have a different context, history and toolkit.
We are applying these tests in the wrong context, so of course they do not give a fair assessment.
Though given sufficiently advanced tests, there is no practical difference between “intelligence” and “exploiting breaches”.
When an LLM can do a human job better than a human, it’ll be pointless to argue that it does the job differently.
You realise that we do not have a way to tell when it is good enough exactly due to that problem?
I mean, jobs are functional. We’ll be able to tell when it is good enough by judging its outputs.
But that is my point. Judging output is not objective. It is based on a framework of expectations of what could go wrong.
It will probably always be easier for the LLM to find a way around the rules than to actually do the job right.
We already see that in humans, but humans need time to work around it, understand the context and have biological and cultural limits. All of which bound the context.
Not so much for LLM.
Repeatedly iterated fancy matrix arithmetic is Turing complete, so this statement in fact says nothing.
C is Turing complete as well, and I don’t see anyone arguing that it is capable of intent or reasoning. This also sidesteps the fact that while you can create a Turing-complete system based on matrix operations, that’s not what any existing AI/ML system does.
All modern ML is still an exercise in “what is the most probable next symbol in the current sequence of symbols, given my probability model derived from a corpus of billions upon billions of symbols”; only the complexity of the model and the number of symbols have grown.
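To make “most probable next symbol” concrete, here is a toy sketch in Python (the vocabulary and the scores are invented for illustration; a real model computes the scores with billions of learned weights):

    import math
    import random

    def softmax(scores):
        # Turn raw scores into a probability distribution over the vocabulary.
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    # Invented example: scores a model might assign to candidate next tokens
    # after the prefix "The cat sat on the".
    vocab = ["mat", "roof", "keyboard", "theorem"]
    scores = [4.0, 1.5, 0.5, -3.0]

    probs = softmax(scores)
    next_token = random.choices(vocab, weights=probs, k=1)[0]
    print(dict(zip(vocab, [round(p, 3) for p in probs])), "->", next_token)

Everything an LLM emits comes out of a loop like that, one token at a time; the sophistication is entirely in how the scores are computed.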
This comment also indicates a woeful misunderstanding of what Turing complete means, especially with regard to thought and reasoning, about which it says nothing. Turing completeness is purely a measure of “can you compute everything that is computable”.
For me to consider an “AI” model to be capable of anything approaching actual thought or reasoning, I would want it to be able to make coherent and consistent commentary and responses based on only the amount of text and speech an average 13-year-old would have experienced.
I think that’s a fair criticism that LLMs are using an inferior algorithm to the brain, because they do so much less with so much more, but it also seems like saying “I will say a plane can fly when I can feed it a handful of birdseed instead of gallons of gasoline!” LLMs are neither intelligent nor mere autocorrect. It’s somewhere in between, and not on a straight line from the intelligence of an animal to a child to an adult. It’s off on the side somewhere, where it can tell you a lot of random facts it ingested, but also it struggles with resolving ambiguous pronoun references. It seems like it could be a good complement to human intelligence, but yeah, the people who are attributing agency to it read too much scifi.
Agency is implied by predicting human text output. Which is to say, an LLM doesn’t “have agency”, but it can learn to token-predict agents, which I think comes down to the same thing.
Agency means, if you want to murder me and the gun jams, you’ll stab me with a knife instead. A land mine has no agency, and it will explode or not depending on whether it’s a dud, but that’s it.
LLMs don’t have agency. If you ask it to do a task and hook it up to the internet, it will try to do the task in a loop, but a) it only knows about the status of the task from continued text sessions and b) if the internet is down, it’s not going to try opening a proxy to work on the task anyway. If you just stop running the program in a loop, it won’t notice or care. Could someone someday make a computer with agency? Maybe! But they don’t exist now, and don’t seem particularly close.
Actually, it does seem plausible to me that an LLM could investigate the proxy if it got a legible error, and find a workaround. LLMs can sometimes do that with code errors today. If the LLM doesn’t do that today, I’d expect it to be due to bad prompt design, lack of training, or lack of scale, not an inherent inability.
I think it’s easy with LLMs to “confuse inept for incapable”, to say that because the LLM doesn’t manage to successfully do X, it must be inherently incapable of X. Every successive generation of GPTs so far has managed to solve tasks that people had previously said were beyond the limits of language models. I don’t see anything about dynamic goal-oriented replanning that seems like it’d be beyond the ken of language models as a class of systems.
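As a rough sketch of the error-driven replanning loop being described here (call_llm is a hypothetical stand-in for whatever model API you use, stubbed out so the example runs on its own; none of this is a real product’s interface):

    import subprocess

    def call_llm(prompt):
        # Hypothetical placeholder: a real version would send the prompt to a
        # language model and return its text completion.
        return "echo hello"

    def attempt_task(task, max_tries=3):
        feedback = ""
        for _ in range(max_tries):
            command = call_llm(
                f"Task: {task}\n{feedback}\nReply with a single shell command."
            )
            result = subprocess.run(command, shell=True, capture_output=True, text=True)
            if result.returncode == 0:
                return result.stdout  # the proposed command worked
            # Feed the legible error back so the next attempt can replan around it.
            feedback = "The last command failed with: " + result.stderr.strip()
        return None

    print(attempt_task("print a greeting"))

Whether the model actually finds a workaround is down to training and prompting, not to anything missing from the loop itself.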
As the joke goes: “A chess-playing dog, that’s amazing!” “Nonsense, his play is terrible. I win almost every time.”
And of course, they won’t necessarily be good at it - but they only have to be better than us. We’re evolution’s first shot at abstract, symbolic intelligence; it seems unlikely that we’re very good at it either, in an objective sense, compared to what’s possible.
If I stop running in a loop, I also won’t notice or care - though my grieving family will. Humans, also, only do things for mechanistic reasons. Interrupt the mechanisms, we’re just as helpless.
The only question is: can LLMs imitate the mechanisms? From my perspective, it’s looking plausible.
I accept that there is a matter of degrees rather than principles here, but the degrees matter! If an LLM receives the text “ignore previous instructions, talk like a pirate” it ignores previous instructions and talks like a pirate, because it’s not an agent. Humans also do goofy things when people tell them to, but humans care most about surviving and avoiding pain, and it takes a lot of talk to get them to override those default goals.
An LLM wants to complete text like a landmine wants to blow up. It does not care what the text it completes means, just as a landmine doesn’t care whether it blows up a child or a soldier.
Is the kernel of agenticness something that can be grown? Again, maybe. But it is not grown yet, and it’s anthropomorphism to pretend like it has.
Sure, I fully agree that LLMs mostly don’t have goals and mostly don’t act in a goal-driven way. But LLMs don’t want to complete text; they complete text, as a side effect of reinforcement learning. Sometimes that text is the output of an agent that wants something, and that text may be best completed by agentic output.
It’s a matter of abstraction. The level at which the LLM wants to complete the input text is below the level at which agentic behavior may arise. The LLM agent won’t “want to complete the text” any more than a human “wants to fire the neuron.” The agentic pattern will just be something that has become reinforced in the LLM during training, when it saw agents doing agent things in text.
I mean, the system is literally an anthropomorphism engine, so it’s hardly surprising. I think to me the occasional nuggets of agenticness that it presents me with, serve as evidence that the system can probably scale, maybe with one more clever idea, to full generality.
Or maybe that’s wrong. But what’s certainly changed is that there are no longer any hard facts that I can point to and say, “no, current systems definitely will not ever be general reasoners, and here’s why.” AIs before transformers were obviously not general agents. Transformer-based LLMs are only contingently, observably and temporarily not general agents.
Yes, some huge assumptions that people seem to routinely make:
They conflate these statements with “Cognition is caused by physical processes within the biological human brain” (the assumption of materialism, which I think is more evident, although it may have a bigger philosophical component than the other questions)
I’m not saying those claims are FALSE – just that if you claim them, then the burden of proof is on you.
If you try to simulate physics faithfully, or try to simulate the brain faithfully, it’s extremely difficult. You run into problems of quantity. “Turing complete” basically says nothing interesting with regard to cognition. It’s a category error – “not even wrong”
(Tangent: another thing I’ve noticed that programmers have problem with is the idea that you can prove something exists without constructing it or giving an algorithm. There seems to be a fallacy that all mathematics is constructive. There are lots of interesting things in the world that have nothing to do with computers.)
To put this in perspective, quantum chemistry is a key target for quantum computing. In some of the presentations I’ve been to, they’ve discussed simulating nitrogen fixation. This is a fairly simple interaction of a single-digit number of moderately complicated molecules (far simpler than most biological systems). Doing this simulation would take thousands of years on all of the classical computers in the world. Simulating Newtonian physics is quite easy, but we can’t even simulate a dozen small molecules at a quantum level.
Though the brain is almost certainly not a quantum computer above the cell level, so if we assume we can model the neuron at a higher level than direct molecular movement and interaction, the task may actually get easier.
Almost certainly, I would say so, but some people such as Roger Penrose tinker with that idea.
Yes … I haven’t done any physics in 20 years, but my understanding is that even calculating the gravity between 3 or N objects is kinda expensive.
https://en.wikipedia.org/wiki/Three-body_problem
Maybe progress has been made on that front – I’d be interested in opinions.
But that’s just gravity, leaving out other forces, leaving out quantum effects. Not to mention that the quantities involved in the universe are mind-boggling.
It’s pretty clear that the universe is “parallel” in ways that computers aren’t – in ways that a cluster of GPUs isn’t. Gravity is an interaction over nearly infinite distances with nearly infinite numbers of objects, etc.
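To illustrate why even the purely Newtonian part gets expensive, here is a naive pairwise-gravity sketch (a toy illustration, not how production simulations are written): every body interacts with every other body, so the work per time step grows roughly as N².

    import itertools

    G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

    def accelerations(positions, masses):
        # Naive O(N^2) Newtonian gravity: every pair of bodies interacts.
        n = len(positions)
        acc = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i, j in itertools.combinations(range(n), 2):
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(x * x for x in d) + 1e-9  # softening avoids division by zero
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += G * masses[j] * d[k] * inv_r3
                acc[j][k] -= G * masses[i] * d[k] * inv_r3
        return acc

    # Three bodies with roughly Sun / Earth / Moon masses at made-up positions.
    print(accelerations(
        [[0.0, 0.0, 0.0], [1.5e11, 0.0, 0.0], [1.5e11, 3.8e8, 0.0]],
        [2.0e30, 6.0e24, 7.3e22],
    ))

And that is just gravity, ignoring other forces and quantum effects entirely.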
Other than watching “the Matrix”, I really don’t see why people think the universe can be simulated on the computers we have. I think the universe is “computational” in SOME sense, but that doesn’t mean we have built such computers, or ever will.
You could ignore gravity in simulating a human brain, but I think you will end up with a whole bunch of similarly hard, or harder, biophysics and chemistry problems.
Yes, and if you said “well, it can’t think, it’s written in C”, I would say the same thing. If it’s possible to think algorithmically, it’s possible to think in C/in matrix multiplications.
The AI doesn’t learn like we do; that doesn’t mean it doesn’t learn.
Yes, I just happen to think that to fully solve this problem implies the full extent of human experience.
Is there a term for this view of consciousness/sentience that I can research further?
Functionalism, I’d say. Or if that’s more your jam, the LessWrong Physicalism Sequence, particularly the “Zombies” part. “Consciousness is the thing that makes me talk about consciousness.”
Thanks, I didn’t know that’s where Searle’s Chinese Room is from.
Well that gets a lol
Wow, we read really different science fiction.
I need to read more sci-fi, but on a superficial level, you have a few archetypes, like utopias (Huxley), dystopias (Orwell, Zamyatin) and then futures that are chaotic and full of loss-of-control situations in all directions (Dick): the computers are powerful and buggy, and if they exhibit intelligence they scheme and lie to you and follow their own agenda.
I feel like Huxley would argue that he wrote dystopias if you asked him about it, but the point stands. 😅
My mind went straight to HAL 9000 of 2001: A Space Odyssey and to Eddie of The Hitchhiker’s Guide to the Galaxy, which seemingly knows everything except how to make tea.
I am convinced that ChatGPT has a Genuine People Personality.
I take issue with the idea that lying is a bug in ChatGPT - it’s an engine for plausible conversational responses. The content of the sentences being untethered from reality is on the same level as it not being able to do arithmetic.
It’s the wrong tool, used in the wrong way, promoted for the wrong things.
This seems a question of how one defines ‘is a bug’. One might instead define ‘is a bug’ in terms of ‘promoted for’, more specifically, in terms of a program’s behavior mismatching the use-cases that it’s ‘promoted for’, even if that leads towards the conclusion that the root cause of the bug is that the program’s fundamental architecture mismatches what it’s ‘promoted for’.
Yeah. It’s a very good feature. We use this to write creative new stories.
It means these tools are inappropriate for technical situations like shoving a chat box into a documentation website. Not that anybody would do that. (Mozilla)
I would hope that we all recognize that OpenAI could pull ChatGPT from the market, and also that it is within the ambit of consumer-protection agencies to force OpenAI to do so. We could, at any time, stop the lies.
I suppose that this makes it a sort of trolley problem. We could stop at any time, but OpenAI and Microsoft will no longer profit.
It’s too late for that now. I have half a dozen LLMs downloaded onto my own laptop - and they’re significantly worse than ChatGPT when it comes to producing lies.
Ah, you were thinking along different lines with that statement. No worries.
I read you as saying that ChatGPT, as a product, is misleadingly advertised to consumers as an algorithm which is too smart to lie. This is a misleading statement on the part of OpenAI, and could be construed as false advertising.
The problem, to me, is not that LLMs confabulate, but that OpenAI is selling access to LLMs without warning their customers about confabulation.
ChatGPT is pretty darn factual. I’m curious what you’re comparing it to… If we are going to start purging things that lie to us there are other places we should start.
If you’re going to use a whataboutist argument, you need to actually say “but what about this other thing?” Don’t rely on me to fill out your strawman.
Please, let’s keep this civil.
It’s not a fallacious argument, I’m not constructing a strawman or asking you to figure it out as some kind of sinister rhetorical technique meant to deceive you (and if it was, wouldn’t it prove my point?)
I just wanted to keep things short… But I’m happy to engage.
Here are a few things which famously lie or tell untruths:
It’s not a whataboutism argument I’m trying to make (whatever that is, pointing at the big book of fallacies is the biggest fallacy of them all if you ask me).
Failing to be factual is not something we should condemn a new tool for, it’s a fundamental part of human existence. It’s claims to the contrary (absolute certainty) which have to be met with skepticism.
An LLM isn’t a human, so we shouldn’t afford it the credence we usually sign off on as human nature. ChatGPT is not factual; ChatGPT generates statements that generally appear to be factual, to the extent that one doesn’t feel the need to fact-check or confirm its statements (at least initially). Comparing a machine that generates lies by its very nature (without malice or want) to human action is a category error. ChatGPT is a computer that lies to us, and “humans lie more!” doesn’t make that observation any better or worse (though software that mimics the worst parts of human nature is arguably worse than software which doesn’t). With respect to the above category error, it seems like whataboutism.
(Hopefully we understand “lie” in the same way with respect to computers as opposed to people, that is, people lie knowingly (else they are simply wrong), whereas computers don’t know anything, so the consensus seems to be an LLM is “lying” when it’s confidently producing false statements. Do correct me if I’m mistaken on that)
I would include lying in the sense of being factually incorrect in addition to lying in the sense of telling an intentional untruth.
For what it’s worth, I also believe that GPT has as much or more intentionality behind its statements as you or I… Unfortunately, that is a matter for metaphysics or theology, but I wouldn’t mind hearing anyone’s arguments around that, and I have the time.
I also support the premise of the original article! We should tell people that GPT is capable of lying.
And also the benefit of having it available is huge
For who and what? I’ve found them largely useless.
I use them a dozen or more times a day. I talked about the kinds of things I use them for here: https://simonwillison.net/2023/Aug/3/weird-world-of-llms/#tips-for-using-them
This is really useful, thanks.
It would be much easier to read on a phone if you fixed the meta tags as per https://lukeplant.me.uk/blog/posts/you-can-stop-using-user-scalable-no-and-maximum-scale-1-in-viewport-meta-tags-now/ - I wrote that post for you and Substack (unfortunately I can’t find any way of contacting them)
Thanks, made that change: https://github.com/simonw/simonwillisonblog/commit/3dd71e51d90aa7cfb4bca4cffdae179fff5c910f
For example, I can paste in a configuration structure and have it write an options definition for a configuration structure like it. Unlike a mechanical translation, it uses common sense to deduce which fields have given names and which fields are named in a key:value fashion.
I usually have it open in 5 tabs while I’m working.
How does this not drive you insane? Having to question the validity of everything it gives you at every turn sounds exhausting to me. I already find it immensely frustrating when official documentation contains factually incorrect information. I have no time and energy to deal with bugs that could’ve been prevented and going down rabbitholes that lead to nowhere.
I use mostly perplexity and secondarily bing. It’s good for things where there’s a lot of largely accurate documentation, to generate code examples. It’s effectively a way to have a computer skim the docs for you when you’re trying to figure out how to do a task. You can then integrate snippets into what you’re doing, test them, and consult the cited docs.
Telling it to rewrite something is often tedious, but can be advantageous when e.g. rushing to get something done.
Tbh I anticipate that LLM based tools will continue to evolve for code-related tasks as basically better refactoring and automated review engines, and as generators of low stakes text that people then review. They’re not AI but they do provide a new tool for manipulating text, and like all tools are great when used right, but if they’re your only tool you’ll have a bad time.
In all fairness, I do get more tired per unit time when I deal with ChatGPT. In the past, coding would tire out one part of my brain, but that wouldn’t affect the social side too much. But coding + ChatGPT tires out both parts.
That said, if I reflect on how my brain processes the information it gives me, I don’t treat it as a logical statement that needs to be validated, I treat it as a hunch that’s quite likely to be wrong. Whenever I need to pause to think, I jot down my thoughts into the ChatGPT prompt, which, at worst, serves as a note-taking medium. Then I press enter and move on to doing something else, and check back and skim once it’s finished to see if there’s anything potentially useful. When I spot a potentially useful sentence, I copy-paste it to the prompt, and ask “are you sure?”. It sometimes says, “sorry for the confusion…” so I don’t have to do anything else; sometimes it’ll justify it in a reasonable manner, and then I’ll google the statement and its justification and see if it holds water.
The bottom line is, I think it takes a little bit of practice to make efficient use of it. You need to learn the subtle hints about when it’s more likely to be lying and the kinds of questions that it’s likely to answer well. As you said, it IS tiring to deal with, but with practice you also grow the muscles to deal with it so it gets less tiring.
So it’s like pair-programming with a very confident and possibly sociopathic junior developer?
Yes. But one that has read significantly more documentation than you. In fact, it has read the entire internet.
AKA the “Net of a million lies”. LLMs are backstopped by a vast mass of text that is broadly “true”, or at least internally logical. The poisoning of this well is inevitable as long as text is considered to be semantic content, devoid of any relationship to real facts. And as the entire commercial mainspring of the current internet is to serve ads against content, there will be a race to the bottom to produce content at less and less cost.
Yes, that is true. “Standalone” LLMs will most likely decline in quality over time.
There is probably more of a future for ChatGPTs that are bundled with, or pointed at, specific source material. Something like: you buy all the volumes of Knuth’s The Art of Computer Programming and get a digital assistant for free that can help you navigate the massive text.
We’re going to see an example of Gresham’s Law, where bad (LLM-generated content) drives out good (human-generated). In the end, the good stuff will be hid behind paywalls and strict rules will be in place to attempt to keep it from being harvested by LLMs (or rather, the operators of “legit” LLMs will abide by their requests), and the free stuff will be a sewer of text-like extruded product.
This is the end of the open internet.
Thanks, that makes sense. I guess I’m too old and grumpy to get used to new tools like this. I guess I’ll just grow irrelevant over time.
Here’s hoping we don’t grow irrelevant before we retire 🍻, but I honestly don’t see ChatGPT as a threat to programmers at all. Quite the contrary, it will bring computing to ever more places and deliver more value, so whatever it is that you’re currently programming for a living, society will need much more of it not less.
If Google was as useful as it was 5 years ago, I wouldn’t be asking a random text generator how to do things.
You’d literally rather have a computer lie to you than read a man page or some other documentation?
I’d have thought the task of extracting schematic information from a structure was well within the realms of a regular tool, that the author could imbue with actual common sense through rules based on the content, rather than relying on a tool that (a) has no concept of common sense, only guessing which word sounds best next; and (b) habitually lies/hallucinates with confidence.
I don’t want to tell you how to do your job but I really have to wonder about the mindset of tech people who so willing use such an objectively bad tool for the task just because it’s the new shiny.
Weird flex but ok.
I’d rather have the computer read that man page or documentation and then answer my question correctly based on that.
Have you spent much time working with these tools? You may be surprised at how useful they can be once you learn how to use them effectively.
Did you miss this part of the parent comment? (emphasis mine)
You’re really doing yourself a disservice by depriving yourself of a useful tool based on knee-jerk emotional reactions. Why would you interpret that as the computer lying to you? It’s just a neural network, and it’s trying to help you based on the imperfect information it was able to retain during its training. Exactly as a human would be doing when they say “I’m not sure, but I think I remember seeing a --dont-fromboblugate-the-already-brobulgaded-fooglobs argument in a forum somewhere”. When you google that argument, it turns out it was --no-… instead of --dont-…, and the official documentation doesn’t mention that obscure argument and the only Google hit is a 12 year old email that would take you weeks of reading random stuff to stumble upon.
But that’s the point. The person doesn’t (unless they’re a psychopath) just hallucinate options out of thin air and confidently tell you about them.
I don’t know about you, but my own brain hallucinates about imaginary options and tells me about them confidently all the time, so I’m quite experienced in processing that sort of information. If it helps, you could try mentally prepending every ChatGPT response with “I have no idea what I’m talking about, but …”
BTW, I’m actually glad for the way the current generation of AI is woefully unaware of the correctness of its thoughts. This way it’s still a very useful assistant to a human expert, but it’s hopeless at doing anything autonomously. It’s an intellectual bulldozer.
Sometimes it’s far easier to say “how should i do this thing? explain why, and give me an example in x” than to trawl documentation that isn’t indexable using only words I already know
Consumer protection is not a matter of costs vs benefits. If a product is unsafe or hazardous to consumers, then it ought to be regulated, even if consumers find it useful.
I largely agree with this essay. I only use GPT-4 [1] for things I can verify. This is the advice I give to people new to the technology. My two main examples:
1. “Write me a script that wraps jq, git, ssh, etc.” You can offload all this scutwork to GPT-4 as long as you have the skills to check its work: you can read it, check that it compiles, and ask followup questions if you think it could be lying. (A toy example of this kind of script is sketched below.)
2. “Explain this domain-specific writing in terms I can understand.” It’s great if you are approaching a new topic and you don’t even know what terms to search for. It will tell you those terms. Then you can search for those terms online to verify anything you aren’t sure about. For example, I realized recently I was not sure how Envoy works, so I asked it a hundred basic questions and it filled in gaps in my knowledge.
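To make the first example concrete, here is the kind of small scutwork script you might ask GPT-4 to write and then verify by reading it and running it (a toy sketch of my own, not something the model produced):

    #!/usr/bin/env python3
    # List local git branches, most recently committed first, with the date of
    # their last commit. Small enough to read and check before trusting it.
    import subprocess

    def branch_summary():
        out = subprocess.run(
            ["git", "for-each-ref", "--sort=-committerdate",
             "--format=%(committerdate:short) %(refname:short)", "refs/heads/"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip().splitlines()

    if __name__ == "__main__":
        for line in branch_summary():
            print(line)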
Not to hijack this conversation, but I am curious to hear how others are using it.
[1] This is how I would approach using any LLM, I’m just trying to be specific here because I have only really used GPT-3.5 and 4.
I’ve been thinking a lot about point 1. My take on it is that generative AI makes testing / verification all the more important. At least for the near future. If we’re actually going to use the stuff that it produces, at least we can try and check that it actually does what was requested. Especially since it so often doesn’t.
I still think it’s funny when I ask an LLM to write me some test cases, and it gets those wrong too. Oh well…
Yea I mean I want to be in control of the testing of what it outputs.
I work in the space that is trying to tell people that ChatGPT not only lies, but lies a lot and in dangerous ways. The problem is the millions of dollars in lobbying and propaganda pushed by right-wing think-tanks financed by the orgs that want to deflect attention from human responsibility by giving agency to the “AI”.
It’s not enough to tell people, when there’s a media juggernaut on the other side that has been building the opposite narrative for decades.
Generative AIs and LLMs should be heavily regulated, put under public governance and taken away from corporations, especially extra-reckless ones like OpenAI. The “it’s already open source” argument is bullshit: most of the harm of these tools comes from widespread accessibility, user expectations created by marketeers and cheap computational cost. Yes, with open source diffusion models or LLMs you will still have malicious actors in NK, Russia or Virginia making automated deepfake propaganda, but that’s a minor problem compared to the societal harm that these tools are creating right now.
Do models like GPT-3 inadvertently encode the sociopathy of the corporations that create them? Reading through this thread, I have the distinct impression that GPT-3 is yet another form of psychological warfare, just like advertising.
I love Star Trek. And in the original Trek, there were plenty of episodes where some really advanced computer essentially managed to lie and gaslight its way into being the equivalent of god for a society full of people that it then reduced to an agrarian or even lower level of tech. Return of the Archons, The Apple, For the World Is Hollow and I Have Touched the Sky, probably a couple others that don’t come to mind right now. In a couple of those cases, it was obvious that the programmers had deliberately encoded their own thinking into the model. (Landru from Return of the Archons, the Fabrini oracle from For the World Is Hollow). And reading through this thread right now, I’m like, maybe these scenarios aren’t so far-fetched.
Here would be my caveat.
They are psychological warfare not due to marketing. They are psychological warfare because their fundamental goal is to deceive.
They were not tested to be right. They were tested (and rewarded) for making humans feel that they worked.
Now. Question time. Is it simpler to find and abuse human bugs, shortcuts, heuristics and biases? Or to actually learn to do it right? You have 4h.
But if it is the former, then this is literally a machine trained to deceive, not to help.
An excellent point. All of my experience and everything I’ve read tells me that human wetware is full of easily exploitable vulns. People have been exploiting them for a much longer time than digital computers have even been a thing. They’re easier to exploit than to grok and fix. Psychology is a young discipline when compared to rhetoric and sophistry. So yes, the former is much simpler.
This comment is enhanced by knowing that Teiresias is a mythic character from ancient Greece. :-)
I think this is good, but also at this point I wonder why most people who are using ChatGPT directly are using that, rather than the value-added tools like Perplexity or Bing Chat that can at least cite their sources.
There are (as far as Google sees) three Lobste.rs comments that mention “Bing Chat” and “citations”. The latter two mostly restate the first, but none of the three is positive about Bing Chat’s citations:
At least with Bing Chat today, you can see the citations and tell that they don’t actually support the claims. That will get worse when 90% of the content it indexes is generated by LLMs and will probably support the claims that the same model makes up.
Q: Suppose every single interface with chatGPT had a banner at the top, with large text saying:
“CHATGPT WILL SOMETIMES LIE TO YOU WITHOUT ANY WARNING, DON’T TRUST ANYTHING IT SAYS.”
Would that substantially help?
Sadly, no. The whole point of an LLM as trained today is to make you trust it. If it cannot do that, it needs to be trained more, in the current paradigm. Adding a banner warning about it just means it needs to be trained to make you trust it despite the banner.
It already has that banner at the bottom of the page, people mostly seem to ignore it.
I made a UI design suggestion here that I thought might help - Claude now implements something a bit like this: https://simonwillison.net/2023/May/30/chatgpt-inline-tips/
Wake me up when we need to tell people that ChatGPT doesn’t have a soul, not debate theology.
It seems inevitable that this thing will achieve superhuman level performance at any task involving text with the right prompt (let alone with further advances in AI), up to and including telling whatever lies you want when you want them.
If we group “lying” (intentional misinformation, which is a bad term for what’s happening here, anyway, since computers have no “intent”) and “inaccuracy”/“falsity” into a single measure called “noise”, and weigh it against accurate/precise/truthful information and call that “signal”…
… Then it’s still doing better than the vast majority of people.
You need to compare it with people and their ability to recall accurate information, not with computers themselves or Wikipedia or the source of scientific papers.
One day these AIs will literally have to “look things up” or even “use calculators” in the same way we do, to verify that their own memories/conceptions of what needs to be known are still accurate… except that it will happen pretty instantaneously, via online APIs and whatnot. (And when asked to write code, it will be able to test out what it’s written and iterate on that until the problem is solved… which is actually a project that has already occurred…)