You have a weird job. Like, very. Not only do you have to do uncommon things for a developer, your economics are all weird, too. You do so much weird shit so often that putting relatively large amounts of effort into adapting your environment for your tasks is probably a net gain much more often than for regular (and even not so regular) people. Or at least that’s the impression I get.
It looks like you own pretty much all of your work, and have absolute autonomy on it. You don’t have to worry about corporate approval, IP, industry regulations, etc.
You are capable of thinking for yourself, as you perfectly demonstrate in the git log example. If I had a nickel for every time I had to argue with people wanting to follow “best practices” even when that would be a bad idea, I’d have an annoying amount of nickels.
You make an interesting case, but:

1 and 2 are what make me feel like it’s not worth the effort for me personally (my job is normal and I don’t own shit).
3 is the reason why I’m wary of praising this kind of tool at all: I’m not sure that the people listening truly understand the trade-offs and limitations the way the people doing the praising do.
Then there’s 4: I don’t trust LLM providers, right now? I have the same almost irrational distrust that open source fanatics have towards proprietary software. It all just smells too iffy to my taste.
It looks like you own pretty much all of your work, and have absolute autonomy on it. You don’t have to worry about corporate approval, IP, industry regulations, etc.
The C-level folk at my employer have said that it is a bad idea to use these tools for work. They may even be blocked. When people start putting corporate secrets in a chat then it’s a bad scene. (This has already happened, and it has started to appear in answers.) These systems are not trustworthy in their current form, both in the output they produce, and what they will do with the input you provide them. It makes them a no-go for those of us working in environments where your work is not your own.
I don’t trust LLM providers, right now? I have the same almost irrational distrust that open source fanatics have towards proprietary software. It all just smells too iffy to my taste.

Same.
This has already happened, and it has started to appear in answers
Can you expand on that? LLMs are trained once, not constantly updated, so I would be extremely surprised to hear of a genuine case where input to a bot had showed up in output somewhere. I’m also confident that OpenAI are too smart to use the raw input to their bots as further training data.
I don’t have a specific example because I have not experienced it (in fact, I have yet to use any of these systems), but the message sent out said that they have seen trade secret stuff show up in results. Maybe it was given in one system and showed up in another. It was, admittedly, a cagey case, but the message was very clear: do not use these systems because it is too easy to reveal something you should not be revealing. (The presentation that went along with it was the first time I have ever seen the word “bullshit” show up in official corporate correspondence.)
Yeah, $currentJob also had to send a department-wide email telling people not to paste the company’s proprietary code that handles PII and finance-related info into a random text box that promises magic answers, because the regulators might come for us and turn the company into a nice parking lot.
What regulation requires you to keep how you handle user data a secret? The only risk that might impose is if you’re non-compliant already but no one can tell. That’s an excuse.
So here’s a recent use case that does apply to Non Weird Jobs and also shouldn’t require regulation or corporate approval:
ask "In the python library dagster, where is the class that manages the workflow DAG defined?"
In the Dagster library, the class that manages the workflow DAG (directed acyclic graph) is defined in dagster/core/definitions/graph.py. The class is called GraphDefinition.
The path was _core, not core, but otherwise GPT4 was correct. That made it easier for me to navigate an unfamiliar codebase, a thing I often needed to do even when I was still an FTE.

God dammit, that might have been the best possible use case you could use to convince me.
I still find it very hard to not laugh at a product whose name is French for ‘I farted’. It’s right up there with the Toyota MR2 in terms of unfortunate naming.

It’s just the latest move in the longtime anglo/french tech naming war, precipitated by bit and riposted by coq.

So “Chat GPT” means “cat I farted”?

Yup. Not sure who approved that as a product name, but it’s one of my favourites.

My favorite is the audi e-tron which sounds like turd in French.
The amount of effort required to use GPT-4 for research appears similar to the effort required to apply the technique I learned in the 90s: distill down to keywords, titles, and authors; repeatedly attempt combinations in the box until the right articles appear. One difference is the amount of context which is presented; when the wrong articles appear, the researcher learns metadata about their search objective, enriching their overall understanding. GPT-4 removes this context, instead apologizing for being unable to espouse opinions.
One increasingly-obvious fault in OpenAI’s products, visible in GPT-4 and listed in GPT-4’s model card, is that RLHF destroys the system’s self-confidence. Ask the typical large language model what the most important theorem in category theory is, and the reply is usually Yoneda’s lemma. But ask GPT-4, and you’ll get an apology in the first paragraph, followed by Yoneda’s lemma in the second paragraph. It knows the answer, but has been trained to act like a retail employee or customer-service agent.
I was listening to Hitchhiker’s Guide to the Galaxy again on a car trip recently, and when it got to the part about “Genuine People Personalities” for the AIs, I suddenly understood why I am instantly annoyed by ChatGPT’s endless groveling and whinging.
I wonder if the system message can be used to prime the model to answer more, for lack of a better word, robotically, with less polite language, just short sentences stating facts, no apologies for unknown answers.
I bet it could. I just tried a system prompt of “reply concisely, be abrupt to the point of rudeness”, then a user input of “What is the most important theorem in category theory?” and it replied directly with “Yoneda Lemma.”
Along with a system message there are also tunable parameters called “temperature” and “top P” ^1:
Temperature - In short, the lower the temperature, the more deterministic the results in the sense that the highest probable next token is always picked. Increasing temperature could lead to more randomness, which encourages more diverse or creative outputs.
Top_p - Similarly, with top_p, a sampling technique with temperature called nucleus sampling, you can control how deterministic the model is at generating a response. If you are looking for exact and factual answers keep this low. If you are looking for more diverse responses, increase to a higher value.
These are both available not through ChatGPT directly, but through the OpenAI Playground.
In regards to not apologizing for unknown answers, it’s probably better to just cut it off with an additional sentence at the end of your prompts, like ‘Respond “Unsure about answer” if not sure about the answer.’ ^2
The 8k context gpt-4 api gave me only two sentences, “The Yoneda Lemma is considered one of the most important theorems in category theory. It provides a link between a category and its functor category, revealing deep structural insights about objects and morphisms in the category.”
I believe the one on the ChatGPT web interface is different, and at this point we probably need to distinguish these two. OP specified that they use the API version exclusively.

I hate this about it!! It’s so rubbish.
GPT has an annoying habit of being almost right, to the point that it takes enough research for me to verify its results that I might as well just do it myself, and I’m still worried there’s something subtly wrong with what it tells me.

It’s not useless, and I use it, but I don’t trust it. If I did, it would have made a fool of me more than once.
I think the bigger problem is that it’s always confident. I’m working on a paper right now and I’m using pgfplots, which is a fantastic package for generating tables and graphs in LaTeX, with a thousand-page manual that I always spend ages reading when I use it, and then forget because I use it so rarely. This time, I used Bing Chat (which finds the manual and a load of stack-overflow-style answers to add into its pre-trained data). I tried to get it to explain to me how to get labels to use SI prefixes as contractions. It gave me a plausible answer along with an example and the output that I wanted. There was just one problem: copying its example verbatim into my TeX file gave totally different output. No variation on its example worked and I couldn’t prompt it to give me one that worked in half an hour of trying.
In contrast, when I wanted to create a bar chart with the same package, it basically told me what to do. Everything I wanted to change, I asked it for and it showed me a summary of a relevant part of the manual. For fairness, I had the manual open in another window and tried looking up the answer while it was processing and it was faster almost all of the time. For things that loads of people have done, but I can’t remember how to do (or never knew), it was a great tool.
Unfortunately, the responses in both cases were indistinguishable. Whether it’s giving me a verbatim quote from an authoritative source or spouting bullshit, it’s always supremely confident that its answer is correct. It definitely reminds me of some VPs that I’ve met.

This is what I am finding too.

After it provides me with a solution, I always ask “is that correct?” so that it verifies its own output.
If it produces an incorrect answer to a question, what stops it from “verifying” that initial incorrect answer? Or is this more like, another layer, another chance for it to maybe be accurate?
I couldn’t tell you, I don’t know how it works, just that on solutions I’ve known to be incorrect, asking that question has provided the expected corrections, so it’s doing something.
My understanding is that in ChatGPT4 data only flows in one direction; by asking it to reflect on its own answer, you give the network a chance to process its own thoughts in a way (an inner monologue?), at the cost of more compute time.
When asking it if it’s correct, the answer it gave previously will be part of the context, so it will be available from the first layer, and so it will be better processed by all the layers of the Transformer. Whereas when you ask it for something, it will use some layers to figure out what you meant and in what direction to answer, meaning there is less compute available for producing the correct answer.

I’ll start taking this stuff more seriously when the parties lauding it have skin in the game: use a LLM-generated contract, for example.

Oh, so they’re basically bloom filters :P
The problem I have with the latest generation of AIs is that they start failing and making up things precisely when the questions become interesting. So, when a problem is a tough nut to crack, that’s when I need such a tool, and that’s exactly where it can’t help. So I use it to compose funny stories and poems to entertain me. I laughed way too much already about the things it writes. Maybe a future with AI overlords wouldn’t be so bad after all, they know how to crack a joke. Though I don’t know how wittingly.
I tried Bing Chat for the first time last week. It told me that I should use an API that doesn’t exist (and can’t exist, because it would completely break the security model of the platform that I was using) and then insisted it was in a specific file. When I told it what was on the line it pointed me to, it told me it had moved to a different line (which had nothing similar) and then sulked when I called it out a third time.
My favourite bit was the second answer though, where it gave me a very good reply to my question and gave me three citations to support it. All three citations referred to something in a completely unrelated field that happened to have the same TLA.
My current job is mostly about overcoming byzantine business processes and bizarre standards set up by people long gone. I don’t think a chat bot will be of much help.

Ha, been there. You might try ELIZA to help unburden yourself.
I spent a couple of weeks trying to use GPT4 heavily, in a context where lots of people were sharing tips for how to get value out of it. And, in the end, it just wasn’t very useful to me. Can it produce code? Yes. But I’ve spent a lot of time learning how to remove unnecessary pieces, and its usefulness in producing code seems to increase dramatically as you ask it for the kind of code I just wouldn’t write in the first place.
I may just be a weird outlier (it wouldn’t be the first time), but I cancelled my subscription. I can find many more useful things to do with $20/month, and that’s before we get into the issues around intellectual property, data governance, etc.
I think it’s super handy as an assistant where you give it limited trust. Some things I have used GPT4 for:
Generating type definitions from some example json documents.
Refactoring some Go code to use generics where the old version used interface{} (a sketch of the kind of change I mean is below).
Generating some boilerplate code to process a .ics file from python.
Generating benchmarks to quickly answer questions like: ‘generate me a benchmark to show how many nanoseconds it takes to create a goroutine?’ (see the benchmark sketch below).
Asking how to check for a specific error condition and generate a unit test demonstrating that case.
Writing some boilerplate code - like a ./configure script that doesn’t depend on autoconf.
Generating unit tests for edge cases in some simple text processing functions.
Stubbing out a specification document based on a freeform text description.
Writing some simple boilerplate for processing CSV files.
All these are easy to verify and tweak and don’t require much trust. I also didn’t feel like I was missing out on much learning by having the tool do these for me.
It definitely makes a lot of mistakes, but for me it crossed a threshold of usefulness and it will probably keep improving from here.
Can you describe your workflow with GPT in regard to specifications?

I have a paid subscription that gave me access to chatgpt4. All I really do is open a chat window on a browser tab on a second monitor, then ask it questions at times when I don’t want to waste time on something trivial that takes a while, or would normally be googling. While it is generating I can work on other things, then come back to what it has generated and integrate it or modify it.
An example question might be:
Given the go function 'func NukeDirectory(foo string) error', could you generate me a set of go unit tests that cover cases like where the directory is read only or any other less common scenarios. Please use testing.T tempdir functions where possible.
Then in response chatgpt4 gave me this fairly reasonable output (which I included nearly all of as a demonstration):
...
func TestNukeDirectory(t *testing.T) {
    t.Run("normalDirectory", func(t *testing.T) {
        tempDir := t.TempDir()
        err := NukeDirectory(tempDir)
        if err != nil {
            t.Errorf("Expected no error, got: %v", err)
        }
        if _, err := os.Stat(tempDir); !errors.Is(err, os.ErrNotExist) {
            t.Errorf("Expected directory to be removed, got: %v", err)
        }
    })

    t.Run("readOnlyDirectory", func(t *testing.T) {
        tempDir := t.TempDir()
        err := os.Chmod(tempDir, 0444)
        if err != nil {
            t.Fatalf("Failed to change directory permissions: %v", err)
        }
        err = NukeDirectory(tempDir)
        if err == nil {
            t.Error("Expected an error, got nil")
        }
        if !errors.Is(err, fs.ErrPermission) {
            t.Errorf("Expected permission error, got: %v", err)
        }
    })

    t.Run("nonExistentDirectory", func(t *testing.T) {
        tempDir := filepath.Join(t.TempDir(), "nonexistent")
        err := NukeDirectory(tempDir)
        if err == nil {
            t.Error("Expected an error, got nil")
        }
        if !errors.Is(err, fs.ErrNotExist) {
            t.Errorf("Expected not exist error, got: %v", err)
        }
    })

    t.Run("fileInsteadOfDirectory", func(t *testing.T) {
        tempDir := t.TempDir()
        tempFile := filepath.Join(tempDir, "file.txt")
        err := os.WriteFile(tempFile, []byte("hello"), 0666)
        if err != nil {
            t.Fatalf("Failed to create file: %v", err)
        }
        err = NukeDirectory(tempFile)
        if err == nil {
            t.Error("Expected an error, got nil")
        }
        if !errors.Is(err, fs.ErrNotADirectory) {
            t.Errorf("Expected not a directory error, got: %v", err)
        }
    })

    t.Run("directoryWithSubdirectories", func(t *testing.T) {
        tempDir := t.TempDir()
        subDir := filepath.Join(tempDir, "subdir")
        err := os.Mkdir(subDir, 0755)
        if err != nil {
            t.Fatalf("Failed to create subdirectory: %v", err)
        }
        err = NukeDirectory(tempDir)
        if err != nil {
            t.Errorf("Expected no error, got: %v", err)
        }
        if _, err := os.Stat(tempDir); !errors.Is(err, os.ErrNotExist) {
            t.Errorf("Expected directory to be removed, got: %v", err)
        }
    })
}
As a software developer, I adore having access to GPT3/4. Lately, I’ve been jumping around from language to language, and it’s a god-send when I know how to do something in one language and want to know how to do it in another. It’s not terribly difficult or time-consuming to do with a search engine, but with GPT I can ask follow-up questions to get an idea of idiomatic ways of doing things or about libraries that might exist in the new language to aid me.
How do you know it isn’t confabulating answers about idiomatic ways of doing things? Unlike the existence of libraries, it doesn’t seem like an easy thing to verify.
After using it for long enough you end up developing a pretty accurate intuition as to when to trust it and when to double-check.
Writing idiomatic code is something it’s actually really good at, because it’s seen a LOT of code in different languages and “idiomatic” basically means “in the most average way possible”. LLMs are really good at coming up with the most average way possible to express something!
That part just comes with experience I think. I don’t just blindly trust what it spits out. Often I’ll run similar prompts through it to check for consistency, and if I’m totally unfamiliar with a language I spend a bit of time on a forum/subreddit/channel for it initially just to get a feel.
I’d say it’s close to the mark more often than not, though. To clarify, I’m not having it write code for me, but just having it give examples of how to get something done. Creating tests in an unfamiliar testing framework is something it really excels at in my experience.
I had a similar experience asking an LLM to generate some JavaScript code for me. The code was completely wrong, but it used all of the right APIs (I know enough of JavaScript-the-language to get things done, I haven’t looked at modern web APIs for ages) and so I could jump straight to the docs and write something correct in a few minutes.
Yes, this totally mirrors my experience. It getting the code wrong doesn’t matter, as we shouldn’t (yet) be relying on it for that. It’s a great tool for experienced developers to accelerate our workflows.
Is there anything open source and self hosted that’s competitive with GPT4 yet?
Definitely not, and I wouldn’t expect that for months if not years.
The rate at which the open source self hosted models are improving is pretty amazing though. Two months ago there were none that even came close to GPT-3 - it feels like there are two new entries a week now.
GPT can produce useful answers sometimes, but I still don’t want to rely on it because I can’t always trust the veracity of its output.
Call me when I can use one of these services without giving up a phone number—or better, when I could self-host it. There’s a lot to gain, but it seems like a lot to lose by using these services.
You can self host right now. The easiest way to do that is to install Chrome Canary, start it like this:
Then visit https://mlc.ai/web-llm/#chat-demo - it’s a 2GB download but it will get you a surprisingly effective language model running entirely in your browser. I wrote some notes on that here: https://simonwillison.net/2023/Apr/16/web-llm/
Thanks for taking the bait so I didn’t have to do the research 😄.
Are we specifically locked to V8+Blink for any reason? What is “robustness” doing?
No idea! That’s just the recipe from the WebLLM documentation, presumably because WebGPU is brand new and there are still issues to work around.
Ah. I’ve heard some (g)rumblings around WebGPU so that would explain it.
Unconvincing
Could you elaborate?
If you signed up and tried gpt4 then I accept your opinion - if you did not then your opinion is probably worth very little in this context - I think it’s hard to gauge something so new without actually trying it seriously.