I don’t get the fascination with computer-generated content. I haven’t delved much into it, but I’ve seen glimpses of the DALL-E pictures, the GPT-3 texts before that, and all the various this-whatever-doesn’t-exist sites.
I don’t get the interest. I only see empty attempts at mimicking consciousness that ultimately fail to produce anything meaningful. It gives me the same feeling I get when I listen to someone who is very good at talking without a purpose. Some people (especially politicians, but not only) are very good at talking for a long time, catching the ear without ever really saying anything. It’s quite fascinating when you realize that the person has in fact only been using glue words, ideas and sentences, and that there’s no substance at all once you take out the fillers.
That’s what I see in all of this. We’ve reached a point where we make computers churn out filler content for our minds, but there’s no nutritional value in it. We’re taking out the human producer of memes, and honestly I’m a bit terrified we’ll end up brain-dead, consuming content produced by things. What happens when art/ideas/entertainment/… is made without a “soul”? What is the point of all this?
I’m not very good at ordering and communicating my thoughts myself, but I’m already scared when my 12-year-old son gets stuck scrolling short videos; the idea of taking the human “soul” out crushes my hope for the future of our species.
I’m excited about this for two principal reasons:
1. It’s fun. SO much fun. Getting DALL-E to generate a heavy metal album cover made of pelicans made of lightning? I found that whole process deeply entertaining. See also the “fantasy breakfast taco” game I describe at the end of my post.
2. I see these tools fitting in the category of “bicycles for the mind”. They help me think about problems, and they do things like break me out of writer’s block and help me get started writing something. In DALL-E’s case it’s an imagination enhancer: I can visualize my ideas with a 20s delay.
Aside from those, here’s a use-case you may not have considered, which I tried recently on GPT-3. Say you’re on disability benefits, a government office cancels them, and you need to write a letter - but you don’t have much experience writing formal letters. I tried the prompt “Write a letter to the benefits office asking why my disability claim was denied”. Here’s the result: https://gist.github.com/simonw/6e6080a2f51c834c13b475743ef50148
I find this a pretty convincing attempt. I could do a better job, but I’ve written a lot of letters in my time. Give it a prompt with specific details of your situation and you’ll get something even more useful.
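If you want to try this yourself, here’s a minimal sketch using the pre-1.0 openai Python client. The model name and sampling parameters are my assumptions for illustration, not necessarily what produced the gist above:

    import os
    import openai

    # Assumes an OPENAI_API_KEY environment variable is set.
    openai.api_key = os.environ["OPENAI_API_KEY"]

    response = openai.Completion.create(
        engine="text-davinci-002",  # assumed: the flagship completion model in mid-2022
        prompt=(
            "Write a letter to the benefits office asking why "
            "my disability claim was denied"
        ),
        max_tokens=512,   # leave room for a full letter
        temperature=0.7,  # some variety between drafts
    )
    print(response.choices[0].text.strip())

Folding the specific details of your situation into that prompt string is what moves the output from generic to genuinely useful.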
I’ve been using the free VQGAN-CLIP[1] to generate things like “Moomins as Dark Souls in the style of linocut / watercolour / screenprint etc.” to give me interesting things to practice linocutting, watercolours, acrylic painting, etc.
[1] https://github.com/nerdyrodent/VQGAN-CLIP
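If you’d rather script it than run it by hand, here’s a minimal sketch of driving that repo from Python. It assumes you’ve cloned it and set up its environment per its README; the generate.py script and its -p prompt flag come from that README:

    import subprocess

    prompt = "Moomins as Dark Souls in the style of linocut"

    # Invoke the repo's generation script with our prompt.
    subprocess.run(
        ["python", "generate.py", "-p", prompt],
        cwd="VQGAN-CLIP",  # path to your clone of the repo
        check=True,
    )
    # The script writes its result into the repo directory
    # (output.png by default, per the README).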
Why do you expect it to have a “soul” - a vague notion at best? It’s a tool. It’s not an artist; it’s an automated Photoshop.
How do you feel about recorded music? Music used to exist only as a live human performance, and now we have soulless machines playing it. New music can be made with button presses, without fine motor skills of playing an instrument. Now we can create paintings without motor skills of using a brush.
To me DALL-E is a step as big as photography. Before cameras, if you wanted to capture what you saw, you had to draw it by hand. With cameras it’s as trivial as pressing a button. Now we have the same power for illustrations and imaginary scenes.
Selfies disrupted portrait painters, and this will without a doubt be disruptive for artists and photographers. Short term, such commoditization sucks for creators, and the way the ML is done is exploitative. Long term, it means abundance of what used to be scarce, and that’s not necessarily a bad thing.
Basically - I think at most, things like this will serve three purposes:
1. Inspiration for artists - look at some ideas, see new possible connections.
2. Pornography (not in the sense you’re used to, but “meets my specific thing”) / pot-boilers: it covers requests like “I want X character in Y setting” and generates something mostly coherent, but in a bland way. But maybe that’s all someone wants…
3. Shitposting.
I think these are, right now, devoid of much “soul”, for lack of a better term - it’s impressive that they can follow the prompt, but the results are dispassionate, and at times they feel like they’ve been painted by someone with dementia.
Why is it seen as filler content? A lot of what the author ends up creating starts from descriptions that came from a person, and the AI is just doing its best to visualize them. A single description has thousands of possible renderings - why not generate them all?
I tell people from time to time that life is just state traversal… and AI helps with the traversal.
The internet is crammed with people showing off images they’ve generated with DALL-E - this is my attempt at that genre. I tried to include some useful tips I’ve picked up to make this more interesting than just a bunch of weird images.
DALL-E has a slow-moving waiting list, but GPT-3 is free for anyone to try out now: https://simonwillison.net/2022/Jun/5/play-with-gpt3/
You can also try out DALL-E mini, which is an excellent but confusingly named recreation of some of the ideas in the DALL-E paper, unaffiliated with DALL-E itself: https://huggingface.co/spaces/dalle-mini/dalle-mini
You can also simply use https://www.craiyon.com/, which runs the same model (also known as DALL-E mini).
From their FAQ: “OpenAI asked us to change the name of our app which quickly became viral to avoid confusion with their model.”
I’ve seen a ton of people who don’t understand that DALL-E and DALL-E mini are entirely different projects, so that change makes sense to me.
Damn, now I want my band to be named Pleny HLan… (FYI there is a real [instrumental post-rock] metal band named Pelican.)
As a nonexpert, I find this even more mysterious than GPT-3, because there seem to be two different, very difficult tasks glued together:
1. Recognizing the concepts implied by the prompt.
2. Assembling the components of the imagery implied by the concepts.
Where by “concepts” I especially mean things like “lying on” or “album cover”, not visual objects like “pelican” or “dog”.
Though I recognize that the AI isn’t implemented as a concatenation of these tasks (right?) - it’s just a big opaque multilayer neural spaghetti, the very exemplar of “then a miracle occurs” in the old S. Harris cartoon, or the “Step 2: ??” of the underwear gnomes…
It kind of is a concatenation of those tasks. DALL-E is built on top of GPT-3 - it’s GPT-3’s language model that lets it turn “ceramic pelican in a Mexican folk art style with a big cactus growing out of it” into a weird blob of numbers that models the concepts and their relationship - understanding things like “growing out of it” is crucial to correctly responding to the prompt.
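You can get a feel for that “weird blob of numbers” step using CLIP’s text encoder, which is public (DALL-E’s own encoder isn’t, so treat CLIP here as a stand-in that illustrates the same idea). A minimal sketch with the HuggingFace transformers library:

    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    # Two prompts that differ only in the relationship between the objects.
    prompts = [
        "ceramic pelican with a big cactus growing out of it",
        "ceramic pelican next to a big cactus",
    ]
    inputs = processor(text=prompts, return_tensors="pt", padding=True)
    with torch.no_grad():
        embeddings = model.get_text_features(**inputs)  # shape: (2, 512)

    # Same objects, different relationship -> measurably different vectors.
    similarity = torch.nn.functional.cosine_similarity(
        embeddings[0], embeddings[1], dim=0
    )
    print(embeddings.shape, similarity.item())

The point is that “growing out of it” versus “next to” lands the prompt in a different place in that numeric space, which is what lets the image-generation side respond to relationships rather than just a bag of objects.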
Two articles I found helpful recently for building a better model of how this stuff works under the hood:
I’ve been playing with DALL-E for a couple of days. I’ve been having a blast.
I can’t draw. I’m a writer, and I have some need for crude art from time to time, but finding useful, free stuff to illustrate a joke or something is a long, hard slog online.
Now I can think of something creative, have DALL-E draw it, and move on. Like so: https://twitter.com/danielbmarkham/status/1541455118725521409
I don’t do commercial content creation, so it works fine for me. More importantly, it lets me quickly create something - anything - that generally conveys an idea, so I can continue with the content creation process. If I were doing something commercial, it would be enough of a start to flesh out later.
I’m looking forward to having fun incorporating DALL-E into various creative workflows and seeing what happens.