1. 86
  1. 4

    Can anyone with experience recommend the best way to get started with this?

    1. 19

      I’ve spent a lot of time testing out Stable Diffusion over the past week, and I don’t think there is a single best approach to starting out.

      If you just want to try a few prompts with a relatively low barrier to entry, check out DreamStudio. Fair warning: they require phone verification to sign up, and you only get about 50(?) free generations with standard settings. I only used this for quickly testing out the AI to get an idea of how good it was.

      If you want to run it on your own machine, and you’ve got a decent GPU (you need about 7 GB of VRAM), you can download it from the official repository. You can get the model from here; you do have to make an account, but it only requires email validation.

      You can install the dependencies pretty easily with conda. If you’re not familiar with the tool, you’ll have to download it, add it to your PATH and shell config, create an environment from the environment.yaml, and activate that environment. This is kind of a pain, but StackOverflow has lots of useful debugging information. You’ll also have to point the script to your model file (check out the README for details).

      If your GPU has < 10 GB of VRAM, you may need to reduce the precision of the model. I couldn’t figure out how to do this with flags, but you can edit the script (e.g., txt2img.py) directly: model.half() returns a half-precision version of the model, and you can reassign model to this value shortly after its creation (see the sketch below). You may also want to disable the safety checker and the watermarking of images, which is fairly straightforward; none of the code is obfuscated in the least. If you need details, see this guide. You can then run scripts/txt2img.py or scripts/img2img.py (see the README for suggested flags).
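
      A minimal sketch of the kind of edit I mean; load_model_from_config and the variable names follow the official script, but verify against your own checkout before editing:

      ```python
      # In scripts/txt2img.py, right where the model is created.
      model = load_model_from_config(config, ckpt)
      model = model.half()      # cast weights to fp16, roughly halving VRAM use
      model = model.to("cuda")  # then move the half-precision model to the GPU
      ```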

      If you want to run it on your own machine, and you have at least some kind of Nvidia GPU, check out this fork. It generates images much more slowly (about 10x slower for me, possibly worse on older GPUs), but it can run in just about any amount of VRAM (I’ve heard of users getting it working on a GTX 1060). I think the author achieved this by loading only part of the model onto the GPU at a time, moving each chunk back to standard RAM while the next one runs; the sketch below shows the general idea.
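
      I haven’t read the fork’s code, so this is just the generic pattern as I understand it, with hypothetical names rather than the fork’s actual implementation:

      ```python
      import torch

      def run_in_chunks(stages, x):
          """Run a list of nn.Module stages, keeping only one on the GPU at a time."""
          for stage in stages:
              stage.to("cuda")              # pull this chunk's weights into VRAM
              with torch.no_grad():
                  x = stage(x.to("cuda"))
              stage.to("cpu")               # evict it back to system RAM
              torch.cuda.empty_cache()      # release the freed VRAM to the allocator
          return x
      ```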

      I’ve heard other users have been able to get Stable Diffusion running on AMD GPUs, possibly with some kind of CUDA compatibility layer. I didn’t look into this because I only have an Nvidia GPU.

      This comment really only scratches the surface. There are literally over a thousand forks of the project to explore.

      1. 3

        There are also ways to run it on certain CPUs: https://github.com/bes-dev/stable_diffusion.openvino

    2. 4

      Reposting from a work chat about this exact article …

      This actually feels eerily reminiscent of the old machine code vs. assembler vs. high-level languages debates in programming … “it’s not ‘real programming’ if you’re not working on the silicon” vs. “programming is now guiding the machine tools to generate code”.

      “It’s not ~~programming~~ art if you’re letting a ~~compiler~~ AI model handle the details”.

      1. 2

        It does seem to take skill to generate good AI images. I don’t need to watch 30 hours of Bob Ross to acquire those skills; skills like brush strokes & color choices aren’t really important in this new medium, but there are clearly skills I don’t yet possess. But I can also stumble into a semi-decent output without the skill every now and then, too.

        1. 5

          Prompt engineering is a genuinely tough skill. You really have to think outside the box, and it can sometimes take a few rounds of iteration, manually scribbling on prior outputs, to get what you want. It’s not going to make artists go away any time soon: I use a lot of AI-generated art on my blog, but I still pay a lot of money to commission artists for things like the stickers you see there.

          1. 1

            The phrase “prompt engineering” threw me for a bit, but I like it!

          2. 2

            Between the article and this comment, I now wonder if we’re seeing the dawn of “Poser, but for 2D art”.

            Which isn’t bad, since Poser is “good enough” for a lot of tasks, but hasn’t replaced professional 3D artists.

        2. 3

          I think this has a good commercial use case. Film sets often require quick iteration on story panels to decide how to construct a scene before even scouting locations for the actual shoot. These story panels, which previously required artists to work hand-in-hand with the director to visualize the script, could now be semi-automated. This should help those folks iterate a lot faster!

          What an age to live in.