1. 41
    1. 10

      Scrying definitely feels like a more accurate term.

      1. 2

        From a discussion elsewebs, comparisons to photography seem fairly apt – exploring and composing and arranging a scene, but also just a lot of luck and happening to be at the right place at the right time (or chancing upon the right seed, in the case of image synths.)

    2. 3

      It’s interesting to me that preserving the seed and adding/removing tags will still generate a similar image. I would have expected that something would cause the RNG to “diverge” at some point (calling .next() a different number of times).

      I guess it’d make sense if the seed is only used to generate the initial noise image that diffusion iterates on, and from there the process is entirely deterministic. Maybe that’s how it works.

      1. 6

        Yes. Only the initial noise is random; the sampling / denoising process is entirely deterministic. (I ported Stable Diffusion to Swift: https://github.com/liuliu/swift-diffusion).
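
        Roughly, the structure looks like this (a schematic PyTorch-style sketch, not the actual swift-diffusion or diffusers code; `denoise_step` and `decode_to_image` are placeholder helpers):

        ```python
        import torch

        def generate(prompt_embedding, seed: int, steps: int = 50):
            # The ONLY place randomness enters: the initial latent noise.
            gen = torch.Generator().manual_seed(seed)
            latents = torch.randn((1, 4, 64, 64), generator=gen)  # 64x64 latent -> 512x512 image

            # From here on the process is fully deterministic: each step is a pure
            # function of the current latents, the timestep, and the prompt embedding.
            for t in reversed(range(steps)):
                latents = denoise_step(latents, t, prompt_embedding)  # placeholder: UNet prediction + scheduler update

            return decode_to_image(latents)  # placeholder: VAE decode

        # Same seed + same prompt   -> identical image.
        # Same seed + edited prompt -> same starting noise, hence a similar image.
        ```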

        1. 1

          Wow. This is such a small comment with huge implications. Given the new Apple silicon architecture, this is going to be really impressive once you get to packaging this up into an app. I haven’t built it myself (yet), but I’m thoroughly impressed that you ported it!

          1. 2

            Thanks! This is not that big of a deal ATM. Some people have already ported the model using PyTorch -> CoreML conversion tools to potentially run it in an app. However, I do believe my approach would be better for memory usage, as well as for new features that require training (Textual Inversion / Dreambooth).

    3. 3

      Doesn’t seem any different from coming up with good search terms to get Google to find what you’re looking for? I notice I can often find things others fail to find. The only difference is better search terms – better in the sense that they get the desired results, not that they are better in some more objective sense.

      1. 1

        Maybe a difference in magnitude, rather than kind? I’ve never had to spend more than a minute or two finagling search engine results, but scrying good-looking images out of Stable Diffusion can easily be an hours-long process.

    4. 2

      One other trick that can help reduce artefacts is to generate a lower-resolution image (256x256) and then run it through an upscaler several times.

      1. 2

        AUTOMATIC1111’s UI has a feature for this, but still just using SD. Hi-res fix first generates at native resolution, then upscales (optionally even in latent space) and continues to add details with the prompt in mind.
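
        Outside the UI, the same basic idea can be sketched with the diffusers library – generate small, upscale, then img2img with the same prompt so the model adds detail back in. This is only a rough sketch, not AUTOMATIC1111’s actual hi-res fix (which can upscale in latent space); the model id and strength value are just illustrative:

        ```python
        import torch
        from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

        model_id = "runwayml/stable-diffusion-v1-5"  # illustrative; swap in waifu-diffusion etc.
        prompt = "a lighthouse at dusk, dramatic lighting, unreal engine"
        gen = torch.Generator().manual_seed(1234)

        # 1) Generate a small image first.
        txt2img = StableDiffusionPipeline.from_pretrained(model_id)
        small = txt2img(prompt, height=256, width=256, generator=gen).images[0]

        # 2) Naive pixel-space upscale (a dedicated upscaler such as ESRGAN would do better here).
        big = small.resize((512, 512))

        # 3) img2img over the upscaled image with the same prompt, so the model
        #    re-adds detail; `strength` controls how much it is allowed to repaint.
        img2img = StableDiffusionImg2ImgPipeline.from_pretrained(model_id)
        refined = img2img(prompt=prompt, image=big, strength=0.4, generator=gen).images[0]
        refined.save("refined.png")
        ```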

    5. 1

      Why no underscore with “unreal engine” but underscore with other spaced terms?

      1. 2

        “unreal engine” giving better lighting is a property of the normal Stable Diffusion model underneath Waifu Diffusion. It’s complicated, but unreal_engine isn’t a tag that existed in the original Stable Diffusion dataset.

        1. 1

          Is there a way to determine what tags are available in your model file or can you point to where the initial tag set might be found online? That would probably be helpful in my experimentation…

          1. 2

            This is something you just have to learn the hard way by messing with values. There are prompt-engineering guides out there for Stable Diffusion, but in general this is stuff you just learn by typing in random words and seeing what happens. You can type in excessively stupid things and get a decent result.

          2. 1

            You can actually browse a version of the dataset used for training at https://rom1504.github.io/clip-retrieval/?back=https%3A%2F%2Fknn5.laion.ai&index=laion5B&useMclip=false

            What I recommend is searching around there and seeing what some of the captions look like.

          3. 1

            I believe this doc has links to an index of image-text pairs in the training set: https://github.com/LAION-AI/laion-datasets/blob/main/laion-aesthetic.md

      2. 1

        1 token vs 2 tokens?
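
        One way to check is to run both spellings through the CLIP tokenizer that Stable Diffusion’s text encoder uses; here is a small sketch using Hugging Face transformers (the checkpoint name is the standard CLIP one, and the example tags are just illustrative):

        ```python
        from transformers import CLIPTokenizer

        # The text encoder in Stable Diffusion uses CLIP's BPE tokenizer.
        tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

        for term in ["unreal engine", "unreal_engine", "looking_at_viewer"]:
            pieces = tok.tokenize(term)
            print(f"{term!r} -> {len(pieces)} tokens: {pieces}")
        ```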