1. 20

  2. 9

    I wrote about why I think this represents a “Stable Diffusion moment” for large language models here: https://simonwillison.net/2023/Mar/11/llama/

    1. 3

      This is very cool. The M1/M2 systems happen to have large amounts of very high-bandwidth, low-latency RAM. GPUs offer this too, but it’s extremely expensive (like $5000+ per GPU), and NVIDIA has a lot of incentive to keep it expensive since they want to price-discriminate between AI users and videogame users. LLMs need tons of high-bandwidth RAM (running on a normal Intel CPU will be RAM-bandwidth-limited).

      Apple may have inadvertently made LLMs much more accessible to many more people by making cheap CPUs with high-bandwidth RAM.

      1. 3

        I’m very happy this is possible now, and the surveillance-loving Silicon Valley vampire doesn’t have a monopoly on this any more.

        1. 2

          This gives bad-quality generations.

          1. 5

            Yeah, the model hasn’t been instruction-trained like GPT-3 was, so you need to know how to prompt it. Some tips here (I’m still trying to figure out good prompts myself): https://github.com/facebookresearch/llama/blob/main/FAQ.md#2-generations-are-bad

            1. 2

              Do not prompt with “Ten easy steps to build a website…” but with “Building a website can be done in 10 simple steps:\n”

              Almost as if they know exactly what it will be used for…
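
The prompting tip quoted above can be sketched in a few lines of Python. This is only an illustration: `completion_prompt` is a hypothetical helper, not part of LLaMA or any library, and the template is simply the example wording from the comment. The idea is that a model without instruction training continues text, so the prompt should read like the opening of the document you want rather than a request or a title.

```python
def completion_prompt(activity: str, steps: int) -> str:
    """Hypothetical helper: phrase a task as the opening of a document.

    A base (non-instruction-trained) model continues whatever text it is
    given, so the prompt ends with a colon and a newline, leaving the
    list of steps as the natural continuation.
    """
    return f"{activity} can be done in {steps} simple steps:\n"


prompt = completion_prompt("Building a website", 10)
print(repr(prompt))
# prints 'Building a website can be done in 10 simple steps:\n'
```

The resulting string would then be passed to whatever generation call you use; that part is left out here because it depends on how the model is being run.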