1. 16
    1. 1

      I’ve been playing with ollama over the last few weeks, and these types of posts are encouraging. For play prompts, it’s relatively fine.

      It’s slower than the free online options (Claude, Gemini, ChatGPT), and I’m only using the 7-12b models. Does the speed scale directly with size - would the 32b model be about 3 times slower than the 12b (or whatever the smaller qwen-coder is)?

      1. 2

        I’m not sure exactly how parameter count relates to speed, but I suspect speed is directly tied to size on disk and in memory. Size in memory also scales with quantization (e.g. 8-bit vs 4-bit), and on that dimension the relationship isn’t quite linear, but it’s pretty close.
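
        To put rough numbers on that (my own back-of-the-envelope, not something from the article): weight memory is roughly parameter count times bits per weight, divided by 8 to get bytes. A quick sketch in Python:

            # Rough floor on weight memory: params * bits-per-weight / 8 bytes,
            # i.e. params-in-billions * bits / 8 gigabytes. Ignores the KV cache
            # and runtime overhead, so real usage will be somewhat higher.
            def approx_weight_gb(params_billion: float, bits: int) -> float:
                return params_billion * bits / 8

            for params in (7, 12, 32):
                print(f"{params}b: ~{approx_weight_gb(params, 4):.1f} GB at 4-bit, "
                      f"~{approx_weight_gb(params, 8):.1f} GB at 8-bit")

        By that estimate a 32b model is about 2.7x the size of a 12b at the same quantization, which loosely matches the "about 3x slower" intuition if generation is memory-bandwidth-bound.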

        I have been getting a feel by measuring for myself: ollama run --verbose qwen2.5-coder:32b (or whatever model) will print stats after each reply. mlx_lm.generate --model mlx-community/… --prompt … will too, by default.
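
        If you’d rather script the measurement, here’s a minimal sketch against the local Ollama HTTP API (assuming the default port 11434; eval_count and eval_duration come back in the documented non-streaming response):

            import requests

            # Non-streaming generate call; the reply includes timing stats.
            r = requests.post(
                "http://localhost:11434/api/generate",
                json={"model": "qwen2.5-coder:32b",
                      "prompt": "Write a haiku about compilers.",
                      "stream": False},
            )
            data = r.json()
            # eval_duration is in nanoseconds; eval_count is generated tokens.
            print(data["eval_count"] / (data["eval_duration"] / 1e9), "tok/s")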

      2. 1

        I ran the same prompt as the author, using the same model, but my Python script ended up using requests. Just the other day I was talking to a friend about the risk of weird bugs caused by the random component at work in these models, so it was kind of interesting to see this firsthand.
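
        For what it’s worth, you can mostly pin that randomness down when chasing bugs; a sketch, assuming you’re calling Ollama’s local API (temperature and seed are documented generation options, and the seed value here is arbitrary):

            import requests

            # temperature 0 plus a fixed seed makes repeated runs (mostly) repeatable.
            r = requests.post(
                "http://localhost:11434/api/generate",
                json={"model": "qwen2.5-coder:32b",
                      "prompt": "Write a Python script that fetches a URL.",
                      "stream": False,
                      "options": {"temperature": 0, "seed": 42}},
            )
            print(r.json()["response"])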

        1. 1

          I also tried this model over the past couple of days. I get about 9-10 tokens per second with Ollama (q4_K_M) or 11-12 with mlx-lm (q4_0), which I thought would be too much of an interruption to be useful. But I tried it today on SwiftUI, which is a very dense, terse DSL, and since the code is short anyway it was worth the wait. It’s also great to have a model that feels capable and whose cost is simply expressed in my power bill.

          1. 1

            I was wondering, is there a way to use an LLM like that to proofread and correct the spelling/grammar of an entire document?

            1. 2

              I’ve had mixed results with LLMs for spell checking. They can do it, but I find they often miss things, maybe because they work at the token level, so errors that are obvious to us aren’t as easy to spot in a stream of partial-word token IDs.

              I may just not have found the right combination of model and prompt though.
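
              If I were to try it on a whole document, I’d chunk by paragraph and do a fix-only pass per chunk; a rough sketch, where the prompt wording, model choice, and filename are just my guesses:

                  import requests

                  PROMPT = ("Correct spelling and grammar in the following text. "
                            "Return only the corrected text, with no commentary:\n\n")

                  def proofread(text: str, model: str = "qwen2.5-coder:32b") -> str:
                      # One paragraph per request keeps each chunk well inside the context window.
                      fixed = []
                      for para in text.split("\n\n"):
                          r = requests.post(
                              "http://localhost:11434/api/generate",
                              json={"model": model, "prompt": PROMPT + para, "stream": False},
                          )
                          fixed.append(r.json()["response"].strip())
                      return "\n\n".join(fixed)

                  with open("document.txt") as f:  # hypothetical input file
                      print(proofread(f.read()))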

              1. 2

                Off the top of my head, I’d try getting Google’s NotebookLM to do that.

                1. 1

                  Thank you! It does not seem to be able to do this.