
      A common misconception about RAG is that it needs to be implemented using embeddings and a vector index of some sort.

      Backing RAG with a full-text search engine, like the one built into SQLite, is a perfectly valid strategy. It’s actually similar to how some of the largest deployed RAG systems work: Google Gemini, Bing, ChatGPT Browse, and Perplexity all work more like traditional search than vector indexes.
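
      Sticking with the SQLite example, a minimal sketch of FTS5-backed retrieval looks like the following. It uses only the standard library (assuming your SQLite build includes FTS5, as most do); the corpus, table name, and prompt wording are made up for illustration:

      ```python
      import sqlite3

      # Toy corpus standing in for real documents.
      DOCS = [
          "SQLite ships with FTS5, a built-in full-text search engine.",
          "Vector indexes rank documents by embedding similarity.",
          "BM25 is the default ranking function in FTS5.",
      ]

      conn = sqlite3.connect(":memory:")
      conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
      conn.executemany("INSERT INTO docs(body) VALUES (?)", [(d,) for d in DOCS])

      def retrieve(query: str, k: int = 3) -> list[str]:
          # FTS5 exposes a hidden `rank` column (BM25); ascending is best-first.
          rows = conn.execute(
              "SELECT body FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT ?",
              (query, k),
          )
          return [body for (body,) in rows]

      # Stuff the keyword-search hits into the prompt exactly as you would
      # nearest-neighbor embedding hits.
      context = "\n".join(retrieve("sqlite"))
      prompt = f"Answer using only this context:\n{context}\n\nQuestion: what does SQLite ship with?"
      ```

      Swapping in a vector store would change only the `retrieve` function; the rest of the RAG loop is identical.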


        > misconception

        Of course this isn’t a binary claim; in some setups you need to balance recall against throughput.


          Work nudged us to try Copilot, and one experience with it made me think access to the guts of RAG could be a separately useful thing.

          Specifically, I used a pretty generic term that has a specific meaning in our code, and the completion included a reasonable-for-our-context definition of it. My guess is that we had written a definition of the term that was in context; a way to jump to that source material might’ve been at least as useful as the completion, because I might’ve been able to copy more from there!

          LLMs don’t automatically report which parts of the context they used, of course, but there are always naive approaches, like searching the context for words used in the output, and perhaps something more sophisticated could be done by reaching into the model’s internals, the way attention visualization does.
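
          For the naive word-overlap version, a few lines of set arithmetic already give a crude “which chunk did this likely come from” ranking. This is just a sketch of the idea above; the chunk texts and the `attribute` helper are invented:

          ```python
          import re

          STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "that", "it"}

          def tokens(text: str) -> set[str]:
              # Lowercased word set, minus a few common stopwords.
              return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

          def attribute(completion: str, chunks: list[str]) -> list[tuple[float, str]]:
              # Score each context chunk by the fraction of completion words it contains.
              out = tokens(completion)
              scored = [(len(out & tokens(c)) / (len(out) or 1), c) for c in chunks]
              return sorted(scored, reverse=True)

          # Made-up example: which chunk most plausibly supplied the definition?
          chunks = [
              "A frobnicator in our codebase is the service that reconciles billing.",
              "Unrelated notes about deployment schedules.",
          ]
          completion = "The frobnicator is the service that reconciles billing."
          for score, chunk in attribute(completion, chunks):
              print(f"{score:.2f}  {chunk}")
          ```

          Linking each chunk back to its source file would then give the “jump to the source material” affordance directly.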