1. 15
    1. 7

      Embeddings and vector search are really cool and, honestly, I see more use in them than in LLMs themselves (which are still useful). Embeddings are pretty much the heart of RAG, and you need RAG if you want to make use of an LLM for anything more than a toy.

      1. 8

        While embeddings are commonly used to help implement RAG, they’re not actually required.

        You can build RAG systems that use regular full-text search instead. Take the user’s question, ask an LLM to extract search terms from it, then run those search terms against a search engine (like Elasticsearch or even SQLite FTS, which is pretty good) and dump the results into the context.

        This has some advantages over embeddings. First, it’s cheaper and easier to build - you don’t need to compute thousands or millions of embedding vectors; you can just use a boring old FTS index. Second, it actually behaves better on some kinds of queries. Embeddings don’t tend to do well with exact phrase matches, and there are some RAG queries where exact terms (such as brand names) are more useful than fuzzy semantic search.
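
        A rough sketch of that loop in Python, using SQLite’s FTS5 - the extract_search_terms() and ask_llm() helpers here are placeholders for whatever LLM client you’d use, not anything from the post:

        ```python
        import sqlite3

        # Hypothetical helpers standing in for your LLM client of choice:
        def extract_search_terms(question: str) -> str: ...  # question -> FTS query terms
        def ask_llm(prompt: str) -> str: ...                  # prompt -> answer

        db = sqlite3.connect("docs.db")
        db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(title, body)")

        def answer(question: str) -> str:
            # 1. Ask an LLM to turn the question into search terms
            terms = extract_search_terms(question)
            # 2. Run those terms against a plain full-text index
            rows = db.execute(
                "SELECT title, body FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT 5",
                (terms,),
            ).fetchall()
            # 3. Dump the results into the context and ask for an answer
            context = "\n\n".join(f"{title}\n{body}" for title, body in rows)
            return ask_llm(f"Answer using only this context:\n\n{context}\n\nQuestion: {question}")
        ```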

        Search-based RAG has been deployed at enormous scale - it’s effectively how Bing, Google Gemini search, and Perplexity work.

        I built an FTS-based RAG system in this livestream: https://simonwillison.net/2024/Jun/21/search-based-rag/

        Another option is to go hybrid: combine vector search and full-text search in the same query. Here’s an example of that using the sqlite-vec extension: https://alexgarcia.xyz/blog/2024/sqlite-vec-hybrid-search/index.html
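
        One common way to merge the two ranked result lists (and, if I remember right, one of the approaches that post walks through) is reciprocal rank fusion; a tiny sketch, with function and variable names that are mine rather than from the post:

        ```python
        # Reciprocal rank fusion: merge a full-text-search ranking and a
        # vector-search ranking into a single ranking of document ids.
        def rrf(fts_ids: list[str], vec_ids: list[str], k: int = 60) -> list[str]:
            scores: dict[str, float] = {}
            for ranking in (fts_ids, vec_ids):
                for rank, doc_id in enumerate(ranking):
                    # Documents near the top of either list get the biggest boost
                    scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
            return sorted(scores, key=scores.get, reverse=True)

        # e.g. rrf(["a", "b", "c"], ["c", "a", "d"]) -> ["a", "c", "b", "d"]
        ```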

      2. 4

        You can embed many more things than just text: images, videos, graphs - in fact, anything you can throw a neural network at.
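
        For example, CLIP-style models put images and text into the same vector space, so you can compare them directly. A rough sketch with the sentence-transformers library (the model name and file path are just for illustration):

        ```python
        from PIL import Image
        from sentence_transformers import SentenceTransformer, util

        # A CLIP model that embeds both images and text into the same space
        model = SentenceTransformer("clip-ViT-B-32")

        image_vec = model.encode(Image.open("cat.jpg"))
        text_vec = model.encode("a photo of a cat")

        # Higher cosine similarity = more closely related
        print(util.cos_sim(image_vec, text_vec))
        ```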

        1. 2

          Yes, I thought about mentioning multimodal models. I was trying not to throw too much at once at my fellow technical writers who are encountering these concepts for the first time. I also figured that they’d be most interested in text. But we do work with images quite a lot, and audio sometimes (e.g. meeting recordings), so I will add a footnote on multimodality. Thanks.

        2. 2

          Embeddings are foundational and provide massive utility. People in industry know this and don’t undervalue them.

          1. 6

            My experience is that the vast majority of professional software engineers, including those in industries that could benefit from embeddings, haven’t figured out what they are or why they are useful yet.

            They’re very unintuitive in my opinion. Turning arbitrary text and images into an array of 768 floating point numbers is a weird thing to do. It’s also weird how they show you what’s “most related”, but they’re no good for seeing what isn’t related - it’s very hard to pick a threshold and say “ignore anything that’s further away than distance X”.
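
            To make that concrete, here’s a rough sketch using the sentence-transformers library (all-mpnet-base-v2 happens to output 768-dimensional vectors; the texts are made up). The relative ranking is usually sensible - the absolute scores are the part that resists a clean “unrelated” cutoff:

            ```python
            from sentence_transformers import SentenceTransformer, util

            # This model turns any text into an array of 768 floating point numbers
            model = SentenceTransformer("all-mpnet-base-v2")

            docs = [
                "Options for LaTeX output",
                "Writing reStructuredText directives",
                "Banana bread recipe",
            ]
            query_vec = model.encode("customising PDF builds")
            doc_vecs = model.encode(docs)

            # Cosine similarity gives a useful *ranking*...
            scores = util.cos_sim(query_vec, doc_vecs)[0].tolist()
            for score, doc in sorted(zip(scores, docs), reverse=True):
                print(f"{score:.3f}  {doc}")

            # ...but there's no obvious absolute threshold below which a
            # document is safely "not related".
            ```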

            I gave a talk about this a year ago, where my pitch was pretty much “this is why it’s worth figuring out what these things are”: https://simonwillison.net/2023/Oct/23/embeddings/

            1. 2

              Anecdotally, I’ve had the opposite experience. Software engineers I work with are pretty tuned in here.

              1. 4

                This piqued my curiosity - maybe my expectations of my fellow developers were out of date - so I ran a poll on Twitter (I have enough followers there that this is a somewhat meaningful sample size): https://twitter.com/simonw/status/1849679967669674402

                692 responses: 55% said “I know how to use them”, 21% understood them but had not used them, 21% had heard of them but didn’t understand them, and only 3% hadn’t heard of them at all.

                My Twitter following leans heavily into the AI-curious, but I’m still surprised by the results. Clearly embeddings are a lot less obscure now in October 2024 than I had expected!

            2. 4

              I thought about hedging the title with “among technical writers” (i.e. “embeddings are underrated among technical writers”) but figured that it would be sufficiently clear given that:

              1. the domain name of the site is technicalwriting.dev

              2. I mention in the post that the content is geared towards technical writers (TWs)

              Among TWs I will rashly assert that we are indeed collectively sleeping on the potential of embeddings.

              Sorry if I calculated wrong though and the title came off too strong or inaccurate or whatever

              1. 2

                Fair enough.

            3. 2

              I’ve been poring over Sphinx docs a lot over the past few months and am not sure that the relationships this found match the relationships I naturally fall into. For example, it maps basics to directives, but not domains or roles, or directives and roles to each other. Something important it did catch is the link between configuration and latex. I can think of two things that would make this really useful:

              1. Can you use this to find related sections? So not just relate latex to configuration, but to configuration#options-for-latex-output or (even better!) configuration#confval-latex_elements.
              2. The killer app, IMO, would be connecting Sphinx docs and docutils docs. Connect SphinxPostTransform to Transforms, because Sphinx sure as heck doesn’t explain what they are!

              1. 1

                Yes, totally trivial to modify this to generate per-section embeddings and recommendations. I have done it before. One of the many reasons I love the Sphinx extension system.
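
                Roughly, the extension just needs to walk each resolved doctree at the section level instead of the page level - something like this sketch, where embed() is a placeholder for whatever embedding call you use:

                ```python
                from docutils import nodes

                SECTION_VECTORS = {}  # (docname, section_id) -> embedding vector

                def embed(text: str):
                    ...  # placeholder: call your embedding model/API here

                def collect_section_embeddings(app, doctree, docname):
                    for section in doctree.findall(nodes.section):
                        section_id = section["ids"][0] if section["ids"] else docname
                        # e.g. ("usage/configuration", "confval-latex_elements")
                        SECTION_VECTORS[(docname, section_id)] = embed(section.astext())

                def setup(app):
                    app.connect("doctree-resolved", collect_section_embeddings)
                ```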

                Agreed that it’d be very valuable to unite the Sphinx docs and Docutils docs more seamlessly, thanks for the suggestion.

                I can also follow up with line-by-line stats on how many page-level recs seem reasonable and which ones don’t. It would be good to have detailed benchmarks. Thanks for the comments and for reading.