1. 13
  1.  

  2. 7

    Keep in mind that the overhead of calling a user-defined function in a hotpath (say, as a join condition) can easily exceed the cost of the function itself. For example, in my rust program I wrote a hardware-accelerated (avx2) hamming distance function to calculate the distance between two blobs but the overhead of SQLite calling into my rust code for a self-join of just 100k rows (where t2.rowid > t1.rowid, even) takes hours to run but retrieving those same values upfront and then calculating the hamming distance with that same function in-memory/in-app (not in SQLite) finishes in seconds.

    1. 5

      (Copying my last comment in case someone finds it useful)

      I have a template repository for writing your own sqlite extension in rust: https://github.com/epilys/vfsstat.rs

      This uses FFI and no dependencies (I have log, but it’s not necessary).

      EDIT: also, if you define the regexp function as plain regexp, this overrides the regexp operator in sqlite allowing you to use a custom implementation. This was necessary some versions of sqlite before which didn’t ship with a regex implementation, Example: https://github.com/epilys/buke/blob/90f79ac49788848ca28a91ccbca9bf7217201bcb/src/db.rs#L169-L201

      1. 2

        Author of the linked post here!

        I remember looking at your repo at some point. I’m not very experienced with C (or systems programming in general) so I decided to stay in Rust land as much as possible. I’m now trying to learn more about potential approaches to implement a custom SQLite VFS, and your repo will likely be useful.

        1. 1

          Glad if it can help. My approach was 100% FFI based but that’s not necessarily the only way. Maybe there’s room for “safe” library wrappers that abstract all this CCI setup dance.

        2. 1

          This was necessary some versions of sqlite before which didn’t ship with a regex implementation

          Hmm… the SQLite docs say:

          No regexp() user function is defined by default and so use of the REGEXP operator will normally result in an error message.

          What versions of SQLite ship with a regex implementation?

          1. 1

            https://www.sqlite.org/changes.html

            2021-06-18 (3.36.0), they started shipping their implementation by default on CLI builds:

            The REGEXP extension is now included in CLI builds

            1. 1

              Ah, I see: it’s include in the command-line tool, but not available (by default) for library usage:

              $ sqlite3 :memory: "select 'foo' regexp 'o'"
              | 'foo' regexp 'o' |
              |------------------|
              | 1                |
              $ python3 -c "import sqlite3; c = sqlite3.connect(':memory:'); c.execute(''' select 'foo' regexp 'o' ''');"
              Traceback (most recent call last):
                File "<string>", line 1, in <module>
              sqlite3.OperationalError: no such function: regexp
              

              That makes SQLite more convenient for interactive use, but it’s a pain if I want to distribute data in SQLite-format databases, since I can’t assume the target application will have a matching configuration of SQLite.

              1. 1

                Yes, if you control the application you can enable the regexp extension by yourself in the sqlite build, or, if you rely on another libsqlite distribution you can override the regexp everytime you start the application, e.g. in python, it’s easy to plug in re module’s search as a user defined function with a wrapper, see for example how django does it (ctrl + f ‘regex’)

                https://github.com/django/django/blob/e30d6678421b7573a1995f61521f14348c9b2a17/django/db/backends/sqlite3/_functions.py#L41