this looks cool, and kudos for using LMDB. The 35MB binary size leapt out at me, though — what contributes to that? (LMDB itself is only ~100KB.) Lots of stemming tables and stop-word lists?
I am 0.3 certain that at least one significant component is serialization code: Rust serialization is fast, but is rumored to inflate binaries quite a bit. I haven’t measured that directly, but I did observe compile-time hits due to serialization.
The Rust SDK is also a bit surprising. For instance, add_documents(), the fn that adds documents to the index, is async and therefore “returns” a future, but the future's result is itself a data structure representing “progress”, which seems redundant and error-prone. So in order to wait for add_documents() to complete, a wait_for_pending_update() loop (which has some arbitrary defaults, BTW) is needed instead of simply doing add_documents(...).await?.
I also don’t see any support for atomic transactions, in contrast to e.g. Tantivy.
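To make the add_documents() complaint concrete, here is a minimal std-only sketch of the two-step pattern. Everything in it is a hypothetical stand-in rather than the SDK's real API: `Progress`, `UpdateStatus`, `get_status`, the `max_polls` parameter (the real call reportedly takes interval/timeout defaults instead), and the tiny `block_on` helper are all assumptions made for illustration, and the "server" is simulated by a counter.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical status of an enqueued update (not the SDK's real type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum UpdateStatus {
    Enqueued,
    Processed,
}

/// Hypothetical handle you get once the server has *accepted* the documents.
struct Progress {
    /// Simulated server: number of status checks before indexing is done.
    checks_until_done: Cell<u32>,
}

impl Progress {
    /// One status check against the simulated server.
    async fn get_status(&self) -> UpdateStatus {
        match self.checks_until_done.get() {
            0 => UpdateStatus::Processed,
            n => {
                self.checks_until_done.set(n - 1);
                UpdateStatus::Enqueued
            }
        }
    }

    /// Polls until processed, giving up (None) after `max_polls` checks.
    async fn wait_for_pending_update(&self, max_polls: u32) -> Option<UpdateStatus> {
        for _ in 0..max_polls {
            if self.get_status().await == UpdateStatus::Processed {
                return Some(UpdateStatus::Processed);
            }
        }
        None
    }
}

/// Awaiting this only means "enqueued", not "indexed".
async fn add_documents(_docs: &[&str]) -> Progress {
    Progress { checks_until_done: Cell::new(2) }
}

/// Waker that does nothing; enough here because no future ever returns Pending.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor so the sketch needs no async runtime crate.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Returns the status seen right after `.await`, and the result of the explicit wait.
fn run_demo() -> (UpdateStatus, Option<UpdateStatus>) {
    block_on(async {
        let progress = add_documents(&["doc1", "doc2"]).await;
        let right_after = progress.get_status().await;
        let after_wait = progress.wait_for_pending_update(10).await;
        (right_after, after_wait)
    })
}
```

Calling `run_demo()` from a `main` shows the trap: the first status is still `Enqueued` even though add_documents() was awaited, and only the explicit wait yields `Processed`. A too-small `max_polls` returns `None`, which is the same kind of silent failure that arbitrary timeout defaults invite.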
Rust isn’t entirely svelte either. It doesn’t take too many transitive dependencies before you’re at 10MB.
How do you call yourself minimalist when you’re pulling in that many dependencies?
To be clear, I’m not the author. But if I were, this would come off more as a personal dig than a real question. Be kind. :)
Holy passive-aggressiveness, Batman :)
Remind me to avoid rhetorical questions in the future.
It depends on what you compare it to. An Elasticsearch x86-64 gzipped tarball is over 340MB: https://www.elastic.co/downloads/elasticsearch
If anyone wants to figure this out, two tools to use are:
My guess is assets for the web UI are packed into the binary
It’s inspired by it; sitting on top of it would be tricky, given that Lucene is a Java library.
Thanks. I cleared that up.
Another Rusty alternative is Sonic: https://github.com/valeriansaliou/sonic
It’s cool, but contrary to TypeSense, doesn’t support HA, although it’s high on the “under consideration” list.
Seems very interesting. I wonder how it compares to Xapian.