In order to compare learned indexes with B-Trees, we created 4 secondary indexes over 3 real-world datasets: (1) Weblogs, (2) Maps, and (3) web-documents (…) [numbering not mine]
A more real use case for B-trees is ERP systems. If learned indices can handle the transactional volume of an ERP system better than B-trees, then maybe DBMS implementors should start paying attention…
That is, the only thing we need to do is to execute the model for every key and remember the worst over- and under-prediction of a position.
and
For example, a model without hidden layers can be trained on over 200M records in just few seconds if implemented in C++. However, for more complex models, we opted to use Tensorflow, which, because of its overhead, took significantly longer. Yet, we are confident that we can grid-search over the hyperparameter space relatively quickly, likely on the order of minutes for simple models.
It’s unusable for anything but static data sets, and if you have a static data set presumably you can do better than b-trees. They do also compare with a bad hashtable at the end and only barely beat it.
I was going to give you an answer in pseudocode but never mind. I made about three or four circular lists [1] in formal specs and code trying. Having forgotten coding, I forgot just how messy and fun those suckers were to try to describe in a clean, compact, and low-level way. I did two of three with most of best one being clean in node descriptions but still looked a little messy on one spot. I’m going to save that one as an exercise for near future. :)
[1] If it wasn’t what you had in mind, it was still fun to toy with anyway.
A more real use case for B-trees is ERP systems. If learned indices can handle the transactional volume of an ERP system better than B-trees, then maybe DBMS implementors should start paying attention…
I’ve yet to read the paper in full but the abstract is promising.
The abstract is promising but unfortunately:
and
It’s unusable for anything but static data sets, and if you have a static data set presumably you can do better than b-trees. They do also compare with a bad hashtable at the end and only barely beat it.
I have yet to read the abstract but your comment is promising. :)
I have yet to read their comment but your comment is promising. :) (quiz: what is the shape of this infinite linked list?)
data [contentOfUnknownValue] = [] | contentOfUnknownValue : [contentOfUnknownValue]
Got close but not correct. Hint, the shape is similar to something you fry eggs in. Hint 2, start from root comment and follow down.
I was going to give you an answer in pseudocode but never mind. I made about three or four circular lists [1] in formal specs and code trying. Having forgotten coding, I forgot just how messy and fun those suckers were to try to describe in a clean, compact, and low-level way. I did two of three with most of best one being clean in node descriptions but still looked a little messy on one spot. I’m going to save that one as an exercise for near future. :)
[1] If it wasn’t what you had in mind, it was still fun to toy with anyway.
Haha nice. Im glad you had some fun. I love programming, it is dangerous. Can be very addicting.
Howard Chu (OpenLDAP lead) has some rebuttal about the usefulness of this:
http://www.openldap.org/lists/openldap-devel/201712/msg00000.html