Abstract: “Relaxed Radix Balanced Trees (RRB-Trees) are among the latest members of a family of persistent tree-based data structures that combine wide branching factors with simple and relatively flat structures. Like the battle-tested immutable sequences of Clojure and Scala, they offer effectively constant-time lookup and updates and good cache utilization, but also logarithmic concatenation and slicing. Our goal is to bring the benefits of functional data structures to the discipline of systems programming via generic yet efficient immutable vectors supporting transient batch updates. We describe a C++ implementation that can be integrated into the runtime of higher-level languages with a C core (Lisps like Guile or Racket, but also Python or Ruby), thus widening access to these persistent data structures.
In this work we propose (1) an Embedding RRB-Tree (ERRB-Tree) data structure that efficiently stores arbitrary unboxed types, (2) a technique for implementing tree operations independently of optimizations for a more compact representation of the tree, (3) a policy-based design that supports multiple memory management and reclamation mechanisms (including automatic garbage collection and reference counting), (4) a model of transience based on move semantics and reference counting, and (5) a definition of transience for confluent meld operations. Combining these techniques, performance comparable to that of mutable arrays can be achieved in many situations, while using the data structure in a functional way.”
Author here. Nice to see this around, thanks!
Good work and welcome to Lobsters! Extra props for getting performance close to mutable state in a functional way. The last progress I saw on such things was from Lammich, who built a tool for synthesizing imperative implementations from functional specs in Isabelle/HOL; they had C++-like performance. I think more work on high-performance replacements for, or correctness-preserving synthesis of, mutable structures and algorithms can only be a good thing.
Although your work is out of my area, I try to dig out good CompSci research to post on social media so it reaches builders who might apply it. Too much just unfairly fades away. I mostly post formal specification or verification work with a practical bent. I do keep an eye out for stuff like this, too, since we have a diverse crowd here. Mine are here. A few people post on distributed or database-oriented systems a lot. Some of those submissions might grab your interest given the persistence work.
Thanks a lot for the comments! Yeah, there is a ton of interesting CompSci around here; I will keep following this community. Cheers!
I posted the video and GitHub versions of this work in my comments on the Xi editor [1]. The whole CppCon video is worth watching for background on the data structures, but I linked directly to the demo of a text editor using persistent data structures, which is pretty cool.
Here is a cool demo of async loading of big text files – you can navigate and I think even edit while loading:
https://youtu.be/sPhpelUfu8Q?t=1601
Using immer, Clojure-like immutable data structures in C++:
https://github.com/arximboldi/immer
The editor is a demo of the library: https://github.com/arximboldi/ewig
Copy of my comment on the Xi editor:
[1] https://lobste.rs/s/xwiyxi/xi_editor_for_next_20_years#c_tvqyol
That text editor was neat. Good to have a C++ library, too, for real-world adoption.
Super interesting! Out of curiosity, where do you find these papers? Do you read the journals, follow conferences or dig into CiteSeer? Thanks for sharing!
Sure thing! :)
I get results in specific fields with specific terms I've learned over time. I combine them with quotes and filters in different ways in Google. I go 5-10 pages deep into the results on some searches to combat rigged rankings hiding good papers. I also brainstorm ideas on how I'd solve problems, coming up with scores of designs. I turn those into specific keywords and check for related work, which usually finds a bunch of it. Finally, I look at particularly talented (or obscure!) researchers’ publication lists, since they often have more good work.
Those are the main methods. I've occasionally grabbed papers straight off journals, too, but it's usually exploratory or targeted Googling. I was going to tell you what keywords or design ideas led to this one to help find more, but I can't remember; I've run through too much stuff recently. Sorry. I occasionally toy with the idea of making a guide to getting research out of DDG/Google, with an attempt at describing those techniques.
This is gold, keep posting :) That’s why I love the Internet: it allows for serendipity, and then gives you tools to systematise it. I was using CiteseerX extensively when I was looking for papers, but lately I’ve gotten nostalgic (or just old) and am reading books from the 80s and 90s. There’s really cool stuff like the Taligent OpenDoc architecture or an early version of the XSLT language that look much more like Scheme. It’s like a trip to an alternate universe.
Oh yeah. There’s some great stuff in them, especially since some things that are really common now still had competition back then, with interesting ideas. A quick example is how people are exploring weaker CPUs for neural nets to cram more brains onto the same area of chip. That was what Hillis did for evolutionary computation when his Thinking Machines company took multiprocessing from a handful of CPUs to 64,000 tiny ones in one machine. The obscure concept took a long time to come back.
As far as books go, I originally learned my search techniques from a book: the American Library Association’s official 1994 guide to finding information offline and online. It tells you how to use everything from trade associations to specialized databases (e.g. Moody’s) to boolean queries to get information. Although search features got simplified since then, the basic suggestions it gave me still pay off to this day. I think I got it at a thrift store or something. A side benefit was seeing the data broker market and its problems coming a mile away, since a few brokers like LexisNexis were described in the book as good sources. ;)