1. 74
  1. 24

    This was originally submitted yesterday. I jumped at the old title and wrongly removed the story. We occasionally get off-topic posts on how to take notes and I thought this was one more. We talked about it in the chat room a bit, I re-evaluated, and encouraged the author to resubmit.

    1. 4

      Was wondering where it went! Saw in the moderation log it was removed.

      I assume a lot of technology is about personal or organizational productivity; I assumed the story got the balance wrong, though.

      1. 2

        As usual, good job on transparency. :)

      2. 8

        Nice work. I had thrown this in a background tab yesterday before it got deleted, and spent the morning marveling at your FUSE filesystem for your database. That is very cool and makes the project seem even more useful.

        1. 4

          I’m going to redesign it, because it has a basic flaw: the case of filename clash is not handled. I could append disambiguating identifiers to such cases, but then again I thought you might not always want to browse all your files at once.

          So maybe the basic workflow would be to mount a single Document and its files and have the previous behaviour secondary.

          Also, I want to redesign the schema with inheritance so that everything is Metadata, even Documents, and then you can attach metadata to other metadata and so on.
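
          For the filename clash case, the "append disambiguating identifiers" option could look roughly like the sketch below. It is only an illustration: the `(doc_id, filename)` pairs and the `disambiguate` helper are hypothetical and not part of the actual FUSE code, and it does not handle the corner case where a generated name collides with another existing file.

          ```python
          from collections import Counter

          def disambiguate(entries):
              """Return display names for (doc_id, filename) pairs.

              Names that occur more than once get a short prefix of the
              document id appended before the extension.
              """
              counts = Counter(name for _, name in entries)
              result = []
              for doc_id, name in entries:
                  if counts[name] > 1:
                      stem, dot, ext = name.rpartition(".")
                      if dot and stem:
                          # "paper.pdf" -> "paper-<id>.pdf"
                          result.append(f"{stem}-{doc_id[:8]}.{ext}")
                      else:
                          # No extension (or a dotfile): append at the end.
                          result.append(f"{name}-{doc_id[:8]}")
                  else:
                      result.append(name)
              return result
          ```

          Names that are already unique are left untouched, so browsing stays readable in the common case.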

          1. 3

            Either seems like it could work well. On a decently configured system, if you could make the mounting of the document automatic then spawn xdg-open or similar to launch the reader from the UI, that seems like a slick way to handle it.

        2. 6

          This is really cool! It’s always nice to see how fairly abstract domains (a graph of documents meant for consumption by people) are mapped to concrete structures. I hadn’t seen that method of creating the reference index before and it’s really awesome.

          This does make me wonder though, has anyone put forth a spec for a Zettelkasten transfer format? It seems like there are new tools every day that are actually unique and not just clones of each other. If everyone home-rolls their own format then trying new tools seems prohibitive. A consistent way to import/export would be nice. Some might balk at the sqlite dependency, but I can see this as a first step towards something like that.

          One suggestion: It might be better to have an explicit tag for a reference ID. This would stop conflicts in case someone decides to store hex in their note, or wants to include an ID w/o it being a reference. It also means if I build a tool on top of this, I don’t have to explicitly query the database to know if a string is a reference. I imagine wanting to show references without extra queries would be common for tools built on top of it, especially for visual tools.
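
          To illustrate what I mean by an explicit tag, here is a rough sketch. The `[[ref:...]]` syntax is something I made up for the example, not anything the project defines, and I am assuming IDs are written as 32 hex characters.

          ```python
          import re

          # Hypothetical tag syntax: [[ref:<32 hex chars>]]
          REF_PATTERN = re.compile(r"\[\[ref:([0-9a-fA-F]{32})\]\]")

          def find_references(text):
              """Return the tagged reference IDs in a note, lowercased."""
              return [match.lower() for match in REF_PATTERN.findall(text)]
          ```

          A bare hex string in the note is ignored, so a tool can render references without asking the database whether a given string happens to be an ID.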

          1. 4

            Cool use case. Though I’m not sure if the author really intends this to be accessed through the sqlite3 CLI via unwieldy SQL commands? More realistic would be to use this info to code the document model for some sort of program with a human-oriented interface.

            Regarding UUIDs as primary keys: there’s really no need for this unless you plan to be syncing records with other instances. Just use an INTEGER PRIMARY KEY or even the automatic ‘rowid’ column that all SQLite tables have. An integer is about 10x smaller, and faster and more memorable than a long blob! If you later want to do sync and need a globally unique ID, you can add and populate a UUID column later.
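
            A minimal sketch of that migration path using Python’s sqlite3 module. The `documents` table and its columns are hypothetical and not the project’s actual schema.

            ```python
            import sqlite3
            import uuid

            conn = sqlite3.connect(":memory:")

            # INTEGER PRIMARY KEY makes "id" an alias for SQLite's rowid.
            conn.execute(
                "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
            )
            conn.executemany(
                "INSERT INTO documents (title) VALUES (?)",
                [("First note",), ("Second note",)],
            )
            rows = conn.execute(
                "SELECT id, rowid, title FROM documents ORDER BY id"
            ).fetchall()

            # Later, if sync is needed: add a UUID column and populate it.
            conn.execute("ALTER TABLE documents ADD COLUMN uuid BLOB")
            for (doc_id,) in conn.execute("SELECT id FROM documents").fetchall():
                conn.execute(
                    "UPDATE documents SET uuid = ? WHERE id = ?",
                    (uuid.uuid4().bytes, doc_id),
                )
            conn.execute("CREATE UNIQUE INDEX documents_uuid ON documents (uuid)")
            uuids = [row[0] for row in conn.execute("SELECT uuid FROM documents")]
            ```

            The integer key stays the primary key for joins; the UUID only matters when records leave the database.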

            1. 4

              I had this question as well. The link to the repo is a little hard to find, but shines light on this: https://github.com/epilys/bibliothecula#tooling

              1. 2

                Thanks for pointing this out, I will make the link more visible (it was only in the page footer)

                1. 3

                  This is a really great design. Thank you for making it.

                  1. 1

                    Thank you for the kind words, I appreciate it.

              2. 2

                Though I’m not sure if the author really intends this to be accessed through the sqlite3 CLI via unwieldy SQL commands?

                I have in fact started writing such a tool here. SQL is indeed unwieldy but it can also be easily scripted if you want such a workflow.

                Regarding UUIDs as primary keys: there’s really no need for this unless you plan to be syncing records with other instances

                I chose UUIDs for two reasons: I have been splitting and merging databases a lot (so one development reason) and also I thought someone might want to do namespacing with a deterministic UUID with versions 3 and 5 (a design reason). It’s absolutely as valid to choose a smaller integer instead (yeah UUIDs are actually integers but indeed they are bigger than whatever bit size sqlite’s INTEGERs are)
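
                As a rough illustration of the namespacing idea with version 5 UUIDs in Python: the namespace URL and the `document_uuid` helper below are made up for the example and do not appear in the project.

                ```python
                import uuid

                # Hypothetical namespace for one library; any stable UUID works here.
                LIBRARY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/my-library")

                def document_uuid(name, namespace=LIBRARY_NAMESPACE):
                    """Derive a deterministic (version 5) UUID from a name."""
                    return uuid.uuid5(namespace, name)
                ```

                The same name in the same namespace always yields the same UUID, so two databases that were split can be merged again without inventing new identifiers.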

              3. 3

                I wrote something quite similar to this around a year ago. I wanted a personal wiki for my notes. I had been using Dokuwiki for at least a decade or so, but wanted something smaller, simpler, and with first-class Markdown support.

                I had started out wanting the pages to just be files on the filesystem for rock-solid future-proofing but between page history, indexing, full-text search, and all the corner cases involved I eventually discovered that I would basically be re-implementing Dokuwiki, which clashed with my goal of smaller and simpler.

                It turned out just INSERTing the documents into an sqlite3 database and indexing them with the built-in FTS5 feature was pretty trivial. The only other major thing I had to bolt on was a Markdown parser modified to understand [[wiki links]].
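
                Since the code isn’t posted, here is a minimal sketch of the general idea rather than what I actually wrote. It assumes the Python sqlite3 module was built with FTS5 enabled; the `pages` table, the sample rows, and the link regex are all made up for the example.

                ```python
                import re
                import sqlite3

                conn = sqlite3.connect(":memory:")

                # FTS5 virtual table: every column is full-text indexed.
                conn.execute("CREATE VIRTUAL TABLE pages USING fts5(title, body)")
                conn.executemany(
                    "INSERT INTO pages (title, body) VALUES (?, ?)",
                    [
                        ("Home", "Welcome. See [[Reading List]] and [[Projects]]."),
                        ("Reading List", "Papers about databases and full-text search."),
                    ],
                )

                def search(query):
                    """Return titles of pages matching an FTS5 query, best match first."""
                    cursor = conn.execute(
                        "SELECT title FROM pages WHERE pages MATCH ? ORDER BY rank",
                        (query,),
                    )
                    return [row[0] for row in cursor]

                # Simplified [[wiki link]] matcher; a real Markdown parser hook
                # would also need to skip code spans and the like.
                WIKI_LINK = re.compile(r"\[\[([^\[\]]+)\]\]")

                def wiki_links(body):
                    """Return the link targets found in a page body."""
                    return WIKI_LINK.findall(body)
                ```

                Page history and the rest of the corner cases are left out here.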

                I don’t have the code posted anywhere because it’s nowhere near respectable enough for public consumption.

                1. 3

                  I’ve patched the markdown-it.py parser to understand intra links for the web GUI I wrote for bibliothecula, in case you want to check it out for inspiration.

                2. 2

                  Am I the only one more than a little freaked by this throwaway comment…?

                  “explore online using sql.js, sqlite3 compiled to webassembly”

                  1. 2

                    What about it freaks you out?

                    1. 4

                      My mental model of the world hasn’t quite caught up with the world….

                      My mental model is still struggling to get past the hump of JavaScript is that irritating broken tacked on thing for web form validation which then got hugely abused to work around M$ stalling the development of web standards resulting in a terrifying pile of kludgery that are today’s web frameworks… which is quite a bit behind, “Hey you can compile a massive C program to wasm and run it in a web page.”

                      1. 5

                          You can do cool things with wasm. At my previous job we had a data processing engine written in C++ that ran on the server as well as in the browser. The js blobs were a bit big (20MB), but otherwise it worked well. The browser really is the new OS and wasm makes things possible that used to be very hard.

                        1. 2

                          Crudely simplifying, the 1st order approximation is that wasm is the new Flash, or a more successful take at NaCl. Most notably, in that AFAIU it’s a low level binary format like ELF or PE/COFF. However, compared to Flash, it has some important advantages IIUC: more standard and less company-owned (though I wouldn’t be surprised if Google or other corp does a power play at that), safer (reusing JS engines’ sandboxing & security engineers), and FFI with JS is native, and the default “runtime” (a.k.a. OS API) equals that of JS. The FFI makes it easy to piggyback on humongous heaps of existing JS code (caveat emptor as to quality), as well as gradually introduce into existing JS code. The default runtime being same as of JS makes the browsers now be literally OSes for wasm binaries; notably however, there’s also an effort to establish a second standard runtime, IIRC called wasi, more or less resembling a typical C runtime IIUC, to make it easy to use wasm as a cross-platform bytecode akin to JVM or CLR.

                          I’m curious if someone at some point designs an extension to RISC-V that would be aimed at directly executing wasm binaries, and if it becomes standard enough to make wasm be the de facto universal binary format; possibly thus commoditizing CPU architectures to irrelevance.

                          1. 1


                            I don’t know whether to be amazed and impressed or whether I should vomit on my keyboard.

                            1. 1

                              <shrug /> tool as any other. What I personally like however, is how it kinda undercut NaCl, which was a huge proprietary top-down engineering effort, by going through the kitchen door via a sequence of smaller evolutionary open-source steps, where each of them was actually beneficial to many people & projects. (Not that I mean small steps - e.g. creating emscripten was I believe one person’s “stupid hobby because-why-not project” initially, but definitely not a trivial one. If you don’t know what I’m talking about, how WASM happened was basically: (1) emscripten, (2) asm.js, (3) WASM.)

                              1. 1

                                And this train keeps rumbling on…. https://lobste.rs/s/1ylnel/future_for_sql_on_web

                                1. 2

                                  Ah yes :) however, consider now one more thing together with this, namely distributed/decentralized web (stuff like IPFS or dat/hypercore protocol). I have somewhat recently fiddled a tiny bit in the Beaker Browser 1.0, and most notably its integrated web content editor; I found the experience quite amazing and somewhat mind-opening; for me, it harkened back a bit to a touch of the experience at the early ages of WWW, when I was playing with website editors, excitement and playfulness pumping through my veins. Notably from the perspective of my argument, some publicly available Beaker Browser websites already provide JS snippets for implementing a mutable & persistent guestbook, by virtue of hypercore’s JS distributed data storage API. Now, level this up with SQLite’s fully featured abstraction layer, and the seeds are planted for a decentralized internet, where your code (written in any language thanks to WASM) and data can be ~trivially replicated among heterogeneous p2p nodes worldwide. One question that’s not fully certain to me is how this can be practically useful over the current internet, esp. for non-techy people; but I guess possibly at the beginning of WWW, the future usefulness of it was also not fully fleshed out and clear.

                                  edit: I mean, ok, maybe that’s not the way it “ought to be done”; I guess in theory, instead of WASM in a browser, it “ought to be” some small, platonically idealistic abstract VM kernel, top-down designed & written by an order of enlightened FOSS monks financed by a fund of benevolent worker cooperatives, that’s quietly and memory-efficiently chugging along on a solar-powered RISC-V chip produced locally from fully-renewable materials. But until we’re there, what we have is what we have.

                                  1. 2

                                    This is a promising tech. The problem will be to get enough users. Maybe if they managed to embed it in something like Firefox so a sizable proportion of the ’net gets it out of the box.

                                    the way it “ought to be done”.

                                    I’d usually point at this quote….

                                    At first I hoped that such a technically unsound project would collapse but I soon realized it was doomed to success. Almost anything in software can be implemented, sold, and even used given enough determination. There is nothing a mere scientist can say that will stand against the flood of a hundred million dollars. But there is one quality that cannot be purchased in this way - and that is reliability. The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay. Tony Hoare

                                    But the people doing this aren’t rich.

                                    They’re “because we can” monks or punks, maybe ponks?

                                    But they are sacrificing reliability, because it sort of doesn’t matter. It’s at the serving up cat pictures level of usefulness. But if enough cat loving “because we can” ponks become peeved with not seeing their kitty, they can patch it into the shape of something that works.

                                    This quote still applies…

                                    There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. Tony Hoare

                                    Sigh. All strength to the Ponks!

                    2. 2

                      A small note: UUID structure is version-dependent. That is, if you use UUIDv1, v3 or v5, there’s inherent structure that helps ensure that there aren’t collisions between machines.

                      UUIDv4 is almost completely random with barely any structure. Functionally, there’s no difference between it and using randomblob(16). The structure is described in RFC 4122. So essentially, all bits but bits 4, 6 and 12-15 are randomly set and the others just indicate the UUID version. Some implementations (like Python’s) just set the entire value randomly, ignoring the version bits.

                      If you want to use a UUID that does have some structure, you can use one of the uuid1(), uuid3() or uuid5() functions listed in the python uuid documentation.

                      1. 1

                        Somewhat confusing comment, because you say there’s no functional difference and then describe said difference. Python’s uuid4 uses a random value but sets the version bits in the UUID constructor. From a quick skim of the RFC I didn’t see it mentioning it’s optional (correct me if I’m wrong).

                        1. 2

                          Thanks for pointing out what I missed in python’s implementation. I was wrong about that.

                          The version bits are not optional afaik, however they add no additional collision-avoidance since they’re constants. A v4 uuid is equivalent to a 122-bit random number, so idk what benefit the added complexity in the article adds vs just using randomblob(16). By “functional difference,” I definitely could’ve specified that I meant in terms of collision-avoidance, since that’s usually the property people want when reaching for UUIDs.
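
                          To make the 122-bit point concrete, here is a small sketch in Python. It builds a v4 UUID from 16 random bytes and checks that only the 4 version bits and 2 variant bits are overwritten; the `FIXED_MASK` name is mine, and `os.urandom(16)` merely stands in for randomblob(16).

                          ```python
                          import os
                          import uuid

                          raw = os.urandom(16)                 # stand-in for randomblob(16)
                          u = uuid.UUID(bytes=raw, version=4)  # forces version and variant bits

                          # In the 128-bit integer form, the version nibble sits at bits 76-79
                          # and the two variant bits at bits 62-63.
                          FIXED_MASK = (0xF << 76) | (0x3 << 62)

                          raw_int = int.from_bytes(raw, "big")
                          random_bits_unchanged = (u.int & ~FIXED_MASK) == (raw_int & ~FIXED_MASK)
                          fixed_bit_count = bin(FIXED_MASK).count("1")
                          ```

                          The remaining 128 - 6 = 122 bits are whatever the random source produced, which is why the collision behaviour is the same as for a plain random blob.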

                          1. 2

                            I added the note to be informative, it’s not that using randomblob is wrong or less helpful in this case. As said elsewhere you can have rowids or INTEGER AUTOINCREMENT or any other identifier.