1. 10
  1.  

  2. 3

    This note about his other project (that the robin hood hash table depends on) is quite curious:

    This robin hood hashing is implemented using my project Object Persistence In C (OPIC). OPIC is a new general serialization framework I just released. Any in-memory object created with OPIC can be serialized without knowing how it was structured. Deserializing objects from OPIC only requires one mmap syscall. That’s say, this robin hood implementation can work not only in a living process, the data it stored can be used as a key-value store after the process exits.

    Right now, the throughput of OPIC robin hood hash map on small keys (6bytes) is 9M (1048576/0.115454). This is way better than most NoSQL key-value stores. The difference might come from write ahead logs or some other IO? I’m not sure why the performance gain is so huge. My next stop is to benchmark against other embedded key-value store like rocksdb, leveldb and so forth.
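
    For anyone wondering what "deserializing with one mmap" might look like in practice, here is a minimal sketch. It is not OPIC's API or file format (neither is shown in the quote); the fixed-width record layout and the names `RECORD`, `write_records` and `load_all` are made up for illustration.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width record: 6-byte key, 8-byte unsigned value.
# This layout is an assumption for the sketch, not OPIC's format.
RECORD = struct.Struct("<6sQ")


def write_records(path, items):
    """Write (key, value) pairs to `path` as fixed-width records."""
    with open(path, "wb") as f:
        for key, value in items:
            f.write(RECORD.pack(key, value))


def load_all(path):
    """Map the file once and decode each record straight from the mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            count = len(m) // RECORD.size
            return [RECORD.unpack_from(m, i * RECORD.size) for i in range(count)]


def demo():
    """Round-trip two records through a temporary file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_records(path, [(b"key001", 1), (b"key002", 2)])
        return load_all(path)
    finally:
        os.remove(path)
```

    The point is only that there is no parsing pass: the reader maps the file and indexes into it. A real hash table would also need its bucket layout and any internal references to be position-independent, which this sketch does not attempt.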

    1. 6

      DBs have an API that comes with the cost of copying, transport and serialisation. I just assumed the author was comparing their bespoke embedded DB against that, and that it may not have dawned on them why the gap is so large.

      Lots of people get excited by DBfoobar and, like a library, use it to save themselves writing some otherwise boring code. After a while something nudges you into building a tiny bespoke DB, and you find (even if you are not a great programmer) that you get a 10x speedup for your efforts. Eventually you realise that of course a generalised DB (infinite data width and multiple variable types) is always going to be significantly slower, and that writing your own DB is really no big deal.

      The same happened to me when I was prodded into learning how to implement a trie for a prototype bytecode VM I was working on. What I learnt actually made for a better bytecode, one that was tied to the underlying structure.

      I learnt something and things got better. Win win :-)

      1. 2

        I can’t say for sure, since I didn’t see a link to OPIC, but it’s unclear to me whether OPIC produces portable files. On top of that, using mmap as your storage layer has a lot of durability issues. Or rather, it gives no durability guarantees.
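
        To make the durability point concrete, here is a rough sketch of what you have to do yourself when writing through a mapping. The helper name `write_and_flush` is made up; `mmap.flush()` and `os.fsync()` are the standard library calls. Even with both, a crash partway through an update can leave a half-written structure, so this is not a substitute for a write-ahead log.

```python
import mmap
import os
import tempfile


def write_and_flush(path, offset, data):
    """Overwrite bytes in an existing file through a mapping, then flush."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            m[offset:offset + len(data)] = data
            m.flush()         # write the mapping's dirty pages back to the file
        os.fsync(f.fileno())  # ask the OS to push the file to the device


def demo():
    """Create a 16-byte file, patch 4 bytes via mmap, and read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * 16)
    finally:
        os.close(fd)
    try:
        write_and_flush(path, 4, b"abcd")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

        Without the explicit flush, the kernel decides when dirty pages reach disk, which is the "no guarantees" part.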

      2. 3

        I’m interested in this part:

        this project (OPIC including the hash table implementation) is approved by google Invention Assignment Review Committee as my personal project. The work is done only in my spare time with my own machine and does not use and/or reference any of the google internal resources.

        Does Google not allow you to spend your spare time making something unless it’s approved by a committee?

        1. 5

          Does Google not allow you to spend your spare time making something unless it’s approved by a committee?

          Basically, yes.

          (More accurately, you can spend your spare time making whatever you like as long as you don’t release it to the world and do give it to Google. The IARC is “only” required if you for some silly reason don’t want to do that)

          1. 1

            This is how it’s worked at my two latest employers as well. The reasons given to me were manifold:

            • Protecting the company from liability. If you release a project that violates some legal boundary and someone realizes that you work for Big Corp, it’s very hard to prove you didn’t use some of Big Corp’s resources in developing that project, so Big Corp becomes a legal target.
            • Protecting you. If you develop something that is quite popular/makes you money, the company can claim that you used company resources to develop it and thus own some of it. Of course, they would be a giant dick to do that, but laws aren’t about people being dicks or not.
            • Making sure you aren’t using domain knowledge you’re being paid to use at the company to further your private benefit.

            I think one problem with Google is that they are so large it’s quite hard to work on a side project that does not use some of your Google domain knowledge, so their committee is much more elaborate in dealing with it. Where I have worked, you make an amendment to your contract describing the personal project and the company waives the IP.

            1. 4

              “Protecting you. If you develop something that is quite popular/makes you money, the company can claim that you used company resources to develop it and thus own some of it.”

              That’s some funny sophistry on the part of whoever described it to you. It should be read as follows:

              “ROI for company. The company wants to make as much money as possible off your time and brains. Anything you release that might make you money down the line is something that might make the company money down the line. It’s better for the company if they own everything you do during your employment with any immediate or future value.”

              That’s also why I’d never take a position at Microsoft, IBM, Oracle, etc. if I were doing R&D. Universities would at least give me a chance of getting my work out there for people to use.

            2. 1

              I think most places put this kind of clause in any contract you sign.

              I suspect those who think it is unreasonable are probably, unwittingly, under such terms right now, whilst those who do notice the clause just get it struck out of the contract.

              I have mine amended to state that if I use any resources or time of the other party, they own it; otherwise it’s mine.

              Most are okay with this; it is, after all, just part of the negotiations[1]. Those that are not are probably best steered clear of anyway.

              [1] remember kids, don’t be silly and think only to negotiate on money ;-)

              1. 1

                I think most places put this kind of clause in any contract you sign.

                Most places in the US, maybe. It’s less common in Europe, and depending on exactly where you are in Europe, it may not even be enforceable under local employment law (which doesn’t stop companies putting it in there anyway).