1. 7

  2. 4

    The solution is one big giant yaml table? Why not stick with sql and have one big table there?

    1. 3

      Do the reasons outlined in the article not make sense to you?

      1. 2

        I don’t know a lot about Pokemon, so let’s consider a database of delicious Mexican foods.

        You’ve got a row in your database for burrito. But some burritos have steak and some have pork, so you join on the meat table. Except some burritos are vegetarian and don’t have meat. And some burritos are super, so you have to join on the guacamole yes or no table. And now your queries return all these “duplicate” rows.

        So you sit down and convert everything to yaml and instead of joining multiple tables, you just create an entry for “super steak burrito” and “pork burrito” and “plant burrito” and so on. Everything is so much simpler. Makes sense to me.
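For illustration, denormalized entries like those might look something like this in YAML (a hypothetical sketch, not the author's actual schema):

```yaml
# Each burrito is one self-contained entry – no joins required.
super-steak-burrito:
  meat: steak
  guacamole: true
plant-burrito:
  meat: ~          # vegetarian: no meat at all
  guacamole: false
```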

        My question is why that needs to be done in yaml. Why can’t I put “super steak burrito” and “plant burrito” in a sql database?

        1. 2

          > My question is why that needs to be done in yaml.

          Ah, I see.

          > And now your queries return all these “duplicate” rows.

          Not in the cases you describe! If there’s a 1:1 relationship between rows in two tables – even if one side is optional (or both, for that matter) – then a join will not multiply rows. Where you get the problem is if you have something like a condiment table that lists steak, pork, vegetarian and “super”, and another table that links burritos and condiments, so that each burrito can correspond to multiple rows in the link table.
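To make that concrete, here's a sketch with SQLite (every table and column name is invented for illustration):

```python
import sqlite3

# Toy schema: a 1:1 optional relation (meat) vs. a link table (condiments).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE burrito (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE meat (burrito_id INTEGER UNIQUE, kind TEXT);             -- at most one row per burrito
CREATE TABLE burrito_condiment (burrito_id INTEGER, condiment TEXT);  -- any number of rows per burrito
INSERT INTO burrito VALUES (1, 'super steak burrito'), (2, 'plant burrito');
INSERT INTO meat VALUES (1, 'steak');  -- the plant burrito simply has no row
INSERT INTO burrito_condiment VALUES (1, 'guacamole'), (1, 'salsa');
""")

# 1:1 join, optional on one side: still exactly one row per burrito.
one_to_one = db.execute(
    "SELECT b.name, m.kind FROM burrito b LEFT JOIN meat m ON m.burrito_id = b.id"
).fetchall()  # 2 rows; the plant burrito just gets NULL for kind

# Link-table join: the super steak burrito is duplicated, once per condiment.
one_to_many = db.execute(
    "SELECT b.name, c.condiment FROM burrito b "
    "LEFT JOIN burrito_condiment c ON c.burrito_id = b.id"
).fetchall()  # 3 rows; 'super steak burrito' appears twice
```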

          > Why can’t I put “super steak burrito” and “plant burrito” in a sql database?

          It’s not traditionally easy to put structured data in an SQL database without flattening it out into a relational schema. A number of RDBMSs have arrays, and Postgres has hstore and json as native data types, but elsewhere you have a lot more manual work to do – such as manually encoding to JSON to stuff it into a text column, at which point it’s excessively painful to query on.
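A sketch of that manual work, using SQLite as a stand-in for the less-equipped RDBMSs (names invented; this assumes a SQLite build with the JSON1 functions, which recent Python distributions include):

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
# Structured data flattened into a plain text column.
db.execute("CREATE TABLE burrito (name TEXT, data TEXT)")
db.execute(
    "INSERT INTO burrito VALUES (?, ?)",
    ("super steak burrito", json.dumps({"meat": "steak", "super": True})),  # manual encode
)

# Manual decode on the way out...
meat = json.loads(db.execute("SELECT data FROM burrito").fetchone()[0])["meat"]

# ...or querying through the JSON functions, which works but is clumsy
# compared to a native structured type.
name = db.execute(
    "SELECT name FROM burrito WHERE json_extract(data, '$.meat') = 'steak'"
).fetchone()[0]
```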

          At that point you’re working at cross purposes to your tool to such an extent, I’d ask why you’re putting that into an SQL database in the first place…

          Of course, if you narrow the question to Postgres (or generally any RDBMS with structured column types), then I guess the answer is, yeah you could do that. But one of his other considerations becomes even more relevant: the YAML format is much easier for other people to use independently of the code he’s writing for it, as compared to not just a database dump, but a dump with structured column types.

          What might be quite sensible is to use the YAML form as fixtures to load into a database with structured columns – basically, asking “why pickle?” rather than “why YAML?”

      2. 2

        The eventual solution isn’t just one giant YAML table. The author seems to decide on having separate files for language-game pairs with distinct data (e.g. jp-gold, jp-silver, ww-gold, ww-silver, crystal), each file containing a giant YAML table. And then there’s a single separate file with “data specific to a set of forms” – it doesn’t need to be in the per-version files because it’s “extremely unlikely to ever change”.

      3. 2

        I’m pretty sure a decent MySQL schema would be faster than this YAML plan, and provide more power and flexibility.

        The current approach obviously has some problems, but from the description given it doesn’t seem like it should be that slow. If it is, then there’s some problem that isn’t being described clearly in the section on the current approach. Maybe it’s just the codebase itself, or how the joins are being performed – possibly without indices on the appropriate columns, or with joins on non-unique columns.

        I don’t think the YAML approach is going to solve the problems being described, though. If anything it’s going to be worse in the end because you won’t have a powerful engine like MySQL trying to help you with optimizing the storage.

        1. 1

          Databases are extremely slow. Code that walks data structures in memory is orders of magnitude faster (easily 2 to 3 orders, sometimes 6 or 7 – and even more seems plausible, but I’ve not heard real-world stories of that).

          The problem addressed by a DBMS is that you’d need to write the code that walks the data structure(s)… and design the data structures themselves for the query at hand, for that matter. Then even the slightest change to the query might require starting over from scratch. There’s also the question of how to update the data structure, given the way it’s walked. Etc.

          A “powerful engine” isn’t fast so much as it’s fast given the generality of access it provides. You give it a declarative description of what you want in SQL and it does its level best to satisfy that in a reasonable timeframe, right there on the spot, even if you never queried the data this way before.
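A small sketch of that trade-off (toy data, all names invented): the hand-rolled route needs a purpose-built structure per access pattern, while the DBMS answers the same declarative description on the spot.

```python
import sqlite3
from collections import defaultdict

pokemon = [("bulbasaur", "grass"), ("charmander", "fire"), ("vulpix", "fire")]

# Hand-rolled route: every access pattern wants its own structure, which you
# must redesign and maintain if the query changes.
by_type = defaultdict(list)
for name, ptype in pokemon:
    by_type[ptype].append(name)
fire_hand = by_type["fire"]

# DBMS route: a declarative query, planned on the spot, even for a query
# shape you never anticipated when you laid out the data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pokemon (name TEXT, type TEXT)")
db.executemany("INSERT INTO pokemon VALUES (?, ?)", pokemon)
fire_sql = [row[0] for row in db.execute("SELECT name FROM pokemon WHERE type = 'fire'")]
```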

          1. 2

            Most DBMSs, when properly configured, will keep regularly accessed data in memory.

            And you can supplement the DBMS with Redis or Memcached or whatever you prefer if you have some portion of your data that can fit in the available memory (up to and including everything, obviously).

            1. 1

              You still have a bunch of context switches and all the un-/marshalling and the query parsing and the planner and all the other related overhead in there. Keeping the data in memory helps a DBMS avoid roundtrips to disk, but it can’t erase the gigantic overheads you pay in total for the generality of access. Code shuffling pointers in registers with no context switches runs on a different performance scale entirely. To use a DBMS is to waste a huge amount of CPU cycles.

              And rightly so, most of the time. I’ll claim that the wasted performance does not matter whatsoever in Eevee’s case. But to say that he is losing out on performance is… not terribly plausible. Loading the data may be slow, of course, because that does involve file access and a bunch of parsing and copying etc. And, of course, Eevee acknowledged that (“I’ll use pickle for caching”). But once the data is in memory, accessing it can’t plausibly be slower than using a DBMS.
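The "pickle for caching" idea mentioned there might look something like this hypothetical sketch (path and data invented): pay the slow parse once, then reload the ready-made in-memory structure directly.

```python
import pickle
import tempfile
from pathlib import Path

CACHE = Path(tempfile.mkdtemp()) / "pokedex.cache.pickle"  # invented cache path

def parse_source():
    # Stand-in for the slow part: reading and parsing all the YAML files.
    return {"totodile": {"type": "water", "version": "gold"}}

def load_data():
    if CACHE.exists():
        # Fast path: unpickle the already-built structure, no parsing at all.
        return pickle.loads(CACHE.read_bytes())
    data = parse_source()
    CACHE.write_bytes(pickle.dumps(data))  # warm the cache for next time
    return data

first = load_data()   # parses the source and writes the cache
second = load_data()  # served straight from the pickle
```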