I just used this the other day to seed some data. Very handy, and as always, I really enjoy your articles.
Thanks! And agreed, it’s one of those small things that isn’t always needed, but when you do need it, it’s very handy.
It makes me so sad that I have to learn a different set of tricks for every DB.
Just learn the tricks for Postgres then you don’t need to learn any other DBs :)
Seeing that Postgres makes working with JSON a completely seamless experience, I don’t see why you’d pick anything else. It’s best of both worlds. Make relational tables when you have relational data, store documents when you have documents. Meanwhile, Citus addresses the common complaint with scaling Postgres quite nicely.
Thanks for the mention. We at Citus are also biased towards just starting with Postgres and not complicating things early on. And today we just shipped some tooling to make it easy to move from existing Postgres (such as RDS) directly into Citus with essentially no downtime - https://www.citusdata.com/blog/2017/11/16/citus-cloud-2-postgres-and-scale-without-compromise/
Fantastic work! :)
Wow, Citus is cool, thanks for sharing! I had not heard about it before.
Are there many folks who migrate from Dynamo to Postgres? I mostly hear about this (and have only participated) the other way around.
I think it all depends on the company and your use case, but I’ve observed an increase in it. https://containership.engineering/dynamodb-to-postgres-why-and-how-aa891681af4d is just one example, but there are a few others that pop up from time to time and probably more that don’t get talked about.
SELECT master_create_distributed_table('stores', 'id', 'hash');
I know you want to avoid joins crossing server boundaries, but this config is a mistake. The problem with this is that sharding at the account level means you can’t scale any customer past the size of your single largest server (and you’ll probably have to jump through hoops to assign and migrate them to that server and keep other customers off). It’s an interesting coincidence that you chose this domain for your example because it looks like Shopify, which had exactly this problem.
Indeed, you are then limited by how large an instance you can scale a single customer up on, but for many this is a perfectly reasonable bound. As for how you move them to their own server, we have some functionality specifically for that (https://www.citusdata.com/blog/2017/03/15/a-look-at-isolating-tenants/). It does admittedly come back to how large your overall dataset is likely to be, and then how large each customer is likely to be. If all your data across all your customers is 10 GB, then of course sharding makes no sense at all.
This is great, thanks! Is there anything you’d say specifically if someone asked you why to choose Postgres over MySQL? Let’s say they’re using AWS and not Citus. I had a hard time making an argument for Postgres at my current job, where basically all the SQL databases are MySQL. In my case it’s for analytics, and upserts are really useful, which at the time Postgres didn’t have. HyperLogLog is also a really useful feature, but AFAIK you can’t add the extension on AWS RDS.
For me the biggest thing is the flexible datatypes (arrays, hstore, range types, and JSONB) along with the corresponding indexes that can be used with them, like GIN and GiST. And then, truly, extensions: even if you’re not using HyperLogLog you might need geospatial support via PostGIS, maybe full text search, maybe foreign data wrappers, and there’s a much longer list.
If memory serves, there are more subtle ways to leverage indexing on the jsonb type. It may not be best perf-wise to index all the things.
Yes, you definitely can add any sort of functional b-tree index on any field in JSONB. For most cases, I’ve found that creating a GIN index gives you quite a bit of flexibility without having to think too much ahead of time about what you want to index. If you’re using JSONB in production and there are only a few keys you want to pay attention to, then the functional index approach should work fine; if it’s less defined, a GIN index is helpful.
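For what it’s worth, here is a rough sketch of the two approaches side by side. The events table, the payload column, and the key names are all made up for illustration; adapt them to your own schema.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE events (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL
);

-- Option 1: expression (functional) b-tree index on one known key.
-- Note the extra parentheses around the expression.
CREATE INDEX events_user_id_idx ON events ((payload->>'user_id'));

-- A lookup on that exact expression can use the b-tree index:
SELECT * FROM events WHERE payload->>'user_id' = '42';

-- Option 2: GIN index over the whole document, no keys chosen up front.
CREATE INDEX events_payload_gin_idx ON events USING GIN (payload);

-- Containment queries on any key can use the GIN index:
SELECT * FROM events WHERE payload @> '{"status": "shipped"}';
```

The GIN index tends to be larger and slower to update than a single-key b-tree, so if you only ever filter on one or two keys, option 1 is probably the cheaper choice. If you only need containment (@>) queries, the jsonb_path_ops operator class is a smaller GIN variant worth a look.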
I would be rather pleased if we got some more PostgreSQL stuff on here :)
A shameless plug, but for those that want a regular flow of Postgres news - http://www.postgresweekly.com
This is a lot of anti-Rails FUD, I’m kinda surprised to see it upvoted here.
I personally have stopped writing Sinatra apps for everything but one to three page applications; I always end up either re-building two thirds of Rails poorly by adding tons of gems, or just straight porting the app to Rails as it grows.
I generally wouldn’t view this as anti-Rails by any means; in fact we still have a large number of Rails applications at Heroku. The difference is that where we formerly had a Rails application that handled a massive number of things, we’ve pulled it apart to have an API at the core and multiple supporting Rails applications at the edges. This allows us to move faster and have a cleaner separation between each application than before.
anything that makes the development environment different than production is a disaster waiting to happen
So you run the exact same hardware, OS, all identical versions?
you’ll have to know what IP spoofing is, and how to address it.
This is a pro for Rails, not anti. You don’t have to know about how it works, Rails just takes care of it for you.
Great for those of you trying very hard not to design an API.
Silly, silly FUD. Tools to help you parse things quickly don’t mean bad design.
THE RAILS ROUTER
These are just random statements of preference with nothing backing them up whatsoever.
You also go through zero of the things in the ActionPack layer, all those are super big advantages.
To be fair to Sinatra…
How do I separate my views in different folders?
This is super easy. erb :"foos/index".
How can I protect my app against CSRF?
Other random things:
while also adding pretty substantial dependencies like qt.
You don’t need to add qt to do full browser testing in Rails.
Getting asset pipeline for designers doesn’t come without a lot of backend model adjustments from engineers.
I don’t even understand what this means.
making JSON the default format becomes a nightmare when you have to take browsers into account.
You do know that Rails does a lot of stuff to detect awkward Accept headers and make them better, right?
Code bloat: Rails doesn’t help here
Neither does Sinatra.
the nature of a pure API app is radically different than the one of a web app:
Both of these statements under this heading are silly, and wildly unsubstantiated.
Anyway, basically, I found this post to be a near-incoherent ramble of wild statements, hence ‘fud.’ Rails is CERTAINLY not perfect, but I don’t think this post does a good job of pointing out the flaws.