In addition to the PG capabilities and extensions mentioned in the article,
I would like to add two more:
ltree – a built-in, totally free extension for operating on tree structures
(we use this in production)
timescaledb – a third-party extension that gets PG closer to (though not all the way to) supporting fast aggregation queries based on time buckets
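For anyone who hasn’t used these, here is a minimal sketch of both (the table and column names are invented for illustration):

```sql
-- ltree: label paths form a tree; <@ means "is a descendant of"
CREATE EXTENSION IF NOT EXISTS ltree;
CREATE TABLE categories (id serial PRIMARY KEY, path ltree);
CREATE INDEX ON categories USING gist (path);
INSERT INTO categories (path)
VALUES ('top'), ('top.science'), ('top.science.astronomy');
-- all descendants of 'top.science' (including itself)
SELECT path FROM categories WHERE path <@ 'top.science';
```

```sql
-- timescaledb: turn a regular table into a hypertable,
-- then aggregate by time bucket
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE TABLE metrics (ts timestamptz NOT NULL, device_id int, value double precision);
SELECT create_hypertable('metrics', 'ts');
-- average per device over 5-minute buckets
SELECT time_bucket('5 minutes', ts) AS bucket, device_id, avg(value)
FROM metrics
GROUP BY bucket, device_id
ORDER BY bucket;
```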
To be honest, continuous queries (rather than aggregation queries)
are, in my view, a must-have feature for folks implementing the ‘speed layer’ in what used to be called the ‘lambda architecture’.
But what the TimescaleDB extension offers may be sufficient for many business scenarios
for which continuous queries would be overkill.
As a side note, there used to be a PG extension that did continuous queries (PipelineDB), but it is no longer maintained.
Is it no longer called this?
Wow that’s a fantastic project! Thanks for the link.
Don’t forget LISTEN/NOTIFY.
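For readers who haven’t seen it, a minimal sketch (the channel name and payload are invented):

```sql
-- session A subscribes to a channel
LISTEN orders_channel;

-- session B publishes; every listening session receives the payload
NOTIFY orders_channel, 'order 42 created';

-- equivalent, callable from SQL or PL/pgSQL with a computed payload
SELECT pg_notify('orders_channel', 'order 42 created');
```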
Something to be aware of is that LISTEN/NOTIFY is not usable behind PgBouncer (at least in transaction-pooling mode), which makes sense. So, as soon as you want your database to handle more connections through a pooler, you will be in for a surprise if you relied on LISTEN/NOTIFY in your codebase. (True story!)
I wish asynchronous notifications were supported in the official JDBC driver. Right now if you want to use LISTEN from Java code, you either have to poll for notifications constantly (which increases notification latency and adds load to the server) or use a third-party driver that hasn’t hit its 1.0 release yet.
I don’t remember where I read this, but I do remember reading about adding a middle layer between the application and the database to better leverage LISTEN/NOTIFY. I haven’t used the feature myself, so I have unfortunately forgotten the source, but if someone else is wiser I would appreciate input. If I recall correctly, SageMath switched from RDB to Postgres a couple of years back and utilised LISTEN/NOTIFY quite heavily; could be interesting to dig up.
The problem with PostgreSQL is the vast number of configuration options and tweaks that you need to make it work optimally. “PostgreSQL when it’s not your job” provides a starting point, but once your performance needs are past that, you’re shooting in the dark, googling and randomly applying values while fishing through the vast (thankfully vast) documentation.
Isn’t this true for all datastores? Defaults for MongoDB, MySQL, MariaDB, Datomic, etc. will probably always need some tweaking at some point, although that point may take some time to reach. I really think it is impossible to provide a “one size fits all” solution, as use cases and environments are way too different.
It may be true of all databases and stores, but the thing is that for certain workloads, certain defaults behave better. Which means that if you want to write that database software X is underrated (or definitely your second choice), you need to offer more than the defaults. It does not matter if you’re telling me that hstore is there. Tell me how you tuned it so you could remove every other key-value store from your stack, for example. Do not tell me that there is a JSON type; I know, I read the release notes. Tell me what you did to remove a JSON store. Otherwise you’re still using it as a relational database, or with a workload that is light for the hardware you have at hand.
I’m very sensitive about this today, as I’ve spent the day testing Postgres 12 (which, according to the release notes, is plainly faster than 11 simply by upgrading) and I have not seen that hold in our particular use case. And yes, I know our schema is not the best; this happens when “you need to move fast and not break things”, but here I am now, facing certain hardware constraints and trying to make tweaks until we migrate.
So it does not matter, at least to me, that feature X is there, unless someone has a story to share about how they migrated to that feature.
I know this can be read like I am seeking free consulting. No I am not. I am just ranting to feel better. I am not even attacking the blog posts, or the people who wrote them. They do provide valuable service to the community.
I think when you start hitting I/O or CPU constraints on your PG cluster(s), the easiest solution is to throw more hardware at the problem, and it basically just goes away. You can usually do this for a LONG time, as machines that can run PG can get pretty big. But the more correct answer is: when you start hitting these limits, it’s time to get a professional into the mix. This is true for ANY data store, not just PG. Professionals tend to know the ins and outs of the datastore tech, and how to tweak it for the best performance with your use case.
Yes, it has a lot of tweaks, and some of that you can test and do yourself if you have a proper testing environment. But if you are part of the move-fast-and-break-things mindset, it’s probably easier to just reach out to a PG shop and let them fix you up.
The other thing to do, even before you start hitting these limits, is to consider whether the assumptions embedded in your architecture are still valid. You might still need a single relational source of truth, but perhaps you can move some of the traffic off of the canonical data store, for instance. Maybe you’re misusing your database because that was the correct trade-off to accept, but it’s becoming less tenable.
Agreed, but for me that’s sort of an annual or bi-yearly review (time frame flexible depending on growth rate), where you go through your architecture and see what’s working, what’s not working, what needs tweaking or replacing, etc.
It just seems like something that rarely happens, and the costs of not doing it are compounding. Maybe I’m just a little sensitive at the moment because my team is struggling with a mountain of unmaintained systems that are full of horrible bodges and hacks that almost certainly made sense at the time.
Put it in your calendar for a year from now: review your architecture/planning decisions and make sure they still make sense ;)
For my shop, I make it a point to do it once a year, in report format and send it upstream. It also lets us know where we should be focusing our energy, etc.
There is a HyperLogLog type available for Postgres as well, and as @vladislavp said, there is TimescaleDB, which provides fast aggregation of log data. So even in those two cases you probably do not need another data store (until you reach the limits of PG).
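If it helps, here is a rough sketch of the postgresql-hll extension mentioned above (the table and column names are invented):

```sql
-- postgresql-hll: approximate distinct counts in a small fixed-size sketch
CREATE EXTENSION IF NOT EXISTS hll;
CREATE TABLE daily_uniques (day date PRIMARY KEY, users hll);

-- fold raw events into one sketch per day
INSERT INTO daily_uniques
SELECT visited_on, hll_add_agg(hll_hash_integer(user_id))
FROM visits
GROUP BY visited_on;

-- approximate distinct users per day
SELECT day, hll_cardinality(users) FROM daily_uniques;

-- union the sketches to get uniques across the whole period
SELECT hll_cardinality(hll_union_agg(users)) FROM daily_uniques;
```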