Threads for hibachrach

    1. 1

      Curious where these color terms are sourced from

      1. 1

        I’m not familiar with Svelte, but it seems to come from https://github.com/scttcper/tinycolor

        1. 5

          I think that has some CSS color names, but it seems another dependency, color-name-list, is what I’m looking for. That in turn pulls from a bunch of places, it seems.

      2. 18

        Genuinely curious if the author ever used Heroku in its prime.

        There seem to be a number of these language/framework-specific clouds popping up:

        • Vercel
        • Deno
        • Laravel
        • shuttle.rs for Rust

        I am sure there are others I am not aware of.

        They seem fine for small projects. I have not seen them used for anything large yet.

        1. 5

          I’m still using Heroku and very much appreciating it. Excited for the new Fir generation stuff.

          1. 1

            Last I knew, Urbandictionary was still running on Heroku.

          2. 1

            The only thing with this is it doesn’t protect you from method semantics changing. When having to use gem internals, I have historically used an exact gem version assertion with a note to make sure nothing has changed.

            1. 1

              How is this any different than a public method which also may change at any time?

              1. 2

                If we’re abiding by SemVer (which I think most Ruby gems do), a backwards-incompatible public API change must be accompanied by a major version bump. Changing the implementation (which could include changing the interface of a private method) could be done in a minor or patch version. The goal of this is to enable you to still confidently upgrade non-major versions without worrying about changes. However, there are still gotchas, IMO.

              2. 3

                I’ve never really worked on an application that ran out of connections or experienced degraded performance as a consequence of too many connections—what sort of system/scale benefits from PgBouncer? Presumably read replicas allow you to scale read connections linearly (at some cost), so it mostly comes into play when you have hundreds of concurrent writes? Or maybe you reach for PgBouncer before you scale out read replicas because the cost of the latter outweighs the “peril” of the former?

                1. 3

                  Lots of folks, for better or worse, run N replicas of their app, or N microservices, each with M connections in a pool. Seems to be very common with Kubernetes shops. Pretty easy to end up with a hundred open connections this way, and that’s not cheap.

                  Read replicas are too complex to be retrofitted on to most applications - they’re usually eventually consistent and the app needs to be taught which queries to route to the replica. PgBouncer can’t really do that sort of read query offloading, because of SELECT launch_missiles()-shaped problems. Easier to throw money at the problem scaling vertically than doing open heart surgery on your persistence layer, I guess.

                  1. 2

                    I understand the N replicas with M connections per replica thing, but I usually only see ~3 replicas. Unless each replica has hundreds of connections in its pool, I don’t think this would explain it. Are lots of people running with dozens of replicas? What kinds of apps / at what scale is this relevant?

                    Read replicas are somewhat complex, but it seems more straightforward to tell whether a query will work for a read replica (and if you aren’t sure just leave it on the master) than it is to navigate the PgBouncer concerns raised in this article. And I think eventual consistency is probably fine for a lot of CRUD apps, no?

                  2. 2

                    We have a python web app that reads from and writes to postgres. Each instance of the service will start with 5 connections to the DB, and we have many instances running.

                    Certainly we could build out read replicas of PG, retool the web app to read from one DB and write to another, but it was infinitely easier to just make pgbouncer do all the hard work and keep a single DB.

                    We could optimize our queries too. Again, pgbouncer and scaling the one prod DB instance is just easier. DB slow? Throw some more RAM or CPU at the problem and move on with life.

                    We haven’t run out of hardware growth room on x86 yet, and they keep expanding our headroom hardware-wise. Perhaps someday we will start to reach the limits of what a single box can do, and then we can start to optimize. Until that day, though, it’s not worth the developer time. Hardware running costs are way cheaper than developer time.

                    1. 1

                      We have a python web app that reads from and writes to postgres. Each instance of the service will start with 5 connections to the DB, and we have many instances running.

                      How many replicas do you have? According to the article, the sweet spot is 200-300 connections, which means you could have 40-60 replicas before performance becomes an issue. Do you really have this many replicas, and if so how much traffic does your app get?

                      Additionally, how did you avoid the pitfalls described in the article? It sounds like PgBouncer requires changes to your queries versus vanilla Postgres, at least if you have it configured in any useful way. You mention that hardware is cheap compared to developer time, and I agree, but I’m trying to understand why PgBouncer minimizes developer time.

                      Certainly we could build out read replicas of PG, retool the web app to read from one DB and write to another, but it was infinitely easier to just make pgbouncer do all the hard work and keep a single DB.

                      I keep hearing that it’s much easier to use PgBouncer, but the article makes it sound like using it in any useful capacity while maintaining transaction and query correctness is not very easy. Similarly, you make it sound like it would be really hard to change your code to use read replicas, but I would think it would be really easy (instead of one database handle you have two, and your reads use the read handle).

                      I’m not advocating for one approach, I’m just trying to understand why people keep saying PgBouncer is easy when the article makes it seem pretty complicated to implement correctly.

                      1. 1

                        How many replicas do you have?

                        DB replicas? Zero real-time ones. Do you mean something else? We have an hourly read replica (for testing) and a daily replica that comes from our backup (so we can exercise our backup/restore), which is used for development.

                        We run a few hundred PG connections through pgbouncer without trouble. We also have around 150 normal long-lived connections for our client/server and other longer-lived sessions that don’t go through pgbouncer. Only the web traffic goes through pgbouncer.

                        if so how much traffic does your app get?

                        We have about 1k users (it’s an internal application). It runs the back-office stuff (payroll, HR, purchasing, accounting, etc.). Our biggest load on the web side is timesheets, which most of our users submit, and they of course all want to do timesheets at the exact same time.

                        I’m just trying to understand why people keep saying PgBouncer is easy when the article makes it seem pretty complicated to implement correctly.

                        I dunno, it was easy for us :) We did have a few growing pains when we deployed it, but they were easily solved. We haven’t had issues in years now. I don’t remember what the issues were when we were deploying, and I can’t remember what mode we run it in. I’m not in a position to easily check at the moment. I think we went from testing to production within a week, so whatever issues we had, they were not hard to fix.

                        If you want me to, I can go look at our config later and re-familiarize myself with it, and share some more details.

                        1. 1

                          replicas

                          I meant application replicas. How many instances of your application are connecting to the database with ~5 connections each?

                          I’m surprised you need PgBouncer at all for an internal system with 1k users?

                          It was easy for us

                          The way the article makes it sound, if you just throw PgBouncer in front of something without configuration, it doesn’t actually do much. How do you know if PgBouncer is actually improving performance?

                          1. 1

                            This is all from memory, I could be off somewhere.

                            I meant application replicas. How many instances of your application are connecting to the database with ~5 connections each?

                            I’d have to go count, more than a dozen, less than 100.

                            I’m surprised you need PgBouncer at all for an internal system with 1k users?

                            Because we have 1k active users at one time.

                            How do you know if PgBouncer is actually improving performance?

                            For us it wasn’t really about performance, it was about reducing the number of connections to PG. PG connections are expensive. We had 500 or so PG connections from the web side when we moved to pgbouncer a few years ago. Now I think we have 50 or 100 active web connections.
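
                            A quick way to see what that looks like from the database side (standard Postgres, nothing PgBouncer-specific) is to count server-side connections by state; idle sessions are the expensive part a pooler absorbs for you:

                                -- run against Postgres itself, not the bouncer
                                select state, count(*)
                                from pg_stat_activity
                                group by state
                                order by count(*) desc;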

                    2. 1

                      Adding pgbouncer is typically a lot easier than adding a read replica IME (at least for rails, though the recent multidb support changes that calculus a lot)

                      1. 1

                        Many of the features are operational and so depending on what you want to do there’s no real minimum scale.

                        For example, you can use it to “pause” new connections to the database - effectively draining the connections. I’ve used this in the past to restart a production database to pick up OS updates without having to go through all the faff of a failover.

                        But it’s not uncommon for applications to behave badly (eg open way too many conns, for too long) and pgbouncer gives you tools to mitigate that. It’s not just a “big scale” thing.
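
                        A rough sketch of that pause/drain workflow against PgBouncer’s admin console (hypothetical database name; assumes the console is reachable, e.g. psql -p 6432 -U pgbouncer pgbouncer):

                            PAUSE mydb;   -- let in-flight transactions finish, hold new ones
                            -- ...restart PostgreSQL, apply OS updates, etc....
                            RESUME mydb;  -- release the held clients
                            SHOW POOLS;   -- sanity-check pool state afterwards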

                      2. 4

                        In that scenario, the slow query is running the show. Until it finishes, all the other queries to that table are stuck. If that goes on long enough, users can’t use the system.

                        I have seen this multiple times in production systems. Really catches you by surprise!

                        1. 2

                          I feel like one of the main advantages of components as distinct from partials in Rails is that you get clearer semantics regarding passing things around/where things come from. While this doesn’t eliminate that (you still have to define an attr_writer) it does make it a bit less clear than it otherwise could be (where is @post set?).

                          1. 5

                            There is something funny about this, in that the primary motivation seems to be about having jobs evicted from Redis due to the LRU setting.

                            Sidekiq itself recommends changing to noeviction: https://github.com/sidekiq/sidekiq/wiki/Using-Redis#memory.

                            We’ve run literally trillions of jobs through Sidekiq (thanks Mike!) over the years and never had an eviction problem because we set noeviction and used appropriately sized instances.

                            That said, using Kafka in this fashion is pretty cool, though Kafka is often very complex for “relatively simple” job queues.

                            1. 2

                              Yeah, chalk it up to not paying close enough attention to the settings when I set it up 10 years ago. :) Once we had Kafka in place for our newer pipeline, then it made sense to migrate our older pipeline to use it, too.

                              The LRU bit was actually a secondary motivation. The primary motivation was that we got to a point where the amount of data being moved through redis could result in it filling up (regardless of the other, non-queue items that could get evicted) too quickly if we had a downstream processing problem of any significant duration. Moving to Kafka gave us much more breathing room, since we now have terabytes instead of gigabytes of storage. :)

                              1. 2

                                Yeah, makes total sense - might be worth discussing that issue in the post versus the “oops LRU results in data loss” reasoning.

                              2. 1

                                Yea I’m curious what the numbers are that they’re running. Like what is the RAM on the machine running their Redis server?

                                1. 3

                                  The redis instance in question has 64G of RAM.

                              3. 2

                                Thank you for doing this foundational work that impacts so much.

                                1. 1

                                  (Just an FYI that I’m not the author of the post)

                                  1. 2

                                    I know :) thanks for posting this.

                                    I also know byroot reads lobste.rs sometimes

                                2. 3

                                  Great article as usual from Julia Evans.

                                  The most frustrating inconsistency for me is the handling of command-line flags. Is -akx one flag, or multiple? This is one of my annoyances with terraform: most programs follow the short vs. long flag convention, so it feels pretty lopsided when one goes its own way. I don’t know why some programs go out of their way to misalign with it.

                                  I deeply appreciate clap for making the standard convention available on a very fast runtime (I’m sure there are similar things in other langs–it’s just the one I’m most familiar with; it certainly helps considering the tendency for folks to want to rewrite everything in Rust :^) )

                                  1. 4

                                    rule 2: TUIs should quit when you press q

                                    TUI programs (like less or htop) will usually quit when you press q.

                                    I guess vi would like a word ;)

                                    By the way another convention that we take for granted is preserving the main screen by using the alternate one while in TUI mode. Most programs do this, but for example more and top do not. Some BSD implementations of less and vi also don’t.

                                    1. 6

                                      Did you read the next line?

                                      This rule doesn’t apply to any program where pressing q to quit wouldn’t make sense, like tmux or text editors.

                                      1. 3

                                      I hate that convention of less. There are plenty of times when I’m just paging through a file and want to use some information with another command. I hit “q” and then, poof, the info is now gone.

                                        1. 3

                                          Try setting LESS=X (or passing it -X).

                                      2. 14

                                        It’s ironic how the page is littered with ads.

                                        1. 11

                                          Yea, is there a way to flag a story as “linked site is of low quality” even if the story has merit?

                                          1. 1

                                            It’s also written in the stilted, information-sparse prose I associate with spell-checker bots. Yuck.

                                          2. 1

                                            I’ve had a long-time interest in the Erlang VM and the languages that run on it, but I think I don’t understand the durability aspect. This post makes allusions to how this works in Erlang, but I haven’t been able to find more details. Does anyone have links to relevant tools/parts of the documentation that explain how one prevents data loss in a distributed Erlang system?

                                            1. 1

                                              Mnesia is Erlang’s database.

                                              1. 1

                                                I’ve heard of Mnesia, but was more wondering about the patterns. Do folks persist all messages before sending for an at-least-once guarantee?

                                            2. 4

                                              PostgreSQL Lock Conflicts is also a good resource on Postgres locks - https://postgres-locks.husseinnasser.com/

                                              1. 3

                                                It is! I thought I had included a link to it but I must have missed it. Thank you for highlighting.

                                              2. 5

                                                Normalize your data unless you have a good reason not to

                                                I think this is actually bad advice; normalization means migrations, painful binding, etc. It’s not a code-native way to operate.

                                                Postgres has JSON support and computed columns, so you can make indexes on JSON data as needed.

                                                I think you can have a different table for different items, but I like using a jsonb data column, and having the ID hierarchy as columns. Anything else gets handled by binding the JSON in one step, and I can put defaults as I need.

                                                MUCH more friendly to work with, and to extend, esp in languages that have actually good JSON support (Go, Rust) with tags for extensibility (JS is too primitive, zod fixes a lot of it though)
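
                                                A minimal sketch of the layout described above (all names made up): the ID hierarchy as real columns, everything else in a jsonb column, with an expression index for fields you end up querying:

                                                    create table items (
                                                        org_id  bigint not null,
                                                        item_id bigint not null,
                                                        data    jsonb  not null,
                                                        primary key (org_id, item_id)
                                                    );
                                                    -- expression index so JSON lookups don't need a full scan
                                                    create index items_data_name_idx on items ((data->>'name'));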

                                                1. 56

                                                  So you are treating PG as a document store? You should look up the downsides of document stores before you keep going down this road.

                                                  JSONB has its place in PG and it’s very useful for irregular data, like extra user defined fields, etc. But for core functionality of your data you are 90% of the time better off using normal PG data types like text, int, etc.

                                                  One of the best things about normal datatypes outside of JSONB is that you get data correctness. You can use constraints like CHECK that verify the data before you store it. This is extremely useful for data correctness outside of application code.

                                                  Doing that in JSON is miserable. Doing that in the application means you will have lots of bad data eventually, as bugs in application code inevitably happen.

                                                  In fact for the OP, the biggest thing I’d add to their list is constraints. Make the DB keep your data reasonable. This data relates to that data, set up a REFERENCES, etc.

                                                  Sometimes document stores are the right tool for the job, but it’s rare in my experience.
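
                                                  To make the CHECK/REFERENCES point concrete, a small sketch with made-up tables; the database rejects bad rows no matter which code path wrote them:

                                                      create table users (
                                                          id    bigint generated always as identity primary key,
                                                          email text not null check (email like '%@%')
                                                      );
                                                      create table orders (
                                                          id       bigint generated always as identity primary key,
                                                          user_id  bigint  not null references users (id),
                                                          quantity integer not null check (quantity > 0)
                                                      );
                                                      -- both of these fail in the database, regardless of application bugs:
                                                      --   insert into orders (user_id, quantity) values (999999, 1);  -- no such user
                                                      --   insert into orders (user_id, quantity) values (1, 0);       -- quantity must be > 0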

                                                  1. 6

                                                    Call me a heretic, but:

                                                    • FKs are inefficient for the DB to manage, code can enforce this if you truly need it, no need to make the DB slower
                                                    • If you need constraints, check them in code, don’t make the DB do it. It’s not miserable in code, it’s clear in code because you don’t have to refer to a schema to see what you’re allowed to do.

                                                    I’m not treating it as a document store, I’m just managing the schema in code, not in the DB.

                                                    1. 23

                                                      Another counterpoint that wasn’t mentioned: while not universally true, databases often outlive the projects built on top of them, at which point you would be much happier with uniform, structured data than with a random data dump.

                                                      1. 2

                                                        You end up massively annoyed either way because the data model in your new application is not going to line up with the data model in the database no matter what you do.

                                                        1. 2

                                                          That’s why, ideally, you design your data model for the first application as well, rather than just growing it organically.

                                                      2. 19
                                                        • FKs are inefficient for the DB to manage, code can enforce this if you truly need it, no need to make the DB slower

                                                        They should be fast, there ought to be an index. Also, how is your code going to enforce that there are no dangling foreign keys when you do a DELETE on the table being pointed to?

                                                        • If you need constraints, check them in code, don’t make the DB do it. It’s not miserable in code, it’s clear in code because you don’t have to refer to a schema to see what you’re allowed to do.

                                                      It’s not miserable in the database either. In fact, you have to add a CHECK constraint or unique index only once and the db will do everything to guarantee it is enforced. In code you have to do it everywhere you write to the table. For bulk updates in code this is extra painful to ensure, especially if it’s a multi-column constraint and the bulk update affects only one of the columns involved in a CHECK constraint.
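
                                                      As a sketch of the “declare it once” point (hypothetical table), a multi-column CHECK covers every write path, including bulk updates that only touch one of the columns involved:

                                                          create table shipments (
                                                              id         bigint generated always as identity primary key,
                                                              status     text not null default 'pending',
                                                              shipped_at timestamptz,
                                                              check (status <> 'shipped' or shipped_at is not null)
                                                          );
                                                          insert into shipments (status) values ('pending');
                                                          -- a bulk update that forgets shipped_at is rejected as a whole:
                                                          update shipments set status = 'shipped';
                                                          -- ERROR: new row for relation "shipments" violates check constraint ...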

                                                        C’mon man, you’re setting yourself up for failure by working against the grain of the RDBMS. As others have pointed out, you’re probably better off with a document store.

                                                        1. 1

                                                          Also, how is your code going to enforce that there are no dangling foreign keys when you do a DELETE on the table being pointed to

                                                          Transactions, and transactional functions (the way foundationdb encourages).

                                                        Instead of slinging SQL all over my codebase, I just make reusable functions. Analogous to components in React. You can also use workflow-style systems for async deletion of large relations.

                                                          1. 6

                                                            Transactions, and transactional functions (the way foundationdb encourages).

                                                            Transactions don’t, by themselves, ensure that there will be no dangling foreign keys. It is only a tool you can use to make sure records that belong together are deleted atomically, when viewed from the outside. The code you run in your transactions would have to be flawless.

                                                            I can’t count the number of times where I thought some code was correct, but then a FK constraint fired, letting me know I had forgotten to delete (or clear the FK in) some relation or other. Even moreso with CHECK constraints across multiple columns. You give up all of that by treating your RDBMS as a dumb document store.

                                                            Also, bulk updates and deletes are something your database can do very quickly, which means you won’t be running that meticulously crafted code which ensures the constraints are upheld. Constraints will be upheld by the database, though. Deleting or updating records one-at-a-time is of course possible but not exactly efficient.

                                                            And finally, over time you might want to add other clients (maybe as part of a rewrite? or perhaps you have to run a one-off query by hand) that speak to the same database. They will have to have their code as meticulously crafted and agreeing 100% with your original code. If you have the constraints in the database, that’s not as big a risk.

                                                            1. 2

                                                            We seem to be on different grounds: I’m a dev first, you seem to be a DBA first. I don’t think there’s any way I can convince someone under another “religion”, for lack of a better term.

                                                              1. 8

                                                                I’m a dev and not a DBA, but I know that the database can enforce these rules better than I can (and the query optimizer get better results because of that). It’s the same way I trust the compiler to make safety decisions better than I can. I can forget a free the same way I can forget to maintain referential integrity myself.

                                                                1. 6

                                                                  Hm, that’s an odd thing to say. I’m a developer by interest and trade. I actually learned to use databases “on the job”, as I barely avoided flunking my university database course because my grasp of SQL was too weak. In the intervening years I was forced to learn SQL, because it’s such an important part of most web stacks, to the point where I’m typically the go-to expert on Postgres in whatever company I work for.

                                                                  My early experiences have been with MySQL with all its problems of wanton data destruction (back when it defaulted to MyISAM) and Ruby on Rails which didn’t/couldn’t even declare foreign keys or constraints back in the day. I’ve felt the pain of the “enforce everything in code” approach firsthand when that was the accepted wisdom in the Rails community (circa 2007). I’ve seen systems grind down with the EAV approach and also experienced how difficult it can be to work with data that’s unnecessarily put in JSON.

                                                                  All of this to say, I have the battle scars to show for my hard gained wisdom, I’m not just repeating what I hear here and elsewhere. I’ve learned to trust the database to enforce my data invariants and constraints, as I know we will get it wrong in code at least at some point. It’s an additional and important safety net.

                                                                  1. 4

                                                                    This is almost exactly where I’m at. I’ve used big non relational datastores before, and they can do really cool stuff. But I’m really quite a Fred Brooks acolyte at the end of the day, and I always try to build systems such that “show me your tables” is more important than “show me your flowcharts”. I find the exercise of really thinking about the entities I need to manipulate up front pays off hundredfold.

                                                          2. 11

                                                            Anyone trying to understand your database can’t without also reading your code, since there is nothing in the DB to document how relationships exist and/or are constrained in the data.

                                                            I like how your assumption is that you can write faster code than a DB that has been at it for likely decades. I’m not saying it’s impossible, but improbable. Unless you spend lots of time optimizing that in the app layer. Why would you do this to yourself?

                                                            As far as constraints in the code, you want them too (for UX), but:

                                                            • Data needs to be trusted. A database with proper constraints is the lowest common denominator.
                                                            • Data documentation. You hand a new person read access on the DB, and they can print out ER diagrams and easily see the relationships around the data. Why re-invent the wheel where none of these tools exist?
                                                            • Data manipulation external to your code is now misery for everyone, including you.
                                                            1. 16

                                                              I like how your assumption is that you can write faster code than a DB that has been at it for likely decades.

                                                              As somebody who’s DBA’d for devs with similar mindsets, in my experience this specific antipattern is due to selection bias.

                                                              That is, the database may be running thousands of different queries and query plans each day, but the only query plans that come to a dev’s attention are the ones that seem slow. So those get examined closely, and sometimes, something is “obviously” wrong with the plan–if only the DB would join here and loop there, this would be faster!

                                                              After a few of those, the dev in question starts to wonder why all the plans they’re seeing are bad. Surely, they can do better than that, so why do we have so much written into the DB, where it generates all of these bad plans. Once they’ve hit that point, it’s not far to the idea that all query plans should be implemented in the app code, so they can be reviewed, optimized, etc. A few months later they’re installing MongoDB and reinventing joins, because they didn’t consider the other thousands of queries in the first place.

                                                              1. 2

                                                                Well said. One should strive to understand the tools one uses.

                                                              2. 3

                                                                I understand your perspective, and I appreciate you actually asking questions instead of abusing my thoughts like others in this thread (which is shockingly exhausting).

                                                                I think there are 2 schools of thought:

                                                                1. Every service talks to the same data store (tables, schema, etc.)
                                                                2. Every service has their own data store (only can access user info via the user service)

                                                                #1 is much more traditional (and probably represents the majority of the people fleecing me in this thread)

                                                                #2 is much closer to home for me: when I know only my code touches the DB, I can make a LOT of assumptions about data safety and integrity. Additionally, when you’re here, you find it’s a lot more scalable to let your stateless services do the CPU work than your DB. I’ve saturated m5.24xl RDS instances before. Nightmare scaling event.
                                                                1. 3

                                                                  In general what I have found is that #2 is only true for some period of time, assuming it’s data people actually care about. They inevitably want access to it here, there and everywhere.

                                                                  If #2 is your wheelhouse and nobody else will ever care about the data: why are you storing to PG? Why not store to files in the filesystem or Berkeley DB or something else that doesn’t have the overhead that PG has?

                                                                  I can make a LOT of assumptions about data safety and integrity.

                                                                  Assuming those assumptions are correct. Many people screw these up, all the time. I know I screw up assumptions all the time.

                                                                  I’ve saturated m5.24xl RDS instances before.

                                                                  I had to go look up what that was. Our test hardware is larger than that. Cloud DB’s are ridiculously over-priced and perform badly in my experience. You might consider going to dedicated hardware. You can always pay an MSP to babysit it for you. Even after their high markup, it should still be significantly cheaper than RDS.

                                                                  1. 3

                                                                    Why not store to files in the filesystem or Berkeley DB or something else that doesn’t have the overhead that PG has?

                                                                    You know why: The tooling, the support, available offerings for a managed solution with monitoring and backup.

                                                                    Cloud DBs are overpriced from a hardware/perf cost perspective, but I’d argue not from a TCO perspective, except at extreme scale or for niche workloads.

                                                                2. 1

                                                                  Actually, it’s the filesystem or rather the kernel that is the lowest common denominator.

                                                                  Now you might say “sure, but you don’t interact with the kernel, you interact with the database”. My response would be: “that is only by convention and it is exactly the same for consumers of the data - they should talk to my app(s) instead of talking to the database directly”.

                                                                  Now, if you have multiple applications with different owners then that doesn’t work. But let’s not pretend that using a single database in that style would necessarily be the best option. There is a reason why microservices came up. And I’m against microservices (instead I’m for just “services”). But the point is that this style was “invented” because of the pain of sharing the same data. That’s how the shared-nothing architecture came into being.

                                                                  And if you think about it, this is how many many businesses operate, be it in technical or organizational terms.

                                                                  The gist is: normalization also has disadvantages, so I’m on the side of danthegoodman. This quote:

                                                                  Normalize your data unless you have a good reason not to

                                                                  Is similar to saying “your architecture should be a monolith unless” or “your architecture should be microservices unless”. Instead, it should be changed into:

                                                                  Pick the optimal level of normalization for your data

                                                                  And of course now it’s difficult to say how. But that’s the point: it’s difficult and people here seem to be of the opinion that normalization should be “the standard”. And it shouldn’t.

                                                                  1. 1

                                                                    I’m obviously of the opinion that you should default to normalizing the data and only de-normalize when it makes sense to do so.

                                                                    1. 1

                                                                      Right. And I’m challenging this opinion.

                                                                      1. 2

                                                                        Let’s think about it another way.

                                                                        Your perspective is (paraphrasing, and please correct me if I’m wrong) that data integrity should be done in code, because you trust your code more and it’s easier for you.

                                                                        I’d argue, ideally we should do it in both places, so that if there is a bug in one, the other will (hopefully) catch the problem. That is, if we care about getting the data correct and keeping it correct. Much like in security land, where we talk about defense in depth. This is data integrity in depth.

                                                                        I’ve seen terrible code and terrible databases. I’ve even seen them both terrible in the same project. When that happens I run for the hills and want nothing to do with the project, as inevitably disaster strikes, unless everyone is on board with fixing both, with a clear, strong, path forward.

                                                                        Normalization at the database layer helps ensure data integrity/correctness.

                                                                3. 9

                                                                  FKs are inefficient for the DB to manage, code can enforce this if you truly need it, no need to make the DB slower

                                                                  I would be very surprised if typical application code handled this properly and wasn’t racy.

                                                                  And how would the application do it, in a faster way than the database?

                                                                  1. 3

                                                                    Indeed. Not entirely on point, but reading through Jepsen.io’s testing shows that getting data stores wrong is very common. Even PG didn’t come out completely unscathed last time it was tested: https://jepsen.io/analyses/postgresql-12.3

                                                                    1. 2

                                                                      I actually don’t find this hard, and I’ve done this for financial systems doing 3k txn/s. The faster way is based on data modeling: Relieve the DB of the work, and allow your code to aid transactional guarantees. That’s pretty much the standard way to build systems that scale beyond the single postgres instance

                                                                      1. 2

                                                                        Can you be more specific? How do you make sure something you reference with an FK does not disappear while you insert?

                                                                        1. 1

                                                                          I normally structure data so this is not an issue, but you can use serializable transactions and select the row.
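
                                                                          A sketch of what that can look like in plain SQL (made-up tables): run the transaction SERIALIZABLE and/or lock the parent row you depend on, so it can’t vanish mid-insert:

                                                                              create table clients (id bigint primary key);
                                                                              create table orders  (client_id bigint not null, total numeric not null);
                                                                              insert into clients values (42);
                                                                              begin isolation level serializable;
                                                                              -- locking the referenced row is essentially what a declared FK does under the hood
                                                                              select id from clients where id = 42 for key share;
                                                                              insert into orders (client_id, total) values (42, 100.00);
                                                                              commit;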

                                                                    2. 5

                                                                      What part of managing foreign keys do you think the DB is inefficient at? How would you do this more efficiently in application code?

                                                                      1. 2

                                                                        By relieving the DB of the computing. For example, if you delete a row that has 1M relations to it, your DB is going to have a really bad time.

                                                                    3. 5

                                                                      You probably know this, but for what it’s worth:

                                                                      • you can create indexes and CHECK and EXCLUDE constraints on JSONB data in postgres and they work fine
                                                                      • you can’t define UNIQUE, NOT NULL or REFERENCES constraints, but you can simulate them (EXCLUDE for UNIQUE, CHECK for NOT NULL, and REFERENCES via generated columns (if each object has only some small number of references to each foreign table) or via triggers (if there are many foreign keys for a single foreign table)).

                                                                      I think unique and not null are simple enough, but if you find yourself needing to write TRIGGERs then maybe it would be easier to explode out that schema.

                                                                      Some demos…

                                                                      Create a table that requires inserted json to have a unique numeric id:

                                                                      db> create temp table t (
                                                                          j jsonb check (
                                                                              (j->'id' is not null and jsonb_typeof(j->'id') = 'number')
                                                                          ),
                                                                          exclude ((j->'id') with =)
                                                                      );
                                                                      CREATE TABLE
                                                                      

                                                                      Demonstrating that the constraints work correctly:

                                                                      db> insert into t values ('{"id": 1}'), ('{"id": "1"}')
                                                                      new row for relation "t" violates check constraint "t_j_check"
                                                                      DETAIL:  Failing row contains ({"id": "1"}).
                                                                      
                                                                      db> insert into t values ('{"id": 1}'), ('{"id": 1}')
                                                                      conflicting key value violates exclusion constraint "t_expr_excl"
                                                                      DETAIL:  Key ((j -> 'id'::text))=(1) conflicts with existing key ((j -> 'id'::text))=(1).
                                                                      
                                                                      db> insert into t values ('{"id": 1}'), ('{"id": 2}')
                                                                      INSERT 0 2
                                                                      

                                                                      Demonstrating that queries use an index:

                                                                      db> explain select * from t where j->'id' = '1'::jsonb;
                                                                      +--------------------------------------------------------------------------+
                                                                      | QUERY PLAN                                                               |
                                                                      |--------------------------------------------------------------------------|
                                                                      | Bitmap Heap Scan on t  (cost=4.21..14.37 rows=7 width=32)                |
                                                                      |   Recheck Cond: ((j -> 'id'::text) = '1'::jsonb)                         |
                                                                      |   ->  Bitmap Index Scan on t_expr_excl  (cost=0.00..4.21 rows=7 width=0) |
                                                                      |         Index Cond: ((j -> 'id'::text) = '1'::jsonb)                     |
                                                                      +--------------------------------------------------------------------------+
                                                                      EXPLAIN
                                                                      

                                                                      There’s also a default GIN operator class for jsonb that supports queries with the key-exists operators ?, ?| and ?&, the containment operator @>, and the jsonpath match operators @? and @@.

                                                                      create index t_idx_j_gin on t using GIN (j);
                                                                      explain select * from t where j @> '{"name": "Alice"}'::jsonb;
                                                                      

                                                                      Use a generated column to enforce a REFERENCES NOT NULL constraint without having to add triggers:

                                                                      create temp table clients (id serial primary key);
                                                                      create temp table orders (
                                                                          data jsonb,
                                                                          client integer generated always as (
                                                                              (data->'client')::integer
                                                                          ) stored references clients not null
                                                                      );
                                                                      
                                                                      1. 2

                                                                        Yes, I agree it’s mostly possible now, but it’s not fun.

                                                                      2. 1

                                                                        Disagree. If only your application uses the database then just create a dedicated type that describes the format/shape of the json. To introduce a bug, the type would have to be changed in the wrong way (but then that could also happen with a change to the table schema).

                                                                        If, instead, you don’t have a dedicated type that defines the json shape or, worse, use a PL without statically enforced types, then yeah, use the database types instead. But that’s on you, not on the concept.

                                                                        1. 25

                                                                          Database normalization isn’t about type safety though—it’s about preventing data contradictions.

                                                                          1. 1

                                                                            To do that it’s sufficient to make those parts into their own columns. That’s not a general argument against JSON. Please mind that the OP specifically gave CHECK as an example.

                                                                            1. 4

                                                                              Of course it’s not sufficient, and of course it is an argument against serialized blobs as a database. If you browse a database for the first time, it is self-explanatory which data is mandatory, its types, which relates to which, cardinality, length, etc. These things are clearly stated and the intent is unambiguous.

                                                                              How do you do that if you arbitrarily just serialize your data and store it somewhere? You have to check the code. But ‘the code’ is a diffuse concept. What part of the code? What if one part contradicts another (which is what happens all the time, and why a well-designed data model is important)?

                                                                              I will repeat myself from another post: data normalization is not a trend nor a preference. People do it because they need the value it provides.

                                                                              1. 1

                                                                                If you browse a database for the first time, it is self-explanatory which data is mandatory, its types, which relates to which, cardinality, length, etc. These things are clearly stated and the intent is unambiguous.

                                                                                Or you end up with lots of columns that are mostly null because of how you try to emulate sum-types (tagged unions). Then just having a look at the json is much easier.
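
                                                                                For context, a sketch of the column-based encoding being described (made-up payment example): one group of mostly-null columns per variant, held together by CHECKs:

                                                                                    create table payment_methods (
                                                                                        id         bigint generated always as identity primary key,
                                                                                        kind       text not null check (kind in ('card', 'iban')),
                                                                                        card_last4 text,
                                                                                        iban       text,
                                                                                        check (kind <> 'card' or (card_last4 is not null and iban is null)),
                                                                                        check (kind <> 'iban' or (iban is not null and card_last4 is null))
                                                                                    );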

                                                                                How do you do that if you arbitrarily just serialize your data and store it somewhere? You have to check the code. But ‘the code’ is a diffuse concept.

                                                                                I partly agree with that. If you wanna go the full way, then you can also do a CHECK on the json schema. Then you don’t need to look at the code.

                                                                                But try to see it from this perspective: databases and their technology evolve very slowly (which is not necessarily bad, but that’s how it is). Programming languages can evolve much faster. Therefore code can be much more concise and succinct compared to database schemas.

                                                                                Data normalization is not a trend nor a preference. People do it because they need the value it provides.

                                                                                Yes, and the same can be said about data denormalization as well.

                                                                                1. 3

                                                                                  Or you end up with lots of columns that are mostly null

                                                                                  That is what happens when you do not normalize your database. Normalization addresses and solves this problem specifically. This was one of many reasons why relational theory was developed five decades ago.

                                                                                  databases and their technology evolve very slowly

                                                                                  That is a blanket and abstract statement based on assumptions that haven’t been asserted or checked. Developments in the Postgres ecosystem have been explosive in the last 5-10 years. And then there are new kids in town such as DuckDB or ClickHouse, just to name a couple. You would be surprised if you dug a bit deeper.

                                                                                  But my main point about that statement isn’t even that. Theory doesn’t evolve because it’s axiomatic/factual. The gains of removing redundancy, and the algorithmic complexity of lookup operations on known data structures, have been known for decades and work as expected. If you transform a linear lookup into a logarithmic one, that is always going to be a useful tool. It doesn’t need to evolve because it does what’s expected and what people need.

                                                                                  Yes, and the same can be said about data denormalization as well.

                                                                                  But you are not talking about denormalization. You’re talking about completely giving up on the idea of using a relational database to enforce, or at least aid in, data integrity, and rolling your own integrity checks manually in your program. Not that it can’t be done for applications with small domains; it was the norm in desktop applications and worked well. But as soon as you start working with user accounts, I can’t even reason about where and how I would manage the data without an RDBMS.

                                                                          2. 20

                                                                            Every single database I’ve worked with in my 30+ year career ends up being queried and manipulated by multiple readers and writers. Throwing away database consistency guarantees because “this time it won’t” would be, in my opinion, a crazy move.

                                                                            1. 1

                                                                              As long as all those readers and writers are under the control of the same “owner” that is not a problem for all practical purposes.

                                                                              But in any case, if those readers/writer are really totally separate (and not just instances of the same service for example) then I think reading/writing to the same table is nowadays often considered an antipattern.

                                                                              1. 8

                                                                                They’re very likely to be in different languages in different environments. If you’re not using the DB as a central source of truth I suppose you can get away with it but I would be deeply wary of any such systems design.

                                                                                1. 2

                                                                                  More and more systems don’t have “the” db as a central source of truth. Not saying if that’s good or bad but that’s the reality.

                                                                                  1. 4

                                                                                    This just means those systems get to re-learn all the lessons we already learned the hard way. Sad for them. Hopefully it won’t take them quite as long as it took the first time around to learn all those lessons. I hope they at least read through Jepsen’s tests before going down that road: https://jepsen.io/analyses

                                                                                    Maybe something good will come of it?

                                                                                    SQL databases are not the perfect solution to the problem, and certainly improvements can and do continue to happen, but unless you are specifically tackling data problems as part of your product, it’s likely to end in misery for everyone involved.

                                                                                    Maybe whatever comes out of these experiments will provide good things. They got us JSONB in PG, for instance, which I think is a great, very useful example of what the document-store path gave us.

                                                                                    Now AI/LLM storage is the hot new thing. I expect most of this data storage could end up in a PG database with great results eventually, once the problem is well understood (provided LLMs stick around for a long time as useful things).

                                                                                    1. 1

                                                                                      I rather think that requirements have changed. It was totally normal to do “site maintenance” once a month for hours or even a day. Same for having downtimes sometimes.

                                                                                      Nowadays, a downtime in the credit card system means (or can mean) you can’t pay. It might not have been a problem in the past, when magnetic stripes were used or people just paid with cash, but nowadays it is. So systems have evolved to embrace eventual consistency and patterns like CQRS.

                                                                                      I feel like people here have never really seriously used JSON in postgres. Or they solely look at it through the eyes of a DBA. Sure, then JSON is less convenient. But it’s not only the DBA that matters. Speed in development matters a lot. JSON can heavily increase that speed. And I believe that it’s not only perfectly fine but actually the best choice in most cases to go with a combination of JSON and “regular” columns.

                                                                                      For instance, in my projects I often have a table like this: id, tenant, type, jsonbdata, last_modified, created_at, created_by, deleted_at, …

                                                                                      If I feel the need to always access specific parts of my json in my queries, then I’ll factor it out. But that actually rarely happens, except for a few “core” parts of the application. Most of the time the approach above works perfectly fine.
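
                                                                                      Roughly, as a sketch (the column names are the ones above; the exact types and the status example are made up, not from a real schema):

                                                                                        CREATE TABLE documents (
                                                                                            id            bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
                                                                                            tenant        text        NOT NULL,
                                                                                            type          text        NOT NULL,
                                                                                            jsonbdata     jsonb       NOT NULL DEFAULT '{}',
                                                                                            last_modified timestamptz NOT NULL DEFAULT now(),
                                                                                            created_at    timestamptz NOT NULL DEFAULT now(),
                                                                                            created_by    text,
                                                                                            deleted_at    timestamptz
                                                                                        );

                                                                                        -- "Factoring out" a field once it turns out to be queried all the time,
                                                                                        -- e.g. as a generated column that can be indexed like any other column:
                                                                                        ALTER TABLE documents
                                                                                            ADD COLUMN status text GENERATED ALWAYS AS (jsonbdata->>'status') STORED;
                                                                                        CREATE INDEX ON documents (tenant, status);

                                                                                      (Generated columns need PG 12+; on older versions an expression index over jsonbdata->>'status' covers the read side just as well.)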

                                                                                      My impression is hence that people are really religious and forgot that software engineering is often about trade-offs.

                                                                                      1. 2

                                                                                        I agree in the broadest of senses. It’s often about trade-offs, and developer speed is one of those trade-offs. What I don’t get is how using JSONB is “faster”.

                                                                                        I use JSONB columns, but I keep it to things I know I don’t care about, like extra user defined fields. I don’t find developer speed faster by doing it. It just makes it easier for me and the end user. If they want to keep track of 50 different things, I don’t have to generate 50 user_field_x columns for them, and they can come labelled in the UI without any extra work. Inevitably they have typos in their json fields, and request automated cleanup. It’s annoying to have to go clean that up for them.

                                                                                        How exactly is it faster, from your perspective?

                                                                                        It sounds like maybe you are hinting that with JSONB columns you can keep your downtime down? Except I don’t see how? So you don’t have to run the ALTER TABLE statement? It’s not like that has to cause downtime anymore.
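
                                                                                        For reference, this is the kind of change I have in mind (table and column names invented for illustration):

                                                                                          -- Since PostgreSQL 11, adding a column with a constant default is a
                                                                                          -- metadata-only change (no table rewrite):
                                                                                          ALTER TABLE orders ADD COLUMN source text NOT NULL DEFAULT 'web';

                                                                                          -- And indexes can be built without blocking writes:
                                                                                          CREATE INDEX CONCURRENTLY orders_source_idx ON orders (source);

                                                                                        (The ALTER still takes a brief ACCESS EXCLUSIVE lock, so a lock_timeout is prudent, but that’s a far cry from scheduled downtime.)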

                                                                                        JSON can heavily increase that speed.

                                                                                        I guess this is the crux of my question: how does it do that?

                                                                                        1. 1

                                                                                          I agree in the broadest of senses. It’s often about trade-offs, and developer speed is one of those trade-offs. What I don’t get is how using JSONB is “faster”. (…) How exactly is it faster, from your perspective?

                                                                                          For a very simple reason: I can leave the database as is and just make a code change. And I can test all of that with very simple unit tests.

                                                                                          If I change the schema, things get much more problematic.

                                                                                          • I have to coordinate schema changes / migrations with other people.
                                                                                          • I need to ensure that both old and new schema work with both new and old application versions at the same time (I have to do the same with json, but that’s a matter of in-language data conversion from json v1 to json v2).
                                                                                          • I need to be mindful about how the migration might impact performance (table locks etc)
                                                                                          • In particular when using sumtypes, I can make simple code changes and don’t have to touch the database at all
                                                                                          • Data validation is simplified. I parse my json into my format-type and from there into my domain type. The former is super easy. If I use postgres features, I usually have to do some kind of mapping myself. E.g. how do I even store a map of, say, text -> timestamp in a column? hstore can’t. So back to json we are. And no, I don’t want to normalize it; let’s assume it is 100% clear this will never be normalized. I just want to store it and the frontend does something with it. No need for anything else. (See the sketch right after this list.)
                                                                                          • I could go on…
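
                                                                                          The sketch for the map example (key names invented, using the same made-up documents/jsonbdata table from my earlier comment):

                                                                                            -- Store the map as-is inside the jsonb column:
                                                                                            UPDATE documents
                                                                                            SET jsonbdata = jsonb_set(jsonbdata, '{seen_at}',
                                                                                                '{"inbox": "2024-01-05T12:00:00Z", "billing": "2024-02-01T08:30:00Z"}'::jsonb)
                                                                                            WHERE id = 42;

                                                                                            -- And on the rare occasion the database side needs to look into it:
                                                                                            SELECT key, value::timestamptz
                                                                                            FROM documents, jsonb_each_text(jsonbdata -> 'seen_at')
                                                                                            WHERE id = 42;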

                                                                                          You can say “it’s all not a problem to me” and then we just have different experiences. But I know both worlds (from when postgres had no json types at all) and I prefer the json world by far.

                                                                                          1. 4

                                                                                            I have to coordinate schema changes / migrations with other people.

                                                                                            If others are accessing the db and relying on your implicit schema, you’d still have to at least tell them “there’s now a v2 available which has these properties”.

                                                                                            I need to ensure that both old and new schema work with both new and old application versions at the same time (I have to do the same with json, but that’s a matter of in-language data conversion from json v1 to json v2).

                                                                                            How does that work when you add a new key to your json? You just keep the old json documents without that key? Doesn’t this make for very awkward querying where you have to code fallbacks for all the older versions of the json?

                                                                                            1. 2

                                                                                              If others are accessing the db and relying on your implicit schema, you’d still have to at least tell them “there’s now a v2 available which has these properties”.

                                                                                              I’m talking about a single application and team. We usually manage schemas via migrations (e.g. flyway or similar), and then it depends on when someone checks in their changes. Using types in my code, I’d get a merge conflict instead.

                                                                                              This might not be the case if I used some kind of “auto-migration” tool where I describe what I want to have instead of what I want to do, but that brings other problems, like not knowing how the tool will update the table etc. - it’s much more risky.

                                                                                              How does that work when you add a new key to your json? You just keep the old json documents without that key? Doesn’t this make for very awkward querying where you have to code fallbacks for all the older versions of the json?

                                                                                              What’s awkward is having to update/change every row every time something like that happens.

                                                                                              I prefer the natural approach that humans use all the time: understand that there are multiple versions and then interpret them accordingly.

                                                                                              pseudocode:

                                                                                              enum DatabaseFormat:
                                                                                                 type DatabaseFormatV1 = {..., version: v1}
                                                                                                 type DatabaseFormatV2 = {... version: v2}
                                                                                                 type DatabaseFormatV3 = {... version: v3}
                                                                                              
                                                                                              type Foo = {...}
                                                                                              
                                                                                              function readFromDbFormat:
                                                                                                input: format of DatabaseFormat
                                                                                                output: Foo or Error
                                                                                                definition: format.version match {
                                                                                                   if v1 then ...
                                                                                                   if v2 then ...
                                                                                                   if v3 then ...
                                                                                                   else: error, unknown version
                                                                                                }
                                                                                              

                                                                                              In a good language, the boilerplate for that is minimal and the type safety high. I much much much prefer this approach over anything I’ve used over the years.

                                                                                              Benefits:

                                                                                              • It is very clear from the code which formats there are and what they look like
                                                                                              • It is clear from the code how the data of each format is interpreted
                                                                                              • Git can be used to see when formats got created, why and so on
                                                                                              • Very easy to test conversion logic as unit test
                                                                                              • Very easy to roll back application code OR data OR even both at the same time
                                                                                              • No conversion / database maintenance needed
                                                                                              • Migrating data (from older to new versions) can happen over a longer time period which reduces load
                                                                                              • When two people attempt to make a change at the same time, they’ll get a merge conflict

                                                                                              Disadvantages:

                                                                                              • have to store an additional version column (and small performance overhead reading it)
                                                                                              • old versions can stick around for a long time if the rows never get written again. This means that manual SQL (e.g. for maintenance) now has to work with multiple versions as well (though it is possible to just read all “old” formats and update them once in a while via code if necessary - see the sketch below).
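
                                                                                              A sketch of what that occasional batch update looks like on the SQL side (same invented documents/jsonbdata names as before):

                                                                                                -- Find a batch of rows still on an old format and rewrite them in code:
                                                                                                SELECT id, jsonbdata
                                                                                                FROM documents
                                                                                                WHERE jsonbdata->>'version' IS DISTINCT FROM 'v3'
                                                                                                LIMIT 1000;

                                                                                                -- (the application parses each row with readFromDbFormat, re-serializes
                                                                                                --  it as v3 and UPDATEs it back; repeat until no rows are returned)
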
                                                                                              1. 2

                                                                                                In a good language, the boilerplate for that is minimal and the type safety high

                                                                                                Sure, but that only deals with reading record-at-a-time. I was referring to queries. Like for example, constructing a WHERE condition to retrieve the records in the first place (or do bulk updates/deletes). Most applications have tons of situations where you need to query the database, not just fetching a single record by id.

                                                                                                1. 1

                                                                                                  Sure, but that only deals with reading record-at-a-time. I was referring to queries. Like for example, constructing a WHERE condition to retrieve the records in the first place (or do bulk updates/deletes). Most applications have tons of situations where you need to query the database, not just fetching a single record by id.

                                                                                                  Usually the WHERE condition aims at things like the “type”, the “tenant” or other normalized columns, not the json data. An occasional WHERE against the json is fine too (of course all versions have to be considered then, which is one of the drawbacks I mentioned). If the data is queried in very complex ways all the time, I would not use jsonb.

                                                                                                  Let me give a concrete example: user settings. For example, does the user want to receive marketing mails. We can model that as a boolean column or a boolean in json.

                                                                                                  Usually the application will just deal with the settings of a specific user. Maybe once in a while I want to go over all settings or the settings of a specific group of users, e.g. for a batch job to do something. Well, then I can still just load them all at once and filter those that have the flag set to false in my code.

                                                                                                  Is that less efficient than having the database filter it and return only the ids of the users that I care about? Sure. Does it really matter? For many tables it will never matter.
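
                                                                                                  Concretely (table and key names invented), the two options look something like this:

                                                                                                    -- Load the whole (small) set and filter in code:
                                                                                                    SELECT user_id, jsonbdata
                                                                                                    FROM user_settings
                                                                                                    WHERE tenant = 'acme';

                                                                                                    -- Or the occasional WHERE against the json, with a default for
                                                                                                    -- rows that predate the flag:
                                                                                                    SELECT user_id
                                                                                                    FROM user_settings
                                                                                                    WHERE tenant = 'acme'
                                                                                                      AND COALESCE((jsonbdata->>'marketing_mails')::boolean, false);

                                                                                                  If that predicate ever becomes hot, that is exactly the point where I’d factor it out into a real column, as described above.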

                                                                                                  Let’s get one thing clear: I’m not arguing to ALWAYS use json. I never did that. It is you (and others) arguing to NEVER use json. Well, basically never.

                                                                                                  I started using json with postgres from when jsonb was introduced, that is more than a decade of experience. And I can usually tell quite well whether a table benefits from being totally normalized from the beginning or not. If I’m wrong, I have to convert it once, no big deal. In the other cases I save a lot of time for the reasons I explained in detail.

                                                                                                  Maybe we just deal with very different applications if ALL your tables ALWAYS need to be normalized (from the beginning). That’s not the case of the applications I work on.

                                                                                                  1. 3

                                                                                                    It is you (and others) arguing to NEVER use json. Well, basically never.

                                                                                                    Not at all. I’m actually a proponent of using the rich types Postgres offers. I have been known to store e.g. arrays of enums and sometimes jsonb columns as well. Both of these do quite well in cases like you mentioned where you store a grab bag of properties, or for storing additional “origin data” for auditing or debugging purposes.

                                                                                                    But I am extremely sceptical of using jsonb for nearly everything, which some people here seem to argue in favor of.

                                                                                            2. 2

                                                                                              I guess we can say: It’s not been a problem for me.

                                                                                              If I fully own the DB, then I can do whatever I want with it (basically negating the first 3 you mentioned). If I don’t fully own the DB, I have to coordinate with other people either way, as they will want access to the JSON data too. Saying “well, it’s just JSON” isn’t very helpful. They still need to know this value needs to be in this format, or is a currency from this list, or whatever.

                                                                                              Around data validation: If you don’t fully own the DB, i.e. other people also access the data, then that’s all the more reason to make the DB handle all of that for you, as then the DB can say “hey, no way, you can’t put that there!” Otherwise your application has to deal with it after the fact, which is very, very annoying. If you erase the data because it doesn’t fit your model, they get mad. If you re-format it, they get mad, etc.

                                                                                              I agree sometimes the mapping abilities between DB columns and internal language types can be annoying. Depends on the language and library you use. If you are stuck with crappy tools, I can see this as being true.

                                                                                              Around simple unit tests, we do simple unit tests on the DB itself too.

                                                                                      2. 2

                                                                                        This is a) very true and b) deeply unfortunate.

                                                                            2. 25

                                                                              What you’re saying is extremely uncommon and goes against almost all conventional RDBMS wisdom. Pretty much no one uses Postgres in the way you’re describing.

                                                                              1. 3

                                                                                I’ve heard it before, albeit from software developers/architects, not from a DB centric view. While it’s not the “proper” way to do things, it might be a better architectural choice when you have many small services and want to iterate quickly. Once the data model is clear, I would prefer a well-structured database, but until then, I consider using JSON a valid tradeoff.

                                                                                1. 20

                                                                                  Iterating quickly with data will bite you later. Data is the foundation of most applications. Basically, the lower in the application you allow invalid data, the more layers above need to handle it.

                                                                                  It may take a few years to notice and the secondary effects of broken systems and processes start eating at profitability. You permanently lose customers because you can’t fix their accounts. You can’t add new features. You can’t scale because the cost per customer is too high.

                                                                                  This stuff piles on and I’ve seen it strangle companies. But if you just move on in 2 years, you never have to pay the price of your own mistakes.

                                                                                  1. 22

                                                                                    Move fast and get a new job before anyone realizes the trail of catastrophe you’ve left behind.

                                                                                    1. 1

                                                                                      Just out of curiosity, would you prefer an entity-attribute-value style? What is to stop people from stuffing json text in the attribute value?

                                                                                      https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model

                                                                                      1. 4

                                                                                        Nothing other than their own good judgement.

                                                                                    2. 2

                                                                                      Yeah, at some point you have to transition to something more structured, but finding the right point to transition is difficult.

                                                                                    3. 7

                                                                                      My beef isn’t with JSON at all; it’s with the idea that you can just code your way out of bad data modelling decisions. I mean, sure, software is infinitely plastic, but it is deeply unpleasant to have to try and understand the foundational abstractions you’re building on when there’s no assistance to the developers from the tooling. I have JSONB columns on several tables I own, and they’re great, and Postgres’ affordances for dealing with JSON data are fantastic. But the data exists in a model and that model is at least partly defined by the constraints and restrictions that the database provides.

                                                                                      1. 3

                                                                                        that you can just code your way out of bad data modelling decisions

                                                                                        I mean you can do this if we’re talking about hundreds of records. At scale though, you have no chance whatsoever.

                                                                                    4. 1

                                                                                      Yeah but you can say the same thing about Figma, Amazon, Notion, etc. and I’d hardly call them out for how they manually shard postgres instances. At some point you’ve got to stop using it the way it was intended, and start using it the way you need to get the job done.

                                                                                      1. 3

                                                                                        I’m using @zie ’s phrasing below.

                                                                                        The more “conservative” approach others talk about should probably be the default for 95% of all use cases. But in the rare case that you actually “are specifically tackling data problems as part of your product”, you are right that sometimes a traditional DB setup may not work. And it really only comes up in the truly Big Data category; what most people consider “big” data is nothing for a single PG instance, and hardware (vertical) scaling will probably be all most projects ever need, if anything.

                                                                                        So I do think you are right, but it might still be bad advice for almost everyone.

                                                                                        1. 3

                                                                                          You put it fair and well (but I would probably disagree on where postgres breaks down for big data if we discussed further).

                                                                                          Sometimes I forget that I’m probably a lot farther in my DB journey and can safely make some more questionable decisions in favor of speed and extensibility than others would be comfortable with.

                                                                                          1. 3

                                                                                            Please do share these, in general I think this forum is very good at letting differing voices all be heard! Just maybe add a note to make it clearer that it may not be intended for that CRUD backend webshop that gets 2 orders a week :D

                                                                                            1. 3

                                                                                              my favorite recent one is manipulating KV ordering to follow typical directory structure (I call it “hierarchical ordering”):

                                                                                              Keys in hierarchical order:
                                                                                                "\x01dir\x00"
                                                                                                "\x01dir2\x00"
                                                                                                "\x01\xffdir\x00\x01a\x00"
                                                                                                "\x01\xffdir\x00\x01b\x00"
                                                                                                "\x01\xffdir\x00\x01🚨\x00"
                                                                                                "\x01\xffdir\x00\x01\xffa\x00\x011\x00"
                                                                                              

                                                                                              Maps to

                                                                                              /dir/
                                                                                              /dir2/
                                                                                              /dir/a
                                                                                              /dir/b
                                                                                              /dir/🚨 <- me testing utf8
                                                                                              /dir/a/1
                                                                                              

                                                                                              for a kv backing a distributed file system :) You can see the pattern a bit, but I plan on writing about it soon

                                                                                    5. 14

                                                                                      I am maybe one of the few people that actually has production experience with a database schema like this built on postgres.

                                                                                      Do not do this. Set aside the consistency and correctness issues and just assume the programmer can maintain those guarantees themselves (they almost certainly cannot, but other commenters have argued that point better). Even ignoring those issues, you’ve just kneecapped your performance. Computed indices/columns are not good enough. Every update needs to hit every computed index. It may not update the index, but it still has to check.
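
                                                                                      For concreteness, this is the kind of computed/expression index being talked about (names invented):

                                                                                        -- An expression index over a jsonb field:
                                                                                        CREATE INDEX docs_email_idx ON documents ((jsonbdata->>'email'));

                                                                                      Any update that changes jsonbdata means new entries in every index built over it, so the expression gets evaluated each time; with several of these on a hot table, that adds up.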

                                                                                      You might think that that won’t be an issue. Maybe you’re right, but if you are, then nobody is using your software at scale, so you can do whatever you want. At the job where I had to deal with this kind of schema, the first large customer we got immediately ran into major query performance issues that were not fixable without huge refactoring of both the codebase and the database schema.

                                                                                      One thing you’re glossing over with your “code native” talk is that this approach ties your in-memory structures to at-rest representation. The moment that you need to break that (for example: to make queries over at-rest data faster), you better hope your codebase is typed well enough to refactor. Our mid-sized clojure codebase certainly wasn’t.

                                                                                      Do not do this. This is bad advice. Don’t delete your comment. Let it stand as a warning to others.

                                                                                      1. 1

                                                                                        I’ve yet to need a computed column; contrary to what this thread might suggest, I think I’m pretty good at modeling data

                                                                                      2. 13

                                                                                        I think you’re just trolling

                                                                                        1. 3

                                                                                          I think I get punished every time I share an opinion or fact contrary to what database fanatics believe

                                                                                        2. 11

                                                                                          I don’t want to be condescending, but this sets off all the alarm bells, glowing red from overheating. I guess I might kindly ask you to cut me some slack for being relatively old and grumpy. Having worked in dozens of software projects professionally, I can say without blinking that that mindset is literally the #1 reason why software products and even companies fail. I can’t think of any other reason that comes close to this. I estimate that if people knew relational theory to a professional level and applied it in their daily work, the effort spent on bugs and other problems would drop by 80% or so.

                                                                                          Hear me out, I don’t intend to make a debate about opinions. All I am saying is don’t knock it till you try it. Have you read the original paper on relational systems by Codd? It’s a very easy read and its pragmatism is pretty indisputable. Have you read a book about relational theory? Again, don’t knock it till you try it.

                                                                                          Normalization is not a trend. It is a set of rules theoretically derived as a solution to well-identified problems. I haven’t seen a single instance of people disregarding these rules and not falling into the old traps that have been known for 50 years. Often it doesn’t even take them a day. Every single time. It isn’t even funny anymore.

                                                                                          I could be here all night giving examples, but frankly speaking, anyone that understands why normalization is a godsend, already knows them all. Heck, I am sure if we look deep enough into Usenet, we will find some post saying the exact same things I am writing now.

                                                                                          What I find bizarre is the present-day obsession with type systems as the magical elixir (no pun intended) for any and all programming languages. You said in another post: “the client language can do that”. That sounds like planning how to deal with bomb explosions. Wouldn’t you rather figure out a way to keep said bombs from going off? My point being that you don’t need to rely on brittle, obscure code that someone wrote for data consistency, because your database is essentially built to provide just that. It does so consistently and more efficiently than anything that needs to rely on copying data around across processes.

                                                                                          Rant over I guess. I just hope people don’t go this apparently easier route. It turns nightmarishly more difficult really quick. Hope people can interpret this beyond my arguably negative tone.

                                                                                          1. 2

                                                                                            I can say without blinking that that mindset is literally the #1 reason why software products and even companies fail

                                                                                            I really can’t imagine a company failing because of their data structure choices, or at least not modern companies. I’d love to see some examples of companies that objectively failed because someone chose the wrong DB structure.

                                                                                            I personally have used SQL for a long time, and it always felt like it was getting in the way. It was designed for ad-hoc querying by analysts, which is not what code does.

                                                                                            I have no problem with relational systems, I’d LOVE a flatbuffer-based relational DB. SQL specifically seems to be the problem for me, but I also don’t want to blow another dog whistle here so I won’t talk any more about that XD

                                                                                            I’m not against normalization by definition, I’m against it in the way that SQL encourages it ATM. Flatbuffers really solves a lot in my eyes because it has defaults that are handled by the code, and the knowledge of “null” vs “does not exist yet, add a default”.

                                                                                            It was probably my poor choice of words to communicate my thoughts that dropped an opinion bomb. Maybe I should have lead with the above, but oh well.

                                                                                            But I don’t know why this would overheat alarm bells. DynamoDB has been doing a pretty good job at Amazon, Segment, Tinder, and many more hyperscale companies, even powering financial transactions.

                                                                                          2. 7

                                                                                            Strongly disagree. First, for the reasons everyone else says. But second, I disagree about the “painful binding” and “not code-native” bits. A normalized database maps very well to a sensible entity model in OO languages, and honestly is probably even easier with functional languages. Yes, you have to deal with migrations, but probably not often, unless you don’t design your entities up-front. Migrations put the cost of maintaining data integrity over code changes in one place, too, rather than forcing you to do ad-hoc cleanup after the fact.

                                                                                            1. 2

                                                                                              rather than forcing you to do ad-hoc cleanup after the fact

                                                                                              But you don’t have to actively delete data if you don’t need to, and it will get filtered out the next time you update and re-serialize to JSON (or something else like flatbuffer or protobuf)

                                                                                              1. 1

                                                                                                I wish you were right. But databases like postgres don’t even have the most basic support for sumtypes. (having lots of null values does not count)

                                                                                                1. 4

                                                                                                  PG has enums, like @jamesnvc mentioned, and it has arrays, etc. It doesn’t have every data type around obviously, but you are also encouraged to create your own datatypes specifically for your intended use case.

                                                                                                  So if you can’t use a built-in datatype, you use a third-party extension or roll your own. All 3 options are well supported in PG.
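
                                                                                                  For the record, the built-in ways to roll your own (illustrative definitions, not from any real schema):

                                                                                                    -- An enum type:
                                                                                                    CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

                                                                                                    -- A constrained scalar type:
                                                                                                    CREATE DOMAIN non_empty_text AS text CHECK (length(VALUE) > 0);

                                                                                                    -- A composite (record) type:
                                                                                                    CREATE TYPE money_amount AS (amount numeric, currency char(3));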

                                                                                                  1. 1

                                                                                                    Enums are not sumtypes. Arrays aren’t either.

                                                                                                    You can’t create your own datatype as a sumtype. I have the impression that you might not be familiar with the concept of sumtypes.

                                                                                                    1. 2

                                                                                                      Sum types have lots of definitions, and you are not clear on which one you mean. Based on this comment, I’ll assume you mean tagged unions? If you mean something else, please be specific.

                                                                                                      You can emulate tagged unions in SQL: https://stackoverflow.com/questions/1730665/how-to-emulate-tagged-union-in-a-database

                                                                                                      You can also do table inheritance, which is a PG only thing: https://www.postgresql.org/docs/current/ddl-inherit.html It’s not exactly the same thing, but it might meet your needs.
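
                                                                                                      The shape of that emulation, roughly (the payment example is mine, not from the linked answer):

                                                                                                        CREATE TABLE payment_method (
                                                                                                            id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
                                                                                                            kind        text NOT NULL CHECK (kind IN ('card', 'iban')),
                                                                                                            -- fields for the 'card' variant
                                                                                                            card_number text,
                                                                                                            card_expiry date,
                                                                                                            -- fields for the 'iban' variant
                                                                                                            iban        text,
                                                                                                            CHECK (
                                                                                                                (kind = 'card' AND card_number IS NOT NULL AND iban IS NULL)
                                                                                                             OR (kind = 'iban' AND iban IS NOT NULL
                                                                                                                 AND card_number IS NULL AND card_expiry IS NULL)
                                                                                                            )
                                                                                                        );

                                                                                                      Nullable columns plus a CHECK is the usual shape of it; table inheritance avoids some of the NULLs at the cost of other quirks.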

                                                                                                      1. 1

                                                                                                        Yes, that’s what I meant. Thank you for clarifying.

                                                                                                        You can emulate tagged unions in SQL: https://stackoverflow.com/questions/1730665/how-to-emulate-tagged-union-in-a-database

                                                                                                        Exactly, so as I said: there is not even basic support for them. So they have to be emulated.

                                                                                                        Now, if I have to emulate them anyways, then I’ll choose the best way of emulation. And in 95% of the cases (in my experience) that is emulation through JSON.

                                                                                                        1. 3

                                                                                                          I think this really boils down to who owns the DB.

                                                                                                          If you own it fully (i.e. you have control over it), it doesn’t matter. Do whatever you want and whatever makes you feel happy. If that’s a bunch of JSON, have fun.

                                                                                                          If you don’t fully own the data, then you have to deal with other people. In that use-case, you should make the DB do as much as possible, including SQL tagged unions as needed.

                                                                                                          I normally find myself in the 2nd camp, I rarely own the data outright. Other people and processes and teams also need the data. Making the DB do as much as possible forces them to not abuse the data too much.

                                                                                                          In my experience, when #1 is true, that I fully own it, as the data grows and time marches on, it inevitably moves to the second camp anyway (or it really wasn’t important data anyway).

                                                                                                          1. 1

                                                                                                            Yeah, I agree with that. I have seen too many problems to want to “share” a json column with other people (maybe with a few exceptions).

                                                                                                    1. 2

                                                                                                      I suppose those are the most basic sum types, but valenterry is probably wanting tagged unions (Rust-style enums, not C-style enums).

                                                                                                      1. 3

                                                                                                        (Off topic grumpiness about Rust naming their sum types as enums.. I really don’t get why they did that)

                                                                                                        1. 1

                                                                                                          (It does seem unnecessarily confusing)

                                                                                              2. 1

                                                                                                Relevant classic: Rails is f*cking boring! I love it.

                                                                                                That sexist comment in the Reddit post is a good reminder of why I stay here and not there…

                                                                                                1. 1

                                                                                                  Submitted this b/c it was certainly a revelation for me to note that, in sequence,

                                                                                                  1. a slow statement with a permissive lock
                                                                                                  2. a fast statement with a restrictive lock

                                                                                                  can be just as bad as a slow statement with a restrictive lock (assuming said statements are targeting the same resource)
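
                                                                                                  In Postgres terms, the sequence looks like this (table name invented):

                                                                                                    -- Session A: slow query, takes only ACCESS SHARE on the table
                                                                                                    SELECT count(*) FROM big_table;              -- runs for minutes

                                                                                                    -- Session B: fast DDL, but it needs ACCESS EXCLUSIVE
                                                                                                    ALTER TABLE big_table ADD COLUMN note text;  -- queues behind A...

                                                                                                    -- Session C: ordinary reads now queue behind B's pending lock
                                                                                                    SELECT * FROM big_table WHERE id = 1;        -- ...and this queues behind B

                                                                                                  The usual mitigation is to set a lock_timeout on the DDL session so it gives up rather than holding up the whole queue.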

                                                                                                  1. 5

                                                                                                    I relate a lot to the state synchronization issue. I migrated a project from Go+Templ+HTMX to Go+OpenAPI+React (no SSR) because of this.

                                                                                                      HATEOAS is a nice idea, but when your application has a lot of client-side interactivity and local state, it becomes hard (at least for me) to keep a clean codebase that does not turn into a spaghetti monstrosity.

                                                                                                    1. 7

                                                                                                        I mean, “state synchronization” is (IMO) the whole problem React is intended to solve. So when folks balk at the way that React works and advocate for a stripped-down lib, my question is always, “okay, so how are you planning on solving it?”

                                                                                                      1. 8

                                                                                                          React solves state synchronization on the client side, but not between the server and the client. Actually, this second part often becomes more difficult as one adds more interactivity client-side. That’s what’s leading some teams (for example linear.app) to treat the sync problem at the database level, and replicate the database between the server and client. Then the React client becomes a “simple” view on top of the database replica.

                                                                                                    2. 1

                                                                                                      It’s not obvious to me why one should need the font: bit—isn’t the em unit already relative?

                                                                                                      1. 2

                                                                                                        font-size: 0.9em: Sets the root element’s font size for all browsers, relative to the 16px default.

                                                                                                        font: -apple-system-body: Only in browsers that recognize it, sets the root element’s font family to the system font and its font size to the size of body text at the current system text scale, specified by the user either once in system settings or overridden per application.

                                                                                                        font-family: “Avenir Next”, “Helvetica Neue”, sans-serif: Sets the font family back the way the author wanted it; keeps system-scaled font size.

                                                                                                        So I’m willing to bet the 0.9em applied to all other browsers is chosen to match the 100% size of the system body text scale.

                                                                                                        All of the above is modified by the browser text scale setting.

                                                                                                        1. 1

                                                                                                          I suppose the default size for 1em is 16px, while the font bit uses the default font size for the system.

                                                                                                        2. 24

                                                                                                          I have gotten a lot out of this technique on large/long-term projects, but it’s overkill on small ones. I haven’t yet figured out how to articulate a good rule of thumb for when wrapping is worthwhile. Like, I wouldn’t wrap ActiveRecord in a Rails app, but I would probably wrap use of HTTParty. The answer is probably about predicting what dependencies are unlikely to outlive the current program. I’m tempted to say something as strong as “if it’s not a one-off script, wrap any dependency that’s not a framework”, where the library/framework distinction uses the definition based on control flow: whether your program is calling into it or it’s calling into your program.

                                                                                                          To attack the other side of the ROI equation, if you do the simplest possible wrap where you define MyFoo that forwards everything to the Foo library and provides the same interface, it’s probably just enough of a seam to be worthwhile. That very small investment pays off pretty quickly when, six months later, you think of a tweak you’d like to make to the API.

                                                                                                          1. 20

                                                                                                            My rule of thumb is “is this a service I pay for?”. The number of times I changed vendors b/c of external business factors dwarfs the number of times I needed to change libraries.

                                                                                                            1. 8

                                                                                                              maybe it’s: “does this dependency make network calls?”

                                                                                                              1. 4

                                                                                                                I think it’s “can I incrementally replace this, or does it all have to happen at once?”

                                                                                                                1. 1

                                                                                                                  All three of these ring true for me and are excellent additions to my guesses.

                                                                                                            2. 9

                                                                                                              do the simplest possible wrap where you define MyFoo that forwards everything to the Foo library and provides the same interface

                                                                                                              I briefly worked on a project at my last job where the previous developer had wrapped syslog() in a function named Syslog() with the same interface. There was absolutely no reason to do that, as the project only ran on POSIX systems (no Windows in sight), and by doing so, the compiler lost the ability to warn against misuse of syslog() (mainly, parameters that didn’t match the format string) [1]. When I removed the wrapper, I started finding scores of errors.

                                                                                                              [1] Obviously this is compiler dependent, and could have been added to the wrapped function, but the developer didn’t know or bother to do this.

                                                                                                              1. 3

                                                                                                                Like, I wouldn’t wrap ActiveRecord in a Rails app, but I would probably wrap use of HTTParty.

                                                                                                                Yes! There’s something hidden here. Maybe the more “framework-y and less library” something is, the less valuable it is to wrap it? Wrapping Rails probably does more harm than good, but wrapping small libraries is super helpful IME.

                                                                                                                As much as everyone seems to say YAGNI, I’ll double down and say I did need it several times in multiple client projects (especially JS dependencies).

                                                                                                                1. 3

                                                                                                                  It seems like a good rule of thumb is: Is there an existing alternative here that I might have chosen?

                                                                                                                  I don’t do JavaScript stuff much, but outside of that world new alternatives to dependencies don’t typically appear from nowhere. When I choose a library to provide a feature, either it’s the only one in the space, it’s the only vaguely plausible one in the space, or it’s one of a few options but looks like the best of the bunch. The further to the right you are on that continuum, the more likely it is that I’ll change my mind later and want to replace it.

                                                                                                                2. 2

                                                                                                                  Thinking about your examples, the rule of thumb may be “how likely am I to need to replace this dependency”. ActiveRecord isn’t going anywhere, but it’s not unlikely that eventually a different / better / preferred HTTP requests library will come along (as happens in the Python ecosystem once in a while).

                                                                                                                  In that case, you have to evaluate how many places you call HTTParty from, how complex the usage is (can it be find-and-replaced?) and how complex the wrapper would need to be to accommodate your use cases.

                                                                                                                  I think many projects are too small to care about this, but I see it as useful advice once you rely heavily enough on a replaceable thing. Doing things like this has definitely saved me some hassle in the past!

                                                                                                                  1. 1

                                                                                                                    I haven’t yet figured out how to articulate a good rule of thumb for when wrapping is worthwhile. Like, I wouldn’t wrap ActiveRecord in a Rails app, but I would probably wrap use of HTTParty. The answer is probably about predicting what dependencies are unlikely to outlive the current program.

                                                                                                                    I’ve arrived at a very similar position to the article over time, but I would phrase it differently: own your interfaces.

                                                                                                                    The motivation for this rule comes from having to perform shotgun surgery on hundreds of files to adapt to changes in someone else’s API a few too many times. I consider having to do this a failure of architecture.

                                                                                                                    When the APIs of your business logic changes, it should be your decision. It should not be foisted on you by a transitive dependency.

                                                                                                                    The greater the likelihood of API churn, the more imperative it becomes to quarantine it behind a stable API.