So that’s two that I strongly disagree with, and eight that I think are perfectly sensible.
ETA: I have a sort of idiosyncratic approach to designing schema, however, in that I tend to let the data speak first, and worry about join performance or query sanity second. It helps my case that I haven’t worked on a high-volume SQL database in ten years, of course.
I’d prefer CREATE DOMAIN or check constraints for simple types, but will use a table if I don’t have easy access to radical new 1980’s style features in my database. Allowing unconstrained data leads to problems not even very far down the road.
My experience mirrors yours. Use types if you can. Zip/Postal code? Make a type. Enumeration? Ditto. As soon as you make it a VARCHAR and have people enter it, you will get hard to detect errors, especially if someone decides it’s a good idea to update the production database directly.
Types in databases actually made me appreciate strong typing in programming languages more than I did before I worked with them.
7 I disagree with…timestamp with time zone helps fix a bunch of weird dumb errors that can creep in. Of course, the machine and database should be set to UTC time zone anyways, but it helps to do this regardless.
timestamp with time zone
I prefer UTC everywhere, but I sympathize with the sentiment. Some things are much easier with time zones. That said, if you have time zones, I’ve found you get a different set of weird, dumb errors.
The key is to not mix them. As soon as you do that, you’re in a world of hurt. And if you don’t store a time zone, make sure you store a UTC value! (Been there, had to recover from that.)
The sole exception where it should be the first choice to store a time zone is if you are storing user preferences about times in the future - but you should not be using a timestamp for that, because you don’t know how DST rules might change before it comes to pass. So it’s almost a totally unrelated problem.
Edit: I wish to clarify that I agree with what you said and am expanding on it. :)
Why not #4?? I prefer them in all my databases and I have read that using UUIDs can be worse in a lot of ways.
Integer keys always always always end up imposing an ordering that’s dishonest – in other words, that isn’t a property of the actual data. I don’t like surrogate keys full-stop, of course, but SQL makes using natural keys awfully hard a lot of the time.
I think the integer primary key is meant to be a unique item irrespective of what data the row is meant to store, so it’s not for ordering or analysis on what the data means, but for developer convenience. What is the downside to using an integer primary key (monotonically increasing as mentioned) and uniqueness constraints vs not? It seems like it would just be the space for the extra column with a benefit being a (likely) shorter handle to each row.
I am unsure what you mean by “I don’t like surrogate keys full-stop, of course, but SQL makes using natural keys awfully hard a lot of the time.” Would you mind expanding that thought?
Another downside to unique integer fields is that they need very careful treatment with sharded databases. There are certainly strategies to make integer sequences work with sharding, but UUIDs are cleaner and easier in many cases.
Notice that this author is advocating adding a unique integer column even when there is already another primary key, and also even in join tables. As you say, it’s for developer convenience, but that’s really not a convenience at all - it’s very dubious that there’s a debugging need to query data in such unnatural ways, and there should absolutely never be a production need. And having them there creates a temptation for someone to use them for something. Clean schemas shouldn’t have such temptations. :)
I wish we lived in a world where there was a better language for interacting with relational data than SQL. Yes, yes, D sort of exists, but still.
Yeah, it’s the best there is but it does get pretty verbose, and lacks portable ways to express things that would otherwise be doable and useful.
The syntax is terrible, but the worst, in my opinion, is that it’s not closed under composition, which makes it a royal PITA to write complex queries in.
Oh, huh. That’s a fascinating high-level point that I haven’t heard before.
Do I understand that you mean you can’t combine queries and be guaranteed that the result is valid? I’m having trouble thinking of an example offhand… of course, if you mean that the result may have unacceptable performance, I completely agree. :)
Thank you for clarifying! I didn’t think of sharding as a requirement. You’re right that a well-designed resource will not tempt its users into ruin. :-)
Could you point me at any resources for describing the difficulty you describe, or a good resource for how it works cleanly with UUIDs?
Problems with monotonically increasing integer keys:
Thanks for clarifying and pointing these out.
Is number one only an availability thing? It would seem that to save a record in the first place the database would be available, so I’m not sure I understand what you mean.
Number two makes sense, though PEBKAC might apply more than blaming the existence of the key. I will agree with another comment in this thread that a well-designed resource or system should not leave that sort of bad use available.
A surrogate key is one which is not derivable from the data being stored; a UUID or auto-incrementing integer, for instance. A natural key is often most readily expressed as several columns, and SQL makes multi-column joins a tremendous pain in the ass. I dislike the former because it implies something about the data that isn’t necessarily true. I dislike the latter because SQL is terrible.
A natural key is more brittle in the face of schema changes too. Surrogate keys are more resistant to change.
If the primary key is generated on a remote system (say, an event) and pushed to a central location, then monotonically increasing integer keys are a nightmare to work with. You can put an integer on it, but now you have two primary keys.
Basically, if your system is distributed, a central primary key authority becomes a terrible bottleneck or a source of hideous complexity. UUIDs save a lot of effort.