1. 64
  1. 20

    IMO the “clickbaity” original title frames the piece better than the toned down version.

    1. 8

      It is as though authors put thought into how they title their works…

      1. 3

        For reference, at the time of the above comment, the title had been changed from the original “A terrible schema from a clueless programmer” to “Normalizing a database schema” due to user suggestions.

        I suggest adding the culture tag in addition to changing the title back, since this article is mostly about developer attitudes rather than database normalization.

      2. 16

        Disclaimer: I can make relational databases work fast but I often try a few designs before I settle on one which performs as required.

        Why not leave it as one table and put an index across those four fields?

        Maybe the index would be ‘too large’ and that itself would cause a performance problem?
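For concreteness, the single-table option asked about above can be sketched as follows. This is a minimal illustration, not the article's actual schema: the table name `quads`, the column names, and the index name `quads_all` are hypothetical, and SQLite (via Python's standard `sqlite3` module) stands in for the 2002-era MySQL under discussion, so it says nothing about how that engine would have performed.

```python
import sqlite3

# In-memory SQLite database; a stand-in for MySQL, used only for illustration.
conn = sqlite3.connect(":memory:")

# Hypothetical single-table layout: four raw string columns, no lookup tables.
conn.execute(
    """
    CREATE TABLE quads (
        id        INTEGER PRIMARY KEY,
        ip        TEXT NOT NULL,
        helo      TEXT NOT NULL,
        mail_from TEXT NOT NULL,
        rcpt_to   TEXT NOT NULL
    )
    """
)

# One composite index across all four fields, as the comment suggests.
conn.execute(
    "CREATE UNIQUE INDEX quads_all ON quads (ip, helo, mail_from, rcpt_to)"
)

sample = ("192.0.2.1", "mail.example.com", "a@example.com", "b@example.org")
conn.execute(
    "INSERT INTO quads (ip, helo, mail_from, rcpt_to) VALUES (?, ?, ?, ?)",
    sample,
)

query = (
    "SELECT id FROM quads "
    "WHERE ip = ? AND helo = ? AND mail_from = ? AND rcpt_to = ?"
)

# Look up the row, then ask the planner how it would execute the same query.
row = conn.execute(query, sample).fetchone()
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + query, sample).fetchall()

# The human-readable plan description is the fourth column of each plan row.
plan_text = " ".join(r[3] for r in plan_rows)
print(row, plan_text)
```

Whether such an index is "too large" depends on the engine, the string widths, and the available memory, which is exactly the open question in the comment; the sketch only shows that the lookup itself can be served from the index.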

        1. 8

          The context here is that MySQL and the hardware available in 2002 were much less capable. I could see this normalization, with particular cardinalities and string sizes for certain columns, performing better than the indexes available at the time. (Two possible reasons: the max width for varchars in indexes was much smaller, and machines still topped out at the low hundreds of MB of RAM.)

          Of course, that’s not really the point of the post.

          1. 3

            If the index b-tree used prefix compression, the index might not even be that big.

            1. 1

              This. I would imagine the typical query still has to do a string comparison for the email address. Also, not sure how MySQL fares now but it used to be pretty slow when joining tables.

            2. 9

              A lot of the comments on this thread bring to mind the following quote from Babel:

              The monk turned to Kaimu, whose jaw had gone somewhat slack. “What do the annals say on the subject of the Misunderstood Lesson?”

              1. 6

                The good thing about a normal relational database that you don’t pay for by the row is that the initial slow design is completely fine until you get enough data that it’s a problem.

                You’re generally not locked into any particular design and you can make changes incrementally.

                1. 2

                  Also, your sins can be fixed by more experienced sysops.

                2. [Comment removed by author]

                  1. 7

                    Surely this is the key lesson learned

                    I think the key lesson learned is:

                    Now, what do you suppose happened to that clueless programmer who put in the schemas with raw strings (yes, varchars) galore and didn’t know anything about foreign key relationships?

                    Well, that’s easy. She just wrote this post for you. That’s right, I was that clueless newbie who came up with a completely ridiculous abuse of a SQL database that was slow, bloated, and obviously wrong at a glance to anyone who had a clue.

                    My point is: EVERYONE goes through this, particularly if operating in a vacuum with no mentorship, guidance, or reference points. Considering that we as an industry tend to chase off anyone who makes it to the age of 35, is it any surprise that we have a giant flock of people roaming around trying anything that’ll work?

                    It’s a massive problem, and we’re all partly responsible. I’m trying to take a bite out of it now by writing stuff like this. What about you?

                    1. [Comment removed by author]

                      1. 4

                        So when you were just starting out with nearly 0 experience you are saying that

                        • you already knew performance testing was going to be required in this case?
                        • that indexes were going to be needed in this case?
                        • and you were working in an environment like Amazon where the culture promoted good practices and set you on the right path from the get go?

                        You must be a unicorn or something. Or perhaps you are looking at the past with rose colored glasses? Everyone makes mistakes. If you learn from them then eventually you become that grizzled wise dude in the corner that everyone goes to for advice. But you don’t start out as that person. You get there by making mistakes.

                  2. 4

                    I’m 35 right now and the conclusion of the article kinda scared me 😅

                    1. 6

                      I’d take a different conclusion. Database normalisation was something I learned about in my first year as an undergraduate on a computer science degree and it’s something that sums up the value of a degree to me:

                      The purpose of a university education is not to teach you things, it’s to give you a guided tour of the things you don’t know.

                      I have not had to create a database schema since I left university (and, outside that course, the only one I created was for a hobby project) and by my second year I probably couldn’t remember the difference between any of the normal forms. The most important thing was that I knew that the normal forms existed and, if I’d been in a position where I’d needed to create a database schema then I’d have known to go and read up on them again.

                      The problem is not junior folks that don’t know everything, it’s that they don’t know what they don’t know. One of the most important things you can do as you develop in your career is discover what the things that you don’t know are and the kind of problems that can be solved with them. Even if you never learn about them in detail, knowing that they are things that you need to be able to go and find out about when you encounter a problem of the right shape is important.

                      The worst thing that happens with senior engineers (and I’ve been guilty of this personally) is that they discover new field X and think ‘oh, this is just like Y, which I know about’ and don’t realise that they’ve missed an important difference. For example, I largely ignored containers on Linux because I knew about jails and the mess of cgroups, seccomp, namespaces, duck tape, string, and wishful thinking that goes into isolating containers on Linux just seemed like a crappy way of doing jails. This is completely true, but misses the point that doing builds in a pristine isolated environment and deploying completely self-contained artefacts is a (very useful) shift in how software can be packaged. Now, most of the big cloud providers are deploying Linux containers using VM isolation rather than the goo that Docker initially used, so all of the security issues with the isolation mechanisms go away, but the benefits are still there.

                      1. 2

                        the conclusion

                        That is the only point of that article.

                        1. 4

                          I don’t understand. I do get the point of the article, but realizing that I’ve reached the age described in the article as essentially the industry standard “best before” date was fairly anxiogenic for me.

                          1. 1

                            “best before” date was fairly anxiogenic for me

                            It is the same for me. And that is the gist of the article.

                            The point of the article is not that “database design is hard” or “noobs make mistakes”. The point is that Silicon Valley et al. have evil hiring practices that are not in their own interest.

                            ref: look around, there is a HN thread: https://relevantdb.com/careers.html

                      2. 4

                        Considering that we as an industry tend to chase off anyone who makes it to the age of 35,

                        Where do they go? I may need to know soon ;)

                        1. 4

                          We go to suitable organisations :-) Context: I’m 40, and a couple of years ago was working for an org where I was definitely at the top of where I could get as a dev there as a Senior Developer. This was a bad place for me to be, and once some other life stuff had calmed down, I got out and went to my current employer as a Staff Engineer and there’s plenty of room for me to grow as a dev here. Incidentally, we’re hiring!

                          1. 4

                            And where are they chased away from? At Microsoft, 35 seems to be pretty close to the median age for the folks that I interact with regularly and my more limited experience with other big tech companies suggests that we’re not that unusual. I suspect it’s somewhat related to seniority. Most companies are shaped like a pyramid and hire a lot more junior people than senior and promote a subset of the junior folks. When you’re very junior, you have potential and (hopefully) intelligence but not expertise.

                            There’s a really large error margin in assessing the ability of junior people because they haven’t had the chance to demonstrate their competence. This cuts both ways: someone who looks fine might be pretty useless but they might be amazing (and they might become even more amazing in an environment with good mentorship). Hiring junior people is high-risk, high-reward, made lower risk (but still high-reward) by the fact that you hire them on a lower salary and often with a probation period. As you get more experienced, it’s much easier to accurately place your ability.

                            People who have demonstrated competence over a period of 20+ years tend to form the backbone of these companies. These people are low-risk, high-reward hires: you know that they’re good because they have a proven track record, including demonstrating the ability to acquire new skills over their career. If, however, someone with 20 years of experience is still as productive as a fresh graduate, then they’re a low-risk, low-reward hire. Why would you hire them if you can hire someone who looks as competent now but may be able to improve over time?

                            1. 2

                              Senior positions, one would hope.

                              Apparently there’s a problematically toxic situation with institutional ageism, but that situation should auto-correct as well as be actively addressed, as the founders and funders of companies realize wisdom comes with age, and how that maps into revenue. Or less lost revenue and less churn.

                              I find the whole idea of not hiring seniors strange, because one rule, maybe the number one rule, of startups is to hire people smarter than you. More experienced than you.

                              Someone in their 50s (or whatever elder age) walks in with a solid resumé, the course of action should be obvious.

                            2. 3

                              Now, what do you suppose happened to that clueless programmer who put in the schemas with raw strings (yes, varchars) galore and didn’t know anything about foreign key relationships?

                              My gut reaction was “I hope they didn’t get promoted”, expecting some Daily WTF conclusion. And then revelation hit me like a wall of bricks.

                              Which says something about me, and this industry, I think.

                              I do try to make a point to remember my foibles and missteps, and share them freely with my colleagues, for better and for worse. I also try to embrace, like this article, the unskilled and/or less skilled parts of my past. I want to normalize a self-aware humility on this front. Yumaikas has been pretty ignorant in the past.

                              adds article on mistakes to the pile of ideas

                              It’s a massive problem, and we’re all partly responsible. I’m trying to take a bite out of it now by writing stuff like this. What about you?

                              I’m trying to mentor devs less experienced than I am, where they’re receptive to it.

                              1. 3

                                This article didn’t have enough information about the data set for me to form an opinion about whether the fix would actually have been the right way to solve it. The shape of the data is a huge input into the shape of its representation.

                                If they weren’t seeing the same value more than once or twice in one of the columns (e.g., because the spammer was running through a list of recipients, so each TO value only appeared once in the data set) then they may have been better off leaving that mostly-unique column in the main table and covering it with the multi-column unique index.

                                The main thrust of the article is right: data modeling is easy to get wrong especially when you’re just starting out. But the conclusion really ought to be, “Analyze your data and your access patterns to determine how to model it,” not, “Normalization makes queries faster.”
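One rough way to "analyze your data" before choosing a model is to measure how often values repeat in each column. The sketch below is only an illustration under stated assumptions: the `quads` table, its columns, the sample rows, and the `distinct_ratio` helper are all hypothetical, and SQLite stands in for whatever database is actually in use. A ratio near 1.0 means the column is mostly unique, so moving it to a lookup table saves little; a low ratio means heavy repetition, where normalizing is more likely to pay off.

```python
import sqlite3


def distinct_ratio(conn, table, column):
    """Return distinct values / total rows for one column (1.0 = all unique).

    The table and column names are interpolated into the SQL text, so they
    must be trusted identifiers, never user input.
    """
    total, distinct = conn.execute(
        f'SELECT COUNT(*), COUNT(DISTINCT "{column}") FROM "{table}"'
    ).fetchone()
    return distinct / total if total else 0.0


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quads (ip TEXT, helo TEXT, mail_from TEXT, rcpt_to TEXT)"
)

# Made-up sample: one sender walking through a list of four recipients.
conn.executemany(
    "INSERT INTO quads VALUES (?, ?, ?, ?)",
    [
        ("192.0.2.1", "mail.example.com", "a@example.com", f"user{i}@example.org")
        for i in range(4)
    ],
)

ratios = {
    col: distinct_ratio(conn, "quads", col)
    for col in ("ip", "helo", "mail_from", "rcpt_to")
}
print(ratios)
```

In this made-up sample the recipient column is entirely unique while the other three repeat on every row, which is the shape of data the comment above describes. Access patterns matter as much as cardinality, so a ratio like this is one input to the decision rather than a verdict.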

                                1. 11

                                  I think the conclusion of the article was not about any of that at all. It was about it being okay to make mistakes or something.

                                2. 1

                                  From the “update” at the bottom:

                                  Go on, bag on me for being ignorant. I know what that really means.

                                  What does that really mean?

                                  1. 6

                                    I took it as a gender bias comment. This is a successful woman in tech who has faced harsh criticism from her peers, over the years, for being a woman in a male dominated industry.

                                    1. 5

                                      I agreed with you since it made sense, but after reading THE ONE from the other comment, I no longer agree. It seems to be just a screed against internet trolls who work in positions where they never could break prod or, in this case, make database design decisions.

                                      1. 1

                                        Good catch! I bet we’re both right to some extent, however! ;)

                                      2. 2

                                        And how were we to know the gender of the author just by reading the article?

                                        1. 4

                                          Well, that’s easy. She just wrote this post for you.

                                          1. 1

                                            Touché, missed that line :)

                                          2. 1

                                            They’re quite a well known blogger and their blog is called “Rachel by the bay”.

                                        2. 5

                                          Her post THE ONE, which she links to in the first paragraph, should make it clearer.

                                          1. 3

                                            That the commenter is more interested in putting someone down to make themselves feel/appear better than in actually engaging with the content of the article.

                                            1. 1

                                              I’m not entirely sure, but I hope the author isn’t too harsh on themselves.