1.  

    So we started at $400. What’s the cost of the final build?

    1. 0

      This hits home. Just had this argument with my CIO last week. I never want to hear about Master data management or “one source of truth” nonsense again.

      1. 1

        The author shows these indexes:

        CREATE INDEX i_actor_1 ON actor (last_name);
        CREATE INDEX i_actor_2 ON actor (last_name, first_name);
        

        …and says that you don’t need the index on last_name because the two-column index supports queries on just its first column.

        But wouldn’t the best option be to have separate indexes for last_name and first_name? Then you could query efficiently by either or both. I guess maintaining two one-column indexes is more expensive than maintaining one two-column index?

        1. 2

          Two separate indexes on last_name and first_name are the most flexible solution, for the reason you gave. It isn’t true that only one of them is ever used: often both indexes are scanned to create bitmaps, which are then merged.

          A single index on (last_name, first_name) can be very useful for queries like “WHERE last_name = ‘Smith’ ORDER BY first_name”. An index like that could even satisfy the query if it only fetches values from the two columns (index only scan).
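          For what it’s worth, the leading-column behavior is easy to see with an EXPLAIN. Here’s a minimal sketch using Python’s sqlite3 as a stand-in (SQLite follows the same leading-column rule for composite indexes; the bitmap-merge behavior above is Postgres-specific):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE actor (actor_id INTEGER, first_name TEXT, last_name TEXT)")
con.execute("CREATE INDEX i_actor_2 ON actor (last_name, first_name)")

# Filtering on the leading column and ordering by the second can be answered
# from the index alone, a covering (index-only) scan, with no extra sort.
plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT first_name FROM actor WHERE last_name = 'Smith' ORDER BY first_name"
).fetchall()
print(plan[0][3])  # ... USING COVERING INDEX i_actor_2 ...

# A predicate on first_name alone can't seek into (last_name, first_name);
# SQLite falls back to scanning.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM actor WHERE first_name = 'John'"
).fetchall()
print(plan[0][3])  # SCAN ...
```

          The table and column names just mirror the actor example from the article; only the engine differs.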

          1. 2

            Your typical query engine will not use two indexes simultaneously to look up a value in a table; it will choose just one. Querying efficiently on both columns requires an index covering the two columns.

            The index on (last_name, first_name) is good for predicates including both last_name and first_name, or just last_name. But it won’t be used for first_name alone, except possibly as a full scan of the index.

            So given the two indexes, you can eliminate the one on just last_name, because it’s basically covered by the other.

            However, if you had queries with a predicate on first_name alone, then a separate index would be beneficial.

          1. 5

            List of changes from postgresql site:

            https://www.postgresql.org/docs/devel/static/release-10.html#idm45202035407008

            There’s a lot here. Going to take a while to pore over the details.

            1. 13

              @itamarst dude, time to step it up. There’s not much content here, and this is a clickbaity blogspam title. The article doesn’t offer much - one time this thing happened to you, and a nice thing to do when you make software is iterate?

              Probably being a bit harsh, but I don’t come to lobste.rs for articles that are short, sugary and don’t particularly contribute anything other than ‘bad things are bad, good things are good’ with no exploration or development or original content.

              1. 2

                Hey, sorry you didn’t like the article.

                I don’t know what to say about “clickbait title”, I’m just not very good at good titles mostly.

                As far as actual content: iteration is pretty basic, yes, except many people don’t do it. So still seems worth repeating.

                But beyond that… the point I’m trying to get across isn’t quite the same as “iterate”:

                • Iteration is a technique, “start with simple version and flesh it out in each release,” let’s say.
                • Incremental results are a desirable goal, with iteration being one way to achieve that goal.

                And so I think the idea of incremental results is more powerful because it’s applicable more broadly. E.g. sometimes you can achieve incremental results without iteration:

                • If each individual feature provides value on its own, then you can get incremental results with less iterative, more cumulative development.
                • Incremental results can be structured into the output, e.g. the way Progressive JPEGs render incrementally.

                But perhaps that should all have been in the original article.

                1. 7

                  Your article also doesn’t provide any evidence that what you’re espousing produces success. Your correct way isn’t actually a success story; it’s a figment of your imagination. It might be correct, but there is no way to know based on this article. Maybe your correct way would have resulted in a 6-month to 1-year delay for other reasons.

                  1. 2

                    No, but I implemented a program per “the right way” [1]. It’s written in Lua using LPeg (because of the amount of parsing involved) and it has proved to be “mostly fast enough.” It’s only recently (the past two months out of 18 months in production) that a performance issue has popped up (because traffic to the service has vastly increased), and the issue only happens intermittently (and usually doesn’t last long—by the time it’s alarmed, it’s over). Not a bad thing really, as the code is straightforward and easy to work on [2].

                    [1] It was intended to be a proof-of-concept (at least in my mind), only it ended up in production, without anyone informing me (long story).

                    [2] Thank God Lua has coroutines. It makes writing network code much nicer, as it’s imperative instead of a nest of callbacks with logic spread out all over the place.

                  2. 3

                    As thin as it may seem to some, I wish my boss would read it…

                    1. 4

                      Your boss will read the whole thing and will only remember “But maybe a Python version would have been fast enough.”

                      1. 2

                        ah ah ah, true. Unfortunately.

                1. 1

                  The 32bit version from the postgresql site:

                  https://wiki.postgresql.org/wiki/Pseudo_encrypt

                  Looks like a great approach for this problem and will outperform using UUIDs.
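                  For anyone skimming: pseudo_encrypt is a small Feistel network over the two 16-bit halves of the id. A rough Python transcription of the idea follows (the round-function constants are as I remember them from the wiki page, so treat this as a sketch rather than the canonical function):

```python
def pseudo_encrypt(value: int) -> int:
    """Map a 32-bit int to another 32-bit int, one-to-one and reversible."""
    l1, r1 = (value >> 16) & 0xFFFF, value & 0xFFFF
    for _ in range(3):
        # Feistel round: XOR one half with a scramble of the other. The
        # overall mapping is a bijection regardless of the round function.
        f = round((((1366 * r1 + 150889) % 714025) / 714025.0) * 32767)
        l1, r1 = r1, l1 ^ f
    return (r1 << 16) + l1

# Sequential ids come out scattered-looking, yet never collide, which is
# why this can stand in for random UUIDs on sequential keys.
print([pseudo_encrypt(i) for i in range(4)])
```

                  Because every round is invertible, distinct inputs always map to distinct outputs, with no table lookups or extra storage needed.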

                  1. 3

                    The one not mentioned here is the comment referencing the ticket number. That’s what the VCS is for; put your comment there.

                    1. 24

                      Has anyone else found that making many small helper methods tends to make reading the code much more difficult though? This is something that I spend a lot of time musing about, and writing half-assed tools about (I have a bad scheme clone of imatix’s gsl).

                      When I see a function that is just a list of helper functions, I have to zip around looking up the implementations of all of them! Especially when they’re not re-used across the code base, you’re learning a whole set of semantics that are unnecessary for your problem domain.

                      If the top level function was fully documented, then I’d only need to do that when I change something. Which is an improvement, but then it means that changing something is the slowest part, which is the opposite of maintainable code.

                      A function name is only better than a line comment in that it delineates the start and the end of the code it’s referring to, unless it’s reused, but YAGNI if it’s not reused already. You could mark sections within comments like pre-function programming…

                      Ah shit. I think I’m reverse engineering literate programming again.

                      1. 10

                        > Has anyone else found that making many small helper methods tends to make reading the code much more difficult though?

                        Depends on the length and complexity of the function.

                        I think it’s much cleaner to have a function calling a list of well-named functions, each with a specific purpose, than many lines of if/else, switch, and for loops.

                        As far as maintenance goes, I suspect you know what you’re trying to modify and can go directly to the function that requires the modification. Why would you need to zip around looking at all of them?

                          1. 8

                            I agree. My understanding is that this is partly based on people applying generally good rules in a far too draconian way. So while it’s great that code quality is something people look at nowadays and build tools to improve, I think we’ve arrived at a point where those tools might be either counterproductive or too limited.

                            Yeah, it’s good to have short functions that aren’t too complex, but I think everyone has seen code that became clearer when it grew in size.

                            My favorite example is cyclomatic complexity, which is a great concept for quantifying complexity and giving a clue about when it’s worth splitting something up. However, it can sometimes lead to the complexity becoming even more of a nuisance once split up, just to make the tool checking your code quality happy.

                            I don’t know what the solution is, and I have seen various bad outcomes of bringing code quality (tools) into companies. Besides justifying insanity by throwing around some quote or some tool, I’ve seen the other side as well: starting to use such a tool and, instead of taking its input seriously, simply configuring it so that the code just passes the checks. Beyond that, I have seen both extremely smart and utterly silly patterns for working around code quality checkers, leading to code that is harder to read rather than easier.

                            Another thing these tools do very badly: they are often not made with the programming language in mind, and for obvious reasons not with the programming task at hand. So the outcome is usually that writing something like a parser or a converter in a readable and efficient way makes most of these tools think the code is horribly complex.

                            I think general statements are hard to make. Sometimes I prefer to split stuff up into helpers; sometimes it is way clearer to write “paragraphs” of code with a comment above each, but only when the code cannot be reused, because it depends on what happens before and after and the context (the number of variables in scope) is both large and specific. Even then, I wouldn’t consider that a general rule.

                            A while ago I read an article about testing. If I remember correctly it was linked here, but I can’t find it right now. It was about how not every test needs to be written, nor every single line of code tested, despite testing being extremely important and something people should do. It is still nonsense not to test code at all, just as it is nonsense to cram a big program into only a few functions.

                            In other words: in my opinion, part of what makes a programmer a good/experienced programmer is knowing what the right thing to do is.

                            1. 3

                              > My favorite example is cyclomatic complexity

                              So I decided to play with oclint and wrote essentially one function in three different styles. I also snuck a typo in one of them to see if the tool would give a shit (and it wouldn’t). oclint complains about fun2, which is arguably the most straightforward and a reasonably idiomatic way to write this kind of code in C. It does not complain about complexity in the other two functions…

                              It also nags about all the variable names because they are too short. How useless. Here’s the code:

                              int
                              fun1(int op, int l, int r)
                              {
                                  return
                                    op == 1 ? l + r
                                  : op == 2 ? l - r
                                  : op == 3 ? l * r
                                  : op == 4 ? l / r
                                  : op == 4 ? l % r
                                  : op == 6 ? l << r
                                  : op == 7 ? l >> r
                                  : op == 8 ? l == r
                                  : op == 9 ? l != r
                                  : 0;
                              }
                              
                              int /* line 17 */
                              fun2(int op, int l, int r)
                              {
                                  if (op == 1) return l + r;
                                  if (op == 2) return l - r;
                                  if (op == 3) return l * r;
                                  if (op == 4) return l / r;
                                  if (op == 5) return l % r;
                                  if (op == 6) return l << r;
                                  if (op == 7) return l >> r;
                                  if (op == 8) return l == r;
                                  return 0;
                              }
                              
                              int
                              fun3(int op, int l, int r)
                              {
                                  switch (op) {
                                  case 1: return l + r;
                                  case 2: return l - r;
                                  case 3: return l * r;
                                  case 4: return l / r;
                                  case 5: return l % r;
                                  case 6: return l << r;
                                  case 7: return l >> r;
                                  case 8: return l == r;
                                  default: return 0;
                                  }
                              }
                              

                              Summary: TotalFiles=1 FilesWithViolations=1 P1=0 P2=1 P3=9

                              /tmp/ocl/oclint-0.12/bin/cat.c:17:1: high npath complexity [size|P2] NPath Complexity Number 256 exceeds limit of 200

                              1. 1

                                Now suppose our poor programmer had started with fun2 and thought that there was no way to cheat the tool and reduce complexity without breaking the function down into smaller pieces… tada, more code, a layer of indirection, and an extra pair of checks to maintain behavior with the old numbering scheme! And now the tool will happily accept it (well it still nags about names that are shorter than three letters).

                                Only now it’s much harder for somebody reading the code to determine which number matches which operation. Of course one could introduce yet another layer of indirection by replacing the numbers with identifiers, and then you also need to make sure the table & defines stay in sync. If that’s too much work, we can introduce yet more complexity by having another tool generate the table at build time.. anything goes as long as the tool is happy!

                                int op_add(int l, int r) { return l + r; }
                                int op_sub(int l, int r) { return l - r; }
                                int op_mul(int l, int r) { return l * r; }
                                int op_div(int l, int r) { return l / r; }
                                int op_mod(int l, int r) { return l % r; }
                                int op_shl(int l, int r) { return l << r; }
                                int op_shr(int l, int r) { return l >> r; }
                                int op_eq(int l, int r) { return l == r; }
                                
                                int (* const optab[])(int, int) = {
                                    NULL,
                                    op_add,
                                    op_sub,
                                    op_mul,
                                    op_div,
                                    op_mod,
                                    op_shl,
                                    op_shr,
                                    op_eq,
                                };
                                
                                int
                                fun4(int op, int l, int r)
                                {
                                    if (op < 1 || op >= sizeof optab / sizeof *optab )
                                            return 0;
                                    return optab[op](l, r);
                                }
                                
                                1. 1

                                  I think the tool’s right. With fun2 the flow is much less obvious; in fun1 and fun3 it’s immediately clear that only one path will be taken.

                                  1. 2

                                    What in fun2 would even suggest that more than one path may be taken?

                                    And then why does that not apply to fun3? Both are reliant on returns, nicely lined up.

                                    fun1 is the least idiomatic of them all, and while the way I lay it out should give a strong clue about its behavior, it might still take a moment longer for the average C programmer to determine that it does indeed do what it should (the sneaky typo aside).

                                    1. 2

                                      > What in fun2 would even suggest that more than one path may be taken?

                                      There are a bunch of different if statements. A priori, any combination of them is possible.

                                      > And then why does that not apply to fun3? Both are reliant on returns, nicely lined up.

                                      It does to a certain extent, but less so, because a switch is an immediate indication that only one path will be taken - fall-through is so rarely desirable that the reader is inclined to assume it won’t happen.

                                      > fun1 is the least idiomatic of them all

                                      Interesting, because I found it the clearest / most readable.

                              2. 4

                                I like style C with blocks

                                var x int
                                { // minor function 1
                                    ...code..
                                    x = whatever
                                }
                                

                                my coworker hates it though

                                1. 4

                                  I like to use the same pattern. It’s nice because you don’t have to go around looking for tiny helper functions that only get called once, while still properly scoping the lifetime of your variables so readers can forget about them when leaving a scope.

                              3. 8

                                I find the “many small helpers” style unbearable when any of the helpers mutate shared state.

                                Having many short pure functions can still be confusing, but it’s vastly more manageable for me.

                                1. 7

                                  > Has anyone else found that making many small helper methods tends to make reading the code much more difficult though?

                                  Yes, I’ve been saying this for a while.

                                  Show me your half-assed tools and I’ll show you mine.

                                  1. 3

                                    NOW THAT’S A WHOLE ASS, not half. That’s awesome!

                                    Notes on yours: Reminds me a lot of the tanglec program that the Axiom guy, Daly, came up with. I like your notation a lot, but I’m not sure I like the before/after notation… I’ll take a deeper look at this and try to think more clearly about it. The testing part is fascinating: totally different ideas than I would ever think of.

                                    The iMatix guys take it a different way and say that your literate programs have “models” within them that you can pull out and give some meaning to, by encoding them in well documented XML, which is then expanded into code.

                                    Mine doesn’t support looping correctly yet in the template language, because I don’t want to re-invent monads. But here it is. It doesn’t quite cover what you’re doing, as I’m trying to go with the higher-order concept of “models” that the gsl folks came up with.

                                    I have some work locally that’s about joining these two ideas. Scheme as the meta-language, SXML/XML “models” to contain program models, and tangle/weave (with TeX or HTML, not sure yet) for source style. It’s extremely half-baked as there are a lot of missing parts.

                                    1. 4

                                      Thanks! Yes, I’m aware of Tim Daly’s work and we’ve exchanged comments in various places on the internet. I tend to think his emphasis on typography – shared with classic Literate Programming – is a tarpit. But it’s working for him, and I don’t really have complete confidence in my approach yet ^_^

                                      I hadn’t come across the iMatix/gsl approach, do you have a pointer to learn more about it? That’ll help me gain context on your tool. Thanks.

                                      1. 2

                                        > I tend to think his emphasis on typography – shared with classic Literate Programming – is a tarpit.

                                        Having tried it, I am inclined to agree – at least for myself, who is susceptible to typography twiddling. On the other hand, are you familiar with Docco, the “quick-and-dirty, hundred-line-long, literate-programming-style documentation generator”? I have had good experiences with it.

                                        Docco (and its clones) take as input ordinary source files containing markdown-formatted comments, and render them to two-column HTML files with comments to the left, and the code blocks that follow them to the right. That encourages writing the comments as a narrative, but limits the temptation to twiddle with typography – it basically lets you keep a nicely read- & skimmable version of the source code at hand at all times.

                                        All of this is far more modest than what you are working on, of course. I’ll be keeping an eye on it!

                                        1. 2

                                          I’m a little familiar with Docco. As you said, it’s a little better because you don’t tweak the presentation endlessly. However, I think it also throws the baby out with the bathwater by dropping the flexibility to reorder parts of a program to present them in the most comprehensible way. There’s a few “quasi-literate” systems like that out there. They’re certainly better than nothing! But I think they have trouble scaling beyond short single-file examples.

                                        2. 2

                                          The best place to see a good example is: https://imatix-legacy.github.io/mop/index.html They are the ones that wrote ZMQ.

                                          The basic idea is simple. Put your data in a serialized hierarchical format that is also very easy for humans to read (for gsl, this is XML). It’s best if it has a CDATA-like feature, for unquoted/raw data.

                                          Next, there is what is called the “template”. This is another data format, which you apply to your XML files to generate output.

                                          Finally, there is what is called the “script”. This lets you do things like include xml files as subtrees of others, run them through multiple templates, and execute pretty basic code.

                                          Through these pieces, you can generate what gsl calls the model. With a given set of scripts and templates, you can hypothetically reduce the “uniqueness” of your implementation to a series of XML files. zproto and other tools use this to actually generate entire server & client implementations from XML descriptions of your format. You can see a description of how zproto roughly works here: http://hintjens.com/blog:75

                                          So it’s the happy combination of a templating language, some xml / hierarchy manipulation tooling, and a simple scripting language. I then took it into whacko land by saying all of that just sounds like lisp macros with some custom reader settings, so why not implement each of these three pieces as scheme, and add on top my desire for literate programming as part of the “generation” work it does. It’s not very far off the ground yet, but I do think there is something in here. We’ll see.
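                                          If it helps to fix ideas, here’s a toy version of the model/template split in Python. The model format and the rendered output are entirely made up, nothing like real gsl syntax:

```python
import xml.etree.ElementTree as ET

# The "model": hierarchical data capturing what is unique about this
# piece of the program.
model = ET.fromstring("""
<class name="Point">
  <field name="x" type="int"/>
  <field name="y" type="int"/>
</class>
""")

# The "template": a code shape with holes, filled in from the model.
def render(cls):
    fields = cls.findall("field")
    args = ", ".join(f'{f.get("type")} {f.get("name")}' for f in fields)
    inits = "\n".join(f'        this.{f.get("name")} = {f.get("name")};'
                      for f in fields)
    return (f'class {cls.get("name")} {{\n'
            f'    {cls.get("name")}({args}) {{\n{inits}\n    }}\n}}')

print(render(model))
```

                                          Add a third field to the XML and the constructor grows to match; that is the whole trick, scaled way up in gsl’s scripts and templates.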

                                          1. 3

                                            Most interesting. I see the connection you’re drawing with Lisp macros. It sounds like GSL is basically lisp macros that get a tree of data rather than just individual variables. So like lots of nested pattern matching on the s-expression you pass into the macro, so that you can name different parts of it to access them more conveniently. Does this seem right? Now I wish I could read more about it, and that the original series had more posts :)

                                      2. 3

                                        That looks like it ends up equivalent to AOP, which is a nightmare to debug when done badly (and any widely-used programming tool ends up being used badly). I read of someone doing something like this in Ruby that was allegedly highly maintainable - writing the simple version of a function, thoroughly testing it, then locking it down and using monkeypatching to handle cross-cutting concerns rather than complicating the original function - but it’s so contrary to my experience that I struggle to believe it.

                                        What I prefer is something like the “final tagless” style - just enough inversion of control, write your function in terms of commands and you can invoke the same function in either a simple or a complex context, provided it offers those commands.

                                        1. 3

                                          AOP is very different from MOP in terms of actual day-to-day usage and thought pattern, but I think you’re entirely right that “how will it be used badly?” is a great pattern to try and compare approaches.

                                          One part of AOP that makes things more difficult is there is no way to see the steps/expansions visually anywhere other than with tooling. With MOP (largely, I’m only talking about gsl), it’s something explicit in your templates, and you can run iterations of gsl to see what is generated very easily. Hypothetically you could use gsl to implement AOP-style programming, but most of the culture around MOP is to focus on the “model” that is repeated in your business logic (think state machines) and making them easier to see. AOP is much more about taking large cross cutting things like logging or error handling and rolling them up into these aspects.

                                          As I’m thinking more about this though, I’m seeing more of your point. It’s a slippery slope to just munging crap everywhere.

                                        2. 3

                                          This is extremely interesting! I am a big fan of literate programming because it allows me to create a narrative for understanding my program, but LP is not without its problems: you often don’t get access to your regular development tools (syntax highlighting and auto-indenting for your language, integration with IDE-like utils such as Racer for Rust) and this makes development more difficult and painful.

                                          Your idea allows the creation of a similar narrative—order the fragments in an order that tells the story to the reader—but also gives the programmer access to his usual coding environment, save perhaps for a quick fix to allow for the directive at the top.

                                        3. 4

                                          A function name is better in another way: it doesn’t become outdated as easily as a comment.

                                          > Has anyone else found that making many small helper methods tends to make reading the code much more difficult though?

                                          I have found it makes reading the code somewhat more difficult, on the first read through, iff I need to thoroughly understand or review all the code. Once I’ve done that, it makes reading the code easier, because I don’t need to parse every line to understand what it does: it tells me what it does and I know I can trust the name to be accurate.

                                          If I only need to fix a bug and know I can trust the author, I know I don’t need to check all the implementations.

                                          If you constantly need to check all the implementations, then it indeed only hurts to make many small well named methods. But I assert that means you are doing something wrong.

                                          1. 2

                                            Look up the implementation? I’m confused: why wouldn’t the implementation be in the function which contains the helper functions?

                                            1. 2

                                              Look at the link that mikejsavage posted above for the “helper” functions I’m thinking of.

                                              Putting the functions inline inside a function is only useful in languages that support that style (which c++ does wonderfully now), but I’m thinking about a slightly higher point around documentation.

                                              1. 1

                                                Oh, so our definitions of helper functions are entirely different things o__o. Cool. I guess the author did say helper methods, not helper functions.

                                            2. 1

                                              Nope. Small helpers make it easier to read, provided they’re well-named and well-typed. As @danielrheath says, if they mutate state they’re very hard to understand - but don’t do that, thread state through explicitly instead.
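                                              A small made-up illustration of the “thread state through explicitly” point: the pure helper spells out at every call site exactly which data it touches, while the mutating version hides that.

```python
# Mutating style: the helper quietly edits shared state, so the call
# site gives no hint of what changed.
class Basket:
    def __init__(self):
        self.items, self.total = [], 0

    def add(self, name, price):
        self.items.append(name)
        self.total += price  # hidden side effect

# Explicit style: state goes in and comes back out.
def add(items, total, name, price):
    return items + [name], total + price

items, total = [], 0
items, total = add(items, total, "tea", 3)
items, total = add(items, total, "scone", 2)
print(items, total)  # ['tea', 'scone'] 5
```

                                              Both compute the same thing; the difference is that in the second version nothing outside the two named variables can have changed.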

                                            1. 2

                                              You might have noticed that the LOWER part of the SQL query isn’t represented in my unique index. In my case, I was already normalizing the data stored, so that bit of logic was redundant.

                                              Not sure I understand what they’re saying here. Why was the lower not required in the constraint? Now it looks like you can add two names with different casing.

                                              Don’t you need the lower?

                                              CREATE UNIQUE INDEX ON repos ((LOWER(name)));

                                              Plus with this you wouldn’t need to change the model because it’d be optimized for the previous call.

                                              Edit: You’re probably setting name to lower on save.
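                                              For reference, the expression-index version does enforce case-insensitive uniqueness on its own. A sketch using Python’s sqlite3 (SQLite also supports indexes on expressions; the Postgres equivalent is the CREATE UNIQUE INDEX above, doubled parentheses and all):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE repos (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE UNIQUE INDEX repos_lower_name ON repos (LOWER(name))")

con.execute("INSERT INTO repos (name) VALUES ('Rails')")
try:
    # Same name with different casing is rejected by the index itself,
    # so no lower-casing before save is required for uniqueness.
    con.execute("INSERT INTO repos (name) VALUES ('rails')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

                                              Normalizing on save (as the author does) just makes the stored data uniform; the index is what actually guarantees uniqueness.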

                                              1. 2

                                                Edit: You’re probably setting name to lower on save.

                                                Before validation actually.