1. 5

    I like several bits of this and dislike several other bits, but one thing that stands out to me is the seemingly continual refusal to place responsibility on the people consuming and using tech:

    When software encourages us to take photos that are square instead of rectangular, or to put an always-on microphone in our living rooms, or to be reachable by our bosses at any moment, it changes our behaviors, and it changes our lives.

    Users volunteer for this. Most of the people taken advantage of by tech get there by sleepwalking, like lemmings, into the grim meathook future some of us create to monetize them. Nobody holds a gun to their head and says “Put Amazon Echo in your house or you get shot by the Bezostruppen.” Nobody says “Hey you should totally enter a multiyear contract for this smartphone that will bleed you dry and spy on you instead of using a cheapo burnerphone or else we will put you in jail.” There is no national law that says “Citizen, you must participate in the two-minutes hate on Twitter or else your voting privileges will be revoked.”

    There is no end of the trouble we get into if we ignore the actions, the real actions, that got us here.

    1. 11

      Interestingly, lemmings don’t actually walk to their death as their environment typically only contains lakes they can swim across. Put them in front of an ocean though…

      Which is actually the perfect metaphor. Put people into environments they are unfamiliar with, maladapted to, and unable even to ask the right questions about, and it isn’t all that surprising that they won’t ultimately act in their own interests.

      But I guess we can blame the people who software hurts for being hurt by that software, which they couldn’t hope to understand without deep study.

      1. 4

        When there is minimal consumer choice, it’s hard to blame consumers for making the wrong choice. Robust consumer choice would be something close to feature-by-feature optionality: smartphones without spy powers, or only with photo spy powers, for example. In point of fact, even a smartphone with a hardware keyboard is a non-option nowadays.

        This is due in no small part to the limits and strengths of mass manufacturing: if everyone buys the smartphone that’s good for 51% of the people, we all enjoy a better phone for less money – but that puts the power of feature selection out of consumers’ hands. They get the phone that the designers designed: take it or leave it.

        The responsibility – moral and otherwise – for those features rests squarely with those who made the phone, not those who bought it.

        1. 4

          Nobody says “Hey you should totally enter a multiyear contract for this smartphone that will bleed you dry and spy on you instead of using a cheapo burnerphone or else we will put you in jail.”

          No, sure. But (to take just this example) the contract and the undeniable benefits of the smartphone, obviously without any reference to any potential downsides, are what’s advertised, sold, heavily pushed, to the extent that many won’t even realise there’s an alternative - and when availability of the features and capabilities provided are normalised to the extent that getting by without them involves significant extra effort, then in the majority sections of world outside of “people who understand, and can either afford or have to spend significant parts of their time understanding, technology”, that’s effectively all that exists.

          1. 1

            Exactly. I’ll add that this is true even when the constraints between two solutions are similar enough that the safer/higher-quality/freer one requires no sacrifice, or less. Getting people to switch from texts to IM… important since texts were a downgrade from IM (esp. with delays)… was hard despite equivalent usability, the better thing being free, the better thing having more features (optional though), some being private, and so on.

            An uphill battle even when the supplier went above and beyond expectations, making a better product for them. Usually laziness or apathy was the reason when other factors were eliminated.

          1. 2

            But abandoning the comfort that comes with displacing damage into distant landscapes also means reckoning with the convenient poetry of magic dust and the idea that there is anything unique or rare about an age fueled by colonialist fictions and extractive regimes. If anything about this age is rare, perhaps it is the possibility that our fraught networked systems have finally reached such a unique point, with their environmental and social consequences so visibly intertwined, that they have become impossible to ignore.

            Accepting this at face value it seems we must say: the more things change, the more they stay the same.

            1. 2

              Does a change like this reduce or increase contention under SERIALIZABLE transaction isolation? Or does it have no effect?

              1. 2

                Not really the point of the article, but the one thing I’ve never found a good tool for is deploying my application. It’s something that I absolutely don’t want to build, but keep reinventing for every project I work on.

                1. 2

                  I’d like to solve this problem, but I have very strong opinions about how it should be done.

                  1. 2

                    Write down the problem/requirements in a blog post and submit it here. I love to read about unsolved problems. Maybe someone even knows a solution.

                    1. 1

                      Having written these things for a few startups (Instacart, Airbnb) I can say, it’s tough to make a clean API for it and that makes generality, reusability and consistency (all requirements for a “tool” instead of a “solution”) very difficult.

                    1. 7

                      The Big Co’s of the world are rife with this problem. Even worse, in the meetings I attend I know who these people are, but there is little I can do about it. I was really hoping the countermeasures section was written as part of this post, because I’d really like some idea here, other than just bluntly calling out the saboteurs.

                      1. 4

                        The trick is to establish consensus on what is to be done before the meeting, and make sure the chair is on board.

                        1. 1

                          Well, having an agenda and goals are important for every meeting…but when a saboteur derails a group, even for a moment, getting back on track to complete the stated purpose within the timebox is not easy.

                          1. 1

                            Not with a capable chair.

                        2. 3

                          It’s probably something well beyond the Big Co’s of the world. My wife recognized these as a teacher in a small department in a mid-sized school. I recognize these tactics from a variety of different jobs in manual labour.

                          1. 1

                            Sometimes making the same “reasonable” suggestions, or amplifying them, puts them in a spot where they can’t go forward and so, to maintain momentum, have to go back.

                          1. 2

                            This is a two cultures thing.

                            In San Francisco there are many small companies where interesting product work and good taste count for so much relative to the fundamentals.

                            1. 2

                              I wonder if ARC is good enough for the kinds of programs Raymond is discussing.

                              1. 3

                                There is some innate affinity for computer programming which you must be born with, and cannot be taught

                                It’s hard to say whether this is true or false, or even what people believe.

                                On the one hand, people are not all equally good at everything.

                                On the other, even if they were, becoming good at things takes time; and if that length of time is long enough, then it is hard to say what the practical difference is between not having an innate ability and simply not having the skill at present.

                                1. 2

                                  This code speaks to Haskell’s real strength – the ability to author clear, reusable abstractions. It was only half in jest that Simon Peyton-Jones called Haskell “the world’s most beautiful imperative language.”

                                  The strength of Haskell’s abstracting power can also be a weakness. The author writes:

                                  …this site exclusively uses the async package and the stm package for concurrency. Concurrency is represented explicitly and used in IO.

                                  And:

                                  …in order to make it easier for non-Haskell users, several famous utility functions and excessive point-free style are avoided.

                                  You can do everything you do in Go in Haskell, in a very similar way; but you can also do it in a totally different way; and it’s hard to say what you’ll see in the wild.

                                  The fact that Haskell doesn’t bake in try ... catch doesn’t mean you won’t see code with exceptions. For a while, what it meant was that one saw many similar-but-not-the-same approaches to this kind of error handling – and anyone could code their own.

                                  1. 1

                                    This makes me appreciate zero values in Go. Instead of having to write a builder, I’d just declare a variable and set fields on it. I realize Rust insists on explicit initialization, but you could at least approximate that here: just write a function that returns a struct and set fields on that. What’s the advantage of the with_whatever methods?

                                    1. 14

                                      If this is your use case, Rust has a convention for it: the Default trait combined with struct update syntax.

                                      #[derive(Default)]
                                      struct Foo {
                                        field1: u32,
                                        field2: u32
                                      }
                                      
                                      fn main() {
                                        let foo = Foo {
                                          field1: 1,
                                          ..Foo::default()
                                        };
                                      }
                                      

                                      Alternatively, you can implement Default yourself rather than deriving it.

                                      impl Default for Foo {
                                        fn default() -> Foo {
                                          Foo { field1: 3, field2: 2 }
                                        }
                                      }
                                      

                                      The advantage of Builders is that they are lazy and can be passed around. So, for example, I can have a library that pre-builds requests in a certain fashion and then hand them off to a user defined function that sets additional data.
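
                                      A minimal sketch of that hand-off (all names here are illustrative, not from any particular library):

                                      ```rust
                                      #[derive(Debug)]
                                      struct Request {
                                          url: String,
                                          timeout_secs: u64,
                                          auth_token: Option<String>,
                                      }

                                      struct RequestBuilder {
                                          url: String,
                                          timeout_secs: u64,
                                          auth_token: Option<String>,
                                      }

                                      impl RequestBuilder {
                                          fn new(url: &str) -> Self {
                                              RequestBuilder { url: url.to_string(), timeout_secs: 30, auth_token: None }
                                          }
                                          fn timeout(mut self, secs: u64) -> Self {
                                              self.timeout_secs = secs;
                                              self
                                          }
                                          fn auth(mut self, token: &str) -> Self {
                                              self.auth_token = Some(token.to_string());
                                              self
                                          }
                                          fn build(self) -> Request {
                                              Request {
                                                  url: self.url,
                                                  timeout_secs: self.timeout_secs,
                                                  auth_token: self.auth_token,
                                              }
                                          }
                                      }

                                      // A library can pre-build requests in a certain fashion...
                                      fn library_default() -> RequestBuilder {
                                          RequestBuilder::new("https://example.com/api").timeout(10)
                                      }

                                      fn main() {
                                          // ...and hand the lazy, half-built value to user code,
                                          // which sets additional data before finishing it.
                                          let req = library_default().auth("secret").build();
                                          println!("{:?}", req);
                                      }
                                      ```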

                                      1. 1

                                        I don’t get it. I can pass around an object and set fields on it too, what’s the advantage?

                                        1. 10

                                          Struct fields in Rust are private by default, so once the struct leaves its defining module, other code may not be allowed to assign those fields directly.

                                          Additionally (I forgot this): Rust is a generic language, and patterns like the following aren’t uncommon.

                                          fn with_path<P: AsRef<Path>>(&mut self, pathlike: P) {
                                              // ...
                                          }
                                          

                                          Which means that the method takes anything that can be viewed as a (filesystem) Path. A String, a Path, an owned PathBuf, etc.

                                          It lets callers do strictly more. Yes, at the cost of some verbosity, which you can avoid in simple cases.
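
                                          For example (a sketch; the `Config` type and field are invented here just to show the call sites):

                                          ```rust
                                          use std::path::{Path, PathBuf};

                                          struct Config {
                                              path: PathBuf,
                                          }

                                          impl Config {
                                              // Accepts anything path-like and converts it once, inside the method.
                                              fn with_path<P: AsRef<Path>>(&mut self, pathlike: P) {
                                                  self.path = pathlike.as_ref().to_path_buf();
                                              }
                                          }

                                          fn main() {
                                              let mut cfg = Config { path: PathBuf::new() };
                                              cfg.with_path("settings.toml");            // &str works
                                              cfg.with_path(String::from("other.toml")); // String works
                                              cfg.with_path(Path::new("/etc/app.toml")); // &Path works
                                              println!("{}", cfg.path.display());
                                          }
                                          ```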

                                          1. 5

                                            A TBuilder doesn’t type check as a valid T object. This is the real value of the builder pattern (in Rust and Go and Java and…Haskell): one can write a fairly strict definition for T, and wherever one has a function that accepts a T, one can be sure that the T is fully constructed and valid to use. The TBuilder is there for your CLI parser, web API, or chatbot to use, while stitching together a full object from defaults+input, or some other combination of sources.

                                            Distinguishing between T and TBuilder prevents a partial object from masquerading as a full object.
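
                                            Concretely, something like this (names invented for illustration): a function that takes the validated type simply cannot be handed the builder.

                                            ```rust
                                            // A fully-validated config: you can only get one via the builder.
                                            struct ServerConfig {
                                                port: u16,
                                                host: String,
                                            }

                                            #[derive(Default)]
                                            struct ServerConfigBuilder {
                                                port: Option<u16>,
                                                host: Option<String>,
                                            }

                                            impl ServerConfigBuilder {
                                                fn port(mut self, p: u16) -> Self { self.port = Some(p); self }
                                                fn host(mut self, h: &str) -> Self { self.host = Some(h.to_string()); self }
                                                // Building can fail: a partial builder never becomes a ServerConfig.
                                                fn build(self) -> Result<ServerConfig, String> {
                                                    Ok(ServerConfig {
                                                        port: self.port.ok_or("port is required")?,
                                                        host: self.host.ok_or("host is required")?,
                                                    })
                                                }
                                            }

                                            // Downstream functions accept only the validated type.
                                            fn serve(cfg: &ServerConfig) {
                                                println!("listening on {}:{}", cfg.host, cfg.port);
                                            }

                                            fn main() {
                                                let partial = ServerConfigBuilder::default().port(8080);
                                                // serve(&partial); // would not compile: a builder is not a config
                                                match partial.host("localhost").build() {
                                                    Ok(cfg) => serve(&cfg),
                                                    Err(e) => eprintln!("invalid config: {}", e),
                                                }
                                            }
                                            ```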

                                        2. 7

                                          This is an orthogonal thing for the most part. For example, sometimes I use builders in Go when the initialization logic is more complicated.

                                        1. 5

                                          The post in question Big-O: how code slows as data grows

                                          The comment by ‘pyon’:

                                          You should be ashamed of this post. How dare you mislead your readers? In amortized analysis, earlier cheap operations pay the cost of later expensive ones. By the time you need to perform an expensive operation, you will have performed enough cheap ones, so that the cost of the entire sequence of operations is bounded above by the sum of their amortized costs. To fix your list example: a sequence of cheap list inserts pays the cost of the expensive one that comes next.

                                            If you discard the emotion, he gives a fairly interesting additional note about what amortized analysis means. Instead of giving the information value, Ned reacts to the part that questions his authority. Such a brittle ego, to be driven to writing a small novel’s worth of rhetoric instead of shrugging it off. Childish.
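
                                            Tone aside, the amortization claim itself is easy to check empirically. This sketch uses Rust’s Vec; its exact growth factor is an implementation detail, but growth is geometric:

                                            ```rust
                                            fn main() {
                                                let mut v: Vec<u32> = Vec::new();
                                                let mut reallocations = 0;
                                                let mut last_cap = v.capacity();

                                                for i in 0..1_000_000 {
                                                    v.push(i);
                                                    if v.capacity() != last_cap {
                                                        reallocations += 1;
                                                        last_cap = v.capacity();
                                                    }
                                                }

                                                // Because growth is geometric, a million pushes trigger only about
                                                // twenty reallocations: the cheap pushes "pay for" the occasional
                                                // expensive copy, so the whole sequence is O(n) -- i.e. amortized
                                                // O(1) per push.
                                                println!("{} pushes, {} reallocations", v.len(), reallocations);
                                            }
                                            ```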

                                          1. 52

                                              If @pyon had just phrased the first part of the comment like “You’re making a number of simplifications regarding ‘amortization’ here that I believe are important…”, this would probably not have escalated. This is what Ned means by being toxic: being correct, and being a douche about it.

                                            1. 10

                                              Indeed; the original article appeared on Lobsters and featured a thoughtful discussion on amortization.

                                              1. 2

                                                I wonder whether better word choice without changing the meaning would help one step earlier: the original post did include «you may see the word “amortized” thrown around. That’s a fancy word for “average”», which sounds a bit dismissive towards the actual theory. Something like «Notions of ‘‘amortized’’ and ‘‘average’’ complexity are close enough for most applications» would sound much more friendly.

                                                (And then the follow-up paints the previous post as if it was a decision to omit a detail, instead of a minor incorrectness in the text as written, which can be (maybe unconsciously) used to paint the situation as «correctness versus politeness», and then options get represented as if they were mutually exclusive)

                                                1. 4

                                                  I feel like that would have put the author in a more defensible position on this specific point, yes. Being clear about where additional nuance exists and where it doesn’t is something that anyone writing about technical subjects should strive for, simply because it’s useful to the reader.

                                                    I don’t think it’s likely that that clarification would have much of an effect on most readers, since the hypothetical reader who’s misled would have to study complexity theory for some years to get to the point where it’s relevant, and by that time they’ll probably have figured it out some other way. We should all be so lucky as to write things that need several years of study before their imperfections become clear. :)

                                                  But more to the point, while I can’t know anything about this specific commenter’s intent, somebody who’s determined to find fault can always do so. Nobody is perfect, and any piece of writing can be nit-picked.

                                                  1. 1

                                                      Several years sounds like an upper bound for an eventually successful attempt. A couple of months can be enough to reach the point in a good algorithms textbook where this difference becomes relevant and clear (and I do not mean that someone would do nothing but read the textbook).

                                                    I would hope that the best-case effect on the readers could be a strong hint that there is something to go find in a textbook. If someone has just found out that big-O notation exists and liked how it allows to explain the practical difference between some algorithms, it is exactly the time to tell them «there is much more of this topic to learn».

                                                    These two posts together theoretically could — as a background to the things actually discussed in them — create an opposite impression, but hopefully it is just my view as a person who already knows the actual details and no newbie will actually get the feeling that the details of the theory are useless and not interesting.

                                                    As for finding something to nitpick — my question was whether the tone of the original paragraph could have made it not «finding» but «noticing the obvious». And whether the tone may have changed — but probably nobody will ever know, even the participants of the exchange — the desire to put a «well, actually…» comment into the desire to complain.

                                                    1. 3

                                                      Not having previous familiarity with this subject matter, I was guessing at how advanced the material was. :)

                                                      I agree about your best case, and that it’s worth trying for whenever we write.

                                                      I’ve never found anything that avoids the occasional “well, actually”, and not for want of trying. This is not an invitation to tell me how to; I think it’s best for everyone if we leave the topic there. :)

                                                      1. 1

                                                        I consider a polite «well, actually» a positive outcome… (Anything starting with a personal attack is not that, of course)

                                              2. 25

                                                It’s possible to share a fairly interesting additional note without also yelling at people. Regardless of what Pyon had to say, he was saying it in a very toxic manner. That’s also childish.

                                                1. 5

                                                  Correct. But I don’t just care about the emotion. I care about the message.

                                                  Instead of trying to change the web into a safe haven of some kind, why not admire it in its colors? Colors of mud and excrement among the colors of flowers and warmth, madness and clarity. You have very little power over people getting angry or aggressive about petty things. Though you can change a lot yourself, and not get worked up about everything that’s said. Teaching your community this skill is also pretty valuable in life overall.

                                                  1. 32

                                                    I don’t want my community to be defined by anger and aggression. I want beginners to feel like they can openly ask questions without being raged or laughed at. I want people to be able to share their knowledge without being told they don’t deserve to program. I want things to be better than they currently are.

                                                    Maintaining a welcoming, respectful community is hard work and depends on every member being committed to it. Part of that hard work is calling out toxic behavior.

                                                    1. 5

                                                      I want beginners to feel like they can openly ask questions without being raged or laughed at.

                                                      While I agree this is critically important, it’s not entirely fair to conflate “beginners asking questions” and “people writing authoritative blog posts”.

                                                    2. 10

                                                      Yeah. That kind of self-regulation and dedication to finding signal in noise are endlessly rewarding traits worth practicing. And to extend your metaphor, we weed the garden because otherwise they’ll choke out some of the flowers.

                                                      1. 5

                                                        But I don’t just care about the emotion. I care about the message.

                                                        I’m with you unless the message includes clear harm. I’ll try to resist its effect on me, but I’ll advocate that such messages be removed. That commenter was being an asshole on top of delivering some useful information. Discouraging the personal attacks increases the number of people who will want to participate and share information. As Ned notes, such comment sections or forums also get more beginner-friendly. I’m always fine with a general rule for civility in comments, for such proven benefits.

                                                        Edit: While this is about a toxic @pyon comment, I think I should also illustrate one like I’m advocating for that delivers great information without any attacks. pyon has delivered quite a lot of them in discussions on programming language theory. Here’s one on hypergraphs:

                                                        https://lobste.rs/s/cfugqa/modelling_data_with_hypergraphs#c_bovmhr

                                                        1. 5

                                                          I personally always care about the emotion (as an individual, not as a site moderator), it’s an important component of any communication between humans. But I understand your perspective as well.

                                                          1. 3

                                                            I may have been unclear. I do too. I was just looking at it from other commenters’ perspective: how I’d think if I didn’t care about it but wanted good info and opportunities in the programming sphere. I’d still have to reduce harm/toxicity to other people via ground rules, to foster good discussion and bring more people in.

                                                            So, whether emotional or not, you still can’t discount the emotional effect of comments on others. We should still put some thought into that, with reducing personal attacks being among the easiest compromises, as they add nothing to discussions.

                                                            1. 2

                                                              Ah! Okay. I misunderstood then, and it sounds like we’re in agreement.

                                                        2. 4

                                                          It’s ridiculous to say that if someone cannot ignore personal attacks, they have a brittle ego and are childish, while also defending personal attacks and vitriol as the thing we should celebrate about the internet. Rather, we should critique people for being assholes. The comment was critiquing the manner and tone in which he explained amortized analysis, but he’s not allowed to say that the comment’s manner and tone was bad? It’s ridiculous. The comment was bad, not because of the point it made, but because it made the point badly.

                                                      2. 22

                                                        Compare this approach:

                                                        I believe this post simplifies the idea incorrectly. In amortized analysis, earlier (cheap) operations pay the cost of later (expensive) ones. When you need to perform an expensive operation, you will have performed enough cheap ones that the cost of the entire sequence of operations is bounded by the sum of their amortized costs. In the context of your list example, a sequence of cheap list inserts would pay the cost of the expensive one that comes next.

                                                        This is the same content, free of “shame” and accusations of “misleading.” The original comment is a perfect example of the terrible tone that people take, as discussed in this post and in my previous post of Simon Peyton-Jones’ email.

                                                        1. 4

                                                          Instead of giving the information value, Ned reacts on the part that questions his authority.

                                                          The author does give it value. You’ve missed the point. The author isn’t saying it’s incorrect or not valuable; he’s saying that this attitude from experts (who use their expertise as a tool to put others down) is highly toxic.

                                                          1. 4

                                                            If you discard the emotion, he gives out a fairly interesting additional note about what amortized analysis means. Instead of giving the information value, Ned reacts on the part that questions his authority.

                                                            It’s not clear that Ned interprets pyon as questioning his authority. His criticism is of pyon‘s tone, which is histrionic. The cutting intro isn’t bad if we discard it; but what is the effect if we include it? It would be more balanced for Ned to discuss the details and value of pyon’s post, but that does not invalidate Ned’s point.

                                                          1. 1
                                                            1. 2

                                                              Their discussion of why kernels need multiple mutable references, and how this interacts with linear types (how it requires a workaround), was a highlight.

                                                              The unsafe part of TakeCell is curious: https://github.com/helena-project/tock/blob/master/kernel/src/common/take_cell.rs#L52

                                                              It seems like double access to a TakeCell would result in one thread/holder “doing nothing” (and having to retry?).

                                                              1. 1

                                                                It seems like double access to a TakeCell would result in one thread/holder “doing nothing” (and having to retry?).

                                                                Yes, the map call will return an error, so the failure will be known. It is not explicitly stated, but the authors clarified it on reddit.
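
                                                                For intuition, here is a simplified, safe sketch of the take/map idea. Tock’s real TakeCell differs (it uses unsafe internals for the kernel setting), so treat this only as a model of the observable behavior:

                                                                ```rust
                                                                use std::cell::Cell;

                                                                // A simplified, safe model of the TakeCell idea.
                                                                struct TakeCell<T> {
                                                                    inner: Cell<Option<T>>,
                                                                }

                                                                impl<T> TakeCell<T> {
                                                                    fn new(value: T) -> Self {
                                                                        TakeCell { inner: Cell::new(Some(value)) }
                                                                    }

                                                                    // Runs the closure only if the value is present. A second,
                                                                    // reentrant access sees None, so that holder "does nothing"
                                                                    // and can observe the failure (and retry).
                                                                    fn map<F, R>(&self, f: F) -> Option<R>
                                                                    where
                                                                        F: FnOnce(&mut T) -> R,
                                                                    {
                                                                        match self.inner.take() {
                                                                            Some(mut value) => {
                                                                                let r = f(&mut value);
                                                                                self.inner.set(Some(value));
                                                                                Some(r)
                                                                            }
                                                                            None => None,
                                                                        }
                                                                    }
                                                                }

                                                                fn main() {
                                                                    let cell = TakeCell::new(41);
                                                                    let outer = cell.map(|v| {
                                                                        *v += 1;
                                                                        // A nested access while the value is taken out fails
                                                                        // gracefully instead of aliasing the &mut.
                                                                        assert!(cell.map(|_| ()).is_none());
                                                                        *v
                                                                    });
                                                                    println!("{:?}", outer);
                                                                }
                                                                ```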

                                                              1. 12

                                                                  Slides, for those who don’t want to watch a 40-minute video, though it is an enjoyable talk.

                                                                1. 7

                                                                  The seven:

                                                                  1. Noisy Code
                                                                  2. Comments
                                                                  3. Unsustainable Spacing
                                                                  4. Lego Naming
                                                                  5. Unencapsulated State
                                                                  6. Getters and Setters
                                                                  7. Uncohesive Tests
                                                                1. 16

                                                                  This is an amazing effort. Rust’s approach to community involvement sets a high bar.

                                                                  1. 9

                                                                    It really does. With my memory problems, all I can say is it’s possibly the best I’ve seen on that.

                                                                  1. 4

                                                                    McIlroy’s critique seems intellectually dishonest. One would not accept his solution in a take-home coding test – it’s like when we ask someone to implement line splitting and they use .split().

                                                                    Of course what he does is shorter; but not all programs can be that short, merely because this toy example can. Such a program would be a poor example of literate programming, but it is also a poor example of how to handle complexity in general. When you actually write complicated things in shell, you quickly find that these components can’t always be reused, and the need to author, explain and organize your own components rapidly outstrips the facilities available in shell.

                                                                    1. 5

                                                                      “Intellectually dishonest” ? I’d give points for using .split unless the assignment spelled out what could be used.

                                                                      1. 7

                                                                        That is another way of clarifying the distinction. Knuth didn’t set out to write the shortest or most production-worthy program, but rather to demo Web on a simple example program. McIlroy’s answer is to a different question; and his sleight of hand is in equating the two.

                                                                      2. 3

                                                                        not all programs can be that short

                                                                        Why not?

                                                                        I’ve seen small databases, small web servers, small programming languages…

                                                                        1. 3

                                                                          Just because one iteration of an idea can be small, doesn’t mean all useful iterations will be.

                                                                          1. 1

                                                                            Of course not, but that’s stupid. Who cares if “all useful iterations” will be: One could iterate on an idea and produce a massive steaming pile of dogshit simply because they’re a shit programmer.

                                                                            What I really care about is whether all business problems can be solved with small programs, and given that the smallest database is also the fastest and most featureful I’m inclined to believe that it is.

                                                                            1. 3

                                                                              I obviously meant that not all business problems can be solved with small programs.

                                                                              What database is the smallest, fastest, and most full featured?

                                                                              1. 2

                                                                                I obviously meant that not all business problems can be solved with small programs.

                                                                                Right. I understand this is a prevailing thought, but I’m not convinced.

                                                                                What database is the smallest, fastest, and most full featured?

                                                                                kdb

                                                                                1. 1

                                                                                  Sigh. I thought you were going to say kdb, but I thought I’d ask in case you had a novel or interesting answer. It’s incredibly niche; anyone familiar with its feature set would plainly know that it’s not general purpose.

                                                                                  1. 0

                                                                                    I’m using Kdb as a (CRM) database.

                                                                                    I’m also using it for time series (yes), and as an application server for a real-time bidding system.

                                                                                    I’ve got an unstructured data ingestion system on Kdb.

                                                                                    I’ve even got a full text search running on Kdb.

                                                                                    I know people doing GIS with Kdb.

                                                                                    Not sure what your definition of “general purpose” is, but it certainly meets mine.

                                                                                    1. 3

                                                                                      For the most part discussions I’ve had about kdb have been overly religious for my taste, and I’m not about to get in another holy war. I’m glad kdb works for you, but implementing features like FTS and GIS on top of kdb yourself doesn’t mean kdb has those features.

                                                                                      1. 1

                                                                                        I don’t know about that. I think that the ability to implement them in KDB does – I’m just writing queries here, and that’s important.

                                                                                        You can call this a potato if you want, but I’d say postgresql can do GIS queries as well, even though someone had to write them in C and link them in as an externally shipped tool.

                                                                                        Today I wanted to index a table by a cuckoo hash. I can’t imagine the mess of SQL needed to do such a thing, and it’s about five lines long in kdb. Doing that in postgresql would be very invasive.

                                                                                        1. 2

                                                                                          That’s neat. Would you care to share those 5 lines?

                                                                                          1. 2

                                                                                            Sure. Unoptimised version follows:

                                                                                            pos:{[t;n] k:count[t] div 2;raze (0,k)+\:((n mod 16),(n div 16)) mod k}
                                                                                            hash:{last (md5 "c"$ -18!x) except 0x00}
                                                                                            add:{ [table;text] n:hash text; if[-11h=type table; :table set ins[get table;n]; ]; :ins[table;n]; };
                                                                                            ins:{ [t;n] i:pos[t;n]; j:i rand where t[i]=0x00; if[j<>0N; :@[t;j;:;n]; ]; j:i@rand where t[i]<>n; if[j<>0N; m:t j; p:pos[t;m]; q:p@rand where t[p]<>m; if[q<>0N; :@[t;j,q;:;n,m]; ]; :@[t;j;:;n]; ]; :t; };
                                                                                            check:{ [table;text] n:hash text; t:$[-11h=type table;get table;table]; :any t[pos[t;n]]=n; };
                                                                                            

                                                                                            … but it’s still fast enough to check things out.
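                                                                                            To make the shape of the trick concrete, here is a rough Python sketch of a cuckoo-style hash set – my own illustration of the idea above (two candidate buckets per key, evict and re-place on collision), not a translation of the q:

```python
import hashlib
import random

def fingerprint(text):
    # One-byte fingerprint; 0 is reserved as the "empty slot" marker.
    return hashlib.md5(text.encode()).digest()[-1] or 1

def buckets(table, n):
    # Two candidate slots per fingerprint, one in each half of the table.
    k = len(table) // 2
    return [n % k, k + (n // 16) % k]

def add(table, text, max_kicks=16):
    n = fingerprint(text)
    for _ in range(max_kicks):
        for j in buckets(table, n):
            if table[j] in (0, n):        # empty slot, or already present
                table[j] = n
                return True
        j = random.choice(buckets(table, n))  # evict an occupant (the cuckoo step)
        table[j], n = n, table[j]             # place n, carry the evictee onward
    return False                              # table too full; real code would grow

def check(table, text):
    n = fingerprint(text)
    return any(table[j] == n for j in buckets(table, n))

table = [0] * 64
add(table, "hello")
print(check(table, "hello"))  # True
```

                                                                                            Like any fingerprint-based structure, this can return false positives on `check`, which matches the “don’t need exact answers” framing later in the thread.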

                                                                                            1. 2

                                                                                              I think I’ve got you beat on brevity:

                                                                                              create table "table" ( "text" varchar);
                                                                                              create index hash_index on "table" using hash("text");
                                                                                              

                                                                                              May not a be cuckoo hash, but I’m not convinced that matters. Except possibly in a highly specialized, niche application. I doubt you wrote this for your CRM, for example.

                                                                                              1. 2

                                                                                                I think I’ve got you beat on brevity:

                                                                                                Oh if I just want a regular hash/index I can use the g or s properties (sorted is fine). Indeed that’s what I benchmark this problem with. The exact syntax would be:

                                                                                                table:([] id:`s#`sym$(); date:`s#`date$(); acct:`acct$())
                                                                                                

                                                                                                I doubt you wrote this for your CRM, for example.

                                                                                                A CRM is a component of this application.

                                                                                                So there’s an attribution component where I’ve got ~2m accounts that I want to connect to some set of around ~1m ids. You might imagine it’s:

                                                                                                id date -> account[]
                                                                                                

                                                                                                and indeed the trivial

                                                                                                create table attribution (id varchar, date date, acct varchar);
                                                                                                

                                                                                                is fine, but it’s a chunky index – around 10-16GB per day – that I’d have to build and for this use case I don’t need exact answers: it’s okay to select a few acct that don’t have the id, and given the number of processes that need this data built, maybe something that uses around 20MB might be worth experimenting with.
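                                                                                                For a sense of scale: if approximate membership is acceptable, the classic Bloom filter sizing formula m = -n·ln(p)/(ln 2)² gives a rough lower bound on space. With hypothetical numbers loosely based on the figures above (~3M keys, 1% false positive rate):

```python
import math

def bloom_bits(n, p):
    # Optimal Bloom filter size in bits: m = -n * ln(p) / (ln 2)^2
    return -n * math.log(p) / (math.log(2) ** 2)

n, p = 3_000_000, 0.01            # hypothetical: ~3M keys, 1% false positives
mb = bloom_bits(n, p) / 8 / 1e6   # bits -> megabytes
print(round(mb, 1))               # ~3.6 MB, comfortably under a 20 MB budget
```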

                                                                                                1. 2

                                                                                                  I don’t think I have adequate information about the problem to discuss more in depth. Only, I could store a year of that index (6T) on the cloud for less than 2 engineer-weeks of USD. So if your optimization took longer than 2 weeks cumulative to design, implement, test, be used by other engineers effectively, per year, then it’d be a waste.

                                                                                                  Moreover, I suspect there is a trivial strategy to represent that index in a smaller way. You have string identifiers for a few million rows; you could easily use an int32 id and save space. Or maybe you can’t. I don’t know enough about your application.

                                                                                                  I also don’t know what you mean by number of processes that need the data built. If a distinct 16 GB index is built for each process running, I can see how that would explode the size. But I don’t see anything else in what you said to indicate whether that’s the case. Again, not enough info.

                                                                                                  I don’t really think it’s productive to continue this discussion. You like kdb and I’m happy for you. But not many people can just implement, e.g., a reasonable GIS system for their application. And though I could, I don’t want to. I’ll use PostGIS or something else off the shelf unless I have a compelling reason to go out of my way to make a custom solution. Because all that bloat you talk about, that’s functionality I don’t even know I need yet, but will a year down the line when the scope of my application expands. It’s subtle logic handling edge cases that might have otherwise wasted a lot of my time.

                                                                                                  That’s why I say kdb is niche, it’s for people who actually will derive financial value out of doing stuff like that themselves. That’s tremendously uncommon.

                                                                                                  1. 1

                                                                                                    That’s why I say kdb is niche, it’s for people who actually will derive financial value out of doing stuff like that themselves. That’s tremendously uncommon.

                                                                                                    Okay, but I’m not arguing it’s not niche. I said that “all business problems can be solved with small programs” and you said they can’t.

                                                                                                    That people are okay with big programs is a (possibly) unrelated issue.

                                                                                                    1. 2

                                                                                                      Solving a problem with limited time and developer resources is a business problem.

                                                                                                      1. -1

                                                                                                        I don’t agree “limited time and developer resources” is a business problem that could be solved better with a big program than a small program.

                                                                                                        That actually sounds absolutely absurd to me, so I assume you must mean something else, but I can’t imagine what it might be.

                                                                                                        1. 3

                                                                                                          I do mean that. Using software that supports doing what you need saves time and developer resources. Suppose a team needs a few different unrelated features. In the database world, there are many large general purpose databases that support a ton of features, including the features the hypothetical team needs. But they aren’t terribly likely to find a database that supports ONLY those few unrelated features. And in the interest of saving time and developer resources, they could reasonably choose that large general purpose database over implementing those features themselves on top of a small but highly customizable database.

                                                                                                          And on a meta level, building a large general purpose database program solves a business problem: build a database that a ton of different teams can use, so you can sell it a ton of times. The fact that you technically can implement your own GIS on kdb isn’t all that compelling if I’m looking to buy a GIS database. I can implement my own GIS on a lot of things, the point is I don’t want to.

                                                                                                          I said that “all business problems can be solved with small programs” and you said they can’t.

                                                                                                          Perhaps “can’t in practice with realistic constraints” is more accurate. Can all business problems be solved with small programs given unlimited resources and top developers? Who fucking cares? That’s not how real life works. No one chooses Walgreens vs CVS based on how many lines of code those companies execute to conduct business. If refining, specializing, and minimizing their code size made them more money somehow, then maybe their business problems would be better solved with small programs. But they probably get more value per dollar out of mixing and matching large generic programs. More value per dollar is better in a business context.

                                                                                                          There is a point where specializing becomes more effective than mixing and matching large general purpose code, but that point isn’t “always every time for any business problem.”

                                                                                                          1. 1

                                                                                                            This is all over the place and I’m not sure how to respond. I’m not even really sure what you’re saying.

                                                                                                            Why exactly do you think that a problem like GIS requires a large program, when we can clearly see a solution with a small one?

                                                                                                            there are many large general purpose databases that support a ton of features

                                                                                                            There are also small general purpose databases that support a ton of features.

                                                                                                            Not sure what your point is.

                                                                                                            Can all business problems be solved with small programs … Who fucking cares?

                                                                                                            I do. There is significant value in small programs: they have fewer bugs, they are easier to read and write, and they run faster. I find programs that are correct and fast to be more valuable than programs that aren’t, and I can’t imagine a business that thinks otherwise will last very long.

                                                                                                            There are other things in your post where I don’t really understand your point. It’s not clear if you disagree with me or where you disagree. It almost seems like you’re angry about something – maybe this religious point you mentioned earlier – that has nothing to do with me.

                                                                                                            1. 2

                                                                                                              Why exactly do you think that a problem like GIS requires a large program, when we can clearly see a solution with a small one?

                                                                                                              That’s not what I’m saying. I’m saying a program that supports GIS, and a bunch of other unrelated features so as to be general purpose, will be large. One such program is PostgreSQL.

                                                                                                              It’s not clear if you disagree with me or where you disagree.

                                                                                                              I agree small programs are good. I disagree that every problem can be solved with small programs.

                                                                                                              It almost seems like you’re angry about something

                                                                                                              No, just frustrated that this discussion is going exactly the way I expected, and that I should have known better and not gotten involved.

                                                                                                              that has nothing to do with me.

                                                                                                              It has everything to do with you. I feel I have dangled my point in front of your face, and you are perfectly capable of understanding but have refused to do so.

                                                                                                              So here it is laid out:

                                                                                                              Thesis: not all problems can be solved with small programs.

                                                                                                              Example: I do not believe the problem solved by PostgreSQL could be solved by a small program.

                                                                                                              Problem solved by PostgreSQL: saving time and developer resources by providing a general purpose, many featured solution usable immediately by a wide variety of teams. Contrast with kdb, which requires implementing desired features on top of it.

                                                                                                              Make sense?

                                                                                                              1. 1

                                                                                                                I’m saying a program that supports GIS, and a bunch of other unrelated features so as to be general purpose, will be large. One such program is PostgreSQL.

                                                                                                                PostgreSQL ships GIS as an add-on.

                                                                                                                Same as with kdb.

                                                                                                                Thesis: not all problems can be solved with small programs.

                                                                                                                “All problems” isn’t important.

                                                                                                                You can always invent a problem that cannot be solved by a small program, such as “needs to be a big program.”

                                                                                                                All business problems is a little better, and while still open to a certain amount of shenanigans, if you’re not intellectually dishonest you’ll get something out of the argument.

                                                                                                                Shit like this:

                                                                                                                Example … Problem solved by PostgreSQL: saving time and developer resources by providing a general purpose, many featured solution usable immediately by a wide variety of teams. Contrast with kdb, which requires implementing desired features on top of it.

                                                                                                                are counterproductive. “general purpose” is met:

                                                                                                                • having a range of potential uses or functions; not specialized in design.

                                                                                                                however:

                                                                                                                • many featured solution usable immediately by a wide variety of teams

                                                                                                                is weasel words. Define exactly what you mean by this. How many varieties of team is “wide enough”? How many features is “many featured enough”? I’m certain whatever number you choose we can simply implement that many with kdb and close this point off.

                                                                                                                finally:

                                                                                                                • Contrast with kdb, which requires implementing desired features on top of it.

                                                                                                                … like GIS using PostGIS.

                                                                                                                1. 1

                                                                                                                  PostgreSQL ships GIS as an add-on.

                                                                                                                  Same as with kdb.

                                                                                                                  I was not aware; you made it sound like your friends implemented GIS on kdb. If kdb has a fully featured GIS plugin that ships with the distribution, then I stand corrected – for this feature. To define fully featured for you, let’s go with this, in particular sections 8.8 Operators, 8.9 Spatial Relationships and Measurements, and 8.11 Geometry Processing.

                                                                                                                  many featured solution usable immediately by a wide variety of teams

                                                                                                                  Define exactly what you mean by this.

                                                                                                                  It defines itself.

                                                                                                                  • has many features
                                                                                                                  • usable immediately by a wide variety of teams

                                                                                                                  And I was implying that it’s usable immediately by a wide variety of teams because it has many features, GIS being one example. For another example, generalized inverted indexes on hierarchical document values like JSON. Although perhaps there is also a kdb plugin that ships with the distribution and provides generalized inverted indexes?
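                                                                                                                  As a toy illustration of what a generalized inverted index over documents does – emphatically not PostgreSQL’s GIN implementation – index every (path, value) pair of a nested document, then look documents up by pair:

```python
from collections import defaultdict

def flatten(doc, prefix=()):
    # Yield (path, value) pairs from a nested dict,
    # e.g. (("user", "name"), "ada").
    for k, v in doc.items():
        if isinstance(v, dict):
            yield from flatten(v, prefix + (k,))
        else:
            yield prefix + (k,), v

docs = {1: {"user": {"name": "ada"}, "active": True},
        2: {"user": {"name": "bob"}, "active": True}}

index = defaultdict(set)            # (path, value) -> set of doc ids
for doc_id, doc in docs.items():
    for pair in flatten(doc):
        index[pair].add(doc_id)

print(index[(("user", "name"), "ada")])   # {1}
print(index[(("active",), True)])         # {1, 2}
```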

                                                                                                                  I’m certain whatever number you choose we can simply implement that many with kdb and close this point off.

                                                                                                                  But they aren’t there already, which is the entire point.

                                                                                                                  Contrast with kdb, which requires implementing desired features on top of it.

                                                                                                                  … like GIS using PostGIS.

                                                                                                                  PostGIS ships with PostgreSQL in nearly every distribution channel. And the end user of the database certainly does not have to implement PostGIS, since it’s already written. Which is my entire point.

                                                                                                                  1. 2

                                                                                                                    I just got to this thread, and it’s immensely interesting to me in spite of occasionally falling off the tightrope into flames.

                                                                                                                    Excluding specifics of different tools, there’s two aesthetics at war here between you and @geocar:

                                                                                                                    • Using conventional tools provided by others, because they have an incentive to serve as many users as possible, and so could potentially anticipate features you may need in the future. This is a really nice steel-man of the conventional approach that most people cargo-cult as “reuse”.

                                                                                                                    • Using small programs, minimalist tools and as few dependencies as possible, because every new dependency introduces new degrees of freedom where things can go wrong, where somebody else’s agenda may conflict with your own, where you pay for complexity that you don’t need.

                                                                                                                    If only we could magically unbundle the benefits of other people’s code from their limitations, have features magically appear when we need them, and be magically robust to security holes in features we don’t use.

                                                                                                                    The synthesis I’ve come up with to these two poles is to use libraries by copying, and then gradually rip out what I don’t need. This obviously makes things more minimal, so moves me closer to @geocar (whose biases I share). But it also moves me closer to your side, because when I need a new feature from upstream next year I know enough about the internals of a library to actually bring it back into the fold.

                                                                                                                    It’s hard to imagine a better synthesis than this. The only way to get the benefits without the limitations is to get on a path to understanding your dependencies more deeply.

                                                                                                                    Edit: http://arclanguage.org/item?id=20221 provides deeplinks inside the evolution of a project of mine, where you can see a library being ingested and assimilated, periodically exchanging DNA with “upstream”.

                                                                                                                    1. 1

                                                                                                                      I agree that minimal software is generally better too. I just don’t think it’s practical or valuable to make all software minimal. Using an HTTP wrapper library for literally one request in an app? Kill that dependency. But wait, the app isn’t consumer facing, just needs to get done in as little time as possible, and probably won’t be substantially extended? Screw it, who cares? Adding a dependency in a situation that matters so little is totally worth it if the wrapper library saves the developer 20 minutes of learning a more low-level API.

                                                                                                                      1. 1

                                                                                                                        But you haven’t addressed my comment at all. Copying the HTTP wrapper library is a reasonable option, right? At worst, it adds minimal overhead for upgrading and so on. At best, it reduces your exposure to a fracas like the one that befell left-pad.

                                                                                                                        1. 2

                                                                                                                          If the app matters then copying the HTTP wrapper, or any other library, could be valuable. If the app doesn’t matter, it’s still a waste of time. It’s all about tradeoffs.

                                                                                                                          Something like an HTTP wrapper, I might just drop it entirely. A lot of those libraries are just reinterpretations of how the author feels APIs should look. Something like ncurses though? I’m not touching it, no way. Or postgres? Forking a database is a huge commitment. But a json parser with a few hokey features I’ll never need, that slow down the parser? I’ve forked that. A password hashing library that bizarrely had waaaay more functions than hash, and check_hash? Forked.

                                                                                                                          For C++ it’s especially valuable to fork and strip, because monster headers increase compile times. In big projects, adding a header that increases compile time by 200ms can add minutes to build time. Yikes.

                                                                                                                          So yeah, I agree with you that forking and stripping is a good strategy. It doesn’t apply to everything, but in situations where it’s the best choice, I find it’s usually the best choice by a long shot.

                                                                                                                          1. 2

                                                                                                                            It sounds like you’re already practicing what I struggled to figure out. That’s great! I’ll suggest that your narrative of “big programs” is too blunt, and doesn’t adequately emphasize the challenges of dealing with their fallout.

                                                                                                                            Forking a database is a huge commitment.

                                                                                                                            All you’re doing is copying it. How is that a commitment?

                                                                                                                            There’s a certain amount of learned helplessness that rears its head whenever the word “fork” comes up. Let’s just say “copy” to get past that. That’ll help us realize that there’s no dependency we can’t copy into our project, just to allow for future opportunities to rip out code. Start with the damn OS! OpenBSD has a userland you can keep on your system and recompile with a single command. Why can’t everyone do this?

                                                                                                                            1. 3

                                                                                                                              It’s not learned helplessness, it’s that maintaining database software is actually hard. If you copy it but don’t change it, you’re pretty much just taking the peripheral burden upon yourself. Now if you want to deploy it you’re on the hook for builds, packaging, package testing, patches for your distro and so on. All this stuff normally done by actual domain experts. Not only is it a huge waste of time, it’s something you’re really likely to screw up at least once.

                                                                                                                              I work on database engines and I don’t even host my own databases when I can afford it. Setting up replication, failover, backups, etc., that’s a ton of work, especially since you have to test all of it thoroughly and regularly. If it were for a business application, I’d happily pay for Heroku Postgres all the way up to premium-8 tier ($8500 / month). At $102,000 / year, that’s still lower than the salary I’d pay for an engineer I’d actually trust to manage an HA Postgres setup.

                                                                        2. 2

                                                                          On the other hand, with just a moment’s thought to the command line, McIlroy’s version will quickly show problems with the definition of “word” where you end up with “isn”, “wouldn” and “t” as “words,” among other problems. McIlroy can then spend time on replacing the first line with a more specialized program to break words out of a text stream. Knuth can do the same, but how much time has been spent writing the rest of the code to deal with counting and sorting words?

                                                                          1. 4

                                                                            I’ve way more often been in the position of replacing huge shell/Python/&c agglomerations with a single well defined and modular program than the opposite. Perhaps there is ultimately good reason that people build large systems in languages with module systems and interfaces, instead of in shell.

                                                                            Most languages have libraries for the kind of stuff you’re talking about — modules don’t have to be literally separate programs.

                                                                            1. 1

                                                                              Also busted: words with accents, like café, Montréal, née, Québec, and résumé. He even used the word “Fabergé” in his review, which would become “faberg” in the output!

                                                                            2. 2

                                                                              Why would you not accept his solution? He doesn’t use a ready-made frequency algorithm, but shows his knowledge of the problem and the tools at hand to implement exactly the algorithm required. Exactly what I want a candidate to do.

                                                                              1. 2
                                                                                1. 1

                                                                                  If we imagine a second student, who is Knuth, who gives the expected answer, then McIlroy is like the clever student, calling the other student’s answer unimaginative or dull — but to be especially imaginative was never the purpose to begin with.

                                                                                  1. 2

                                                                                    Indeed, you are right about that.

                                                                              1. 12

                                                                                Chen’s blog post is interesting both in what it references, McIlroy’s critique of Knuth, and in what it misses about that exchange.

                                                                                In short, in 1986 Jon Bentley asked Donald Knuth to demonstrate literate programming by implementing a word-count program which would then be critiqued by Doug McIlroy. Knuth delivered a beautiful example of literate programming in Pascal. Ten pages’ worth. McIlroy, in addition to his critique, delivered a six-segment shell script that accomplished the same thing without intermediate values… a purely functional implementation, as Chen describes it.

                                                                                McIlroy, among other comments, ends his critique with:

                                                                                Knuth has shown us here how to program intelligibly, but not wisely. I buy the discipline. I do not buy the result. He has fashioned a sort of industrial-strength Fabergé egg—intricate, wonderfully worked, refined beyond all ordinary desires, a museum piece from the start.

                                                                                That’s the background.

                                                                                Chen takes up the topic because McIlroy’s solution intrigues him: it’s purely functional, and he wonders how he’d do the same today. He writes his solution in Haskell in two variations: “standard” and literate. As a Haskell implementation, it’s effective. Chen then discusses the advantages of both and falls on the side of “standard” rather than literate. Had he left it at that, it would be an interesting bit of Haskell.

                                                                                A curious exchange in the comment section brings the discussion back to McIlroy’s critique of Knuth. Dorin B takes Chen to task for misunderstanding McIlroy’s point:

                                                                                You missed the point in McIlroy’s sollution: to use reusable components.

                                                                                Chen then replies:

                                                                                No, I think I illustrated exactly the point that McIlroy was making, and I believe that if you emailed him, he would completely agree with me today. … Note how every single line in my Haskell program is in fact a reusable component.

                                                                                Chen completely misses Dorin’s point: for McIlroy, reusable components aren’t about functions or sub-routines but about composable tools. Dorin’s right.

                                                                                In the interview with McIlroy that Chen posted, the question that segues into a discussion of his critique of Knuth’s solution begins with how pipes effectively invented the concept of the tool. McIlroy says:

                                                                                McIlroy: Yes. The philosophy that everybody started putting forth: “This is the Unix philosophy. Write programs that do one thing and do it well. Write programs to work together. Write programs that handle text streams because that is a universal interface.” All of those ideas, which add up to the tool approach, might have been there in some unformed way prior to pipes. But, they really came in afterwards.

                                                                                MSM: Was this sort of your agenda? Specifically, what does it have to do with mass produced software?

                                                                                McIlroy: Not much. It’s a completely different level than I had in mind. It would nice if I could say it was. (Laughter) It’s a realization. The answer is no. I had mind that one was going to build relatively small components, good sub-routine libraries, but more tailorable than those that we knew from the past, that could be combined into programs. What has… the tool thing has turned out to be actually successful. People just think that way now. That’s providing programs that work together. And, you can say, if you if stand back, it’s the same idea. But, it’s at a very different level, a higher level than I had in mind. Here, these programs worked together and they could work together at a distance. One of you can write a file, and tomorrow the other one can read the file. That wasn’t what I had in mind with components. I had in mind that … you know, the car would not be very much use if its wheels were in another county. They were going to be an integral part of the car. Tools take the car and split it apart, and let the wheels do their thing and then let the engine do its thing and they don’t have to do them together. But, they can do them together if you wish.

                                                                                MSM: Yeah. I take your point. If I understand it correctly, and think about it, a macro based on a pipeline is an interesting thing to have in your toolbox. But, if you were going write a program to do it, you wouldn’t just take the macro, you’d have to go and actually write a different program. It wouldn’t be put together out of components, in that sense.

                                                                                McIlroy: So, when I wrote this critique a year or two ago of Knuth’s web demonstration. Jon Bentley got Knuth to demonstrate his web programming system, which is a beautiful idea. …

                                                                                Now, in 1968, I would have thought he was doing just right. He was taking this sub-routine and that sub-routine, and putting them together in one program. Now, I don’t think that is just right. I think that the right way to do that job is as we do it in Unix, in several programs, in several stages, keeping their identity separate, except in cases where efficiency is of extreme importance. You never put the parts into more intimate contact. It’s silly. Because, once you’ve got them there, it’s hard to get them apart. You want to change from English to Norwegian, you have to go way to the heart of Knuth’s program. You really ought to be able to just change the pre-processors that recognize this is a different alphabet.

                                                                                For Chen to then argue that his Haskell implementation illustrates exactly McIlroy’s point shows Chen either didn’t read what McIlroy had to say about it, or doesn’t understand it. That’s not to say McIlroy is against functions, sub-routines or software toolboxes. But that’s not the point McIlroy was making.

                                                                                Of course Chen isn’t alone in misunderstanding Unix and what Thompson, Ritchie, Kernighan, McIlroy and many others achieved with it. In a nutshell this is what distinguishes many BSD users from Linux users.[1] BSD isn’t merely about POSIX, nor is it about avoiding Windows and other proprietary software (important as those goals may be). BSD is mostly about the Unix philosophy.[2]

                                                                                On the whole, Chen’s discussion of literate vs. “standard” programming is interesting. As a Haskell programmer, I find his solution informative. As a commentator on McIlroy or the Unix philosophy, I’ll look elsewhere.

                                                                                [1] That’s not to say many Linux users aren’t interested in the Unix philosophy or that all BSD users are Unixphiles. Setting aside criticisms about security, implementation and what have you, the issue many Linux users have with systemd is that it isn’t Unix-like.

                                                                                [2] Yes, the various BSDs differ a bit on how that looks and how rigorously to pursue it.

                                                                                Edit: Fix formatting.

                                                                                1. 2

                                                                                  I think that the right way to do that job is as we do it in Unix, in several programs, in several stages, keeping their identity separate, except in cases where efficiency is of extreme importance.

                                                                                  We often must ditch this lots-of-separate-programs approach whenever efficiency is of more than negligible importance.

                                                                                  1. 0

                                                                                    I admit I know nothing of the wider context; all I know is what you’ve posted here. But from what you’ve written it sounds like Chen is presenting the 21st century McIlroyian view which may not be the same as the original in concrete terms but is the same spirit rebased on today’s technology.

                                                                                    1. 3

                                                                                      I think Chen is trying to cast McIlroy that way, but for it to be the 21st-century version of McIlroy’s argument, reusable software components would have to come after tools. But that’s not how it went. Indeed, McIlroy says as much in his critique of Knuth:

                                                                                      Now, in 1968, I would have thought he was doing just right. He was taking this sub-routine and that sub-routine, and putting them together in one program. Now, I don’t think that is just right. I think that the right way to do that job is as we do it in Unix, in several programs, in several stages, keeping their identity separate, except in cases where efficiency is of extreme importance.

                                                                                      Chen is saying more than is warranted based on the support he provides (the linked interview). It’s one thing to write something like: “In the same spirit as McIlroy’s reusable tool approach…” It’s another to write:

                                                                                      No, I think I illustrated exactly the point that McIlroy was making, and I believe that if you emailed him, he would completely agree with me today.

                                                                                      Again, that’s not to say McIlroy would disagree with a reusable-component approach to software development.

                                                                                      The point is that McIlroy was making a very specific critique of Knuth’s program, based on the value of tools over in-program reuse and composition, and Chen completely missed it. Chen then asserts McIlroy would agree with him that writing software with reusable components is the point McIlroy was making.

                                                                                      Chen misunderstands McIlroy, and worse, imputes his misunderstanding to McIlroy.

                                                                                  1. 39

                                                                                    The argument seems to rely on the cost of static typing, which is stated in the following four points that I challenge:

                                                                                    It requires more upfront investment in thinking about the correct types.

                                                                                    I don’t buy this argument at all. If you don’t think about correct types in a dynamic language, you will run into trouble during testing (and production, when your tests aren’t perfect). You really have to get your types right in a dynamic language too. Arguably, with a statically typed language, you have to think less because the compiler will catch your error. I think that’s the whole point of statically typed languages (performance concerns aside).

                                                                                    It increases compile times and thus the change-compile-test-repeat cycle.

                                                                                    I’d have to see some proof that static typing plays a significant role in compile time. I’ll buy that you can make a very complicated (perhaps Turing-complete) type system, and that would have a big impact. But there are statically typed languages that compile really fast, and most of the compile time is probably not spent on types. I’d argue that it is likely for the compiler to catch your error with types faster than you could compile, run, and test to find the same error with no types.

                                                                                    It makes for a steeper learning curve.

                                                                                    That may or may not be true. Sure, a type system can be very complicated. It doesn’t have to be. On the other hand, a dynamic language will still have types, which you need to learn and understand. Then, instead of learning to annotate the types in code, you learn to figure out type errors at run time. Is that so much easier?

                                                                                    Either way, I don’t believe type systems are an insurmountable barrier. And I think some people give the learning curve way too much weight. Maybe they are working on throwaway software on a constantly changing faddy tech stack, in a place with high employee turnover. It’ll matter more. I suppose there’s a niche for that kind of software. But I’m more into tech that is designed for software that is developed, used, and maintained for years if not decades. A little bit of extra learning up front is no big deal and the professionals working on it will reap the benefits ever after.

                                                                                    And more often than we like to admit, the error messages a compiler will give us will decline in usefulness as the power of a type system increases.

                                                                                    That might be the case with the current crop of languages with clever type systems, though I don’t know if it’s inherent. Do static type systems need to be so powerful (read: complicated), however? A simpler system can get you just as much type safety, at the expense of some repetition in code.

                                                                                    I think there are diminishing returns, but not due to the cost of static typing as such, but rather due to the fact that types just don’t catch all errors. Once the low-hanging fruit is gone, there’ll be proportionally more and more logic errors and other problems that aren’t generally prevented with types. Or you could catch these with extensive type annotation, but the likelihood of preventing a real problem becomes small compared to the amount of annotation required. And then there’s the usual question: who checks the proof?

                                                                                    There have been some famous bugs that resulted from systems using different units. So if these numeric quantities were properly typed, these bugs would have been prevented. However, what if we change the scenario a little, and suppose we’re measuring fuel or pressure, in the right units. But we read the wrong quantity – spent fuel instead of stored fuel, or exterior pressure instead of interior pressure? Sure you can add more types to prevent such misuse, but it gets more and more verbose, and then you’re moving closer and closer to re-expressing (and thus enforcing) the program logic in the language of the type system; we could consider that to be a language of its own.

                                                                                    Now you have two programs, and one can prevent bugs in the other, but both could still be buggy. And the other program starts to grow because you start needing explicit conversions to enable the code to actually perform a computation on internal-pressure-in-pascal. Of course you are subverting the type system when you say you really want to convert internal-pressure-in-pascal to just pressure-in-pascal or whatever. Bugs ahoy?

                                                                                    1. 18

                                                                                      A simpler system can get you just as much type safety, at the expense of some repetition in code.

                                                                                      I agree with most of the rest of your comment, but this part is untrue. Stronger type systems do allow you to enforce more powerful laws at compile time. At one end of the curve we have a type system like C’s, which barely buys you anything, and then at the other end we have full dependent types where you can prove arbitrary invariants about your code (this function always terminates, this value is always even, etc.) that you cannot prove in a weaker type system. In between is a huge spectrum of safety checking power.

                                                                                      1. 8

                                                                                        The C type system can actually be quite powerful if you wrap basic types in one-element structs. struct meter { int v; } and struct foot { int v; } can’t be added by mistake, but can still be worked with using one-line inline functions with no performance penalty. It’s just work (which nobody likes).

                                                                                        1. 5

                                                                                          I would not describe that as “quite powerful” at all. That’s one of the most basic things a type system can give you.

                                                                                          You can’t really prove any interesting properties until you at least have proper polymorphism. Java doesn’t, for example, because every object can be inspected at runtime in certain ways. In a sensible type system, there are no properties of objects except those which are explicitly stated in the type of the object.

                                                                                          In such a type system, you can prove interesting properties like that a data structure does not “depend” in any way on the objects it contains. For example, if you could implement a function

                                                                                          fmap :: (a -> b) -> f a -> f b
                                                                                          

                                                                                          which “mapped over” the contents of your object with some function, this would prove that your object never inspects its contents and therefore its structure does not depend on the values of its contents (because this function is universally quantified over ‘b’, and therefore you could map every ‘a’ to a constructed type which cannot be inspected in any way).

                                                                                          You can prove all sorts of useful properties like this (often without even realizing you’re doing it) once you have proper quantification in your type system. One of the coolest quantification-based proofs I know of is that Haskell’s ST monad is extrinsically pure.

                                                                                          As you add more power to your type system (up to full dependent types, linear types, etc.) you can prove more and more useful things.

                                                                                          1. 2

                                                                                            As long as you like all your types disjoint, sure. But I’ll pass.

                                                                                            1. 2

                                                                                              So what’s wrong with disjoint types?

                                                                                              1. 2

                                                                                                It doesn’t let you have rationals and floats that are both numbers, for example.

                                                                                                1. 2

                                                                                                  In Ocaml ints and floats are different types and operators like (+) only apply to ints, one has to use (+.) for floats. It’s not a problem IME.

                                                                                                  1. 1

                                                                                                    I think automatic type conversion of ints to reals was the original sin of FORTRAN.

                                                                                                  2. 1

                                                                                                    In mathematics the system Z of integers and the system R of reals are different. The number 3 has different properties depending on system context - for example 3x = 1 has a solution in the second context.

                                                                                              2. 0

                                                                                                But it lacks a keyword connection to category theory.

                                                                                            2. 12

                                                                                              I don’t buy this argument at all. If you don’t think about correct types in a dynamic languages, you will run into trouble during testing (and production, when your tests aren’t perfect). You really have to get your types right in a dynamic language too. Arguably, with a statically typed language, you have to think less because compiler will catch your error. I think that’s the whole point of statically typed languages (performance concerns aside).

                                                                                              That’s a good point and one that took me a long time to learn: if a concept cannot be expressed in a language, it doesn’t magically disappear and absolve the programmer from thinking about it. Types are one example, as you mention; similarly, in many discussions about Rust, some people mention that the borrow checker is an impediment to writing code. It’s true that some programs are rejected by the compiler, but lifetime and ownership are concerns in C programs as well. The main differences are that in Rust you have rules and language constructs to talk about those concerns while in C it’s left to documentation and convention.

                                                                                              1. 7

                                                                                                But there are statically typed languages that compile really fast, and most of the compile time is probably not spent on types.

                                                                                                OCaml is a good example of this.

                                                                                                1. 3

                                                                                                  I don’t buy this argument at all. If you don’t think about correct types in a dynamic languages, you will run into trouble during testing (and production, when your tests aren’t perfect). You really have to get your types right in a dynamic language too. Arguably, with a statically typed language, you have to think less because compiler will catch your error.

                                                                                                  It’s true that you have to get the types right in a dynamic language, but the appeal of dynamic languages isn’t that you don’t have to think about types. It’s that you don’t have to think about the shape of types. For example:

                                                                                                  def make_horror_array(depth: int):
                                                                                                      arr = []
                                                                                                      deepest_arr = arr
                                                                                                      
                                                                                                      for i in range(depth):
                                                                                                          deepest_arr.append([])
                                                                                                          deepest_arr = deepest_arr[0]
                                                                                                      deepest_arr.append(depth)
                                                                                                      return arr
                                                                                                  

                                                                                                  What type should that return? Contrived, but it’s not the gnarliest type problem I’ve run into. Sometimes it’s nice to have a language where I can give up on getting the types right and rely on tests and contracts to check it.
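For what it’s worth, the “rely on tests” route can be sketched in Python: a hypothetical `from_nested` helper (my name, not from the thread) strips the layers off at runtime, and assertions pin down the behavior that the type annotation struggles to name:

```python
def make_horror_array(depth: int):
    arr = []
    deepest_arr = arr
    for _ in range(depth):
        deepest_arr.append([])
        deepest_arr = deepest_arr[0]
    deepest_arr.append(depth)
    return arr

def from_nested(arr):
    """Strip list layers until the non-list payload is reached."""
    x = arr[0]
    while isinstance(x, list):
        x = x[0]
    return x

# The tests stand in for the return type we couldn't easily write down.
assert from_nested(make_horror_array(0)) == 0
assert from_nested(make_horror_array(5)) == 5
```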

                                                                                                  1. 4

                                                                                                    It’s that you don’t have to think about the shape of types.

                                                                                                    How do you use a value without thinking about its type or shape?

                                                                                                    In your example, you can’t just blindly apply a numerical operation to the first element of that horror array, since it might be another array. So, if you wanted to get to the value inside of those nested arrays, you’d need to think about how you would “strip” them off, wouldn’t you? And wouldn’t that mean the layers of nesting have some special meaning for us?

                                                                                                     

                                                                                                    Taking your implementation as reference:

                                                                                                    >>> make_horror_array(0)
                                                                                                    [0]
                                                                                                    >>> make_horror_array(1)
                                                                                                    [[1]]
                                                                                                    >>> make_horror_array(5)
                                                                                                    [[[[[[5]]]]]]
                                                                                                    >>> make_horror_array(10)
                                                                                                    [[[[[[[[[[[10]]]]]]]]]]]
                                                                                                    

                                                                                                    we can write a Haskell version that distinguishes between a value nested in a “layer” and a value by itself:

                                                                                                    λ> :{
                                                                                                    λ> data Nested a = Value a | Layer (Nested a)
                                                                                                    λ>
                                                                                                    λ> -- just for presentation purposes
                                                                                                    λ> instance Show a => Show (Nested a) where
                                                                                                    λ>   show (Value a) = "[" ++ show a ++ "]"
                                                                                                    λ>   show (Layer a) = "[" ++ show a ++ "]"
                                                                                                    λ> :}
                                                                                                    λ>
                                                                                                    λ> mkHorror n = foldr (.) id (replicate n Layer) $ Value n
                                                                                                    λ> :type mkHorror
                                                                                                    mkHorror :: Int -> Nested Int
                                                                                                    λ>
                                                                                                    λ> mkHorror 0
                                                                                                    [0]
                                                                                                    λ> mkHorror 1
                                                                                                    [[1]]
                                                                                                    λ> mkHorror 5
                                                                                                    [[[[[[5]]]]]]
                                                                                                    λ> mkHorror 10
                                                                                                    [[[[[[[[[[[10]]]]]]]]]]]
                                                                                                    

                                                                                                    and if we don’t need layers anymore, we can get value out pretty easily:

                                                                                                    λ> :{
                                                                                                    λ> fromNested :: Nested a -> a
                                                                                                    λ> fromNested (Value a) = a
                                                                                                    λ> fromNested (Layer a) = fromNested a
                                                                                                    λ> :}
                                                                                                    λ>
                                                                                                    λ> fromNested (mkHorror 0)
                                                                                                    0
                                                                                                    λ> fromNested (mkHorror 5)
                                                                                                    5
                                                                                                    
                                                                                                    1. 4

                                                                                                      Assuming it’s correct, it should return whatever the type inference engine chooses for you :)

                                                                                                      1. 1

                                                                                                        This is because type theory is confused about what programs do, which is to operate on bit sequences (or, these days, byte sequences). These sequences may be representations of mathematical objects or of things that are not mathematical objects, but they remain representations, not actual ideal mathematical objects.

                                                                                                      2. 1

                                                                                                        Now you have two programs, and one can prevent bugs in the other, but both could still be buggy.

                                                                                                        Aren’t a lot of type systems proven type-correct nowadays?

                                                                                                        1. 3

                                                                                                          The type system can be fine but the rules you define with the types could be flawed. Thus, you can still write flawed programs that the type system can’t prevent because the types were defined incorrectly.
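A hypothetical illustration (mine, not the commenter’s): if two distinct identifiers are both modeled as plain ints, swapping the arguments still type-checks even though the program is wrong; distinct `NewType`s would let a static checker flag the swap:

```python
from typing import NewType

# Flawed type definitions: both identifiers are bare ints,
# so swapped arguments still satisfy the type system.
def charge(user_id: int, product_id: int) -> str:
    return f"user={user_id} product={product_id}"

assert charge(7, 42) == "user=7 product=42"  # ids swapped, yet "well-typed"

# Tighter definitions: a checker like mypy would reject
# charge_checked(ProductId(7), UserId(42)).
UserId = NewType("UserId", int)
ProductId = NewType("ProductId", int)

def charge_checked(user_id: UserId, product_id: ProductId) -> str:
    return f"user={user_id} product={product_id}"

assert charge_checked(UserId(42), ProductId(7)) == "user=42 product=7"
```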

                                                                                                          1. 1

                                                                                                            Can you give an example? I’m not sure exactly what breed of incorrect types you’re referring to.

                                                                                                        2. 1

                                                                                                          It requires more upfront investment in thinking about the correct types.

                                                                                                          I don’t buy this argument at all. If you don’t think about correct types in a dynamic language, you will run into trouble during testing (and in production, when your tests aren’t perfect).

                                                                                                          One thing I am learning from working with inexperienced developers is that even thinking about which container type you are using is a challenge. Should your function return a Seq? An Array? An Iterator? A Generator? And what if your new library returns a Generator and the old one returned an Iterator and now you have to rewrite all your declarations for seemingly no reason at all? Some kind of “most general type” restriction/requirement/tool would help with this…
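One concrete reading of a “most general type” rule, sketched with Python’s standard `collections.abc` hierarchy: declare the weakest abstract type you actually need, and lists, generators, and ranges all fit without any re-declaration when a library swaps its concrete return type:

```python
from collections.abc import Iterable

def total(xs: Iterable[int]) -> int:
    # Iterable is the most general of Seq/Array/Iterator/Generator here:
    # anything you can loop over satisfies it.
    return sum(xs)

assert total([1, 2, 3]) == 6                 # list
assert total(x * x for x in range(4)) == 14  # generator
assert total(range(4)) == 6                  # range
```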

                                                                                                          1. 2

                                                                                                            This is one of the things I think Go does really well (in spite of having a generally quite weak type system) - thanks to implicit interfaces, you can just return the concrete type you’re using and the caller will automatically pick up that it ‘fits’.
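For readers outside Go, the same structural idea can be approximated in Python with `typing.Protocol` (my analogy, not the commenter’s): the concrete type never names the interface, yet a type checker accepts it wherever the protocol is required:

```python
from typing import Protocol

class Closer(Protocol):
    def close(self) -> None: ...

class TempResource:
    # Never declares that it implements Closer; having close() is enough,
    # much like a Go type implicitly satisfying an interface.
    def __init__(self) -> None:
        self.closed = False
    def close(self) -> None:
        self.closed = True

def shutdown(c: Closer) -> None:
    c.close()

r = TempResource()
shutdown(r)  # accepted structurally; no explicit "implements" declaration
assert r.closed
```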

                                                                                                            1. 1

                                                                                                              This sort of works – but even with that system, it’s easy to declare one’s types too tightly.

                                                                                                              It depends in part on how granular the collection library’s interfaces are (ditto for numeric tower, effects tracking, monad wizard tool).

                                                                                                            2. 1

                                                                                                              I’m unclear on what you mean. Many languages offer two solutions: you can declare the variable as IEnumerable (or whatever is appropriate), or you can declare the variable as “whatever type the initializer has”.

                                                                                                              1. 3

                                                                                                                When in doubt, use the inference!

                                                                                                                1. 1

                                                                                                                  It is sometimes easy to choose wrong: Iterable vs. Iterator vs. Enumerable.
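The choice matters because the contracts differ. For instance (Python sketch), an Iterator is single-use, while an Iterable backed by a list can be traversed repeatedly:

```python
from collections.abc import Iterable, Iterator

nums: Iterable[int] = [1, 2, 3]
assert sum(nums) == 6
assert sum(nums) == 6  # an Iterable backed by a list can be re-traversed

it: Iterator[int] = iter([1, 2, 3])
assert sum(it) == 6
assert sum(it) == 0    # the Iterator is now exhausted
```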

                                                                                                            1. 5

                                                                                                              Extending pglite with a tmp mode and the ability to run as a command wrapper – pglite starts the database, runs the wrapped command and then shuts everything down (exiting with the exit status of the wrapped command).

                                                                                                              1. 3

                                                                                                                Oh my god where have you been all my life. <3

                                                                                                                Holy shit that’s a handy script. Please continue! :)

                                                                                                              1. 4

                                                                                                                I think this article describes audit tables, and it argues that an audit table is designed specifically for the table whose changes it tracks, so audit tables as a solution are not “automatic”.

                                                                                                                I’ve been working on a flexible audit table solution for Postgres, based on the official docs, some blog posts and advice from the Postgres IRC channel. It works like this:

                                                                                                                First you create the audit table to store the changes. Yes, “the” table: there’s only one audit table, however many other tables are tracked. The assumption is that this table is only for storage; whenever you need to manipulate the data, you copy a subset of it into a temporary table for further work.

                                                                                                                CREATE TABLE IF NOT EXISTS public.audit
                                                                                                                (
                                                                                                                  change_date timestamp with time zone NOT NULL DEFAULT now(),
                                                                                                                
                                                                                                                  -- session_user may be the (or an) application's DB role or perhaps a developer's role
                                                                                                                  session_user_name text NOT NULL,
                                                                                                                
                                                                                                                  -- current_user may be set to something else than the user that started the session
                                                                                                                  current_user_name text NOT NULL,
                                                                                                                
                                                                                                                  -- this will be provided by the application (SET "application"."user" = 'bob';)
                                                                                                                  -- useful if its users are stored in a table and not DB roles
                                                                                                                  application_user_name text,
                                                                                                                
                                                                                                                  -- indicating command that modified data (insert / delete / update)
                                                                                                                  action character(1) NOT NULL,
                                                                                                                
                                                                                                                  -- the table where the data was modified
                                                                                                                  table_name text NOT NULL,
                                                                                                                
                                                                                                                  -- the table id. May be useful if indexed and you query the audit table for a specific relid
                                                                                                                  relid oid NOT NULL,
                                                                                                                
                                                                                                                  -- values identifying the changed row
                                                                                                                  pkey jsonb,
                                                                                                                
                                                                                                                  -- the object before and after the change. JSONB for schemaless data (this can store rows from multiple different tables)
                                                                                                                  before_change jsonb,
                                                                                                                  after_change jsonb NOT NULL
                                                                                                                );
                                                                                                                

                                                                                                                Everything in that table except the application user name will be provided by Postgres; there’s no manual work needed to make it work.

                                                                                                                If it’s not guaranteed that the application will always provide the application_user_name, it’s convenient to set an empty default:

                                                                                                                SET "application"."user" = '';
                                                                                                                ALTER SYSTEM SET "application"."user" = '';
                                                                                                                SELECT pg_reload_conf();
                                                                                                                

                                                                                                                This has to be done only once for a PG cluster.

                                                                                                                Then you’ll have to define a function whose purpose is to record changes. It’s designed to be executed by a trigger; you’ll define such triggers for each table you want to audit.

                                                                                                                CREATE OR REPLACE FUNCTION public.audit() RETURNS trigger AS
                                                                                                                $BODY$
                                                                                                                DECLARE
                                                                                                                  before JSONB; after JSONB;
                                                                                                                  pkey JSONB;
                                                                                                                  source record;
                                                                                                                BEGIN
                                                                                                                  IF TG_OP = 'UPDATE' THEN
                                                                                                                    IF NEW IS NOT DISTINCT FROM OLD THEN RETURN NEW; END IF;
                                                                                                                    SELECT json_object_agg(key, value)::jsonb
                                                                                                                    INTO after
                                                                                                                    FROM (
                                                                                                                      -- EXCEPT here eliminates fields that didn't change.
                                                                                                                      SELECT * FROM json_each_text(row_to_json(NEW.*))
                                                                                                                      EXCEPT
                                                                                                                      SELECT * FROM json_each_text(row_to_json(OLD.*))
                                                                                                                    ) y;
                                                                                                                    SELECT json_object_agg(key, value)::jsonb
                                                                                                                    INTO before
                                                                                                                    FROM (
                                                                                                                      SELECT * FROM json_each_text(row_to_json(OLD.*))
                                                                                                                      EXCEPT
                                                                                                                      SELECT * FROM json_each_text(row_to_json(NEW.*))
                                                                                                                    ) y;
                                                                                                                    source := NEW;
                                                                                                                  ELSIF TG_OP = 'DELETE' THEN
                                                                                                                    SELECT json_object_agg(key, value)::jsonb INTO after FROM json_each_text(row_to_json(OLD.*));
                                                                                                                    source := OLD;
                                                                                                                  ELSIF TG_OP = 'INSERT' THEN
                                                                                                                    SELECT json_object_agg(key, value)::jsonb INTO after FROM json_each_text(row_to_json(NEW.*));
                                                                                                                    source := NEW;
                                                                                                                  END IF;
                                                                                                                
                                                                                                                  SELECT json_object_agg(key, value)::jsonb
                                                                                                                  INTO pkey
                                                                                                                  FROM (
                                                                                                                    SELECT *
                                                                                                                    FROM json_each(row_to_json(source.*)) AS j
                                                                                                                    WHERE EXISTS (
                                                                                                                      SELECT a.attname
                                                                                                                      FROM pg_index i
                                                                                                                      JOIN pg_attribute a ON a.attrelid = i.indrelid AND a.attnum = ANY(i.indkey)
                                                                                                                      WHERE
                                                                                                                        i.indrelid = CONCAT_WS('.', TG_TABLE_SCHEMA, TG_TABLE_NAME)::regclass
                                                                                                                        AND i.indisprimary
                                                                                                                        AND a.attname = j.key
                                                                                                                    )
                                                                                                                  ) y;
                                                                                                                
                                                                                                                  INSERT INTO audit(
                                                                                                                    session_user_name,
                                                                                                                    current_user_name,
                                                                                                                    application_user_name,
                                                                                                                    action,
                                                                                                                    table_name,
                                                                                                                    relid,
                                                                                                                    pkey,
                                                                                                                    before_change,
                                                                                                                    after_change
                                                                                                                  )
                                                                                                                  VALUES (
                                                                                                                    session_user,
                                                                                                                    current_user,
                                                                                                                    current_setting('application.user'),
                                                                                                                    SUBSTRING(TG_OP, 1, 1),
                                                                                                                    CONCAT_WS('.', TG_TABLE_SCHEMA, TG_TABLE_NAME),
                                                                                                                    TG_RELID,
                                                                                                                    pkey,
                                                                                                                    before,
                                                                                                                    after
                                                                                                                  );
                                                                                                                  RETURN NEW;
                                                                                                                END;
                                                                                                                $BODY$ LANGUAGE plpgsql VOLATILE COST 100;
                                                                                                                

                                                                                                                Having done the above, you can now start auditing chosen tables, for example:

                                                                                                                CREATE TRIGGER audit
                                                                                                                  AFTER INSERT OR UPDATE OR DELETE -- you can choose any combination here. Note that UPDATE lets you choose columns to watch for changes
                                                                                                                  ON shop.books
                                                                                                                  FOR EACH ROW
                                                                                                                  EXECUTE PROCEDURE public.audit();
                                                                                                                
                                                                                                                1. 1

                                                                                                                  What’s the reasoning behind COST 100?

                                                                                                                  1. 1

                                                                                                                    It’s just a default. I haven’t put much thought into it.