Threads for squadette

  1. 1

    C++ can do it (conditional syntax errors) with templates, constexpr and of course macros (but is_friday() would not be implementable as a constant expression, since it requires IO, unless you count compiler arguments).

    SFINAE would be the template version. Here is a constexpr version:

    const char* weekday() {
        if constexpr(is_friday()) {
            return "friday";
        }
        // Control reaches end of non-void function.
    }
    
    1. 1

      Sorry, I don’t understand that reply. So does this code work as expected? Or can it be made to work as expected? How does is_friday look?

      1. 1

        (Edited it for clarity.) I just mean conditional syntax error: is_friday() is just any constant expression in my example. A correct implementation of is_friday() would require passing information through compiler arguments, which isn’t the impressive part.

        constexpr bool is_friday() { return IS_FRIDAY; }

    1. 1

      This seems very specific to a particular (unknown?) system. I had to dig around to find out what ‘attribute’ means in this context and it didn’t really make things much clearer.

      1. 2

        It’s designed to be the opposite of that — completely generic all-encompassing guide on migrating any kind of data in relational and semi-relational environment.

        Attribute here is any piece of data about a certain thing. (Basically: name of the user, text of the post, price of the item, etc.) In relational database this is often just a field in the table, but there are several other commonly accepted physical realizations, like key-value storage, schemaless etc.

        1. 2

          Update: terminology is explained in the very first issue (which may be hard to find): https://minimalmodeling.substack.com/p/introduction-to-schema-migrations

        1. 2

          That heterogeneous array looks really dangerous.

          1. 1

            yeah, but that ship has already sailed.

          1. -3

            The article forgets to describe prior art, except Mathematica, Wolfram Alpha, and Google. Further research will yield more tools like conceptnet.io (which scrapes wiktionaries), dbpedia (which scrapes wikipedia infoboxes), a further wikidata side project to make wikidata easier to query (https://query.wikidata.org/querybuilder/), Marie Destandau’s work on an ergonomic SPARQL federated query builder, and the mediawiki $10M project dubbed abstract wikipedia. There is also the work on wikidata Q/A, aka https://hal.archives-ouvertes.fr/hal-01730479/document.

            The article did not mention the fuzzy situation regarding the licensing terms wiktionary and wikipedia vs. wikidata and more broadly the fuzzy legal framework.

            More fundamental problem that the OP @zverok seems to have no clues about: extracting structured data from unstructured or semi-structured data like html or plain text in the general case is a very hard problem and possibly AI-Complete.

            So, yes I agree given wikipedia and wiktionary, and the fact they are mainstream well-established, eventually they became user-friendly, it would have been better for wikidata to bet on extracting structured RDF triples from those, and invest into making automated and semi-automated approach to extract structured data from unstructured data, merely for legal reason, mediawiki C level want wikidata to be CC0.

            Also, I want to stress that ML mainstream frenzy shadowed one of Google tool of choice: freebase, and its related tools and practices.

            We need to parse our civilization’s knowledge and make it programmatically available to everybody. And then we’ll see what’s next.

            Yes!

            On a related note: I stopped giving money to wikimedia.

            1. 7

              The article forgets to describe prior art

              That’s not a scientific article, rather a blog post about some problems I am thinking about and working upon. I am aware (at least of some) of the “prior art” (well, since I am playing with the problems at hand for 6 years now, my “links.txt” is a few books worth). I honestly don’t feel obliged to mention everything that is done in the field unless my work is inspired by/related to others’ work. Yes, a lot of people do a lot of stuff, some of it is dead-ends, some of it is fruitful, and I have the highest respect for them, but being just a developer I am, I am just doing what seems interesting and writing about it, no more, no less.

              More fundamental problem that the OP @zverok seems to have no clues about: extracting structured data from unstructured or semi-structured data like html or plain text in the general case is a very hard problem and possibly AI-Complete.

              What makes you think that person stating that spent many years on a problem “has no clues about” one of the most obvious aspects of it? The problem is obviously unsolvable “generically” (e.g. “fetch any random page from Internetz and say what information it contains”), and that’s the exact reason I am investigating approaches to solve it for some practical purposes—in a somewhat generic way.

              1. 5

                The problem is obviously unsolvable “generically” (e.g. “fetch any random page from Internetz and say what information it contains”)

                People saying a problem is AI-complete is a big red flag for me in terms of taking the rest of what they say seriously. I am reminded of the famous Arthur C. Clarke quote:

                If an elderly but distinguished scientist says that something is possible, he is almost certainly right; but if he says that it is impossible, he is very probably wrong.

                The issue is not that there aren’t AI-complete problems. It is not even that this problem isn’t AI-complete. The point is that AI-completeness is just not a relevant criterion for evaluating a project that aims to be used by humans. Language translation is also AI-complete but I have heard some people use google translate…

                An AI-Complete problem is by definition achievable by human intelligence (if not, it is simply called an impossible problem) and therefore tools that help humans think, remember or formulate things can help with all of them.

                Also any time you take an AI-complete problem and reduce it to a finite and programmatically defined domain it ceases to become AI-complete. I remember when I used to try to convince people that real language AI should be part of computer games (I have given up now). This was people’s standard reply. But it is simply not true. AI understanding real human language in every possible case in the real world is AI complete. AI understanding and generating basic descriptive sentences in a single non-evolving language, in a world that is small and has well defined boundaries (both in actual size and also in complexity), and where every aspect of the world is programmatically defined and accessible to the AI is not even really AI-hard, it is merely a quite large and complicated software problem that will take careful planning and a lot of work.

                There is a long history of people saying “AI will never be able to do X” and being completely wrong. The people working on AI don’t really listen though.

              2. 4

                More fundamental problem that the OP @zverok seems to have no clues about

                That’s quite harsh. Was this really necessary in this context?

                1. 1

                  I did not mean to be harsh, cc @zverok, sorry! I agree with the other threads: whether it is AI-Complete or not, I like the project. It is an interesting project, possibly difficult and possibly with low-hanging fruit. The tone of my comment was not the one I meant it to have. I was trying to draw a picture of existing, and sometimes far-fetched, related projects, so that someone (OP or anyone else) can jump in more easily, in a subject that interests me a lot and that, imo, deserves more attention!

                  Re @Dunkhan, the whole comment is interesting, I quote one part:

                  AI understanding and generating basic descriptive sentences in a single non-evolving language, in a world that is small and has well defined boundaries (both in actual size and also in complexity), and where every aspect of the world is programmatically defined and accessible to the AI is not even really AI-hard, it is merely a quite large and complicated software problem that will take careful planning and a lot of work.

                  +1, hence the importance to review prior art, whether one is a science officer or a hobbyist, and put on continuum what-is-done vs. what-is-impossible, and try to find a new solution for “it for some practical purposes—in a somewhat generic way”.

                  I am eager to read a follow up article on the subject.

                  Again sorry for the tone of my comment. I repeat that, for sure, I am clueless about what OP is clueless about. I only meant to do common good.

              1. 9

                Seems that we have a new cringe content generation trope, ugh.

                1. 1

                  and on medium.com, nonetheless.

                1. 17

                  Personally, I’m excited about the “melting face” emoji, as well as a few of the new gender and skin color variants that will piss off chuds. Also, the Emoji block of Unicode now has not one, but two amulets against the evil eye, which I expect will be extremely valuable for social media.

                  1. 6

                    I’m torn between “melting face” and “dotted line face”, I think they’ll replace my usage of 🙃 going forwards.

                    1. 5

                      The emoji thing is so totally irresponsible. Humanity is never going to replace Unicode. We’re stuck with it until we either go extinct or go Luddite. Adding emoji based on whims is how you end up with things like this sticking around for four thousand years and counting. The Egyptians at least had the excuse that they didn’t know what computers were.

                      1. 11

                        I actually have that one saved in my favorites in UnicodePad for Android. Of course, the modern spelling would be 🍆💦.

                        1. 10

                          Adding emoji based on whims is how you end up with things like this sticking around for four thousand years and counting. The Egyptians at least had the excuse that they didn’t know what computers were.

                          Not sure what the problem is? Ancient Egyptians living thousands of years ago didn’t share your particular cultural taboos and sensitivities, which seems like an entirely valid “excuse” to me.

                          1. 2

                            Right, there’s nothing that the Egyptians were doing “wrong”, because when they decided to use a penis as a letter, they had no way of knowing that for the remainder of human civilization we will have to use the penis as a letter, whether it’s culturally taboo or cool or we replace men with artificial sex bots or whatever. We however do know that Unicode is forever, and so the bar to adding a new character should be really fucking high. Like, here is an alphabet that was already in use by a non-trivial amount of people for some length of time. Not, it would be cool to make a new kind of smiley face.

                            A better system would be to do what is already done with flags. For flags, the flag 🇺🇸 is just

                            U+1F1FA 🇺       REGIONAL INDICATOR SYMBOL LETTER U
                            U+1F1F8 🇸       REGIONAL INDICATOR SYMBOL LETTER S
                            

                            We could do the same thing for other ephemera, and not have to burden Unicode with an open ended and endless list of foods that were popular with the Unicode committee in the 21st century.
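
For anyone who wants to poke at the flag mechanism described above, here is a minimal Python sketch. The function names are made up for illustration; the only fact it relies on is that the regional indicator letters run from U+1F1E6 (A) to U+1F1FF (Z).

```python
# Regional indicator symbols occupy U+1F1E6 (A) through U+1F1FF (Z).
REGIONAL_INDICATOR_A = 0x1F1E6


def flag_from_code(code: str) -> str:
    """Map a two-letter ASCII code such as "US" to its regional indicator pair."""
    code = code.upper()
    if len(code) != 2 or not all("A" <= ch <= "Z" for ch in code):
        raise ValueError("expected two ASCII letters")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(ch) - ord("A")) for ch in code)


def code_from_flag(flag: str) -> str:
    """Recover the plain letters, which stay readable even if no font draws the flag."""
    return "".join(chr(ord(ch) - REGIONAL_INDICATOR_A + ord("A")) for ch in flag)
```

Whether a given pair renders as a flag is up to the font and platform; the code points themselves only spell out two letters.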

                            1. 10

                              We don’t “have to use the penis as a letter” because it exists in Unicode. It’s just that it is technically representable. I’ll admit there’s nuance here - there are probably some things I’d rather see us avoid in Unicode, i.e. violence. But I’m struggling to see the harm caused in this particular case.

                              1. 5

                                Who’s to say that the United States will still be around in 4,000, 1,000, or 200 years? Or that the “US” code won’t be recycled for some other country? Hell, why should our current ISO system of labelling countries even persist? Once you start talking about these kind of timeframes anything is up for grabs really.

                                “Forever” is a heck of a long time. I don’t think we’re stuck with Unicode for all eternity; there are all sorts of ways/scenarios in which we could come up with something new. I think we should just address the issues of the day; there’s no way to know what the future will be like anyway; all we can do is focus on the foreseeable future.

                                1. 3

                                  I just imagined some kind of a Unicode successor system that would have a “compatibility” block with 200k+ slots and groaned.

                                  1. 1

                                    That’s the whole point. US won’t mean 🇺🇸 forever. It will naturally change over time and when it does, the old codes will still be decipherable (flag for something called “US”) without needing to be supported anymore.

                                    Tbh, the most likely scenario is a RoC, PRC thing where two countries claim to be the US, and then the international community will have to pick sides. Anyway, still better than having the flag as a real emoji!

                                    1. 2

                                      I don’t really follow how one scheme is more advantageous than the other; at the end of the day you’re still going to have to map some “magic number” to some specific meaning. I suppose you could spell out “happy” or “fireman” in special codepoints, but that just seems the same as mapping specific codepoints to those meanings, but with extra steps (although “fireman” already consists of two codepoints: “man” + “fire engine”, or “person” and “woman” for other gender variants).

                                      The reason it’s done with flags probably has more to do with it just being easier.

                                      1. 1

                                        It’s not just that it’s easier, it’s that obsolescence is a built-in concept. New countries come and old countries go, and ISO adds and removes country codes. Using Slack and GitHub style :name: emojis means that you can add and drop support for specific emoji without needing to just serve up a �. It is also more forward compatible. When your friend on a new phone texts you :dotted smiley: you won’t just see �, you’ll see words that describe what is missing. Plus you aren’t using up a finite resource.
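
A rough Python sketch of that fallback behaviour, for illustration only. The `KNOWN` table and the `render` function are hypothetical and do not reflect how Slack or GitHub actually implement shortcodes.

```python
import re

# Hypothetical table of the names this particular client knows how to draw.
KNOWN = {"smile": "\U0001F642", "peach": "\U0001F351"}

# A shortcode is one or more lowercase words between colons, e.g. :dotted smiley:
SHORTCODE = re.compile(r":([a-z0-9_]+(?: [a-z0-9_]+)*):")


def render(text: str) -> str:
    """Replace known :name: codes; leave unknown ones as readable text."""
    return SHORTCODE.sub(lambda m: KNOWN.get(m.group(1), m.group(0)), text)
```

An old client that has never heard of `:dotted smiley:` simply passes the words through, so the reader still sees what was meant.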

                                        1. 3

                                          Plus you aren’t using up a finite resource.

                                          TIL integers are a finite resource.

                                          1. 2

                                            To be fair, I’ll be the first to grab popcorn when they announce that everyone and their toaster now has to adopt utf8 with 1-5 bytes. Will probably be as smooth and fast as our ipv4 to ipv6 migration.

                                          2. 1

                                            When your friend on a new phone texts you :dotted smiley: you won’t just see �

                                            Right, that would be useful.

                                            Changing the meaning of specific codepoints or sequences of codepoints over time just seems like a recipe for confusion. “Oh, this 300 year old document renders as such-and-such, but actually, back then it meant something different from today” is not really something that I think will help anyone.

                                            This already exists to some degree; e.g. “Ye olde tavern” where “Y” is supposed to represent a capital Thorn, which is pronounced as “th”, not as Y (written as þ today, but written quite similar to Y in old-fashioned cursive writing, and early German printing presses didn’t have a Thorn on account of that letter not existing in German so people used Y as a substitute). In this case it’s a small issue of pronunciation, but if things really shift meaning, they could become a lot more prone to misunderstanding.

                                            1. 1

                                              Emoji have already shifted in ways that change their meaning. The gun emoji has become a ray gun due to political correctness and sometimes points left and sometimes right. Shoot me → zap you. The grimace 😬 was a different emotion on Android and iOS for a while. There are other documented examples of this kind of semantic shift in just a short amount of time. I think it’s a bit hopeless to try to pin them down while you keep adding stuff. The use of eggplant and peach for penis and butt is based on their specific portrayal and subject to the visual similarity being lost if they redraw them in different ways. What if President Xi demands a less sexy 🍑? Will it stick around or be a bit of passing vulgar slang from the early twenty-first century? Impossible to predict.

                                2. 4

                                  Why can’t we have a little fun? What is the problem you are seeing with this?

                                  1. 3

                                    Your body shame is a culturally specific artifact and hardly a universal experience.

                                    1. 1

                                      You’re missing the point. I’m not ashamed of weiners. They are hilarious. The point is that a character can be taboo or not and we’re still stuck with it.

                                      1. 2

                                        If it’s not something to be ashamed of, is it really taboo enough to exclude from the Unicode standard? And furthermore, why is being stuck with it an issue? It can even be valuable from an anthropological standpoint.

                                    2. 1

                                      Just because a standard exists doesn’t mean we have to use all of it all the time.

                                      Or should the ASCII maintainers be embarrassed that their standard contains Vertical Tab?

                                      1. 2

                                        My dude, Unicode inherited vertical tab from ASCII. That’s my point. Things are only going to continue to accumulate for now until the collapse of civilization. It will never shrink.

                                  1. 13

                                    @dpc_pw, have you read “Object-Oriented Software Construction, 2nd ed.” by Bertrand Meyer? It’s insanely thick, but I feel that this is what you’d want to read. I read almost all of it twenty years ago, and it influenced me a lot.

                                    1. 5

                                      Object-Oriented Software Construction, 2nd ed

                                      Hey, half the price of 99 Bottles. Ordered. Thanks.

                                      1. 3

                                        Meyer is definitely a guru of OOP. If he can’t convincingly explain what “OOP done right” ought to look like, no-one can.

                                        1. 2

                                          A potentially silly third option might be to have a table dedicated to relationships (id, timestamps, notes, status, etc.) and then a join table to link users to relationships in (user, relationship) tuples. I think that’d solve the “which id comes first” problem, and is probably the more normalized version still.

                                          1. 1

                                            Could you draft a concrete minimal structure of the tables? I suspect that it’s going to be one of the discussed structures in disguise.

                                            1. 5

                                              Sure!

                                              CREATE TABLE friendship (
                                                id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
                                                metadata JSON NOT NULL DEFAULT '{}'::JSON,
                                                created_at TIMESTAMPTZ DEFAULT now()
                                              );
                                              
                                              CREATE TABLE friends_friendship_join (
                                                friend_id UUID NOT NULL,
                                                friendship_id UUID NOT NULL
                                              );
                                              
                                              -- assume a 'friend' table that has some id column.
                                              

                                              That make more sense?

                                              And then you just have however many entries in friends_friendship_join are needed to specify the friendship (typically 2, but you could go higher or lower depending on business needs).

                                              1. 1

                                                So, it looks like a two-row version basically, doesn’t it?

                                                1. 3

                                                  I disagree, since the two-row version is really just representing both directions of an edge on a graph, where the verts are friends and then edges are friendships.

                                                  This version is a bit dumber and just says “hey, these friends all share a friendship”.

                                                  1. 2

                                                    The OP explicitly described a situation where friendships existed only between pairs of people. Your version throws away this invariant, which could be useful in other situations, but would allow invalid data in the context described in the article.

                                                    1. 1

                                                      I don’t enforce that invariant, but you could via triggers or constraints.

                                                      My point was to offer a third option in the solution space, not a good one.
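
To make the "via triggers" remark concrete, here is a small sketch using Python's sqlite3 module. SQLite stands in for PostgreSQL, integer ids replace the UUIDs, and the trigger name and helper functions are assumptions for illustration. It only caps membership at two rows per friendship; it does not force every friendship to have exactly two.

```python
import sqlite3

SCHEMA = """
CREATE TABLE friendship (id INTEGER PRIMARY KEY);
CREATE TABLE friends_friendship_join (
    friend_id     INTEGER NOT NULL,
    friendship_id INTEGER NOT NULL REFERENCES friendship(id),
    PRIMARY KEY (friend_id, friendship_id)
);
-- Reject a third member for any friendship.
CREATE TRIGGER friendship_is_a_pair
BEFORE INSERT ON friends_friendship_join
WHEN (SELECT COUNT(*) FROM friends_friendship_join
      WHERE friendship_id = NEW.friendship_id) >= 2
BEGIN
    SELECT RAISE(ABORT, 'a friendship links at most two friends');
END;
"""


def try_add_member(db, friend_id, friendship_id):
    """Return True if the row was accepted, False if a trigger or key refused it."""
    try:
        db.execute(
            "INSERT INTO friends_friendship_join VALUES (?, ?)",
            (friend_id, friendship_id),
        )
        return True
    except sqlite3.DatabaseError:
        return False


def demo():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO friendship (id) VALUES (35)")
    # Friends 3 and 5 join friendship 35; friend 8 is the would-be third member.
    return [try_add_member(db, friend, 35) for friend in (3, 5, 8)]
```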

                                                      1. 1

                                                        No, absolutely. I don’t judge! The idea is to investigate the design space.

                                                        Now that I understand your idea better, I think that it’s actually an equivalent of a one-row solution :)

                                                    2. 2

                                                      So, we’ll have for the friendship table:

                                                      | id | metadata | created_at |
                                                      | 35 | {}       | 2021-10-20 |
                                                      

                                                      For friends_friendship_join:

                                                      | friend_id | friendship_id |
                                                      | 3         | 35            |
                                                      | 5         | 35            |
                                                      

                                                      And then to query a list of friends of Alice, we use:

                                                      SELECT friend_id
                                                      FROM friends_friendship_join
                                                      WHERE friendship_id IN
                                                           ( SELECT friendship_id FROM friends_friendship_join WHERE friend_id = 5)
                                                      AND friend_id <> 5;
                                                      

                                                      right?

                                                      1. 1

                                                        I believe so.
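
For what it is worth, the query above can be tried out with Python's sqlite3 module. This is only a sketch: SQLite replaces PostgreSQL, the ids are plain integers, and the second friendship (36, between 5 and 8) is extra sample data that is not in the thread.

```python
import sqlite3


def friends_of(db, friend_id):
    """List everyone who shares at least one friendship row with friend_id."""
    rows = db.execute(
        """
        SELECT friend_id
        FROM friends_friendship_join
        WHERE friendship_id IN (
            SELECT friendship_id FROM friends_friendship_join WHERE friend_id = ?
        )
        AND friend_id <> ?
        ORDER BY friend_id
        """,
        (friend_id, friend_id),
    )
    return [row[0] for row in rows]


def demo():
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE friendship (
            id INTEGER PRIMARY KEY,
            metadata TEXT NOT NULL DEFAULT '{}'
        );
        CREATE TABLE friends_friendship_join (
            friend_id INTEGER NOT NULL,
            friendship_id INTEGER NOT NULL
        );
        INSERT INTO friendship (id) VALUES (35), (36);
        INSERT INTO friends_friendship_join VALUES (3, 35), (5, 35), (5, 36), (8, 36);
        """
    )
    return friends_of(db, 5), friends_of(db, 3)
```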

                                            1. 0

                                              Isn’t storing mutual friendship separately itself data duplication? It’s derivable from your friendship table. We would want it to be such that deleting a friendship would automatically delete the mutual friendship, if any; that’s a good thing to put more thought into.

                                              1. 2

                                                We do not store both tables simultaneously. Friendship and mutual friendship are alternative designs that we discuss.

                                              1. 2

                                                The first version of this query was buggy, because I carelessly used the obvious-looking condition “WHERE user1_id = 5 OR user2_id = 5”. This condition is wrong.

                                                For the slower among us, what makes this condition wrong? Is it because the SELECT only gets one of the two user_id values?

                                                Both of those models frankly feel somehow weird, they go strongly against the usual effortlessness of relational database modeling. Maybe this is because of the additional invariants that are not handled directly by the table structure?

                                                It feels like the “right” approach would be some kind of set type, where the values are collections of unique, unordered user_ids. Then the table constraint is |set| = 2 and friendship tests are select ... where 5 in set. But I don’t know if any SQL databases have a set data type.
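
Lacking a set type, one common way to approximate an unordered pair is to store it in a canonical order and let a CHECK constraint reject anything else. A sketch with Python's sqlite3 follows; the table layout and helper names are assumptions for illustration, not anything from the article.

```python
import sqlite3


def open_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        """
        CREATE TABLE mutual_friendship (
            user1_id INTEGER NOT NULL,
            user2_id INTEGER NOT NULL,
            PRIMARY KEY (user1_id, user2_id),
            CHECK (user1_id < user2_id)  -- canonical order: smaller id first
        )
        """
    )
    return db


def befriend(db, a, b):
    """Store the pair {a, b}; the caller's argument order does not matter."""
    lo, hi = sorted((a, b))
    db.execute("INSERT INTO mutual_friendship VALUES (?, ?)", (lo, hi))


def are_friends(db, a, b):
    """Membership test on the canonical form of the pair."""
    lo, hi = sorted((a, b))
    row = db.execute(
        "SELECT 1 FROM mutual_friendship WHERE user1_id = ? AND user2_id = ?",
        (lo, hi),
    ).fetchone()
    return row is not None
```

The constraint covers the |set| = 2 part (a user cannot be paired with themselves) and rules out storing the same pair twice in opposite orders, but listing all friends of one user still needs the two-sided condition.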

                                                1. 1

                                                  I think I wrote something like

                                                  SELECT user2_id
                                                  FROM mutual_friendship
                                                  WHERE user1_id = 5 OR user2_id = 5
                                                  

                                                  And this version survived several minutes of writing until I realized that it’s wrong.

                                                  You could rewrite it something like (as suggested on Reddit):

                                                  SELECT CASE user1_id WHEN 5 THEN user2_id ELSE user1_id END
                                                  FROM mutual_friendship
                                                  WHERE user1_id = 5 OR user2_id = 5
                                                  

                                                  I guess the idea of that sentence is that this “double-part” complexity needs to live somewhere, and any query would be somewhat awkward. Maybe I should rewrite it a bit better.
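
To see the difference between the two queries on actual rows, here is a small sketch with Python's sqlite3. The two sample friendships, (3, 5) and (5, 8), are made up for illustration.

```python
import sqlite3

# The first attempt: always returns the second column, even when that is user 5.
BUGGY = "SELECT user2_id FROM mutual_friendship WHERE user1_id = :me OR user2_id = :me"

# The CASE version: returns whichever column is not the user being asked about.
FIXED = """
SELECT CASE user1_id WHEN :me THEN user2_id ELSE user1_id END
FROM mutual_friendship
WHERE user1_id = :me OR user2_id = :me
"""


def run(sql, me):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE mutual_friendship (user1_id INTEGER, user2_id INTEGER)")
    db.executemany("INSERT INTO mutual_friendship VALUES (?, ?)", [(3, 5), (5, 8)])
    return sorted(row[0] for row in db.execute(sql, {"me": me}))
```

With this data the buggy query lists user 5 as their own friend and never reports user 3, because the row (3, 5) matches the WHERE clause but its user2_id is 5.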

                                                  It feels like the “right” approach would be some kind of set type

                                                  I tried playing with that idea, but so far all attempts, when compiled to a classic relational framework (scalar columns), do not become more beautiful.

                                                  My idea was also that maybe we should just treat each friendship as an ordered tuple (with tuple-typed columns), but that does not allow elegantly querying the list of friends.

                                                  1. 1

                                                    if you don’t have any other fields in the friendship table, maybe it’d make sense to store the undirected friendship edge as two rows in the table. An undirected graph with edges (v, u) can be converted to a directed one if you store two symmetric edges (v -> u), (u -> v). You can still INSERT and DELETE like now, and you can manage the symmetric edge with triggers. Perhaps this approach will simplify all those things that appear because of symmetry. I know you say this in the article, but n versus 2n isn’t that big a deal.
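
A sketch of the trigger-managed symmetric edge, using Python's sqlite3. The table and trigger names are assumptions, and SQLite's trigger syntax differs from other databases.

```python
import sqlite3

SCHEMA = """
CREATE TABLE friendship (
    user_id   INTEGER NOT NULL,
    friend_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, friend_id)
);
-- Whenever (v -> u) is inserted, also store (u -> v).
CREATE TRIGGER friendship_mirror_insert AFTER INSERT ON friendship
BEGIN
    INSERT OR IGNORE INTO friendship VALUES (NEW.friend_id, NEW.user_id);
END;
-- Whenever one direction is deleted, remove the other one too.
CREATE TRIGGER friendship_mirror_delete AFTER DELETE ON friendship
BEGIN
    DELETE FROM friendship
    WHERE user_id = OLD.friend_id AND friend_id = OLD.user_id;
END;
"""


def demo():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO friendship VALUES (3, 5)")
    after_insert = sorted(db.execute("SELECT user_id, friend_id FROM friendship"))
    # Deleting either direction removes both rows.
    db.execute("DELETE FROM friendship WHERE user_id = 5 AND friend_id = 3")
    after_delete = db.execute("SELECT COUNT(*) FROM friendship").fetchone()[0]
    return after_insert, after_delete
```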

                                                    1. 1

                                                      that’s what we started from, the two-row representation. In this thread we’re trying to go higher, beyond that.

                                                1. 4

                                                  You could have a single-row “core” table and then create a view that simulates the two-row layout to simplify queries (my friends are friends where user_1 = my_id). This would provide the best of both worlds.
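
One possible shape for such a view, sketched with Python's sqlite3. The view name `friendship_edges` and the choice of UNION ALL are assumptions; other definitions would work as well.

```python
import sqlite3

SCHEMA = """
CREATE TABLE mutual_friendship (
    user1_id INTEGER NOT NULL,
    user2_id INTEGER NOT NULL
);
-- Each stored pair shows up twice in the view, once per direction.
CREATE VIEW friendship_edges AS
    SELECT user1_id AS user_id, user2_id AS friend_id FROM mutual_friendship
    UNION ALL
    SELECT user2_id AS user_id, user1_id AS friend_id FROM mutual_friendship;
"""


def friends_of(db, me):
    """The query reads like the two-row model: filter on one column only."""
    rows = db.execute(
        "SELECT friend_id FROM friendship_edges WHERE user_id = ?", (me,)
    )
    return sorted(row[0] for row in rows)


def demo():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO mutual_friendship VALUES (?, ?)", [(3, 5), (5, 8)])
    return friends_of(db, 5)
```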

                                                  1. 1

                                                    That’s true. However if you implement this view through UNION ALL it’s possible that it would be later used in some ad-hoc analytic query and the performance could be non-obvious (and hidden by the view). It’s manageable, but it needs to be kept in mind.

                                                  1. 1

                                                    Why not use a single-row model with an additional column for the friendship status (mutual, forward and backward, canceled)? This would be better in terms of storage and would permit easy analytics.

                                                    1. 1

                                                      Would you mind producing an analysis of such schema in the same vein as presented in the article? Four typical queries, storage requirements, invariants to be preserved, possible anomalies? Then we would all see how exactly it’s better and easy.

                                                      1. 1
                                                        • Establishing friendship: pre-processing is needed before INSERT, like the single row model;
                                                        • Deleting friendship: pre-processing is needed before UPDATE, no need for a DELETE;
                                                        • Getting the list of friends: two-part query is needed;
                                                        • Are they friends?: pre-processing (or two-part query) is needed;
                                                        • Storage requirements: optimal data size, but an additional index would be needed;
                                                        • Potential invariant violations: two symmetric rows (this can be solved with a constraint, the first user_id will be the lower); wrong order of IDs.
                                                        • You can easily extract the top friendship requestor or the top friendship “acceptor”
                                                        1. 1

                                                          That’s the summary of something that we don’t see, there is no detailed explanation. You understand your own idea and it may be obvious for you, but I don’t really get how the additional column would work, what does “forward and backward” mean, why do we need “cancelled”, etc (given that we’re interested in mutual friendship). And especially I don’t understand how the additional column both improves space and simplifies analytics.

                                                          What I’d like to see is the table schemas, how the data looks for a friendship between alice and bob, and what the four SQL queries look like. At the moment I’m in the dark.

                                                          1. 1

                                                            Gotcha. I am working at the moment, so I don’t have much time to prepare scripts. I will provide the scripts and queries in the late afternoon or tonight.

                                                    1. 2

                                                      This came up for me in a prototype app about ten years ago. I found that I needed to represent the asymmetrical relation, whether or not the UI displayed it, because to verify a mutual friendship you have to get both people to attest to it.

                                                      In other words, I had to model “A claims to be friends with B” in the database. I added this relation when a user indicated they were friends with another user. But the important property (for the UI and for access control) was still mutual friendship, so I queried for “ A claims to be friends with B and B claims to be friends with A”.

                                                      I have trouble imagining how you’d prove that two users were mutual friends without a two-step process like this.
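
                                                      A minimal sketch of that two-step model, using Python's built-in sqlite3 module. The table name friend_claims and both helper functions are hypothetical, chosen only for illustration; the original app's schema is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per directed claim
# "claimant says they are friends with target".
conn.execute("""
    CREATE TABLE friend_claims (
        claimant_id INTEGER NOT NULL,
        target_id   INTEGER NOT NULL,
        PRIMARY KEY (claimant_id, target_id)
    )
""")


def claim_friendship(claimant_id, target_id):
    # INSERT OR IGNORE makes repeating the same claim harmless.
    conn.execute(
        "INSERT OR IGNORE INTO friend_claims (claimant_id, target_id) VALUES (?, ?)",
        (claimant_id, target_id),
    )


def are_mutual_friends(a, b):
    # Mutual friendship holds only when both directed claims exist.
    row = conn.execute(
        """
        SELECT COUNT(*) FROM friend_claims
        WHERE (claimant_id = ? AND target_id = ?)
           OR (claimant_id = ? AND target_id = ?)
        """,
        (a, b, b, a),
    ).fetchone()
    return row[0] == 2


ALICE, BOB = 1, 2
claim_friendship(ALICE, BOB)
one_sided = are_mutual_friends(ALICE, BOB)   # only Alice has attested so far
claim_friendship(BOB, ALICE)
two_sided = are_mutual_friends(ALICE, BOB)   # both have attested
```

                                                      The asymmetric rows stay in the table whether or not the UI ever shows them; the mutual relation is derived from them at query time.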

                                                      1. 3

                                                        You will have something like “friend requests” with “approve” and “reject” operations. This is trivially modelled in a separate table.

                                                        The article discusses what to do when the friend request was approved.

                                                        1. 1

                                                          You could do it that way. But in my app there was meaning to the unidirectional “follow” relationship, and the “mutual friends” one was composed of a pair of “follows”.

                                                      1. 8

                                                        Choosing a random order is not a good idea, because there is a chance that both possibilities would eventually get inserted, and what would that mean?

                                                        We are just now starting development of this very feature, and I have a feeling we have not taken this into consideration. Thank you for saving us hours!

                                                        1. 2

                                                          haha awesome. Please share your experience with the schema that you decide on, later on when you’ve got some.

                                                          1. 2

                                                            You can use a cryptographic operation like the one described in this recent post: https://lobste.rs/s/ousoal/how_play_poker_by_mail_without_trusting

                                                            Basically, there is a single row, single column model too. I don’t think its performance would be good for anything but pairwise friend testing. Basically any operation with the commutative property and a wide enough range to avoid collisions works. “Sort them and make them a tuple” is just an intuitive function with the commutative property that implicitly has the range needed.
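
                                                            A rough sketch of both ideas in Python. The function names are made up for illustration, and the XOR-of-digests variant is only one possible commutative operation, assumed here for the example; it is not the scheme from the linked post and not a vetted cryptographic construction.

```python
import hashlib


def pair_key_sorted(a, b):
    # "Sort them and make them a tuple": the key does not depend on argument order.
    return (a, b) if a <= b else (b, a)


def _digest_int(user_id):
    # Map an ID to a 256-bit integer so the combined key has a wide range.
    return int.from_bytes(hashlib.sha256(str(user_id).encode()).digest(), "big")


def pair_key_xor(a, b):
    # XOR is commutative, so no sorting is needed and the result fits one column.
    # Caveat: pair_key_xor(x, x) is always 0, so self-pairs must be rejected elsewhere.
    return _digest_int(a) ^ _digest_int(b)


# Pairwise friend testing against a set standing in for a single-column table.
friendships = {pair_key_xor(1, 2)}
are_friends = pair_key_xor(2, 1) in friendships
```

                                                            The single-column key answers “are they friends?” directly, but it cannot be used to list one user's friends, which matches the performance caveat above.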

                                                          1. 2

                                                            I have a much much better explanation.

                                                            Ignore all this hand waving warm fuzzy waffly “Single” and “Responsibility” or “Requirement” shit.

                                                            Stick to the Stroustrup Principle.

                                                            Bjarne Stroustrup: My rule of thumb is that you should have a real class with an interface and a hidden representation if and only if you can consider an invariant for the class.

                                                            https://www.artima.com/articles/the-c-style-sweet-spot

                                                            Now write down an expression to evaluate whether, for this instance, that invariant holds.

                                                            Is it of the form… exp1( a1, a2, …ai) && exp2( ai+1, ai+2, …ai+n) …where ai for i = 1..n are instance variables?

                                                            Then clearly you could decompose this object into an object containing two objects one with invariant exp1( a1, a2, …ai) and the other with invariant exp2( ai+1, ai+2, …ai+n)

                                                            Duh! Obvious, go and do that. Your code is simpler, more reusable, more testable.
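
                                                            To make the && case concrete, here is a small Python sketch. The Booking example and all of its field names are invented for illustration. The original invariant would be “start <= end and 0 <= taken <= capacity”, whose two halves touch disjoint instance variables, so each half gets its own class.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:
    start: date
    end: date

    def __post_init__(self):
        # exp1: involves only start and end.
        if self.start > self.end:
            raise ValueError("start must not be after end")


@dataclass(frozen=True)
class Occupancy:
    taken: int
    capacity: int

    def __post_init__(self):
        # exp2: involves only taken and capacity.
        if not 0 <= self.taken <= self.capacity:
            raise ValueError("taken must be between 0 and capacity")


@dataclass(frozen=True)
class Booking:
    # The composite invariant is exp1 && exp2, and it holds by construction
    # because each part has already checked its own half.
    period: DateRange
    seats: Occupancy


booking = Booking(DateRange(date(2024, 1, 1), date(2024, 1, 3)), Occupancy(3, 10))
```

                                                            Each small class can now be tested and reused on its own, and Booking has no checks of its own left to get wrong.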

                                                            If you can write the invariant in the form….

                                                            exp1( a1, a2, ...ai) || exp2( ai+1, ai+2, ...ai+n)
                                                            

                                                            You obviously have some pretty confused shit going on… don’t do that, break it into two independent objects with invariants exp1( a1, a2, …ai) and exp2( ai+1, ai+2, …ai+n) respectively, and then think damn hard about what you actually mean, because I think you might be missing something.

                                                            And ps: The LSP is not a warm fuzzy guideline, if you violate LSP you have a bug. Simple as that. Violate LSP you have a bug, you might not be able to reproduce it from your UI right now, but it will mysteriously and randomly leap out and bite you if you change your program or reuse your code.

                                                            ps: LSP is really all about invariants.

                                                            i.e. ignore SRP and look at your invariant… are there simpler, more compelling classes visible if you decompose the large class into simpler classes with simpler invariants? Can you build up a complex invariant from simpler, more compelling classes?

                                                            1. 1

                                                              Your rule of thumb is strongly biased towards classes (because of instance variables). But the article also mentions modules, which are a more interesting topic. How would you handle those?

                                                              1. 2

                                                                Depends critically on what you mean by a “module”. It’s one of the most overused and under defined terms in the industry.

                                                                In the C++ world they have only been formally defined in C++20

                                                                If you’re using “warm fuzzy whatever you want them to mean” words…. expect “warm fuzzy whatever I want them to mean” advice….

                                                                So what I mean by module is “collection of functions and variables, some of which are publicly visible outside the module, some are not”. Whether this is a file or a directory or a library or a collection of submodules…. remember we’re warm and fuzzy here so I don’t care. (Hint: By “visible” I mean the build will break if something in another module attempts to reference or use a non-public symbol. I don’t care about any other definition. You can wave your hands all you like, but if the build doesn’t break, your definition means nothing.)

                                                                The core ideas then are…

                                                                • Encapsulate state.
                                                                • Reduce scope (reduce the number of things that are visible at the outer most scope).
                                                                • Remove all cyclic dependencies.

                                                                Prefer “non member, non friend” functions to methods to improve encapsulation, and then Ban cyclic dependencies between modules, and then focus on Reducing Coupling and Enhancing Cohesion.

                                                                Between-module coupling is bad and should be refactored where appropriate to weaken it. Especially the various horrid connascent flavours of coupling: https://connascence.io/

                                                                Within-module cohesion is Good, and any function or class that is not cohesive with the rest of the module should be pulled out of the module into its own.

                                                                Tools for doing this are the “I” and the “D” in SOLID.

                                                                I really don’t care about domain level “requirements” in module layout. I care about reducing the number of things I need to know and worry about before making a beneficial change.

                                                                In fact that is the only design criterion when operating at large scale. “Reducing the number of things I need to know and worry about before making a beneficial change.”

                                                                1. 1

                                                                  Thanks for the link, I’ll take a look.

                                                                  I’m not sure why you insist on defining modules through “C++20”, the original article discusses SRP which is supposed to be relevant (almost) anywhere. I would say that the idea of a “module” is as defined as the idea of “SRP” for purposes of that discussion.

                                                                  1. 1

                                                                    That’s the problem… the article doesn’t define what a module is beyond the somewhat recursive notion of “it’s a single requirement”.

                                                                    Ruby’s modules are different from C++20’s modules and C doesn’t have modules… I bet half the confusion and debate would go away if people actually knew what the other guy meant by “Module”.

                                                            1. 3

                                                              I have a highly-downvoted answer on Stack Overflow arguing that you should avoid using the array datatype in Postgres because it violates 1NF. I see there is room for debate on whether or not that is true, but from a pragmatic standpoint, most ORMs and database abstraction layers I’ve used have trouble with array and nested table datatypes because the implementation is inconsistent across databases. The closer you cleave to the commonalities between databases, the less you tend to suffer, and since there is a 1:1 mapping from nested arrays to a tabular representation, you’re not really gaining anything meaningful by using the nested array implementation.

                                                              I am in general happy to marry my applications to Postgres, but the benefit should outweigh the drawback.

                                                              1. 2

                                                                Interesting that you begin with “because it violates 1NF” and then spend the rest of the comment expanding on a completely different reason.

                                                                Personally I probably wouldn’t be using arrays (unless it’s in really, really sweet spot), but at some point I realized that they conveniently keep the order of items, and that made me much warmer towards the entire idea.

                                                                1. 2

                                                                  What I was trying to convey is that this used to seem very clear-cut to me, but I now see that it isn’t. However, I would still avoid doing it for pragmatic reasons, regardless of whether it is or is not 1NF.

                                                              1. 2

                                                                What about dependent requirements?

                                                                1. 1

                                                                  I’m not sure if I understand the question. I guess you refer to this sentence from the article:

                                                                  The single responsibility principle states that a class should at most be responsible for implementing a single independent requirement.

                                                                  I think about requirements through the lens of the concatenability principle: https://minimalmodeling.substack.com/p/concatenability-principle. If we have “liked tweets” and “bookmarked tweets” features, then the “tweet page” depends on both of them, and I guess it makes them dependent?

                                                                  But I think that there is nothing wrong with such “umbrella” requirements. The most important umbrella requirement is the website front page, for example.

                                                                1. 2

                                                                  First, our toolkit should only allow fetching data by the array of IDs, and never by a scalar ID.

                                                                  Disallowing scalar ID access makes it a bit easier to prevent the so called N + 1 query problem (also called shotgun queries).

                                                                  So, we want to encourage the developer to think in batches from the very beginning.

                                                                  I like these nudges. How about also forcing you to be explicit about when a query happens? In Django it’s pretty implicit:

                                                                  users = User.objects.all()  # not sent to the DB yet
                                                                  users = users.filter(created__gt=something)   # not yet
                                                                  users = users[:10]  # not yet
                                                                  
                                                                  for u in users:  # now it sends it!
                                                                      ...
                                                                  

                                                                  Something like for u in users.execute(): would remind you where the round-trips are happening.
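
                                                                  A toy sketch of what that could look like, in plain Python. This is not Django's API: the Query class, its methods, and fake_db are all hypothetical stand-ins, with a callable playing the role of the database round-trip.

```python
class Query:
    """Hypothetical lazy query: building it is free, only execute() fetches."""

    def __init__(self, source, predicates=(), limit=None):
        self._source = source              # callable standing in for the DB round-trip
        self._predicates = tuple(predicates)
        self._limit = limit

    def filter(self, predicate):
        # Returns a new Query; nothing is fetched yet.
        return Query(self._source, self._predicates + (predicate,), self._limit)

    def limit(self, n):
        # Returns a new Query; nothing is fetched yet.
        return Query(self._source, self._predicates, n)

    def __iter__(self):
        # Refuse implicit evaluation so every round-trip is visible in the code.
        raise TypeError("call .execute() to run the query")

    def execute(self):
        rows = [r for r in self._source() if all(p(r) for p in self._predicates)]
        return rows if self._limit is None else rows[: self._limit]


round_trips = []


def fake_db():
    # Records each simulated round-trip and returns some rows.
    round_trips.append(1)
    return [{"id": i} for i in range(20)]


users = Query(fake_db).filter(lambda u: u["id"] % 2 == 0).limit(3)
trips_before = len(round_trips)      # still no round-trip at this point
result = users.execute()             # the one place where the fetch happens
```

                                                                  Making plain iteration an error is a deliberate (and debatable) choice in this sketch: it trades some convenience for having every round-trip spelled out.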


                                                                  Should business logic follow the same principle (everything is batches)? For example, you might have some rules that decide whether a comment is visible to a given user:

                                                                  class Comment:
                                                                      ...
                                                                      def visible_to(self, user):
                                                                          return (user.is_moderator
                                                                                  or self.owner_id == user.id
                                                                                  or not self.is_draft)
                                                                  

                                                                  Maybe at first you only use it for checking whether to 404 on /comments/<comment_id>, so there’s no N+1 query. You just load one comment and then call this method once.

                                                                  Then later you want to render a whole list of links, and hide the links that aren’t visible to the current user. Now you do have an N+1 query. How should visible_to() have been written?

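                                                                  from django.db.models import Q

                                                                  def select_visible_to(comments, user):
                                                                      # Moderators see everything; no extra filtering needed.
                                                                      if user.is_moderator:
                                                                          return comments
                                                                      # Everyone else sees published comments plus their own drafts.
                                                                      return comments.filter(Q(is_draft=False) | Q(owner_id=user.id))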
                                                                  

                                                                  Something like this?

                                                                  1. 1

                                                                    a) yeah, it would be nice to have all three ways to initiate execution: lazy, explicit and implicit. But this question is language-level: some languages have a “null-syntax” feature, that is, they allow a function to be called when it is not obvious from the syntax that this is going to happen. I believe that it would be better if code execution were explicit, but this is apparently a matter of debate.

                                                                    b) yes! That was one of my personal insights, when I realized that it makes sense to design the high-level API batch-first too. Yes, something like your select_visible_to() function.

                                                                    I have a small social network (with pretty advanced access control) as one of the motivating examples, and I’ve been thinking about how to efficiently eliminate branches, like the one you have in the user.is_moderator condition. Basically it’s similar to the idea of code vectorization.
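
                                                                    One way this could look, as a plain-Python sketch over in-memory objects rather than a database. The User and Comment dataclasses are simplified stand-ins invented for this example. The moderator flag becomes one more term in a single predicate that is applied to the whole batch, instead of a separate code path.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int
    is_moderator: bool


@dataclass
class Comment:
    id: int
    owner_id: int
    is_draft: bool


def select_visible_to(comments, user):
    # One expression for the whole batch: the moderator check is a term in the
    # predicate, not a separate if/else branch around the filtering.
    return [
        c for c in comments
        if user.is_moderator or not c.is_draft or c.owner_id == user.id
    ]


comments = [
    Comment(id=1, owner_id=10, is_draft=False),
    Comment(id=2, owner_id=10, is_draft=True),
    Comment(id=3, owner_id=20, is_draft=True),
]
owner_view = [c.id for c in select_visible_to(comments, User(10, False))]
moderator_view = [c.id for c in select_visible_to(comments, User(99, True))]
stranger_view = [c.id for c in select_visible_to(comments, User(30, False))]
```

                                                                    In a SQL-backed version the same idea would presumably mean passing the moderator flag as a query parameter, so that every user goes through one query shape.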

                                                                    Thank you.

                                                                  1. 6

                                                                    The last point is also known as boolean blindness. However, I always found the support for enums (let alone ADTs) in databases to be quite poor; there isn’t much in terms of validation or constraints :(

                                                                    1. 3