1. 1

      The downside to using RANK/DENSE_RANK/ROW_NUMBER is that they don’t work with GROUP BY. As your example below demonstrates, you need more than one query and a join of the results. Another downside, probably the more significant one, is that RANK etc. require a sort. Using max(array[]) keeps a “running max” in memory and doesn’t need any sort.

      1. 1

        I don’t understand what you mean by saying they don’t work with group by. Window functions have PARTITION BY syntax for evaluating over groups.

        You are definitely right that performance could be improved on huge datasets by using your array trick. I would still prefer the window functions unless this query is a prominent bottleneck.

        1. 1

          Window functions cannot be used with group by. What you did was to combine the window function with DISTINCT ON, so you essentially calculate the aggregates for the entire set, and then take only the first row.

          This is different from how GROUP BY works, in the sense that GROUP BY “reduces” rows to one per group while computing the aggregates. Window functions operate on the entire set (without DISTINCT ON you would get duplicate rows).
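          For anyone following along, the semantic difference is easy to see with SQLite (3.25 or later, which also has window functions); the table and values here are invented:

```python
import sqlite3

# A hypothetical table standing in for the Postgres example under discussion.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('east', 10), ('east', 30), ('west', 20);
""")

# GROUP BY reduces: one output row per group.
grouped = conn.execute(
    "SELECT region, MAX(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# A window function keeps every input row; removing the duplicates is a
# separate step (DISTINCT here, DISTINCT ON in Postgres).
windowed = conn.execute(
    "SELECT DISTINCT region, MAX(amount) OVER (PARTITION BY region) "
    "FROM sales ORDER BY region"
).fetchall()
```

          Without the DISTINCT, the second query returns three rows, one per input row, which is the point being made above.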

          1. 1

            I’m sorry, now I’m really confused. I did not write DISTINCT ON.

            1. 1

              It is a unique way of phrasing it, but if I were to guess, he’s saying: “What you [must have] did [at some point] was to combine the window function with DISTINCT ON [which while similar, has important differences]”

      1. 3

        The correct answer is undeniably 4 2

        https://tryapl.org/?a=8%F72%282+2%29&run

        1. 6

          https://i.imgur.com/KWAlJMw.png

          I’m not sure what I was expecting, but APL seems to have taken the principle of least surprise and inverted it.

          1. 2

            Ahahahaha thank you for that

            1. 1

              Very few programming languages allow multiplication by juxtaposition. I can only think of Maple and Julia which do. Maybe Mathematica too? Not familiar enough with it.

              Juxtaposition is probably the most common notation for multiplication after you leave high school.

          1. 3

            This is just a gorgeous HTML-ification of a classic paper/talk. I wish it had a little note for folks like me who just clicked into it. I’ll have to set some time aside to read this. Great work. Also it would be interesting to see where/how this connects to J, which I also just now found out about.

            1. 4

              J is probably most directly influenced by another Iverson paper, “A Dictionary of APL” [1] as described in “An Implementation of J.” [2]

              1: https://www.jsoftware.com/papers/APLDictionary.htm

              2: https://sblom.github.io/openj-core/ioj.htm

            1. 10

              I once had a thought that it would be interesting to see a treatment of category theory in APL. Then I went back and read Iverson, and he pretty much had it already. That guy was so far ahead, the rest of computing still hasn’t caught up.

              1. 28

                The frustration of the author in their Lobste.rs bio is palpable:

                If you take a look at my submissions, you’ll see most of my work is ignored and my book recommendations have been most popular, as of the time of writing this; I believe this is because, no different than Hacker News, this website is mostly populated by people with only surface-level interest in computing, if that; meanwhile, a book recommendation lets everyone have their opinion and they give it points because they easily understand it.

                While frustrated, I intend to keep this account until it completely ceases to serve me, at which time I’ll do my best to delete everything ever made under this account.

                Marketing yourself is difficult, but having disdain for your audience because they will not recognize your obvious intelligence is a trap. Congrats on discovering that a clever project name will get you some self-gratifying attention, but for long-term success have some humility and respect for your fellow man.

                1. 9

                  People may be expecting to see some moderator action here…

                  I honestly don’t know what to make of the article. I don’t find the jokes in it funny, nor do I find them appropriate for this forum (“homo” is not simply a silly word; it causes real harm). I don’t encourage personal attacks in all but the most exceptional circumstances, but I did find that your comment provided helpful context, and you were pretty restrained. Ultimately, I think I’m glad that you commented as you did. For the sake of civility, I want to encourage you not to get drawn into back-and-forth about this; I think your top-level comment stands well on its own.

                  1. 1

                    I don’t find the jokes in it funny, nor do I find them appropriate for this forum (“homo” is not simply a silly word; it causes real harm)

                    Isn’t this referring to homoiconicity? The author states his language has this property. “Homo” has uses beyond the offensive one you’re referring to; it’s a Greek prefix: homogeneous, homophone, homoiconicity.

                    I’m not trying to say it’s inoffensive to everyone; I just don’t draw the conclusion that the author is using it in this way.

                    1. 3

                      I felt the entire joke of the first paragraph was that it’s using a bunch of funny-sounding words that are often considered inappropriate, and claiming they’re being used solely for their technical meanings.

                  2. 4

                    How about engaging with the content rather than bringing in the author’s off-topic profile text? Congrats on fulfilling their expectations.

                    1. 4

                      Thanks, but I’m going to let my comments stand. The tone and purpose of the article I think is best explained by his bio.

                      This article is not a serious attempt at coding. If you think I’m off-topic, fine. But software is more than just code. I’ll take clarity and respect over cleverness and contempt every time.

                  1. 2

                    I would love to read the implementation but I think there’s an encoding problem. None of the APL symbols render properly for me; I see mojibake where I would expect symbols like ⍝.

                    1. 1

                      The link is written in this way:

                      <a charset="UTF-8" href="masturbation.apl">implementation</a>
                      

                      This didn’t correct it, however. Your browser probably gives you the option to change the character encoding of a document manually, but I’ll change the link to behave properly if it’s a matter of changing this tag.

                      1. 4

                        Your server isn’t sending a content type or encoding header with the page itself. The charset attribute on the anchor isn’t supported by any browser. I don’t know of a way to change the encoding client-side in mobile Safari, but you are right it can be changed in most desktop browsers.

                        As @spc476 said, the best way to correct it is to configure Apache to deliver files with .apl extension with a Content-type: text/plain; charset=utf-8 header.

                        1. 3

                          Another way to fix it with Apache is to add an AddDefaultCharset directive to the configuration.
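                          For reference, the two approaches mentioned in this subthread would look something like this (directive names are from the Apache httpd docs; where they go depends on your config layout):

```apache
# Per-extension: serve .apl files as UTF-8 plain text
AddType text/plain .apl
AddCharset utf-8 .apl

# Or globally: a default charset for text/plain and text/html responses
AddDefaultCharset utf-8
```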

                          1. 1

                            I wonder why UTF-8 is not the standard default encoding for HTTP.

                            1. 10

                              because HTTP predates UTF-8 and wide adoption of HTTP predates wide adoption of UTF-8

                      1. 4

                        This has been my main criticism of 12 factor as well. So many instances where something has gone wrong because of ENV variables where a simple config file would’ve fixed everything. But they have their place and sometimes they are the correct solution. Still not a fan.

                        1. 3

                          Another issue with using env vars is that any part of the program can use them. It doesn’t force the developer to make the configuration schema explicit.

                              I have been in situations where different parts of the program were loading different environment variables, because they were designed by different people. It becomes a mess quite quickly.

                          1. 5

                              I have been in situations where different parts of the program were loading different environment variables, because they were designed by different people. It becomes a mess quite quickly.

                            Ick. Only the entrypoint of the program (func main or equivalent) has the right and responsibility to take information from the environment and provide it to components that need them. Corollary: if a component needs a bit of config, it should take it in the form of an explicit constructor or initialization parameter, never by implicitly reaching into the runtime environment or the global namespace.
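                              A minimal sketch of that discipline in Python (the names here are made up): the entrypoint is the only reader of the environment, and components take their config as explicit parameters.

```python
import os
from dataclasses import dataclass

@dataclass
class DBConfig:
    host: str
    port: int

def make_dsn(cfg: DBConfig) -> str:
    # A component: receives its config explicitly, never reads os.environ.
    return f"postgres://{cfg.host}:{cfg.port}"

def main(environ=os.environ) -> str:
    # The entrypoint: the one place allowed to touch the environment.
    cfg = DBConfig(
        host=environ.get("DB_HOST", "localhost"),
        port=int(environ.get("DB_PORT", "5432")),
    )
    return make_dsn(cfg)
```

                              A nice side effect of taking `environ` as a parameter is that the entrypoint becomes trivially testable with a plain dict.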

                            1. 1

                              I like your point of making “configuration schema explicit” but I don’t know that any popular config file format actually does that. I have an idea that an application should always read its configuration from a database. An in-memory SQLite db would be sufficient for many purposes.

                              The “config file” is just a dump of a database from a known state. Its format is portable, editable, and standard, and any arbitrary data schema can be encoded in the relational model.

                              At startup time, the application initializes the database from this dump “config file” and also loads commandline and environment parameters into the database. From that point on, all components obtain configuration by SQL query.
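                              A rough sketch of that idea using Python’s built-in sqlite3 (schema and keys invented for illustration):

```python
import sqlite3

# The "config file": a dump of the database from a known state.
DUMP = """
CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT);
INSERT INTO config VALUES ('log_level', 'info'), ('workers', '4');
"""

def load_config(dump_sql, overrides):
    # Initialize the in-memory DB from the dump, then layer on
    # command-line/environment overrides.
    db = sqlite3.connect(":memory:")
    db.executescript(dump_sql)
    db.executemany(
        "INSERT OR REPLACE INTO config (key, value) VALUES (?, ?)",
        overrides.items(),
    )
    return db

db = load_config(DUMP, {"log_level": "debug"})
# From here on, every component obtains configuration by SQL query.
level = db.execute(
    "SELECT value FROM config WHERE key = 'log_level'"
).fetchone()[0]
```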

                              1. 2

                                I like your point of making “configuration schema explicit” but I don’t know that any popular config file format actually does that.

                                Commandline flags as the only (or primary) way to get configuration from the environment into the program has this side effect: yourprogram -h authoritatively describes the configuration surface area.

                                Self-promotion: https://github.com/peterbourgon/ff

                          1. 2

                            I have come to believe that secrets should always be passed by reference (usually a path in the filesystem), not by value. This holds true for configuration files as well. If you are able to enforce that consistently, suddenly it becomes a non-issue to log environment variables or dump the config file for inspection. Which makes a whole set of other activities like debugging much easier.

                            1. 5

                              I have come to believe that secrets should always be passed by reference (usually a path in the filesystem), not by value.

                              I like passing them as a file descriptor, because it really truly is a capability: unforgeable yet shareable.

                              1. 1

                                That’s a good idea. Are you able to apply this in the container world or did you create your own special scheduler?

                                In Kubernetes the canonical way is to mount the secrets on disk, which makes them vulnerable to file-traversal attacks if there are any.

                                1. 1

                                  I haven’t done it with containers, only with processes. It should be possible to inject into a container, but I don’t know how well the tooling supports this. Probably not well — POSIX file descriptors are criminally underknown.

                                2. 1

                                  I’m guessing you mean to use something like file descriptor redirection in a shell command, e.g.:

                                  python my_script_needs_secrets.py 3</path/to/secret
                                  

                                  Then inside the process:

                                  secret=os.fdopen(3).read()
                                  

                                  This is a great approach for security, but how does it scale with multiple secrets? Do you use a separate descriptor for each one, or cat them all into the same descriptor? How do you organize your app to know which descriptor contains the secret data?

                                  1. 1

                                    When I’ve used the technique, I’ve just used a different descriptor for each, but one could send a bunch of secrets down one descriptor in some format if one wished.

                                    The mapping of descriptor to schema is part of the documentation, typically a README (this is all for internal software, often just for my own use).
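                                    The one-descriptor-per-secret shape can be sketched in Python; for brevity both ends of each pipe live in one process here, whereas in real use the read end would be inherited by the child across fork/exec:

```python
import os

def provide_secret(secret: bytes) -> int:
    # Write the secret into a pipe and hand back the readable end.
    r, w = os.pipe()
    os.write(w, secret)
    os.close(w)
    return r

# A separate descriptor for each secret, with the mapping documented
# in the README as described above.
db_fd = provide_secret(b"db-password")
api_fd = provide_secret(b"api-token")

db_secret = os.read(db_fd, 4096)
api_secret = os.read(api_fd, 4096)
os.close(db_fd)
os.close(api_fd)
```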

                              1. 3

                                If you needed a signed JSON object to retain JSON structure, why wouldn’t you add a valid JSON envelope with the token and the original payload as attributes like so:

                                {
                                    "header": {
                                        "alg": "HS256",
                                        "typ": "JWT"
                                    },
                                    "token": "XXXXXXX",
                                    "payload": {
                                        "sub": "1234567890",
                                        "name": "John Doe",
                                        "iat": 1516239022
                                    }
                                }
                                
                                1. 8

                                  Imagine that the sender of the JSON document is Node and the ECMAScript JSON API, and the recipient of the document is using Rust and Serde.

                                  Most cryptographic algorithms, including hashing functions, operate on bytes. So to take the hash of that payload, you need to decode the entire JSON document, pull the payload object out of memory, re-encode it as a JSON document, and perform the hashing algorithm on that. When you do this in Node, you’ll wind up hashing the ASCII bytes of {"sub":"1234567890","name":"John Doe","iat":1516239022}, getting 3032e801ce56c762a1485e5dc2971da67ffff81af5cc7dac49d13f5bfbe95ba6. Also, because of the way objects are represented in Node, seemingly innocuous changes to the code can result in the keys being in a different order when you initially build an object, though Node does preserve key order when it decodes and re-encodes a JSON document. (Node also does not provide any good APIs for manipulating the order of keys in objects, as far as I know, because ECMAScript actually says that the order is unspecified.)

                                  Serde, on the other hand, does not preserve order when you decode a JSON document. There are basically two common ways to decode a JSON object: you can decode it into a HashMap, which literally randomizes the order, or you can decode it into a struct, which if you re-encode it, will encode it in the same order that the struct is written in. So, given this code:

                                  #[derive(Deserialize,Serialize)]
                                  struct FullMessage {
                                      header: MessageHeader,
                                      token: String,
                                      payload: MessagePayload,
                                  }
                                  #[derive(Deserialize,Serialize)]
                                  struct MessagePayload {
                                      iat: u64,
                                      name: String,
                                      sub: String,
                                  }
                                  

                                  If you decode into a FullMessage, and then re-encode the MessagePayload, you will wind up hashing {"iat":1516239022,"name":"John Doe","sub":"1234567890"}, which hashes to 907b71ecd7dbc6cb902905e053fe990ed5957aa5217150b2355c36583fcf9519. It will, thus, report that the payload was tampered with, even though both versions of the payload are equivalent for your purposes.

                                  Because the JSON specifications say that order is not important in an object, both behaviors are spec-compliant.
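                                  The hazard is easy to reproduce without either runtime; in Python, two spec-compliant serializations of the same object hash to different values:

```python
import hashlib
import json

payload = {"sub": "1234567890", "name": "John Doe", "iat": 1516239022}

# Insertion order (what Node preserves) vs. sorted order (one thing a
# struct-based decoder might emit); both are valid JSON for this object.
as_inserted = json.dumps(payload, separators=(",", ":"))
as_sorted = json.dumps(payload, separators=(",", ":"), sort_keys=True)

# Same object, different bytes, different hashes: a signature computed
# over one serialization fails to verify against the other.
h1 = hashlib.sha256(as_inserted.encode()).hexdigest()
h2 = hashlib.sha256(as_sorted.encode()).hexdigest()
```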

                                  1. 3

                                    Gotcha. I don’t use Node or Rust, but I can understand how different JSON libraries could make this a problem. What if the payload was serialized?

                                    {
                                        "header": {
                                            "alg": "HS256",
                                            "typ": "JWT"
                                        },
                                        "token": "XXXXXXX",
                                        "payload": "{\"iat\": 1516239022, \"sub\": \"1234567890\", \"name\": \"John Doe\"}"
                                    }
                                    

                                    In this form, the token is computed and verified on the given bytes of the serialized payload, so differences in parsers should not matter.
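                                    A sketch of that scheme with an HMAC in Python (the key and field names are invented): the signer serializes the payload exactly once, and the verifier checks the embedded string byte-for-byte without ever re-serializing it.

```python
import hashlib
import hmac
import json

KEY = b"shared-secret"  # hypothetical signing key

def sign(payload):
    body = json.dumps(payload)  # serialize exactly once
    token = hmac.new(KEY, body.encode(), hashlib.sha256).hexdigest()
    # The envelope is still valid JSON; the payload travels as a string.
    return json.dumps({"token": token, "payload": body})

def verify(envelope):
    msg = json.loads(envelope)
    expect = hmac.new(KEY, msg["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expect, msg["token"]):
        raise ValueError("bad signature")
    # Parse the verified bytes; key order inside them no longer matters.
    return json.loads(msg["payload"])
```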

                                    1. 1

                                      That would totally work. It’s basically the same as the OP’s recommendation (serialize your JSON, then concatenate it with the signature) except you’re using a much more complicated way of “concatenating” them.

                                      1. 3

                                        Right, but the result is still valid JSON, which was the problem they raised with “just concatenation.”

                                        1. 3

                                          Technically, yes. Unfortunately, whatever signature-unaware middleware you’re using won’t be able to get at the JSON keys and values within the payload part. Most people deploy such middleware specifically because they want to be able to filter or route based on the contents of the message, and you lose that.

                                1. 4

                                  Biggest one to me would be better use of generic data types. That always seemed strange to me coming from python or clojure.

                                  1. 1

                                    [Edit: I made the mistake of posting before reading the link. I think there’s some answers to my questions in the article.]

                                    What do you mean by “better use?” Racket has native support for lists, vectors, hash tables, and boxes. There are generic routines for treatment as sequences or dictionaries. Are there other types of data structures you would use, or is it something else I don’t understand?

                                    1. 2

                                      Racket has good support for generics, but the “default” functions are list-specific. For instance, to get the length of a vector, you must use sequence-length or vector-length rather than length.

                                      1. 1

                                        Does anyone know why that seems to be common with lisps? (Or am I wrong?) It seems like there aren’t really lisps that do duck/structural typing or function overloading. Is it just that languages that do that use dynamic dispatch on objects for function overloading?

                                        1. 2

                                          I expect it’s just the influence of Scheme and Common Lisp overpowering everything else. It’s hard to generalize about lisps without taking their influence into account.

                                          Racket is actually above average in that it provides the functionality you want; it’s just that it was added later, and they didn’t feel comfortable adding the performance overhead of dispatch for a function everyone assumed to be about lists. For a long time lispers felt that lists should be treated as the only data structure that matters, and that swapping out vectors and hash tables should only be done in cases needing extreme performance.

                                          1. 1

                                             I guess if performance is the concern, you can still keep the type-specific versions, just not make them the idiomatic default. Clojure really does make working with sets and maps so much nicer (in my limited experience).

                                  1. 0

                                    This is so cool!

                                    ML seems like one of those domains you had to be born into to do effective work. When physics transitioned from the classical view to the quantum view, many of the older scientists found that they were out of their element and could not really do effective work in this new domain. ML seems quite similar in that you are either a data scientist as your full-time role, or you don’t understand what you’re doing. The distinction is pretty binary.

                                    Sure, you can throw together a hotdog-not-hotdog classifier in your spare time pretty easily. But you’re not really understanding anything. You’re just running some code that someone handed to you, and then hoping it can classify your hotdogs.

                                    I want to build an ML system to recognize where you are, spatially, from a mobile camera. E.g. given a video feed from a camera, give me a real-time feed of GPS coordinates accurate down to the inch. (You yourself have a knowledge of where you are, right now, and it’s accurate down to an inch. So don’t say it can’t be done; your brain already does it.)

                                    Yet that problem seems… well… like I said, you’re either a data scientist or you’re not. And I’m not. It’s rather like deciding to be an artist, then deciding your first painting will be the Battle of Anghiari.

                                    1. 3

                                      ML seems like one of those domains you had to be born into to do effective work. […] ML seems quite similar in that you are either a data scientist as your full-time role, or you don’t understand what you’re doing. The distinction is pretty binary.

                                       I disagree; the mathematics of logistic regression, support vector machines, and feed-forward/convolutional/recurrent neural networks are pretty simple. Of course, you need to get a feeling for which regularization, activations, initializations, etc. work, which are also mathematically simple but can often only be verified empirically.

                                       At any rate, when I started studying computational linguistics as an undergrad in 2004, most people were only using linear classifiers (maxent models and linear SVMs reigned). There were still quite a lot of people making rule-based systems. Now virtually everyone has transitioned to ‘deep learning’ (for lack of a better name).

                                      The reason that it may appear as if people either understand it or don’t understand it, particularly in neural networks, is that once you understand the basics of a feed forward network (affine transformations, non-linearities, (multinomial) logistic regression, regression, back-propagation), most other networks are pretty straightforward. E.g. an unrolled recurrent neural network can be seen as a deep feed-forward neural network with weight sharing between layers.

                                      1. 1

                                        iswrong, I see what you did there! So, what it is like to be a quantum physicist watching your classical physics peers fall by the metaphorical wayside?

                                        shawn, every journey begins with a single step. Can you make one that simply differentiates between “indoors” and “outdoors”? Or one that can tell you the latitude of any given input of photograph+timestamp? (You’d be forgiven if it fails on photos that do not include any sunlight.)

                                      2. 2

                                        But you’re not really understanding anything. You’re just running some code that someone handed to you, and then hoping it can classify your hotdogs.

                                        I have some maybe cynical responses to this. First is that this appears to be imposter syndrome at work. You probably greatly underestimate the proportion of working data scientists who “really understand” the tools and models they apply to problems. The ones who do are on the vanguard! For every one who can push the frontier of understanding, there are a hundred who just need to get a job done.

                                        Is that a fair criterion then? If you have achieved your goal by “just running some code” does that diminish the result?

                                        We’ve experienced this same sort of controversy in software engineering too. Are you a real programmer if you don’t “really understand” the underlying hardware and software models?

                                        1. 2

                                          It’s similar to programmers. In order of increasing expertise you have: end-user programmers, framework/library authors, language/compiler/runtime developers.

                                          In the ML world, you have people adjusting pre-trained models / running models against their data, model developers, and researchers working on new model types. I wouldn’t expect every user to be a ML researcher, just as I wouldn’t expect everyone to work on the compiler for their language. And just because someone has a job as a data scientist doesn’t mean they are researching new model types or whatnot. Perhaps they are, but it’s probably not necessary for all but the most novel use-cases. You can deliver a useful project with just better regression and classification.

                                        1. 3

                                          I have never really invested in learning org mode. I can’t really imagine planning my life on a laptop/desktop running emacs. What happens if I remember something that I need to add to my TODO list when I’m out for a walk or at a store? Does org mode actually solve this problem in some way that I’m not familiar with or does it just fail to work for people who aren’t permanently attached to emacs?

                                          1. 3

                                            I use and really like Orgzly on Android: http://www.orgzly.com/

                                            (I think there are some iOS apps also.)

                                            There are multiple ways you could use Orgzly. In my case I sync all my .org files[*] to my phone for browsing/searching but on mobile I add new entries to an “inbox.org” file and then go through that file every few days in emacs. This works well for the “oh I just remembered/realised this thing” case, and organizing/editing in emacs is easier than on a phone.

                                            I don’t use Dropbox so instead Orgzly uses a directory on the phone and SyncThing syncs that directory with my laptop. This has been pretty reliable for me (although I think it helps that I only add things in inbox.org on my phone and then empty this file on my laptop, minimum potential for sync conflicts.)

                                            [*] I have a to-do list but I also keep a lot of “personal wiki” type notes and ideas in Org.

                                            1. 2

                                              I don’t use Dropbox so instead Orgzly uses a directory on the phone and SyncThing syncs that directory with my laptop. This has been pretty reliable for me (although I think it helps that I only add things in inbox.org on my phone and then empty this file on my laptop, minimum potential for sync conflicts.)

                                               Orgzly and SyncThing is precisely my setup too, and it has worked unbelievably well. I have also figured out why, and it’s not because of the one-directionality of inbox.org; it’s because at any given point in time, I am either in a location where I cannot add things on my computer, or I am in a location where my phone can sync its additions over wifi. So the only case where I get conflicts is when I accidentally have the wifi turned off on my phone, and it accumulates notes without syncing them.

                                              1. 1

                                                That’s good to know. I sync a lot of data with syncthing so it’s not enabled unless my phone is on wifi and also charging, so not as clear cut for me. But good to know it would work smoothly if I changed that.

                                            2. 2

                                              The article is posted on the blog of a mobile app (“beorg”) that syncs orgmode files via Dropbox, iCloud, etc and helps with the mobile editing affair

                                              1. 1

                                                Speaking for myself, I use my phone’s reminders for that, or random scraps of paper that go in my wallet/bag. Once a week (hah! in my dreams) I do a periodic review, and any reminders here will be moved to Org at that time.

                                                1. 1

                                                  Sorry for the two-month late reply, but here goes.. Honest answer.. :) When I need to add something to my home emacs while I’m out and about, I text myself a reminder to do it later. When I need to read something out of my home emacs, I use my phone to ssh into the house and start an emacsclient instance.

                                                  Why not use the remote solution to add something? Because it’s not quick and easy. The number one pain point is typing long passwords with a tiny on-screen keyboard–just shoot me! It’s not even maintenance free–I was messing with my bootloader on the home PC and left the machine non-operational.. Hopefully I’ll get around to fixing it tonight.

                                                  My point is, when you absolutely have to reach a file, ssh on mobile can do the trick–assuming that your filetype and editor both work over a terminal! (Ok, there’s also scp on mobile.)

                                                1. 1

                                                  Sphinx, with HTML generated output bundled into an image with nginx to serve it. CI/CD is the same as any app.

                                                  1. 2

                                                    This is nice. The best Makefiles are nearly empty and make heavy use of templates and implicit rules. I would make a couple small changes:

                                                    1. I’m not sure why the target that generates dependency Makefile fragments renames the generated file. This should work:

                                                      %.d: %.c Makefile
                                                      	$(CPP) $(CPPFLAGS) -M -MM -E -o "$@" "$<"

                                                    2. You might want to prevent generating Makefile fragments for the clean goal. A conditional include can help:

                                                      ifneq ($(MAKECMDGOALS),clean)
                                                      -include $(DEPS)
                                                      endif

                                                    3. Remaking target objects if the Makefile changes can be simply:

                                                      $(OBJS): Makefile
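
                                                    For the record, the three changes slot together into something like this. This is a sketch assuming GNU make; the program name, the `$(wildcard *.c)` source list, and the link recipe are placeholder assumptions:

                                                      ```makefile
                                                      PROG := prog
                                                      SRCS := $(wildcard *.c)
                                                      OBJS := $(SRCS:.c=.o)
                                                      DEPS := $(SRCS:.c=.d)

                                                      $(PROG): $(OBJS)
                                                      	$(CC) $(LDFLAGS) -o $@ $^ $(LDLIBS)

                                                      # (1) Generate each dependency fragment in place, no rename step.
                                                      %.d: %.c Makefile
                                                      	$(CPP) $(CPPFLAGS) -M -MM -E -o "$@" "$<"

                                                      # (3) Remake objects when the Makefile itself changes.
                                                      $(OBJS): Makefile

                                                      # (2) Skip dependency generation for the clean goal.
                                                      ifneq ($(MAKECMDGOALS),clean)
                                                      -include $(DEPS)
                                                      endif

                                                      .PHONY: clean
                                                      clean:
                                                      	$(RM) $(PROG) $(OBJS) $(DEPS)
                                                      ```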

                                                    1. 3

                                                      While I also do use templates and implicit rules when convenient (your example is certainly one of these), my experience is that Makefiles are best when they try not to be clever, and simply define straightforward from->to rules with no room for subtlety. As an example, make treats some of the files produced through chains of implicit rules as temporary, and will delete them automatically. In some cases, I have found this will cause spurious rebuilds. There is some strangely named variable you can set to avoid this deletion, but I’d rather such implicit behaviour be opt-in than opt-out.

                                                      Sometimes a little duplication is better than a little magic.

                                                      1. 3

                                                        Yes, the special target .PRECIOUS can be used to mark intermediate files that should be kept. Cf. https://www.gnu.org/software/make/manual/make.html#index-_002ePRECIOUS-intermediate-files

                                                        My recommendation for anyone who wants to learn to effectively use make: Read the manual. All of it. Keep it handy when writing your Makefile.

                                                        People have already done the hard work of getting it to work right under most circumstances. I don’t consider it clever to stand on their shoulders.
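
                                                        In case it saves anyone the lookup, keeping those chain-built intermediates is a one-liner; the `%.d` pattern here is just an example:

                                                          ```makefile
                                                          # Keep .d files produced through implicit-rule chains; by default
                                                          # make treats such intermediates as disposable and deletes them.
                                                          .PRECIOUS: %.d
                                                          ```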

                                                    1. 0

                                                      Original paper does seem to hold up well under this reproduction study.

                                                      1. 8

                                                        That’s exactly the opposite conclusion I took from this paper. Did you mean “does not?”

                                                        1. 4

                                                          Damn, yes I meant does not.

                                                          1. 2

                                                            I agree with jec, most of the conclusions in the original paper do hold up under new analysis.

                                                            I think you meant “does not”.

                                                          1. 4

                                                            I reached out to David via twitter and it turns out we have some shared connections even though I don’t know his team first-hand. My team does quite a lot of our work in Clojure and we are hiring developers and data scientists. Email me [my username at apple] if you are interested.

                                                            1. 3

                                                              Now I’m curious. Can anyone recommend other visual representations of JOINs?

                                                              1. 2

                                                                Tables are probably the best way to illustrate joins. Cf. https://en.wikipedia.org/wiki/Relational_algebra#Joins_and_join-like_operators

                                                                1. 2

                                                                  Unfortunately my database doesn’t have an antijoin keyword.

                                                                  1. 1

                                                                    From the same blog: https://blog.jooq.org/2015/10/13/semi-join-and-anti-join-should-have-its-own-syntax-in-sql/

                                                                    Apparently Impala supports it, but in other databases you can get the same result using one of:

                                                                    • SEL … FROM R LEFT OUTER JOIN S ON R.key = S.key … WHERE S.key IS NULL
                                                                    • SEL … FROM R WHERE R.key NOT IN (SEL key FROM S)
                                                                    • SEL … FROM R WHERE NOT EXISTS (SEL … FROM S WHERE S.key = R.key)
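
                                                                    All three spellings agree as long as no NULL keys are involved. A quick sketch with sqlite3 (table names and data are made up for illustration):

                                                                    ```python
                                                                    import sqlite3

                                                                    # Hypothetical tables r and s; no NULL keys here, since NOT IN
                                                                    # behaves differently once s.key can be NULL.
                                                                    con = sqlite3.connect(":memory:")
                                                                    con.executescript("""
                                                                        CREATE TABLE r(key INTEGER);
                                                                        CREATE TABLE s(key INTEGER);
                                                                        INSERT INTO r VALUES (1), (2), (3);
                                                                        INSERT INTO s VALUES (2), (3), (4);
                                                                    """)

                                                                    # 1. Outer join, then keep the rows that found no match.
                                                                    q1 = """SELECT r.key FROM r LEFT OUTER JOIN s ON r.key = s.key
                                                                            WHERE s.key IS NULL"""
                                                                    # 2. NOT IN against a subquery.
                                                                    q2 = "SELECT key FROM r WHERE key NOT IN (SELECT key FROM s)"
                                                                    # 3. Correlated NOT EXISTS.
                                                                    q3 = """SELECT key FROM r WHERE NOT EXISTS
                                                                            (SELECT 1 FROM s WHERE s.key = r.key)"""

                                                                    results = [sorted(k for (k,) in con.execute(q)) for q in (q1, q2, q3)]
                                                                    print(results)  # all three queries return key 1: [[1], [1], [1]]
                                                                    ```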
                                                                    1. 2

                                                                      r.key NOT IN (SELECT key FROM s) is not equivalent to NOT EXISTS (SELECT 1 FROM s WHERE s.key = r.key) - you can observe that when “s” contains a NULL “key”.
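
                                                                      A quick sqlite3 sketch of the difference (illustrative tables, named to match the queries above):

                                                                      ```python
                                                                      import sqlite3

                                                                      con = sqlite3.connect(":memory:")
                                                                      con.executescript("""
                                                                          CREATE TABLE r(key INTEGER);
                                                                          CREATE TABLE s(key INTEGER);
                                                                          INSERT INTO r VALUES (1), (2);
                                                                          INSERT INTO s VALUES (2), (NULL);
                                                                      """)

                                                                      # NOT EXISTS still finds the unmatched row...
                                                                      not_exists = con.execute(
                                                                          "SELECT key FROM r WHERE NOT EXISTS (SELECT 1 FROM s WHERE s.key = r.key)"
                                                                      ).fetchall()

                                                                      # ...but NOT IN returns nothing: 1 NOT IN (2, NULL) evaluates to
                                                                      # UNKNOWN, because 1 = NULL is UNKNOWN under three-valued logic.
                                                                      not_in = con.execute(
                                                                          "SELECT key FROM r WHERE key NOT IN (SELECT key FROM s)"
                                                                      ).fetchall()

                                                                      print(not_exists, not_in)  # [(1,)] []
                                                                      ```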

                                                                      1. 1

                                                                        Yes, the blog post I linked above mentions that you have to take special care if NULLs are present.

                                                              1. 2

                                                                I hadn’t thought about this until reading the article, but it seems as if the Venn diagrams are accurately representing something. If you take the full outer join as your starting point, then the Venn diagram represents which elements are retained, by projecting the elements of the full outer join onto the underlying relations A & B. Is that right?

                                                                If so, then I think they’re ok. While the actual mathematical explanation of the Venn diagrams is complex, I don’t think most readers are concerned with it. Or put another way, the diagram is correct from a perspective of maximum pedantry or minimum pedantry, but incorrect if you’re being mildly pedantic.

                                                                As far as teaching people, I think that the Venn diagrams are somewhat useful. Joins aren’t intrinsically hard; there are just a lot of different options, and they’re not super-memorable (cross join vs. outer join, or left vs. right, for instance).

                                                                1. 3

                                                                  The Venn diagrams (when they overlap) represent operations among sets of the same (“union-compatible”) type. As the author points out, SQL includes these operations UNION, INTERSECT, and EXCEPT.

                                                                  A SQL join is a product among sets of possibly differing types. Using a Venn diagram to illustrate a SQL join only makes sense if your join key is covering, or if you are ignoring all non-key attributes.
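
                                                                  To make the type distinction concrete, here’s a small sqlite3 sketch (table names and data are made up): INTERSECT only works after projecting both sides down to a common type, while a join pairs whole rows of differing types into wider tuples.

                                                                  ```python
                                                                  import sqlite3

                                                                  # Two relations of different types: only their key columns
                                                                  # are union-compatible.
                                                                  con = sqlite3.connect(":memory:")
                                                                  con.executescript("""
                                                                      CREATE TABLE a(key INTEGER, colour TEXT);
                                                                      CREATE TABLE b(key INTEGER, weight REAL);
                                                                      INSERT INTO a VALUES (1, 'red'), (2, 'blue');
                                                                      INSERT INTO b VALUES (2, 9.5), (3, 1.2);
                                                                  """)

                                                                  # INTERSECT (the Venn-diagram picture) applies to the
                                                                  # projected keys alone.
                                                                  common_keys = con.execute(
                                                                      "SELECT key FROM a INTERSECT SELECT key FROM b"
                                                                  ).fetchall()

                                                                  # A join instead combines attributes from both relations.
                                                                  joined = con.execute(
                                                                      "SELECT a.key, a.colour, b.weight FROM a JOIN b ON a.key = b.key"
                                                                  ).fetchall()

                                                                  print(common_keys)  # [(2,)]
                                                                  print(joined)       # [(2, 'blue', 9.5)]
                                                                  ```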

                                                                  1. 1

                                                                    I think you misread me. I described a way of reading the Venn diagrams as representing things of the same type, namely subsets of the full outer join.

                                                                    1. 2

                                                                      Perhaps so! If I now understand what you mean, such diagrams would have one circle (the subset) fully enclosed within another (the full outer join). That’s an interesting perspective, but it doesn’t define the meaning of “full outer join.”

                                                                      1. 2

                                                                        Good point. I’m coming at this from the perspective that what’s tough isn’t the concepts, just keeping the variants straight.

                                                                        That makes me more sympathetic to the article. If you’re trying to bootstrap an understanding of what a join is, the Venn diagrams are a potentially confusing metaphor/illustration.

                                                                1. 3

                                                                  I’m confused and disappointed by the reactionary, defensive comments here. Who’s ranting or yelling? Is the author accusing someone of stupidity if they cannot understand complex comparison expressions?

                                                                  I’m not ashamed to admit we enforce this rule at work. It yields consistency and simplicity that are important for cutting unnecessary mental overhead out of code review. Across dozens of contributors and hundreds of repositories, it makes a difference. And happily, not one person on our team has rejected the rule out of arrogance that they are “smart enough” to eschew it.

                                                                  I think of it this way: if someone has written a complex boolean expression that is deeply nested and mixes the and/or/not operations arbitrarily, I suggest they apply De Morgan’s laws and simplify the expression into a normal form. It isn’t that I cannot understand a complex, non-normalized expression—I don’t want to have to!

                                                                  That’s all this rule is, a normal form for comparisons.
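
                                                                  A toy illustration of the kind of rewrite I mean (the predicates are made up):

                                                                  ```python
                                                                  def in_range_tangled(lo, x, hi):
                                                                      # Mixed operator directions plus a negation: harder to scan.
                                                                      return not (x < lo or hi < x)

                                                                  def in_range_normal(lo, x, hi):
                                                                      # After De Morgan, with every comparison written in
                                                                      # number-line order ("smaller value on the left").
                                                                      return lo <= x <= hi

                                                                  # The two forms are equivalent; spot-check a few sample points.
                                                                  samples = [(0, -1, 10), (0, 0, 10), (0, 5, 10), (0, 10, 10), (0, 11, 10)]
                                                                  results = [in_range_tangled(*s) == in_range_normal(*s) for s in samples]
                                                                  print(all(results))  # True
                                                                  ```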

                                                                  1. 1

                                                                    The linked post contains the sentence

                                                                    This is such a nice way to express numbers I wonder why programming languages allow for the greater than sign ( > ) at all.

                                                                    If this isn’t indicative of a piece that’s more akin to a rant than a reasoned discussion, I don’t know what is.

                                                                    I fully support coding conventions that encourage consistent usage of range operators, but to go from there to actually removing a character from the ASCII set to address what is, in the grand scheme of things, a rather minor issue is pretty extreme.

                                                                    1. 1

                                                                      I don’t see it as lengthy, impassioned, or angry. It meets exactly none of the criteria I would use to qualify ranting. It’s merely controversial.

                                                                  1. 3

                                                                    I used APL to write my solvers last year, but I didn’t complete the whole set. I’ve already gotten more than 25 coworkers to join my private leaderboard this year. We all use Clojure so that’s what I’ll be writing my solvers in this year (but I might solve them twice so I get a chance to code in APL again! 🤓)