I appreciate the post, especially putting all the DDL in a <details>. I didn’t like how both credit events had the same date, so I put the 50 earlier than the 100.
I write T-SQL mostly; I didn’t know about the DISTINCT ON trick, that’s cool. I am trying to get better at PostgreSQL.
This is how I think about & approach the problem, with separate tables, row_number() & CTEs: db-fiddle.
Thanks, I just realized I accidentally put the same date on two records. Good catch!
The downside (and upside) to using ROW_NUMBER() is that it makes it harder to add other aggregates. I just added a section about comments I got from readers, with my thoughts on them.
The straightforward and portable solution I would reach for would involve the RANK() or ROW_NUMBER() window functions.
Looks like these are fully supported in Postgres: https://www.postgresql.org/docs/9.1/tutorial-window.html
Even Hive! https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
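For concreteness, here’s a minimal sketch of the latest-row-per-group pattern with ROW_NUMBER(), run through Python’s sqlite3 so it’s easy to try (SQLite ≥ 3.25 supports window functions; the table and column names are made up for the example):

```python
import sqlite3

# Hypothetical schema: credit events per user; we want the latest event per user.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE credit_events (user_id INTEGER, amount INTEGER, created_at TEXT);
INSERT INTO credit_events VALUES
  (1, 50, '2019-01-01'),
  (1, 100, '2019-01-02'),
  (2, 30, '2019-01-05');
""")

# ROW_NUMBER() numbers the rows within each user, newest first; rn = 1 is the latest.
rows = conn.execute("""
SELECT user_id, amount, created_at
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
  FROM credit_events
)
WHERE rn = 1
ORDER BY user_id
""").fetchall()

print(rows)  # [(1, 100, '2019-01-02'), (2, 30, '2019-01-05')]
```

The same query works verbatim in Postgres (and, with minor dialect tweaks, in Hive and SQL Server).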
The downside to using RANK/DENSE_RANK/ROW_NUMBER is that they don’t work with GROUP BY. As your example below demonstrates, you need to use more than one query and join the results. Another downside, which is probably more significant, is that RANK etc. require a sort. Using max(array) keeps a “running max” in memory and doesn’t need any sort.
I don’t understand what you mean by saying they don’t work with group by. Window functions have PARTITION BY syntax for evaluating over groups.
You are definitely right that performance could be improved on huge datasets by using your array trick. I would still prefer the window functions unless this query is a prominent bottleneck.
Window functions cannot be used with group by. What you did was to combine the window function with DISTINCT ON, so you essentially calculate the aggregates for the entire set, and then take only the first row.
This is different from how GROUP BY works, in the sense that GROUP BY “reduces” rows to groups while computing the aggregates. Window functions operate on the entire set (without DISTINCT ON you would get duplicate rows).
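A quick illustration of the distinction, using sqlite3 with a toy table (names invented): GROUP BY collapses each group to one row, while a window function keeps every input row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (grp TEXT, val INTEGER);
INSERT INTO t VALUES ('a', 1), ('a', 2), ('b', 3);
""")

# GROUP BY: one output row per group.
grouped = conn.execute(
    "SELECT grp, MAX(val) FROM t GROUP BY grp ORDER BY grp"
).fetchall()

# Window function: one output row per input row; the aggregate is repeated.
windowed = conn.execute(
    "SELECT grp, MAX(val) OVER (PARTITION BY grp) FROM t ORDER BY grp, val"
).fetchall()

print(grouped)   # [('a', 2), ('b', 3)]
print(windowed)  # [('a', 2), ('a', 2), ('b', 3)]
```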
I’m sorry, now I’m really confused. I did not write DISTINCT ON.
It is a unique way of phrasing it, but if I were to guess, he’s saying: “What you [must have] did [at some point] was to combine the window function with DISTINCT ON [which while similar, has important differences]”
Very nice, I did not know about unlogged tables. Is there a way to turn them into logged ones afterwards?
Is there a way to turn them into logged ones afterwards?
I never tried it and I can’t think of a use case for it, but from the documentation of ALTER TABLE it looks like it’s possible with ALTER TABLE … SET LOGGED.
If numeric type in PostgreSQL can go up to 16383 digits after the decimal point, how many comments can I store this way?
If we’re talking Postgres, might as well have parent INTEGER and save yourself a lot of messing about.
Or just use a recursive CTE
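A rough sketch of how a recursive CTE walks a parent INTEGER comment tree, shown with Python’s sqlite3 (the schema and data are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (id INTEGER PRIMARY KEY, parent INTEGER, body TEXT);
INSERT INTO comments VALUES
  (1, NULL, 'root'),
  (2, 1, 'reply to root'),
  (3, 2, 'nested reply'),
  (4, NULL, 'another root');
""")

# Walk one thread top-down, tracking depth: the base case selects the
# root comment, and each recursive step joins children to their parents.
rows = conn.execute("""
WITH RECURSIVE thread(id, body, depth) AS (
  SELECT id, body, 0 FROM comments WHERE id = 1
  UNION ALL
  SELECT c.id, c.body, t.depth + 1
  FROM comments c JOIN thread t ON c.parent = t.id
)
SELECT body, depth FROM thread ORDER BY depth
""").fetchall()

print(rows)  # [('root', 0), ('reply to root', 1), ('nested reply', 2)]
```

Postgres accepts the same WITH RECURSIVE syntax.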
On the same subject, there is a discussion on the Django developers forum about using Black.
It pains me to see how happily people are getting herded around by GAFAM with dirty tricks like this, but I honestly don’t know what to do. I haven’t managed to convince even a single person to use “freer” software in years.
Once upon a time I would’ve tried to spread the Holy Word of open-source (or at least user-respecting) software like Firefox and Linux, but I’ve mostly given up on that. In my experience, “normal people” associate these things (yes, even Firefox) with “technical people”, so they’re too scared to try. (And actual technical people are too stubborn ;-D)
My last job was a shop of 20 or so developers. We all had the same standard-issue MacBook. I could tell mine apart because I put a Firefox sticker on it. I evangelized quite a bit with my teammates. By the end of my three-year tenure there, I had convinced nobody that at least supporting Firefox in our web applications was necessary, and nobody to use Firefox as their main browser.
I left that job a few weeks ago, and I learned from my old boss that he and two others started using Firefox because they missed me and my ridiculous evangelism.
It’s okay to give up on, or take a break from, evangelizing, because it’s kind of exhausting and puts your likability with others at risk. But I think it’s hard to tell when you’re actually being effective. There’s no real feedback loop. So if you’re tired of it and don’t want to do it anymore, by all means stop evangelizing. But if you want to stop just because you don’t think you’re being effective, you may be wrong.
You can’t convince “normal” people by saying it’s “freer” and open source (unfortunately, not even more private) - they don’t care about this stuff.
I convinced my wife to try Firefox once Quantum came out by saying “it is much faster”. She got used to it.
I managed to convince a teammate to use FF because of the new grid inspector.
People only care about “what’s in it for me”.
Of course that’s what they care about, I know. IMHO not having a multi-billion-dollar corporation breathing down your neck is a pretty big plus, but I guess that’s just a bit too abstract for most people, as you seem to be implying.
I was wondering why Django does not generate an SQL migration statement with the CONCURRENTLY keyword directly.
I have found a recent discussion about it on the GitHub project: https://github.com/django/django/pull/10901
I’ve seen this PR. It essentially does the same thing described in the article, but using a custom migration command.
I feel the problem with this approach is that
As a visually impaired developer, I’ve both used and developed screen readers, particularly on Windows. I’ll be happy to answer any questions anyone may have. Ask me anything.
I have a code documentation and blog site. I’m pretty confident in the overall layout of the site and the prose - this is reasonably simple semantic html - but what about presenting code samples? My markup for those is kinda a mess, I’m not super happy with it semantically, but it renders fairly well to the eye after css styling. Lots of stuff like
<pre><span class="line-number">1</span><span class="comment">/* foo</span>
<span class="line-number">2</span><span class="comment"> stuff*/</span></pre>
Except some of my samples are hundreds of lines long. Not necessarily because all that code is worth reading in detail, but because I want to provide a complete example for copy and paste tinkering. It is meant to be skimmed.
Do you have any tips on a good way to better mark up those code samples to make them more usable to a visually impaired user?
When adding a “back to top” button, it’s often recommended to move the focus as well. What am I supposed to move the focus to? The first focusable element on the page? What if that element is below the fold? Should I focus the first element even if it isn’t focusable?
Can you give a rule of thumb for aria-live regions in SPAs? Are they really useful? How should I move focus?
A11y nowadays feels a lot like developing a website 10 years ago, when you had to test it on Chrome and IE. You have JAWS, TalkBack, and ten others, times Chrome and Edge. TBH I’m still unable to install a screen reader on GNOME in my language. It’s still too hard!
Having interviewed a lot of people, SQL is one of those things that everyone thinks they know well and very few actually do. In interviews now, if I see them claiming expertise, I ask if they can do something more than “SELECT * FROM foo” because so often that’s all it takes to “know SQL” on your resume.
Good database knowledge can be extremely valuable. I can’t tell you how many times I’ve seen code that makes four or five round-trips to the DB, grabbing data, processing it client-side, and sending it back that could’ve been done in one round-trip with a well-crafted SQL query.
There’s the other side too: I’ve seen some truly awful database designs that have required multiple round trips because the DB designer didn’t understand foreign keys or constraints or whatever.
Imo one of the best arguments against relying on ORMs is performance: in some situations a CTE or window function can get you one or two orders of magnitude improvement over a naive orm solution.
Nothing prevents you from implementing those edge cases in your ORM, though. I personally use SQLAlchemy, and I feel like it covers what I need as-is 90% of the time; the remaining 10% of the time it gives me the tools to build more complex queries, and even lets me keep using the “object mapping”. SQLAlchemy supports CTEs and window functions!
For even more complex queries, it might also be possible to wrap them in a SQL function and simply map your ORM to that function.
Oh, to clarify, I think ORMs are great, I just don’t think it’s great to rely on them. They do 90% of the things for you, but you need to know SQL for the other 10% of cases. Not every tool is like that; few underlying skills pay off as significantly as knowing SQL does in this case.
This is very true. It’s also helpful to know (in rough terms) what the ORM is doing under the hood, something that’s only really possible if you understand the SQL behind the operations.
Yep, the Peewee ORM also supports things like CTEs, window functions, insert/on conflict, etc. The query builders for both libraries are very flexible, but you pretty much have to know SQL and then translate it into the APIs the query builder exposes. For experienced programmers this is no problem, but someone less fluent with SQL is going to have no idea how to use this advanced functionality, regardless of how it is exposed.
Definitely! My point was mostly about ORM and “advanced”/performant SQL not being mutually exclusive.
I like ORMs and query builders not because I don’t know SQL, but rather because I detest the SQL syntax. I wish there were a better relational language. Maybe Tutorial D in front of Postgres’s storage engine.
Depends on how you interview me.
If you sit me down with a pen and paper and tell me to write a complex query…. I might well fail.
If you sit me down with say sqlitebrowser… and an editor side by side, I will rapidly iterate starting from very simple queries (yes, I will start with select * from foo) to as complex as need be.
And yes, I might occasionally refer to the syntax for the gnarlier corners.
But I will get it done pretty rapidly in a single very well crafted query.
Conversely, I’m a true believer in what CJ Date has been saying for decades…. so if you allow me to redesign the table definitions…. you’d be shocked by how simple my queries will be.
What would you consider beyond “SELECT * FROM foo”? I don’t touch SQL on a daily basis, but I could throw together a left, right, and full join if I needed to, and I’m aware of how to use subqueries. What SQL skills would someone have in order for you to consider them competent / good at SQL?
“SELECT * FROM foo”
JOINs, GROUP BY, that sort of thing. If they’re going to be building DBs, understanding indexes, unique constraints, and foreign keys.
If you’re gonna be my lead database programmer/DB admin/DB architect, I’d want to see understanding of good database design (normal forms, good foreign key relationships, good key choices), CTEs, triggers (in your preferred database), being able to spot when a full table scan might happen, understanding when and how to use precomputed values for future queries, and so on.
The use of aggregate functions, defining new aggregate functions, the various joins, the 3VL truth table, grouping, subselects, CTEs, that kind of thing.
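Since the 3VL truth table mentioned above trips up a lot of candidates, here’s a tiny demonstration of SQL’s three-valued logic, run through sqlite3 (where SQL NULL comes back as Python’s None):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# In SQL's three-valued logic, comparisons with NULL yield NULL ("unknown"),
# not TRUE or FALSE. Only IS NULL / IS NOT NULL test for NULL reliably.
eq_null = conn.execute("SELECT NULL = NULL").fetchone()
one_eq_null = conn.execute("SELECT 1 = NULL").fetchone()
is_null = conn.execute("SELECT NULL IS NULL").fetchone()

print(eq_null, one_eq_null, is_null)  # (None,) (None,) (1,)
# A WHERE clause keeps only rows where the condition is TRUE, so
# "WHERE col = NULL" matches nothing; "WHERE col IS NULL" is the correct test.
```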
I like asking candidates
If you find a candidate who thought about equal values (RANK / DENSE_RANK / ROW_NUMBER), you know they did some work. Hire them.
Haven’t touched joins yet.
I don’t know window function, but believe that I know join
It’s well worth the time learning window functions.
As Markus Winand (who created use-the-index-luke.com and the newer modern-sql.com) says in his talk on Modern SQL, there’s life before window functions and life after window functions.
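As a small taste of life after window functions, here’s a running total computed in a single pass, shown with sqlite3 on toy data (table and column names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (day INTEGER, amount INTEGER);
INSERT INTO sales VALUES (1, 10), (2, 20), (3, 5);
""")

# SUM(...) OVER (ORDER BY day): each row sees the sum of all rows up to
# and including itself, so no self-join or correlated subquery is needed.
rows = conn.execute("""
SELECT day, amount,
       SUM(amount) OVER (ORDER BY day) AS running_total
FROM sales
""").fetchall()

print(rows)  # [(1, 10, 10), (2, 20, 30), (3, 5, 35)]
```

Before window functions, the usual alternative was a correlated subquery that re-scanned the table for every row.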
I’d much prefer lateral joins be used in some of these cases.
I’ve seen no mention yet of EXPLAIN queries (or the equivalent in RDBMSs other than PostgreSQL?).
I never did much SQL in the past, but lately I had to work with some complex queries over large datasets, and most of my learning involved playing with EXPLAIN [ANALYZE], understanding the result, and tweaking the base query to fix the performance issue. Once you understand that, you can work from there to find the best index, where to place subqueries, set a large enough working memory, etc.
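Postgres’s EXPLAIN ANALYZE output doesn’t translate directly, but the same workflow can be sketched with SQLite’s EXPLAIN QUERY PLAN from Python (table and index names are made up): the plan detail changes from a full table scan to an index search once the index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
""")

# Without an index, the planner scans the whole table. Each plan row is
# (id, parent, notused, detail); the human-readable text is the last column.
before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = 1"
).fetchall()
print(before[0][3])  # e.g. 'SCAN events'

conn.execute("CREATE INDEX idx_events_user ON events (user_id)")

# After adding one, it can search the index instead.
after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = 1"
).fetchall()
print(after[0][3])  # e.g. 'SEARCH events USING INDEX idx_events_user (user_id=?)'
```

In Postgres you would iterate the same way with EXPLAIN (ANALYZE, BUFFERS), watching for Seq Scan turning into Index Scan.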
Just wanted to mention that there is an interesting bit about structural subtyping using mypy typing_extensions in the article.
Thank you for drawing attention to that section of your article. I have an instance of it whereby I have a variety of kinds of equipment that are rack mountable. In designing classes for this scenario you typically see some kind of abstract base class like physical-machine or rackable-computer or similar. I don’t like solving the problem that way, as those classes are not real in the same way a switch or a dom0 is. The equipment has compatible physical properties, more realistically expressed as an interface.
Seeing an example of accomplishing this in Python is a treat. I have not encountered Mypy before.
Mypy is a static type checker for Python. It’s still not as common as it should be, but it’s gaining traction.
The motivation for using a Protocol is static type checking. The protocol does not act as an interface or abstract base class as you might know it from other languages.
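For anyone curious what that looks like, here’s a minimal Protocol sketch loosely based on the rack-mountable equipment example above (all names are invented; typing.Protocol is in the stdlib since Python 3.8, and available from typing_extensions before that). mypy checks the structural match statically; nothing here inherits from a common base class:

```python
from typing import Protocol


class RackMountable(Protocol):
    # Any object with an integer rack_units attribute satisfies this
    # protocol structurally; no explicit subclassing is required.
    rack_units: int


class Switch:
    rack_units = 1


class Dom0Host:
    rack_units = 2


def total_rack_units(equipment: list[RackMountable]) -> int:
    # mypy verifies that every item structurally matches RackMountable.
    return sum(item.rack_units for item in equipment)


print(total_rack_units([Switch(), Dom0Host()]))  # 3
```

At runtime the Protocol does nothing here; it only guides the static checker, which is exactly the point being made above.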
We also started with abstract base classes but we quickly realized they are not a good fit for Django models (as described in the article, they cause some problems with migrations).
If you like the concept, take a look at Go.