1. 5

    Getting back in to the structural editor game and working on OCell. I’ve been mulling this kind of thing over for years, and I still think there’s a lot of potential.

    1. 1

      This makes sense conventionally; if you make a small part better consistently, the whole will get better in the end. In this post, I will challenge that.

      Don’t bother challenging that. The accumulation of improvements is not worse than the unimproved.

      1. 4

        I think what they’re actually trying to get at is that local improvements can be global detriments.

        See also: microservices.

        1. 1

          It’s true, as is the converse. Which is why the prospect of deliberately not cleaning up after ourselves – globally or locally – is a silly one. Big, old, complex code bases are difficult to maintain in the best of worlds and refactoring them only in large swaths is economically infeasible for what I assume is the majority of teams.

      1. 1

        http://kevinmahoney.co.uk - mostly database stuff and an extension of my frustrations trying to create a sane database at a large ‘trendy’ company.

        1. 16

          This isn’t unique to the developer world, but it certainly makes writing technical blog posts exhausting! It is so easy to get caught up trying to tie down every lose end in your writing to avoid misinterpretation. It’s probably better just to ignore those kind of responses.

          1. 12

            Also: people don’t actually read what you’ve written, but will comment anyway. So if you’re not super careful with your title and first paragraph or two, you’ll keep receiving “criticism” that’s actually already addressed in the article itself, or they will take one thing you’ve said and argue heavily against that, while actually later on more nuance is added.

            The worst was when I wrote “Why is no one signing their emails?”; the article is about institutions like Amazon, your bank, etc. not signing their emails so it’s easier to identify phishing and such. Many parts about PGP/GPG are quite hard, but verifying signatures is quite easy. Of course, many people responded to the title only with commentary about how hard it is to use PGP to sign stuff. Okay … don’t disagree, but … that wasn’t what the article is about 🤷‍♂️

            As a result of all of this, I’ve become a much more careful reader before commenting myself, or simply abstaining from commenting 😅

          1. 2

            P.S. apologies for submitting this and then taking out a big chunk of the content afterwards. On reflection I thought the article would be stronger after taking out most of the opinion. The change is viewable here.

            1. 17

              I love this approach and have been using it exclusively for all my web apps for 5 years now.

              See https://sivers.org/pg for my thoughts on it

              And https://github.com/sivers/store for an example of it in action.

              It’s really nice to keep all of your data/business logic together with the data, so that your code outside of it can be just the controller and view.

              1. 6

                Your article goes much further than mine. I am really only arguing for some basic consideration and organisation of the database!

                I am totally on board with your approach, but I think it would be difficult to persuade my colleagues.

                1. 2

                  I actually liked your post more than OP, it’s a bit more nuts and bolts.

                  One thing I always struggled with a bit is version control of db functions, because small changes in a function would require a migration. Of course you could also have a sort of “load all functions” script you run every time you start the application (there’s no need to “migrate” because there’s no data that needs to be transformed). But then you’d also need some way to drop old functions you’ve deleted, too.

                  Regarding db views, a colleague used materialized views in a project but it caused nothing but trouble, with having to refresh them all the time and the definitions drifting against the db schema version. I’m not sure if that was just his inexperience with them or our framework not supporting them well enough, or what.

                  I’ve been pondering how to deal with a certain big query embedded in the programming language in a project I’m working on. Maybe I’ll experiment with ways of putting that into the database, instead.

                  1. 8

                    This is what I’ve just started doing and love it so far:

                    1. Put your tables in a separate schema from your functions.
                    2. So, one schema (“a”) is all tables with data, and another one (“z”) is all functions.
                    3. Whenever you make any changes to the functions, it’s simple to drop schema z cascade; create schema z;
                    4. … then load all functions again. Takes under a second. Data untouched.
                    1. 2

                      I take it that you have your functions in a separate file that you then import into schema z with the LOAD function?

                      1. 3

                        See https://github.com/sivers/store/tree/master/store for a real example. I like keeping each function in its own file named with the name of the function, for easy finding.

                        Then make.rb combines the views, helper functions, triggers, and api functions in that order into one “schema.sql” file:

                        https://github.com/sivers/store/blob/master/store/make.rb

                      2. 1

                        Nice!

                    2. 2

                      What’s your approach to testing in the database?

                      1. 4

                        I make some fixtures in the database. Sample data. Ideally something kinda close to real-world usage so I can play scenarios.

                        (“Veruca Salt wants her order now. Charlie Buckets is patiently waiting. The manager, Willy Wonka, can approve the orders…”)

                        See https://github.com/sivers/store/tree/master/store for an example.

                        The fixtures: https://github.com/sivers/store/blob/master/store/fixtures.sql

                        Then unit tests written against that data: https://github.com/sivers/store/blob/master/store/test-db.rb?ts=2 https://github.com/sivers/store/blob/master/store/test-api.rb?ts=2

                        (Those are in Ruby but you can use whatever.)

                        If you save all your schema.sql and fixtures.sql in files that are quick to re-load, and use a test database, then you can actually drop and rebuild the entire schema of fixtures/data inbetween each unit test.

                    1. 2

                      What is your favorite database embedded language? I am quite displeased at syntax of pl/sql in every dbms. What’s your opinion on pl/python? Is it easy to debug and version pl/* scripts?

                      1. 5

                        I’ve been using a lot of Postges with PL/* in the past. Depending on the exact implementation of both the extension and how the language works it can be cumbersome and inefficient to use non-PL/SQL (or PL/pgSQL).

                        For this reason I would start out with PL/pgSQL. For simple functions it will be okay, for more complex functions you really should read up on limitations of the particular language. Be extremely careful. With a database like PostgreSQL one is used to extreme reliability, data and access to data being very safe. This of course changes when you are interfacing a complete programming language.

                        I agree, pl/sql is cumbersome, but overall it is still a good starting point until you really have a reason to switch. It’s also good advise to not just switch away, because you feel more confident in another language, because you will usually end using a very basic subset and a lot of the expierence that counts is on handling the interface to the database in a secure (as in not messing up data) and efficient (so you don’t block too long) way.

                        Depending on what exactly you do this for and especially if you want to dig deeper it can make sense to look into creating own extensions.

                        Regarding debugging: From my experience things don’t get easier if you switch the language, again because a lot will be around interfacing with the language. Compared to normal operation (plain SQL) it’s easy to completely lock up your database, especially with bigger or not as widely used pl/* languages.

                        One last thing worth mentioning: Do have a look at what a database like PostgreSQL offers you on the SQL side. I’ve seen so many cases of where people both add complexity, sacrificing performance and data integrity because of only relying on ORMs and the most basic queries that work well with them. As soon as things become a bit more complex it’s really handy to know what your DBMS offers you. Views,, WITH queries (Common Table Expressions), dealing with and indexing JSON, virtual and computed columns can really make your live easier, without making a lot of these sacrifices. Sometimes these are better options than moving computations out of the database or creating complex functions or adding columns that are actually redundant as a workaround.

                        1. 4

                          If you’re asking me as the author of the piece, unfortunately I haven’t had the opportunity to dive in to embedded database languages. The idea that the database is a dumb store is quite deeply embedded in the companies I have worked for. Part of the reason I wrote this was to sway people.

                          Looks like sivers is the person you want to ask! :)

                          1. 3

                            I basically agree with @reezer, I’ve used PL/Python quite extensively, for over a decade, and in most respects it’s AWESOME to have python run against queries. But you have to be careful, putting python(or any language there) as it gives you total access to the language and the database, so while it’s totally possible to have a table query run out to python and then have python go call some random HTTPS service, and write a bunch of data out to disk, or call some other code somewhere all willy nilly, if you do those things, it will probably end up becoming very very painful. Just because you can do something doesn’t mean you should.

                            We haven’t had any issues with debugging really. You can test outside of the DB for all of the logic portions(using your normal testing practices), and then it’s just the interface, which is quite stable, so once you understand it, it’s pretty hard to screw up.

                            PL/PGSQL is definitely a great choice, as it doesn’t include the entire kitchen sink, so it’s harder to screw up. That said, most all of our stuff is PL/Python, since our main product using the DB is written in Python, it’s just easier to keep it all together in python. Overall it’s been great.

                            Definitely use a schema version control system, for us liquibase, but there are many other solutions out there. Liquibase has been fine for us, with no reason to move off of it in over a decade.

                          1. 1

                            This is super neat! I’ve been writing a lot of T-SQL recently and though I appreciate the database aspect, I feel like the abstraction powers of the language leave a lot to be desired. Composition and reusability has been my greatest desire, but I’ve also wondered what the best way to make SQL testable might be. It doesn’t feel like testing is something that can be done inside the system very easily/idiomatically. Rather, you need some sort of runner to handle it.

                            Some level of parameterization would be nice too. I had to write a lot of SQL that was using introspection and string concatenation, which is pretty janky. (Potentially a major security vulnerability too!) Being able to handle these operations in a safe way would be lovely. (Thinking of hygienic macros vs c preprocessor.)

                            On the other hand, I have also wondered whether a really expressive database language is actually a trap. If it’s expressive then you might be inclined to do more processing in the database, which could lead to worse performance.

                            Any thoughts on dumb vs smart data stores?


                            Noticed that this is written in Haskell. Have been wondering what a relational database with sum types might look like. Not really applicable to this since it is a transpired language, but would be a nice feature when working with such languages.

                            This blog post handles some patterns for working with sum types in SQL:

                            https://www.parsonsmatt.org/2019/03/19/sum_types_in_sql.html

                            Does anyone know of any database systems that work with sum types natively?


                            Apologies for rambling! Typing this out on my phone while waiting to eat. Girlfriend just got home so little time to revise!

                            1. 2

                              I’ve worked quite a bit with T-SQL in the past and I found that using the WITH statements and views to wrap the more dirty things that are occasionally needed for performance, the majority of queries can be kept short and easy to understand.

                              For testing, I had two different principles I used.

                              The first was for verifying that the query worked as expected. I used a WITH statement, wrapping the whole query and supply the data needed.

                              This doesn’t test for performance. I did that by creating a test instance of the database and doing tests in transactions that were never committed.

                              1. 1

                                Some level of parameterization would be nice too.

                                I think you might mean something different, but you can pass parameters in to the queries. Search for ‘export’ in the readme.

                                On the other hand, I have also wondered whether a really expressive database language is actually a trap.

                                I am very much on the pro-database side when it comes to writing apps. IMO performance issues are better solved by thinking about your data rather than arbitrarily putting a network connection between your data and your app, and in fact that can often make things worse (e.g. N+1 queries).

                                testing… sum types…

                                I am kicking the can down the road on these subjects :)

                              1. 3

                                you can’t join two tables containing the same field name without joining on those fields

                                Heek! Foreign keys are how to naturally join tables.

                                1. 3

                                  This restriction isn’t an intrinsic problem, I just haven’t gotten around to a solution yet. I implemented natural joins first because they were the most interesting to implement, not necessarily because they are the most useful kind of join. :)

                                1. 3

                                  In between work, I’ve been working on a small Haskell/Elm app: https://favpla.com

                                  I tried out Typescript/React and ReasonML but finally settled on Elm. I’m a speed freak so asset size is important to me. The Elm compiler generates very small assets!

                                  I wrote a small Python script to generate the Haskell/Elm data structures so I didn’t have to write JSON encoders/decoders. I wonder now if I could have used Servant and generated elm bindings from that.

                                  1. 3

                                    http://kevinmahoney.co.uk/articles/

                                    Mostly stuff about types and databases.

                                    1. 4

                                      One does not need ACID to solve the problem the author describes and ACID is likely overkill for the issue. There are other options such as causal consistency or RAMP transactions which are much light than ACID but still strong than eventual consistency.

                                      But the author is spot on in that transaction boundaries are a big problem that groups either ignore or don’t address in their design.

                                      Pet peeve though:

                                      but I think you will find that they all have issues, and none are simpler or easier than an ACID database

                                      and

                                      ACID transactions are usually simpler than eventual consistency or distributed transactions

                                      1. Simpler in what regard? Production ACID transactions are very complicated to implement, they require a lot of machinery. They also put a lot of constraints on the data. And they can fail in surprising ways. EC is extremely simple but it pushes complexity to the user. So a blanket “X is simpler than Y” statement here doesn’t make much sense IMO.
                                      2. ACID transactions can be distributed transactions to the second quote is a bit confusing.
                                      1. 2

                                        Thanks for the feedback. I have never heard of RAMP transactions before, so I’m definitely going to read up on that. Right now I don’t see how the problems can be solved without the A, C and I of ACID though.

                                        I could soften the wording a bit. What I’m actually rallying against is pushing the complexity of transactions to the application code, instead of handling it in the database.

                                        You’re right about #2. I’ll update the article.

                                        (changes here: https://github.com/KMahoney/kmahoney.github.io/commit/605ae0c11a73fcf7e1ffe717d08d8c803a6dd3be)

                                        1. 1

                                          Right now I don’t see how the problems can be solved without the A, C and I of ACID though.

                                          That doesn’t necessarily mean you need the rest of ACID, though. And the rest of ACID is prohibitive. Separate services owning related data, for example, is very difficult if-not impossible. With causal consistency you can spread data out over multiple services and as long as they offer the data semantics needed (and you’re ok spending the bit of extra cost to implement it) it’s not a problem.

                                          The key observation, IMO, for causal consistency or RAMP transactions is that the actual data in the DB doesn’t need to be consistent, you just need to be able to find a consistent version of it. So if you create/updated data in one place you need to either update it the other place or make sure you can get the state of data pre-update in both.

                                      1. 2

                                        I’m working on a WebGL RPG game, and I released the backend renderer: https://github.com/KMahoney/funscene

                                        Still early days!

                                        1. 4

                                          Absolutely wonderful article!

                                          The only nit I will pick is that the idiomatic C code for something where we care about failure is something like:

                                          error_t x(int *y, size_t n, int z, size_t* indexOut) {
                                            /* implementation elided */
                                          }
                                          

                                          Thus, we separate the error from the data returned. I think the overall points they make still stand, though.

                                          1. 3

                                            Thanks for the kind words.

                                            I’ve been out of the loop with C for a while. I’ll update the article! Is this a recent(ish) trend? I was aware some standard library functions use the -1 convention.

                                            1. 6

                                              I won’t presume to know how widespread it is, but it’s done exactly to fix the cases where a sentinel value is still theoretically within the range of the data.

                                              I dislike returning -1, for example, because if doing pointer arithmetic or something like that one can still mistakenly do math without the compiler complaining. An explicit error value and type being returned cannot be abused as readily, at least by accident.

                                              I especially dislike sentinel values in the case of, say, dedicated routines for handling integer operations and catching overflows and underflows–those are cases where, for example, (-1) might conceivably show up during normal use.

                                              In the case of finding indices of something, the index type should be size_t, because (somehow) max int there could still be a valid index. It’s paranoid, but it is what it is.

                                          1. 1

                                            I really like Nix, but I found the advantage with homebrew is that the packages are configured to integrate well with OS X. For example, I had problems with SSH keys when using the Nix version of git.

                                            Maybe I’ll give it another go to see if it’s improved.

                                            1. 16

                                              <rant>To be frank, as a backend engineer, the rotating door of JavaScript libraries is why it’s hard to take the community seriously. This is a complaint I see levied against backend engineers often and you frontenders do it to yourself. The choice of frameworks for backend things is large, sure, but mostly static. Python only has three or four serious choices and they have been around for awhile. Ruby has only a few as well. Etc. I have tried to learn frontend dev a few times and it’s just infuriating. By the time I’ve learned how to do what I want the in framework X.js the community has moved onto Y.js. It’s not even clear what the community is trying to optimize for! Is it speed? Download time? Simple APIs? I don’t know because the framework of the week is always so different from the previous. Next time you, a frontender, feel like backenders aren’t treating you like grown up professional devleopers, well damn straight, start acting like one, then.</rant>

                                              1. 7

                                                My reaction to this article was, “Good fucking lord, another one?”

                                                At this point, it’s nearly impossible to take frontend development seriously.

                                                1. 1

                                                  Well, Web frontend anyway. Native mobile dev lacks this problem.

                                                2. 5

                                                  To be frank, as a backend engineer, the rotating door of JavaScript libraries is why it’s hard to take the community seriously. This is a complaint I see levied against backend engineers often and you frontenders do it to yourself.

                                                  Translation: I don’t understand why the “frontend community” (whatever that is) seems to do things differently than I would. Therefore, they aren’t worth taking seriously.

                                                  The choice of frameworks for backend things is large, sure, but mostly static. Python only has three or four serious choices and they have been around for awhile. Ruby has only a few as well. Etc.

                                                  There are only a few “serious choices” for frameworks in the frontend world as well. Angular, Ember, and React are the big three, and have been for a while. Most others are “flavor of the month” and never achieve critical mass. They just make a lot of noise.

                                                  I have tried to learn frontend dev a few times and it’s just infuriating. By the time I’ve learned how to do what I want the in framework X.js the community has moved onto Y.js.

                                                  What do you mean by “moved onto?” Blog posts are a poor metric for assessing where the “community” is actually at.

                                                  It’s not even clear what the community is trying to optimize for! Is it speed? Download time? Simple APIs? I don’t know because the framework of the week is always so different from the previous.

                                                  Is it really surprising that largest programming language “community” in the world is composed of lots of different people with lots of different priorities, levels of experience, knowledge of different paradigms? And that the frameworks reflect those differences?

                                                  The kind of JS framework that will make a Java developer happy is different from the kind that will make a Lisp developer happy, which is different again from what a Ruby developer wants. In JS, we have all of these communities bumping up against each other and doing things their own way, then cross-pollinating ideas.

                                                  It’s inevitable that there will be a lot of churn among smaller frameworks. And it’s inevitable that authors (and those ideologically similar to the authors) will want to promote their framework of choice. Thus the persistent “X is outdated, use Z” blog posts. This is further exacerbated by the fact that many frontenders come from a design and marketing background and are quite aware of the power of marketing and simply shouting louder.

                                                  Next time you, a frontender, feel like backenders aren’t treating you like grown up professional devleopers, well damn straight, start acting like one, then.

                                                  I can quite easily believe that there are more unprofessional JS developers than there are in other languages. (It has a pretty low barrier to entry, after all.) However, that doesn’t mean you get to just label JavaScript developers your inferiors and bask in your awesomeness.

                                                  1. 3

                                                    Translation: I don’t understand why the “frontend community” (whatever that is) seems to do things differently than I would. Therefore, they aren’t worth taking seriously.

                                                    Actually, no. This is not my complaint at all. The rest of your response is all reasonable but this opener is completely missing the rant. It’s not that things are done differently than how I would do it, it’s that to an outsider the JS world is still solving problems via a Random Walk. I don’t like how the Java world solves a good bunch of problems, but there is method to the madness that I can at least understand.

                                                    1. 2

                                                      Perhaps I phrased my response poorly, but your clarification here is actually what I understood you to mean. On rereading, I also see that my response comes off as hostile. Sorry about that.

                                                      I intended my opener to emphasize the idea that there is not a single JS community in the same way that there is a Ruby community, a Clojure community, etc. And the seeming randomness that results from this misidentification of the “JS community” as a coherent entity is more apparent than real.

                                                  2. 2

                                                    I don’t think that’s fair. If you counted all the changes in every backend language, you’d see just as much chaos. It’s just that web frontenders only have one language to work with so every new tech gets put in the same bucket.

                                                    Plus, it looks like chaos, but over the long term you can see slowly converging best practices in the frontend world. Most new frameworks seem to agree that virtual-dom diffing and unidirectional data flow were a good idea for example.

                                                    1. 2

                                                      I don’t think that’s fair. If you counted all the changes in every backend language, you’d see just as much chaos. It’s just that web frontenders only have one language to work with so every new tech gets put in the same bucket.

                                                      This comparison isn’t valid. Being in different language means you have no choice but to rewrite things. Many of the frameworks across different languages are copying each other and reusing ideas.

                                                      Your point is roughly that: the number of solutions to a problem that will be created is constant and it’s just a question of how concentrated it is (one language vs many languages). That might be observably true but doens’t seem like a good reason to do something.

                                                      Plus, it looks like chaos, but over the long term you can see slowly converging best practices in the frontend world.

                                                      How many iterations does it take to get MVC right? Maybe the actual stats disagree with me but I would expect the rate at which these new frameworks come out to be decreasing but that doesn’t seem to be the case.

                                                      1. 2

                                                        How many iterations does it take to get MVC right?

                                                        Hard to tell, since there is not one universal “MVC” nor is there one universal “right way” to do it. Though we do seem to be moving away from “traditional” Rails-style MVC toward something more component-oriented.

                                                        I would expect the rate at which these new frameworks come out to be decreasing but that doesn’t seem to be the case.

                                                        My personal theory is that is because knowledge of front-end MV* is now dispersing more widely, leading more JS developers to try writing their own, just like jQuery clones were popular a couple of years ago.

                                                    2. 2

                                                      You have three or four serious python choices NOW, but that wasn’t always the case.

                                                      Also remember that JS has so many more people using it because they’re basically forced to. The JS community is the largest by default, so it has the most variation in libraries of people wanting to code like they did in whatever their favorite/previous language was.

                                                    1. 1

                                                      I’m working on a database written in Rust. I really like Rust. Sometimes the borrow checker stops you from doing something, and then you think about it and realise that’s it’s right - what you’re trying to do is unsafe. If i’d written it in C++ I would have made lots of mistakes.