1. 34

  2. 3

    I recently implemented a system for GDPR compliance (especially data retention and the right to erasure) on top of Django’s ORM. It enforces the kind of thing described in this article, and it turned out really nicely. Being able to iterate over all your tables and fields, and require that a policy is defined for every one of them, is really not hard with Django’s _meta API.
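
    A minimal sketch of the "require a policy for everything" check, not the author’s data_retention.py. To stay runnable without Django, the schema is a plain list of (model label, field names) pairs; in a Django project something like `[(m._meta.label, [f.name for f in m._meta.get_fields()]) for m in apps.get_models()]` could produce that list. The policy dict shape and the "keep"/"erase" values are assumptions.

```python
from typing import Iterable

# Hypothetical policy shape: model label -> field name -> action.
Policy = dict[str, dict[str, str]]


def find_unpoliced_fields(schema: Iterable[tuple[str, list[str]]], policy: Policy) -> list[str]:
    """Return 'model.field' for every field that has no retention policy entry."""
    missing = []
    for model, fields in schema:
        model_policy = policy.get(model, {})
        for field in fields:
            if field not in model_policy:
                missing.append(f"{model}.{field}")
    return missing


# Example: the schema has a field the policy forgot about.
SCHEMA = [
    ("accounts.User", ["id", "email", "last_login"]),
    ("shop.Order", ["id", "total"]),
]
POLICY: Policy = {
    "accounts.User": {"id": "keep", "email": "erase", "last_login": "erase"},
    "shop.Order": {"id": "keep"},
}
```

    Run as a test or a startup check, `find_unpoliced_fields(SCHEMA, POLICY)` must come back empty; here it reports `shop.Order.total`, so adding a column without deciding its retention fails loudly.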

    I also turned the data retention policy configuration into a human- and machine-readable document, data_retention.yaml, which worked well for a system of this size (a small charity). The implementation (both applying the policy and checking that the policy is exhaustive) is in data_retention.py for anyone interested. This approach also required essentially zero changes to the main application code.

    Of course, there are still holes: I can’t enforce that all future code will store data only in the main database. But mostly people will, because it is the easiest path.

    Another nice feature: even though the details are specified in YAML rather than Python or SQL, running the data retention purge commands generates at most one database query per table (nothing, a DELETE, or an UPDATE), thanks to the ORM’s ability to build up complex queries dynamically.
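
    The "zero or one query per table" property can be sketched without Django by letting the standard-library sqlite3 module stand in for the ORM. The policy dict below is a hypothetical stand-in for what a YAML loader might return from a data_retention.yaml-style file; the table names, column names, and the `retain_days`/`action`/`personal_columns` keys are all made up for illustration.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical policy, shaped like a parsed data_retention.yaml.
POLICY = {
    "newsletter_signup": {"retain_days": 30, "action": "delete"},
    "donation": {
        "retain_days": 365,
        "action": "anonymize",
        "personal_columns": ["donor_name", "donor_email"],
    },
    "audit_note": {"retain_days": None},  # kept indefinitely: no query at all
}


def purge_statement(table, rule, now):
    """Return (sql, params) for at most one statement, or None for no query.

    Identifiers come from the trusted policy file, so they are quoted but not
    bound as parameters (SQLite cannot bind identifiers); the cutoff is bound.
    """
    if rule.get("retain_days") is None:
        return None
    cutoff = (now - timedelta(days=rule["retain_days"])).isoformat()
    if rule["action"] == "delete":
        return f'DELETE FROM "{table}" WHERE created_at < ?', (cutoff,)
    if rule["action"] == "anonymize":
        assignments = ", ".join(f'"{c}" = NULL' for c in rule["personal_columns"])
        return f'UPDATE "{table}" SET {assignments} WHERE created_at < ?', (cutoff,)
    raise ValueError(f"unknown action {rule['action']!r} for table {table!r}")


def run_purge(conn, policy, now):
    """Apply the whole policy and return how many statements were issued."""
    issued = 0
    for table, rule in policy.items():
        statement = purge_statement(table, rule, now)
        if statement is not None:
            sql, params = statement
            conn.execute(sql, params)
            issued += 1
    conn.commit()
    return issued
```

    With three tables in the policy, `run_purge` issues two statements: one DELETE, one UPDATE, and nothing for the table kept indefinitely. With the ORM the same shape would be built from querysets instead of SQL strings.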

    1. 3

      But another sort of test is possible. Consider a test that asserts that every type of component in your app supports dark mode, or a test that iterates through the schema of every data field your app writes to any database, and asserts that it has an appropriate GDPR annotation. These tests aren’t “unit” or “integration”. I’m not sure what they are. Are they “system tests”? Whatever their name is, this technique is not as common as perhaps it should be. There aren’t a million blog posts written about this sort of test.
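
      A minimal sketch of the "every component supports dark mode" kind of test, assuming a hypothetical `Component` base class whose subclasses declare a `themes` set. The check walks the whole class hierarchy via `__subclasses__()` rather than testing any one component.

```python
class Component:
    """Hypothetical UI base class; subclasses declare the themes they support."""
    themes: frozenset = frozenset()


class Button(Component):
    themes = frozenset({"light", "dark"})


class Modal(Component):
    themes = frozenset({"light", "dark"})


class ConfirmModal(Modal):  # inherits Modal's themes
    pass


class LegacyBanner(Component):
    themes = frozenset({"light"})


def all_subclasses(cls):
    """Yield every direct and indirect subclass of cls."""
    for sub in cls.__subclasses__():
        yield sub
        yield from all_subclasses(sub)


def components_missing_theme(theme, base=Component):
    """Sorted names of every component class that does not declare `theme`."""
    return sorted(c.__name__ for c in all_subclasses(base) if theme not in c.themes)
```

      In a real suite the test body would be `assert components_missing_theme("dark") == []`, which `LegacyBanner` fails until someone adds dark mode to it. One caveat: `__subclasses__()` only sees classes whose modules have already been imported, so the test has to import every component module first.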

      Challenge Accepted

      1. 1

        A million posts is a tall order.

      2. 3

        “Add layers of indirection to enforce harder constraints”

        “Try to test and lint for any constraint you add to your code”

        I’m in favor of linting checks. Less so in favor of adding a layer of abstraction to enforce compliance. Layers of abstraction are not in vogue right now.
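
        A small example of a custom lint, written with Python’s standard `ast` module: it flags calls to a deny-list of functions. The names `raw_sql` and `cache_set` are hypothetical stand-ins for anything that would store data outside the main database, where a retention policy can’t see it.

```python
import ast

# Hypothetical deny-list of calls that bypass the ORM.
FORBIDDEN_CALLS = {"raw_sql", "cache_set"}


def called_name(call: ast.Call):
    """Best-effort name of the callee: f(...) -> 'f', obj.f(...) -> 'f'."""
    if isinstance(call.func, ast.Name):
        return call.func.id
    if isinstance(call.func, ast.Attribute):
        return call.func.attr
    return None


def lint_forbidden_calls(source: str, filename: str = "<string>") -> list[str]:
    """Report every call to a deny-listed name, sorted by line number."""
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, ast.Call) and called_name(node) in FORBIDDEN_CALLS:
            hits.append((node.lineno, called_name(node)))
    return [f"{filename}:{line}: call to {name}() bypasses the ORM" for line, name in sorted(hits)]
```

        Like any lint of this kind it is imperfect: an alias such as `store = cache_set` followed by `store(...)` slips straight past it.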

        1. 2

          I think the difficulty is the imperfection of lints. Sometimes this is Basically Fine (enforcing some design pattern or the like), but in other places it could be Very Bad (confirming that you are properly sanitizing user input).

          In an ideal world, type systems would be more usable for the “annotate data with lint information” use case. The only two systems I’ve found that come close are TypeScript and Clojure’s metadata; everything else is way too heavy. The state of the art in ADTs is very much a local maximum that I think we need to get out of if we want to make improvements here.
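
          For comparison, Python’s `typing.Annotated` (3.9+) is another lightweight way to hang lint-style metadata on a type without changing the type itself; this swaps in Python for the TypeScript and Clojure examples above. The `Personal` marker and `Donor` model are hypothetical.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Personal:
    """Hypothetical marker: this field holds personal data."""
    retention: str  # e.g. "erase-on-request"


@dataclass
class Donor:
    id: int
    name: Annotated[str, Personal("erase-on-request")]
    email: Annotated[str, Personal("erase-on-request")]
    amount_pence: int


def personal_fields(cls) -> dict[str, Personal]:
    """Map field name -> Personal marker for every annotated field of cls."""
    found = {}
    for field_name, hint in get_type_hints(cls, include_extras=True).items():
        if get_origin(hint) is Annotated:
            # get_args returns (underlying type, *metadata)
            for meta in get_args(hint)[1:]:
                if isinstance(meta, Personal):
                    found[field_name] = meta
    return found
```

          At runtime `name` is still just a `str`; the marker is only visible to code that asks for it, which is what makes it cheap enough for a test or lint to consume.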

          1. 2

            It’s not just that they’re not in vogue; it also becomes quite easy to forget to add constraints when you move the constraint out of the lexical scope the coder is working in, especially in a loosely typed situation where, as a coder, you can’t tell one way or another whether the constraint has already been covered.

            If you’re writing strong, full-featured types that bolt right into the programming language, whatever that language is, then it doesn’t matter. But far too often a coder will set up an abstraction, add a bit to it, then at some later time add loosely related constraints all over the place. In that scenario, in a large codebase you can’t tell what the hell has been fixed and what hasn’t.
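
            A sketch of moving a constraint out of the lexical scope and into the value’s type, using the sanitization case from earlier in the thread. `SafeHtml`, `escape_user_input`, and `render_comment` are hypothetical names; `html.escape` is the standard-library function.

```python
import html


class SafeHtml(str):
    """A str subtype meaning 'already escaped'; build it via escape_user_input."""


def escape_user_input(raw: str) -> SafeHtml:
    return SafeHtml(html.escape(raw))


def render_comment(body: SafeHtml) -> str:
    # The constraint travels with the value: the caller cannot tell at a glance
    # whether sanitizing happened upstream, but this check can.
    if not isinstance(body, SafeHtml):
        raise TypeError("render_comment needs SafeHtml; call escape_user_input first")
    return f"<p>{body}</p>"
```

            Python can’t stop someone calling `SafeHtml(raw)` directly, so this is weaker than a type the language itself enforces, but it does answer "has this been covered?" without reading the whole call path. Concatenating a `SafeHtml` with a plain `str` yields a plain `str`, which fails closed.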

            “Be consistent with constraints” is a much truer statement than many might be willing to acknowledge. It applies both to your language constructs and to your coding practices.

            1. 4

              In E and relatives, programmers can produce “guards”, which are regular objects that can transform other objects. For example, in:

              def x :Int := 42

              Int is a guard. In this case, it has simple behavior: Either accept an integer, or throw an exception. But guards can do arbitrary things. This gives programmers the ability to attach arbitrary constraints to their code at binding sites for names, and constrain the values to which those names can refer.
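
              Python has no hook at binding sites for local names, so this rough analogue swaps in a descriptor, which runs a guard every time an attribute is bound. `Guard`, `IntGuard`, `TrimmedGuard`, `Guarded`, and `Account` are hypothetical names, not E’s API.

```python
class Guard:
    """Rough Python analogue of an E guard: coerce a value or raise."""
    def coerce(self, value):
        raise NotImplementedError


class IntGuard(Guard):
    """Simple guard: accept an int (but not bool), otherwise raise."""
    def coerce(self, value):
        if isinstance(value, int) and not isinstance(value, bool):
            return value
        raise TypeError(f"expected int, got {type(value).__name__}")


class TrimmedGuard(Guard):
    """Transforming guard: accept any str, but bind the stripped version."""
    def coerce(self, value):
        if not isinstance(value, str):
            raise TypeError(f"expected str, got {type(value).__name__}")
        return value.strip()


class Guarded:
    """Descriptor that runs a guard at every binding of an attribute."""
    def __init__(self, guard: Guard):
        self.guard = guard

    def __set_name__(self, owner, name):
        self.storage = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.storage)

    def __set__(self, obj, value):
        setattr(obj, self.storage, self.guard.coerce(value))


class Account:
    holder = Guarded(TrimmedGuard())
    balance = Guarded(IntGuard())

    def __init__(self, holder, balance):
        self.holder = holder
        self.balance = balance
```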

              1. 2

                This is where the aspect-oriented programming guys were going. I haven’t kept up with them, though.

                The Executable UML guys were also headed this way. I haven’t kept up with that work either, sadly.

              2. 2

                How else would you propose to implement GDPR-aware and activity-logging constraints, other than with abstractions that raise “method not implemented” at compile or run time? What linter rule would you write that could analyze the complete code path involving potentially multiple database transactions, and guarantee that some of those transactions write to the activity log and that those that do write everything?

                With an abstraction, that’s easy. If the body of Activity.execute changes, so must the body of Activity.append_to_history.
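
                A minimal sketch of such an abstraction using Python’s `abc` module: the method names come from the comment, while `run`, `RenameDonor`, and the list-based history are hypothetical. A subclass that forgets `append_to_history` fails at instantiation with a `TypeError`, which is the run-time flavour of "method not implemented".

```python
from abc import ABC, abstractmethod


class Activity(ABC):
    """Every activity must say how it executes and how it records itself."""

    def run(self, history: list):
        # Template method: callers use run(), so logging can't be skipped.
        result = self.execute()
        self.append_to_history(history, result)
        return result

    @abstractmethod
    def execute(self): ...

    @abstractmethod
    def append_to_history(self, history: list, result) -> None: ...


class RenameDonor(Activity):
    def __init__(self, donor: dict, new_name: str):
        self.donor, self.new_name = donor, new_name

    def execute(self):
        old = self.donor["name"]
        self.donor["name"] = self.new_name
        return old

    def append_to_history(self, history, result):
        history.append(("rename_donor", result, self.new_name))


class Forgetful(Activity):  # no append_to_history: cannot be instantiated
    def execute(self):
        return None
```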

                If the history method needs no changes, the commit author can justify that in the commit message. Google has tooling that suppresses lints if certain fields are present in the commit message. But even without such tooling, you only need a modicum of diligence from code reviewers to ask “why can you ignore this lint? could you update your commit message?” should it pop up on a merge request.
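
                The "if execute changes, so must append_to_history" rule can be sketched as a diff-time check with the standard `ast` module. Comparing `ast.dump` output ignores comments, whitespace, and line numbers. The commit-message trailer `History-Unchanged:` is a made-up convention, not Google’s tooling.

```python
import ast

OVERRIDE_TAG = "History-Unchanged:"  # hypothetical commit-message trailer


def method_dumps(source: str, class_name: str) -> dict[str, str]:
    """Map method name -> normalized AST dump for one class."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return {
                item.name: ast.dump(item)
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            }
    return {}


def check_paired_change(old_src, new_src, commit_message, class_name="Activity",
                        primary="execute", paired="append_to_history"):
    """Warn when `primary` changed but `paired` did not, unless justified."""
    old = method_dumps(old_src, class_name)
    new = method_dumps(new_src, class_name)
    primary_changed = old.get(primary) != new.get(primary)
    paired_changed = old.get(paired) != new.get(paired)
    justified = any(line.strip().startswith(OVERRIDE_TAG)
                    for line in commit_message.splitlines())
    if primary_changed and not paired_changed and not justified:
        return [f"{class_name}.{primary} changed but {class_name}.{paired} did not; "
                f"update it or add '{OVERRIDE_TAG} <reason>' to the commit message"]
    return []
```

                A reviewer then only has to judge whether the stated reason holds up.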

            2. 2

              This kind of reminds me of a podcast I listened to with the creator of WireGuard: certain features, such as offering administrators lots of tuning knobs and allocating memory dynamically, have led to vulnerabilities in other VPN software. So they designed those aspects out of WireGuard entirely. It’s highly opinionated, and it allocates all of the memory it needs up front, which sidesteps those kinds of vulnerabilities.
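
              To illustrate the design idea described above (not WireGuard’s actual code), here is a hypothetical fixed-size buffer pool: every buffer is allocated at construction, and when the pool runs dry the caller gets `None` instead of the pool growing.

```python
class FixedBufferPool:
    """Hypothetical sketch: allocate every buffer up front and never grow."""

    def __init__(self, count: int, size: int):
        self.size = size
        self._free = [bytearray(size) for _ in range(count)]

    def acquire(self):
        # Exhaustion is an explicit, bounded outcome rather than a new allocation.
        return self._free.pop() if self._free else None

    def release(self, buf: bytearray) -> None:
        if len(buf) != self.size:
            raise ValueError("buffer does not belong to this pool")
        buf[:] = bytes(self.size)  # zero the contents before reuse
        self._free.append(buf)
```

              The only knobs are `count` and `size`, fixed at startup; memory use is known in advance and there is no growth path to get wrong.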