1. 65

Abstract:

In France, income tax is computed from taxpayers’ individual returns, using an algorithm that is authored, designed and maintained by the French Public Finances Directorate (DGFiP). This algorithm relies on a legacy custom language and compiler originally designed in 1990, which unlike French wine, did not age well with time. Owing to the shortcomings of the input language and the technical limitations of the compiler, the algorithm is proving harder and harder to maintain, relying on ad-hoc behaviors and workarounds to implement the most recent changes in tax law. Competence loss and aging code also mean that the system does not benefit from any modern compiler techniques that would increase confidence in the implementation.

We overhaul this infrastructure and present Mlang, an open-source compiler toolchain whose goal is to replace the existing infrastructure. Mlang is based on a reverse-engineered formalization of the DGFiP’s system, and has been thoroughly validated against the private DGFiP test suite. As such, Mlang has a formal semantics; eliminates previous hand-written workarounds in C; compiles to modern languages (Python); and enables a variety of instrumentations, providing deep insights about the essence of French income tax computation. The DGFiP is now officially transitioning to Mlang for their production system.

  2. 9

    I would love to see a side-by-side with Catala Lang, which was posted here a while back, though only the Git repo: https://lobste.rs/s/b74svy/catalalang_catala

    1. 28

      Hi! Author of both Mlang and Catala here :) So Catala is basically an evolution/reboot of the M language, but this time done right using all the PL best practices.

      1. 7

        Wait, are you for real? That is absolutely fascinating! My wife is a lawyer (which makes me not a lawyer) and I am very interested in these types of intersections. Namely where a highly regimented and regulated domain gives rise to some type of formalism once exposed to CS through some “interdisciplinary process”.

        I have studied DSL design peripherally but would really like to pick your brain about some things. I did once, long ago, design a policy language. Are you open to additional discussions and collaboration?

        1. 11

          Ha ha ha yes this area is fascinating. I have the impression that there’s a lot of people in legaltech that are all trying to make a DSL to express parts of the law but have no clue about how to properly make a DSL.

          I am open to discussions and collaboration, moreover both Mlang and Catala are open-source and accept contributions. Hit me up using the email in the Mlang paper for instance :)

          1. 2

            As a lawyer designing my own DSL ;) I would love to know how using Mlang has affected legislation. For example, how do you deal with the law being changed? Does your parliament create updates as “diffs” or as already “merged” texts? Do you use lawxml? So many questions!

            1. 4

              French laws are usually written in terms of “diffs”. Also, I made a prototype that warns you when articles of law your program relies on are about to expire: https://twitter.com/DMerigoux/status/1252914283836473345?s=19. I don’t use any form of XML, I just copy-paste the law text to start writing a Catala program. XML would not improve the way Catala programs are written, since the XML structure does not follow the logical structure of the law but rather its formatting structure, which we don’t care about when translating it to executable code.
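A minimal sketch of the kind of expiry warning described above. The linked prototype is not reproduced here: the article names, the dates, the `expiring_articles` helper and the 90-day horizon are all hypothetical, chosen only to illustrate the idea of checking a program's legal dependencies against known expiry dates.

```python
from datetime import date, timedelta

# Hypothetical data: article identifier -> expiry date (None = no known expiry).
ARTICLE_EXPIRY = {
    "Article A": date(2021, 1, 1),
    "Article B": None,
    "Article C": date(2025, 6, 30),
}


def expiring_articles(dependencies, expiry, today, horizon_days=90):
    """Return, sorted, the articles in `dependencies` whose expiry date falls
    on or before `today + horizon_days` (already expired articles included)."""
    limit = today + timedelta(days=horizon_days)
    flagged = []
    for article in dependencies:
        end = expiry.get(article)
        if end is not None and end <= limit:
            flagged.append(article)
    return sorted(flagged)


# A program that relies on articles A and B, checked on 1 December 2020.
for name in expiring_articles(["Article A", "Article B"], ARTICLE_EXPIRY, date(2020, 12, 1)):
    print(f"warning: {name} expires on {ARTICLE_EXPIRY[name]}")
```

Keeping the dependency list next to the code is what makes such a check possible at all; how the real prototype obtains expiry dates is not stated in the thread.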

        2. 4

          Hi Denis - nothing constructive to say except that I am a British CS student and my friends and I are big fans of your work! In fact I think a friend of mine will be basing his undergraduate thesis on your ideas :-)

          1. 4

            Thanks Jack! Well, if your friend does end up basing his undergrad thesis on Catala or something else, please drop me an email, I’ll be happy to give feedback or suggest interesting things to look at.

          2. 2

            I want to just praise you for the time and effort you put into this space. I’ve recently got into “hobbyist” law myself, specifically Canadian law (http://len.falken.ink/law/101.txt), and instantly had the same thoughts: where are the formal proofs? :) Sure there are tax calculators, and some will creators, but are they rigorous? Can they tell us other properties of a situation?

            I’m 100% going to play with Catala. This is technology worth spending time on because law governs our everyday lives.

            1. 2

              “but this time done right using all the PL best practices.”

              Does this mean that DGFiP is migrating to something one of the implementers considers not done right?

              1. 4

                I suppose it’s easier to migrate step by step: Improve the tooling, so that everything can be in the open without security concerns and so the system can evolve more easily from its Apache cgi-bin roots. That’s what Mlang seems to offer.

                Once that’s in place, there can be further steps to improve the language (e.g. by introducing Catala) because the foundations are state of the art again. And even if that doesn’t happen, the system is still better off than before because it’s a single system instead of a single system + 25 years of wrappers that extend it ad-hoc.

                1. 3

                  I could not have said it better!

                2. 2

                  Migrating to Mlang improves the compiler but the M language stays the same. For instance, in DGFiP’s M, there are no user-defined functions. And the undefined value in M is a constant reminder of the “billion dollar mistake”. So yes, we can definitely improve the M language from its 1990 design :)
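To illustrate why an implicit undefined value earns the “billion dollar mistake” label, here is a minimal Python sketch. It does not reproduce M's actual rules: the `UNDEFINED` sentinel, the "undefined counts as zero" rule and both helper functions are assumptions made purely for illustration.

```python
from typing import Optional

# Hypothetical sentinel standing in for an implicit "undefined" value.
UNDEFINED = None


def add_implicit(a, b):
    """Implicit style: an undefined operand silently counts as 0.
    (Assumed rule for illustration, not taken from M's specification.)"""
    left = 0.0 if a is UNDEFINED else a
    right = 0.0 if b is UNDEFINED else b
    return left + right


def add_explicit(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Explicit style: a missing operand makes the result missing,
    so the caller is forced to decide what to do about it."""
    if a is None or b is None:
        return None
    return a + b


salary = 30000.0
bonus = UNDEFINED  # the taxpayer left this field blank

print(add_implicit(salary, bonus))  # the missing input is hidden in the total
print(add_explicit(salary, bonus))  # the missing input stays visible
```

In the implicit version nothing distinguishes "no bonus" from "bonus not filled in", which is the kind of ambiguity a redesigned language can rule out with an explicit optional type.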

                3. 1

                  I’m just curious: who is driving all this? Is this simply something you one day decided to go and implement, or were you approached by someone to do this seemingly huge project? How do you get it financed, did you have backing from the start?

                  Fascinating stuff!

                  1. 10

                    I started looking into this after watching this talk: https://youtu.be/EshxZVMURt4. I always wondered whether it was possible for me to play with formal methods outside the traditional application domains like security or safety-critical embedded systems. Then I fell into a rabbit hole :) I started with a Python prototype of French law encoded into SMT, then moved to try and use the DGFiP code and ended up coding Mlang, then created Catala as a next logical step. I created these in my spare time during my PhD and was helped by some friends who contributed to the open source repos. I’m only starting now to have institutional backing! During a French PhD, your funding is secured for the whole duration from the start, so I didn’t have to worry about it and could focus on other things. I would say stable and long-term unconditional funding enabled me to create all this. In my opinion research should promote that instead of the myriad of tiny little funding sources, each of them requiring a lot of paperwork to fill out. But in that regard I go against the zeitgeist.

              2. 3

                @denismerigoux A friend of mine pointed out that Poland bought something like Mlang (called Poltax) in the 1990s from BULL for 10 million francs, and currently they are trying to replace it with a new version (PoltaxPlus). Nice niche.

                1. 2

                  …I now have the urge to take the French tax code and try to make some kind of ML system (genetic algorithm maybe?) to optimize it for minimum possible taxes on cheese. I’ve been watching too many video game exploit videos lately, I suppose, and just want to produce the cheesiest possible world.

                  1. 1

                    I think my coworker Vincent worked on a previous version.