1. 18

Spotted while reviewing the FSF: recent licensing updates story on Lobste.rs.

  1. 15

    I have a story about this.

    A while ago I was interested in getting the statistical medcouple function into Python’s statsmodels. The problem is that this function is computed via a nontrivial but clever algorithm. It was described in an obscure paper from the 1970s that was really hard to read. The implementation in statsmodels uses a slow O(n^2) algorithm, whereas better O(n log n) implementations exist.

    So I found such an implementation in R, written by the authors of the medcouple paper. Now, R is GPLed. Statsmodels is GPL-phobic. I could have just translated the R implementation into Python, but it didn’t seem fair to me, because I really did not understand the medcouple implementation until I read and translated the R code. Since statsmodels won’t accept the GPL, they shouldn’t accept the code I wrote.

    My solution was to write the medcouple Wikipedia article in generic pseudocode (that looks suspiciously like Python). This is now the spec part of the clean-room reverse engineering process. I’m glad to see that some people have stumbled onto that page and used it to create new implementations of the algorithm. Now I’m just waiting for someone to use this page to fix statsmodels’ implementation.
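For readers unfamiliar with the statistic, here is a minimal sketch of the slow O(n^2) approach mentioned above: compute the kernel for every pair of points on either side of the median, then take the median of those values. It is written from the textbook definition of the medcouple, not taken from statsmodels or the R code; the name `naive_medcouple` and the helper `_sign` are hypothetical, and the tie-handling rule for points equal to the median is an assumption based on the usual definition.

```python
from statistics import median


def _sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def naive_medcouple(data):
    """Quadratic-time medcouple: the median of the kernel over all pairs.

    Illustrative sketch only. It builds all O(n^2) kernel values in
    memory, which is exactly what the O(n log n) algorithm avoids.
    """
    xs = sorted(data)
    m = median(xs)
    # Centre the data on the median and split into the two halves.
    # Points equal to the median belong to both halves.
    zplus = [x - m for x in xs if x >= m]
    zminus = [x - m for x in xs if x <= m]
    ties = sum(1 for x in xs if x == m)

    # Ordinary kernel: a >= 0 >= b, so a == b only when both are zero.
    values = [(a + b) / (a - b) for a in zplus for b in zminus if a != b]

    # Assumed special kernel for pairs where both points equal the median:
    # zero on the anti-diagonal, with +1 and -1 balanced on either side.
    values += [_sign(ties - 1 - i - j)
               for i in range(ties) for j in range(ties)]
    return median(values)


print(naive_medcouple([1, 2, 3, 4, 5]))  # symmetric sample
print(naive_medcouple([1, 2, 4, 8]))     # right-skewed sample
```

A symmetric sample should give zero, and a sample with a longer right tail should give a positive value. The faster algorithm gets the same median without materialising every pair.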

    1. 1

      Hold on - have you just told on yourself?

      I really didn’t understand the medcouple implementation until I read […] the R code.

      Isn’t this effectively creating a derived work in another language based upon the original GPLed code? Shouldn’t your derived work also be GPLed?

      1. 4

        It should be and it is:


        But I also wrote a spec, the Wikipedia article. I described the algorithm in as much detail as I could. The spec should be enough for someone else to reimplement this.

        1. 3

          I don’t understand your reasoning. Why do you consider your Python code to be a derivative work, but you don’t consider the Wikipedia pseudocode you wrote to be a derivative work (and therefore GPL and not Creative Commons)? If your Python code is a derivative work, why does the copyright notice only have your name?

          1. 3

            The Wikipedia article is a description of the algorithm that I cobbled together from various sources, which I amply cited. At no point did I just grab the R code and translate it for Wikipedia. I wrote the pseudocode based on my understanding of the algorithm as described by the papers I read and cited. I did do separate “literal” translations into Python and C++, and those I do consider derivative works of the original, which is why I GPLed them.

            As to why my copyright notices don’t mention the original copyright holders, I’m not sure if that’s necessary. Am I required to keep their names in order to satisfy my GPL obligations?

    2. 2

      I don’t quite know how this applies to freedom one, which includes the freedom to study the program, but I’ve been told by some people that they’re afraid to look at my GPL’ed code in case they ever want to implement something similar in any way.

      I wouldn’t figure that studying how someone did something and then just doing it differently if need be would be an issue, but I don’t quite know the legal status of this.

      1. 2

        Did I read this right? The GPL says reimplementing a GPL program in a different language forces the reimplementation to be GPL? Isn’t that effectively saying that algorithms are copyrightable?

        1. 20

          No. “Under copyright law, translation of a work is considered a kind of modification”. (This is what the law says, not the GPL itself.)

          According to copyright law, creativity is required to make a work unique, novel and copyrightable. A machine translation is hardly creative.

          A completely new implementation can be easily considered new work, as long as it’s not trivial, so licensing does not carry.

          1. 4

            Ah, that sounds right. I missed that it was copyright law which defined that, not the GPL’s rules. Thanks!

            1. 2

              One interesting case would be if the work was first translated, then heavily refactored to be more idiomatic in the target language. If the intermediate steps were never published, it could be hard to determine whether the end result was a derivative work, was inspired by the original but not derived from it, or whether any similarities were simply down to the code performing the same function.

              1. 4

                This is somewhat what clean-room reverse engineering is, and known to be legally safe. One team reads the original work and writes a spec. The second team reads the spec and implements it. Now you have a copy that wasn’t created by directly copying.

                A single person instead of two teams doing the same might conceivably fool a judge or jury, but I think in reality we’re not as smart as we think we are and we leave behind traces of obvious similarity despite our intention.

              2. 1

                The completely new implementation doesn’t have to be nontrivial. It just has to be provably not copied from the original in any way, not even “inspired by”. A copyright lawsuit could fall apart for the defendant if the plaintiff can prove that there’s significant inspiration from the copyrighted work, even if the defendant can prove a lot of additional originality.

                1. 1

                  The triviality aspect is relevant: copyright law is based on the creative aspect of the work. That’s why it’s not possible to claim authorship of a sequence of consecutive numbers, or on a copy of an already existing work.

                  1. 1

                    Ah, I see. I thought you were talking about whether the difference from the original was trivial. Sorry, I misunderstood you.

                  2. 1

                    I think it was GNU who had guidelines that recommended making their versions different in various ways from existing Unix utils:

                    • making their version smaller
                    • or faster
                    • or better

                    etc. Can’t point to a resource right now but someone else here might remember?

                    1. 1

                      Found it:


                      From the guidelines:

                      … For example, Unix utilities were generally optimized to minimize memory use; if you go for speed instead, your program will be very different. You could keep the entire input file in memory and scan it there instead of using stdio. Use a smarter algorithm discovered more recently than the Unix program. Eliminate use of temporary files. Do it in one pass instead of two (we did this in the assembler)….

                2. 9

                  This is a problem we face in GNU Octave.

                  We have to keep telling our contributors, do not read Matlab’s code!

                  It’s okay to reimplement algorithms. It’s not okay to reimplement them in a way that would make it obvious to a judge or jury that you copied them from someone else (e.g. similar variable names or similar function structure). You must also be able to say, under oath and without perjury, that you did not copy. This puts us in the funny situation that it’s okay to read their documentation, which contains no implementation details, but not okay to read their code, which contains variable names and structure.

                  This is a big distinction between copyright and patents. With copyright, independent rediscovery (as would be proven by showing that you never even read the original) is a valid defense.

                  Software patents are particularly insidious because this is not a defense in that case. You can inadvertently infringe a patent you never even knew existed. You really can’t infringe copyright unless you’ve had experience with the copyrighted work.

                  1. 1

                    Re: patents. I’ll add that, from what I read, you cannot independently reproduce the patented thing at all without a license. If you didn’t know, you were infringing. If you did know, you knowingly infringed, with triple damages or something like that. Patents are better for companies trying to stop great software from being written than for those writing that software.

                  2. 3

                    It’s saying that specific implementations of algorithms are. I doubt one can translate proprietary code (whose source code has somehow been published) into a different language either.

                    1. 4

                      I would suspect that it’s enforceable to the extent of the triviality of the translation. Translations can be considered derivative works and therefore do not transfer the copyright. However what is and is not a derivative work is often something that gets debated in courts. Depending on the technical prowess of the jury and judge it can and has often gone in very unexpected ways. Intent often matters here. For example if the intent was to skirt a copyright infringement then it often IS a copyright infringement.

                      I am not a lawyer, but I am an American and I do read about this stuff sometimes so, grain of salt as always.

                  3. 1

                    I think, when you’re thinking about the copyright implications of mechanical vs creative translation of at least human languages, you have to consider that good translations of complex things are usually a creative process (see also https://en.wikipedia.org/wiki/Le_Ton_beau_de_Marot and other treatments of the complexities of translation). It’s possible to do a mechanical translation of some works, which might kick in harder copyright restrictions, but mechanical translations aren’t going to be very good ones, usually.

                    I suspect a lot of this applies to programming languages as well. We have ample examples of code that translates code from one language to another mechanically, but the results aren’t usually nearly as “good” as a hand-crafted translation that takes into account the nature of the target language and all the subtleties of how it differs from the source language, which requires a fair amount of creativity.

                    1. 2

                      Legally, the originality of a derived work doesn’t matter. If you create a great new original fanfic of Mickey Mouse using elements of Mickey Mouse’s history, that’s still a derivative work. As I understand it, legally this is no different from reading someone else’s code and using it to write new code that is obviously inspired by the original.