    An interesting attempt to analyse the data by someone who has completely grabbed the wrong end of the stick.

    The age of citations is quite interesting. It definitely doesn’t reflect the velocity of cases through the system, but it might say something about the collective memory of the legal profession.

      the wrong end of the stick

      What would be the correct end then?

        The end of the stick that doesn’t mistake ‘doing research-like actions’ for ‘doing research’, nor ‘graphs and statistics’ for ‘an understanding of the system’.

        To quote xkcd: “Liberal-arts majors may be annoying sometimes, but there’s nothing more obnoxious than a physicist first encountering a new subject.” In this case, for physicist read ‘data scientist’. [edit 17:25 CET: use ‘data scientist’ instead of another phrase –SBB]

        The dataset is incomplete in a very non-random way: it only includes transitive citations of Supreme Court decisions. Nowhere does the author betray any signs of realising that this affects the statistics, and therefore the conclusions they so confidently draw about, say, average duration of a judicial procedure.

        I don’t have the time for a full fisking, so I’ll only address the most important idiocy:

        The writer says “somewhere between 33% and 50% of all cases are completely trivial and could be easily automated away”. They base that, I kid you not, solely on seeing that somewhere between 33% and 50% of the judgements they gathered cite only one previous case or judgement.

        No, the author doesn’t discuss how to recognise those supposedly trivial cases; nor whether the right to appeal can even be fulfilled by an automated system; and nowhere do they wonder whether ‘the judgement cites only one case’ is, in fact, a reliable sign that a case was trivial.

        You have to understand the system you’re talking about! Even if you’re a data scientist! Especially if you’re a data scientist!

        I’d also like to hear more about the correct end of the stick from @Student.

        I’m not a data engineer at all. Could you describe what you consider a better approach?

          For starters, citations don’t mean what you think they mean. Cases are cited because they are a source of the law under discussion.

          You need to actually work with a domain expert to understand what the data mean.