1. 15
  1. 3

    Yeah, R Core (or “base R”, as it’s called) provides a lot of stability and a solid API story, but tbh it doesn’t reflect the reality of how R is used most of the time (to my knowledge, bias included), which leans heavily on the RStudio environment and the tidyverse universe of libraries. The tidyverse reshaped a lot of how to write R, in a very bold way. Small anecdote: I had a coding interview project in R and used a bit of data.table, base R, and some tidyverse packages (mostly ggplot2), and the feedback I got was that “someone must really know R to follow your code, it would be better to stick to modern R practices with tidyverse only [or favor the tidyverse ecosystem].” IMHO, my code was far from perfect and the post had other requirements; still, the feedback was really interesting, pushing the tidyverse not only for its own sake but as a high-level API with an easy to teach and easy to grasp structure that takes less knowledge to understand and get a hold of.

    1. 2

      This is really interesting to me even without knowing anything about the R eco-system. How old is tidyverse, and how much does it care about compatibility and reproducibility?

      1. 2

        From what I can see, tidyverse became popular around 2016 or 2017. It looks like the first releases of dplyr (one of the main components) are from 2014, so it can’t be older than that:

        https://cran.r-project.org/src/contrib/Archive/dplyr/

        I had used data.table before (and base R), but switched to tidyverse sometime around 2016. I wrote this post comparing all the approaches, with sample code:

        What Is a Data Frame? (In Python, R, and SQL)

        It really helps that there is a book about it, and some great papers, linked in the blog post.


        FWIW a few weeks ago I had a recurrence of the packaging issue that indirectly led me to Oil. (e.g. building R, R packages, and applications inside containers from scratch, with shell scripts).

        Basically the general problem is that package managers like CRAN and pip have no connection to the system package manager, or at least a very tenuous one.

        And the R ecosystem generally moves faster than the Debian/Ubuntu ecosystem. Authors update their code frequently.

        So basically in January dplyr changed its versioning requirements to require a new version of R, which my version of Ubuntu didn’t have. I didn’t notice until a few weeks ago (March) when I tried to re-install it on the same machine (Ubuntu Xenial 16.04, which is a bit old).

        And then it failed. So I found this somewhat elaborate workaround with a PPA maintained by somebody.

        https://github.com/oilshell/oil/blob/master/build/dev.sh#L81
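
        To make the failure mode concrete, here is a minimal Python sketch of the kind of minimum-version check that trips in this situation. The version numbers are hypothetical placeholders rather than the real ones involved, and the function names are made up for illustration; this is not how R's installer implements it.

        ```python
        def parse_version(text):
            """Turn a dotted version string like '3.2.3' into a tuple of ints."""
            return tuple(int(part) for part in text.split("."))


        def satisfies_minimum(installed, required_minimum):
            """Return True when the installed version is at least the required minimum."""
            return parse_version(installed) >= parse_version(required_minimum)


        # Hypothetical numbers: the distro ships an older interpreter than the
        # package's newest release asks for, so the install is refused.
        installed_r = "3.2.3"
        required_r = "3.3.0"
        print(satisfies_minimum(installed_r, required_r))
        ```

        Comparing tuples of integers rather than raw strings matters here, since a string comparison would rank "3.10.0" below "3.9.1".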

        I’d say the R package manager is better than Python’s package manager. (docs tend to be better too) But the ecosystem moves very fast, and that means things break, unless you go for some “forever immutable” system, which few popular package managers do.

        1. 2

          I’ve found tidyverse reasonably good about backwards compatibility, though not to the same extent base R is, and it’s over a much shorter timeframe.

          The individual packages themselves tend to keep backwards compatibility, but the constellation of packages that make up “the tidyverse” shifts more frequently. The issue that comes up most frequently for me personally when revisiting older R code is that they have now twice rethought/overhauled what constitutes a good set of primitives for “reshaping” data tables. There was first the reshape package, followed by reshape2, and now tidyr. I have a bunch of code that uses reshape2, which I can still install from CRAN, but it doesn’t come by default anymore when installing or loading the tidyverse libraries, and is considered deprecated.
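
          For anyone unfamiliar with what “reshaping” means here, a minimal pure-Python sketch of the wide-to-long operation follows. It is roughly the job that `melt` does in reshape2 and `pivot_longer` does in tidyr, but the function below, its argument names, and the sample data are all made up for illustration and do not mirror either package's actual interface.

          ```python
          def melt(rows, id_columns):
              """Turn wide rows into long rows: one output row per non-id column."""
              long_rows = []
              for row in rows:
                  # Columns that identify the observation are copied onto every output row.
                  ids = {name: row[name] for name in id_columns}
                  for name, value in row.items():
                      if name not in id_columns:
                          long_rows.append({**ids, "variable": name, "value": value})
              return long_rows


          # Hypothetical wide table: one column per year.
          wide = [
              {"city": "Oslo", "2019": 10, "2020": 12},
              {"city": "Lima", "2019": 7, "2020": 9},
          ]
          long = melt(wide, id_columns=["city"])
          for row in long:
              print(row)
          ```

          The operation itself is stable; what changed between the packages is mostly the naming and the set of primitives around it.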

      2. 1

        I agree it deserves praise, but it’s also what I would expect because R is a popular and long-lived language. The projects that can’t pull this off fail to serve their users and don’t become popular.

        The same is true for Ruby, JavaScript, PHP, etc. Your 14-year-old code will very likely work.

        Lua is a notable exception since it’s an embedded language.

        Python did it for 20 years before Python 3. I have dug up a lot of Python 1.5 code and run it on Python 2 – no problem.

        Perl 5 had the distraction of Raku, but Perl 5 is still being maintained and runs 30-year-old code, etc.

        1. 1

          I really dislike R, but credit where it’s due.

          As far as I can tell, as an outsider, a big part of the reason R’s external API is so stable is that it doesn’t actually change that much internally. I work on a project that patches R’s source code to modify its I/O behavior. It’s part of my job to reapply the patch every time there’s an R release. The early days of the project are before my time, and maybe we just picked a lucky bit of the tree to mutate, but as far as I can tell, the patch has been applying cleanly since 2013 (R 2.15.x). 8 years later, after seven minor releases and dozens of maintenance releases (3.0.x to 3.6.x), 4.0.x was the first time we’ve had to resolve a merge conflict. It’s a very stable codebase!

          (Unrelated to this post, so I hope this isn’t a derail, but as it happens, the reason for the merge conflict was the decision for the 4.x series to rename the --slave argument to --no-echo.)
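
          If you wrap R in scripts that have to span both series, a small Python sketch of picking the flag by major version might look like this. The helper name is hypothetical, and it assumes only what is said above: the 4.x series uses --no-echo where earlier releases used --slave.

          ```python
          def quiet_flag(r_version):
              """Pick the flag name for a given R version string such as '3.6.3'."""
              major = int(r_version.split(".")[0])
              # Assumption taken from the comment above: the rename landed with 4.x.
              return "--no-echo" if major >= 4 else "--slave"


          for version in ("3.6.3", "4.0.2"):
              print(version, quiet_flag(version))
          ```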