
    As someone who is the target audience of this guide (programming experience, but never written a single line of R in my life), I must say that this guide is pretty good: short, simple and straightforward. Unfortunately, many languages only offer guides for total beginners. That isn’t necessarily a bad thing, but it is often boring for people with programming experience who just want to get started as fast as possible. This often leads to skipped chapters, and therefore to missing the important information in them.


      Just as an FYI, that is a pretty dated document. I want to say I first came across it well over a decade ago. I am not sure if John kept it current.

      Then again, one of the charms of the base R system is that what was said then will “almost surely” still be valid today, as compatibility is a very important component of the base R system.


        Thanks for the comment. Do you happen to know a more up-to-date document with a similar premise?


          Check out https://learnxinyminutes.com/docs/r/ (not sure if it is any better, I just know about the learnxinyminutes site)


            Fair riposte. These days it is a little complicated because the R world is being altered and extended by what is being called the Tidyverse. It is many things, among them many good ones, e.g. a focus on (first-time) users, on consistency and some other things. At the same time, a few of us who had known R and S from way before it came along are a little less enthralled by its focus on “users as opposed to programmers” (per the “do not use tidyverse in packages” recommendation) and by the somewhat different position it takes on the “stability versus innovation” continuum.

            To me, some of the standard texts and dictums still rule. One of them is (quoting John Chambers here) the goal to turn “(data analytics) users into programmers”. That is a good tradition to uphold, and a split between “users” and “programmers” seems ill-advised to me.

            If you can find it (in a local library), the Venables and Ripley book “S Programming” is pretty good (even if old), as are the 2008 book by Chambers, “Software for Data Analysis”, and his 2016 book, “Extending R”.

            Hope this helps.


              FWIW I write R occasionally and the doc still looks good, except that I use tidyverse now [1]. But in general you don’t have to know “modern” stuff to get things done.

              I think of the language mostly as JavaScript with vectorized operations and data types. In fact I think there is a lot of code that is both valid JS and R, like

               f = function(x, y) { return(x + y) }; f(3, 5)
              

              I just tried that in both R and JS and it works!
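              To make the “JavaScript with vectorized operations” view concrete, here is a small R sketch that reuses the f above with vectors instead of single numbers. The behaviour shown (element-wise + and recycling of a length-1 argument) is standard base R; the inputs are just illustrative.

              ```r
              # Same definition as above; in R, `+` is vectorised, so f works element-wise.
              f = function(x, y) { return(x + y) }

              f(3, 5)       # a length-1 vector: 8
              f(1:3, 4:6)   # element-wise: 5 7 9
              f(1:3, 10)    # the length-1 argument is recycled: 11 12 13
              ```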

              The <- for assignment and the $ for member access are the things that stand out to R newbies. But I usually use = for assignment.
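              A minimal R sketch of the assignment and member-access syntax mentioned above; the variable names are made up for illustration. Note that = inside a function call names an argument rather than assigning, which is one reason many R style guides recommend <- for assignment.

              ```r
              # At top level, `<-` and `=` both assign.
              x <- c(1, 2, 3)
              y = c(4, 5, 6)

              # `$` extracts a named element of a list (roughly like `.` for members in JS).
              p <- list(name = "point", coords = c(a = 1, b = 2))
              p$name          # "point"
              p$coords["b"]   # atomic vectors use [ ] or [[ ]]; `$` on them is an error

              # Inside a call, `=` names an argument; it does not assign anything.
              mean(x = y)     # 5: y is passed as mean()'s argument `x`
              x               # still 1 2 3
              ```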

              Two big gotchas are options(stringsAsFactors=FALSE) which should be the default, and anything with multi-dimensional arrays can be pretty broken, because R doesn’t distinguish between scalar and vectors.

              [1] tidyverse vs. base R, Python, SQL: http://www.oilshell.org/blog/2018/11/30.html


              But probably the ONLY other book I know of with R from a programming language POV rather than a stats POV is:

              https://adv-r.hadley.nz/

              (And see the pictures of all the R books I have in my blog post.) As a PL person, I wondered for a long time about the material that Advanced R covers, and then Hadley finally went and wrote a book on it. Otherwise I think it was only in the R reference manual, although even that is incomplete.


                Two big gotchas are options(stringsAsFactors=FALSE) which should be the default,

                It has been the default since last April and R 4.0.0.
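                For anyone on an older R, passing the argument explicitly to data.frame() makes the behaviour independent of the version’s default. A minimal sketch (the column values are illustrative) showing what the old TRUE default did:

                ```r
                # Explicit argument, so the result does not depend on the R version's default.
                old_style <- data.frame(name = c("b", "a"), stringsAsFactors = TRUE)
                new_style <- data.frame(name = c("b", "a"), stringsAsFactors = FALSE)

                class(old_style$name)        # "factor"
                class(new_style$name)        # "character"

                # The classic surprise: factors store integer codes over sorted levels.
                levels(old_style$name)       # "a" "b"
                as.integer(old_style$name)   # 2 1 (codes, not the original strings)
                ```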

                anything with multi-dimensional arrays can be pretty broken, because R doesn’t distinguish between scalar and vectors

                Please be specific about what is broken. R has no “scalar”. Everything is a vector, sometimes of length 1. And any vector of length N can have a dimension attribute: a matrix is simply a vector with a 2-d one (plus a few supporting operations, as matrices are useful). You can create a 3-d, 4-d, … array the same way. Statisticians rarely use those, so there is little machinery for them. But it works in the base language, which is what the post was about.
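                A short base R sketch of the point above: a single number is a length-1 vector, and setting a dim attribute on a plain vector is what makes it a matrix or a higher-dimensional array. The values are arbitrary.

                ```r
                length(5)        # 1: a "scalar" is a length-1 vector
                is.vector(5)     # TRUE

                v <- 1:6
                dim(v) <- c(2, 3)   # same data, now a 2 x 3 matrix (filled column by column)
                is.matrix(v)        # TRUE
                v[2, 3]             # 6

                a <- array(1:8, dim = c(2, 2, 2))   # a 3-d array built the same way
                a[1, 2, 2]                          # 7
                ```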


                  Oh yes, I remember seeing that change. Finally!

                  With regard to matrices, the fact that R has no scalars is exactly the problem. That’s another way of saying it doesn’t distinguish between 0 and 1 dimensions.

                  The consequence is that operations that increase the dimension lead, in general, to confusion between N and N+1 dimensions! Several years ago I fixed about 10 bugs in a single piece of code related to that issue. It would work for dimension > 1 but fail for dimension == 1, because the language and standard library fundamentally confused a 3x1 matrix with a vector of length 3.
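                  One common way this N vs N+1 confusion shows up in base R is matrix subsetting: by default `[` drops any dimension of extent 1, so selecting one column yields a plain vector while selecting two yields a matrix. This is a sketch of the general mechanism, not a reconstruction of the bugs mentioned above; `k` stands in for a hypothetical data-dependent column count. Passing drop = FALSE keeps the matrix shape.

                  ```r
                  m <- matrix(1:6, nrow = 3)        # a 3 x 2 matrix

                  col1 <- m[, 1]                    # default drop = TRUE: plain length-3 vector
                  dim(col1)                         # NULL, the matrix structure is gone
                  col1_m <- m[, 1, drop = FALSE]    # stays a 3 x 1 matrix
                  dim(col1_m)                       # 3 1

                  # Code written for k >= 2 columns silently changes type at k == 1.
                  k <- 1                            # hypothetical data-dependent column count
                  picked <- m[, seq_len(k)]
                  is.matrix(picked)                 # FALSE here; TRUE whenever k >= 2
                  ncol(picked)                      # NULL, so later matrix code breaks
                  ```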

                  I dug up the thread on that here:

                  https://old.reddit.com/r/oilshell/comments/a2atkg/what_is_a_data_frame_in_python_r_and_sql/eazlrl2/

                  I can probably come up with a concrete example, but the bottom line is that I use Python for linear algebra (which I rarely do in any case), and R for data manipulation.


                  edit: From reading over that thread again, it is the simplify= issue. This gotcha doesn’t appear in the original post, but I believe it will confuse anyone who has ever used Matlab, NumPy, or Julia. I can’t say for sure, but I’m pretty sure none of them has that issue. But as far as I remember, I don’t think it is limited to simplify: that’s just a symptom of the problem with the data model itself.
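                  A small sketch of the simplify= behaviour of sapply(): the shape of the result depends on the length of what the function returns, so the same call pattern can give a matrix or a plain vector. vapply() with an explicit result template is a common base R way to pin the shape down; it is a suggested alternative, not something from the thread.

                  ```r
                  # sapply() picks the result shape from the length of each function result.
                  two <- sapply(1:3, function(i) c(i, i^2))   # length-2 results: 2 x 3 matrix
                  one <- sapply(1:3, function(i) i^2)         # length-1 results: plain vector
                  dim(two)   # 2 3
                  dim(one)   # NULL

                  # simplify = FALSE always returns a list, whatever the result lengths.
                  lst <- sapply(1:3, function(i) i^2, simplify = FALSE)
                  is.list(lst)   # TRUE

                  # vapply() declares the expected result type and length up front
                  # and errors if a result does not match.
                  vapply(1:3, function(i) i^2, numeric(1))
                  ```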


                    R has no scalars is exactly the problem.

                    It’s a feature. The language, written to analyse data, is naturally vectorised. You won’t find too many actual R users who dislike that.
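                    A minimal sketch of that argument: since a number is just a length-1 vector, one definition handles a single value and a whole column alike. `to_celsius` is a made-up helper for illustration.

                    ```r
                    # A "scalar" is a length-1 vector, so one definition covers both cases.
                    to_celsius <- function(f) (f - 32) * 5 / 9   # hypothetical helper

                    to_celsius(212)                # 100
                    to_celsius(c(32, 212, 98.6))   # 0, 100 and (up to rounding) 37

                    # Comparisons and subsetting vectorise the same way.
                    temps <- c(32, 212, 98.6)
                    temps[temps > 50]              # 212 and 98.6
                    ```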