1. 7

  2. 3

    I was excited to see this pop up on GitHub a few days ago. I think it has made a lot of smart decisions and shown restraint in its capabilities. There have been quite a few attempts at a grammar of data manipulation for Python, but most seem to be obsessed with creating an infix pipe operator. This one feels the most Pythonic of the ones that I’ve seen.
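
    To make the contrast concrete, here is a rough sketch of the two styles in plain Python. The `Pipe` and `Table` classes are hypothetical and written only for illustration; they are not taken from redframes or any other library, and real pipe-style packages may implement the operator differently.

    ```python
    class Pipe:
        """Hypothetical wrapper that lets a function be applied with `>>`."""

        def __init__(self, func):
            self.func = func

        def __rrshift__(self, value):
            # Python falls back to this for `value >> Pipe(...)` when the
            # left operand (here a plain list) does not define `>>` itself.
            return self.func(value)


    class Table:
        """Hypothetical table type that exposes verbs as ordinary methods."""

        def __init__(self, rows):
            self.rows = rows

        def filter(self, predicate):
            # Return a new Table so calls can be chained.
            return Table([r for r in self.rows if predicate(r)])

        def column(self, key):
            return [r[key] for r in self.rows]


    rows = [
        {"name": "a", "score": 3},
        {"name": "b", "score": 7},
        {"name": "c", "score": 5},
    ]

    # Infix pipe style: each step is a wrapped function joined with `>>`.
    keep_high = Pipe(lambda rs: [r for r in rs if r["score"] > 4])
    names = Pipe(lambda rs: [r["name"] for r in rs])
    piped = rows >> keep_high >> names

    # Method chaining style: the same steps as plain method calls.
    chained = Table(rows).filter(lambda r: r["score"] > 4).column("name")
    ```

    Both give the same result here. The pipe version relies on operator overloading, while the chained version is just ordinary method calls, which is roughly what I mean by "Pythonic".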

    Python’s data ecosystem has a lot of very strong libraries. I feel like array libraries (NumPy, PyTorch, etc.) have solid APIs, scikit-learn has a great API, and Altair and seaborn’s new objects interface are good grammar of graphics implementations. Pandas has been the biggest sore spot when comparing usability between R (tidyverse) and Python. Don’t get me wrong, pandas is a great library, but there is only so far that it can adapt to new ideas and standards in API design without breaking a bunch of code or creating more confusion for its users.

    I don’t necessarily think that redframes is the answer, but I think that it could be good inspiration for future API designs or a base to build upon. Ibis and Polars are currently the best alternatives. Ibis has been under heavy development recently, and I think that with the DuckDB and Arrow/DataFusion backends it will start to become more appealing for local data analysis, while also having the flexibility to operate on remote analytic engines and databases.

    1. 2

      I can’t decide which method name is worse: melt (which I just learned about) or denix. I can kinda sorta maybe see the rationale for melt, but denix defies explanation.

      Nitpicking aside, is this fast? Because honestly, I wouldn’t touch pandas for anything other than reading files if I could get away with it.

      1. 1

        I had the same thoughts. Seems useful, but some of the names are odd and without speed metrics, I likely won’t touch it.

        1. 1

          My best guess is that “denix” is “de-nix”, as in remove “nix” (nothing).

        2. 2

          Hm, very interesting! I’m tempted to give this a spin, because I have a bunch of dplyr code I want to run in CI, and putting R and CRAN in containers is annoying …