1. 22

  2. 4

    The name is quite clever! Just as panda bear -> pandas, so polar bear -> polars; plus, the name contains ‘rs’ to show its connection to Rust.

    I am delighted to note that, although the API is very clearly inspired by Pandas’s DataFrame API, Polars avoids at least one of Pandas’s mistakes: Pandas has index columns that can’t be used like normal columns, but Polars does not. Specifically, Pandas has the misfeature that if you call groupby(birthday).apply(f) on a data frame, the resulting data frame no longer has a column birthday – those values got moved into the index, and the index is not a normal column >_<. But Polars doesn’t do that, hooray!

    I tried that out, by the way, using this bit of code:

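        // added to the example at https://docs.rs/polars/0.12.1/polars/frame/group_by/struct.GroupBy.html
        let df2 = df
            .groupby("date")? // grouping step, assumed from the Pandas equivalent below
            .apply(|df| Ok(df.head(Some(1))))?; // keep the first row of each group
        println!("{:?}", df2);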

    which is equivalent to Pandas’s

        df2 = df.groupby('date').head(1).reset_index(drop=True)
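
    For comparison, here is a minimal sketch of the Pandas index behavior described above. The sample data and the `count_rows` helper are made up for illustration, and some recent Pandas releases may print a deprecation warning on the `apply` call:

```python
import pandas as pd

# Hypothetical sample data; the column names are made up for illustration.
df = pd.DataFrame({
    "birthday": ["01-01", "01-01", "02-14"],
    "name": ["Ann", "Bob", "Cy"],
})


def count_rows(group: pd.DataFrame) -> pd.Series:
    # Summarize each group as a one-entry Series.
    return pd.Series({"n": len(group)})


# groupby(...).apply(f) moves the group key out of the columns and into the index.
summary = df.groupby("birthday").apply(count_rows)

key_is_column = "birthday" in summary.columns
key_is_index = summary.index.name == "birthday"

# reset_index() turns the index back into an ordinary column.
restored = summary.reset_index()

print(summary)
print(restored)
```

    Calling `reset_index()` (or passing `as_index=False` to `groupby` for aggregations) is the usual way to get the key back as a normal column.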

    Final remark: anybody who wants to implement a data frame library will find wonderful inspiration in R’s Tidyverse. tibble (a modern reimagining of the data.frame), dplyr (a grammar of data manipulation), tidyr (for reshaping untidy data to make it tidy), and the other Tidyverse packages all have delightful and clearly organized APIs for working with data frames.

    tidyr especially always impresses me with how clean and powerful its verbs are, considering how thorny the data-reshaping problems it solves are.

    1. 3

      A few questions: The author used the memory layout of Apache Arrow but implemented his own compute engine. Why didn't he contribute to Arrow directly, since it also implements these functions? It is on the roadmap that Pandas will get an Apache Arrow backend*. So is it possible to have a Polars backend, too?

      *which will take a lot of time, because the API surface of Pandas is so much larger than that of Arrow.