1. 63

  2. 15

    This is impressive. CSV escaping alone can be a huge advantage. I don’t really dare to use awk with CSV data.

    Could you provide more info on this and how it differs from awk in this respect?

    1. 5

      It looks like it properly handles CSV quoting and multi-line fields, rather than just a plain (and rather problematic) split-on-comma that AWK would do. See structured data section in overview doc and Rust code.
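To make the difference concrete, here is a minimal sketch in Python (not frawk or awk code). It contrasts a plain split-on-comma reader with a quoting-aware parser from Python's standard `csv` module. The sample data and the helper names `naive_rows` and `csv_rows` are made up for illustration.

```python
import csv
import io

# Hypothetical sample: one quoted field contains a comma, another a newline.
SAMPLE = 'name,note\n"Doe, Jane","line one\nline two"\n'


def naive_rows(text):
    """Mimic a plain line-by-line, split-on-comma reader."""
    return [line.split(",") for line in text.splitlines()]


def csv_rows(text):
    """Parse with CSV quoting rules, so embedded commas and newlines
    stay inside the field they belong to."""
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    print(naive_rows(SAMPLE))  # the quoted record is torn apart
    print(csv_rows(SAMPLE))    # header row plus one two-field record
```

The naive reader sees three "rows" and leaves stray quote characters in the fields, while the quoting-aware parser returns the header plus a single two-field record.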

      1. 3

        Thank you. This is a total game changer. An enormous leap for awk. I personally have been using awk less than half as often as I would if it had a safe, reliable way to handle CSV data. I frankly didn’t have hope someone would build this.

        Thank you so much @ezrosent. Keep it up. Please consider keeping compatibility with common awk snippets at a level where most would work.

      1. 6

        I wish there was a Linux distro that allowed me to simply install the Rust versions over the unsafe ones.

        Much easier to see what breaks in practice, instead of trying to chase 100% bug compatibility.

        1. 10

          This looks like a cool tool, but I would hesitate to call any LLVM-based JIT “safe” in the Rust sense of the word.

          1. 7

            Author here. I just want to echo this sentiment, but point out some subtleties.

            There’s some unsafe code in the runtime, and then all of the JIT code (particularly LLVM) should really be considered unsafe.

            By default, however, frawk is using Cranelift to JIT the code. Cranelift is a pure-Rust project, so I’d expect it to be safer to use than LLVM. Still, JITs like the one in frawk are going to be inherently unsafe. Even Cranelift is providing you with a low-level builder API that doesn’t check the generated code is memory-safe, so running that generated code is still unsafe (both in the Rust sense and in the colloquial sense, I’d say).

            1. 4

              Even if it’s just compiling in a single back end for LLVM, there is vastly more unsafe C++ code in the ‘safe’ Rust version than in a typical C++ implementation of awk.

              1. 2

                Eh, how much unsafe C++ code do you think there is in LLVM?

                1. 3

                  Around 10MLoC. Nothing in LLVM uses the .at (bounds checked) accessors instead of operator[] (not bounds checked) for example. Nothing in LLVM is safe in the Rust sense of the word. Most of the unsafe things are hidden in classes like PointerIntPair, which hides an integer in the low bits of a pointer (would require unsafe in Rust), but there are a lot of abstractions in LLVM that are built on things that would not be permitted in safe Rust.
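For readers unfamiliar with the pointer-packing trick mentioned above, here is a rough sketch of the idea in Python, modelling addresses as plain integers. It is an illustration only, not LLVM's implementation: the 8-byte alignment and the names `pack` and `unpack` are assumptions made for the example.

```python
# Assumption: objects are 8-byte aligned, so the low 3 bits of any valid
# address are always zero and can carry a small integer instead.
ALIGNMENT = 8
TAG_MASK = ALIGNMENT - 1  # 0b111


def pack(address, tag):
    """Store a small integer in the unused low bits of an aligned address."""
    if address & TAG_MASK:
        raise ValueError("address is not aligned")
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in the spare bits")
    return address | tag


def unpack(word):
    """Split a packed word back into (address, tag)."""
    return word & ~TAG_MASK, word & TAG_MASK


if __name__ == "__main__":
    word = pack(0x1000, 5)
    print(hex(word), unpack(word))
```

In C++ or Rust the packed word has to be masked before it is dereferenced, and nothing in the type system enforces that, which is why the equivalent Rust code would sit behind `unsafe`.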

            2. 2
              1. 1

                In NixOS you can do this. Technically, on any Linux distro “enriched” with Nix you could do it.

              2. 4

                Glanced through https://github.com/ezrosent/frawk/blob/master/info/overview.md and it gives a good overview. Especially interested in CSV support, join_fields and Rust regex (which has a lot more features than ERE).

                I’d like to try the tool, but there are no release versions. I’ll try the installation instructions in a VM sometime this week. Tools like mdbook and ripgrep provide unknown-linux binaries (among other options). Is this easy to generate?

                1. 2

                  Typically for Rust projects, cargo build --release builds an unknown-linux binary at target/release/<project-name>. With this project, that looks like it comes with a few caveats (but I can’t test right now):

                  1. You need to install nightly. You can do this by running rustup toolchain install nightly, and then telling cargo to use nightly with cargo +nightly build --release (alternatively you can set nightly as the default toolchain, but let’s not do that).
                  2. You either need LLVM installed (cargo doesn’t handle installing non-Rust dependencies for you), probably at both runtime and compile time, or you need to tell it (it being frawk, not the Rust compiler) not to use LLVM. For this project, you can do the latter with cargo +nightly build --release --no-default-features --features use_jemalloc,allow_avx2,unstable.
                  1. 1

                    Thanks. I have no experience with Rust or llvm, so right now all of this looks alien to me. I’ll give it a shot.