This is impressive. CSV escaping alone can be a huge advantage. I don’t really dare to use awk with CSV data.
Could you provide more info on this and how it differs from awk in this respect?
It looks like it properly handles CSV quoting and multi-line fields, rather than just a plain (and rather problematic) split-on-comma that AWK would do. See structured data section in overview doc and Rust code.
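To make the difference concrete, here is a minimal sketch in Python (not frawk or awk code). It compares a plain split-on-comma with a quote-aware parser, using Python's standard csv module. The sample data and the function names are made up for illustration.

```python
import csv
import io

# Hypothetical sample: a header row plus one record whose fields contain
# a comma, an escaped quote, and an embedded newline.
SAMPLE = 'name,comment\n"Smith, Jo","said ""hi""\nthen left"\n'


def naive_rows(text):
    """Split on newlines, then on commas, ignoring CSV quoting entirely."""
    return [line.split(",") for line in text.splitlines()]


def csv_rows(text):
    """Parse with a quote-aware reader that understands multi-line fields."""
    return list(csv.reader(io.StringIO(text, newline="")))


naive = naive_rows(SAMPLE)
proper = csv_rows(SAMPLE)

# The naive split sees three "records" and breaks "Smith, Jo" in two.
print(len(naive), naive[1])
# The quote-aware reader sees two records with two fields each.
print(len(proper), proper[1])
```

The naive version both miscounts records and leaves stray quote characters in the fields, which is the kind of breakage that makes plain field splitting risky on real CSV files.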
Thank you. This is a total game changer. An enormous leap for awk. I personally have been using awk less than half as often as I would if it had a safe/reliable way to handle CSV data. I frankly didn’t have hope someone would build this.
Thank you so much @ezrosent. Keep it up. Please consider keeping compatibility with common awk snippets to a level where most would work.
Here are some details about its performance:
I wish there were a Linux distro that allowed me to simply install the Rust versions over the unsafe ones.
Much easier to see what breaks in practice, instead of trying to chase 100% bug compatibility.
This looks like a cool tool, but I would hesitate to call any LLVM-based JIT “safe” in the Rust sense of the word.
author here, I just want to echo this sentiment, but point out some subtleties.
There’s some unsafe code in the runtime, and then all of the JIT code (particularly LLVM) should really be considered unsafe.
By default, however, frawk is using Cranelift to JIT the code. Cranelift is a pure-rust project, so I’d expect it to be safer to use than LLVM. Still, JITs like the one in frawk are going to be inherently unsafe. Even Cranelift is providing you with a low-level builder API that doesn’t check the generated code is memory-safe, so running that generated code is still unsafe (both in the Rust sense, but also in the colloquial sense I’d say).
Even if it’s just compiling in a single back end for LLVM, there is vastly more unsafe C++ code in the ‘safe’ Rust version than in a typical C++ implementation of awk.
Eh, how much unsafe C++ code do you think there is in LLVM?
Around 10MLoC. Nothing in LLVM uses the .at (bounds checked) accessors instead of operator[] (not bounds checked) for example. Nothing in LLVM is safe in the Rust sense of the word. Most of the unsafe things are hidden in classes like SmallIntPtrPair, which hides an integer in the low bits of a pointer (would require unsafe in Rust), but there are a lot of abstractions in LLVM that are built on things that would not be permitted in safe Rust.
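For readers unfamiliar with the low-bits trick mentioned above, here is a rough sketch of the arithmetic in Python. It works on plain integers standing in for addresses and is not LLVM's implementation. The 8-byte alignment and the names pack and unpack are assumptions for illustration.

```python
# Assumption: the pointed-to object is 8-byte aligned, so the low 3 bits
# of any valid address are always zero and can carry a small integer.
ALIGNMENT = 8
TAG_BITS = ALIGNMENT.bit_length() - 1  # 3 spare bits
TAG_MASK = (1 << TAG_BITS) - 1         # 0b111


def pack(address, tag):
    """Store a small integer in the unused low bits of an aligned address."""
    if address & TAG_MASK:
        raise ValueError("address is not aligned")
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in the spare bits")
    return address | tag


def unpack(word):
    """Recover the original address and the hidden integer."""
    return word & ~TAG_MASK, word & TAG_MASK


word = pack(0x7F001000, 5)
address, tag = unpack(word)
print(hex(word), hex(address), tag)
```

In a language with real pointers, the packed word has to be masked before it is dereferenced. The compiler cannot verify that step, which is why this pattern needs unsafe in Rust.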
Did you see this recent story? https://lobste.rs/s/jdqu4m/debian_running_on_rust_coreutils
In NixOS you can do this. Technically, on any Linux distro “enriched” with Nix you could do it.
Glanced through https://github.com/ezrosent/frawk/blob/master/info/overview.md and it gave a good overview. Especially interested in csv, join_fields and Rust regex (which has a lot more features compared to ERE).
I wish to try the tool but there are no release versions. I’ll try the installation guideline in a VM sometime this week. Tools like mdbook and ripgrep provide unknown-linux binaries (among other options). Is this easy to generate?
Typically for Rust projects, cargo build --release builds an unknown-linux binary, named target/release/<project-name>. With this project it looks like that comes with a few caveats (but I can’t test right now):
cargo build --release
rustup toolchain install nightly
cargo +nightly build --release
cargo +nightly build --release --no-default-features --features use_jemalloc,allow_avx2,unstable
Thanks. I have no experience with Rust or LLVM, so right now all of this looks alien to me. I’ll give it a shot.