Nice writeup, thanks! Would nom be a good option for parsing pcap files and the network protocols they contain (Ethernet, IP, TCP/UDP, application-level messages)? I need to extract some information from large pcap files, and I need to look at various fields across the whole stack.
nom not only supports this, it was in fact built specifically as a binary parser from the beginning, so it should work excellently for this use case. It also has dedicated support for streaming parsers that's been steadily improving.
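For a sense of what the binary side looks like, here is a minimal sketch of parsing the 24-byte global header of a classic libpcap file (not pcapng) in the nom style: every parser takes a byte slice and returns the unconsumed rest plus a value, so parsers chain with `?`. It deliberately uses only the standard library so it compiles on its own; `PResult`, `take`, `global_header` and the error type are hypothetical stand-ins for nom's `IResult` and its byte/number parsers, not nom's actual API. Only the microsecond magic number is recognised.

```rust
/// nom-style result: on success, (unconsumed input, parsed value).
type PResult<'a, T> = Result<(&'a [u8], T), ParseError>;

#[derive(Debug, PartialEq)]
enum ParseError {
    /// Ran out of bytes; the value is how many more this step needed.
    Incomplete(usize),
    /// First four bytes were not a recognised pcap magic number.
    BadMagic(u32),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Endian {
    Little,
    Big,
}

#[derive(Debug, PartialEq)]
struct GlobalHeader {
    endian: Endian,
    version_major: u16,
    version_minor: u16,
    snaplen: u32,
    linktype: u32, // 1 = Ethernet
}

/// Take exactly N bytes (hypothetical analogue of nom's `take`).
fn take<const N: usize>(input: &[u8]) -> PResult<'_, [u8; N]> {
    if input.len() < N {
        return Err(ParseError::Incomplete(N - input.len()));
    }
    let (head, rest) = input.split_at(N);
    let mut buf = [0u8; N];
    buf.copy_from_slice(head);
    Ok((rest, buf))
}

fn u16_with(e: Endian, b: [u8; 2]) -> u16 {
    match e {
        Endian::Little => u16::from_le_bytes(b),
        Endian::Big => u16::from_be_bytes(b),
    }
}

fn u32_with(e: Endian, b: [u8; 4]) -> u32 {
    match e {
        Endian::Little => u32::from_le_bytes(b),
        Endian::Big => u32::from_be_bytes(b),
    }
}

/// Parse the 24-byte classic pcap global header, detecting byte order from the magic.
fn global_header(input: &[u8]) -> PResult<'_, GlobalHeader> {
    let (input, magic) = take::<4>(input)?;
    let endian = match u32::from_le_bytes(magic) {
        0xa1b2_c3d4 => Endian::Little, // file written little-endian
        0xd4c3_b2a1 => Endian::Big,    // same magic, byte-swapped
        other => return Err(ParseError::BadMagic(other)),
    };
    let (input, major) = take::<2>(input)?;
    let (input, minor) = take::<2>(input)?;
    let (input, _thiszone) = take::<4>(input)?; // unused here
    let (input, _sigfigs) = take::<4>(input)?; // unused here
    let (input, snaplen) = take::<4>(input)?;
    let (input, linktype) = take::<4>(input)?;
    Ok((
        input,
        GlobalHeader {
            endian,
            version_major: u16_with(endian, major),
            version_minor: u16_with(endian, minor),
            snaplen: u32_with(endian, snaplen),
            linktype: u32_with(endian, linktype),
        },
    ))
}
```

With nom itself, the field reads would typically use its `le_u32`/`be_u32`-style number parsers, picked after inspecting the magic number.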
Yep, it has specific functionality for binary protocol parsing. It's been a while since I used it, but as I recall its streaming support wasn't great; for pcap files in particular, though, I'd expect it to work really well. (And I could be way out of date on the streaming support; they've had more than one major version since I used it.)
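On the streaming point: nom's streaming parsers report running out of input as `Err::Incomplete(Needed)` rather than as a hard failure, which is what lets you walk a large pcap file without loading it whole. The sketch below imitates that contract with the standard library only. It assumes a little-endian classic pcap file whose 24-byte global header has already been consumed; `Needed`, `record` and `for_each_record` are hypothetical names, not nom's API.

```rust
use std::io::{self, Read};

/// Analogue of nom's `Needed`: how many more bytes the parser wants.
#[derive(Debug, PartialEq)]
struct Needed(usize);

#[derive(Debug, PartialEq)]
struct Record<'a> {
    ts_sec: u32,
    ts_usec: u32,
    orig_len: u32,
    data: &'a [u8], // the incl_len captured bytes
}

fn le_u32_at(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([b[off], b[off + 1], b[off + 2], b[off + 3]])
}

/// Streaming-style parser: never assumes the buffer holds a whole record.
fn record(input: &[u8]) -> Result<(&[u8], Record<'_>), Needed> {
    const HDR: usize = 16; // ts_sec, ts_usec, incl_len, orig_len
    if input.len() < HDR {
        return Err(Needed(HDR - input.len()));
    }
    let total = HDR + le_u32_at(input, 8) as usize;
    if input.len() < total {
        return Err(Needed(total - input.len()));
    }
    let rec = Record {
        ts_sec: le_u32_at(input, 0),
        ts_usec: le_u32_at(input, 4),
        orig_len: le_u32_at(input, 12),
        data: &input[HDR..total],
    };
    Ok((&input[total..], rec))
}

/// Feed the parser fixed-size chunks, buffering only the unparsed tail.
fn for_each_record<R: Read>(
    mut src: R,
    chunk: usize,
    mut f: impl FnMut(&Record<'_>),
) -> io::Result<()> {
    let mut buf: Vec<u8> = Vec::new();
    let mut tmp = vec![0u8; chunk];
    loop {
        // Parse every complete record currently buffered.
        let mut consumed = 0;
        while let Ok((rest, rec)) = record(&buf[consumed..]) {
            f(&rec);
            consumed = buf.len() - rest.len();
        }
        buf.drain(..consumed);
        // The parser reported Needed: read more input.
        let n = src.read(&mut tmp)?;
        if n == 0 {
            return if buf.is_empty() {
                Ok(())
            } else {
                Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated pcap record"))
            };
        }
        buf.extend_from_slice(&tmp[..n]);
    }
}
```

Each 16-byte record header is followed by `incl_len` captured bytes, which is where an Ethernet/IP/TCP parser for the payload would take over.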
Very nice. I previously ported a parser for the MongoDB Language Model from JavaScript's PEG to Rust's pest.rs[1] to support Oxide[2], my MongoDB-to-PostgreSQL translation layer.
My initial idea was to port it to nom, but since I was new to Rust altogether, I felt a bit intimidated by the completely new approach compared with just translating the grammar from one representation to another (PEG -> pest.rs).
I plan on re-evaluating this idea, and I was wondering if anyone has other good resources for learning more, especially in video format.
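To make the difference in approach concrete: with pest the grammar is usually written in a separate `.pest` file and pest generates the parser, while with combinators each rule is an ordinary Rust function built from smaller ones. Below, a tiny hypothetical rule (a `key:number` pair, not taken from the MongoDB grammar) is shown in pest syntax as a comment and then as hand-rolled, nom-like combinators using only the standard library; `tag` and `take_while1` borrow the names of nom's parsers but are not nom's implementations.

```rust
// The same rule in pest syntax, for comparison (hypothetical grammar):
//   key       = @{ ASCII_ALPHA+ }
//   number    = @{ ASCII_DIGIT+ }
//   key_value =  { key ~ ":" ~ number }

/// nom-style result: Some((unconsumed input, value)) or None on failure.
type PResult<'a, T> = Option<(&'a str, T)>;

/// Match a literal prefix (named after nom's `tag`).
fn tag<'a>(lit: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| input.strip_prefix(lit).map(|rest| (rest, &input[..lit.len()]))
}

/// One or more chars satisfying `pred` (named after nom's `take_while1`).
fn take_while1<'a>(pred: fn(char) -> bool) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| {
        let end = input.find(|c: char| !pred(c)).unwrap_or(input.len());
        if end == 0 {
            None
        } else {
            Some((&input[end..], &input[..end]))
        }
    }
}

/// `key ~ ":" ~ number` as plain function composition; `?` sequences and bails on failure.
fn key_value(input: &str) -> PResult<'_, (&str, u32)> {
    let (input, key) = take_while1(|c: char| c.is_ascii_alphabetic())(input)?;
    let (input, _) = tag(":")(input)?;
    let (input, digits) = take_while1(|c: char| c.is_ascii_digit())(input)?;
    Some((input, (key, digits.parse().ok()?)))
}
```

The grammar-to-grammar port keeps the rules in one declarative place; the combinator version trades that for rules you can test, reuse and debug as normal functions.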
–
[1] https://github.com/fcoury/mongodb-language-model-rust
[2] https://github.com/fcoury/oxide
ugghhhh, this is the blog post I needed three months ago when I was trying to write a large parser with nom. I finished it, but it's way messier than this.
Does anyone know of a parser combinator library that supports concrete syntax trees and fast updates? This would be for a language server.
For LSP, you also want error-resilient parsing: even if the input is garbage, you want to get some sort of best-effort syntax tree out of it. I don't know of a parser combinator library that does that.
My general advice would be:
if you want to implement language servers for a whole bunch of different languages, go with tree-sitter
if you only have one language to deal with, strongly consider a hand-written parser
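As a toy illustration of error-resilient parsing in a hand-written parser: instead of stopping at the first error, the parser below records an `Error` node holding the tokens it could not make sense of, skips to the next `;` (a synchronisation point), and carries on, so it always returns a best-effort tree covering the whole input. tree-sitter, for comparison, marks unparseable regions with ERROR nodes. The `let name = number;` syntax, the `Node` type and the whitespace lexer are all hypothetical; a real language server would keep token spans rather than strings.

```rust
#[derive(Debug, PartialEq)]
enum Node {
    /// `let <name> = <number> ;`
    Let { name: String, value: i64 },
    /// Recovery node: the skipped tokens, kept so the tree still covers all input.
    Error(Vec<String>),
}

/// Toy lexer: pad `;` and `=` with spaces, then split on whitespace.
fn tokenize(src: &str) -> Vec<String> {
    src.replace(';', " ; ")
        .replace('=', " = ")
        .split_whitespace()
        .map(String::from)
        .collect()
}

/// Try one `let` statement at `pos`; on success return the node and the next position.
fn parse_let(toks: &[String], pos: usize) -> Option<(Node, usize)> {
    let t = |i: usize| toks.get(pos + i).map(String::as_str);
    if t(0)? != "let" {
        return None;
    }
    let name = t(1).filter(|s| s.chars().all(|c| c.is_ascii_alphabetic()))?;
    if t(2)? != "=" {
        return None;
    }
    let value: i64 = t(3)?.parse().ok()?;
    if t(4)? != ";" {
        return None;
    }
    Some((Node::Let { name: name.to_string(), value }, pos + 5))
}

/// Never fails: bad statements become `Error` nodes and parsing resumes after `;`.
fn parse_program(src: &str) -> Vec<Node> {
    let toks = tokenize(src);
    let mut nodes = Vec::new();
    let mut pos = 0;
    while pos < toks.len() {
        if let Some((node, next)) = parse_let(&toks, pos) {
            nodes.push(node);
            pos = next;
        } else {
            // Skip up to and including the next `;` (always consumes at least one token).
            let mut skipped = Vec::new();
            while pos < toks.len() {
                let tok = toks[pos].clone();
                pos += 1;
                let at_sync = tok == ";";
                skipped.push(tok);
                if at_sync {
                    break;
                }
            }
            nodes.push(Node::Error(skipped));
        }
    }
    nodes
}
```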
A couple of videos on how to go the hand-written parser way: