I didn’t even know about goldmark; I’ve always used blackfriday to process markdown in Go. I wonder why that wasn’t mentioned—it seems to be a really popular package. I love that all of this rationale is included in the pkg.go.dev docs though!
Blackfriday targets the “original” Markdown with some extensions rather than CommonMark. Since the Markdown files I need to process are all CommonMark, blackfriday is not an option for me. I’ve added that to the doc.
I tried at one point to build a CommonMark parser grammar with Ragel. Sadly, I didn’t manage to finish it. :(
Didn’t know Ragel, but it’s a parser generator apparently?
TBH I’d be surprised if you could build a CommonMark parser with a parser generator without plugging in a significant amount of code. Block structures (paragraphs, lists, blockquotes, code blocks, etc.) can probably be parsed relatively cleanly, but emphasis parsing is defined by a very complex set of rules. The spec describes an algorithm for parsing emphasis, which I essentially followed blindly without asking too many questions, and IMO it would be very hard to translate that into a declarative syntax.
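To give a feel for those rules: before any matching happens, the spec classifies each run of `*` or `_` as left-flanking and/or right-flanking, then decides whether it can open or close emphasis, with extra restrictions for `_`. Below is a minimal Go sketch of just that classification step, written from my reading of the spec. The function names (`neighbours`, `canOpenClose`) are hypothetical, "Unicode whitespace" and "Unicode punctuation" are approximated with `unicode.IsSpace` and `unicode.IsPunct`/`unicode.IsSymbol`, and the delimiter-stack matching that comes afterwards is not shown at all.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// isWS approximates the spec's "Unicode whitespace".
func isWS(r rune) bool { return unicode.IsSpace(r) }

// isPunct approximates the spec's "Unicode punctuation character" (P or S categories).
func isPunct(r rune) bool { return unicode.IsPunct(r) || unicode.IsSymbol(r) }

// neighbours returns the runes just before and after text[start:end].
// The beginning and end of the line count as whitespace, so a space is used there.
func neighbours(text string, start, end int) (before, after rune) {
	before, after = ' ', ' '
	if start > 0 {
		before, _ = utf8.DecodeLastRuneInString(text[:start])
	}
	if end < len(text) {
		after, _ = utf8.DecodeRuneInString(text[end:])
	}
	return before, after
}

// canOpenClose classifies the delimiter run text[start:end] (all '*' or all '_').
// Hypothetical helper: only the open/close classification, not the matching.
func canOpenClose(text string, start, end int) (canOpen, canClose bool) {
	before, after := neighbours(text, start, end)
	// Left-flanking: not followed by whitespace, and either not followed by
	// punctuation, or followed by punctuation but preceded by whitespace/punctuation.
	left := !isWS(after) && (!isPunct(after) || isWS(before) || isPunct(before))
	// Right-flanking is the mirror image of left-flanking.
	right := !isWS(before) && (!isPunct(before) || isWS(after) || isPunct(after))
	if text[start] == '*' {
		return left, right
	}
	// '_' is stricter, so that intraword underscores (snake_case) stay literal.
	canOpen = left && (!right || isPunct(before))
	canClose = right && (!left || isPunct(after))
	return canOpen, canClose
}

func main() {
	for _, text := range []string{"*foo*", "foo*bar*", "snake_case", "a * b"} {
		for i := 0; i < len(text); {
			c := text[i]
			if c != '*' && c != '_' {
				i++
				continue
			}
			// Extend to the end of the delimiter run.
			j := i
			for j < len(text) && text[j] == c {
				j++
			}
			open, cls := canOpenClose(text, i, j)
			fmt.Printf("%-12q run %q at %d: open=%v close=%v\n", text, text[i:j], i, open, cls)
			i = j
		}
	}
}
```

Even this small piece is context-sensitive (it looks at the characters on both sides of each run), and the results only become emphasis after a separate pass pairs openers with closers, which is part of why it maps awkwardly onto a purely declarative grammar.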
Ragel allows you to intersperse code (actions) that is executed when certain elements are matched.
The problem with CommonMark is that it requires two-stage parsing (even the spec acknowledges that).
Re two-stage parsing, I suppose you could just build two parsers: the first one only parses block structures and preserves inline content verbatim, and then another parser handles the inline content? This is how I implement my handcrafted parser anyway.
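A rough Go sketch of that split, assuming a toy subset of CommonMark: phase 1 (`parseBlocks`, hypothetical) recognises only blank-line-separated paragraphs and `# ` headings and keeps their text verbatim; phase 2 (`parseInlines`, hypothetical) runs on each block separately and recognises only single-backtick code spans. It is meant to show the shape of the approach, not to be spec-compliant.

```go
package main

import (
	"fmt"
	"strings"
)

// Block is the output of phase 1: a block kind plus its raw, unparsed inline content.
type Block struct {
	Kind string // "heading" or "paragraph" (a tiny subset of CommonMark blocks)
	Raw  string // inline content, kept verbatim for phase 2
}

// Inline is the output of phase 2.
type Inline struct {
	Kind string // "text" or "code"
	Text string
}

// parseBlocks is phase 1: it only decides block structure. Blank lines end
// paragraphs, and a line starting with "# " is a heading that also ends
// (interrupts) any open paragraph.
func parseBlocks(src string) []Block {
	var blocks []Block
	var para []string
	flush := func() {
		if len(para) > 0 {
			blocks = append(blocks, Block{"paragraph", strings.Join(para, "\n")})
			para = nil
		}
	}
	for _, line := range strings.Split(src, "\n") {
		switch {
		case strings.TrimSpace(line) == "":
			flush()
		case strings.HasPrefix(line, "# "):
			flush()
			blocks = append(blocks, Block{"heading", strings.TrimSpace(line[2:])})
		default:
			para = append(para, strings.TrimSpace(line))
		}
	}
	flush()
	return blocks
}

// parseInlines is phase 2: it sees one block's raw content at a time.
// Only single-backtick code spans are recognised (real CommonMark matches
// backtick strings of equal length and turns line endings into spaces);
// an unmatched backtick is left as literal text.
func parseInlines(raw string) []Inline {
	var out []Inline
	for raw != "" {
		open := strings.IndexByte(raw, '`')
		if open < 0 {
			out = append(out, Inline{"text", raw})
			break
		}
		closeIdx := strings.IndexByte(raw[open+1:], '`')
		if closeIdx < 0 {
			out = append(out, Inline{"text", raw})
			break
		}
		if open > 0 {
			out = append(out, Inline{"text", raw[:open]})
		}
		out = append(out, Inline{"code", raw[open+1 : open+1+closeIdx]})
		raw = raw[open+2+closeIdx:]
	}
	return out
}

func main() {
	src := "# Two `phases`\n\nA `code\nspan` across lines.\n\n`not\n# code`"
	for _, b := range parseBlocks(src) {
		fmt.Printf("%-9s %q -> %v\n", b.Kind, b.Raw, parseInlines(b.Raw))
	}
}
```

The second block shows why inline parsing waits for the block pass: the code span crosses a line break, so it is only visible once the whole paragraph has been collected. The last input shows the other direction: the heading line interrupts the paragraph in phase 1, so the two backticks end up in different blocks and phase 2 never pairs them, which matches the spec's statement that block structure indicators take precedence over inline ones.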