If you could generate a bare bones LSP daemon as well, that would be amazing.
Let a thousand languages bloom!
Intriguing idea, but the installation is pretty rough at the moment. This is totally understandable as it’s only been out in the wild for three days, but it exceeded my curiosity-to-effort threshold.
By default it needs both Homebrew and Ports on Mac. I’m not going to install Ports just to get libcrypto++, so I manually installed it in /usr/local, but after adding the install path to the CFLAGS/LFLAGS in the root Makefile it still failed to find libcrypto at a later stage. It turns out the compiler arguments are hard-coded into the source code. I noped out at that point.
Filed it for you: https://github.com/jzimmerman/langcc/issues/3
Have you considered making an issue? I’m not trying to give you work - I can make a stub issue if it’s not objectionable to you.
I considered it, but assumed they would already be aware. If you think it’s worthwhile though, go for it.
Issue is closed with a comment - https://github.com/jzimmerman/langcc/issues/3#issuecomment-1256510884
This is cool, a parser generator that can handle “real languages” is a very worthy goal!
I have long wanted to write a parser generator extracted from the style of parsing shell I use:
How to Parse Shell Like a Programming Language
which is basically lexer modes, recursive descent AND grammars, and Zephyr ASDL. I think shell is one of the hardest real languages to parse, if not THE hardest! So it’s a good test case for “declarative syntax”.
This same style should be able to parse almost any language, although of course C and C++ have the additional type feedback (“the lexer hack”). That would be interesting to see addressed as well.
I haven’t read the papers yet, but I don’t see how he can automatically generate natural AST classes, given the data I see in the grammar file?
And I would like to know if there is any formalism behind the “lexer modes” in the syntax definition! (Oil uses a particular style of that, but most parser generators like TreeSitter also have some ad hoc mechanisms for more powerful lexers)
I read the paper about the tool, which basically doubles as the documentation. The parser rules with a “.” in the name treat the part after the dot as a subtype (union type) of the AST node, so if you have rules for A.X and A.Y it emits an AST node class A with variants X and Y. (There’s a companion library that implements the AST base classes with support for variants.)
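To make the naming convention concrete, here is a rough Python sketch of the idea as I understand it. This is not langcc's code or its output (the real tool emits C++); the helper names group_variants and make_ast_classes are made up for illustration, and the only thing taken from the paper is the "Base.Variant" rule naming.

```python
def group_variants(rule_names):
    """Group dotted rule names, e.g. 'A.X' and 'A.Y' -> {'A': ['X', 'Y']}.

    Hypothetical helper: a name without a dot becomes a node type
    that has no variants.
    """
    groups = {}
    for name in rule_names:
        base, _, variant = name.partition(".")
        variants = groups.setdefault(base, [])
        if variant:
            variants.append(variant)
    return groups


def make_ast_classes(rule_names):
    """Build one base class per node type and one subclass per variant."""
    classes = {}
    for base, variants in group_variants(rule_names).items():
        base_cls = type(base, (), {})
        classes[base] = base_cls
        for variant in variants:
            # Each variant is modelled as a subclass of its base node type.
            classes[f"{base}.{variant}"] = type(f"{base}_{variant}", (base_cls,), {})
    return classes


classes = make_ast_classes(["A.X", "A.Y", "B"])
print(sorted(classes))
print(issubclass(classes["A.X"], classes["A"]))
```

So the grammar file doesn't need separate AST declarations: the rule names alone carry enough information to derive the class hierarchy.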
The lexer modes are a stack, so the “push” operation switches to another mode until a “pop”.
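A minimal sketch of a mode-stack lexer, in Python rather than langcc's own syntax. The mode names, token names, and rule table are all invented for the example; the only thing taken from the tool is the push/pop behaviour. The "${...}" interpolation is there to show why a stack is needed rather than a single flag: a string can contain code that contains another string.

```python
import re

# Hypothetical rule table: mode -> list of (regex, token kind, action).
# A kind of None means "skip"; the action is ("push", mode), ("pop",) or None.
MODES = {
    "code": [
        (r'"', "DQUOTE", ("push", "string")),
        (r"\}", "RBRACE", ("pop",)),
        (r"[A-Za-z_]\w*", "IDENT", None),
        (r"\s+", None, None),
    ],
    "string": [
        (r'"', "DQUOTE", ("pop",)),
        (r"\$\{", "INTERP_OPEN", ("push", "code")),
        (r'[^"$]+', "STR_CHARS", None),
    ],
}


def tokenize(text, start_mode="code"):
    stack = [start_mode]  # the top of the stack is the active mode
    pos = 0
    tokens = []
    while pos < len(text):
        for pattern, kind, action in MODES[stack[-1]]:
            m = re.compile(pattern).match(text, pos)
            if m:
                break
        else:
            raise SyntaxError(f"no rule matches at {pos} in mode {stack[-1]!r}")
        if kind is not None:
            tokens.append((kind, m.group()))
        if action is not None:
            if action[0] == "push":
                stack.append(action[1])
            elif len(stack) > 1:
                stack.pop()
            else:
                raise SyntaxError(f"unbalanced pop at {pos}")
        pos = m.end()
    return tokens


for tok in tokenize('"a${b "c"}d"'):
    print(tok)
```

The same '"' character is lexed by different rules depending on which mode is on top of the stack, and after the inner string and the "}" are popped the lexer is back in the outer string without any extra bookkeeping.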