Lots of “meh” on this one. Congrats, you re-invented adding a templating language on top of Markdown, then wove it a bit deeper so parsing both at once leads to your AST but lost the programmatic features of most templating systems. Hum, so? Who does this help over an SSG like zola (or your favorite SSG) built on Markdown with a templating engine on top?
Their templating language, at first glance, appears to use Liquid syntax. Is it just a set of Liquid macros?
The existence of things like this really highlights for me the big limitation of Markdown: there’s no way of extending the semantic markup easily. For headings, lists, and links, Markdown is fine. GitHub-flavoured Markdown has per-language code blocks, but not inline per-language code spans and there’s no good way of adding them.
In contrast, DocBook makes this kind of thing easy. For the FreeBSD Handbook, for example, there are custom XML tags for things like man page entries. The downside is that DocBook is not human-readable or human-writable.
I’d love to see something a bit more in the middle. For my own books, I tend to use TeX-style markup (e.g. \cppcode{some C++ code}, \ccode{some C code}, \keyword{semantic markup}), with a regular syntax that I can either implement as LaTeX macros or parse to generate something else.
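As a rough illustration of the "parse to generate something else" route, here is a minimal Python sketch that turns \name{argument} spans into HTML. The function name render_html and the choice of a span element with the macro name as its class are assumptions made for the example, and the regular expression deliberately ignores nested braces.

```python
import html
import re

# Matches \name{argument}; the argument may not contain nested braces.
MACRO = re.compile(r"\\([A-Za-z]+)\{([^{}]*)\}")


def render_html(text: str) -> str:
    """Render each macro as a span whose class is the macro name."""
    out = []
    pos = 0
    for match in MACRO.finditer(text):
        # Plain text between macros is escaped and copied through.
        out.append(html.escape(text[pos:match.start()]))
        name, arg = match.group(1), match.group(2)
        out.append(f'<span class="{name}">{html.escape(arg)}</span>')
        pos = match.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


print(render_html(r"Mark the method \cppcode{virtual}; it is a \keyword{keyword}."))
```

Because the macro name is carried through unchanged, the same source could equally be fed to LaTeX with \cppcode and \keyword defined as ordinary macros.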
> I’d love to see something a bit more in the middle.
You are looking for AsciiDoc(tor), I suppose? It is essentially DocBook with a surface syntax that is almost a superset of Markdown. Specifically, Asciidoctor is a lightweight plain-text concrete syntax for an HTML-like DOM of arbitrarily nested nodes with attributes. It’s not directly HTML though: to get HTML, you implement a mapping from Asciidoctor’s DOM to HTML. Ditto for DocBook XML.
Syntax for nesting + syntax for attributes + user-controllable transformation steps give enough flexibility to do whatever you need, without resorting to inserting raw HTML. And the surface syntax is rather tastefully designed.
AsciiDoc does look close, but it also looks pretty verbose for custom markup. The thing that I like about something TeX-based is that it’s three extra characters on top of the name of the macro. If I want syntax-highlighted C++ text, I write \cppcode{virtual}. The macro name is cppcode; the only extra typing that I need to do is \, {, and }, to indicate the start of a macro name and the start and end of the argument. With XML, I’d write something like <code language="cpp">virtual</code> or possibly <cppcode>virtual</cppcode>, which I wouldn’t want to type without an XML-aware editor (and which is annoying to read). I think AsciiDoc may be similar to TeX, but skimming the manual I couldn’t find how to write custom macros (the ‘inline macros’ section just tells me about the predefined ones).
In Asciidoctor, you’d write this as [.cppcode]`virtual`: this is the built-in monospace element with a cppcode role attached. Or, if you want to decorate a non-specific inline element, [.cppcode]#virtual#. Curiously, this won’t be a macro: cppcode would be attached as an attribute to the relevant inline element. It would be up to the converter for the specific output format to interpret this role.
Asciidoctor also has macros (bits of code that run while the DOM is constructed during parsing). With a macro, that would look like cppcode:[virtual] (matches TeX in the number of characters!). The macro surface syntax is a cute hack: in http://example.com[this is a link], the http: is the name of the macro, which receives //example.com as an argument. And image:/path/to/file.png[] is an image, which is way easier to remember than the Markdown syntax. Although I like the macro syntax, I hate the semantics. I think I wish that everything were just inert attributes in the DOM, and that all the logic were in the converter from the DOM to a particular format.
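To make the "inert attributes in the DOM, logic in the converter" idea concrete, here is a minimal Python sketch. The Inline class and both converter functions are hypothetical names invented for the example; they are not Asciidoctor’s API, and the LaTeX converter skips escaping for brevity.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Inline:
    """Hypothetical inline node: text plus inert role names, no behaviour."""
    text: str
    roles: list[str] = field(default_factory=list)


def to_html(node: Inline) -> str:
    # The HTML converter decides that roles become CSS classes.
    if not node.roles:
        return html.escape(node.text)
    classes = " ".join(node.roles)
    return f'<code class="{classes}">{html.escape(node.text)}</code>'


def to_latex(node: Inline) -> str:
    # The LaTeX converter decides that each role becomes a macro call.
    # Wrapping innermost-first turns roles ["a", "b"] into \a{\b{text}}.
    out = node.text
    for role in reversed(node.roles):
        out = f"\\{role}{{{out}}}"
    return out


node = Inline("virtual", ["cppcode"])
print(to_html(node))
print(to_latex(node))
```

The node itself never changes; supporting a new output format means writing one more function over the same tree.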
TeX’s syntax does look nice for inline elements, but things like
\begin{itemize}
\item one
\item two
\item three
\end{itemize}
are pretty horrible in comparison to
* one
* two
* three
That, I think, is the reason I like Asciidoctor: it has a sane (admittedly, poorly specified) general tree-shaped document model inside, but enough syntax sugar (well, maybe a bit too much) and syntactic variety on top of it to make authoring pleasant.
> In AsciiDoctor, you’d write this as [.cppcode]`virtual` – this is built-in monospace with cppcode role attached. Or, if you want to decorate non-specific inline element, [.cppcode]#virtual#. Curiously, this won’t be a macro – cppcode would be attached as an attribute to the relevant inline element. It would be up to convertor into the specific output format to interpret this role.
That does look pretty nice, thanks.
> TeX’s syntax does look nice for inline elements, but things like [list examples]
The TeX example looks bad here, but it has the nice property that it generalises. Itemised lists, enumerated lists, description lists, and any kind of user-defined collection can have the same syntax. The \begin{} / \end{} syntax is a bit verbose, but it’s used for so many things that I just have F2 in vim bound to a small macro that inserts a block with the token under the cursor used in the begin and end parts and switches to insert mode with the cursor between the two.
Asciidoctor also allows general tree-shaped things. For example, you can do something like
[my-list]
--
[item]
one
[item]
two
[item]
three
--
This isn’t exactly equivalent to * (lists are first-class in the AST), but it allows expressing arbitrary structure. Subjectively, Asciidoctor scores high on making simple (common) things simple and complex things possible. Though it perhaps has too many general mechanisms for complex things.
Nice, thanks! It looks as if learning Asciidoctor should be quite high up my TODO list.
My current plan is to wait until the standardization effort produces a meaningful spec with a grammar: https://projects.eclipse.org/proposals/asciidoc-language. Quality of implementation is a weak link today (it’s essentially the CPython situation: one dominant, featureful implementation, which leads to a lot of implementation-defined corners). With the spec, I hope to see an embeddable AsciiDoc parser and a bigger variety of converters.
My ducks aren’t quite in a row to write about this, but carpe diem…
There is at least one language here: D★Mark. It’s basically just the parser, but the author has at least one tool built atop it called MarkBook. The original implementation is in Ruby, and the author also has a Rust implementation (though I haven’t used that one). The author notes that its syntax is TeX-inspired.
I’ve wanted something similar for a while, so this Fall I bit the bullet and started prototyping something of my own (wordswurst) atop D★Mark (I wrote a Python parser for D★Mark in the process). I’m trying to scratch a specific itch: using a single documentation source to generate outputs for formats with differing whitespace semantics (~normal template languages fall over, here). To that end, I also want something that is both lightweight and as-purely-semantic as possible.
I’m not 100% certain wordswurst will use D★Mark in the long term, but it’s helped me focus while prototyping.
D★Mark looks nice. For me, the big motivator was wanting semantic markup in code snippets. For the ePub editions of my second Objective-C book, I was able to use libclang to parse all of the example code and then use the clang token kinds as classes in the HTML so that I could use CSS to style local variables, class names, macros, and so on. This was motivated by the previous book, where the publisher had generated the HTML from the PDF that LaTeX produced and mangled a lot of the code.
I also had a lot of custom macros for things like keywords (should be italicised and appear in the index), or abbreviations (should appear as a cross-reference in the index and be expanded on first use in a chapter).
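The bookkeeping behind macros like these is small. Here is a Python sketch under stated assumptions: the Book class, its method names, and the HTML em output are all hypothetical, chosen only to show per-chapter state for first-use expansion alongside a book-wide index.

```python
from collections import defaultdict


class Book:
    """Hypothetical state shared by keyword and abbreviation macros."""

    def __init__(self, abbreviations):
        self.abbreviations = abbreviations   # short form -> expansion
        self.index = defaultdict(set)        # index term -> chapter numbers
        self.cross_refs = {}                 # short form -> "see" target
        self.chapter = 0
        self.seen = set()                    # abbreviations used this chapter

    def start_chapter(self):
        self.chapter += 1
        self.seen.clear()

    def keyword(self, term):
        # Keywords are italicised and recorded in the index.
        self.index[term].add(self.chapter)
        return f"<em>{term}</em>"

    def abbr(self, short):
        expansion = self.abbreviations[short]
        # The index lists the expansion; the short form only points at it.
        self.index[expansion].add(self.chapter)
        self.cross_refs[short] = expansion
        if short in self.seen:
            return short
        self.seen.add(short)
        return f"{expansion} ({short})"


book = Book({"AST": "abstract syntax tree"})
book.start_chapter()
print(book.abbr("AST"), "/", book.abbr("AST"), "/", book.keyword("role"))
```

Calling start_chapter again resets the first-use tracking, so the next chapter expands each abbreviation once more.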
The clang integration isn’t something I’d expect to have out-of-the-box for any system, but it’s something that I’d like to be able to easily write a plugin for. Writing my own parser for the subset of TeX that I used for semantic markup was very easy (though I didn’t use TeX’s math mode for anything). As a side effect, libclang is reliable only with complete valid compilation units and so all of the code snippets in the book had to be pulled from a real source file, which guaranteed that they actually compiled and worked.
As we finalise the design for Verona, I’m going to need to start writing a Verona book soon and I’d like to be able to use more modern tooling (and not reinvent the wheel).