“Good” and “bad” are matters of taste. I happen to like Markdown a lot. But I’m not saying you’re wrong, nor am I wrong.
Arguing about whether _____ sucks or rules is easy. Articulating the specific ways in which something can lead to problems in one area, despite having benefits in another, takes thought and effort. That’s what I appreciate about this article.
I feel like this is coming full circle to XML again. The PortableText format seems like an awkward way to write XML in JavaScript. And now we get to rediscover the world of namespaces and schemas and validation.
Yeah, if your goal is to store rich text plus custom widgets in a CMS, your best bet is to just store HTML plus custom tags for the widgets (they can be web components or not depending on your stack, but just something your XML parser can handle as needed).
XML is good ♥
The tags and bracket text representation of XML is bad (compared to SXML or DOM) but fine. It does the job. Markdown can help me write it.
Markdown certainly has its limitations, and I’m open to trying other lightweight markup languages. AsciiDoc looks particularly interesting. But this article also seems to be criticizing the dominance of lightweight markup languages in general, and that’s what I want to push back against.
I think what I like most about writing in a lightweight markup language is the complete transparency of plain text. As long as I’ve got the syntax correct, I know there will be no surprises when it comes to document structure or presentation. And if I make a mistake, fixing that is always a simple matter of adding, removing, or replacing characters. It’s just so simple and predictable, compared to working with a WYSIWYG editor, whether it’s Microsoft Word, the Thunderbird HTML email composer, or something else. Maybe my visual impairment has something to do with my discomfort with WYSIWYG editors. Then again, I do have enough sight to see how the document looks. Still, I’m slow at using the mouse, and I never learned all the keyboard shortcuts for Word, never mind other editors.
It’s true that most non-developers aren’t comfortable with markup languages. But maybe that just means we need to teach the next generation while they’re still young enough to pick up new things easily.
Perhaps the problem is picking markdown for something it’s not optimized to do.
I love markdown. I love the fact that I can write my novel in a text editor and get the little bit of formatting I need via easy to type characters.
I love that my files are plain text, not some franken markup, nor some franken binary.
I love my markdown for write ups that are overwhelmingly text and don’t need font or color changes.
If I had to embed a ton of images and forms and videos I would be using some ephemeral WYSIWYG platform that’ll last just a bit longer than the usefulness of my content, what with all the associated media and all. That’s when I wouldn’t be using plain text.
Even CommonMark implementations don’t behave quite the same. Even different implementations both maintained by John MacFarlane don’t behave exactly the same! (I’m tempted to add more than one exclamation mark here, but I won’t ;)
If you are using an external syntax highlighter, it will break if you switch from cmark to pandoc and vice versa, unless you make pandoc behave the same as cmark (at least that’s possible with Lua hooks).
Using Markdown as if it were a protocol language or exchange format is what sucks. Just use it as a tool for yourself to quickly generate data for a more robust format.
gemini://idiomdrottning.org/markdown-good
Strongly agree. I wrote my own markup format for my blog that is a mixture of Markdown and OrgMode, and it exists to make writing of posts easier for me. The entries are never stored in this format but in their final HTML format. That way, I can change the way my markup format works without locking myself into some bad decisions (and it’s already changed a bit from when I first wrote it).
This needs to be a position paper somewhere. That Markdown is a projectional format at the point of use and not a long term protocol. When viewed through that lens, it makes a whole lot more sense.
That only works if your entries are written once, and seldom edited after publication. Works for a blog, less so for documentation that regularly needs to be updated.
For such, I’d be tempted to keep the source format around, and maintain two output formats: the target HTML, and the source format itself. That way, when I change the format, I have an automatic converter that can unlock me from my bad decisions.
I think you can round-trip between MD and HTML. It should be pretty unambiguous what HTML is generated by a string of ordinary Markdown, since it’s all normal semantic stuff like P, EM, A, H1… (I say “it should be” because I haven’t tried, and any problem you haven’t worked on yourself is trivial.)
So it should also be feasible to translate that HTML back to MD. (Not all HTML, just the stuff output by the renderer.) The only bits to be careful about are escaping metacharacters like underscores.
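As a rough sketch of that HTML-to-Markdown direction, here is a minimal converter for only the tag subset named above (p, em/strong, a, h1), escaping metacharacters like underscores on the way back. The class and function names are hypothetical, and this handles nothing beyond that subset:

```python
# Sketch: convert the small HTML subset a Markdown renderer emits back to
# Markdown. Only p, em/i, strong/b, a, and h1 are handled; text content has
# Markdown metacharacters (underscores, asterisks, brackets…) escaped so the
# round trip stays stable.
from html.parser import HTMLParser

class HTMLToMarkdown(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.out.append("# ")
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag == "a":
            self.href = dict(attrs).get("href", "")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in ("em", "i"):
            self.out.append("*")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag == "a":
            self.out.append(f"]({self.href})")
        elif tag in ("p", "h1"):
            self.out.append("\n\n")  # block elements become blank lines

    def handle_data(self, data):
        # Escape metacharacters such as underscores so they survive re-rendering.
        for ch in "\\*_[]`":
            data = data.replace(ch, "\\" + ch)
        self.out.append(data)

def html_to_md(html: str) -> str:
    parser = HTMLToMarkdown()
    parser.feed(html)
    return "".join(parser.out).strip()
```

Anything outside that subset (tables, raw HTML blocks, nested links) would need its own rule, which is exactly where the “not all HTML, just the stuff output by the renderer” caveat earns its keep.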
This article seems to be aimed more toward the CMS crowd, but I have one piece of advice for them:
Text is timeless. I can dig out text files from my 1990 DOS computer’s archive, and they’re just as readable as they were then. In my first few years on the Mac, I tried out a few “modern” technologies like RTF and now all those files are just binary garbage. (I’m sure I can read them in the future with some decoder app, but for now, they’re just garbage.) I went back to plain text and “commonmark” because I could write a blog post in 2008 and feel confident that I’d still be able to read and edit it 15 years later. (Prediction was correct!)
I suspect that a lot of people who prize commonmark don’t particularly love it, but have had bad experiences with the various fad structured formats over the years, and started thinking long-term. If/when the commonmark fad ends, these files will still be readable and editable by people who have never heard of it, and that matters more to me than having native support for video links to a commercial website.
I suspect that a lot of people who prize commonmark don’t particularly love it, but have had bad experiences with the various fad structured formats over the years, and started thinking long-term.
Exactly!
I used to have a MoinMoin wiki instance, and authored many private and public pages in their wiki syntax. Now, given that MoinMoin is abandoned (it doesn’t work on Python 3, and MoinMoin v2 has been in the works for years), I can’t really migrate my notes, because the MoinMoin wiki syntax is non-standard, and there is no parser out there except the one built into MoinMoin… (I can say the same for the MediaWiki syntax, which is almost, but not quite, compatible with Creole…)
For a while I used reStructuredText, especially for large and complex documents, but it suffers from the same issue: there is only one parser out there that understands the syntax, the one embedded in the docutils Python software… (And it doesn’t help that Python is such a moving target…)
Therefore, now I try to use almost exclusively CommonMark (without extensions), because regardless of how limiting and annoying it is sometimes, there are countless parsers (especially in compiled languages like Rust and Go, not to mention the official one in C) that will allow me to keep my notes and migrate it from one system to another.
(In fact, regarding wiki systems, if it doesn’t support CommonMark, and it doesn’t use the file-system for storage, I won’t even think about using it, regardless of the features, promises, language, etc.)
Conversations in this domain can be frustrating because there are a lot of different/overlapping interests and historical currents.
I think the one that’s probably the most fundamental and maybe intractable is that on one hand there are many impressive towers-of-babel that we can build with a few CPUs and some semantics to rub together, and on the other–people do not semantic well.
(To be very clear, I’m speaking generally and not in a “Semantic Web” sense.)
There’s a whole nest of reasons we don’t semantic well, but some big ones are:
semantics are hard
semantics get harder as you add more brains
we have a ~legacy of writing practices/formats/tools (for the page, word processors, WYSIWYG tools, LWMLs, etc.) that have poor semantic fidelity and are often laden with distracting presentational features; this ensures vanishingly few people have been forced to semantic well
That said, I think there are some seams to pick at. One thing that could change the weather is more emphasis on building tools that deliver enough visible value to the people doing the authoring to get them to buy in.
I’ve been working on a weird little semantic-first documentation single-sourcing platypus, myself. To the point where I’ve recently started dogfooding it in the real project I started building it to document.
Impressive to get through an entire article on the limitations of and alternatives to markdown without once mentioning reStructuredText! I’m writing the new version of learntla in Sphinx/rST and oh my god it is so much better. Unambiguous formatting! Uniform extension mechanisms! Semantic markup! Intermediate representations! I’ve even started writing my own custom directives and roles, and it’s solving problems I didn’t even know I had.
reStructuredText is definitely my least favorite of the lightweight markup languages. It seems to emphasize prettiness of the plain text over ease of writing and editing. The obvious example of this is the requirement that titles be underlined, and that the underline consist of at least as many characters as the title text itself. I much prefer the ### Title text style of headings in Markdown. If I’m not mistaken, AsciiDoc uses a similar convention, albeit with equal signs. This style also makes it much easier to know what level a heading is, without having to remember the underline convention the document is using.
Yeah, the weird title syntax and the lack of nested inline markup are the two big friction points with rST.
On the other hand, custom roles and directives. I defined exercise and solution directives that automatically cross-reference each other and warn me which exercises are missing solutions. More than makes up for the weird syntax decisions.
My solution to that in soupault is that if you work on the HTML level, then you can make extensions that work for any input format that allows embedding HTML-like tags in it. For example, this hyperlinked glossary plugin adds syntax for defining glossaries and converts all <term>something</term> elements to links to the glossary.
Indeed, this technique is what I’ve applied to my own notes / wiki tool – instead of extending the various markup languages I intend to support, I’ve decided to just post-process the resulting HTML for augmentation.
BTW, your soupault project is really interesting, and it seems perfect for post-processing HTML dumps, like for example obtained from various wiki engines.
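The post-processing idea in this sub-thread can be sketched in a few lines. This is not the actual soupault plugin; it is a stand-in where the function name and the /glossary URL scheme are made up for illustration:

```python
# Sketch of HTML post-processing for glossary links: rewrite every
# <term>X</term> element into a link pointing at a glossary page anchor.
# The glossary URL and anchor scheme here are illustrative assumptions.
import re

def link_terms(html: str, glossary_url: str = "/glossary") -> str:
    def repl(match):
        term = match.group(1)
        # Derive an anchor from the term text (lowercase, hyphenated).
        anchor = term.lower().replace(" ", "-")
        return f'<a href="{glossary_url}#{anchor}">{term}</a>'
    return re.sub(r"<term>(.*?)</term>", repl, html)
```

Because this runs on the rendered HTML, it works the same whether the source was Markdown, reStructuredText, or anything else that lets raw tags through, which is the point being made above.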
KeenWrite is my text editor that takes a slightly different approach than MDX. Rather than include variable definitions within documents, variables are defined in an external file. I find that when variables are embedded into documents, those variables often include controls for presentation logic. To me, any presentation logic meant to affect a plain text document’s appearance does not belong in the document itself. Part 8 of my Typesetting Markdown series shows the power of separating content from presentation by leveraging pandoc’s annotation syntax.
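The external-variables idea above can be sketched like this. Note this is not KeenWrite’s actual mechanism; the JSON file format and `${var}` syntax are assumptions chosen for a minimal, self-contained illustration:

```python
# Sketch: keep variable definitions in an external file so the document itself
# stays plain prose with no embedded presentation logic. File format and
# ${var} placeholder syntax are illustrative, not KeenWrite's conventions.
import json
from string import Template

def render(markdown_text: str, variables_path: str) -> str:
    # Load variable definitions kept outside the document.
    with open(variables_path) as f:
        variables = json.load(f)
    # safe_substitute leaves unknown ${...} references untouched
    # instead of raising, so a missing definition degrades gracefully.
    return Template(markdown_text).safe_substitute(variables)
```

The design point is the separation: swapping the variables file restyles or re-targets the output without touching the document source.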
Annotated Markdown is sufficiently powerful to produce a wide variety of different styles; I’ve typeset a number of such Markdown documents using ConTeXt.
What’s bothersome is how some companies are setting de facto Markdown standards without considering the greater ecosystem. GitHub has done this by introducing the “``` mermaid” syntax, which creates some problems.
It’s so funny to me that people spend all this effort trying to explain to you that markdown is bad.
Listen pal, I know it’s bad–I have to use it all the time! I’m not using it because it’s good. That’s not what’s going on here at all.
I feel like this is coming full circle to XML again.

Agreed. We need a koan about “JavaScript is just a poor man’s XML”, akin to the one on objects vs. closures.
The entries are never stored in this format but in their final HTML format.

Yes, that’s great!
I tried out a few “modern” technologies like RTF and now all those files are just binary garbage.
But RTF isn’t binary? It basically looks like castrated TeX, and every word processor that’s worthwhile supports it.
Looking at one now, you’re right, it’s not binary, but it’s certainly not readable or editable. The whole first page is stuff like
{\pgdsc0\pgdscuse195\pgwsxn12240\pghsxn15840\marglsxn1440\margrsxn1440\margtsxn1440\margbsxn1440\pgdscnxt0 Default;}}
So you would certainly need a word processor or other special tool to deal with it.
I don’t really consider a word processor a special tool, considering it’s what almost everyone used computers for.
I guess I’ll keep opening up Word 97 documents in whatever version of Word is in Office 365. Small price to pay.
BTW, your soupault project is really interesting, and it seems perfect for post-processing HTML dumps, like for example obtained from various wiki engines.
Yes. The Dream web framework uses a combination of the odoc documentation generator and soupault (see https://github.com/aantron/dream/tree/master/docs/web), where the odoc output is used as soupault’s input. It does use it to insert odoc-generated HTML in a new page structure, though.
It’s also possible to disable the generator part completely (settings.generator_mode = false) and use it as a pure post-processor.

None of his points are convincing at all. I think that CommonMark is the best.