I will stay out of the matter under discussion: inline formatting vs. optional appended formatting.
What I really enjoyed, and want to point out, is how properly you laid out your idea: what motivated you to write this, and all the clear reasoning for why you built it the way you did. All pros and cons clearly analysed.
I wish software development would have more of this culture. So many useless projects, so much unnecessary overhead that could be avoided if people had a better culture of thinking before opening their IDE.
Keep it up.
Historically I haven’t been so good at thinking things through and getting ideas into well structured symbols. I’ve been making a sustained effort to improve. So thank you for the kind words. Maybe it means I’ve made at least a little progress.
This is why I’m grateful for a lot of academic work (especially in the field I’m interested in: type systems and programming languages). While there are many issues with the academic publishing system, I never cease to be grateful for the level of quality of work you can find there. It’s always extremely helpful to see a comparison to prior work, and a specification sketched out, along with motivating examples etc. I remember reading this post thinking ‘this reminds me of standoff markup’ and then being pleasantly surprised to see that mentioned under a “Related work” section. Props to the OP for doing this!
In general, there are two ways of storing formatted text:
A data structure that contains a sequence of formatting directives and strings.
A string and a map from ranges to formatting directives.
These are largely interchangeable in terms of expressiveness, but some operations are more efficient on one or the other. The second is common in word processors because you can store the strings as twines, ropes, or similar for easy insertion, and you can make the range elements relative to the start or end of a fragment so you don’t need to update ranges when you update the text.
Most markup languages are a serialisation of the former. This has a bunch of advantages for a medium that is intended for streaming I/O (you can display text as soon as you’ve read it and you can serialise it out by just walking whatever data structure you have). RTF and Word’s OOXML are (more or less) serialisations of the latter kind of data structure.
The down side of the second kind as a serialisation format is that it’s very easy for the formatting and the text to get out of sync. If you insert a character somewhere in the middle of a document all of the formatting directives that apply to ranges after that must be updated. This makes it terrible for human editing, but not too bad if your model is to read an entire document into an in-memory data structure, modify it, and then write it out to disk.
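To make the two forms concrete, here is a minimal Python sketch. The names (`inline_doc`, `Span`, `insert`) are mine and purely illustrative; they are not taken from Aftertext or any word processor. It shows why an edit in the second form forces every later range to be shifted.

```python
from dataclasses import dataclass

# Form 1: a sequence of formatting directives and strings.
inline_doc = ["Hello ", ("italic", "world"), "!"]

# Form 2: one string plus ranges that map to formatting directives.
@dataclass
class Span:
    start: int      # inclusive offset into the text
    end: int        # exclusive offset into the text
    directive: str

text = "Hello world!"
spans = [Span(6, 11, "italic")]

def insert(text, spans, pos, new):
    """Insert `new` at `pos` and shift every range that lies at or after it."""
    shifted = []
    for s in spans:
        start = s.start + len(new) if s.start >= pos else s.start
        end = s.end + len(new) if s.end > pos else s.end
        shifted.append(Span(start, end, s.directive))
    return text[:pos] + new + text[pos:], shifted

new_text, new_spans = insert(text, spans, 0, "Oh. ")
```

If the shift is skipped (which is what happens when a human edits the serialised text by hand), the old offsets 6..11 no longer cover "world".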
Aftertext tries to avoid this problem in two ways:
It puts the markup after each paragraph, so you only ever have to update local things.
It uses a copy of the text itself as a range marker.
The first of these is quite a nice property, though it has the side effect that markup can’t span a paragraph boundary (and so it can’t express things like a global style for paragraph indent). The problem with the second is that it’s ambiguous. Consider the following toy example:
This renders as “Hello 'ello 'ello”. There’s no way currently that I could see of saying ‘the second match of this expression’. One could easily be added (e.g. italic@2 ello) but that falls back to the original problem: if you add any text to this paragraph that contains a marked-up substring then you need to update all of the formatting for this block.
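A rough Python sketch of that ambiguity, under my own assumptions: `find_all` and `select` are hypothetical helpers, not how Scroll actually resolves selectors, and the match index here is 0-based, which may differ from whatever an `@2` suffix would mean.

```python
def find_all(text, needle):
    """Return the start offset of every non-overlapping occurrence of needle."""
    starts, i = [], text.find(needle)
    while i != -1:
        starts.append(i)
        i = text.find(needle, i + len(needle))
    return starts

def select(text, needle, index=None):
    """Ranges to format: every match, or only the index-th match when given."""
    ranges = [(s, s + len(needle)) for s in find_all(text, needle)]
    return ranges if index is None else [ranges[index]]

line = "Hello 'ello 'ello"
every_match = select(line, "ello")        # the bare selector hits all three
second_match = select(line, "ello", 1)    # an explicit match index

# Prepending text that also contains "ello" silently retargets index 1.
edited = "Cello, " + line
retargeted = select(edited, "ello", 1)
```

After the edit, index 1 points at the "ello" inside "Hello" rather than the one after the first apostrophe, so the directive has to be updated by hand.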
All of that said, the author says that this is an experiment to demonstrate the power of the tool. There is a huge value in the second form of markup language for things that are intended to be edited by tooling and it’s great to see that the back end can support things like this. I wouldn’t recommend this specific representation but it’s a great (simple) tech demo for the back end.
I’d expect if this proves useful something like a globalAftertext node type would be added.
Your 'ello example is a great example and a footgun I just hit. Since each directive has its own scope, I imagine I might add some properties to directives, such as:
aftertext
Hello 'ello 'ello
italics ello
index -1
italics ello
This seems to assume there’s a clear separation between the content and the markup, but I don’t think that’s always the case. For example, everyone recognizes that if you remove the quotes from a sentence, you are liable to change its meaning. But the same could be said about a quote block or code block. And to a lesser extent: italics, which change the emphasis of a sentence, but in extreme cases that emphasis can probably dramatically change the meaning of the sentence.
EDIT: Also, hmm, how do bulleted/numbered lists work in this system?
that emphasis can probably dramatically change the meaning of the sentence.
Great point! Thanks.
I updated the text with: “Another problem of Aftertext is when markup is semantic and not just an augmentation. ‘I did not say that’ is different from ‘I did not say that’. Without embedded markup in these situations meaning could be lost.”
Also, hmm, how do bulleted/numbered lists work in this system?
Aftertext and lists are currently orthogonal node types in Scroll. That is a footgun. I haven’t given much love to lists yet. There is only basic support, demo’d here
I was just thinking it reminded me of Tree Language when I noticed it was rooted in scroll.
The most likely reason why this is a bad idea is that it simply doesn’t matter whether it’s a good idea or not. You could argue that improvements to markup syntax are inconsequential.
It’s good to explore. I feel like the people who argue this will unironically post xkcd 927 in the process.
I’m pulling on a related thread (but with inline markup). I’m curious how you think about semantic v. presentational markup, in this context?
I’m curious how you think about semantic v. presentational markup, in this context?
In this context, I’m happy to have a simple new way to add presentational markup without worrying about cluttering the text, but I’m more excited about an easier way to experiment with new semantic markups, like footnotes or asides.
In terms of semantics, though, I’m much more excited about a thing upstream of Aftertext that I’m working on next.
I’m pulling on a related thread (but with inline markup).
Always excited to see new langs, if you wanted to share.
EDIT: Sorry I think I misread your question, after seeing justinpombrio’s question about semantically meaningful markup (vs presentational). Will address that below.
A nice thing about HTML (and the reason it was picked up by JSX) is that it has both attributes and children. You can think about <tag attr=value>children</tag> as equivalent to tag(*children, attr=value) in Python-like languages. The aftertext format doesn’t seem to have a good way to represent K-V pairs. It only has tags and children and the children are mixed up because they also describe the text they are modifying. There are a lot of ways to add KV pairs to aftertext, but I would think about how shell does it: tag --attr value $children. Using --attr to mark keys is pretty easy to type (that’s why shell uses it), and simple enough to understand.
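Here is a small Python sketch of that shell-style idea. It is only an illustration: `parse_directive`, the `link` tag and the `--href` key are names I made up, not part of aftertext, and it assumes every `--key` is followed by exactly one value.

```python
import shlex

def parse_directive(line):
    """Split 'tag --key value child ...' into (tag, attrs, children)."""
    tokens = shlex.split(line)          # shell-like tokenising, honours quotes
    tag, rest = tokens[0], tokens[1:]
    attrs, children = {}, []
    i = 0
    while i < len(rest):
        if rest[i].startswith("--") and i + 1 < len(rest):
            attrs[rest[i][2:]] = rest[i + 1]   # --key value becomes a K-V pair
            i += 2
        else:
            children.append(rest[i])           # anything else is a child
            i += 1
    return tag, attrs, children

tag, attrs, children = parse_directive("link --href https://example.com ello")
```

The result has the same shape as tag(*children, attr=value): one tag name, a dict of attributes, and a list of children.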
I posted something about this in more detail already, but out-of-line representations are generally not appropriate for streaming parsing; they are much better for building a data structure that is then cheap to modify, and then serialising that data structure. This is what you want for a word processor; it’s not what you want for a (read-only) document renderer.
I don’t think aftertext is ever going to be used for serious document creation, so probably anything will work. If someone writes a novel in aftertext, the chapters can just be individual files, and it would be fine.
In terms of the selection, right now it’s VERB QUOTE, where the QUOTE is an unmarked selector, but since the selector is mandatory for every verb, I think it would make more sense to do SELECTOR VERB MODIFIERS. So for
Hello 'ello 'ello
italic ello
You would instead do
Hello 'ello 'ello
"ello" italic
Note the quotation marks, which can signal the beginning and end of a selector. Once selectors are a mini-language, you can have other kinds of selectors, like "ello"[3] or whatever.
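As a sketch of what such a selector mini-language could look like, here is a small Python parser for SELECTOR VERB MODIFIERS lines. The grammar (a double-quoted string, an optional bracketed integer, then a verb and free-form modifiers) is my guess at the proposal, not an existing aftertext syntax, and `parse` is a hypothetical helper.

```python
import re

# "quote"[index] verb modifiers...   (the [index] part is optional)
DIRECTIVE = re.compile(
    r'^"(?P<quote>[^"]*)"'            # quoted selector text
    r'(?:\[(?P<index>-?\d+)\])?'      # optional match index, e.g. [3] or [-1]
    r'\s+(?P<verb>\w+)'               # the verb, e.g. italic
    r'\s*(?P<modifiers>.*)$'          # whatever follows, split on whitespace
)

def parse(line):
    """Parse one directive line into its selector, verb and modifiers."""
    m = DIRECTIVE.match(line.strip())
    if m is None:
        raise ValueError(f"not a SELECTOR VERB MODIFIERS line: {line!r}")
    index = m.group("index")
    return {
        "quote": m.group("quote"),
        "index": int(index) if index is not None else None,
        "verb": m.group("verb"),
        "modifiers": m.group("modifiers").split(),
    }

plain = parse('"ello" italic')
indexed = parse('"ello"[3] italic')
```

Because the selector is delimited by quotation marks, selectors containing spaces parse unambiguously, which the current VERB QUOTE order cannot guarantee once modifiers are added.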
And going back to my point from before, you can have attributes, like
I don’t think aftertext is ever going to be used for serious document creation
I’m curious if you’d be willing to expand on this a little more? What are your favorite langs for serious document creation, and are there any features in particular that make a key difference?
Agree!
Visually it reminds me of troff(1) directives.
Thank you! Someone else mentioned troff, but the similarities didn’t quite click until you shared that specific article explaining the directives. Thanks! Will incorporate.
This could be a post of its own.
Well framed context and restatement of Aftertext.
The logical extreme of this is proletext: https://www.templetons.com/tech/proletext.html
(tldr: HTML encoded in whitespace at the end of the line. Not practical except as an intermediate format where you can’t guarantee that the audience has a parser.)
So clever^! I had not seen that before. It reminds me of the Whitespace esolang except with a sensible use case.
He also links to “Out of band encoding for HTML” — https://www.templetons.com/tech/oob.html — which is a different way to address the same problem Aftertext tackles except using a “cursor” approach instead of text selectors.
Will update. Thanks for helping flesh out my references in new directions.
^ For the time. Nowadays not practical, as you say.
And to a greater extent, missing strikethroughs can pretty greatly change the meaning of a piece of text.
I haven’t “published” yet. It’ll go up at https://github.com/abathur/wordswurst when I do, but I have been shaving other yaks for the past ~month.
I’m not quite sure what it is. For now it is a few DSLs piggybacking on d★mark and CSS. The thread I’m pulling on is about being able to granularly single-source different kinds of documentation (to whatever extent is reasonable). I say it’s a related thread because it’s also focused on getting presentation out of the text (but the semantic markup stays).
(I do have a very small PoC in a test repo with outputs in a CI artifact if you’re really curious. It’s mixed in with samples of 2 other approaches for comparison and the files aren’t really organized, but if you start from wordswurst.nix you should be able to pick out the relevant files. The other 2 approaches have similar .nix files.)
Interesting. I’ll look forward to inspecting when you publish it.
Thanks for the link to d★mark!
I have also considered this… The only problem is that you need to keep the text and the markup in sync. Have you found this to be annoying?
Not yet. Perhaps the downside is balanced by the upside of paying more attention to each markup. But to be fair I think I need a few more months of personal usage data. It may be annoying.