1. 18
    1. 6

      YAML has a variety of quoting mechanisms for strings. One of them, block scalar, maps quite nicely to the author’s indent quoting, e.g.

      helptext: >
          Usage: This is the first line
              This is the second line
          ^--- This whitespace is part of the string.
          <--- This whitespace is the current indent.
          We can include """, ''', " and ' just as they
          are, so we can quote any code this way,
          whether python or other.

      However, it gets complicated. A full description of the above might be written as: block scalar, folded, with clip chomping. I regularly need to refer back to these, then use some trial and error.

      1. 1

        Yes, this is similar. Maybe the only difference is the handling of line breaks at the end of the string. YAML seems to have three ways to do it, none of which can represent all strings:

        • Single newline at end (clip): exactly one newline
        • No newline at end (strip): exactly zero newlines
        • All newlines from end (keep): one or more newlines

        Indent quoting would be able to represent all strings using the same notation:

        The end of the string is represented by a newline, followed by an indentation decrease.

        In YAML terms, this would probably be expressed as All newlines from end except the last one.
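The three chomping modes and the proposed indent-quoting rule can be compared in a small sketch. This is a simplified model, not a YAML parser: `raw` is assumed to be the block's text after indentation has been removed, with every line break kept, and the function names `chomp` and `indent_quoted` are made up for illustration.

```python
def chomp(raw, mode):
    """Simplified model of YAML block scalar chomping (assumption: raw is de-indented)."""
    body = raw.rstrip("\n")
    if mode == "strip":      # exactly zero trailing newlines
        return body
    if mode == "clip":       # exactly one trailing newline (none if the body is empty)
        return body + "\n" if body else ""
    if mode == "keep":       # every trailing newline is preserved
        return raw
    raise ValueError(f"unknown chomping mode: {mode}")


def indent_quoted(raw):
    """Indent quoting as described above: all trailing newlines except the last one."""
    return raw[:-1] if raw.endswith("\n") else raw


raw = "text\n\n\n"
print(repr(chomp(raw, "strip")))         # 'text'
print(repr(chomp(raw, "clip")))          # 'text\n'
print(repr(chomp(raw, "keep")))          # 'text\n\n\n'
print(repr(indent_quoted(raw)))          # 'text\n\n'
print(repr(indent_quoted("text\n")))     # 'text'
```

Under this model, the single indent-quoting rule reaches zero, one, or many trailing newlines, where YAML needs a different chomping indicator for each case.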

    2. 5

      djb, 1997 or so:

      I have discovered that there are two types of command interfaces in the world of computing: good interfaces and user interfaces.

      The essence of user interfaces is parsing—converting an unstructured sequence of commands, in a format usually determined more by psychology than by solid engineering, into structured data. When another programmer wants to talk to a user interface, he has to quote: convert his structured data into an unstructured sequence of commands that the parser will, he hopes, convert back into the original structured data.

      This situation is a recipe for disaster. The parser often has bugs: it fails to handle some inputs according to the documented interface. The quoter often has bugs: it produces outputs that do not have the right meaning. Only on rare joyous occasions does it happen that the parser and the quoter both misinterpret the interface in the same way.

      (SECURITY, included in the qmail tarball.)

    3. 4

      The primary example would be representing the length of the string followed by the string itself, which is very common in protocols and formats that are mostly read and written by machines. To use this technique in programming languages would mean making humans with no interest in counting symbols do just that, which is so painful that escaping is a better tradeoff.

      Someone’s never had to deal with ancient FORTRAN.

      1. 2

        You mean Hollerith constants in FORTRAN 66?

        4HCorr, 4Hect,, 4H I h, 4Had n, 4Hot s, 4Heen , 4Hthis, 4H bef, 4Hore.
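For anyone who would rather not count by hand, the length-prefix idea behind those constants can be sketched in a few lines. The helper names `hollerith` and `unhollerith` are hypothetical, and the 4-character width is only assumed from the example above.

```python
def hollerith(text, width=4):
    """Split text into chunks and prefix each with its length, as in nHxxxx."""
    chunks = [text[i:i + width] for i in range(0, len(text), width)]
    return ", ".join(f"{len(chunk)}H{chunk}" for chunk in chunks)


def unhollerith(encoded):
    """Decode by reading each count, then taking exactly that many characters."""
    out = []
    pos = 0
    while pos < len(encoded):
        h = encoded.index("H", pos)       # the count is the digits before 'H'
        count = int(encoded[pos:h])
        out.append(encoded[h + 1:h + 1 + count])
        pos = h + 1 + count
        if encoded.startswith(", ", pos):  # skip the separator between constants
            pos += 2
    return "".join(out)


message = "Correct, I had not seen this before."
encoded = hollerith(message)
print(encoded)
print(unhollerith(encoded) == message)
```

Note that the decoder never inspects the payload for special characters: the comma inside `4Hect,` is data because the count says so, which is exactly why no escaping is needed.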

        1. [Comment removed by author]

    4. 2

      One issue with PostgreSQL style quoting is that it can’t be parsed by a context-free grammar, which means you can’t use off-the-shelf parsing libraries to work with your language. That’s not necessarily a deal-breaker, but if you want a language that a lot of tools will want to work with, it may be a consideration.

      An alternative dynamic quoting style that is context free (so far as I can tell) is Rust’s, which looks like r"foo" or r#"foo"bar"# or r##"foo"#bar"##, etc. That is, an r, zero or more hashes, a quote, the text, another quote, then the same number of hashes. It’s not quite as pretty as the PostgreSQL one, but easier to implement.
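As a rough illustration of how a quoter might pick the number of hashes, here is a minimal sketch. The function name `rust_raw` is made up, and it ignores any other restrictions Rust places on raw string contents.

```python
def rust_raw(s):
    """Wrap s in a Rust-style raw string literal using the fewest hashes that work."""
    hashes = 0
    # The literal ends at the first quote followed by `hashes` hash characters,
    # so that sequence must not occur anywhere inside the text itself.
    while '"' + "#" * hashes in s:
        hashes += 1
    fence = "#" * hashes
    return f'r{fence}"{s}"{fence}'


print(rust_raw('foo'))        # r"foo"
print(rust_raw('foo"bar'))    # r#"foo"bar"#
print(rust_raw('foo"#bar'))   # r##"foo"#bar"##
```

The text between the delimiters is never modified; only the delimiter grows, which is the property that separates this style from escaping.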

      1. 6

        Predating Rust, there are the long strings that were introduced in Lua 5.1 (released in early 2006). The equivalents to your examples would be something like [[foo]], [=[foo]]bar]=], and [==[foo]=]bar]==].

        It’s a neat idea in either language because you effectively have an infinite number of delimiter pairs to choose from, so you can always find one that doesn’t exist in your string literal.
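Choosing the level for a Lua long string has one wrinkle the Rust form lacks: text ending in `]` can combine with the closing bracket to form an earlier match. A minimal sketch, with the hypothetical name `lua_long`, that accounts for this is below. It assumes the text does not begin with a newline, since Lua skips a newline that directly follows the opening bracket.

```python
def lua_long(s):
    """Wrap s in a Lua long bracket of the lowest level that closes in the right place."""
    level = 0
    while True:
        closer = "]" + "=" * level + "]"
        # The first occurrence of the closer in (text + closer) must be the one
        # we append; otherwise the literal would end too early.
        if (s + closer).find(closer) == len(s):
            return "[" + "=" * level + "[" + s + closer
        level += 1


print(lua_long("foo"))          # [[foo]]
print(lua_long("foo]]bar"))     # [=[foo]]bar]=]
print(lua_long("a]]b]=]c"))     # [==[a]]b]=]c]==]
print(lua_long("a]"))           # [=[a]]=]
```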

        1. 2

          In Perl you can also pick your own quote char if you use q or qw:

          perl -e 'print "hello?", q[hello!], q!hello?!, q phellop, q.hello.'

    5. 2

      Having multiple levels of escaping is nonsense, in my opinion, although this is nothing against the author.

      I’m disappointed that a better option, my preferred option, wasn’t mentioned: The beginning and ending character can be doubled to provide for a single instance. Follows is an example:

      ''''

      In APL, this evaluates to a string with a single quote. Follows is another example:

      """"

      In Ada, this is a string containing a single double quote.

      Phrased differently, and this is my preferred way to consider it: a string containing the string character itself can be represented by two strings juxtaposed, with the joining point becoming the string character in the new, combined string.

      It’s disappointing Common Lisp uses escaping instead of this much nicer way.
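The doubling rule is small enough to sketch in full. The names `quote` and `unquote` are hypothetical, and the delimiter is a parameter so the same code covers both the single-quote and double-quote cases.

```python
def quote(s, q="'"):
    """Delimit s with q, doubling every q that occurs inside the text."""
    return q + s.replace(q, q + q) + q


def unquote(literal, q="'"):
    """Reverse quote(): strip the delimiters and collapse doubled q back to one."""
    if len(literal) < 2 or not (literal.startswith(q) and literal.endswith(q)):
        raise ValueError("literal is not delimited by the quote character")
    inner = literal[1:-1]
    # After removing every doubled pair, no quote character may remain.
    if q in inner.replace(q + q, ""):
        raise ValueError("unpaired quote character inside literal")
    return inner.replace(q + q, q)


print(quote("'"))                 # ''''
print(quote('"', q='"'))          # """"
print(quote("it's"))              # 'it''s'
print(unquote("'it''s'"))         # it's
```

A lone quote character comes out as four quote characters: two delimiters around one doubled quote.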

      1. 4

        I too prefer doubling the quote character over backslash + quote, since I personally find it more intuitive and aesthetic.

        However, this is still escaping, at least in the way I used the term in the article—the quote character does not represent itself but instead is a dispatch character. It will still suffer some downsides of escaping, e.g. copying the text between the start and end quote characters will not put the actual string in your clipboard.

      2. 2

        The author appears to be aware of this method of escaping quotes:

        In PostgreSQL, apostrophe (') is escaped as two apostrophes ('').

        I first saw this kind of escaping in Plan 9’s rc(1):

        A quoted word is a sequence of characters surrounded by single quotes ('). A single quote is represented in a quoted word by a pair of quotes ('').