1. 2
  1.  

  2. 1

    with the exception of the zero-width assertions, with only these features it’s possible to match any POSIX regular expression…

    I’m skeptical that zero-width assertions are an exception. What’s an example of a backreference-free POSIX regexp that does not specify a regular language?

    1. 1

      POSIX REs per se don’t have backreferences. The point I’m making in that sentence is that POSIX has zero-width assertions (as a feature, they are compatible with regular grammars) but since the toy regexp engine doesn’t have them, any POSIX regexp that depends on them is not replicable with it.

      1. 1

        POSIX REs per se don’t have backreferences

        Do you have a citation?

        According to http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html :

        9.5.2 RE and Bracket Expression Grammar

        This section presents the grammar for basic regular expressions, including the bracket expression grammar that is common to both BREs and EREs. … %token BACKREF L_ANCHOR R_ANCHOR

        That, as well as other parts of the doc, seem to indicate that back references are part of POSIX regexps.

        1. 1

          Sorry, I meant POSIX extended REs. Only basic REs have backreferences. I’ll clear that up when I update the article.

          (Why does ‘extended’ mean ‘has fewer features’? That’s committee-think for you.)

        2. 1

          any POSIX regexp that depends on [zero-width assertions] is not replicable with [a regexp engine that lacks them].

          I believe this is false. Again, why do you think it isn’t “possible” to transform regexps with zero-width assertions into equivalent regexp without them? Note that your original sentence even specifies, “after applying the appropriate transformation.”

          1. 1

            Can you explain a method to transform arbitrary uses of zero-width assertions (^, $, and \b) into arbitrary regexps without them?

            1. 1

              A straightforward but inefficient method is to convert the source regexp to a DFA, then convert the DFA back to a regexp that uses only union, concatenation, and star.