with the exception of the zero-width assertions, with only these features it’s possible to match any POSIX regular expression…
I’m skeptical that zero-width assertions are an exception. What’s an example of a backreference-free POSIX regexp that does not specify a regular language?
POSIX REs per se don’t have backreferences. The point I’m making in that sentence is that POSIX has zero-width assertions (as a feature, they are compatible with regular grammars) but since the toy regexp engine doesn’t have them, any POSIX regexp that depends on them is not replicable with it.
This section presents the grammar for basic regular expressions, including the bracket expression grammar that is common to both BREs and EREs.
…
%token BACKREF L_ANCHOR R_ANCHOR
That, as well as other parts of the doc, seem to indicate that back references are part of POSIX regexps.
any POSIX regexp that depends on [zero-width assertions] is not replicable with [a regexp engine that lacks them].
I believe this is false. Again, why do you think it isn’t “possible” to transform regexps with zero-width assertions into equivalent regexp without them? Note that your original sentence even specifies, “after applying the appropriate transformation.”
A straightforward but inefficient method is to convert the source regexp to a DFA, then convert the DFA back to a regexp that uses only union, concatenation, and star.
I’m skeptical that zero-width assertions are an exception. What’s an example of a backreference-free POSIX regexp that does not specify a regular language?
POSIX REs per se don’t have backreferences. The point I’m making in that sentence is that POSIX has zero-width assertions (as a feature, they are compatible with regular grammars) but since the toy regexp engine doesn’t have them, any POSIX regexp that depends on them is not replicable with it.
Do you have a citation?
According to http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html :
That, as well as other parts of the doc, seem to indicate that back references are part of POSIX regexps.
Sorry, I meant POSIX extended REs. Only basic REs have backreferences. I’ll clear that up when I update the article.
(Why does ‘extended’ mean ‘has fewer features’? That’s committee-think for you.)
I believe this is false. Again, why do you think it isn’t “possible” to transform regexps with zero-width assertions into equivalent regexp without them? Note that your original sentence even specifies, “after applying the appropriate transformation.”
Can you explain a method to transform arbitrary uses of zero-width assertions (
^,$, and\b) into arbitrary regexps without them?A straightforward but inefficient method is to convert the source regexp to a DFA, then convert the DFA back to a regexp that uses only union, concatenation, and star.