Better advice is to not use regular expressions, but use parser combinators, since they actually compose.
The power of a regex is that it can be compiled down to an FSM, which can be significantly faster. As a personal plug, I wrote an FSM generator that uses combinators: https://github.com/ztellman/automat. I haven’t seen anything else in this vein (even Ragel is a DSL rather than composable pieces), and I’m not sure why.
Oh you’ve just reminded me we need an equivalent of automat for Haskell. Thanks.
There are already some libraries for this:
More would be even better.
Does it compile to a FSM?
Yes, it compiles to a nondeterministic FSM.
Why non-deterministic? A deterministic one would be much more useful.
Would it? Doesn’t the powerset construction say it doesn’t really matter?
An NFA by itself doesn’t give you anything. You need to transform it into a DFA, minimize, and define an (efficient) execution model for the DFA. Anything shy of that is just a homework exercise.
Agreed. And really no more trouble to use. The example at the end could have been expressed using a peg library with barely any more typing.