Since the author appears to be using Python, one more tip: if you compile your regex with the re.VERBOSE flag (and if you have a regex as a global variable, you should be compiling it anyway), Python will ignore unescaped whitespace in the pattern (except inside character classes), along with anything from # to the end of the line.
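To make the whitespace rule concrete, here is a small sketch using only the standard `re` module. The pattern names (`pattern`, `literal_space`, `escaped_space`, `hash_sign`) are made up for illustration. It shows that whitespace inside a character class, or escaped with a backslash, still counts, and that an escaped `\#` is a literal hash rather than the start of a comment.

```python
import re

# In VERBOSE mode, unescaped whitespace outside a character class is ignored,
# and an unescaped '#' starts a comment that runs to the end of the line.
pattern = re.compile(r"""
    a b c      # matches 'abc'; the spaces are ignored
""", re.VERBOSE)

# Whitespace inside a character class, or escaped with a backslash, still counts.
# The same goes for an escaped '#', which matches a literal hash sign.
literal_space = re.compile(r"a[ ]b", re.VERBOSE)
escaped_space = re.compile(r"a\ b", re.VERBOSE)
hash_sign = re.compile(r"issue\#\d+", re.VERBOSE)

print(bool(pattern.fullmatch("abc")))         # True
print(bool(pattern.fullmatch("a b c")))       # False
print(bool(literal_space.fullmatch("a b")))   # True
print(bool(escaped_space.fullmatch("a b")))   # True
print(bool(hash_sign.fullmatch("issue#42")))  # True
```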
So instead of this:
```python
NAMED_CAPTURING_VERSION_REGEX = (
    r'^v?'                          # optional leading v
    r'(?P<major>[0-9]+)'            # major version number
    r'\.(?P<minor>[0-9]+)'          # minor version number
    r'\.(?P<micro>[0-9]+[a-z]?)'    # micro version number, plus
                                    # optional build character
)
```
(What if you leave the r off the front of one line, or forget a quote?)
…you can write this:
```python
import re

NAMED_CAPTURING_VERSION_REGEX = re.compile(r'''
    ^v?                     # optional leading v
    (?P<major>[0-9]+)       # major version number
    \.(?P<minor>[0-9]+)     # minor version number
    \.(?P<micro>[0-9]+      # micro version number, plus
        [a-z]?)             # optional build character
''', re.VERBOSE)
```
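As a quick check, here is a runnable sketch that writes the concatenated-string version and a verbose version with the same named groups and `\.` separators, then compares the two on a few sample version strings. The sample inputs are made up. Neither pattern is anchored at the end, so `match` only checks a prefix of the input.

```python
import re

# The concatenated-string form: a plain str that re compiles on each use.
CONCATENATED = (
    r'^v?'                          # optional leading v
    r'(?P<major>[0-9]+)'            # major version number
    r'\.(?P<minor>[0-9]+)'          # minor version number
    r'\.(?P<micro>[0-9]+[a-z]?)'    # micro version number, plus
                                    # optional build character
)

# The VERBOSE form: whitespace and comments are ignored, so splitting the
# micro group across two lines does not change what the pattern matches.
NAMED_CAPTURING_VERSION_REGEX = re.compile(r'''
    ^v?                     # optional leading v
    (?P<major>[0-9]+)       # major version number
    \.(?P<minor>[0-9]+)     # minor version number
    \.(?P<micro>[0-9]+      # micro version number, plus
        [a-z]?)             # optional build character
''', re.VERBOSE)

m = NAMED_CAPTURING_VERSION_REGEX.match("v1.12.3b")
print(m.groupdict())  # {'major': '1', 'minor': '12', 'micro': '3b'}

# Both forms agree on a few sample inputs (None means no match).
for s in ["v1.12.3b", "2.0.10", "1.2", "v1.2.x"]:
    verbose = NAMED_CAPTURING_VERSION_REGEX.match(s)
    plain = re.match(CONCATENATED, s)
    assert (verbose and verbose.groupdict()) == (plain and plain.groupdict())
```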
I’ve personally found that if I need to comment a regular expression, the problem it’s solving is better solved with a parser.
That’s what a regular expression is, isn’t it? A parser-generating DSL for limited grammars.
It is, but at a certain point a regular expression becomes too complex to reason about. At that point I feel it’s better to use a parser library for the same task, since those tend to be designed with facilities for keeping the grammar understandable.
One limitation is that while many languages come with a regex module in their standard library, few languages come with a parser-generator. And while even third-party regex libraries usually fall into the “Perl-compatible” or “DFA” camps, there are any number of approaches to parser-generators. Do you want LL(1)? LR(1)? LR(k)? Shift-reduce, or PEG? Do you want to describe grammars in a DSL, or build them up with library calls? Or would you rather take the parser-combinator approach?
Which is to say, it’s often more readable to use a slightly hairy regex than to force the reader to learn your favourite parser-generator tool, no matter how much more flexible it might be.
I agree with that. I think the main advantage of the parser-generator tools is maintainability, especially if you’re using something more restricted, like an LALR parser generator that statically errors out on ambiguous grammars. But for someone who only reads the source and doesn’t have to maintain it, a well-commented inline regex is often easier than yacc/bison, where you have separate source files with their own syntax, another build step, and a lot of boilerplate.
But it does vary by language. In Haskell, using a parser combinator library often feels to the reader like a fairly inline, native piece of code. In that case, though, you don’t get help with debugging the grammar, since parser combinators don’t do the kind of static analysis that table-driven parser generators do.
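For comparison, a parser combinator can be hand-rolled in Python in a few dozen lines. This is a minimal sketch, not any particular library’s API: the combinator names (`char_in`, `literal`, `many1`, `optional`, `seq`, `mapped`) are hypothetical, and it parses the same `v1.2.3b`-style version string as the regexes above. Like those regexes, it only consumes a prefix and reports the position where it stopped, and, as with parser combinators generally, nothing checks the grammar for ambiguity ahead of time.

```python
import string
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, pos) and returns (value, new_pos) on success, or None.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]

def char_in(chars: str) -> Parser:
    """Match a single character drawn from `chars`."""
    def parse(text, pos):
        if pos < len(text) and text[pos] in chars:
            return text[pos], pos + 1
        return None
    return parse

def literal(s: str) -> Parser:
    """Match the exact string `s`."""
    def parse(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse

def many1(p: Parser) -> Parser:
    """One or more repetitions of p, joined into a string (p must consume input)."""
    def parse(text, pos):
        out = []
        while (r := p(text, pos)) is not None:
            value, pos = r
            out.append(value)
        return ("".join(out), pos) if out else None
    return parse

def optional(p: Parser, default: Any = "") -> Parser:
    """Try p; if it fails, succeed with `default` without consuming input."""
    def parse(text, pos):
        r = p(text, pos)
        return r if r is not None else (default, pos)
    return parse

def seq(*ps: Parser) -> Parser:
    """Run parsers in order; succeed with a list of their values."""
    def parse(text, pos):
        values = []
        for p in ps:
            r = p(text, pos)
            if r is None:
                return None
            value, pos = r
            values.append(value)
        return values, pos
    return parse

def mapped(p: Parser, f: Callable[[Any], Any]) -> Parser:
    """Transform the value produced by p with f."""
    def parse(text, pos):
        r = p(text, pos)
        return None if r is None else (f(r[0]), r[1])
    return parse

# Grammar: optional 'v', then major '.' minor '.' micro, where micro is
# digits followed by an optional lowercase build character.
number = many1(char_in(string.digits))
micro = mapped(seq(number, optional(char_in(string.ascii_lowercase))), "".join)
version = mapped(
    seq(optional(literal("v")), number, literal("."), number, literal("."), micro),
    lambda v: {"major": v[1], "minor": v[3], "micro": v[5]},
)

print(version("v1.12.3b", 0))  # ({'major': '1', 'minor': '12', 'micro': '3b'}, 8)
print(version("1.2", 0))       # None
```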
We were talking about this the other day at work. A coworker said, “Regular expressions in code are write-only. If you need to edit one, you just write a new one” :)
But they are magic! :)