1. 5

  2. 4

    This is progressing towards the rediscovery of regular expressions.

    1. 4

      Indeed. If anyone finds this article interesting and doesn’t know regular expressions (“regexes”) yet, I recommend reading regular-expressions.info/tutorial.html. When I learned regexes from that site I found it to be well-written. The site plugs the author’s own regex-testing tool in between explanations, but you can just use regex101.com, which is free and equally powerful.

      Here’s an example of using a regex in Python to extract text within square brackets:

      import re
      string = "123[4567]890"
      re.search(r'\[(.*)\]', string).group(1)  # evaluates to '4567'
      # You could also write the regex with whitespace for readability:
      # re.search(r'\[ (.*) \]', string, re.X).group(1)

      Regexes have some advantages over the extract DSL defined in the article. They support powerful features such as extracting multiple parts of the text with one search. They are supported by all major programming languages. Most text editors let you use them to search your text. They are also very concise to type. However, they have flaws of their own, particularly how hard they can be to read. So though regexes are useful to learn, they are not the ultimate way of extracting parts of text.
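
      To make the "multiple parts with one search" point concrete, here is a minimal sketch. The input string and the field names `id` and `name` are made up for illustration and do not come from the article.

```python
import re

# Hypothetical input with two bracketed fields (not from the article).
text = "id[4567] name[alice]"

# One search, two capture groups: each (...) extracts a separate part.
match = re.search(r'id\[(\d+)\] name\[(\w+)\]', text)
user_id, name = match.groups()

# Named groups do the same job and make longer patterns easier to read.
named = re.search(r'id\[(?P<id>\d+)\] name\[(?P<name>\w+)\]', text)
fields = named.groupdict()

print(user_id, name)
print(fields)
```

      Here `user_id` and `name` both come out of a single call to `re.search`, and `fields` holds the same two values keyed by group name.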

      Here are some projects that aim to improve on regexes (but are much less widespread):

      • Regexes in the Raku language. Raku, formerly known as Perl 6, breaks compatibility with Perl 5’s regex syntax (the same syntax used by most regex engines) in an attempt to make regexes more readable.
      • Egg expressions, or eggexes, are a syntactic wrapper for regexes that will be built into the Oil shell.
      1. 2

        I’d prefer r'\[(.*?)\]' or r'\[([^]]*)\]', so that a string containing more than one pair of square brackets doesn’t match more than expected. Also, in Python 3.6 and later, you can index the match object with [1] instead of calling .group(1).
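
        A small sketch of the difference, assuming an input with two bracketed parts (the sample string is made up for illustration):

```python
import re

# Hypothetical input with two bracketed parts.
text = "a[1]b[2]c"

# Greedy .* runs to the LAST closing bracket in the string.
greedy = re.search(r'\[(.*)\]', text).group(1)

# Lazy .*? stops at the first closing bracket.
lazy = re.search(r'\[(.*?)\]', text).group(1)

# A negated character class cannot cross a closing bracket at all.
# Indexing the match with [1] needs Python 3.6 or later.
negated = re.search(r'\[([^]]*)\]', text)[1]

# findall collects every bracketed part in one call.
every = re.findall(r'\[([^]]*)\]', text)

print(repr(greedy), repr(lazy), repr(negated), every)
```

        The greedy pattern captures 1]b[2, while the other two capture just 1.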

        https://www.rexegg.com/ is another great site for learning regexp. And https://github.com/aloisdg/awesome-regex is a good collection of resources for tools, libraries, regexp collections, etc.

        1. 2

          And Parse in Red is also a nice alternative to regexes.

        2. 3

          Perhaps we can coin a new aphorism! Greenspun’s Tenth Zawinski’s Law: Any sufficiently complicated Lisp text processing program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of regular expressions.

          Edit: Or perhaps ‘Every Lisp program attempts to expand until it contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of regular expressions. Programs which cannot so expand are replaced by those which can.’

          1. 2

            We have this in CHICKEN as well, in the slice egg. It’s convenient but not used very often AFAIK.

            1. 2

              Nice DSL!

              Noting here that extract can also be implemented directly in Python with the __getitem__ dunder method, since slice bounds are not validated and can be strings. (This wasn’t clear from the text because OP implemented it in Lisp.)

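              class String:
                  def __init__(self, s):
                      self.s = s
                  def __getitem__(self, search):
                      # search is a slice whose bounds are strings, e.g. "[":"]"
                      start = self.s.find(search.start)
                      # Look for the stop delimiter only after the start delimiter
                      stop = self.s.find(search.stop, start + len(search.start))
                      if start == -1 or stop == -1:
                          raise KeyError(search)
                      return self.s[start:stop + len(search.stop)]
              v = String("123[4567]80")["[":"]"]  # '[4567]'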
              1. 1

                I think you’re missing a code fence around Lisp (extract s + (+ "[" "4") 0) ; will return [4567]890