1. 30
  1.  

  2. 6

    💝💖💕💗

    1. 9

      Glad you like it :) :) :)

      I’m still looking for more feedback on the Oil language! The prototyping is going really fast right now and there’s a lot of room for (future) users to influence the language, as mentioned in the latest blog post.

      I’ve been posting everything to https://oilshell.zulipchat.com/ because I haven’t had time to write blog posts! But I’ve already changed several things in response to users:

      • Ilya Sher (author of NGS) said “this looks funny” a couple of times, and trying to explain it to him led me to change the language. It indeed did look funny! (Regexes used to be $/d+/ and not /d+/; the shopt option categories are simpler.)
      • ericbb on Reddit led me to change the capture syntax for eggexes
      • Zulip feedback led me to change the syntax of the push builtin

      I posted about the left-to-right function call syntax here: https://lobste.rs/s/9v0hyj/language_design_unary_operators#c_1rpogh

      Here are some of the recent threads on Zulip:

      • Oil functions look like JavaScript, Go, and Julia
      • Oil uses curly braces
      • procs and funcs are first class
      • inline function calls: $strfunc(x) and @arrayfunc(x, y)
      • Operators: div, xor, and ^ for exponentiation (also mod)
      • cd now takes a Ruby-like block
      • dict literals look like JavaScript
      • d->key is a shortcut for d['key']
      • x = 1+2*3 and lazy evaluation
      • Bool, Int, Float, and null literals
      • Tuple literals (I solved this problem)
      • String literal syntax – raw vs. C, backward compatibility with shell
      • set is an assignment keyword
      • ${} enhancement ideas
      • space sensitivity
      • Where the Oil language is compromised for compatibility

      It’s still early so everything is open to discussion :) The discussion has been really useful so far, even the vague comments :)

      1. 5

        It would be nice if renaming your file to foo.oil instead of foo.sh enforced some of these restrictions!

        Please do so! Please let users just jump straight into Oil if they want, and not have to laboriously enable it piece-by-piece with a list of magic shopts…

        1. 2

          Also, all this stuff is prototyped, so you can actually try it. As always, I make no guarantees that I’ll be able to finish it, but it’s going well so far :)

      2. 5

        This is cool enough I kinda want to implement it myself as a library. Do you have a test suite for just the egg sublanguage itself?

        If not, since it seems to be still in quite a lot of flux, just consider the possibility in the future. ;-)

        1. 6

          The tests I have are here, but they’re mixed in with Oil:

          https://github.com/oilshell/oil/blob/master/spec/oil-regex.test.sh

          There are 26 cases now – it wouldn’t be too hard to copy the patterns out and write a different test harness that only tests matching.
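
A matching-only harness along those lines could look roughly like the sketch below. It is only an illustration: the `Case` tuple, the three sample cases, and the `python_matcher` function are all hypothetical and are not copied from spec/oil-regex.test.sh. Any implementation would plug in its own matcher function.

```python
import re
from typing import Callable, List, NamedTuple, Optional, Tuple


class Case(NamedTuple):
    pattern: str             # regex text in the dialect the matcher understands
    subject: str             # string to search
    expected: Optional[str]  # expected matched text, or None for "no match"


# Hypothetical cases for illustration; real ones would be copied from the spec file.
CASES = [
    Case(r"[0-9]+", "abc 123", "123"),
    Case(r"[a-z]+", "123", None),
    Case(r"\.", "a.b", "."),
]

Matcher = Callable[[str, str], Optional[str]]


def python_matcher(pattern: str, subject: str) -> Optional[str]:
    """Reference matcher backed by Python's re module."""
    m = re.search(pattern, subject)
    return m.group(0) if m else None


def run_cases(cases: List[Case], matcher: Matcher) -> List[Tuple[Case, Optional[str]]]:
    """Return (case, actual) pairs for every case whose result differs."""
    failures = []
    for case in cases:
        actual = matcher(case.pattern, case.subject)
        if actual != case.expected:
            failures.append((case, actual))
    return failures


failures = run_cases(CASES, python_matcher)
print("failures:", len(failures))
```

Keeping the matcher as a plain function argument means the same table of cases can be run against a second implementation without touching the cases themselves.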

          What language were you thinking about doing it in?

          Ilya Sher mentioned on Zulip that he might want to use a C library. I think it would be pretty easy to port:

          • the lexer to lex
          • the grammar to yacc – I’ve been wanting to work with LALR(1)
          • the ASDL schema has already been automatically translated to C++. It could also be translated to C like CPython does.

          So that strategy would save a ton of work IMO. Using the DSL approach, I would say it’s like 200 lines of lexing, 200 lines of grammar, and 200 lines of ASDL, and then some really simple code to walk the tree. (BTW explicitly making an AST is useful so you can translate to many syntaxes.)
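
To make the point about an explicit AST concrete, here is a minimal Python sketch. It is not Oil's actual code or ASDL schema: the node types, the `Dialect` class, and the class-name tables are all assumptions made up for illustration. It shows one small tree being walked once per target syntax.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Literal:
    text: str


@dataclass
class CharClass:
    name: str  # e.g. "digit"


@dataclass
class Repeat:
    child: "Node"
    op: str  # "+", "*", or "?"


@dataclass
class Seq:
    children: List["Node"]


Node = Union[Literal, CharClass, Repeat, Seq]


@dataclass
class Dialect:
    classes: Dict[str, str]  # how each named class is spelled
    group_open: str          # how a group is opened


# Hypothetical dialect tables; a real translator would cover far more.
ERE = Dialect({"digit": "[[:digit:]]", "space": "[[:space:]]"}, "(")
PERL = Dialect({"digit": r"\d", "space": r"\s"}, "(?:")

META = set(".[]()*+?{}|^$\\")


def quote(text: str) -> str:
    """Backslash-escape regex metacharacters in a literal."""
    return "".join("\\" + c if c in META else c for c in text)


def translate(node: Node, d: Dialect) -> str:
    """Walk the tree and emit regex text for one dialect."""
    if isinstance(node, Literal):
        return quote(node.text)
    if isinstance(node, CharClass):
        return d.classes[node.name]
    if isinstance(node, Seq):
        return "".join(translate(c, d) for c in node.children)
    if isinstance(node, Repeat):
        inner = translate(node.child, d)
        # Multi-character operands need a group so the operator covers all of them.
        multi = isinstance(node.child, Seq) or (
            isinstance(node.child, Literal) and len(node.child.text) > 1
        )
        if multi:
            inner = d.group_open + inner + ")"
        return inner + node.op
    raise TypeError(f"unknown node: {node!r}")


# One tree for "digits, a dot, digits", printed in two syntaxes.
pattern = Seq([Repeat(CharClass("digit"), "+"), Literal("."), Repeat(CharClass("digit"), "+")])
print(translate(pattern, ERE))
print(translate(pattern, PERL))
```

Adding another target syntax here only means adding another `Dialect` value; the tree and the walker stay the same.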

          The grammar and ASDL schema are linked in the doc.

          There are a couple things in flux mentioned on Zulip, but otherwise I think it’s settled. I’ve gotten a significant amount of feedback already.

          Feel free to join https://oilshell.zulipchat.com/ if you want to talk more about it! At least one other person may be interested.

          1. 4

            Thanks, I will check those out! I was intending to do it in Rust, and it would be an interesting exercise to make it expose a C compatible API as well so Oil could theoretically use it. Wouldn’t blame you at all for not wanting yet another language involved though. This was more a “someday it would be neat…” than a “I’m totally going to do this” anyway, so no promises.

            Not terribly interested in Zulip, alas… I don’t need more fragmentation to keep track of in my life. If I ever get around to it I’ll post on the issue tracker and see how people feel.

        2. 4

          One of these things is not like the others:

          • %input_end
          • %last_line_end
          • %end_word

          I really like this syntax for regexes - the simplest things are more complex than PCRE, but the most complex things are still simpler than PCRE.

          1. 3

            Thanks for catching that! Just pushed a fix.

            Glad you like it! This is actually version 2; version 1 was in 2014, and I think it got a lot better. It was a lot more “novel” back then, but I scaled it back to be more familiar while staying consistent.

          2. 3

            My first reaction was that creating a new regex syntax would complicate things. Regex is complex and confusing as it is, and adding another syntax would complicate adoption.

            I read the “why” section, and all the reasons seem valid.

            @andyc Do you see this new regex syntax becoming widespread in other applications?

            Have other regex syntaxes been created before, other than POSIX or Perl? If so, what happened to them?

            1. 4

              That’s a fair question, I would say:

              (1) Oil is very backward compatible, so you’re not forced to learn anything new if you don’t want to.

              If you already know bash syntax, you can use it. The [[ construct works in Oil!

              https://github.com/oilshell/oil/blob/master/doc/regex-manual.md#oil-is-shorter-than-bash

              (but it seems to be so ugly that people resist learning it)

              You can also use string patterns with Oil:

              if (x ~ '[[:digit:]]+')   # string
              

              Eggexes use / /, but strings are still valid.

              (2) The main reason I thought this made sense is that it integrates seamlessly with egrep, awk, and other tools (see the doc). We don’t have to “boil the ocean” and write new versions of those tools that accept a different syntax.

              I did a previous version of Eggex in 2014 which you could only use from Python, and that wasn’t worth it. I showed it to a few people and that was it.

              (3) This syntax is somewhat familiar if you know lex or re2c. It’s not entirely new. I tried to provide a smooth upgrade path as usual:

              https://github.com/oilshell/oil/blob/master/doc/regex-manual.md#backward-compatibility

              (4) Perl 6 already jumped ship too, with an entirely new and exotic regex syntax. It uses quoted literals and unquoted operators like eggex. Larry Wall said something to the effect of “every language borrowed Perl 5 regex syntax but we’ve moved on to something better”. So I’m not the only one who thinks it’s justified :)

              And Eggex is much more conservative than Perl 6.

              1. 2

                Thank you for your reply, and thank you for your work on Oil. It looks very interesting.

            2. 3

              designed to be translated to any regex dialect

              This reminds me of SRE regex, did you take any inspiration from there? It seems to me that structured REs of some kind would remove a lot of unnecessary pain from writing & parsing regexes.

              Eggex:

              / ~[ a-z A-Z ] /
              

              SRE:

              (w/nocase (- (/"az")))
              
              1. 3

                I think I had heard of it, but I didn’t take inspiration from it. Someone else brought it up, and here is a partial list of the differences:

                https://www.reddit.com/r/ProgrammingLanguages/comments/d82zjq/egg_expressions_oil_regexes/f18ld8c/

                Probably the more direct influence was Russ Cox’s regex articles and RE2’s multi-engine design and front end. There was this pretty exhaustive survey of regex syntax across engines:

                https://github.com/google/re2/blob/master/doc/syntax.txt

                I didn’t talk about this, but 5 years ago, before Oil existed, I tried to do the same thing with Python’s syntax:

                http://www.oilshell.org/annex/cre-syntax.html

                That page may go down at any moment, but the syntax of that older version of Eggex was somewhat based on the RE2 docs.


                In short, the relation is that RE2 aims to support a union of features in a lot of different regex engines, and so does eggex. Although of course eggex is vastly simpler since it’s a language designed to be translated, rather than an engine.

                1. 2

                  Interesting, thanks. Amazing how much work Alex Shinn has done and it basically goes unused. I wonder how many of us are piddling along the same path. :) Ubiquity always wins.

                  With OilShell you are doing a great job of communicating your ideas, which the Emacs/Lisp crowd generally fails to do.