The author might be a bit confused about what “local variables” are in awk. Technically there are no local variables. Only function parameters are local to the function. All other variables are global. And function parameters are passed by value if scalar and by reference if array name. Thus there are no locally declared arrays.
Yeah, I agree with @apg – you seem to just be restating what the author is saying (I don’t think he’s confused at all). Also, I don’t think your last sentence is correct:
> Thus there are no locally declared arrays.
You can definitely have a locally-declared array. From the POSIX spec:
> If fewer arguments are supplied in a function call than are in the function definition, the extra parameters that are used in the function body as scalars shall evaluate to the uninitialized value until they are otherwise initialized, and the extra parameters that are used in the function body as arrays shall be treated as uninitialized arrays where each element evaluates to the uninitialized value until otherwise initialized.
So if you use a parameter inside a function as an array, but the caller doesn’t pass that array, AWK creates a fresh new local array each call (which works recursively too). It’s still stack-based, so still doesn’t need a GC. Consider this recursive function:
    $ cat t.awk
    BEGIN { f(2) }
    function f(n, a) {
        a[n] = 1
        for (k in a) print k, a[k]
        print "---"
        if (n) f(n-1)
    }
    $ awk -f t.awk
    2 1
    ---
    1 1
    ---
    0 1
    ---
Note how each time it’s called, the a array only has one element. A fresh one is created each time == local arrays.
Compare that to this one, where we pass in a single global array from above, and note how the array gets larger as we add to it:
    $ cat t2.awk
    BEGIN { f(2, a) }
    function f(n, a) {
        a[n] = 1
        for (k in a) print k, a[k]
        print "---"
        if (n) f(n-1, a)
    }
    $ awk -f t2.awk
    2 1
    ---
    1 1
    2 1
    ---
    0 1
    1 1
    2 1
    ---
Glad the blog was helpful! Regarding the lexer problem you mentioned, also see “When Are Lexer Modes Useful?”, which mentions Perl-style regex literals. Though I don’t claim the solution generalizes or that you should use it; it’s best with hand-written parsers, not generated ones. Although I would like to see generated parsers solve that problem too [1].
BTW Oil also has regex literals called Egg Expressions, and the funny thing is that our syntax is more like traditional Lex, not Perl, which means we avoid precisely the lexer mode problem.
For example, this can be lexed exactly like a Python expression:

    var ip_address = / d+ '.' d+ '.' d+ '.' d+ /

Namely:
- the d is an identifier
- the + is now a postfix operator, but it’s still an operator lexed the same way
- the '.' is a string literal
- spaces behave as spaces in normal Python expressions. They’re not literal spaces; you use them for readability
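To make the "lexed like a Python expression" point concrete, here is a rough sketch that feeds the part between the slashes to Python's standard `tokenize` module. This is only a stand-in: it is not Oil's actual lexer, and the helper name `lex_like_python` is made up for illustration. It only shows that an ordinary expression tokenizer needs no special mode for this syntax.

```python
import io
import tokenize


def lex_like_python(source):
    """Return (token kind, text) pairs using Python's own tokenizer.

    Only names, operators and string literals are kept; the NEWLINE and
    ENDMARKER bookkeeping tokens are dropped.
    """
    keep = {tokenize.NAME, tokenize.OP, tokenize.STRING}
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tokenize.tok_name[t.type], t.string) for t in tokens if t.type in keep]


# The Eggex from the example above, slashes included.
eggex = "/ d+ '.' d+ '.' d+ '.' d+ /"

for kind, text in lex_like_python(eggex):
    print(f"{kind:6} {text}")
```

Each `d` comes out as a NAME, each `+` and `/` as an OP, and each `'.'` as a STRING, all with the tokenizer's normal rules.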
But this can’t be lexed that way:
    ip_address = /\d+\.\d+\.\d+\.\d+/ # also has a "leaning toothpick problem" :)
You need to change the mode of the lexer or some similar solution. That’s not the whole purpose of Eggex (composability is another), but it’s one benefit!
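As a rough illustration of what "changing the mode of the lexer" can look like, here is a toy hand-written lexer. It is a sketch under my own assumptions, not how Perl, Oil, or any real tool does it: the token rules, the `operand_expected` flag, and the function name `lex` are all hypothetical. The idea is that a `/` seen where a value is expected switches to a second set of rules that swallows everything up to the next unescaped `/`.

```python
import re

# Expression-mode rules: what an ordinary expression lexer would have.
EXPR_TOKEN = re.compile(r"""
    (?P<space>\s+)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<number>\d+)
  | (?P<op>[=+\-*/()])
""", re.VERBOSE)

# Regex-mode rule: escaped pairs or any character except slash, backslash, newline.
REGEX_BODY = re.compile(r"(?:\\.|[^/\\\n])*")


def lex(source):
    tokens = []
    pos = 0
    operand_expected = True  # true at the start and after most operators
    while pos < len(source):
        if source[pos] == "/" and operand_expected:
            # Mode switch: a slash where a value should be starts a regex literal.
            body = REGEX_BODY.match(source, pos + 1)
            end = body.end()
            if end >= len(source) or source[end] != "/":
                raise SyntaxError("unterminated regex literal")
            tokens.append(("regex", body.group()))
            pos = end + 1
            operand_expected = False
            continue
        m = EXPR_TOKEN.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind = m.lastgroup
        if kind != "space":
            tokens.append((kind, m.group()))
            # After an operator (other than a closing paren) a value must follow.
            operand_expected = kind == "op" and m.group() != ")"
        pos = m.end()
    return tokens


print(lex(r"ip_address = /\d+\.\d+\.\d+\.\d+/"))  # slash starts a regex literal
print(lex("total = a / b"))                        # slash is plain division
```

The same `/` character is lexed two different ways depending on context the lexer has to track, which is exactly the extra machinery the Eggex syntax sidesteps.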
Agreed. I might have needed a bit more coffee.
I’m now confused. How is what you are saying different from what the author says?
Other posts tagged lexing: https://www.oilshell.org/blog/tags.html?tag=lexing#lexing
Wiki: https://github.com/oilshell/oil/wiki/Why-Lexing-and-Parsing-Should-Be-Separate
[1] Funny thing is that I recently noticed that tree-sitter bash has to invoke a little chunk of C++ to solve some lexing problems !! https://github.com/tree-sitter/tree-sitter-bash/tree/master/src