Threads for dutc

  1. 24

    I agree with most of what’s said in the article. However, it misses the forest for the trees a bit, at least judging by the introduction. Programs that take seconds to load or display a list are not just ignoring cache optimizations. They’re using slow languages (or language implementations, for the pedantics out there) like CPython, where even the simplest operations require a dictionary lookup; or using layers and layers of abstraction like Electron; or making HTTP requests for the most trivial things (I suspect that’s what makes Slack slow; I know it’s what makes GCP’s web UI absolutely terrible). A lot of bad architectural choices, too.

    Cache optimizations can be important but only as the last step. There’s a lot to be fixed before that, imho.

    1. 16

      Even beyond that, I think there are more baseline things going on: most developers don’t even benchmark or profile. In my experience the most egregious performance problems I’ve seen have been straight-up bugs, and they don’t get caught because nobody’s testing. And the profiler basically never agrees with what I would have guessed the problem was. I don’t disagree with the author’s overall point, but it’s rare to come across a program that’s slow enough to be a problem and that doesn’t have much lower-hanging fruit than locality issues.

      1. 3

        I agree so much! I’d even say that profiling is only one half of the problem (statistical profiling, that is, like perf). The other half is tracing, which nowadays can be done with very convenient tools like Tracy or the Chrome trace visualizer (“catapult”) if you instrument your code a bit so it can spit out JSON traces. These give insight into where time is actually spent.
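        For instance, the Chrome viewer’s input is just JSON in the Trace Event Format; a minimal hand-rolled sketch of emitting one span (the instrumented work and names here are made up for illustration):

```python
import json
import time

events = []

def span(name, start, end):
    # one "complete" event ("ph": "X"); timestamps and durations are in microseconds
    events.append({
        "name": name, "ph": "X",
        "ts": int(start * 1e6),
        "dur": int((end - start) * 1e6),
        "pid": 1, "tid": 1,
    })

t0 = time.perf_counter()
total = sum(range(100_000))      # stand-in for the work being instrumented
span("sum_range", t0, time.perf_counter())

# chrome://tracing (and Perfetto) accept this object saved as trace.json
payload = json.dumps({"traceEvents": events})
```

        Dump one of these per interesting region of code and the viewer draws the timeline for you.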

        1. 1

          Absolutely. Most developers only benchmark if there’s a serious problem, and most users are so inured to bad response times that they just take whatever bad experience they receive and try to use the app regardless. Most of the time it’s some stupid thing the devs did that they didn’t realize and didn’t bother checking for (oops, looks like we’re instantiating this object on every loop iteration, look at that.)

        2. 9

          Programs that take seconds to load or display a list are not just ignoring cache optimizations.

          That’s right. I hammered on the cache example because it’s easy to show what a massive difference it can make, but I did not mean to imply that it’s the only reason. Basically, any time we lose track of what the computer must do, we risk introducing slowness. Now, I don’t mean that having layers of abstractions or using dictionaries is inherently bad (they will likely have a performance cost, but paying it may be reasonable to reach another objective), but we should make these choices intentionally rather than going by rote, by peer pressure, by habit, etc.

          1. 5

            The article implies the programmer has access to low level details like cache memory layout, but if you are programming in Python, Lua, Ruby, Perl, or similar, the programmer doesn’t have such access (and for those languages, the trade off is developer ease). I’m not even sure you get to such details in Java (last time I worked in Java, it was only a year old).

            The article also makes the mistake that “the world is x86”—at work, we still use SPARC based machines. I’m sure they too have cache, and maybe the same applies to them, but micro-optimizations are quite difficult across different architectures (and even across the same family but different generations).

            1. 6

              The article implies the programmer has access to low level details like cache memory layout, but if you are programming in Python, Lua, Ruby, Perl, or similar, the programmer doesn’t have such access

              The level of control that a programmer has is reduced in favor of other tradeoffs, as you said, but there’s still some amount of control. Often, it’s found in those languages’ best practices. For example, in Erlang one should prefer to use binaries for text rather than strings, because binaries are a contiguous sequence of bytes while strings are linked lists of characters. Another example: in Python it’s preferable to accumulate small substrings in a list and then use the join method rather than using concatenation (full += sub).
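              A rough sketch of that Python advice (CPython can sometimes resize a string in place, so treat join as the portable idiom rather than a guaranteed speed-up in every case):

```python
def build_concat(parts):
    # repeated += may copy the accumulated string each time (quadratic worst case)
    full = ""
    for sub in parts:
        full += sub
    return full

def build_join(parts):
    # accumulate in a list, pay for one join at the end
    return "".join(parts)

parts = [str(n) for n in range(1000)]
assert build_concat(parts) == build_join(parts)
```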

              The article also makes the mistake that “the world is x86”—at work, we still use SPARC based machines. I’m sure they too have cache, and maybe the same applies to them, but micro-optimizations are quite difficult across different architectures (and even across the same family but different generations).

              I don’t personally have that view, but I realize that it wasn’t made very clear in the text, my apologies. Basically what I want myself and other programmers to be mindful of is mechanical sympathy — to not lose track of the actual hardware that the program is going to run on.

              1. 4

                I know a fun Python example. Check this yes implementation:

                def yes(s):
                  p = print
                  while True:
                    p(s)

                This hot-loop will perform significantly better than the simpler print(s) because of the way variable lookups work in Python. It first checks the local scope, then the global scope, and then the built-ins scope before finally raising a NameError exception if it still isn’t found. By adding a reference to the print function to the local scope here, we reduce the number of hash-table lookups by 2 for each iteration!

                I’ve never actually seen this done in real Python code, understandably. It’s counter-intuitive and ugly. And if you care this much about performance then Python might not be the right choice in the first place. The dynamism of Python (any name can be reassigned, at any time, even by another thread) is sometimes useful but it makes all these lookups necessary. It’s just one of the design decisions that makes it difficult to write a high-performance implementation of Python.
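                If you want to check claims like this yourself, timeit is the tool; a sketch, using len instead of print to keep stdout quiet (the gap, if any, is small and varies by CPython version, so I make no promises about the numbers):

```python
import timeit

def use_global(n):
    for _ in range(n):
        len("y")             # name resolved outside the local scope each call

def use_local(n):
    f = len                  # alias the callable into the local scope first
    for _ in range(n):
        f("y")

n = 100_000
t_global = timeit.timeit(lambda: use_global(n), number=10)
t_local = timeit.timeit(lambda: use_local(n), number=10)
print(f"global: {t_global:.3f}s  local: {t_local:.3f}s")
```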

                1. 3

                  That’s not how scoping works in Python.

                  The Python parser statically determines the scope of a name (where possible.) If you look at the bytecode for your function (using dis.dis) you will see either a LOAD_GLOBAL, LOAD_FAST, LOAD_DEREF, or LOAD_NAME, corresponding to global, local, closure, or unknown scope. The last bytecode (LOAD_NAME) is the only situation in which multiple scopes are checked, and these are relatively rare to see in practice.
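                  This is easy to confirm with dis; a small sketch:

```python
import dis

def global_lookup(s):
    print(s)                 # `print` is compiled as a global reference

def local_lookup(s):
    p = print                # `p` is known at compile time to be local
    p(s)

ops_g = {i.opname for i in dis.get_instructions(global_lookup)}
ops_l = {i.opname for i in dis.get_instructions(local_lookup)}
assert "LOAD_GLOBAL" in ops_g    # no runtime scope search: the opcode encodes the scope
assert "LOAD_FAST" in ops_l      # locals are indexed array slots, not dict lookups
```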

                  The transformation from LOAD_GLOBAL to LOAD_FAST is not uncommon, and you see it in the standard library: e.g.,

                  I don’t know what current measurements of the performance improvement look like after the LOAD_GLOBAL optimisations in Python 3.9, which reported a 40% improvement. (It may be the case that the global-to-local transformation is no longer considered a meaningful speed-up.)

                  Note that the transformation from global-to-local scope, while likely innocuous, is a semantic change. If builtins.print or the global print is modified in some other execution unit (e.g., another thread,) the function will not reflect this (as global lookups can be considered late-bound, which is often desirable.)
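                  The semantic difference is observable in a few lines; a sketch using a hypothetical emit function in place of print, and the classic default-argument trick to pin the def-time binding:

```python
def emit(msg):
    return "v1:" + msg

def late(msg):
    # global lookup on every call: sees any later rebinding of `emit`
    return emit(msg)

def early(msg, _emit=emit):
    # default evaluated once, at `def` time: pinned to the current `emit`
    return _emit(msg)

assert late("x") == "v1:x" and early("x") == "v1:x"

def emit(msg):                  # rebind the global
    return "v2:" + msg

assert late("x") == "v2:x"      # late binding reflects the change
assert early("x") == "v1:x"     # the pinned version does not
```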

                  1. 8

                    I think this small point speaks more broadly to the dissatisfaction many of us have with the “software is slow” mindset. The criticisms seem very shallow.

                    Complaining about slow software or slow languages is an easy criticism to make from the outside, especially considering that the biggest risk many projects face is failure to complete or failure to capture critical requirements.

                    Given a known, fixed problem with decades of computer science research behind it, it’s much easier to focus on performance—whether micro-optimisations or architectural and algorithmic improvements. Given three separate, completed implementations of the same problem, it’s easy to pick out which is the fastest and also happens to have satisfied just the right business requirements to succeed with users.

                    I think the commenters who suggest that performance and performance-regression testing should be integrated into the software development practice from the beginning are on the right track. (Right now, I think the industry is still struggling with getting basic correctness testing and documentation integrated into software development practice.)

                    But the example above shows something important. Making everything static or precluding a number of dynamic semantics would definitely give languages like Python a better chance at being faster. But these semantics are—ultimately—useful, and it may be difficult to predict exactly when and where they are critical to satisfying requirements.

                    It may well be the case that some languages and systems err too heavily on the side of allowing functionality that reduces the aforementioned risks. (It’s definitely the case that Python is more dynamic in design than many users make use of in practice!)

                    1. 2

                      Interesting! I was unaware that the parser (!?) did that optimization. I suppose it isn’t difficult to craft code that forces LOAD_NAME every time (say, by reading a string from stdin and passing it to exec) but I find it totally plausible that that rarely happens in non-pathological code.

                      Hm. For a lark, I decided to try it:

                      >>> def yes(s):
                      ...  exec("p = print")
                      ...  p(s)
                      >>> dis.dis(yes)
                        2           0 LOAD_GLOBAL              0 (exec)
                                    2 LOAD_CONST               1 ('p = print')
                                    4 CALL_FUNCTION            1
                                    6 POP_TOP
                        3           8 LOAD_GLOBAL              1 (p)
                                   10 LOAD_FAST                0 (s)
                                   12 CALL_FUNCTION            1
                                   14 POP_TOP
                                   16 LOAD_CONST               0 (None)
                                   18 RETURN_VALUE
                      >>> yes("y")
                      Traceback (most recent call last):
                        File "<stdin>", line 1, in <module>
                        File "<stdin>", line 3, in yes
                      NameError: name 'p' is not defined
                2. 5

                  and for those languages, the trade off is developer ease

                  I heard Jonathan Blow make this point on a podcast and it stuck with me:

                  We’re trading off performance for developer ease, but is it really that much easier? It’s not like “well, we’re programming in a visual language and just snapping bits together in a GUI, and it’s slow, but it’s so easy we can make stuff really quickly.” Like Python is easier than Rust, but is it that much easier? In both cases, it’s a text-based OO language. One just lets you ignore types and memory lifetimes. But Python is still pretty complicated.

                  Blow is probably a little overblown (ha), but I do think we need to ask ourselves how much convenience we’re really buying when we slow our software down by factors of 100x or more. Maybe we should be more demanding about our slowdowns and expect something that trades more back for them.

                  1. 2

                    Like Python is easier than Rust, but is it that much easier?

                    I don’t want to start a fight about types but, speaking for myself, Python became much more attractive when they added type annotations, for this reason. Modern Python feels quite productive, to me, so the trade-off is more tolerable.

                    1. 1

                      It depends upon the task. Are you manipulating or parsing text? Sure, C will be faster in execution, but in development?

                      At work, I was told to look into SIP, and I started writing a prototype (or proof-of-concept if you will) in Lua (using LPeg to parse SIP messages). That “proof-of-concept” went into production (and is still in production six years later) because it was “fast enough” for use, and it’s been easy to modify over the years. And if we can ever switch to using x86 on the servers [1], we could easily use LuaJIT.

                      [1] For reasons, we have to use SPARC in production, and LuaJIT does not support that architecture.

                3. 7

                  The trick about cache optimizations is that they can be a case where, sure, individually you’re shaving nanoseconds off, but sometimes those operations are alarmingly common in the program flow and worth addressing before any higher-level fixes.

                  To wit: I worked on a CAD system implemented in Java, and the “small optimization” of switching to a pooled-allocation strategy for vectors instead of relying on the normal GC meant the difference between an unusable application and a fluidly interactive one, simply because the operation I fixed was so core to everything that was being done.

                  Optimizing cache hits for something like mouse move math can totally be worth it as a first step, if you know your workload and what code is in the “hot” path (see also sibling comments talking about profiling).

                  1. 6

                    They’re using slow languages (or language implementations, for the pedantics out there) like cpython where even the simplest operations require a dictionary lookup

                    I take issue with statements like this, because the majority of code in most programs is not being executed in a tight loop on large enough data to matter. The overall speed of a program has more to do with how it was architected than with how well the language it’s written in scores on microbenchmarks.

                    Besides, Python’s performance cost isn’t just an oversight. It’s a tradeoff that provides benefits elsewhere in flexibility and extensibility. Problems like serialization are trivial because of meta-programming and reflection. Complex string manipulation code is simple because the GC tracks references for you and manages the cleanup. Building many types of tools is simpler because you can easily hook into stuff at runtime. Fixing an exception in a Python script is a far more pleasant experience than fixing a segfault in a C program that hasn’t been built with DWARF symbols.
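                    To make the serialization point concrete, reflection makes the generic case nearly free; a minimal dataclass sketch (User is a made-up example type):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class User:
    name: str
    age: int

# asdict walks the fields via reflection: no per-type serialization code needed
u = User("ada", 36)
payload = json.dumps(asdict(u), sort_keys=True)
assert json.loads(payload) == {"age": 36, "name": "ada"}
```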

                    Granted, modern compiled languages like Rust/Go/Zig are much better at things like providing nice error messages and helpful backtraces, but you’re paying a small cost for keeping a backtrace around in the first place. Should that be thrown out in favor of more speed? Depending on the context, yes! But a lot of code is just glue code that benefits more from useful error reporting than faster runtime.

                    For me, the choice in language usually comes down to how quickly I can get a working program with limited bugs built. For many things (up to and including interactive GUIs) this ends up being Python, largely because of the incredible library support, but I might choose Rust instead if I was concerned about multithreading correctness, or Go if I wanted strong green-thread support (Python’s async is kinda meh). If I happen to pick a “fast” language, that’s a nice bonus, but it’s rarely a significant factor in that decision making process. I can just call out to a fast language for the slow parts.

                    That’s not to say I wouldn’t have mechanical sympathy and try to keep data structures flat and simple from the get go, but no matter which language I pick, I’d still expect to go back with a profiler and do some performance tuning later once I have a better sense of a real-world workload.

                    1. 4

                      To add to what you say: Until you’ve exhausted the space of algorithmic improvements, they’re going to trump any microoptimisation that you try. Storing your data in a contiguous array may be more efficient (for search, anyway - wait until you need to insert something in the middle), but no matter how fast you make your linear scan over a million entries, if you can reframe your algorithm so that you only need to look at five of them to answer your query then a fairly simple data structure built out of Python dictionaries will outperform your hand-optimised OpenCL code scanning the entire array.
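                      A toy illustration of the point: once the query is reframed as a dict lookup, no amount of tuning the linear scan competes (the record shape here is invented for the example):

```python
def find_scan(records, key):
    # linear scan: touches every record until the key turns up
    for k, v in records:
        if k == key:
            return v
    return None

def make_index(records):
    # one-time O(n) build; every later query is a single hash lookup
    return dict(records)

records = [(n, n * n) for n in range(100_000)]
index = make_index(records)
key = 98_765
assert find_scan(records, key) == index.get(key) == 98_765 ** 2
```

                      The scan can be vectorised, parallelised, cache-tuned, and it still loses to not doing the work at all.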

                      The kind of microoptimisation that the article’s talking about makes sense once you’ve exhausted algorithmic improvements, need to squeeze the last bit of performance out of the system, and are certain that the requirements aren’t going to change for a while. The last bit is really important because it doesn’t matter how fast your program runs if it doesn’t solve the problem that the user actually has. grep, which the article uses as an example, is a great demonstration here. Implementations of grep have been carefully optimised but they suffer from the fact that requirements changed over time. Grep used to just search ASCII text files for strings. Then it needed to do regular expression matching. Then it needed to support unicode and do unicode canonicalisation. The bottlenecks when doing a unicode regex match over a UTF-8 file are completely different to the ones doing fixed-string matching over an ASCII text file. If you’d carefully optimised a grep implementation for fixed-string matching on ASCII, you’d really struggle to make it fast doing unicode regex matches over arbitrary unicode encodings.

                      1. 1

                        The kind of microoptimisation that the article’s talking about makes sense once you’ve exhausted algorithmic improvements, need to squeeze the last bit of performance out of the system, and are certain that the requirements aren’t going to change for a while.

                        To be fair, I think the article also speaks of the kind of algorithmic improvements that you mention.

                      2. 3

                        Maybe it’s no coincidence that Django and Rails both seem to aim at 100 concurrent requests, though. Both use a lot of language magic (runtime reflection/metaprogramming/metaclasses), afaik. You start with a slow dynamic language, and pile up more work to do at runtime (in this same slow language). In this sense, I’d argue that the design is slow in many different ways, including architecturally.

                        Complex string manipulation code is simple because the GC tracks references for you

                        No modern language has a problem with that (deliberately ignoring C). Refcounted/GC’d strings are table stakes.

                        I personally dislike Go’s design a lot, but it’s clearly designed in a way that performance will be much better than python with enough dynamic features to get you reflection-based deserialization.

                      3. 1

                        All the times I had an urge to fire up a profiler, the problem was either an inefficient algorithm (worse big-O) or repeated database fetches (inefficient cache usage). Never have I found that performance was bad because of slow abstractions. Of course, this might be because the software I work with (Python web services) builds on a lot of experience in crafting good, fast abstractions. Of course, you can find new people writing Python who don’t use them, which results in bad performance, but that is quickly learned away. What is important, if you want to write performant Python code, is to use as little “pure Python” as possible. Python is a great glue language, and it works best when it is used that way.

                        1. 1

                          Never have I found that performance was bad because of slow abstractions.

                          I have. There was the time when fgets() was the culprit, and another time when checking the limit of a string of hex digits was the culprit. The most surprising result I’ve had from profiling is a poorly written or poorly compiled library.

                          Looking back on my experiences, I would have to say I’ve been surprised by a profile result about half the time.

                        2. 1

                          As a pedantic out here, I wanted to say that I appreciate you :)

                        1. 1

                          Does this also count?:

                          fizzbuzz = lambda n: [[str(n), 'Buzz'][n % 5 == 0], ['Fizz', 'Fizz Buzz'][n % 5 == 0]][n % 3 == 0]
                          1. 1

                            No, I think that they would consider the dict lookup to be equivalent to a conditional. This is well supported, as historical advice for implementing a switch-statement in Python has been to use a dict for dispatch by looking up functions for each case.
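                            That dispatch idiom, for reference (handler names made up):

```python
def on_int(x):  return f"int:{x}"
def on_str(x):  return f"str:{x}"
def default(x): return f"other:{x!r}"

# the classic switch-statement substitute: one dict lookup selects the handler
dispatch = {int: on_int, str: on_str}

def handle(x):
    return dispatch.get(type(x), default)(x)

assert handle(3) == "int:3"
assert handle("a") == "str:a"
assert handle(3.5) == "other:3.5"
```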

                            I think the Python equivalent to the posted code would be…

                            from itertools import cycle, count, islice
                            fizzbuzz = lambda ds={3: 'fizz', 5: 'buzz', 7: 'quux'}: zip(count(), zip(*(cycle([[term], *[[]] * (div - 1)]) for div, term in ds.items())))
                            print(*[*islice(fizzbuzz(), 20)], sep='\n')

                            or the equivalent

                            from itertools import repeat, cycle, repeat, count, islice, chain
                            from functools import reduce
                            from operator import add
                            fizzbuzz = lambda ds={3: 'fizz', 5: 'buzz', 7: 'quux'}: islice(zip(count(), (reduce(add, ts) for ts in zip(*(cycle(chain([[term]], repeat([], div - 1))) for div, term in ds.items())))), 1, None)
                            print(*[*islice((str(n) if not ts else ''.join(ts) for n, ts in fizzbuzz()), 20)], sep='\n')

                            (I’m a little surprised that the posted Haskell version strictly returns strings, rather than some compound type with the original numeric value separate from the matched terms. There’s not much you can do with this output, other than print it to the screen…)

                            1. 2

                              I did not use any dictionaries, it’s a list. I’ll be honest, I also didn’t read the OP; I just thought it would be a neat tiny challenge to come up with something if-less in a Python one-liner :P

                              1. 1

                                Funny, because this was my Python solution. There is a conditional, yes, but no pattern matching… it’s quite minimal.

                                f = "fizz"
                                b = "buzz"
                                fb = f+b
                                block = [None, None, f, None, b, f, None, None, f, b, None, f, None, None, fb]
                                from itertools import cycle
                                list(map(lambda x, y: x or str(y), cycle(block), range(1, 101)))
                            1. 9

                              I’m the furthest thing from being an expert in programming under the regime of a rich type system—I would appreciate hearing from the experts on how these approaches play out in practice.

                              The principle of making illegal states unrepresentable goes back much further than the contemporary interest in type theory, and the benefits have been clear for a long time. (We were talking about this back in the days when I was still a C++ programmer.) Clearly, the proposition is that the pseudo-mathematical nature of a well-founded and theoretically-grounded type system is supposed to result in greater benefits than these previous approaches—presumably in the areas of static verifiability and, e.g., automated testing.

                              That said, this particular post uses a business requirement as its motivating example. While, understandably, the example is chosen for illustrative purposes only, I wonder: what benefits does type-orientated programming give to modeling business logic?

                              In my mind, the limitation of fitting of business constraints to a pseudo-mathematical structure is:

                              1. business constraints live within a non-mathematical domain of human laws and logic, which defies neat & structured modelling. e.g., it seems possible to me that a Contact must have either an EmailContactInfo or a PostalContactInfo unless it’s some special case (like some VIP contact or a contact under a privacy-sensitive jurisdiction), in which case the constraint is made further complex. (This seems to be because computer programmes are limited representations of an infinitely profound human reality, and even the physical laws that govern natural phenomena are no match for the complexities that can be dreamt up by the human imagination, and then enforced upon us by law or language.)

                              2. given the above, it would make sense that this system might be designed to be one that communicates with stored parties, thus encircling the modelled business constraint with some system design constraint. In other words, in spite of the complexity of the actual business constraint, it would be nonsensical for a system designed for the sole purpose of performing some action to consider entities that are not capable of being processed. What does it mean to store a Contact in a system whose sole purpose is to send mail or e-mail, if that datum has neither EmailContactInfo nor PostalContactInfo? However, in practice, management rarely wants to pay the cost of writing multiple systems (with the corresponding O(n²) complexity of integrating them,) so you get stuck writing one system that has to encode these constraints very close to the actual business logic itself. In other words, your Contact tracking type gets used in places where it is meaningful to not have the …ContactInfo stored under certain special cases, because it’s used for some purpose distinct from sending mail or e-mail. (Heaven forbid the system is exposed to business users for direct data entry, because you might then see users working around the programme constraints by filling in dummy data in required fields, totally defying the purpose of the constraints. Many large systems I’ve seen almost expect data entry to be messy and ad hoc, and don’t even bother with any kind of fancy modelling.)

                              This seems to result in the following designs:

                              1. some outer system with relaxed constraints to capture information, exactly as entered to retain intention and provenance, where these business constraints are modelled not in a pseudo-mathematical, language-level construct like a type system but in a (runtime) application-level construct that allows for the flexibility of unplanned special cases and whose design exposes this logic to an end-user in a human-verifiable form (e.g., prints out a flow-chart of the constraints and their validation state in a form that the lawyer can read.)

                              2. multiple, very small inner systems which enforce constraints more rigidly, by circumscribing the subset of data they accept to those which they can process. Anything that defies being subjected to these constraints gets shunted to some secondary (potentially manual) process.

                              It seems to me that only the latter, inner systems would benefit from type-orientated programming. (It’s often these systems that, in languages like Python, we put into a more rigid, restricted computational domain like a pandas.DataFrame or a numpy.ndarray.) The benefits at this layer seem obvious, but they also seem more minor, since these systems are necessarily more restricted in their scope—by nature, they’re smaller and more narrowly purposed.

                              My question is:

                              • does type-orientated programming offer a fundamentally different approach than the above?
                              • if not, how does the type-orientated model assist with the outer system?
                              1. 4
                                1. 1

                                  To the extent that your chosen link answers the given questions, it seems like the answer is “no”:

                                  • No, type-driven programming is not fundamentally different from dynamically-typed approaches. In particular, we ought to see both dynamically-typed and statically-typed type-driven approaches as the exact same approach, of declaring shapes for abstract data types and then asking our language to ensure for us that values have the correct shape.

                                  • The first bullet was “no”.

                                  To give a concrete example, consider Zephyr ASDL, an abstract syntax description language. Zephyr ASDL is effectively a language for declaring algebraic data. CPython uses Zephyr ASDL for the reference Python abstract syntax. This one description is used both from statically-typed C and from dynamically-typed Python. Similarly, my own language Monte uses Zephyr ASDL for not just the next-generation abstract syntax, but also can be used for any user-level algebraic data; I recently used it for text-user interface widgets and constructive solid geometry.

                                  I hope that this example helps reveal to you that whether a type system is static or dynamic can depend on our vantage point.

                                  1. 2

                                    I understand that the vantage point is when you choose to do the type-checking–at runtime, or before that. Type-driven programming still offers a unique ability though, which is explained in the post I linked to: that is, you can parse less-structured data into more-structured data and then use the type of the parsed value as a guarantee (a proof) that some condition holds.

                                    You could of course choose to not take advantage of this technique–it doesn’t work magically. But if you do it, then the guarantee it gives you is very real, and one you don’t easily get with dynamic typing.
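                                    A minimal Python sketch of that parse-don’t-validate idea (Email and parse_email are hypothetical names, with a deliberately naive check standing in for real validation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Email:
    address: str

def parse_email(raw: str) -> Email:
    # deliberately minimal check, for illustration only; real validation is subtler
    if "@" not in raw:
        raise ValueError(f"not an email address: {raw!r}")
    return Email(raw)

# downstream code that receives an Email never re-checks the string:
# holding the parsed type is the evidence that parsing succeeded
contact = parse_email("a@example.com")
assert contact.address == "a@example.com"
```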

                                2. 2

                                  This is a great comment. I’ve recently written ten or twenty thousand words on this topic. I was going to come here and try to disambiguate all of the various issues, but you’ve done a pretty good job. I will attempt to be brief.

                                  I think the mistake I see on both sides of this discussion is assuming totality of approach. That is, either turn this on or turn it off, now compare the two.

                                  This fails for a variety of reasons, some of which you mention. However if we limit the scope to one important business decision at a time, it doesn’t fail at all. Various business groups may have various ideas of what makes a valid address, but the folks printing billing statements need their work done whether they agree with the other folks or not.

                                  Making illegal states unrepresentable rocks, but it rocks only when appropriately scoped. By the way, this is exactly how we treat default local variables and system types: we start out with a job (method), we pick the system types we need for that job, and then we forget about them for another job. It’s only when we begin using shared structs and classes across various methods that we start running into any kind of problems.
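                                  A sketch of what “appropriately scoped” might look like in Python (all names hypothetical): a validated type owned by the billing job alone, with no pretence of being the organisation-wide notion of an address:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BillingAddress:
    # scoped to the statement-printing job: "valid" means valid *for billing*,
    # regardless of what other departments consider a valid address
    line1: str
    postcode: str

    def __post_init__(self):
        if not self.line1 or not self.postcode:
            raise ValueError("billing requires both line1 and postcode")

ok = BillingAddress("1 Main St", "AB1 2CD")
assert ok.postcode == "AB1 2CD"
```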

                                  The next logical question is: ok, then how do you manage types across hundreds of business decisions? I have an answer to that, but it’s too much to go into here. There are a couple of ways.

                                1. 1

                                  This talk is such a joke!

                                  I bet I could do a better job of re-adding the print statement to Python. I’d probably implement some equivalent of a reader macro or patch the part in the parser where it “falls off the end.”

                                  1. 1

                                    He’s replaced the referenced section on the book website. It’s available from


                                    The critique above seems a bit thin and comes across a bit pseudo-theoretical—is there any additional context?

                                    1. 3

                                      This is an awesome thread!

                                      This week, I hope to use my nasty assembly-patching hack to make two recent extensions to Python 3 available to folks to play with.

                                      The first extension is decoupling scope-as-visibility from scope-as-evaluation-context, allowing constant expressions to be evaluated either one scope higher or in the highest-level scope. Since we can probably safely assume module-import can occur at any time before its contents' runtimes, this could allow us to encode certain static checks and static optimisations into our code.

                                      Thus, the following would immediately raise a ValueError (day is out of range for month.)

                                      >>> from datetime import date
                                      >>> class Foo:
                                      >>>  def bar(self):
                                      >>>    return !!date(2014, 2, 30) # evaluate at top scope

                                      Similarly, we could fix a specialisation to a fast path in the below (for the lifetime of the programme):

                                       >>> def f(x, y):
                                       >>>  def helper():
                                       >>>    return !{(int, int): fast_path_ii, (float, float): fast_path_ff}.get((type(x), type(y)), slow_path) # evaluate one scope up

                                      The second extension adds ast-literals to Python 3.

                                      >>> e = $(x+y)*w+x
                                      >>> e
                                      sym(+, [sym(×, [sym(+, [sym(x), sym(y)]), sym(w)]), sym(x)])
                                      >>> e(x = $w+3) # bind x
                                      sym(+, [sym(×, [sym(+, [sym(+, [sym(w), 3]), sym(y)]), sym(w)]), sym(+, [sym(w), 3])])
                                      >>> e(x = $w+3)(w = 10) # bind x & w
                                      sym(+, [sym(×, [sym(+, [sym(+, [10, 3]), sym(y)]), 10]), sym(+, [10, 3])])
                                      >>> e(x = $w+3)(w=20)(30) # bind x, w, y
                                      sym(+, [sym(×, [sym(+, [sym(+, [20, 3]), 30]), 20]), sym(+, [20, 3])])
                                      >>> e(x = $w+3)(w=20)(30)() # bind x, w, y, & evaluate

                                      Not sure if either of these are actually useful, but they are fun exercises!