Let’s implement a function twice(n) that doubles the input. Like any sane person, we whip out our Python and write out a function in the way every human intuitively writes a doubling function.

def twice(n):
    if n==0:
    return twice(n-1)+2

Sure enough, twice(1)=2 and twice(10)=20, so we have experimentally verified we implemented the function correctly, at least correctly enough to satisfy most tech interviews.

In fact, we can even find out what twice(1000) is! It seems like in Python, the number 2000 is represented as

twice(1000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in twice
  File "<stdin>", line 4, in twice
  File "<stdin>", line 4, in twice
  [Previous line repeated 995 more times]
  File "<stdin>", line 2, in twice
RecursionError: maximum recursion depth exceeded in comparison

Our first instinct as coders is to cry, then google the error message. This takes us to the Stack Overflow page or any of the ones like it. It tells that Python has a built in value that limits how deep a function can recurse. This magical number is set to 1000 by default, which honestly is rather weird because it’s not even a power of 2.

Thankfully, there is a function in python that lets us increment this value. Simply run

import sys
sys.setrecursionlimit(65536)
twice(1000)

And now we get 2000! We can even try out twice(65536) which results in

Segmentation fault (core dumped)

We have reached a stack overflow in Python. There is absolutely no indication as to where this stack came into being within the twice function we wrote. To properly understand the stack, imagine you are climbing a mountain. You are carrying a backpack around with you filled with useful tools you need to make the climb. This backpack is the stack. It contains all the local variables you have defined and might need on your trip. We start with the number n in our backpack which is honestly a pretty reasonable thing to store if we want to calculate 2*n. As we’re walking along, we find a rock: n-1. As we enter this recursive call, we decide to put this rock in our backpack and carry it with us. It’s a pretty rock. Not that heavy and it’ll make for a nice souvenir. But then we encounter rock n-2 and rock n-3 and so on. If n is small enough, we might be able to carry around the rocks with us and dump them all when we get back home. Otherwise, we will collapse on the side of the mountain trail, just like Python did.

Now one solution you may suggest is carrying less rocks with us. This is a pretty sensible idea that Python doesn’t want to do. We only keep the most recent rock (hopefully the prettiest one) with us, taking out the last rock from our backpack to make some space. This is what people call tail call optimization (TCO). The idea is that the recursive call happens at the “tail” of the function, essentially the last thing we compute. Because of this, we don’t need to keep around the local variables of that particular recursive call as we will never actually use them again. We store just one rock in our backpack. Most people are content with this and it makes sense.

Why are we climbing a mountain if we are feeble computer scientists? I don’t actually know. Let’s go back to the function and see where we are actually putting rocks in our backpack. I will use a syntax more like SML this time and we can work together to evaluate twice(3)

(fun twice n => if n=0 then 0 else twice (n-1) + 2) 3

The word twice within the function is referring to the function itself. It would make sense to expand out the function by copy pasting it into itself.

(fun twice n => if n=0 then 0 else (fun twice n => if n=0 then 0 else twice (n-1) + 2) (n-1) + 2) 3

Now we can step this forward the obvious way

(fun twice n => if n=0 then 0 else (fun twice n => if n=0 then 0 else twice (n-1) + 2) (n-1) + 2) 3
if 3=0 then 0 else (fun twice n => if n=0 then 0 else twice (n-1) + 2) (3-1) + 2
(fun twice n => if n=0 then 0 else twice (n-1) + 2) 2 + 2

And we repeat this again and again

...
(fun twice n => if n=0 then 0 else twice (n-1) + 2) 1 + 2 + 2
...
(fun twice n => if n=0 then 0 else twice (n-1) + 2) 0 + 2 + 2 + 2
...
0 + 2 + 2 + 2
...
6

Cool, where did we encounter a stack? Doesn’t seem like we did when we stepped it through by hand. Why would a computer create a stack out of thin air?

What we do have is an evaluation context. You can clearly see the +2 growing on the side of the expression while we are zooming in on the redux. For example, consider the following rules for the dynamics of function application in an eager setting.

\[\frac{A\longmapsto A'}{A\ B\longmapsto A'\ B}\quad\frac{A\ \mathbf{val}\quad B\longmapsto B'}{A\ B\longmapsto A\ B'}\quad\frac{B\ \mathbf{val}}{(\lambda x. A)\ B\longmapsto [B/x]A}\]

Note that most of these have \(\longmapsto\) in the premises. This means that they are simply “induction” steps that are carrying over the stepping of the redux into the top level. The only interesting rule is the last one. We need a way of expressing that this reduction step can happen within an expression instead of just at the top level. This introduces the notion of an evaluation context where we wrap around the surrounding expression. We can express these rules with the single rule

\[\frac{B\ \mathbf{val}}{E[(\lambda x. A)\ B]\longmapsto E[[B/x]A]}\]

Where \(E\) is an arbitrary closed expression where the next step is the redux in question. Essentially, we are pealing the expression, simplifying the stuff inside, and wrapping the simplified expression back up. The inductive cases are handled by the fact that the evaluation context is arbitrary.

Note, this does not make proofs any easier! Even though we only have one rule, we still have all the same inductive cases as before as we need to prove everything for arbitrary evaluation context. It simply changes the location at which we do the proof. It can make things look nicer on paper sometimes, or hurt in other cases. There are even some cases where we need to express things in evaluation contexts as we can incorporate features into the language where the program knows what its current evaluation context is (letcc/throw/C/A).

This method of expressing evaluation contexts only makes sense in a pen and paper setting where it doesn’t cause efficiency concerns to unwrap and wrap the evaluation context in every step. Bob Harper expresses an alternative for expressing these evaluation contexts using stack frames in PFPL, where we don’t wrap and unwrap in every step and instead express different rules for carrying the results of the evaluation back up the stack frames. While this method relates best to how a computer would actually be executing the code, most PL researchers go with the other version of evaluation contexts because it’s pretty easy to do on paper and being less explicit in the rules of wrapping and unwrapping makes proofs easier.

When we think of a function call, it makes sense to take this evaluation context as a snapshot of the current state of the machine during execution. We set this snapshot aside and execute the function. Once we are done with the function call, we simply wrap up the result with the evaluation context we set aside. This represents returning to the function call. The evaluation context wrapped around the result becomes a new expression that we simply continue to execute.

Evaluation contexts are not exactly the kind of thing that Python or C are dealing with when they do function calls. Instead, we see an interaction between these evaluation contexts and scoping of local state in the stack. We even have a heap! This means our dynamics contain three separate components, potentially looking something like this:

\[H, S, E || e\]

Where \(E\) is the evaluation context and \(e\) is the expression like before, but we also carry around a set of locally scoped variables on the stack \(S\) and a set of globally scoped variables on the heap \(H\). Now we see where our backpack filled with rocks came from! It’s the letter \(S\). When we call a new function, this stack expands to take in all the new local variables needed to run the function, and we shrink it back down once we are done with the function. C does this this “shrinking back down” incorrectly as it allows reference pointers to stack space that has been deleted, causing several likely bugs. Fixing this would require strengthening the theory to include things such as mobile types and a lot of careful case analyzing to make sure we don’t cause these bugs. In conclusion, the stack is a mess.

There is a technique called Continuation Passing Style (CPS) that allows us a more expressive control over the function calls and other components of evaluation by making the wrapping up step of evaluation contexts an explicit procedure within our program. Evaluation contexts are pretty much the same thing as continuations. Because we are able to see the continuations represented within our code, we can see where we can use TCO directly, and in many cases avoid needing to use the stack at all.

There’s so many fun things to do with evaluation contexts! These expressions with a hole in them which we use to wrap and unwrap other expressions show up in a lot of places. It is so much nicer not having to deal with the stack if we don’t have to. Carrying a bag of rocks is tough. Just remember, it is better to think of these programs and function calls as homomorphisms between different programs during evaluation instead of jotting down notes on a stack and treating the code as data as we run. The mathematical elegance of it makes things a lot prettier.