<!-- λ Λ ∀ ≡ α β ω Ω -->

[[!toc levels=2]]

# Reasoning about evaluation order in Combinatory Logic

We've discussed [[evaluation order|topics/week3_evaluation_order]]
before, primarily in connection with the untyped lambda calculus.
Whenever a term contains more than one redex, we have to choose which
one to reduce, and this choice can make a difference.  For instance,
recall that 

    Ω == ωω == (\x.xx)(\x.xx), so

    ((\x.I)Ω) == ((\x.I)((\x.xx)(\x.xx)))
                   *      *

There are two redexes in this term; we've marked the operative lambdas
with a star.  If we reduce the leftmost redex first, the term reduces
to the normal form `I` in one step.  But if we reduce the rightmost
redex instead, the "reduced" form is `(\x.I)Ω` again, and we are in
danger of entering an infinite loop.

Thanks to the introduction of sum types (disjoint union) in the last lecture, we
are now in a position to gain a deeper understanding of evaluation
order by writing a program that allows us to reasoning explicitly about evaluation.

One thing we'll see is that it is all too easy for the evaluation
order properties of an evaluator to depend on the evaluation order
properties of the programming language in which the evaluator is
written.  We would like to write an evaluator in which the order of
evaluation is insensitive to the evaluator language.  That is, the goal is to
find an order-insensitive way to reason about evaluation order.  We
will not fully succeed in this first attempt, but we will make good
progress.

The first evaluator we will develop will evaluate terms in Combinatory
Logic.  This significantly simplifies the discussion, since we won't
need to worry about variables or substitution.  As we develop and
extend our evaluator in future weeks, we'll switch to lambdas, but for
now, working with the simplicity of Combinatory Logic will make it
easier to highlight evaluation order issues.

A brief review of Combinatory Logic: a term in CL is the combination
of three basic expressions, `S`, `K`, and `I`, governed by the
following reduction rules:

    Ia ~~> a
    Kab ~~> a
    Sabc ~~> ac(bc)

where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
how to embed the untyped lambda calculus in CL, so it's no surprise
that evaluation order issues arise in CL.  To illustrate, we'll use
the following definition:

    skomega = SII
    Skomega = skomega skomega == SII(SII)
            ~~> I(SII)(I(SII))
            ~~> SII(SII)

We'll use the same symbol, `Ω`, for Omega and Skomega: in a lambda
term, `Ω` refers to Omega, but in a CL term, `Ω` refers to Skomega as
defined here.

Just as in the corresponding term in the lambda calculus, CL terms can
contain more than one redex:

    KIΩ == KI(SII(SII))
           *  *

we can choose to reduce the leftmost redex by applying the reduction
rule for `K`, in which case the term reduces to the normal form `I` in
one step; or we can choose to reduce the Skomega part, by applying the
reduction rule `S`, in which case we do not get a normal form, and
we're headed towards an infinite loop.

With sum types, we can define CL terms in OCaml as follows:

    type term = I | K | S | App of (term * term)

    let skomega = App (App (App (S, I), I), App (App (S, I), I))

This type definition says that a term in CL is either one of the three
simple expressions (`I`, `K`, or `S`), or else a pair of CL
expressions.  `App` stands for Functional Application.  With this type
definition, we can encode Skomega, as well as other terms whose
reduction behavior we want to try to control.

Using pattern matching, it is easy to code the one-step reduction
rules for CL:

    let reduce_one_step (t:term):term = match t with
        App(I,a) -> a
      | App(App(K,a),b) -> a
      | App(App(App(S,a),b),c) -> App(App(a,c),App(b,c))
      | _ -> t

    # reduce_one_step (App(App(K,S),I));;
    - : term = S
    # reduce_one_step skomega;;
    - : term = App (App (I, App (App (S, I), I)), App (I, App (App (S, I), I)))

The definition of `reduce_one_step` explicitly says that it expects
its input argument `t` to have type `term`, and the second `:term`
says that the type of the output the function delivers as a result will also be of
type `term`.

The type constructor `App` obscures things a bit, but it's still
possible to see how the one-step reduction function is just the
reduction rules for CL.  The OCaml interpreter responses given above show us that the
function faithfully recognizes that `KSI ~~> S`, and that `Skomega ~~>
I(SII)(I(SII))`.

We can now say precisely what it means to be a redex in CL.

    let is_redex (t:term):bool = not (t = reduce_one_step t)

    # is_redex K;;
    - : bool = false
    # is_redex (App(K,I));;
    - : bool = false
    # is_redex (App(App(K,I),S));;
    - : bool = true
    # is_redex skomega;;
    - : book = true

Warning: this definition relies on the accidental fact that the
one-step reduction of a CL term is never identical to the original
term.  This would not work for the untyped lambda calculus, since
`((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.

Note that in order to decide whether two terms are equal, OCaml has to
recursively compare the elements of complex CL terms.  It is able to
figure out how to do this because we provided an explicit definition
of the datatype `term`.

As you would expect, a term in CL is in normal form when it contains
no redexes (analogously for head normal form, weak head normal form, etc.)

In order to fully reduce a term, we need to be able to reduce redexes
that are not at the top level of the term.
Because we need to process subparts, and because the result after
processing a subpart may require further processing, the recursive
structure of our evaluation function has to be somewhat subtle.  To
truly understand, we will need to do some sophisticated thinking
about how recursion works.  

We'll develop our full reduction function in two stages.  Once we have
it working, we'll then consider a variant.  

    let rec reduce_stage1 (t:term):term = 
      if (is_redex t) then reduce_stage1 (reduce_one_step t)
                      else t

If the input is a redex, we ship it off to `reduce_one_step` for
processing.  But just in case the result of the one-step reduction is
itself a redex, we recursively call `reduce_stage1`.  The recursion
will continue until the result is no longer a redex.  We're aiming at
allowing the evaluator to recognize that

    I (I K) ~~> I K ~~> K

When trying to understand how recursive functions work, it can be
extremely helpful to examine an execution trace of inputs and
outputs.

    # #trace reduce_stage1;;
    reduce_stage1 is now traced.
    # reduce_stage1 (App (I, App (I, K)));;
    reduce_stage1 <-- App (I, App (I, K))
      reduce_stage1 <-- App (I, K)
        reduce_stage1 <-- K
        reduce_stage1 --> K
      reduce_stage1 --> K
    reduce_stage1 --> K
    - : term = K

In the trace, "`<--`" shows the input argument to a call to
`reduce_stage1`, and "`-->`" shows the output result.

Since the initial input (`I(IK)`) is a redex, the result after the
one-step reduction is `IK`.  We recursively call `reduce_stage1` on
this input.  Since `IK` is itself a redex, the result after one-step
reduction is `K`.  We recursively call `reduce_stage1` on this input.  Since
`K` is not a redex, the recursion bottoms out, and we return the
result.

But this function doesn't do enough reduction.  We want to recognize
the following reduction path:

    I I K ~~> I K ~~> K

But the reduction function as written above does not deliver this result:

    # reduce_stage1 (App (App (I, I), K));;
    - : term = App (App (I, I), K)

The reason is that the top-level term is not a redex to start with,
so `reduce_stage1` returns it without any evaluation.  

What we want is to
evaluate the subparts of a complex term.  We'll do this by evaluating
the subparts of the top-level expression.

    let rec reduce (t:term):term = match t with
        I -> I
      | K -> K
      | S -> S
      | App (a, b) -> 
          let t' = App (reduce a, reduce b) in
            if (is_redex t') then reduce 2 (reduce_one_step t')
                             else t'

Since we need access to the subterms, we do pattern matching on the
input.  If the input is simple (the first three `match` cases), we
return it without further processing.  But if the input is complex, we
first process the subexpressions, and only then see if we have a redex
at the top level.  

To understand how this works, follow the trace
carefully:

    # reduce (App(App(I,I),K));;
    reduce <-- App (App (I, I), K)

      reduce <-- K          ; first main recursive call
      reduce --> K

      reduce <-- App (I, I)  ; second main recursive call
        reduce <-- I
        reduce --> I
        reduce <-- I
        reduce --> I
        reduce <-- I
        reduce --> I
      reduce --> I

      reduce <-- K           ; third 
      reduce --> K
    reduce --> K
    - : term = K

Ok, there's a lot going on here.  Since the input is complex, the
first thing the function does is construct `t'`.  In order to do this,
it must reduce the two main subexpressions, `II` and `K`.  

There are three recursive calls to the `reduce` function, each of
which gets triggered during the processing of this example.  They have
been marked in the trace.  

The don't quite go in the order in which they appear in the code,
however!  We see from the trace that it begins with the right-hand
expression, `K`.  We didn't explicitly tell it to begin with the
right-hand subexpression, so control over evaluation order is starting
to spin out of our control.  (We'll get it back later, don't worry.)

In any case, in the second main recursive call, we evaluate `II`.  The
result is `I`.  

At this point, we have constructed `t' == App(I,K)`.  Since that's a
redex, we ship it off to reduce_one_step, getting the term `K` as a
result.  The third recursive call checks that there is no more
reduction work to be done (there isn't), and that's our final result.

You can see in more detail what is going on by tracing both reduce
and reduce_one_step, but that makes for some long traces.

So we've solved our first problem: `reduce` now recognizes that `IIK ~~>
K`, as desired.

Because the OCaml interpreter evaluates each subexpression in the
course of building `t'`, however, it will always evaluate the right
hand subexpression, whether it needs to or not.  And sure enough,

    # reduce (App(App(K,I),skomega));;
      C-c C-cInterrupted.

Running the evaluator with this input leads to an infinite loop, and
the only way to get out is to kill the interpreter with control-c.

Instead of performing the leftmost reduction first, and recognizing 
that this term reduces to the normal form `I`, we get lost endlessly
trying to reduce Skomega.

## Laziness is hard to overcome

To emphasize that our evaluation order here is at the mercy of the
evaluation order of OCaml, here is the exact same program translated
into Haskell.  We'll put them side by side to emphasize the exact parallel.

<pre>
OCaml                                                          Haskell
==========================================================     =========================================================

type term = I | S | K | App of (term * term)                   data Term = I | S | K | App Term Term deriving (Eq, Show)      
                                                                                                                             
let skomega = App (App (App (S,I), I), App (App (S,I), I))     skomega = (App (App (App S I) I) (App (App S I) I))                      
                                                                                                                             
                                                               reduce_one_step :: Term -> Term                                      
let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of                                      
    App(I,a) -> a                                                App I a -> a                                                      
  | App(App(K,a),b) -> a                                         App (App K a) b -> a                                              
  | App(App(App(S,a),b),c) -> App(App(a,c),App(b,c))             App (App (App S a) b) c -> App (App a c) (App b c)                      
  | _ -> t                                                       _ -> t                                                      
                                                                                                                             
                                                               is_redex :: Term -> Bool                                      
let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)                      
                                                                                                                             
                                                               reduce :: Term -> Term                                              
let rec reduce (t:term):term = match t with                    reduce t = case t of                                              
    I -> I                                                       I -> I                                                      
  | K -> K                                                       K -> K                                                      
  | S -> S                                                       S -> S                                                      
  | App (a, b) ->                                                 App a b ->                                                       
      let t' = App (reduce a, reduce b) in                          let t' = App (reduce a) (reduce b) in                      
        if (is_redex t') then reduce (reduce_one_step t')             if (is_redex t') then reduce (reduce_one_step t')      
                         else t'                                                       else t'                                
</pre>

There are some differences in the way types are made explicit, and in
the way terms are specified (`App(a,b)` for Ocaml versus `App a b` for
Haskell).  But the two programs are essentially identical.

Yet the Haskell program finds the normal form for `KIΩ`:

    *Main> reduce (App (App K I) skomega)
    I

Woa!  First of all, this is wierd.  Haskell's evaluation strategy is
called "lazy".  Apparently, Haskell is so lazy that even after we've
asked it to construct t' by evaluating `reduce a` and `reduce b`, it
doesn't bother computing `reduce b`.  Instead, it waits to see if we
ever really need to use the result.

So the program as written does NOT fully determine evaluation order
behavior.  At this stage, we have defined an evaluation order that
still depends on the evaluation order of the underlying interpreter.

There are two questions we could ask: 

* Can we adjust the OCaml evaluator to exhibit lazy behavior?  

* Can we adjust the Haskell evaluator to exhibit eager behavior?  

The answer to the first question is easy and interesting, and we'll
give it right away.  The answer to the second question is also
interesting, but not easy.  There are various tricks available in
Haskell we could use (such as the `seq` operator, or the `deepseq`
operator), but a fully general, satisifying resolution will have to
wait until we have Continuation Passing Style transforms.

The answer to the first question (Can we adjust the OCaml evaluator to
exhibit lazy behavior?) is quite simple:

<pre>
let rec reduce_lazy (t:term):term = match t with
    I -> I
  | K -> K
  | S -> S
  | App (a, b) -> 
      let t' = App (reduce_lazy a, b) in
        if (is_redex t') then reduce_lazy (reduce_one_step t')
                         else t'
</pre>

There is only one small difference: instead of setting `t'` to `App
(reduce a, reduce b)`, we omit one of the recursive calls, and have
`App (reduce a, b)`.  That is, we don't evaluate the right-hand
subexpression at all.  Ever!  The only way to get evaluated is to
somehow get into functor position.

    # reduce3 (App(App(K,I),skomega));;
    - : term = I
    # reduce3 skomega;;
    C-c C-cInterrupted.

The evaluator now has no trouble finding the normal form for `KIΩ`,
but evaluating skomega still gives an infinite loop.

We can now clarify the larger question at the heart of
this discussion: 

*How can we can we specify the evaluation order of a computational
system in a way that is completely insensitive to the evaluation order
of the specification language?*

As a final note, we should mention that the evaluators given here are
absurdly inefficient computationally.  Some computer scientists have
trouble even looking at code this inefficient, but the emphasis here
is on getting the concepts across as simply as possible.