topics/_week7_eval_cl.mdwn

   1 <!-- λ Λ ∀ ≡ α β ω Ω -->
   2
   3 [[!toc levels=2]]
   4
   5 # Reasoning about evaluation order in Combinatory Logic
   6
   7 We've discussed evaluation order before, primarily in connection with
   8 the untyped lambda calculus.  Whenever a term contains more than one
   9 redex, we have to choose which one to reduce, and this choice can make
  10 a difference.  For instance, in the term `((\x.I)(ωω))`, if we reduce
  11 the leftmost redex first, the term reduces to the normal form `I` in
  12 one step.  But if we reduce the left most redex instead (namely,
  13 `(ωω)`), we do not arrive at a normal form, and are in danger of
  14 entering an infinite loop.
  15
  16 Thanks to the introduction of sum types (disjoint union), we are now
  17 in a position to gain a deeper understanding of evaluation order by
  18 writing a program that experiments with different evaluation order
  19 strategies.
  20
  21 One thing we'll see is that it is all too easy for the evaluation
  22 order properties of an evaluator to depend on the evaluation order
  23 properties of the programming language in which the evaluator is
  24 written.  We will seek to write an evaluator in which the order of
  25 evaluation is insensitive to the evaluator language.  The goal is to
  26 find an order-insensitive way to reason about evaluation order.  We
  27 will not fully succeed in this first attempt, but we will make good
  28 progress.
  29
  30 The first evaluator we will develop will evaluate terms in Combinatory
  31 Logic.  This significantly simplifies the program, since we won't need
  32 to worry about variables or substitution.  As we develop and extend
  33 our evaluator in future weeks, we'll switch to lambdas, but for now,
  34 working with the simplicity of Combinatory Logic will make the
  35 evaluation order issues easier to grasp.
  36
  37 A brief review: a term in CL is the combination of three basic
  38 expressions, `S`, `K`, and `I`, governed by the following reduction
  39 rules:
  40
  41     Ia ~~> a
  42     Kab ~~> b
  43     Sabc ~~> ac(bc)
  44
  45 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
  46 how to embed the untyped lambda calculus in CL, so it's no
  47 surprise that the evaluation order issues arise in CL.  To illustrate,
  48 we'll use the following definition:
  49
  50     skomega = SII(SII)
  51             ~~> I(SII)(I(SII))
  52             ~~> SII(SII)
  53
  54 Instead of the lambda term `Ω`, we'll use the CL term skomega.  We'll
  55 use the same symbol, `Ω`, though: in a lambda term, `Ω` refers to
  56 omega, but in a CL term, `Ω` refers to skomega as defined here.
  57
  58 If we consider the term
  59
  60     KIΩ == KI(SII(SII))
  61
  62 we can choose to reduce the leftmost redex by firing the reduction
  63 rule for `K`, in which case the term reduces to the normal form `I` in
  64 one step; or we can choose to reduce the skomega part, by firing the
  65 reduction rule for the leftmost `S`, in which case we do not get a
  66 normal form, and we're headed towards an infinite loop.
  67
  68 With sum types, we can define terms in CL in OCaml as follows:
  69
  70     type term = I | S | K | FA of (term * term)
  71
  72     let skomega = FA (FA (FA (S, I), I), FA (FA (S, I), I))
  73
  74 This recursive type definition says that a term in CL is either one of
  75 the three simple expressions, or else a pair of CL expressions.
  76 Following Heim and Kratzer, `FA` stands for Functional Application.
  77 With this type definition, we can encode skomega, as well as other
  78 terms whose reduction behavior we want to try to control.
  79
  80 Using pattern matching, it is easy to code the one-step reduction
  81 rules for CL:
  82
  83     let reduce_one_step (t:term):term = match t with
  84         FA(I,a) -> a
  85       | FA(FA(K,a),b) -> a
  86       | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))
  87       | _ -> t
  88
  89     # reduce_one_step (FA(FA(K,S),I));;
  90     - : term = S
  91     # reduce_one_step skomega;;
  92     - : term = FA (FA (I, FA (FA (S, I), I)), FA (I, FA (FA (S, I), I)))
  93
  94 The type constructor `FA` obscures things a bit, but it's still
  95 possible to see how the one-step reduction function is just the
  96 reduction rules for CL.  The OCaml interpreter shows us that the
  97 function faithfully recognizes that `KSI ~~> S`, and `skomega ~~>
  98 I(SII)(I(SII))`.
  99
 100 We can now say precisely what it means to be a redex in CL.
 101
 102     let is_redex (t:term):bool = not (t = reduce_one_step t)
 103
 104     # is_redex K;;
 105     - : bool = false
 106     # is_redex (FA(K,I));;
 107     - : bool = false
 108     # is_redex (FA(FA(K,I),S));;
 109     - : bool = true
 110     # is_redex skomega;;
 111     - : book = true
 112
 113 Warning: this definition relies on the fact that the one-step
 114 reduction of a CL term is never identical to the original term.  This
 115 would not work for the untyped lambda calculus, since
 116 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.  Note that in
 117 order to decide whether two terms are equal, OCaml has to recursively
 118 compare the elements of complex CL terms.  It is able to figure out
 119 how to do this because we provided an explicit definition of the
 120 datatype Term.
 121
 122 As you would expect, a term in CL is in normal form when it contains
 123 no redexes.
 124
 125 In order to fully reduce a term, we need to be able to reduce redexes
 126 that are not at the top level of the term.
 127
 128     (II)I ~~> IK ~~> K
 129
 130 That is, we want to be able to first evaluate the redex `II` that is
 131 a proper subpart of the larger term, to produce a new intermediate term
 132 that we can then evaluate to the final normal form.
 133
 134 Because we need to process subparts, and because the result after
 135 processing a subpart may require further processing, the recursive
 136 structure of our evaluation function has to be somewhat subtle.  To
 137 truly understand, you will need to do some sophisticated thinking
 138 about how recursion works.  We'll show you how to keep track of what
 139 is going on by examining a recursive execution trace of inputs and
 140 outputs.
 141
 142 We'll develop our full reduction function in stages.  Once we have it
 143 working, we'll then consider some variants.  Just to be precise, we'll
 144 distinguish each microvariant with its own index number embedded in
 145 its name.
 146
 147     let rec reduce1 (t:term):term =
 148       if (is_redex t) then reduce1 (reduce_one_step t)
 149                       else t
 150
 151 If the input is a redex, we ship it off to `reduce_one_step` for
 152 processing.  But just in case the result of the one-step reduction is
 153 itself a redex, we recursively call `reduce1`.  The recursion will
 154 continue until the result is no longer a redex.
 155
 156     # #trace reduce1;;
 157     reduce1 is now traced.
 158     # reduce1 (FA (I, FA (I, K)));;
 159     reduce1 <-- FA (I, FA (I, K))
 160       reduce1 <-- FA (I, K)
 161         reduce1 <-- K
 162         reduce1 --> K
 163       reduce1 --> K
 164     reduce1 --> K
 165     - : term = K
 166
 167 Since the initial input (`I(IK)`) is a redex, the result after the
 168 one-step reduction is `IK`.  We recursively call `reduce1` on this
 169 input.  Since `IK` is itself a redex, the result after one-step
 170 reduction is `K`.  We recursively call `reduce1` on this input.  Since
 171 `K` is not a redex, the recursion bottoms out, and we return
 172 the result.
 173
 174 But this function doesn't do enough reduction.
 175
 176     # reduce1 (FA (FA (I, I), K));;
 177     - : term = FA (FA (I, I), K)
 178
 179 Because the top-level term is not a redex, `reduce1` returns it
 180 without any evaluation.  What we want is to evaluate the subparts of a
 181 complex term.
 182
 183     let rec reduce2 (t:term):term = match t with
 184         I -> I
 185       | K -> K
 186       | S -> S
 187       | FA (a, b) ->
 188           let t' = FA (reduce2 a, reduce2 b) in
 189             if (is_redex t') then reduce2 (reduce_one_step t')
 190                              else t'
 191
 192 Since we need access to the subterms, we do pattern matching on the
 193 input term.  If the input is simple, we return it (the first three
 194 match cases).  If the input is complex, we first process the
 195 subexpressions, and only then see if we have a redex.  To understand
 196 how this works, follow the trace carefully:
 197
 198     # reduce2 (FA(FA(I,I),K));;
 199     reduce2 <-- FA (FA (I, I), K)
 200
 201       reduce2 <-- K          ; first main recursive call
 202       reduce2 --> K
 203
 204       reduce2 <-- FA (I, I)  ; second main recursive call
 205         reduce2 <-- I
 206         reduce2 --> I
 207         reduce2 <-- I
 208         reduce2 --> I
 209       reduce2 <-- I
 210
 211       reduce2 --> I           ; third main recursive call
 212       reduce2 --> I
 213
 214       reduce2 <-- K           ; fourth
 215       reduce2 --> K
 216     reduce2 --> K
 217     - : term = K
 218
 219 Ok, there's a lot going on here.  Since the input is complex, the
 220 first thing the function does is construct `t'`.  In order to do this,
 221 it must reduce the two main subexpressions, `II` and `K`.  But we see
 222 from the trace that it begins with the right-hand expression, `K`.  We
 223 didn't explicitly tell it to begin with the right-hand subexpression,
 224 so control over evaluation order is starting to spin out of our
 225 control.  (We'll get it back later, don't worry.)
 226
 227 In any case, in the second main recursive call, we evaluate `II`.  The
 228 result is `I`.  The third main recursive call tests whether this
 229 result needs any further processing; it doesn't.
 230
 231 At this point, we have constructed `t' == FA(I,K)`.  Since that's a
 232 redex, we ship it off to reduce_one_step, getting the term `K` as a
 233 result.  The fourth recursive call checks that there is no more
 234 reduction work to be done (there isn't), and that's our final result.
 235
 236 You can see in more detail what is going on by tracing both reduce2
 237 and reduce_one_step, but that makes for some long traces.
 238
 239 So we've solved our first problem: reduce2 recognizes that `IIK ~~>
 240 K`, as desired.
 241
 242 Because the OCaml interpreter evaluates the rightmost expression
 243 in the course of building `t'`, however, it will always evaluate the
 244 right hand subexpression, whether it needs to or not.  And sure
 245 enough,
 246
 247     # reduce2 (FA(FA(K,I),skomega));;
 248       C-c C-cInterrupted.
 249
 250 Running the evaluator with this input leads to an infinite loop, and
 251 the only way to get out is to kill the interpreter with control-c.
 252
 253 Instead of performing the leftmost reduction first, and recognizing
 254 that this term reduces to the normal form `I`, we get lost endlessly
 255 trying to reduce skomega.
 256
 257 ## Laziness is hard to overcome
 258
 259 To emphasize that our evaluation order here is at the mercy of the
 260 evaluation order of OCaml, here is the exact same program translated
 261 into Haskell.  We'll put them side by side to emphasize the exact parallel.
 262
 263 <pre>
 264 OCaml                                                          Haskell
 265 ==========================================================     =========================================================
 266
 267 type term = I | S | K | FA of (term * term)                    data Term = I | S | K | FA Term Term deriving (Eq, Show)
 268
 269 let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))          skomega = (FA (FA (FA S I) I) (FA (FA S I) I))
 270
 271                                                                reduce_one_step :: Term -> Term
 272 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
 273     FA(I,a) -> a                                                 FA I a -> a
 274   | FA(FA(K,a),b) -> a                                           FA (FA K a) b -> a
 275   | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))                   FA (FA (FA S a) b) c -> FA (FA a c) (FA b c)
 276   | _ -> t                                                       _ -> t
 277
 278                                                                is_redex :: Term -> Bool
 279 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
 280
 281                                                                reduce2 :: Term -> Term
 282 let rec reduce2 (t:term):term = match t with                   reduce2 t = case t of
 283     I -> I                                                       I -> I
 284   | K -> K                                                       K -> K
 285   | S -> S                                                       S -> S
 286   | FA (a, b) ->                                                 FA a b ->
 287       let t' = FA (reduce2 a, reduce2 b) in                        let t' = FA (reduce2 a) (reduce2 b) in
 288         if (is_redex t') then reduce2 (reduce_one_step t')           if (is_redex t') then reduce2 (reduce_one_step t')
 289                          else t'                                                      else t'
 290 </pre>
 291
 292 There are some differences in the way types are made explicit, and in
 293 the way terms are specified (`FA(a,b)` for Ocaml versus `FA a b` for
 294 Haskell).  But the two programs are essentially identical.
 295
 296 Yet the Haskell program finds the normal form for `KIΩ`:
 297
 298     *Main> reduce2 (FA (FA K I) skomega)
 299     I
 300
 301 Woa!  First of all, this is wierd.  Haskell's evaluation strategy is
 302 called "lazy".  Apparently, Haskell is so lazy that even after we've
 303 asked it to construct t' by evaluating `reduce2 a` and `reduce2 b`, it
 304 doesn't bother computing `reduce2 b`.  Instead, it waits to see if we
 305 ever really need to use the result.
 306
 307 So the program as written does NOT fully determine evaluation order
 308 behavior.  At this stage, we have defined an evaluation order that
 309 still depends on the evaluation order of the underlying interpreter.
 310
 311 There are two questions we could ask: Can we adjust the OCaml
 312 evaluator to exhibit lazy behavior?  and Can we adjust the Haskell
 313 evaluator to exhibit eager behavior?  The answer to the first question
 314 is easy and interesting, and we'll give it right away.  The answer to
 315 the second question is also interesting, but not easy.  There are
 316 various tricks available in Haskell we could use (such as the `seq`
 317 operator), but a fully general, satisifying resolution will have to
 318 wait until we have Continuation Passing Style transforms.
 319
 320 The answer to the first question (Can we adjust the OCaml evaluator to
 321 exhibit lazy behavior?) is quite simple:
 322
 323 <pre>
 324 let rec reduce3 (t:term):term = match t with
 325     I -> I
 326   | K -> K
 327   | S -> S
 328   | FA (a, b) ->
 329       let t' = FA (reduce3 a, b) in
 330         if (is_redex t') then reduce3 (reduce_one_step t')
 331                          else t'
 332 </pre>
 333
 334 There is only one small difference: instead of setting `t'` to `FA
 335 (reduce a, reduce b)`, we omit one of the recursive calls, and have
 336 `FA (reduce a, b)`.  That is, we don't evaluate the right-hand
 337 subexpression at all.  Ever!  The only way to get evaluated is to
 338 somehow get into functor position.
 339
 340     # reduce3 (FA(FA(K,I),skomega));;
 341     - : term = I
 342     # reduce3 skomega;;
 343     C-c C-cInterrupted.
 344
 345 The evaluator now has no trouble finding the normal form for `KIΩ`,
 346 but evaluating skomega still gives an infinite loop.
 347
 348 As a final note, we can clarify the larger question at the heart of
 349 this discussion:
 350
 351 *How can we can we specify the evaluation order of a computational
 352 system in a way that is completely insensitive to the evaluation order
 353 of the specification language?*
 354
 355
 356 [By the way, the evaluators given here are absurdly inefficient computationally.
 357 Some computer scientists have trouble even looking at code this inefficient, but
 358 the emphasis here is on getting the concepts across as simply as possible.]