topics/_week7_eval_cl.mdwn

   1 <!-- λ Λ ∀ ≡ α β ω Ω -->
   2
   3 [[!toc levels=2]]
   4
   5 # Reasoning about evaluation order in Combinatory Logic
   6
   7 We've discussed [[evaluation order|topics/week3_evaluation_order]]
   8 before, primarily in connection with the untyped lambda calculus.
   9 Whenever a term contains more than one redex, we have to choose which
  10 one to reduce, and this choice can make a difference.  For instance,
  11 in the term `((\x.I)(ωω))`, if we reduce the leftmost redex first, the
  12 term reduces to the normal form `I` in one step.  But if we reduce the
  13 left most redex instead (namely, `(ωω)`), we do not arrive at a normal
  14 form, and are in danger of entering an infinite loop.
  15
  16 Thanks to the recent introduction of sum types (disjoint union), we are now
  17 in a position to gain a deeper understanding of evaluation order by
  18 writing a program that experiments with different evaluation order
  19 strategies.
  20
  21 One thing we'll see is that it is all too easy for the evaluation
  22 order properties of an evaluator to depend on the evaluation order
  23 properties of the programming language in which the evaluator is
  24 written.  We will seek to write an evaluator in which the order of
  25 evaluation is insensitive to the evaluator language.  The goal is to
  26 find an order-insensitive way to reason about evaluation order.  We
  27 will not fully succeed in this first attempt, but we will make good
  28 progress.
  29
  30 The first evaluator we will develop will evaluate terms in Combinatory
  31 Logic.  This significantly simplifies the discussion, since we won't need
  32 to worry about variables or substitution.  As we develop and extend
  33 our evaluator in future weeks, we'll switch to lambdas, but for now,
  34 working with the simplicity of Combinatory Logic will make the
  35 evaluation order issues easier to grasp.
  36
  37 A brief review: a term in CL is the combination of three basic
  38 expressions, `S`, `K`, and `I`, governed by the following reduction
  39 rules:
  40
  41     Ia ~~> a
  42     Kab ~~> a
  43     Sabc ~~> ac(bc)
  44
  45 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
  46 how to embed the untyped lambda calculus in CL, so it's no
  47 surprise that evaluation order issues arise in CL.  To illustrate,
  48 we'll use the following definition:
  49
  50     skomega = SII(SII)
  51             ~~> I(SII)(I(SII))
  52             ~~> SII(SII)
  53
  54 We'll use the same symbol, `Ω`, though: in a lambda term, `Ω` refers
  55 to omega, but in a CL term, `Ω` refers to skomega as defined here.
  56
  57 If we consider the term
  58
  59     KIΩ == KI(SII(SII))
  60
  61 we can choose to reduce the leftmost redex by applying the reduction
  62 rule for `K`, in which case the term reduces to the normal form `I` in
  63 one step; or we can choose to reduce the skomega part, by applying the
  64 reduction rule `S`, in which case we do not get a normal form, and
  65 we're headed towards an infinite loop.
  66
  67 With sum types, we can define terms in CL in OCaml as follows:
  68
  69     type term = I | S | K | FA of (term * term)
  70
  71     let skomega = FA (FA (FA (S, I), I), FA (FA (S, I), I))
  72
  73 This recursive type definition says that a term in CL is either one of
  74 the three simple expressions, or else a pair of CL expressions.
  75 Following Heim and Kratzer, `FA` stands for Functional Application.
  76 With this type definition, we can encode skomega, as well as other
  77 terms whose reduction behavior we want to try to control.
  78
  79 Using pattern matching, it is easy to code the one-step reduction
  80 rules for CL:
  81
  82     let reduce_one_step (t:term):term = match t with
  83         FA(I,a) -> a
  84       | FA(FA(K,a),b) -> a
  85       | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))
  86       | _ -> t
  87
  88     # reduce_one_step (FA(FA(K,S),I));;
  89     - : term = S
  90     # reduce_one_step skomega;;
  91     - : term = FA (FA (I, FA (FA (S, I), I)), FA (I, FA (FA (S, I), I)))
  92
  93 The type constructor `FA` obscures things a bit, but it's still
  94 possible to see how the one-step reduction function is just the
  95 reduction rules for CL.  The OCaml interpreter shows us that the
  96 function faithfully recognizes that `KSI ~~> S`, and `skomega ~~>
  97 I(SII)(I(SII))`.
  98
  99 We can now say precisely what it means to be a redex in CL.
 100
 101     let is_redex (t:term):bool = not (t = reduce_one_step t)
 102
 103     # is_redex K;;
 104     - : bool = false
 105     # is_redex (FA(K,I));;
 106     - : bool = false
 107     # is_redex (FA(FA(K,I),S));;
 108     - : bool = true
 109     # is_redex skomega;;
 110     - : book = true
 111
 112 Warning: this definition relies on the fact that the one-step
 113 reduction of a CL term is never identical to the original term.  This
 114 would not work for the untyped lambda calculus, since
 115 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.  Note that in
 116 order to decide whether two terms are equal, OCaml has to recursively
 117 compare the elements of complex CL terms.  It is able to figure out
 118 how to do this because we provided an explicit definition of the
 119 datatype Term.
 120
 121 As you would expect, a term in CL is in normal form when it contains
 122 no redexes.
 123
 124 In order to fully reduce a term, we need to be able to reduce redexes
 125 that are not at the top level of the term.
 126
 127     (II)K ~~> IK ~~> K
 128
 129 That is, we want to be able to first evaluate the redex `II` that is
 130 a proper subpart of the larger term, to produce a new intermediate term
 131 that we can then evaluate to the final normal form.
 132
 133 Because we need to process subparts, and because the result after
 134 processing a subpart may require further processing, the recursive
 135 structure of our evaluation function has to be somewhat subtle.  To
 136 truly understand, you will need to do some sophisticated thinking
 137 about how recursion works.  We'll show you how to keep track of what
 138 is going on by examining a recursive execution trace of inputs and
 139 outputs.
 140
 141 We'll develop our full reduction function in stages.  Once we have it
 142 working, we'll then consider some variants.  Just to be precise, we'll
 143 distinguish each microvariant with its own index number embedded in
 144 its name.
 145
 146     let rec reduce1 (t:term):term =
 147       if (is_redex t) then reduce1 (reduce_one_step t)
 148                       else t
 149
 150 If the input is a redex, we ship it off to `reduce_one_step` for
 151 processing.  But just in case the result of the one-step reduction is
 152 itself a redex, we recursively call `reduce1`.  The recursion will
 153 continue until the result is no longer a redex.
 154
 155     # #trace reduce1;;
 156     reduce1 is now traced.
 157     # reduce1 (FA (I, FA (I, K)));;
 158     reduce1 <-- FA (I, FA (I, K))
 159       reduce1 <-- FA (I, K)
 160         reduce1 <-- K
 161         reduce1 --> K
 162       reduce1 --> K
 163     reduce1 --> K
 164     - : term = K
 165
 166 Since the initial input (`I(IK)`) is a redex, the result after the
 167 one-step reduction is `IK`.  We recursively call `reduce1` on this
 168 input.  Since `IK` is itself a redex, the result after one-step
 169 reduction is `K`.  We recursively call `reduce1` on this input.  Since
 170 `K` is not a redex, the recursion bottoms out, and we return
 171 the result.
 172
 173 But this function doesn't do enough reduction.
 174
 175     # reduce1 (FA (FA (I, I), K));;
 176     - : term = FA (FA (I, I), K)
 177
 178 Because the top-level term is not a redex, `reduce1` returns it
 179 without any evaluation.  What we want is to evaluate the subparts of a
 180 complex term.  We'll do this by mapping the reduction function onto
 181 the parts of a complex term.
 182
 183     let rec reduce2 (t:term):term = match t with
 184         I -> I
 185       | K -> K
 186       | S -> S
 187       | FA (a, b) ->
 188           let t' = FA (reduce2 a, reduce2 b) in
 189             if (is_redex t') then reduce2 (reduce_one_step t')
 190                              else t'
 191
 192 Since what we need is access to the subterms, we do pattern matching
 193 on the input term.  If the input is simple (the first three `match`
 194 cases), we return it without further processing.  But if the input is
 195 complex, we first process the subexpressions, and only then see if we
 196 have a redex at the top level.  To understand how this works, follow
 197 the trace carefully:
 198
 199     # reduce2 (FA(FA(I,I),K));;
 200     reduce2 <-- FA (FA (I, I), K)
 201
 202       reduce2 <-- K          ; first main recursive call
 203       reduce2 --> K
 204
 205       reduce2 <-- FA (I, I)  ; second main recursive call
 206         reduce2 <-- I
 207         reduce2 --> I
 208         reduce2 <-- I
 209         reduce2 --> I
 210         reduce2 <-- I
 211         reduce2 --> I
 212       reduce2 --> I
 213
 214       reduce2 <-- K           ; third
 215       reduce2 --> K
 216     reduce2 --> K
 217     - : term = K
 218
 219 Ok, there's a lot going on here.  Since the input is complex, the
 220 first thing the function does is construct `t'`.  In order to do this,
 221 it must reduce the two main subexpressions, `II` and `K`.
 222
 223 There are three recursive calls to the reduce2 function, each of
 224 which gets triggered during the processing of this example.  They have
 225 been marked in the trace.
 226
 227 The don't quite go in the order in which they appear in the code,
 228 however!  We see from the trace that it begins with the right-hand
 229 expression, `K`.  We didn't explicitly tell it to begin with the
 230 right-hand subexpression, so control over evaluation order is starting
 231 to spin out of our control.  (We'll get it back later, don't worry.)
 232
 233 In any case, in the second main recursive call, we evaluate `II`.  The
 234 result is `I`.
 235
 236 At this point, we have constructed `t' == FA(I,K)`.  Since that's a
 237 redex, we ship it off to reduce_one_step, getting the term `K` as a
 238 result.  The third recursive call checks that there is no more
 239 reduction work to be done (there isn't), and that's our final result.
 240
 241 You can see in more detail what is going on by tracing both reduce2
 242 and reduce_one_step, but that makes for some long traces.
 243
 244 So we've solved our first problem: reduce2 recognizes that `IIK ~~>
 245 K`, as desired.
 246
 247 Because the OCaml interpreter evaluates the rightmost expression
 248 in the course of building `t'`, however, it will always evaluate the
 249 right hand subexpression, whether it needs to or not.  And sure
 250 enough,
 251
 252     # reduce2 (FA(FA(K,I),skomega));;
 253       C-c C-cInterrupted.
 254
 255 Running the evaluator with this input leads to an infinite loop, and
 256 the only way to get out is to kill the interpreter with control-c.
 257
 258 Instead of performing the leftmost reduction first, and recognizing
 259 that this term reduces to the normal form `I`, we get lost endlessly
 260 trying to reduce skomega.
 261
 262 ## Laziness is hard to overcome
 263
 264 To emphasize that our evaluation order here is at the mercy of the
 265 evaluation order of OCaml, here is the exact same program translated
 266 into Haskell.  We'll put them side by side to emphasize the exact parallel.
 267
 268 <pre>
 269 OCaml                                                          Haskell
 270 ==========================================================     =========================================================
 271
 272 type term = I | S | K | FA of (term * term)                    data Term = I | S | K | FA Term Term deriving (Eq, Show)
 273
 274 let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))          skomega = (FA (FA (FA S I) I) (FA (FA S I) I))
 275
 276                                                                reduce_one_step :: Term -> Term
 277 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
 278     FA(I,a) -> a                                                 FA I a -> a
 279   | FA(FA(K,a),b) -> a                                           FA (FA K a) b -> a
 280   | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))                   FA (FA (FA S a) b) c -> FA (FA a c) (FA b c)
 281   | _ -> t                                                       _ -> t
 282
 283                                                                is_redex :: Term -> Bool
 284 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
 285
 286                                                                reduce2 :: Term -> Term
 287 let rec reduce2 (t:term):term = match t with                   reduce2 t = case t of
 288     I -> I                                                       I -> I
 289   | K -> K                                                       K -> K
 290   | S -> S                                                       S -> S
 291   | FA (a, b) ->                                                 FA a b ->
 292       let t' = FA (reduce2 a, reduce2 b) in                        let t' = FA (reduce2 a) (reduce2 b) in
 293         if (is_redex t') then reduce2 (reduce_one_step t')           if (is_redex t') then reduce2 (reduce_one_step t')
 294                          else t'                                                      else t'
 295 </pre>
 296
 297 There are some differences in the way types are made explicit, and in
 298 the way terms are specified (`FA(a,b)` for Ocaml versus `FA a b` for
 299 Haskell).  But the two programs are essentially identical.
 300
 301 Yet the Haskell program finds the normal form for `KIΩ`:
 302
 303     *Main> reduce2 (FA (FA K I) skomega)
 304     I
 305
 306 Woa!  First of all, this is wierd.  Haskell's evaluation strategy is
 307 called "lazy".  Apparently, Haskell is so lazy that even after we've
 308 asked it to construct t' by evaluating `reduce2 a` and `reduce2 b`, it
 309 doesn't bother computing `reduce2 b`.  Instead, it waits to see if we
 310 ever really need to use the result.
 311
 312 So the program as written does NOT fully determine evaluation order
 313 behavior.  At this stage, we have defined an evaluation order that
 314 still depends on the evaluation order of the underlying interpreter.
 315
 316 There are two questions we could ask: Can we adjust the OCaml
 317 evaluator to exhibit lazy behavior?  and Can we adjust the Haskell
 318 evaluator to exhibit eager behavior?  The answer to the first question
 319 is easy and interesting, and we'll give it right away.  The answer to
 320 the second question is also interesting, but not easy.  There are
 321 various tricks available in Haskell we could use (such as the `seq`
 322 operator), but a fully general, satisifying resolution will have to
 323 wait until we have Continuation Passing Style transforms.
 324
 325 The answer to the first question (Can we adjust the OCaml evaluator to
 326 exhibit lazy behavior?) is quite simple:
 327
 328 <pre>
 329 let rec reduce3 (t:term):term = match t with
 330     I -> I
 331   | K -> K
 332   | S -> S
 333   | FA (a, b) ->
 334       let t' = FA (reduce3 a, b) in
 335         if (is_redex t') then reduce3 (reduce_one_step t')
 336                          else t'
 337 </pre>
 338
 339 There is only one small difference: instead of setting `t'` to `FA
 340 (reduce a, reduce b)`, we omit one of the recursive calls, and have
 341 `FA (reduce a, b)`.  That is, we don't evaluate the right-hand
 342 subexpression at all.  Ever!  The only way to get evaluated is to
 343 somehow get into functor position.
 344
 345     # reduce3 (FA(FA(K,I),skomega));;
 346     - : term = I
 347     # reduce3 skomega;;
 348     C-c C-cInterrupted.
 349
 350 The evaluator now has no trouble finding the normal form for `KIΩ`,
 351 but evaluating skomega still gives an infinite loop.
 352
 353 As a final note, we can clarify the larger question at the heart of
 354 this discussion:
 355
 356 *How can we can we specify the evaluation order of a computational
 357 system in a way that is completely insensitive to the evaluation order
 358 of the specification language?*
 359
 360 [By the way, the evaluators given here are absurdly inefficient computationally.
 361 Some computer scientists have trouble even looking at code this inefficient, but
 362 the emphasis here is on getting the concepts across as simply as possible.]