topics/week7_combinatory_evaluator.mdwn

   1 <!-- λ Λ ∀ ≡ α β ω Ω -->
   2
   3 [[!toc levels=2]]
   4
   5 ## Reasoning about evaluation order in Combinatory Logic
   6
   7 We've discussed [[evaluation order|topics/week3_evaluation_order]]
   8 before, primarily in connection with the untyped lambda calculus.
   9 Whenever a term contains more than one redex, we have to choose which
  10 one to reduce, and this choice can make a difference.  For instance,
  11 recall that:
  12
  13     Ω == ωω == (\x.xx)(\x.xx)
  14
  15 so:
  16
  17     ((\x.I)Ω) == ((\x.I)((\x.xx)(\x.xx)))
  18                    *      *
  19
  20 There are two redexes in this term; we've marked the operative lambdas
  21 with a star.  If we reduce the leftmost redex first, the term reduces
  22 to the normal form `I` in one step.  But if we reduce the rightmost
  23 redex instead, the "reduced" form is `(\x.I)Ω` again, and we're starting off on an infinite loop.
  24
  25 Thanks to the introduction of sum types (disjoint union) in the last lecture, we
  26 are now in a position to gain a deeper understanding of evaluation
  27 order by writing a program that allows us to reason explicitly about evaluation.
  28
  29 One thing we'll see is that it is all too easy for the evaluation
  30 order properties of an evaluator to depend on the evaluation order
  31 properties of the programming language in which the evaluator is
  32 written.  We would like to write an evaluator in which the order of
  33 evaluation is insensitive to the evaluator language.  That is, the goal is to
  34 find an order-insensitive way to reason about evaluation order.  We
  35 will not fully succeed in this first attempt, but we will make good
  36 progress.
  37
  38 The first evaluator we will develop will evaluate terms in Combinatory
  39 Logic.  This significantly simplifies the discussion, since we won't
  40 need to worry about variables or substitution.  As we develop and
  41 extend our evaluator in homework and other lectures, we'll switch to lambdas, but for
  42 now, working with the simplicity of Combinatory Logic will make it
  43 easier to highlight evaluation order issues.
  44
  45 A brief review of Combinatory Logic: a term in CL is the combination
  46 of three basic expressions, `S`, `K`, and `I`, governed by the
  47 following reduction rules:
  48
  49     Ia ~~> a
  50     Kab ~~> a
  51     Sabc ~~> ac(bc)
  52
  53 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
  54 how to embed the untyped lambda calculus in CL, so it's no surprise
  55 that evaluation order issues arise in CL.  To illustrate, we'll use
  56 the following definition:
  57
  58     skomega = SII
  59     Skomega = skomega skomega == SII(SII)
  60             ~~> I(SII)(I(SII))
  61             ~~> SII(SII)
  62
  63 We'll use the same symbol, `Ω`, for Omega and Skomega: in a lambda
  64 term, `Ω` refers to Omega, but in a CL term, `Ω` refers to Skomega as
  65 defined here.
  66
  67 Just as in the corresponding term in the lambda calculus, CL terms can
  68 contain more than one redex:
  69
  70     KIΩ == KI(SII(SII))
  71            *  *
  72
  73 We can choose to reduce the leftmost redex by applying the reduction
  74 rule for `K`, in which case the term reduces to the normal form `I` in
  75 one step; or we can choose to reduce the Skomega part, by applying the
  76 reduction rule `S`, in which case we do not get a normal form, and
  77 we're headed into an infinite loop.
  78
  79 With sum types, we can define CL terms in OCaml as follows:
  80
  81     type term = I | K | S | App of (term * term)
  82
  83     let skomega = App (App (App (S, I), I), App (App (S, I), I))
  84
  85 This type definition says that a term in CL is either one of the three
  86 atomic expressions (`I`, `K`, or `S`), or else a pair of CL
  87 expressions.  `App` stands for Functional Application.  With this type
  88 definition, we can encode Skomega, as well as other terms whose
  89 reduction behavior we want to try to control. We can *control* it because
  90 the `App` variant of our datatype merely *encodes* the application of the
  91 head to the argument, and doesn't actually *perform* that application.
  92 We have to explicitly model the application ourself.
  93
  94 Using pattern matching, it is easy to code the one-step reduction
  95 rules for CL:
  96
  97     let reduce_if_redex (t:term) : term = match t with
  98       | App(I,a) -> a
  99       | App(App(K,a),b) -> a
 100       | App(App(App(S,a),b),c) -> App(App(a,c),App(b,c))
 101       | _ -> t
 102
 103     # reduce_if_redex (App(App(K,S),I));;
 104     - : term = S
 105     # reduce_if_redex skomega;;
 106     - : term = App (App (I, App (App (S, I), I)), App (I, App (App (S, I), I)))
 107
 108 The definition of `reduce_if_redex` explicitly says that it expects
 109 its input argument `t` to have type `term`, and the second `: term`
 110 says that the result the function delivers will also be of
 111 type `term`.
 112
 113 The type constructor `App` obscures things a bit, but it's still
 114 possible to see how the one-step reduction function is just the
 115 reduction rules for CL.  The OCaml interpreter responses given above show us that the
 116 function faithfully recognizes that `KSI ~~> S`, and that `Skomega ~~>
 117 I(SII)(I(SII))`.
 118
 119 As you would expect, a term in CL is in normal form when it contains
 120 no redexes (analogously for head normal form, weak head normal form, etc.)
 121
 122 How can we tell whether a term is a redex? Here's one way:
 123
 124     let is_redex (t:term):bool = not (t = reduce_if_redex t)
 125
 126     # is_redex K;;
 127     - : bool = false
 128     # is_redex (App(K,I));;
 129     - : bool = false
 130     # is_redex (App(App(K,I),S));;
 131     - : bool = true
 132     # is_redex skomega;;
 133     - : book = true
 134
 135 In order to decide whether two terms are equal, OCaml has to
 136 recursively compare the elements of complex CL terms.  It is able to
 137 figure out how to do this because we provided an explicit definition
 138 of the datatype `term`.
 139
 140 Warning: this method for telling whether a term is a redex relies on the accidental fact that the
 141 one-step reduction of a CL term is never identical to the original
 142 term.  This would not work for the untyped lambda calculus, since
 143 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step. Neither would it work if
 144 we had chosen some other combinators as primitives (`W W1 W2` reduces to
 145 `W1 W2 W2`, so if they are all `W`s we'd be in trouble.) We will discuss
 146 some alternative strategies in other notes.
 147
 148 So far, we've only asked whether a term _is_ a redex, not whether it _contains_ other redexes as subterms. But
 149 in order to fully reduce a term, we need to be able to reduce redexes
 150 that are not at the top level of the term. Because we need to process subterms, and because the result after
 151 processing a subterm may require further processing, the recursive
 152 structure of our evaluation function has to be somewhat subtle.  To
 153 truly understand, we will need to do some sophisticated thinking
 154 about how recursion works.
 155
 156 We'll develop our full reduction function in two stages.  Once we have
 157 it working, we'll then consider a variant.
 158
 159     let rec reduce_try1 (t:term) : term =
 160       if (is_redex t) then let t' = reduce_if_redex t
 161                            in reduce_try1 t'
 162                       else t
 163
 164 If the input is a redex, we ship it off to `reduce_if_redex` for
 165 processing.  But just in case the result of the one-step reduction is
 166 itself a redex, we recursively apply `reduce_try1` to the result.  The recursion
 167 will continue until the result is no longer a redex.  We're aiming at
 168 allowing the evaluator to recognize that
 169
 170     I (I K) ~~> I K ~~> K
 171
 172 When trying to understand how recursive functions work, it can be
 173 extremely helpful to examine an execution trace of inputs and
 174 outputs.
 175
 176     # #trace reduce_try1;;
 177     reduce_try1 is now traced.
 178
 179 The first `#` there is OCaml's prompt. The text beginning `#trace ...` is what we typed. Now OCaml will report on all the input to, and results from, the `reduce_try1` function. Watch:
 180
 181     # reduce_try1 (App (I, App (I, K)));;
 182     reduce_try1 <-- App (I, App (I, K))
 183       reduce_try1 <-- App (I, K)
 184         reduce_try1 <-- K
 185         reduce_try1 --> K
 186       reduce_try1 --> K
 187     reduce_try1 --> K
 188     - : term = K
 189
 190 In the trace, "`<--`" shows the input argument to a call to
 191 `reduce_try1`, and "`-->`" shows the output result.
 192
 193 Since the initial input (`I(IK)`) is a redex, the result after the
 194 one-step reduction is `IK`.  We recursively call `reduce_try1` on
 195 this input.  Since `IK` is itself a redex, the result after one-step
 196 reduction is `K`.  We recursively call `reduce_try1` on this input.  Since
 197 `K` is not a redex, the recursion bottoms out, and we return the
 198 result.
 199
 200 But this function doesn't do enough reduction.  We want to recognize
 201 the following reduction path:
 202
 203     I I K ~~> I K ~~> K
 204
 205 But the reduction function as written above does not deliver this result:
 206
 207     # reduce_try1 (App (App (I, I), K));;
 208     - : term = App (App (I, I), K)
 209
 210 The reason is that the top-level term is not a redex to start with,
 211 so `reduce_try1` returns it without any evaluation.
 212
 213 What we want is to evaluate the subterms of a complex term.  We'll do this by pattern matching our
 214 top-level term to see when it _has_ subterms:
 215
 216     let rec reduce_try2 (t : term) : term = match t with
 217       | I -> I
 218       | K -> K
 219       | S -> S
 220       | App (a, b) ->
 221           let t' = App (reduce_try2 a, reduce_try2 b) in
 222           if (is_redex t') then let t'' = reduce_if_redex t'
 223                                 in reduce_try2 t''
 224                            else t'
 225
 226 Since we need access to the subterms, we do pattern matching on the
 227 input.  If the input is simple (the first three `match` cases), we
 228 return it without further processing.  But if the input is complex, we
 229 first process the subexpressions, and only then see if we have a redex
 230 at the top level.
 231
 232 To understand how this works, follow the trace
 233 carefully:
 234
 235     # reduce_try2 (App(App(I,I),K));;
 236     reduce_try2 <-- App (App (I, I), K)
 237
 238       reduce_try2 <-- K          ; first main recursive call
 239       reduce_try2 --> K
 240
 241       reduce_try2 <-- App (I, I)  ; second main recursive call
 242         reduce_try2 <-- I
 243         reduce_try2 --> I
 244         reduce_try2 <-- I
 245         reduce_try2 --> I
 246         reduce_try2 <-- I
 247         reduce_try2 --> I
 248       reduce_try2 --> I
 249
 250       reduce_try2 <-- K           ; third
 251       reduce_try2 --> K
 252     reduce_try2 --> K
 253     - : term = K
 254
 255 Ok, there's a lot going on here.  Since the input is complex, the
 256 first thing the function does is construct `t'`.  In order to do this,
 257 it must reduce the two main subexpressions, `II` and `K`.
 258
 259 There are three recursive calls to the `reduce` function, each of
 260 which gets triggered during the processing of this example.  They have
 261 been marked in the trace.
 262
 263 The don't quite go in the order in which they appear in the code,
 264 however!  We see from the trace that it begins with the right-hand
 265 expression, `K`.  We didn't explicitly tell it to begin with the
 266 right-hand subexpression, so control over evaluation order is starting
 267 to spin out of our control.  (We'll get it back later, don't worry.)
 268
 269 In any case, in the second main recursive call, we evaluate `II`.  The
 270 result is `I`.
 271
 272 At this point, we have constructed `t' == App(I,K)`.  Since that's a
 273 redex, we ship it off to reduce_if_redex, getting the term `K` as a
 274 result.  The third recursive call checks that there is no more
 275 reduction work to be done (there isn't), and that's our final result.
 276
 277 You can see in more detail what is going on by tracing both reduce
 278 and reduce_if_redex, but that makes for some long traces.
 279
 280 So we've solved our first problem: `reduce` now recognizes that `IIK ~~>
 281 K`, as desired.
 282
 283 Because the OCaml interpreter evaluates each subexpression in the
 284 course of building `t'`, however, it will always evaluate the right
 285 hand subexpression, whether it needs to or not.  And sure enough,
 286
 287     # reduce_try2 (App(App(K,I),skomega));;
 288       C-c C-cInterrupted.
 289
 290 Running the evaluator with this input leads to an infinite loop, and
 291 the only way to get out is to kill the interpreter with control-c.
 292
 293 Instead of performing the leftmost reduction first, and recognizing
 294 that this term reduces to the normal form `I`, we get lost endlessly
 295 trying to reduce Skomega.
 296
 297 ## Laziness is hard to overcome
 298
 299 To emphasize that our evaluation order here is at the mercy of the
 300 evaluation order of OCaml, here is the exact same program translated
 301 into Haskell.  We'll put them side by side to emphasize the exact parallel.
 302
 303 <pre>
 304 OCaml                                                          Haskell
 305 ==========================================================     =========================================================
 306
 307 type term = I | S | K | App of (term * term)                   data Term = I | S | K | App Term Term deriving (Eq, Show)
 308
 309 let skomega = App (App (App (S,I), I), App (App (S,I), I))     skomega = (App (App (App S I) I) (App (App S I) I))
 310
 311                                                                reduce_if_redex :: Term -> Term
 312 let reduce_if_redex (t:term):term = match t with               reduce_if_redex t = case t of
 313   | App(I,a) -> a                                                App I a -> a
 314   | App(App(K,a),b) -> a                                         App (App K a) b -> a
 315   | App(App(App(S,a),b),c) -> App(App(a,c),App(b,c))             App (App (App S a) b) c -> App (App a c) (App b c)
 316   | _ -> t                                                       _ -> t
 317
 318                                                                is_redex :: Term -> Bool
 319 let is_redex (t:term):bool = not (t = reduce_if_redex t)       is_redex t = not (t == reduce_if_redex t)
 320
 321                                                                reduce_try2 :: Term -> Term
 322 let rec reduce_try2 (t : term) : term = match t with           reduce_try2 t = case t of
 323   | I -> I                                                       I -> I
 324   | K -> K                                                       K -> K
 325   | S -> S                                                       S -> S
 326   | App (a, b) ->                                                App a b ->
 327       let t' = App (reduce_try2 a, reduce_try2 b) in                 let t' = App (reduce_try2 a) (reduce_try2 b) in
 328       if (is_redex t') then let t'' = reduce_if_redex t'             if (is_redex t') then reduce_try2 (reduce_if_redex t')
 329                             in reduce_try2 t''                                        else t'
 330                        else t'
 331 </pre>
 332
 333 There are some differences in the way types are made explicit, and in
 334 the way terms are specified (`App(a,b)` for Ocaml versus `App a b` for
 335 Haskell).  But the two programs are essentially identical.
 336
 337 Yet the Haskell program finds the normal form for `KIΩ`:
 338
 339     *Main> reduce_try2 (App (App K I) skomega)
 340     I
 341
 342 Woa!  First of all, this is wierd.  Haskell's evaluation strategy is
 343 called "lazy".  Apparently, Haskell is so lazy that even after we've
 344 asked it to construct `t'` by evaluating `reduce_try2 a` and `reduce_try2 b`, it
 345 doesn't bother computing `reduce_try2 b`.  Instead, it waits to see if we
 346 ever really need to use the result.
 347
 348 So the program as written does _not_ fully determine evaluation order
 349 behavior.  At this stage, we have defined an evaluation order that
 350 still depends on the evaluation order of the underlying interpreter.
 351
 352 There are two questions we could ask:
 353
 354 * Can we adjust the OCaml evaluator to exhibit lazy behavior?
 355
 356 * Can we adjust the Haskell evaluator to exhibit eager behavior?
 357
 358 The answer to the first question is easy and interesting, and we'll
 359 give it right away.  The answer to the second question is also
 360 interesting, but not easy.  There are various tricks available in
 361 Haskell we could use (such as the `seq` operator, or the `deepseq`
 362 operator), but a fully general, satisifying resolution will have to
 363 wait until we have Continuation Passing Style transforms.
 364
 365 The answer to the first question (Can we adjust the OCaml evaluator to
 366 exhibit lazy behavior?) is quite simple:
 367
 368     let rec reduce_try3 (t : term) : term = match t with
 369       | I -> I
 370       | K -> K
 371       | S -> S
 372       | App (a, b) ->
 373           let t' = App (reduce_try3 a, b) in
 374           if (is_redex t') then let t'' = reduce_if_redex t'
 375                                 in reduce_try3 t''
 376                            else t'
 377
 378 There is only one small difference from `reduce_try2`: instead of setting `t'` to `App
 379 (reduce_try3 a, reduce_try3 b)`, we omit one of the recursive calls, and have
 380 `App (reduce_try3 a, b)`.  That is, we don't evaluate the right-hand
 381 subexpression at all.  Ever!  The only way to get evaluated is to
 382 somehow get into functor position.
 383
 384     # reduce_try3 (App(App(K,I),skomega));;
 385     - : term = I
 386     # reduce_try3 skomega;;
 387     C-c C-cInterrupted.
 388
 389 The evaluator now has no trouble finding the normal form for `KIΩ`,
 390 but evaluating skomega still gives an infinite loop.
 391
 392 We can now clarify the larger question at the heart of
 393 this discussion:
 394
 395 *How can we can we specify the evaluation order of a computational
 396 system in a way that is completely insensitive to the evaluation order
 397 of the specification language?*
 398
 399 As a final note, we should mention that the evaluators given here are
 400 absurdly inefficient computationally.  Some computer scientists have
 401 trouble even looking at code this inefficient, but the emphasis here
 402 is on getting the concepts across as simply as possible.