topics/week7_eval_cl.mdwn

   1 <!-- λ Λ ∀ ≡ α β ω Ω -->
   2
   3 [[!toc levels=2]]
   4
   5 # Reasoning about evaluation order in Combinatory Logic
   6
   7 We've discussed [[evaluation order|topics/week3_evaluation_order]]
   8 before, primarily in connection with the untyped lambda calculus.
   9 Whenever a term contains more than one redex, we have to choose which
  10 one to reduce, and this choice can make a difference.  For instance,
  11 recall that
  12
  13     Ω == ωω == (\x.xx)(\x.xx), so
  14
  15     ((\x.I)Ω) == ((\x.I)((\x.xx)(\x.xx)))
  16                    *      *
  17
  18 There are two redexes in this term; we've marked the operative lambda
  19 with a star.  If we reduce the leftmost redex first, the term reduces
  20 to the normal form `I` in one step.  But if we reduce the left most
  21 redex instead, the "reduced" form is `(\x.I)Ω` again, and we are in
  22 danger of entering an infinite loop.
  23
  24 Thanks to the recent introduction of sum types (disjoint union), we
  25 are now in a position to gain a deeper understanding of evaluation
  26 order by reasoning explicitly about evaluation by writing a program
  27 that evaluates terms.
  28
  29 One thing we'll see is that it is all too easy for the evaluation
  30 order properties of an evaluator to depend on the evaluation order
  31 properties of the programming language in which the evaluator is
  32 written.  We would like to write an evaluator in which the order of
  33 evaluation is insensitive to the evaluator language.  The goal is to
  34 find an order-insensitive way to reason about evaluation order.  We
  35 will not fully succeed in this first attempt, but we will make good
  36 progress.
  37
  38 The first evaluator we will develop will evaluate terms in Combinatory
  39 Logic.  This significantly simplifies the discussion, since we won't
  40 need to worry about variables or substitution.  As we develop and
  41 extend our evaluator in future weeks, we'll switch to lambdas, but for
  42 now, working with the simplicity of Combinatory Logic will make it
  43 easier to highlight evaluation order issues.
  44
  45 A brief review of Combinatory Logic: a term in CL is the combination
  46 of three basic expressions, `S`, `K`, and `I`, governed by the
  47 following reduction rules:
  48
  49     Ia ~~> a
  50     Kab ~~> a
  51     Sabc ~~> ac(bc)
  52
  53 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
  54 how to embed the untyped lambda calculus in CL, so it's no surprise
  55 that evaluation order issues arise in CL.  To illustrate, we'll use
  56 the following definition:
  57
  58     skomega = SII
  59     Skomega = skomega skomega == SII(SII)
  60             ~~> I(SII)(I(SII))
  61             ~~> SII(SII)
  62
  63 We'll use the same symbol, `Ω`, for Omega and Skomega: in a lambda
  64 term, `Ω` refers to Omega, but in a CL term, `Ω` refers to Skomega as
  65 defined here.
  66
  67 Just as in the corresponding term in the lambda calculus, CL terms can
  68 contain more than one redex:
  69
  70     KIΩ == KI(SII(SII))
  71            *  *
  72
  73 we can choose to reduce the leftmost redex by applying the reduction
  74 rule for `K`, in which case the term reduces to the normal form `I` in
  75 one step; or we can choose to reduce the Skomega part, by applying the
  76 reduction rule `S`, in which case we do not get a normal form, and
  77 we're headed towards an infinite loop.
  78
  79 With sum types, we can define CL terms in OCaml as follows:
  80
  81     type term = I | K | S | App of (term * term)
  82
  83     let skomega = App (App (App (S, I), I), App (App (S, I), I))
  84
  85 This type definition says that a term in CL is either one of the three
  86 simple expressions (`I`, `K`, or `S`), or else a pair of CL
  87 expressions.  `App` stands for Functional Application.  With this type
  88 definition, we can encode skomega, as well as other terms whose
  89 reduction behavior we want to try to control.
  90
  91 Using pattern matching, it is easy to code the one-step reduction
  92 rules for CL:
  93
  94     let reduce_one_step (t:term):term = match t with
  95         App(I,a) -> a
  96       | App(App(K,a),b) -> a
  97       | App(App(App(S,a),b),c) -> App(App(a,c),App(b,c))
  98       | _ -> t
  99
 100     # reduce_one_step (App(App(K,S),I));;
 101     - : term = S
 102     # reduce_one_step skomega;;
 103     - : term = App (App (I, App (App (S, I), I)), App (I, App (App (S, I), I)))
 104
 105 The definition of `reduce_one_step` explicitly says that it expects
 106 its input argument `t` to have type `term`, and the second `:term`
 107 says that the type of the output it delivers as a result will be of
 108 type `term`.
 109
 110 The type constructor `App` obscures things a bit, but it's still
 111 possible to see how the one-step reduction function is just the
 112 reduction rules for CL.  The OCaml interpreter shows us that the
 113 function faithfully recognizes that `KSI ~~> S`, and `skomega ~~>
 114 I(SII)(I(SII))`.
 115
 116 We can now say precisely what it means to be a redex in CL.
 117
 118     let is_redex (t:term):bool = not (t = reduce_one_step t)
 119
 120     # is_redex K;;
 121     - : bool = false
 122     # is_redex (App(K,I));;
 123     - : bool = false
 124     # is_redex (App(App(K,I),S));;
 125     - : bool = true
 126     # is_redex skomega;;
 127     - : book = true
 128
 129 Warning: this definition relies on the accidental fact that the
 130 one-step reduction of a CL term is never identical to the original
 131 term.  This would not work for the untyped lambda calculus, since
 132 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.
 133
 134 Note that in order to decide whether two terms are equal, OCaml has to
 135 recursively compare the elements of complex CL terms.  It is able to
 136 figure out how to do this because we provided an explicit definition
 137 of the datatype `term`.
 138
 139 As you would expect, a term in CL is in normal form when it contains
 140 no redexes (analogously for head normal form, weak head normal form, etc.)
 141
 142 In order to fully reduce a term, we need to be able to reduce redexes
 143 that are not at the top level of the term.
 144 Because we need to process subparts, and because the result after
 145 processing a subpart may require further processing, the recursive
 146 structure of our evaluation function has to be somewhat subtle.  To
 147 truly understand, you will need to do some sophisticated thinking
 148 about how recursion works.
 149
 150 We'll develop our full reduction function in two stages.  Once we have
 151 it working, we'll then consider a variant.
 152
 153     let rec reduce_stage1 (t:term):term =
 154       if (is_redex t) then reduce_stage1 (reduce_one_step t)
 155                       else t
 156
 157 If the input is a redex, we ship it off to `reduce_one_step` for
 158 processing.  But just in case the result of the one-step reduction is
 159 itself a redex, we recursively call `reduce_stage1`.  The recursion
 160 will continue until the result is no longer a redex.  We're aiming at
 161 allowing the evaluator to recognize that
 162
 163     I (I K) ~~> I K ~~> K
 164
 165 When trying to understand how recursive functions work, it can be
 166 extremely helpful to examining an execution trace of inputs and
 167 outputs.
 168
 169     # #trace reduce_stage1;;
 170     reduce_stage1 is now traced.
 171     # reduce_stage1 (App (I, App (I, K)));;
 172     reduce_stage1 <-- App (I, App (I, K))
 173       reduce_stage1 <-- App (I, K)
 174         reduce_stage1 <-- K
 175         reduce_stage1 --> K
 176       reduce_stage1 --> K
 177     reduce_stage1 --> K
 178     - : term = K
 179
 180 In the trace, "`<--`" shows the input argument to a call to
 181 `reduce_stage1`, and "`-->`" shows the output result.
 182
 183 Since the initial input (`I(IK)`) is a redex, the result after the
 184 one-step reduction is `IK`.  We recursively call `reduce_stage1` on
 185 this input.  Since `IK` is itself a redex, the result after one-step
 186 reduction is `K`.  We recursively call `reduce_stage1` on this input.  Since
 187 `K` is not a redex, the recursion bottoms out, and we return the
 188 result.
 189
 190 But this function doesn't do enough reduction.  We want to recognize
 191 the following reduction path:
 192
 193     I I K ~~> I K ~~> K
 194
 195 But the reduction function as written above does not deliver this result:
 196
 197     # reduce_stage1 (App (App (I, I), K));;
 198     - : term = App (App (I, I), K)
 199
 200 Because the top-level term is not a redex to start with,
 201 `reduce_stage1` returns it without any evaluation.  What we want is to
 202 evaluate the subparts of a complex term.  We'll do this by evaluating
 203 the subparts of the top-level expression.
 204
 205     let rec reduce (t:term):term = match t with
 206         I -> I
 207       | K -> K
 208       | S -> S
 209       | App (a, b) ->
 210           let t' = App (reduce a, reduce b) in
 211             if (is_redex t') then reduce 2 (reduce_one_step t')
 212                              else t'
 213
 214 Since we need access to the subterms, we do pattern matching on the
 215 input.  If the input is simple (the first three `match` cases), we
 216 return it without further processing.  But if the input is complex, we
 217 first process the subexpressions, and only then see if we have a redex
 218 at the top level.  To understand how this works, follow the trace
 219 carefully:
 220
 221     # reduce (App(App(I,I),K));;
 222     reduce <-- App (App (I, I), K)
 223
 224       reduce <-- K          ; first main recursive call
 225       reduce --> K
 226
 227       reduce <-- App (I, I)  ; second main recursive call
 228         reduce <-- I
 229         reduce --> I
 230         reduce <-- I
 231         reduce --> I
 232         reduce <-- I
 233         reduce --> I
 234       reduce --> I
 235
 236       reduce <-- K           ; third
 237       reduce --> K
 238     reduce --> K
 239     - : term = K
 240
 241 Ok, there's a lot going on here.  Since the input is complex, the
 242 first thing the function does is construct `t'`.  In order to do this,
 243 it must reduce the two main subexpressions, `II` and `K`.
 244
 245 There are three recursive calls to the `reduce` function, each of
 246 which gets triggered during the processing of this example.  They have
 247 been marked in the trace.
 248
 249 The don't quite go in the order in which they appear in the code,
 250 however!  We see from the trace that it begins with the right-hand
 251 expression, `K`.  We didn't explicitly tell it to begin with the
 252 right-hand subexpression, so control over evaluation order is starting
 253 to spin out of our control.  (We'll get it back later, don't worry.)
 254
 255 In any case, in the second main recursive call, we evaluate `II`.  The
 256 result is `I`.
 257
 258 At this point, we have constructed `t' == App(I,K)`.  Since that's a
 259 redex, we ship it off to reduce_one_step, getting the term `K` as a
 260 result.  The third recursive call checks that there is no more
 261 reduction work to be done (there isn't), and that's our final result.
 262
 263 You can see in more detail what is going on by tracing both reduce
 264 and reduce_one_step, but that makes for some long traces.
 265
 266 So we've solved our first problem: reduce recognizes that `IIK ~~>
 267 K`, as desired.
 268
 269 Because the OCaml interpreter evaluates each subexpression in the
 270 course of building `t'`, however, it will always evaluate the right
 271 hand subexpression, whether it needs to or not.  And sure enough,
 272
 273     # reduce (App(App(K,I),skomega));;
 274       C-c C-cInterrupted.
 275
 276 Running the evaluator with this input leads to an infinite loop, and
 277 the only way to get out is to kill the interpreter with control-c.
 278
 279 Instead of performing the leftmost reduction first, and recognizing
 280 that this term reduces to the normal form `I`, we get lost endlessly
 281 trying to reduce skomega.
 282
 283 ## Laziness is hard to overcome
 284
 285 To emphasize that our evaluation order here is at the mercy of the
 286 evaluation order of OCaml, here is the exact same program translated
 287 into Haskell.  We'll put them side by side to emphasize the exact parallel.
 288
 289 <pre>
 290 OCaml                                                          Haskell
 291 ==========================================================     =========================================================
 292
 293 type term = I | S | K | App of (term * term)                   data Term = I | S | K | App Term Term deriving (Eq, Show)
 294
 295 let skomega = App (App (App (S,I), I), App (App (S,I), I))     skomega = (App (App (App S I) I) (App (App S I) I))
 296
 297                                                                reduce_one_step :: Term -> Term
 298 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
 299     App(I,a) -> a                                                App I a -> a
 300   | App(App(K,a),b) -> a                                         App (App K a) b -> a
 301   | App(App(App(S,a),b),c) -> App(App(a,c),App(b,c))             App (App (App S a) b) c -> App (App a c) (App b c)
 302   | _ -> t                                                       _ -> t
 303
 304                                                                is_redex :: Term -> Bool
 305 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
 306
 307                                                                reduce :: Term -> Term
 308 let rec reduce (t:term):term = match t with                    reduce t = case t of
 309     I -> I                                                       I -> I
 310   | K -> K                                                       K -> K
 311   | S -> S                                                       S -> S
 312   | App (a, b) ->                                                 App a b ->
 313       let t' = App (reduce a, reduce b) in                          let t' = App (reduce a) (reduce b) in
 314         if (is_redex t') then reduce (reduce_one_step t')             if (is_redex t') then reduce (reduce_one_step t')
 315                          else t'                                                       else t'
 316 </pre>
 317
 318 There are some differences in the way types are made explicit, and in
 319 the way terms are specified (`App(a,b)` for Ocaml versus `App a b` for
 320 Haskell).  But the two programs are essentially identical.
 321
 322 Yet the Haskell program finds the normal form for `KIΩ`:
 323
 324     *Main> reduce (App (App K I) skomega)
 325     I
 326
 327 Woa!  First of all, this is wierd.  Haskell's evaluation strategy is
 328 called "lazy".  Apparently, Haskell is so lazy that even after we've
 329 asked it to construct t' by evaluating `reduce a` and `reduce b`, it
 330 doesn't bother computing `reduce b`.  Instead, it waits to see if we
 331 ever really need to use the result.
 332
 333 So the program as written does NOT fully determine evaluation order
 334 behavior.  At this stage, we have defined an evaluation order that
 335 still depends on the evaluation order of the underlying interpreter.
 336
 337 There are two questions we could ask:
 338
 339 * Can we adjust the OCaml evaluator to exhibit lazy behavior?
 340
 341 * Can we adjust the Haskell evaluator to exhibit eager behavior?
 342
 343 The answer to the first question is easy and interesting, and we'll
 344 give it right away.  The answer to the second question is also
 345 interesting, but not easy.  There are various tricks available in
 346 Haskell we could use (such as the `seq` operator, or the `deepseq`
 347 operator), but a fully general, satisifying resolution will have to
 348 wait until we have Continuation Passing Style transforms.
 349
 350 The answer to the first question (Can we adjust the OCaml evaluator to
 351 exhibit lazy behavior?) is quite simple:
 352
 353 <pre>
 354 let rec reduce_lazy (t:term):term = match t with
 355     I -> I
 356   | K -> K
 357   | S -> S
 358   | App (a, b) ->
 359       let t' = App (reduce_lazy a, b) in
 360         if (is_redex t') then reduce_lazy (reduce_one_step t')
 361                          else t'
 362 </pre>
 363
 364 There is only one small difference: instead of setting `t'` to `App
 365 (reduce a, reduce b)`, we omit one of the recursive calls, and have
 366 `App (reduce a, b)`.  That is, we don't evaluate the right-hand
 367 subexpression at all.  Ever!  The only way to get evaluated is to
 368 somehow get into functor position.
 369
 370     # reduce3 (App(App(K,I),skomega));;
 371     - : term = I
 372     # reduce3 skomega;;
 373     C-c C-cInterrupted.
 374
 375 The evaluator now has no trouble finding the normal form for `KIΩ`,
 376 but evaluating skomega still gives an infinite loop.
 377
 378 We can now clarify the larger question at the heart of
 379 this discussion:
 380
 381 *How can we can we specify the evaluation order of a computational
 382 system in a way that is completely insensitive to the evaluation order
 383 of the specification language?*
 384
 385 As a final note, we should mention that the evaluators given here are
 386 absurdly inefficient computationally.  Some computer scientists have
 387 trouble even looking at code this inefficient, but the emphasis here
 388 is on getting the concepts across as simply as possible.