topics/week7_eval_cl.mdwn

   1 <!-- λ Λ ∀ ≡ α β ω Ω -->
   2
   3 [[!toc levels=2]]
   4
   5 # Reasoning about evaluation order in Combinatory Logic
   6
   7 We've discussed [[evaluation order|topics/week3_evaluation_order]]
   8 before, primarily in connection with the untyped lambda calculus.
   9 Whenever a term contains more than one redex, we have to choose which
  10 one to reduce, and this choice can make a difference.  For instance,
  11 recall that
  12
  13     Ω == ωω == (\x.xx)(\x.xx), so
  14
  15     ((\x.I)Ω) == ((\x.I)((\x.xx)(\x.xx)))
  16                    *      *
  17
  18 There are two redexes in this term; we've marked the operative lambdas
  19 with a star.  If we reduce the leftmost redex first, the term reduces
  20 to the normal form `I` in one step.  But if we reduce the rightmost
  21 redex instead, the "reduced" form is `(\x.I)Ω` again, and we are in
  22 danger of entering an infinite loop.
  23
  24 Thanks to the introduction of sum types (disjoint union) in the last lecture, we
  25 are now in a position to gain a deeper understanding of evaluation
  26 order by writing a program that allows us to reasoning explicitly about evaluation.
  27
  28 One thing we'll see is that it is all too easy for the evaluation
  29 order properties of an evaluator to depend on the evaluation order
  30 properties of the programming language in which the evaluator is
  31 written.  We would like to write an evaluator in which the order of
  32 evaluation is insensitive to the evaluator language.  That is, the goal is to
  33 find an order-insensitive way to reason about evaluation order.  We
  34 will not fully succeed in this first attempt, but we will make good
  35 progress.
  36
  37 The first evaluator we will develop will evaluate terms in Combinatory
  38 Logic.  This significantly simplifies the discussion, since we won't
  39 need to worry about variables or substitution.  As we develop and
  40 extend our evaluator in future weeks, we'll switch to lambdas, but for
  41 now, working with the simplicity of Combinatory Logic will make it
  42 easier to highlight evaluation order issues.
  43
  44 A brief review of Combinatory Logic: a term in CL is the combination
  45 of three basic expressions, `S`, `K`, and `I`, governed by the
  46 following reduction rules:
  47
  48     Ia ~~> a
  49     Kab ~~> a
  50     Sabc ~~> ac(bc)
  51
  52 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
  53 how to embed the untyped lambda calculus in CL, so it's no surprise
  54 that evaluation order issues arise in CL.  To illustrate, we'll use
  55 the following definition:
  56
  57     skomega = SII
  58     Skomega = skomega skomega == SII(SII)
  59             ~~> I(SII)(I(SII))
  60             ~~> SII(SII)
  61
  62 We'll use the same symbol, `Ω`, for Omega and Skomega: in a lambda
  63 term, `Ω` refers to Omega, but in a CL term, `Ω` refers to Skomega as
  64 defined here.
  65
  66 Just as in the corresponding term in the lambda calculus, CL terms can
  67 contain more than one redex:
  68
  69     KIΩ == KI(SII(SII))
  70            *  *
  71
  72 we can choose to reduce the leftmost redex by applying the reduction
  73 rule for `K`, in which case the term reduces to the normal form `I` in
  74 one step; or we can choose to reduce the Skomega part, by applying the
  75 reduction rule `S`, in which case we do not get a normal form, and
  76 we're headed towards an infinite loop.
  77
  78 With sum types, we can define CL terms in OCaml as follows:
  79
  80     type term = I | K | S | App of (term * term)
  81
  82     let skomega = App (App (App (S, I), I), App (App (S, I), I))
  83
  84 This type definition says that a term in CL is either one of the three
  85 simple expressions (`I`, `K`, or `S`), or else a pair of CL
  86 expressions.  `App` stands for Functional Application.  With this type
  87 definition, we can encode Skomega, as well as other terms whose
  88 reduction behavior we want to try to control.
  89
  90 Using pattern matching, it is easy to code the one-step reduction
  91 rules for CL:
  92
  93     let reduce_one_step (t:term):term = match t with
  94         App(I,a) -> a
  95       | App(App(K,a),b) -> a
  96       | App(App(App(S,a),b),c) -> App(App(a,c),App(b,c))
  97       | _ -> t
  98
  99     # reduce_one_step (App(App(K,S),I));;
 100     - : term = S
 101     # reduce_one_step skomega;;
 102     - : term = App (App (I, App (App (S, I), I)), App (I, App (App (S, I), I)))
 103
 104 The definition of `reduce_one_step` explicitly says that it expects
 105 its input argument `t` to have type `term`, and the second `:term`
 106 says that the type of the output the function delivers as a result will also be of
 107 type `term`.
 108
 109 The type constructor `App` obscures things a bit, but it's still
 110 possible to see how the one-step reduction function is just the
 111 reduction rules for CL.  The OCaml interpreter responses given above show us that the
 112 function faithfully recognizes that `KSI ~~> S`, and that `Skomega ~~>
 113 I(SII)(I(SII))`.
 114
 115 We can now say precisely what it means to be a redex in CL.
 116
 117     let is_redex (t:term):bool = not (t = reduce_one_step t)
 118
 119     # is_redex K;;
 120     - : bool = false
 121     # is_redex (App(K,I));;
 122     - : bool = false
 123     # is_redex (App(App(K,I),S));;
 124     - : bool = true
 125     # is_redex skomega;;
 126     - : book = true
 127
 128 Warning: this definition relies on the accidental fact that the
 129 one-step reduction of a CL term is never identical to the original
 130 term.  This would not work for the untyped lambda calculus, since
 131 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.
 132
 133 Note that in order to decide whether two terms are equal, OCaml has to
 134 recursively compare the elements of complex CL terms.  It is able to
 135 figure out how to do this because we provided an explicit definition
 136 of the datatype `term`.
 137
 138 As you would expect, a term in CL is in normal form when it contains
 139 no redexes (analogously for head normal form, weak head normal form, etc.)
 140
 141 In order to fully reduce a term, we need to be able to reduce redexes
 142 that are not at the top level of the term.
 143 Because we need to process subparts, and because the result after
 144 processing a subpart may require further processing, the recursive
 145 structure of our evaluation function has to be somewhat subtle.  To
 146 truly understand, we will need to do some sophisticated thinking
 147 about how recursion works.
 148
 149 We'll develop our full reduction function in two stages.  Once we have
 150 it working, we'll then consider a variant.
 151
 152     let rec reduce_stage1 (t:term):term =
 153       if (is_redex t) then reduce_stage1 (reduce_one_step t)
 154                       else t
 155
 156 If the input is a redex, we ship it off to `reduce_one_step` for
 157 processing.  But just in case the result of the one-step reduction is
 158 itself a redex, we recursively call `reduce_stage1`.  The recursion
 159 will continue until the result is no longer a redex.  We're aiming at
 160 allowing the evaluator to recognize that
 161
 162     I (I K) ~~> I K ~~> K
 163
 164 When trying to understand how recursive functions work, it can be
 165 extremely helpful to examine an execution trace of inputs and
 166 outputs.
 167
 168     # #trace reduce_stage1;;
 169     reduce_stage1 is now traced.
 170     # reduce_stage1 (App (I, App (I, K)));;
 171     reduce_stage1 <-- App (I, App (I, K))
 172       reduce_stage1 <-- App (I, K)
 173         reduce_stage1 <-- K
 174         reduce_stage1 --> K
 175       reduce_stage1 --> K
 176     reduce_stage1 --> K
 177     - : term = K
 178
 179 In the trace, "`<--`" shows the input argument to a call to
 180 `reduce_stage1`, and "`-->`" shows the output result.
 181
 182 Since the initial input (`I(IK)`) is a redex, the result after the
 183 one-step reduction is `IK`.  We recursively call `reduce_stage1` on
 184 this input.  Since `IK` is itself a redex, the result after one-step
 185 reduction is `K`.  We recursively call `reduce_stage1` on this input.  Since
 186 `K` is not a redex, the recursion bottoms out, and we return the
 187 result.
 188
 189 But this function doesn't do enough reduction.  We want to recognize
 190 the following reduction path:
 191
 192     I I K ~~> I K ~~> K
 193
 194 But the reduction function as written above does not deliver this result:
 195
 196     # reduce_stage1 (App (App (I, I), K));;
 197     - : term = App (App (I, I), K)
 198
 199 The reason is that the top-level term is not a redex to start with,
 200 so `reduce_stage1` returns it without any evaluation.
 201
 202 What we want is to
 203 evaluate the subparts of a complex term.  We'll do this by evaluating
 204 the subparts of the top-level expression.
 205
 206     let rec reduce (t:term):term = match t with
 207         I -> I
 208       | K -> K
 209       | S -> S
 210       | App (a, b) ->
 211           let t' = App (reduce a, reduce b) in
 212             if (is_redex t') then reduce 2 (reduce_one_step t')
 213                              else t'
 214
 215 Since we need access to the subterms, we do pattern matching on the
 216 input.  If the input is simple (the first three `match` cases), we
 217 return it without further processing.  But if the input is complex, we
 218 first process the subexpressions, and only then see if we have a redex
 219 at the top level.
 220
 221 To understand how this works, follow the trace
 222 carefully:
 223
 224     # reduce (App(App(I,I),K));;
 225     reduce <-- App (App (I, I), K)
 226
 227       reduce <-- K          ; first main recursive call
 228       reduce --> K
 229
 230       reduce <-- App (I, I)  ; second main recursive call
 231         reduce <-- I
 232         reduce --> I
 233         reduce <-- I
 234         reduce --> I
 235         reduce <-- I
 236         reduce --> I
 237       reduce --> I
 238
 239       reduce <-- K           ; third
 240       reduce --> K
 241     reduce --> K
 242     - : term = K
 243
 244 Ok, there's a lot going on here.  Since the input is complex, the
 245 first thing the function does is construct `t'`.  In order to do this,
 246 it must reduce the two main subexpressions, `II` and `K`.
 247
 248 There are three recursive calls to the `reduce` function, each of
 249 which gets triggered during the processing of this example.  They have
 250 been marked in the trace.
 251
 252 The don't quite go in the order in which they appear in the code,
 253 however!  We see from the trace that it begins with the right-hand
 254 expression, `K`.  We didn't explicitly tell it to begin with the
 255 right-hand subexpression, so control over evaluation order is starting
 256 to spin out of our control.  (We'll get it back later, don't worry.)
 257
 258 In any case, in the second main recursive call, we evaluate `II`.  The
 259 result is `I`.
 260
 261 At this point, we have constructed `t' == App(I,K)`.  Since that's a
 262 redex, we ship it off to reduce_one_step, getting the term `K` as a
 263 result.  The third recursive call checks that there is no more
 264 reduction work to be done (there isn't), and that's our final result.
 265
 266 You can see in more detail what is going on by tracing both reduce
 267 and reduce_one_step, but that makes for some long traces.
 268
 269 So we've solved our first problem: `reduce` now recognizes that `IIK ~~>
 270 K`, as desired.
 271
 272 Because the OCaml interpreter evaluates each subexpression in the
 273 course of building `t'`, however, it will always evaluate the right
 274 hand subexpression, whether it needs to or not.  And sure enough,
 275
 276     # reduce (App(App(K,I),skomega));;
 277       C-c C-cInterrupted.
 278
 279 Running the evaluator with this input leads to an infinite loop, and
 280 the only way to get out is to kill the interpreter with control-c.
 281
 282 Instead of performing the leftmost reduction first, and recognizing
 283 that this term reduces to the normal form `I`, we get lost endlessly
 284 trying to reduce Skomega.
 285
 286 ## Laziness is hard to overcome
 287
 288 To emphasize that our evaluation order here is at the mercy of the
 289 evaluation order of OCaml, here is the exact same program translated
 290 into Haskell.  We'll put them side by side to emphasize the exact parallel.
 291
 292 <pre>
 293 OCaml                                                          Haskell
 294 ==========================================================     =========================================================
 295
 296 type term = I | S | K | App of (term * term)                   data Term = I | S | K | App Term Term deriving (Eq, Show)
 297
 298 let skomega = App (App (App (S,I), I), App (App (S,I), I))     skomega = (App (App (App S I) I) (App (App S I) I))
 299
 300                                                                reduce_one_step :: Term -> Term
 301 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
 302     App(I,a) -> a                                                App I a -> a
 303   | App(App(K,a),b) -> a                                         App (App K a) b -> a
 304   | App(App(App(S,a),b),c) -> App(App(a,c),App(b,c))             App (App (App S a) b) c -> App (App a c) (App b c)
 305   | _ -> t                                                       _ -> t
 306
 307                                                                is_redex :: Term -> Bool
 308 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
 309
 310                                                                reduce :: Term -> Term
 311 let rec reduce (t:term):term = match t with                    reduce t = case t of
 312     I -> I                                                       I -> I
 313   | K -> K                                                       K -> K
 314   | S -> S                                                       S -> S
 315   | App (a, b) ->                                                 App a b ->
 316       let t' = App (reduce a, reduce b) in                          let t' = App (reduce a) (reduce b) in
 317         if (is_redex t') then reduce (reduce_one_step t')             if (is_redex t') then reduce (reduce_one_step t')
 318                          else t'                                                       else t'
 319 </pre>
 320
 321 There are some differences in the way types are made explicit, and in
 322 the way terms are specified (`App(a,b)` for Ocaml versus `App a b` for
 323 Haskell).  But the two programs are essentially identical.
 324
 325 Yet the Haskell program finds the normal form for `KIΩ`:
 326
 327     *Main> reduce (App (App K I) skomega)
 328     I
 329
 330 Woa!  First of all, this is wierd.  Haskell's evaluation strategy is
 331 called "lazy".  Apparently, Haskell is so lazy that even after we've
 332 asked it to construct t' by evaluating `reduce a` and `reduce b`, it
 333 doesn't bother computing `reduce b`.  Instead, it waits to see if we
 334 ever really need to use the result.
 335
 336 So the program as written does NOT fully determine evaluation order
 337 behavior.  At this stage, we have defined an evaluation order that
 338 still depends on the evaluation order of the underlying interpreter.
 339
 340 There are two questions we could ask:
 341
 342 * Can we adjust the OCaml evaluator to exhibit lazy behavior?
 343
 344 * Can we adjust the Haskell evaluator to exhibit eager behavior?
 345
 346 The answer to the first question is easy and interesting, and we'll
 347 give it right away.  The answer to the second question is also
 348 interesting, but not easy.  There are various tricks available in
 349 Haskell we could use (such as the `seq` operator, or the `deepseq`
 350 operator), but a fully general, satisifying resolution will have to
 351 wait until we have Continuation Passing Style transforms.
 352
 353 The answer to the first question (Can we adjust the OCaml evaluator to
 354 exhibit lazy behavior?) is quite simple:
 355
 356 <pre>
 357 let rec reduce_lazy (t:term):term = match t with
 358     I -> I
 359   | K -> K
 360   | S -> S
 361   | App (a, b) ->
 362       let t' = App (reduce_lazy a, b) in
 363         if (is_redex t') then reduce_lazy (reduce_one_step t')
 364                          else t'
 365 </pre>
 366
 367 There is only one small difference: instead of setting `t'` to `App
 368 (reduce a, reduce b)`, we omit one of the recursive calls, and have
 369 `App (reduce a, b)`.  That is, we don't evaluate the right-hand
 370 subexpression at all.  Ever!  The only way to get evaluated is to
 371 somehow get into functor position.
 372
 373     # reduce3 (App(App(K,I),skomega));;
 374     - : term = I
 375     # reduce3 skomega;;
 376     C-c C-cInterrupted.
 377
 378 The evaluator now has no trouble finding the normal form for `KIΩ`,
 379 but evaluating skomega still gives an infinite loop.
 380
 381 We can now clarify the larger question at the heart of
 382 this discussion:
 383
 384 *How can we can we specify the evaluation order of a computational
 385 system in a way that is completely insensitive to the evaluation order
 386 of the specification language?*
 387
 388 As a final note, we should mention that the evaluators given here are
 389 absurdly inefficient computationally.  Some computer scientists have
 390 trouble even looking at code this inefficient, but the emphasis here
 391 is on getting the concepts across as simply as possible.