topics/_week7_eval_cl.mdwn

   1 <!-- λ Λ ∀ ≡ α β ω Ω -->
   2
   3 [[!toc levels=2]]
   4
   5 # Reasoning about evaluation order in Combinatory Logic
   6
   7 We've discussed evaluation order before, primarily in connection with
   8 the untyped lambda calculus.  Whenever a term has more than one redex,
   9 we have to choose which one to reduce, and this choice can make a
  10 difference.  For instance, in the term `((\x.I)(ωω)`, if we reduce the
  11 leftmost redex first, the term reduces to the normal form `I` in one
  12 step.  But if we reduce the left most redex instead (namely, `(ωω)`),
  13 we do not arrive at a normal form, and are in danger of entering an
  14 infinite loop.
  15
  16 Thanks to the introduction of sum types (disjoint union), we are now
  17 in a position to gain a deeper understanding by writing a program that
  18 allows us to experiment with different evaluation order strategies.
  19
  20 One thing we'll see is that it is all too easy for the evaluation
  21 order properties of an evaluator to depend on the evaluation order
  22 properties of the programming language in which the evaluator is
  23 written.  We will seek to write an evaluator in which the order of
  24 evaluation is insensitive to the evaluator language.  The goal is to
  25 find an order-insensitive way to reason about evaluation order.
  26
  27 The first evaluator we develop will evaluate terms in Combinatory
  28 Logic.  That will significantly simplify the program, since we won't
  29 need to worry about variables or substitution.  As we develop and
  30 extend our evaluator in future weeks, we'll switch to lambdas, but for
  31 now, working with the elegant simplicity of Combinatory Logic will
  32 make the evaluation order issues easier to grasp.
  33
  34 A brief review: a term in CL is the combination of three basic
  35 expressions, `S`, `K`, and `I`, governed by the following reduction
  36 rules:
  37
  38     Ia ~~> a
  39     Kab ~~> b
  40     Sabc ~~> ac(bc)
  41
  42 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
  43 that how to embed the untyped lambda calculus in CL, so it's no
  44 surprise that the same evaluation order issues arise in CL.
  45
  46     skomega = SII(SII)
  47             ~~> I(SII)(I(SII))
  48             ~~> SII(SII)
  49
  50 Instead of the lambda term `Ω`, we'll use the CL term skomega, defined
  51 here, though we'll use the same symbol, `Ω`.  If we consider the term
  52
  53     KIΩ == KI(SII(SII))
  54
  55 we can choose to reduce the leftmost redex by firing the reduction
  56 rule for `K`, in which case the term reduces to the normal form `I` in
  57 one step; or we can choose to reduce the skomega part, by firing the
  58 reduction rule fo `S`, in which case we do not get a normal form,
  59 we're headed towards an infinite loop.
  60
  61 With sum types, we can define terms in CL in OCaml as follows:
  62
  63     type term = I | S | K | FA of (term * term)
  64
  65     let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))
  66     let test1 = FA (FA (K,I), skomega)
  67
  68 This recursive type definition says that a term in CL is either one of
  69 the three simple expressions, or else a pair of CL expressions.
  70 Following Heim and Kratzer, `FA` stands for Functional Application.
  71 With this type definition, we can encode skomega, as well as other
  72 terms whose reduction behavior we want to control.
  73
  74 Using pattern matching, it is easy to code the one-step reduction
  75 rules for CL:
  76
  77     let reduce_one_step (t:term):term = match t with
  78         FA(I,a) -> a
  79       | FA(FA(K,a),b) -> a
  80       | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))
  81       | _ -> t
  82
  83     # reduce_one_step (FA(FA(K,S),I));;
  84     - : term = S
  85     # reduce_one_step skomega;;
  86     - : term = FA (FA (I, FA (FA (S, I), I)), FA (I, FA (FA (S, I), I)))
  87
  88 The need to explicitly insert the type constructor `FA` obscures
  89 things a bit, but it's still possible to see how the one-step
  90 reduction function is just the reduction rules for CL.  The
  91 OCaml interpreter shows us that the function faithfully recognizes
  92 that `KSI ~~> S`, and `skomega ~~> I(SII)(I(SII))`.
  93
  94 We can now say precisely what it means to be a redex in CL.
  95
  96     let is_redex (t:term):bool = not (t = reduce_one_step t)
  97
  98     # is_redex K;;
  99     - : bool = false
 100     # is_redex (FA(K,I));;
 101     - : bool = false
 102     # is_redex (FA(FA(K,I),S));;
 103     - : bool = true
 104     # is_redex skomega;;
 105     - : book = true
 106
 107 Warning: this definition relies on the fact that the one-step
 108 reduction of a CL term is never identical to the original term.  This
 109 would not work for the untyped lambda calculus, since
 110 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.  Note that in
 111 order to decide whether two terms are equal, OCaml has to recursively
 112 compare the elements of complex CL terms.  It is able to figure out
 113 how to do this because we provided an explicit definition of the
 114 datatype Term.
 115
 116 As you would expect, a term in CL is in normal form when it contains
 117 no redexes.
 118
 119 In order to fully reduce a term, we need to be able to reduce redexes
 120 that are not at the top level of the term.
 121
 122     (KKI)SI ~~> KSI ~~> S
 123
 124 That is, we want to be able to first evaluate the redex `KKI` that is
 125 a proper subpart of the larger term to produce a new intermediate term
 126 that we can then evaluate to the final normal form.
 127
 128 Because we need to process subparts, and because the result after
 129 processing a subpart may require further processing, the recursive
 130 structure of our evaluation function has to be quite subtle.  To truly
 131 understand it, you will need to do some sophisticated thinking about
 132 how recursion works.  We'll show you how to keep track of what is
 133 going on by constructing an recursive execution trace of inputs and
 134 outputs.
 135
 136 We'll develop our full reduction function in stages.  Once we have it
 137 working, we'll then consider some variants.  Just to be precise, we'll
 138 distinguish each microvariant with its own index number embedded in
 139 its name.
 140
 141     let rec reduce1 (t:term):term =
 142       if (is_redex t) then reduce1 (reduce_one_step t)
 143                       else t
 144
 145 If the input is a redex, we ship it off to `reduce_one_step` for
 146 processing.  But just in case the result of the one-step reduction is
 147 itself a redex, we recursively call `reduce1`.  The recursion will
 148 continue until the result is no longer a redex.
 149
 150     #trace reduce1;;
 151     reduce1 is now traced.
 152     # reduce1 (FA (I, FA (I, K)));;
 153     reduce1 <-- FA (I, FA (I, K))
 154       reduce1 <-- FA (I, K)
 155         reduce1 <-- K
 156         reduce1 --> K
 157       reduce1 --> K
 158     reduce1 --> K
 159     - : term = K
 160
 161 Since the initial input (`I(IK)`) is a redex, the result after the
 162 one-step reduction is `IK`.  We recursively call `reduce1` on this
 163 input.  Since `IK` is itself a redex, the result after one-step
 164 reduction is `K`.  We recursively call `reduce1` on this input.  Since
 165 `K` is not a redex, the recursion bottoms out, and we simply return
 166 it.
 167
 168 But this function doesn't do enough reduction.
 169
 170     # reduce1 (FA (FA (I, I), K));;
 171     - : term = FA (FA (I, I), K)
 172
 173 Because the top-level term is not a redex, `reduce1` returns it
 174 without any evaluation.  What we want is to evaluate the subparts of a
 175 complex term.
 176
 177     let rec reduce2 (t:term):term = match t with
 178         I -> I
 179       | K -> K
 180       | S -> S
 181       | FA (a, b) ->
 182           let t' = FA (reduce2 a, reduce2 b) in
 183             if (is_redex t') then reduce2 (reduce_one_step t')
 184                              else t'
 185
 186 Since we need access to the subterms, we do pattern matching on the
 187 input term.  If the input is simple, we return it.  If the input is
 188 complex, we first process the subexpressions, and only then see if we
 189 have a redex.  To understand how this works, follow the trace
 190 carefully:
 191
 192     # reduce2 (FA(FA(I,I),K));;
 193     reduce2 <-- FA (FA (I, I), K)
 194
 195       reduce2 <-- K          ; first main recursive call
 196       reduce2 --> K
 197
 198       reduce2 <-- FA (I, I)  ; second main recursive call
 199         reduce2 <-- I
 200         reduce2 --> I
 201         reduce2 <-- I
 202         reduce2 --> I
 203       reduce2 <-- I
 204
 205       reduce2 --> I           ; third main recursive call
 206       reduce2 --> I
 207
 208       reduce2 <-- K           ; fourth
 209       reduce2 --> K
 210     reduce2 --> K
 211     - : term = K
 212
 213 Ok, there's a lot going on here.  Since the input is complex, the
 214 first thing the function does is construct `t'`.  In order to do this,
 215 it must reduce the two main subexpressions, `II` and `K`.  But we see
 216 from the trace that it begins with the right-hand expression, `K`.  We
 217 didn't explicitly tell it to begin with the right-hand subexpression,
 218 so control over evaluation order is starting to spin out of our
 219 control.  (We'll get it back later, don't worry.)
 220
 221 In any case, in the second main recursive call, we evaluate `II`.  The
 222 result is `I`.  The third main recursive call tests whether this
 223 result needs any further processing; it doesn't.
 224
 225 At this point, we have constructed `t' == FA(I,K)`.  Since that's a
 226 redex, we ship it off to reduce_one_step, getting the term `K` as a
 227 result.  The fourth recursive call checks that there is no more
 228 reduction work to be done (there isn't), and that's our final result.
 229
 230 You can see in more detail what is going on by tracing both reduce2
 231 and reduce_one_step, but that makes for some long traces.
 232
 233 So we've solved our first problem: reduce2 recognizes that `IIK ~~>
 234 K`, as desired.
 235
 236 Because the OCaml interpreter evaluates the rightmost expression
 237 in the course of building `t'`, however, it will always evaluate the
 238 right hand subexpression, whether it needs to or not.  And sure
 239 enough,
 240
 241     # reduce2 (FA(FA(K,I),skomega));;
 242       C-c C-cInterrupted.
 243
 244 instead of performing the leftmost reduction first, and recognizing
 245 that this term reduces to the normal form `I`, we get lost endlessly
 246 trying to reduce skomega.
 247
 248 To emphasize that our evaluation order here is at the mercy of the
 249 evaluation order of OCaml, here is the exact same program translated
 250 into Haskell.  We'll put them side by side to emphasize the exact parallel.
 251
 252 <pre>
 253 OCaml                                                          Haskell
 254 ==========================================================     =========================================================
 255
 256 type term = I | S | K | FA of (term * term)                    data Term = I | S | K | FA Term Term deriving (Eq, Show)
 257
 258 let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))          skomega = (FA (FA (FA S I) I) (FA (FA S I) I))
 259
 260                                                                reduce_one_step :: Term -> Term
 261 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
 262     FA(I,a) -> a                                                 FA I a -> a
 263   | FA(FA(K,a),b) -> a                                           FA (FA K a) b -> a
 264   | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))                   FA (FA (FA S a) b) c -> FA (FA a c) (FA b c)
 265   | _ -> t                                                       _ -> t
 266
 267                                                                is_redex :: Term -> Bool
 268 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
 269
 270                                                                reduce2 :: Term -> Term
 271 let rec reduce2 (t:term):term = match t with                   reduce2 t = case t of
 272     I -> I                                                       I -> I
 273   | K -> K                                                       K -> K
 274   | S -> S                                                       S -> S
 275   | FA (a, b) ->                                                 FA a b ->
 276       let t' = FA (reduce2 a, reduce2 b) in                        let t' = FA (reduce2 a) (reduce2 b) in
 277         if (is_redex t') then reduce2 (reduce_one_step t')           if (is_redex t') then reduce2 (reduce_one_step t')
 278                          else t'                                                      else t'
 279 </pre>
 280
 281 There are some differences in the way types are made explicit, and in
 282 the way terms are specified (`FA(a,b)` for Ocaml versus `FA a b` for
 283 Haskell).  But the two programs are essentially identical.
 284
 285 Yet the Haskell program finds the normal form for KI -->
 286
 287 [[!toc levels=2]]
 288
 289 # Reasoning about evaluation order in Combinatory Logic
 290
 291 We've discussed evaluation order before, primarily in connection with
 292 the untyped lambda calculus.  Whenever a term has more than one redex,
 293 we have to choose which one to reduce, and this choice can make a
 294 difference.  For instance, in the term `((\x.I)(ωω)`, if we reduce the
 295 leftmost redex first, the term reduces to the normal form `I` in one
 296 step.  But if we reduce the left most redex instead (namely, `(ωω)`),
 297 we do not arrive at a normal form, and are in danger of entering an
 298 infinite loop.
 299
 300 Thanks to the introduction of sum types (disjoint union), we are now
 301 in a position to gain a deeper understanding by writing a program that
 302 allows us to experiment with different evaluation order strategies.
 303
 304 One thing we'll see is that it is all too easy for the evaluation
 305 order properties of an evaluator to depend on the evaluation order
 306 properties of the programming language in which the evaluator is
 307 written.  We will seek to write an evaluator in which the order of
 308 evaluation is insensitive to the evaluator language.  The goal is to
 309 find an order-insensitive way to reason about evaluation order.
 310
 311 The first evaluator we develop will evaluate terms in Combinatory
 312 Logic.  That will significantly simplify the program, since we won't
 313 need to worry about variables or substitution.  As we develop and
 314 extend our evaluator in future weeks, we'll switch to lambdas, but for
 315 now, working with the elegant simplicity of Combinatory Logic will
 316 make the evaluation order issues easier to grasp.
 317
 318 A brief review: a term in CL is the combination of three basic
 319 expressions, `S`, `K`, and `I`, governed by the following reduction
 320 rules:
 321
 322     Ia ~~> a
 323     Kab ~~> b
 324     Sabc ~~> ac(bc)
 325
 326 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
 327 that how to embed the untyped lambda calculus in CL, so it's no
 328 surprise that the same evaluation order issues arise in CL.
 329
 330     skomega = SII(SII)
 331             ~~> I(SII)(I(SII))
 332             ~~> SII(SII)
 333
 334 Instead of the lambda term `Ω`, we'll use the CL term skomega, defined
 335 here, though we'll use the same symbol, `Ω`.  If we consider the term
 336
 337     KIΩ == KI(SII(SII))
 338
 339 we can choose to reduce the leftmost redex by firing the reduction
 340 rule for `K`, in which case the term reduces to the normal form `I` in
 341 one step; or we can choose to reduce the skomega part, by firing the
 342 reduction rule fo `S`, in which case we do not get a normal form,
 343 we're headed towards an infinite loop.
 344
 345 With sum types, we can define terms in CL in OCaml as follows:
 346
 347     type term = I | S | K | FA of (term * term)
 348
 349     let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))
 350     let test1 = FA (FA (K,I), skomega)
 351
 352 This recursive type definition says that a term in CL is either one of
 353 the three simple expressions, or else a pair of CL expressions.
 354 Following Heim and Kratzer, `FA` stands for Functional Application.
 355 With this type definition, we can encode skomega, as well as other
 356 terms whose reduction behavior we want to control.
 357
 358 Using pattern matching, it is easy to code the one-step reduction
 359 rules for CL:
 360
 361     let reduce_one_step (t:term):term = match t with
 362         FA(I,a) -> a
 363       | FA(FA(K,a),b) -> a
 364       | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))
 365       | _ -> t
 366
 367     # reduce_one_step (FA(FA(K,S),I));;
 368     - : term = S
 369     # reduce_one_step skomega;;
 370     - : term = FA (FA (I, FA (FA (S, I), I)), FA (I, FA (FA (S, I), I)))
 371
 372 The need to explicitly insert the type constructor `FA` obscures
 373 things a bit, but it's still possible to see how the one-step
 374 reduction function is just the reduction rules for CL.  The
 375 OCaml interpreter shows us that the function faithfully recognizes
 376 that `KSI ~~> S`, and `skomega ~~> I(SII)(I(SII))`.
 377
 378 We can now say precisely what it means to be a redex in CL.
 379
 380     let is_redex (t:term):bool = not (t = reduce_one_step t)
 381
 382     # is_redex K;;
 383     - : bool = false
 384     # is_redex (FA(K,I));;
 385     - : bool = false
 386     # is_redex (FA(FA(K,I),S));;
 387     - : bool = true
 388     # is_redex skomega;;
 389     - : book = true
 390
 391 Warning: this definition relies on the fact that the one-step
 392 reduction of a CL term is never identical to the original term.  This
 393 would not work for the untyped lambda calculus, since
 394 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.  Note that in
 395 order to decide whether two terms are equal, OCaml has to recursively
 396 compare the elements of complex CL terms.  It is able to figure out
 397 how to do this because we provided an explicit definition of the
 398 datatype Term.
 399
 400 As you would expect, a term in CL is in normal form when it contains
 401 no redexes.
 402
 403 In order to fully reduce a term, we need to be able to reduce redexes
 404 that are not at the top level of the term.
 405
 406     (KKI)SI ~~> KSI ~~> S
 407
 408 That is, we want to be able to first evaluate the redex `KKI` that is
 409 a proper subpart of the larger term to produce a new intermediate term
 410 that we can then evaluate to the final normal form.
 411
 412 Because we need to process subparts, and because the result after
 413 processing a subpart may require further processing, the recursive
 414 structure of our evaluation function has to be quite subtle.  To truly
 415 understand it, you will need to do some sophisticated thinking about
 416 how recursion works.  We'll show you how to keep track of what is
 417 going on by constructing an recursive execution trace of inputs and
 418 outputs.
 419
 420 We'll develop our full reduction function in stages.  Once we have it
 421 working, we'll then consider some variants.  Just to be precise, we'll
 422 distinguish each microvariant with its own index number embedded in
 423 its name.
 424
 425     let rec reduce1 (t:term):term =
 426       if (is_redex t) then reduce1 (reduce_one_step t)
 427                       else t
 428
 429 If the input is a redex, we ship it off to `reduce_one_step` for
 430 processing.  But just in case the result of the one-step reduction is
 431 itself a redex, we recursively call `reduce1`.  The recursion will
 432 continue until the result is no longer a redex.
 433
 434     #trace reduce1;;
 435     reduce1 is now traced.
 436     # reduce1 (FA (I, FA (I, K)));;
 437     reduce1 <-- FA (I, FA (I, K))
 438       reduce1 <-- FA (I, K)
 439         reduce1 <-- K
 440         reduce1 --> K
 441       reduce1 --> K
 442     reduce1 --> K
 443     - : term = K
 444
 445 Since the initial input (`I(IK)`) is a redex, the result after the
 446 one-step reduction is `IK`.  We recursively call `reduce1` on this
 447 input.  Since `IK` is itself a redex, the result after one-step
 448 reduction is `K`.  We recursively call `reduce1` on this input.  Since
 449 `K` is not a redex, the recursion bottoms out, and we simply return
 450 it.
 451
 452 But this function doesn't do enough reduction.
 453
 454     # reduce1 (FA (FA (I, I), K));;
 455     - : term = FA (FA (I, I), K)
 456
 457 Because the top-level term is not a redex, `reduce1` returns it
 458 without any evaluation.  What we want is to evaluate the subparts of a
 459 complex term.
 460
 461     let rec reduce2 (t:term):term = match t with
 462         I -> I
 463       | K -> K
 464       | S -> S
 465       | FA (a, b) ->
 466           let t' = FA (reduce2 a, reduce2 b) in
 467             if (is_redex t') then reduce2 (reduce_one_step t')
 468                              else t'
 469
 470 Since we need access to the subterms, we do pattern matching on the
 471 input term.  If the input is simple, we return it.  If the input is
 472 complex, we first process the subexpressions, and only then see if we
 473 have a redex.  To understand how this works, follow the trace
 474 carefully:
 475
 476     # reduce2 (FA(FA(I,I),K));;
 477     reduce2 <-- FA (FA (I, I), K)
 478
 479       reduce2 <-- K          ; first main recursive call
 480       reduce2 --> K
 481
 482       reduce2 <-- FA (I, I)  ; second main recursive call
 483         reduce2 <-- I
 484         reduce2 --> I
 485         reduce2 <-- I
 486         reduce2 --> I
 487       reduce2 <-- I
 488
 489       reduce2 --> I           ; third main recursive call
 490       reduce2 --> I
 491
 492       reduce2 <-- K           ; fourth
 493       reduce2 --> K
 494     reduce2 --> K
 495     - : term = K
 496
 497 Ok, there's a lot going on here.  Since the input is complex, the
 498 first thing the function does is construct `t'`.  In order to do this,
 499 it must reduce the two main subexpressions, `II` and `K`.  But we see
 500 from the trace that it begins with the right-hand expression, `K`.  We
 501 didn't explicitly tell it to begin with the right-hand subexpression,
 502 so control over evaluation order is starting to spin out of our
 503 control.  (We'll get it back later, don't worry.)
 504
 505 In any case, in the second main recursive call, we evaluate `II`.  The
 506 result is `I`.  The third main recursive call tests whether this
 507 result needs any further processing; it doesn't.
 508
 509 At this point, we have constructed `t' == FA(I,K)`.  Since that's a
 510 redex, we ship it off to reduce_one_step, getting the term `K` as a
 511 result.  The fourth recursive call checks that there is no more
 512 reduction work to be done (there isn't), and that's our final result.
 513
 514 You can see in more detail what is going on by tracing both reduce2
 515 and reduce_one_step, but that makes for some long traces.
 516
 517 So we've solved our first problem: reduce2 recognizes that `IIK ~~>
 518 K`, as desired.
 519
 520 Because the OCaml interpreter evaluates the rightmost expression
 521 in the course of building `t'`, however, it will always evaluate the
 522 right hand subexpression, whether it needs to or not.  And sure
 523 enough,
 524
 525     # reduce2 (FA(FA(K,I),skomega));;
 526       C-c C-cInterrupted.
 527
 528 instead of performing the leftmost reduction first, and recognizing
 529 that this term reduces to the normal form `I`, we get lost endlessly
 530 trying to reduce skomega.
 531
 532 To emphasize that our evaluation order here is at the mercy of the
 533 evaluation order of OCaml, here is the exact same program translated
 534 into Haskell.  We'll put them side by side to emphasize the exact parallel.
 535
 536 <pre>
 537 OCaml                                                          Haskell
 538 ==========================================================     =========================================================
 539
 540 type term = I | S | K | FA of (term * term)                    data Term = I | S | K | FA Term Term deriving (Eq, Show)
 541
 542 let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))          skomega = (FA (FA (FA S I) I) (FA (FA S I) I))
 543
 544                                                                reduce_one_step :: Term -> Term
 545 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
 546     FA(I,a) -> a                                                 FA I a -> a
 547   | FA(FA(K,a),b) -> a                                           FA (FA K a) b -> a
 548   | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))                   FA (FA (FA S a) b) c -> FA (FA a c) (FA b c)
 549   | _ -> t                                                       _ -> t
 550
 551                                                                is_redex :: Term -> Bool
 552 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
 553
 554                                                                reduce2 :: Term -> Term
 555 let rec reduce2 (t:term):term = match t with                   reduce2 t = case t of
 556     I -> I                                                       I -> I
 557   | K -> K                                                       K -> K
 558   | S -> S                                                       S -> S
 559   | FA (a, b) ->                                                 FA a b ->
 560       let t' = FA (reduce2 a, reduce2 b) in                        let t' = FA (reduce2 a) (reduce2 b) in
 561         if (is_redex t') then reduce2 (reduce_one_step t')           if (is_redex t') then reduce2 (reduce_one_step t')
 562                          else t'                                                      else t'
 563 </pre>
 564
 565 There are some differences in the way types are made explicit, and in
 566 the way terms are specified (`FA(a,b)` for Ocaml versus `FA a b` for
 567 Haskell).  But the two programs are essentially identical.
 568
 569 Yet the Haskell program finds the normal form for KI -->
 570
 571 [[!toc levels=2]]
 572
 573 # Reasoning about evaluation order in Combinatory Logic
 574
 575 We've discussed evaluation order before, primarily in connection with
 576 the untyped lambda calculus.  Whenever a term has more than one redex,
 577 we have to choose which one to reduce, and this choice can make a
 578 difference.  For instance, in the term `((\x.I)(ωω)`, if we reduce the
 579 leftmost redex first, the term reduces to the normal form `I` in one
 580 step.  But if we reduce the left most redex instead (namely, `(ωω)`),
 581 we do not arrive at a normal form, and are in danger of entering an
 582 infinite loop.
 583
 584 Thanks to the introduction of sum types (disjoint union), we are now
 585 in a position to gain a deeper understanding by writing a program that
 586 allows us to experiment with different evaluation order strategies.
 587
 588 One thing we'll see is that it is all too easy for the evaluation
 589 order properties of an evaluator to depend on the evaluation order
 590 properties of the programming language in which the evaluator is
 591 written.  We will seek to write an evaluator in which the order of
 592 evaluation is insensitive to the evaluator language.  The goal is to
 593 find an order-insensitive way to reason about evaluation order.
 594
 595 The first evaluator we develop will evaluate terms in Combinatory
 596 Logic.  That will significantly simplify the program, since we won't
 597 need to worry about variables or substitution.  As we develop and
 598 extend our evaluator in future weeks, we'll switch to lambdas, but for
 599 now, working with the elegant simplicity of Combinatory Logic will
 600 make the evaluation order issues easier to grasp.
 601
 602 A brief review: a term in CL is the combination of three basic
 603 expressions, `S`, `K`, and `I`, governed by the following reduction
 604 rules:
 605
 606     Ia ~~> a
 607     Kab ~~> b
 608     Sabc ~~> ac(bc)
 609
 610 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
 611 that how to embed the untyped lambda calculus in CL, so it's no
 612 surprise that the same evaluation order issues arise in CL.
 613
 614     skomega = SII(SII)
 615             ~~> I(SII)(I(SII))
 616             ~~> SII(SII)
 617
 618 Instead of the lambda term `Ω`, we'll use the CL term skomega, defined
 619 here, though we'll use the same symbol, `Ω`.  If we consider the term
 620
 621     KIΩ == KI(SII(SII))
 622
 623 we can choose to reduce the leftmost redex by firing the reduction
 624 rule for `K`, in which case the term reduces to the normal form `I` in
 625 one step; or we can choose to reduce the skomega part, by firing the
 626 reduction rule fo `S`, in which case we do not get a normal form,
 627 we're headed towards an infinite loop.
 628
 629 With sum types, we can define terms in CL in OCaml as follows:
 630
 631     type term = I | S | K | FA of (term * term)
 632
 633     let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))
 634     let test1 = FA (FA (K,I), skomega)
 635
 636 This recursive type definition says that a term in CL is either one of
 637 the three simple expressions, or else a pair of CL expressions.
 638 Following Heim and Kratzer, `FA` stands for Functional Application.
 639 With this type definition, we can encode skomega, as well as other
 640 terms whose reduction behavior we want to control.
 641
 642 Using pattern matching, it is easy to code the one-step reduction
 643 rules for CL:
 644
 645     let reduce_one_step (t:term):term = match t with
 646         FA(I,a) -> a
 647       | FA(FA(K,a),b) -> a
 648       | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))
 649       | _ -> t
 650
 651     # reduce_one_step (FA(FA(K,S),I));;
 652     - : term = S
 653     # reduce_one_step skomega;;
 654     - : term = FA (FA (I, FA (FA (S, I), I)), FA (I, FA (FA (S, I), I)))
 655
 656 The need to explicitly insert the type constructor `FA` obscures
 657 things a bit, but it's still possible to see how the one-step
 658 reduction function is just the reduction rules for CL.  The
 659 OCaml interpreter shows us that the function faithfully recognizes
 660 that `KSI ~~> S`, and `skomega ~~> I(SII)(I(SII))`.
 661
 662 We can now say precisely what it means to be a redex in CL.
 663
 664     let is_redex (t:term):bool = not (t = reduce_one_step t)
 665
 666     # is_redex K;;
 667     - : bool = false
 668     # is_redex (FA(K,I));;
 669     - : bool = false
 670     # is_redex (FA(FA(K,I),S));;
 671     - : bool = true
 672     # is_redex skomega;;
 673     - : book = true
 674
 675 Warning: this definition relies on the fact that the one-step
 676 reduction of a CL term is never identical to the original term.  This
 677 would not work for the untyped lambda calculus, since
 678 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.  Note that in
 679 order to decide whether two terms are equal, OCaml has to recursively
 680 compare the elements of complex CL terms.  It is able to figure out
 681 how to do this because we provided an explicit definition of the
 682 datatype Term.
 683
 684 As you would expect, a term in CL is in normal form when it contains
 685 no redexes.
 686
 687 In order to fully reduce a term, we need to be able to reduce redexes
 688 that are not at the top level of the term.
 689
 690     (KKI)SI ~~> KSI ~~> S
 691
 692 That is, we want to be able to first evaluate the redex `KKI` that is
 693 a proper subpart of the larger term to produce a new intermediate term
 694 that we can then evaluate to the final normal form.
 695
 696 Because we need to process subparts, and because the result after
 697 processing a subpart may require further processing, the recursive
 698 structure of our evaluation function has to be quite subtle.  To truly
 699 understand it, you will need to do some sophisticated thinking about
 700 how recursion works.  We'll show you how to keep track of what is
 701 going on by constructing an recursive execution trace of inputs and
 702 outputs.
 703
 704 We'll develop our full reduction function in stages.  Once we have it
 705 working, we'll then consider some variants.  Just to be precise, we'll
 706 distinguish each microvariant with its own index number embedded in
 707 its name.
 708
 709     let rec reduce1 (t:term):term =
 710       if (is_redex t) then reduce1 (reduce_one_step t)
 711                       else t
 712
 713 If the input is a redex, we ship it off to `reduce_one_step` for
 714 processing.  But just in case the result of the one-step reduction is
 715 itself a redex, we recursively call `reduce1`.  The recursion will
 716 continue until the result is no longer a redex.
 717
 718     #trace reduce1;;
 719     reduce1 is now traced.
 720     # reduce1 (FA (I, FA (I, K)));;
 721     reduce1 <-- FA (I, FA (I, K))
 722       reduce1 <-- FA (I, K)
 723         reduce1 <-- K
 724         reduce1 --> K
 725       reduce1 --> K
 726     reduce1 --> K
 727     - : term = K
 728
 729 Since the initial input (`I(IK)`) is a redex, the result after the
 730 one-step reduction is `IK`.  We recursively call `reduce1` on this
 731 input.  Since `IK` is itself a redex, the result after one-step
 732 reduction is `K`.  We recursively call `reduce1` on this input.  Since
 733 `K` is not a redex, the recursion bottoms out, and we simply return
 734 it.
 735
 736 But this function doesn't do enough reduction.
 737
 738     # reduce1 (FA (FA (I, I), K));;
 739     - : term = FA (FA (I, I), K)
 740
 741 Because the top-level term is not a redex, `reduce1` returns it
 742 without any evaluation.  What we want is to evaluate the subparts of a
 743 complex term.
 744
 745     let rec reduce2 (t:term):term = match t with
 746         I -> I
 747       | K -> K
 748       | S -> S
 749       | FA (a, b) ->
 750           let t' = FA (reduce2 a, reduce2 b) in
 751             if (is_redex t') then reduce2 (reduce_one_step t')
 752                              else t'
 753
 754 Since we need access to the subterms, we do pattern matching on the
 755 input term.  If the input is simple, we return it.  If the input is
 756 complex, we first process the subexpressions, and only then see if we
 757 have a redex.  To understand how this works, follow the trace
 758 carefully:
 759
 760     # reduce2 (FA(FA(I,I),K));;
 761     reduce2 <-- FA (FA (I, I), K)
 762
 763       reduce2 <-- K          ; first main recursive call
 764       reduce2 --> K
 765
 766       reduce2 <-- FA (I, I)  ; second main recursive call
 767         reduce2 <-- I
 768         reduce2 --> I
 769         reduce2 <-- I
 770         reduce2 --> I
 771       reduce2 <-- I
 772
 773       reduce2 --> I           ; third main recursive call
 774       reduce2 --> I
 775
 776       reduce2 <-- K           ; fourth
 777       reduce2 --> K
 778     reduce2 --> K
 779     - : term = K
 780
 781 Ok, there's a lot going on here.  Since the input is complex, the
 782 first thing the function does is construct `t'`.  In order to do this,
 783 it must reduce the two main subexpressions, `II` and `K`.  But we see
 784 from the trace that it begins with the right-hand expression, `K`.  We
 785 didn't explicitly tell it to begin with the right-hand subexpression,
 786 so control over evaluation order is starting to spin out of our
 787 control.  (We'll get it back later, don't worry.)
 788
 789 In any case, in the second main recursive call, we evaluate `II`.  The
 790 result is `I`.  The third main recursive call tests whether this
 791 result needs any further processing; it doesn't.
 792
 793 At this point, we have constructed `t' == FA(I,K)`.  Since that's a
 794 redex, we ship it off to reduce_one_step, getting the term `K` as a
 795 result.  The fourth recursive call checks that there is no more
 796 reduction work to be done (there isn't), and that's our final result.
 797
 798 You can see in more detail what is going on by tracing both reduce2
 799 and reduce_one_step, but that makes for some long traces.
 800
 801 So we've solved our first problem: reduce2 recognizes that `IIK ~~>
 802 K`, as desired.
 803
 804 Because the OCaml interpreter evaluates the rightmost expression
 805 in the course of building `t'`, however, it will always evaluate the
 806 right hand subexpression, whether it needs to or not.  And sure
 807 enough,
 808
 809     # reduce2 (FA(FA(K,I),skomega));;
 810       C-c C-cInterrupted.
 811
 812 instead of performing the leftmost reduction first, and recognizing
 813 that this term reduces to the normal form `I`, we get lost endlessly
 814 trying to reduce skomega.
 815
 816 To emphasize that our evaluation order here is at the mercy of the
 817 evaluation order of OCaml, here is the exact same program translated
 818 into Haskell.  We'll put them side by side to emphasize the exact parallel.
 819
 820 <pre>
 821 OCaml                                                          Haskell
 822 ==========================================================     =========================================================
 823
 824 type term = I | S | K | FA of (term * term)                    data Term = I | S | K | FA Term Term deriving (Eq, Show)
 825
 826 let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))          skomega = (FA (FA (FA S I) I) (FA (FA S I) I))
 827
 828                                                                reduce_one_step :: Term -> Term
 829 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
 830     FA(I,a) -> a                                                 FA I a -> a
 831   | FA(FA(K,a),b) -> a                                           FA (FA K a) b -> a
 832   | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))                   FA (FA (FA S a) b) c -> FA (FA a c) (FA b c)
 833   | _ -> t                                                       _ -> t
 834
 835                                                                is_redex :: Term -> Bool
 836 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
 837
 838                                                                reduce2 :: Term -> Term
 839 let rec reduce2 (t:term):term = match t with                   reduce2 t = case t of
 840     I -> I                                                       I -> I
 841   | K -> K                                                       K -> K
 842   | S -> S                                                       S -> S
 843   | FA (a, b) ->                                                 FA a b ->
 844       let t' = FA (reduce2 a, reduce2 b) in                        let t' = FA (reduce2 a) (reduce2 b) in
 845         if (is_redex t') then reduce2 (reduce_one_step t')           if (is_redex t') then reduce2 (reduce_one_step t')
 846                          else t'                                                      else t'
 847 </pre>
 848
 849 There are some differences in the way types are made explicit, and in
 850 the way terms are specified (`FA(a,b)` for Ocaml versus `FA a b` for
 851 Haskell).  But the two programs are essentially identical.
 852
 853 Yet the Haskell program finds the normal form for KI -->
 854
 855 [[!toc levels=2]]
 856
 857 # Reasoning about evaluation order in Combinatory Logic
 858
 859 We've discussed evaluation order before, primarily in connection with
 860 the untyped lambda calculus.  Whenever a term has more than one redex,
 861 we have to choose which one to reduce, and this choice can make a
 862 difference.  For instance, in the term `((\x.I)(ωω)`, if we reduce the
 863 leftmost redex first, the term reduces to the normal form `I` in one
 864 step.  But if we reduce the left most redex instead (namely, `(ωω)`),
 865 we do not arrive at a normal form, and are in danger of entering an
 866 infinite loop.
 867
 868 Thanks to the introduction of sum types (disjoint union), we are now
 869 in a position to gain a deeper understanding by writing a program that
 870 allows us to experiment with different evaluation order strategies.
 871
 872 One thing we'll see is that it is all too easy for the evaluation
 873 order properties of an evaluator to depend on the evaluation order
 874 properties of the programming language in which the evaluator is
 875 written.  We will seek to write an evaluator in which the order of
 876 evaluation is insensitive to the evaluator language.  The goal is to
 877 find an order-insensitive way to reason about evaluation order.
 878
 879 The first evaluator we develop will evaluate terms in Combinatory
 880 Logic.  That will significantly simplify the program, since we won't
 881 need to worry about variables or substitution.  As we develop and
 882 extend our evaluator in future weeks, we'll switch to lambdas, but for
 883 now, working with the elegant simplicity of Combinatory Logic will
 884 make the evaluation order issues easier to grasp.
 885
 886 A brief review: a term in CL is the combination of three basic
 887 expressions, `S`, `K`, and `I`, governed by the following reduction
 888 rules:
 889
 890     Ia ~~> a
 891     Kab ~~> b
 892     Sabc ~~> ac(bc)
 893
 894 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
 895 that how to embed the untyped lambda calculus in CL, so it's no
 896 surprise that the same evaluation order issues arise in CL.
 897
 898     skomega = SII(SII)
 899             ~~> I(SII)(I(SII))
 900             ~~> SII(SII)
 901
 902 Instead of the lambda term `Ω`, we'll use the CL term skomega, defined
 903 here, though we'll use the same symbol, `Ω`.  If we consider the term
 904
 905     KIΩ == KI(SII(SII))
 906
 907 we can choose to reduce the leftmost redex by firing the reduction
 908 rule for `K`, in which case the term reduces to the normal form `I` in
 909 one step; or we can choose to reduce the skomega part, by firing the
 910 reduction rule fo `S`, in which case we do not get a normal form,
 911 we're headed towards an infinite loop.
 912
 913 With sum types, we can define terms in CL in OCaml as follows:
 914
 915     type term = I | S | K | FA of (term * term)
 916
 917     let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))
 918     let test1 = FA (FA (K,I), skomega)
 919
 920 This recursive type definition says that a term in CL is either one of
 921 the three simple expressions, or else a pair of CL expressions.
 922 Following Heim and Kratzer, `FA` stands for Functional Application.
 923 With this type definition, we can encode skomega, as well as other
 924 terms whose reduction behavior we want to control.
 925
 926 Using pattern matching, it is easy to code the one-step reduction
 927 rules for CL:
 928
 929     let reduce_one_step (t:term):term = match t with
 930         FA(I,a) -> a
 931       | FA(FA(K,a),b) -> a
 932       | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))
 933       | _ -> t
 934
 935     # reduce_one_step (FA(FA(K,S),I));;
 936     - : term = S
 937     # reduce_one_step skomega;;
 938     - : term = FA (FA (I, FA (FA (S, I), I)), FA (I, FA (FA (S, I), I)))
 939
 940 The need to explicitly insert the type constructor `FA` obscures
 941 things a bit, but it's still possible to see how the one-step
 942 reduction function is just the reduction rules for CL.  The
 943 OCaml interpreter shows us that the function faithfully recognizes
 944 that `KSI ~~> S`, and `skomega ~~> I(SII)(I(SII))`.
 945
 946 We can now say precisely what it means to be a redex in CL.
 947
 948     let is_redex (t:term):bool = not (t = reduce_one_step t)
 949
 950     # is_redex K;;
 951     - : bool = false
 952     # is_redex (FA(K,I));;
 953     - : bool = false
 954     # is_redex (FA(FA(K,I),S));;
 955     - : bool = true
 956     # is_redex skomega;;
 957     - : book = true
 958
 959 Warning: this definition relies on the fact that the one-step
 960 reduction of a CL term is never identical to the original term.  This
 961 would not work for the untyped lambda calculus, since
 962 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.  Note that in
 963 order to decide whether two terms are equal, OCaml has to recursively
 964 compare the elements of complex CL terms.  It is able to figure out
 965 how to do this because we provided an explicit definition of the
 966 datatype Term.
 967
 968 As you would expect, a term in CL is in normal form when it contains
 969 no redexes.
 970
 971 In order to fully reduce a term, we need to be able to reduce redexes
 972 that are not at the top level of the term.
 973
 974     (KKI)SI ~~> KSI ~~> S
 975
 976 That is, we want to be able to first evaluate the redex `KKI` that is
 977 a proper subpart of the larger term to produce a new intermediate term
 978 that we can then evaluate to the final normal form.
 979
 980 Because we need to process subparts, and because the result after
 981 processing a subpart may require further processing, the recursive
 982 structure of our evaluation function has to be quite subtle.  To truly
 983 understand it, you will need to do some sophisticated thinking about
 984 how recursion works.  We'll show you how to keep track of what is
 985 going on by constructing an recursive execution trace of inputs and
 986 outputs.
 987
 988 We'll develop our full reduction function in stages.  Once we have it
 989 working, we'll then consider some variants.  Just to be precise, we'll
 990 distinguish each microvariant with its own index number embedded in
 991 its name.
 992
 993     let rec reduce1 (t:term):term =
 994       if (is_redex t) then reduce1 (reduce_one_step t)
 995                       else t
 996
 997 If the input is a redex, we ship it off to `reduce_one_step` for
 998 processing.  But just in case the result of the one-step reduction is
 999 itself a redex, we recursively call `reduce1`.  The recursion will
1000 continue until the result is no longer a redex.
1001
1002     #trace reduce1;;
1003     reduce1 is now traced.
1004     # reduce1 (FA (I, FA (I, K)));;
1005     reduce1 <-- FA (I, FA (I, K))
1006       reduce1 <-- FA (I, K)
1007         reduce1 <-- K
1008         reduce1 --> K
1009       reduce1 --> K
1010     reduce1 --> K
1011     - : term = K
1012
1013 Since the initial input (`I(IK)`) is a redex, the result after the
1014 one-step reduction is `IK`.  We recursively call `reduce1` on this
1015 input.  Since `IK` is itself a redex, the result after one-step
1016 reduction is `K`.  We recursively call `reduce1` on this input.  Since
1017 `K` is not a redex, the recursion bottoms out, and we simply return
1018 it.
1019
1020 But this function doesn't do enough reduction.
1021
1022     # reduce1 (FA (FA (I, I), K));;
1023     - : term = FA (FA (I, I), K)
1024
1025 Because the top-level term is not a redex, `reduce1` returns it
1026 without any evaluation.  What we want is to evaluate the subparts of a
1027 complex term.
1028
1029     let rec reduce2 (t:term):term = match t with
1030         I -> I
1031       | K -> K
1032       | S -> S
1033       | FA (a, b) ->
1034           let t' = FA (reduce2 a, reduce2 b) in
1035             if (is_redex t') then reduce2 (reduce_one_step t')
1036                              else t'
1037
1038 Since we need access to the subterms, we do pattern matching on the
1039 input term.  If the input is simple, we return it.  If the input is
1040 complex, we first process the subexpressions, and only then see if we
1041 have a redex.  To understand how this works, follow the trace
1042 carefully:
1043
1044     # reduce2 (FA(FA(I,I),K));;
1045     reduce2 <-- FA (FA (I, I), K)
1046
1047       reduce2 <-- K          ; first main recursive call
1048       reduce2 --> K
1049
1050       reduce2 <-- FA (I, I)  ; second main recursive call
1051         reduce2 <-- I
1052         reduce2 --> I
1053         reduce2 <-- I
1054         reduce2 --> I
1055       reduce2 <-- I
1056
1057       reduce2 --> I           ; third main recursive call
1058       reduce2 --> I
1059
1060       reduce2 <-- K           ; fourth
1061       reduce2 --> K
1062     reduce2 --> K
1063     - : term = K
1064
1065 Ok, there's a lot going on here.  Since the input is complex, the
1066 first thing the function does is construct `t'`.  In order to do this,
1067 it must reduce the two main subexpressions, `II` and `K`.  But we see
1068 from the trace that it begins with the right-hand expression, `K`.  We
1069 didn't explicitly tell it to begin with the right-hand subexpression,
1070 so control over evaluation order is starting to spin out of our
1071 control.  (We'll get it back later, don't worry.)
1072
1073 In any case, in the second main recursive call, we evaluate `II`.  The
1074 result is `I`.  The third main recursive call tests whether this
1075 result needs any further processing; it doesn't.
1076
1077 At this point, we have constructed `t' == FA(I,K)`.  Since that's a
1078 redex, we ship it off to reduce_one_step, getting the term `K` as a
1079 result.  The fourth recursive call checks that there is no more
1080 reduction work to be done (there isn't), and that's our final result.
1081
1082 You can see in more detail what is going on by tracing both reduce2
1083 and reduce_one_step, but that makes for some long traces.
1084
1085 So we've solved our first problem: reduce2 recognizes that `IIK ~~>
1086 K`, as desired.
1087
1088 Because the OCaml interpreter evaluates the rightmost expression
1089 in the course of building `t'`, however, it will always evaluate the
1090 right hand subexpression, whether it needs to or not.  And sure
1091 enough,
1092
1093     # reduce2 (FA(FA(K,I),skomega));;
1094       C-c C-cInterrupted.
1095
1096 instead of performing the leftmost reduction first, and recognizing
1097 that this term reduces to the normal form `I`, we get lost endlessly
1098 trying to reduce skomega.
1099
1100 To emphasize that our evaluation order here is at the mercy of the
1101 evaluation order of OCaml, here is the exact same program translated
1102 into Haskell.  We'll put them side by side to emphasize the exact parallel.
1103
1104 <pre>
1105 OCaml                                                          Haskell
1106 ==========================================================     =========================================================
1107
1108 type term = I | S | K | FA of (term * term)                    data Term = I | S | K | FA Term Term deriving (Eq, Show)
1109
1110 let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))          skomega = (FA (FA (FA S I) I) (FA (FA S I) I))
1111
1112                                                                reduce_one_step :: Term -> Term
1113 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
1114     FA(I,a) -> a                                                 FA I a -> a
1115   | FA(FA(K,a),b) -> a                                           FA (FA K a) b -> a
1116   | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))                   FA (FA (FA S a) b) c -> FA (FA a c) (FA b c)
1117   | _ -> t                                                       _ -> t
1118
1119                                                                is_redex :: Term -> Bool
1120 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
1121
1122                                                                reduce2 :: Term -> Term
1123 let rec reduce2 (t:term):term = match t with                   reduce2 t = case t of
1124     I -> I                                                       I -> I
1125   | K -> K                                                       K -> K
1126   | S -> S                                                       S -> S
1127   | FA (a, b) ->                                                 FA a b ->
1128       let t' = FA (reduce2 a, reduce2 b) in                        let t' = FA (reduce2 a) (reduce2 b) in
1129         if (is_redex t') then reduce2 (reduce_one_step t')           if (is_redex t') then reduce2 (reduce_one_step t')
1130                          else t'                                                      else t'
1131 </pre>
1132
1133 There are some differences in the way types are made explicit, and in
1134 the way terms are specified (`FA(a,b)` for Ocaml versus `FA a b` for
1135 Haskell).  But the two programs are essentially identical.
1136
1137 Yet the Haskell program finds the normal form for KI -->
1138
1139 [[!toc levels=2]]
1140
1141 # Reasoning about evaluation order in Combinatory Logic
1142
1143 We've discussed evaluation order before, primarily in connection with
1144 the untyped lambda calculus.  Whenever a term has more than one redex,
1145 we have to choose which one to reduce, and this choice can make a
1146 difference.  For instance, in the term `((\x.I)(ωω)`, if we reduce the
1147 leftmost redex first, the term reduces to the normal form `I` in one
1148 step.  But if we reduce the left most redex instead (namely, `(ωω)`),
1149 we do not arrive at a normal form, and are in danger of entering an
1150 infinite loop.
1151
1152 Thanks to the introduction of sum types (disjoint union), we are now
1153 in a position to gain a deeper understanding by writing a program that
1154 allows us to experiment with different evaluation order strategies.
1155
1156 One thing we'll see is that it is all too easy for the evaluation
1157 order properties of an evaluator to depend on the evaluation order
1158 properties of the programming language in which the evaluator is
1159 written.  We will seek to write an evaluator in which the order of
1160 evaluation is insensitive to the evaluator language.  The goal is to
1161 find an order-insensitive way to reason about evaluation order.
1162
1163 The first evaluator we develop will evaluate terms in Combinatory
1164 Logic.  That will significantly simplify the program, since we won't
1165 need to worry about variables or substitution.  As we develop and
1166 extend our evaluator in future weeks, we'll switch to lambdas, but for
1167 now, working with the elegant simplicity of Combinatory Logic will
1168 make the evaluation order issues easier to grasp.
1169
1170 A brief review: a term in CL is the combination of three basic
1171 expressions, `S`, `K`, and `I`, governed by the following reduction
1172 rules:
1173
1174     Ia ~~> a
1175     Kab ~~> b
1176     Sabc ~~> ac(bc)
1177
1178 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
1179 that how to embed the untyped lambda calculus in CL, so it's no
1180 surprise that the same evaluation order issues arise in CL.
1181
1182     skomega = SII(SII)
1183             ~~> I(SII)(I(SII))
1184             ~~> SII(SII)
1185
1186 Instead of the lambda term `Ω`, we'll use the CL term skomega, defined
1187 here, though we'll use the same symbol, `Ω`.  If we consider the term
1188
1189     KIΩ == KI(SII(SII))
1190
1191 we can choose to reduce the leftmost redex by firing the reduction
1192 rule for `K`, in which case the term reduces to the normal form `I` in
1193 one step; or we can choose to reduce the skomega part, by firing the
1194 reduction rule fo `S`, in which case we do not get a normal form,
1195 we're headed towards an infinite loop.
1196
1197 With sum types, we can define terms in CL in OCaml as follows:
1198
1199     type term = I | S | K | FA of (term * term)
1200
1201     let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))
1202     let test1 = FA (FA (K,I), skomega)
1203
1204 This recursive type definition says that a term in CL is either one of
1205 the three simple expressions, or else a pair of CL expressions.
1206 Following Heim and Kratzer, `FA` stands for Functional Application.
1207 With this type definition, we can encode skomega, as well as other
1208 terms whose reduction behavior we want to control.
1209
1210 Using pattern matching, it is easy to code the one-step reduction
1211 rules for CL:
1212
1213     let reduce_one_step (t:term):term = match t with
1214         FA(I,a) -> a
1215       | FA(FA(K,a),b) -> a
1216       | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))
1217       | _ -> t
1218
1219     # reduce_one_step (FA(FA(K,S),I));;
1220     - : term = S
1221     # reduce_one_step skomega;;
1222     - : term = FA (FA (I, FA (FA (S, I), I)), FA (I, FA (FA (S, I), I)))
1223
1224 The need to explicitly insert the type constructor `FA` obscures
1225 things a bit, but it's still possible to see how the one-step
1226 reduction function is just the reduction rules for CL.  The
1227 OCaml interpreter shows us that the function faithfully recognizes
1228 that `KSI ~~> S`, and `skomega ~~> I(SII)(I(SII))`.
1229
1230 We can now say precisely what it means to be a redex in CL.
1231
1232     let is_redex (t:term):bool = not (t = reduce_one_step t)
1233
1234     # is_redex K;;
1235     - : bool = false
1236     # is_redex (FA(K,I));;
1237     - : bool = false
1238     # is_redex (FA(FA(K,I),S));;
1239     - : bool = true
1240     # is_redex skomega;;
1241     - : book = true
1242
1243 Warning: this definition relies on the fact that the one-step
1244 reduction of a CL term is never identical to the original term.  This
1245 would not work for the untyped lambda calculus, since
1246 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.  Note that in
1247 order to decide whether two terms are equal, OCaml has to recursively
1248 compare the elements of complex CL terms.  It is able to figure out
1249 how to do this because we provided an explicit definition of the
1250 datatype Term.
1251
1252 As you would expect, a term in CL is in normal form when it contains
1253 no redexes.
1254
1255 In order to fully reduce a term, we need to be able to reduce redexes
1256 that are not at the top level of the term.
1257
1258     (KKI)SI ~~> KSI ~~> S
1259
1260 That is, we want to be able to first evaluate the redex `KKI` that is
1261 a proper subpart of the larger term to produce a new intermediate term
1262 that we can then evaluate to the final normal form.
1263
1264 Because we need to process subparts, and because the result after
1265 processing a subpart may require further processing, the recursive
1266 structure of our evaluation function has to be quite subtle.  To truly
1267 understand it, you will need to do some sophisticated thinking about
1268 how recursion works.  We'll show you how to keep track of what is
1269 going on by constructing an recursive execution trace of inputs and
1270 outputs.
1271
1272 We'll develop our full reduction function in stages.  Once we have it
1273 working, we'll then consider some variants.  Just to be precise, we'll
1274 distinguish each microvariant with its own index number embedded in
1275 its name.
1276
1277     let rec reduce1 (t:term):term =
1278       if (is_redex t) then reduce1 (reduce_one_step t)
1279                       else t
1280
1281 If the input is a redex, we ship it off to `reduce_one_step` for
1282 processing.  But just in case the result of the one-step reduction is
1283 itself a redex, we recursively call `reduce1`.  The recursion will
1284 continue until the result is no longer a redex.
1285
1286     #trace reduce1;;
1287     reduce1 is now traced.
1288     # reduce1 (FA (I, FA (I, K)));;
1289     reduce1 <-- FA (I, FA (I, K))
1290       reduce1 <-- FA (I, K)
1291         reduce1 <-- K
1292         reduce1 --> K
1293       reduce1 --> K
1294     reduce1 --> K
1295     - : term = K
1296
1297 Since the initial input (`I(IK)`) is a redex, the result after the
1298 one-step reduction is `IK`.  We recursively call `reduce1` on this
1299 input.  Since `IK` is itself a redex, the result after one-step
1300 reduction is `K`.  We recursively call `reduce1` on this input.  Since
1301 `K` is not a redex, the recursion bottoms out, and we simply return
1302 it.
1303
1304 But this function doesn't do enough reduction.
1305
1306     # reduce1 (FA (FA (I, I), K));;
1307     - : term = FA (FA (I, I), K)
1308
1309 Because the top-level term is not a redex, `reduce1` returns it
1310 without any evaluation.  What we want is to evaluate the subparts of a
1311 complex term.
1312
1313     let rec reduce2 (t:term):term = match t with
1314         I -> I
1315       | K -> K
1316       | S -> S
1317       | FA (a, b) ->
1318           let t' = FA (reduce2 a, reduce2 b) in
1319             if (is_redex t') then reduce2 (reduce_one_step t')
1320                              else t'
1321
1322 Since we need access to the subterms, we do pattern matching on the
1323 input term.  If the input is simple, we return it.  If the input is
1324 complex, we first process the subexpressions, and only then see if we
1325 have a redex.  To understand how this works, follow the trace
1326 carefully:
1327
1328     # reduce2 (FA(FA(I,I),K));;
1329     reduce2 <-- FA (FA (I, I), K)
1330
1331       reduce2 <-- K          ; first main recursive call
1332       reduce2 --> K
1333
1334       reduce2 <-- FA (I, I)  ; second main recursive call
1335         reduce2 <-- I
1336         reduce2 --> I
1337         reduce2 <-- I
1338         reduce2 --> I
1339       reduce2 <-- I
1340
1341       reduce2 --> I           ; third main recursive call
1342       reduce2 --> I
1343
1344       reduce2 <-- K           ; fourth
1345       reduce2 --> K
1346     reduce2 --> K
1347     - : term = K
1348
1349 Ok, there's a lot going on here.  Since the input is complex, the
1350 first thing the function does is construct `t'`.  In order to do this,
1351 it must reduce the two main subexpressions, `II` and `K`.  But we see
1352 from the trace that it begins with the right-hand expression, `K`.  We
1353 didn't explicitly tell it to begin with the right-hand subexpression,
1354 so control over evaluation order is starting to spin out of our
1355 control.  (We'll get it back later, don't worry.)
1356
1357 In any case, in the second main recursive call, we evaluate `II`.  The
1358 result is `I`.  The third main recursive call tests whether this
1359 result needs any further processing; it doesn't.
1360
1361 At this point, we have constructed `t' == FA(I,K)`.  Since that's a
1362 redex, we ship it off to reduce_one_step, getting the term `K` as a
1363 result.  The fourth recursive call checks that there is no more
1364 reduction work to be done (there isn't), and that's our final result.
1365
1366 You can see in more detail what is going on by tracing both reduce2
1367 and reduce_one_step, but that makes for some long traces.
1368
1369 So we've solved our first problem: reduce2 recognizes that `IIK ~~>
1370 K`, as desired.
1371
1372 Because the OCaml interpreter evaluates the rightmost expression
1373 in the course of building `t'`, however, it will always evaluate the
1374 right hand subexpression, whether it needs to or not.  And sure
1375 enough,
1376
1377     # reduce2 (FA(FA(K,I),skomega));;
1378       C-c C-cInterrupted.
1379
1380 instead of performing the leftmost reduction first, and recognizing
1381 that this term reduces to the normal form `I`, we get lost endlessly
1382 trying to reduce skomega.
1383
1384 To emphasize that our evaluation order here is at the mercy of the
1385 evaluation order of OCaml, here is the exact same program translated
1386 into Haskell.  We'll put them side by side to emphasize the exact parallel.
1387
1388 <pre>
1389 OCaml                                                          Haskell
1390 ==========================================================     =========================================================
1391
1392 type term = I | S | K | FA of (term * term)                    data Term = I | S | K | FA Term Term deriving (Eq, Show)
1393
1394 let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))          skomega = (FA (FA (FA S I) I) (FA (FA S I) I))
1395
1396                                                                reduce_one_step :: Term -> Term
1397 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
1398     FA(I,a) -> a                                                 FA I a -> a
1399   | FA(FA(K,a),b) -> a                                           FA (FA K a) b -> a
1400   | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))                   FA (FA (FA S a) b) c -> FA (FA a c) (FA b c)
1401   | _ -> t                                                       _ -> t
1402
1403                                                                is_redex :: Term -> Bool
1404 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
1405
1406                                                                reduce2 :: Term -> Term
1407 let rec reduce2 (t:term):term = match t with                   reduce2 t = case t of
1408     I -> I                                                       I -> I
1409   | K -> K                                                       K -> K
1410   | S -> S                                                       S -> S
1411   | FA (a, b) ->                                                 FA a b ->
1412       let t' = FA (reduce2 a, reduce2 b) in                        let t' = FA (reduce2 a) (reduce2 b) in
1413         if (is_redex t') then reduce2 (reduce_one_step t')           if (is_redex t') then reduce2 (reduce_one_step t')
1414                          else t'                                                      else t'
1415 </pre>
1416
1417 There are some differences in the way types are made explicit, and in
1418 the way terms are specified (`FA(a,b)` for Ocaml versus `FA a b` for
1419 Haskell).  But the two programs are essentially identical.
1420
1421 Yet the Haskell program finds the normal form for KI -->
1422
1423 [[!toc levels=2]]
1424
1425 # Reasoning about evaluation order in Combinatory Logic
1426
1427 We've discussed evaluation order before, primarily in connection with
1428 the untyped lambda calculus.  Whenever a term has more than one redex,
1429 we have to choose which one to reduce, and this choice can make a
1430 difference.  For instance, in the term `((\x.I)(ωω)`, if we reduce the
1431 leftmost redex first, the term reduces to the normal form `I` in one
1432 step.  But if we reduce the left most redex instead (namely, `(ωω)`),
1433 we do not arrive at a normal form, and are in danger of entering an
1434 infinite loop.
1435
1436 Thanks to the introduction of sum types (disjoint union), we are now
1437 in a position to gain a deeper understanding by writing a program that
1438 allows us to experiment with different evaluation order strategies.
1439
1440 One thing we'll see is that it is all too easy for the evaluation
1441 order properties of an evaluator to depend on the evaluation order
1442 properties of the programming language in which the evaluator is
1443 written.  We will seek to write an evaluator in which the order of
1444 evaluation is insensitive to the evaluator language.  The goal is to
1445 find an order-insensitive way to reason about evaluation order.
1446
1447 The first evaluator we develop will evaluate terms in Combinatory
1448 Logic.  That will significantly simplify the program, since we won't
1449 need to worry about variables or substitution.  As we develop and
1450 extend our evaluator in future weeks, we'll switch to lambdas, but for
1451 now, working with the elegant simplicity of Combinatory Logic will
1452 make the evaluation order issues easier to grasp.
1453
1454 A brief review: a term in CL is the combination of three basic
1455 expressions, `S`, `K`, and `I`, governed by the following reduction
1456 rules:
1457
1458     Ia ~~> a
1459     Kab ~~> b
1460     Sabc ~~> ac(bc)
1461
1462 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
1463 that how to embed the untyped lambda calculus in CL, so it's no
1464 surprise that the same evaluation order issues arise in CL.
1465
1466     skomega = SII(SII)
1467             ~~> I(SII)(I(SII))
1468             ~~> SII(SII)
1469
1470 Instead of the lambda term `Ω`, we'll use the CL term skomega, defined
1471 here, though we'll use the same symbol, `Ω`.  If we consider the term
1472
1473     KIΩ == KI(SII(SII))
1474
1475 we can choose to reduce the leftmost redex by firing the reduction
1476 rule for `K`, in which case the term reduces to the normal form `I` in
1477 one step; or we can choose to reduce the skomega part, by firing the
1478 reduction rule fo `S`, in which case we do not get a normal form,
1479 we're headed towards an infinite loop.
1480
1481 With sum types, we can define terms in CL in OCaml as follows:
1482
1483     type term = I | S | K | FA of (term * term)
1484
1485     let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))
1486     let test1 = FA (FA (K,I), skomega)
1487
1488 This recursive type definition says that a term in CL is either one of
1489 the three simple expressions, or else a pair of CL expressions.
1490 Following Heim and Kratzer, `FA` stands for Functional Application.
1491 With this type definition, we can encode skomega, as well as other
1492 terms whose reduction behavior we want to control.
1493
1494 Using pattern matching, it is easy to code the one-step reduction
1495 rules for CL:
1496
1497     let reduce_one_step (t:term):term = match t with
1498         FA(I,a) -> a
1499       | FA(FA(K,a),b) -> a
1500       | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))
1501       | _ -> t
1502
1503     # reduce_one_step (FA(FA(K,S),I));;
1504     - : term = S
1505     # reduce_one_step skomega;;
1506     - : term = FA (FA (I, FA (FA (S, I), I)), FA (I, FA (FA (S, I), I)))
1507
1508 The need to explicitly insert the type constructor `FA` obscures
1509 things a bit, but it's still possible to see how the one-step
1510 reduction function is just the reduction rules for CL.  The
1511 OCaml interpreter shows us that the function faithfully recognizes
1512 that `KSI ~~> S`, and `skomega ~~> I(SII)(I(SII))`.
1513
1514 We can now say precisely what it means to be a redex in CL.
1515
1516     let is_redex (t:term):bool = not (t = reduce_one_step t)
1517
1518     # is_redex K;;
1519     - : bool = false
1520     # is_redex (FA(K,I));;
1521     - : bool = false
1522     # is_redex (FA(FA(K,I),S));;
1523     - : bool = true
1524     # is_redex skomega;;
1525     - : book = true
1526
1527 Warning: this definition relies on the fact that the one-step
1528 reduction of a CL term is never identical to the original term.  This
1529 would not work for the untyped lambda calculus, since
1530 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.  Note that in
1531 order to decide whether two terms are equal, OCaml has to recursively
1532 compare the elements of complex CL terms.  It is able to figure out
1533 how to do this because we provided an explicit definition of the
1534 datatype Term.
1535
1536 As you would expect, a term in CL is in normal form when it contains
1537 no redexes.
1538
1539 In order to fully reduce a term, we need to be able to reduce redexes
1540 that are not at the top level of the term.
1541
1542     (KKI)SI ~~> KSI ~~> S
1543
1544 That is, we want to be able to first evaluate the redex `KKI` that is
1545 a proper subpart of the larger term to produce a new intermediate term
1546 that we can then evaluate to the final normal form.
1547
1548 Because we need to process subparts, and because the result after
1549 processing a subpart may require further processing, the recursive
1550 structure of our evaluation function has to be quite subtle.  To truly
1551 understand it, you will need to do some sophisticated thinking about
1552 how recursion works.  We'll show you how to keep track of what is
1553 going on by constructing an recursive execution trace of inputs and
1554 outputs.
1555
1556 We'll develop our full reduction function in stages.  Once we have it
1557 working, we'll then consider some variants.  Just to be precise, we'll
1558 distinguish each microvariant with its own index number embedded in
1559 its name.
1560
1561     let rec reduce1 (t:term):term =
1562       if (is_redex t) then reduce1 (reduce_one_step t)
1563                       else t
1564
1565 If the input is a redex, we ship it off to `reduce_one_step` for
1566 processing.  But just in case the result of the one-step reduction is
1567 itself a redex, we recursively call `reduce1`.  The recursion will
1568 continue until the result is no longer a redex.
1569
1570     #trace reduce1;;
1571     reduce1 is now traced.
1572     # reduce1 (FA (I, FA (I, K)));;
1573     reduce1 <-- FA (I, FA (I, K))
1574       reduce1 <-- FA (I, K)
1575         reduce1 <-- K
1576         reduce1 --> K
1577       reduce1 --> K
1578     reduce1 --> K
1579     - : term = K
1580
1581 Since the initial input (`I(IK)`) is a redex, the result after the
1582 one-step reduction is `IK`.  We recursively call `reduce1` on this
1583 input.  Since `IK` is itself a redex, the result after one-step
1584 reduction is `K`.  We recursively call `reduce1` on this input.  Since
1585 `K` is not a redex, the recursion bottoms out, and we simply return
1586 it.
1587
1588 But this function doesn't do enough reduction.
1589
1590     # reduce1 (FA (FA (I, I), K));;
1591     - : term = FA (FA (I, I), K)
1592
1593 Because the top-level term is not a redex, `reduce1` returns it
1594 without any evaluation.  What we want is to evaluate the subparts of a
1595 complex term.
1596
1597     let rec reduce2 (t:term):term = match t with
1598         I -> I
1599       | K -> K
1600       | S -> S
1601       | FA (a, b) ->
1602           let t' = FA (reduce2 a, reduce2 b) in
1603             if (is_redex t') then reduce2 (reduce_one_step t')
1604                              else t'
1605
1606 Since we need access to the subterms, we do pattern matching on the
1607 input term.  If the input is simple, we return it.  If the input is
1608 complex, we first process the subexpressions, and only then see if we
1609 have a redex.  To understand how this works, follow the trace
1610 carefully:
1611
1612     # reduce2 (FA(FA(I,I),K));;
1613     reduce2 <-- FA (FA (I, I), K)
1614
1615       reduce2 <-- K          ; first main recursive call
1616       reduce2 --> K
1617
1618       reduce2 <-- FA (I, I)  ; second main recursive call
1619         reduce2 <-- I
1620         reduce2 --> I
1621         reduce2 <-- I
1622         reduce2 --> I
1623       reduce2 <-- I
1624
1625       reduce2 --> I           ; third main recursive call
1626       reduce2 --> I
1627
1628       reduce2 <-- K           ; fourth
1629       reduce2 --> K
1630     reduce2 --> K
1631     - : term = K
1632
1633 Ok, there's a lot going on here.  Since the input is complex, the
1634 first thing the function does is construct `t'`.  In order to do this,
1635 it must reduce the two main subexpressions, `II` and `K`.  But we see
1636 from the trace that it begins with the right-hand expression, `K`.  We
1637 didn't explicitly tell it to begin with the right-hand subexpression,
1638 so control over evaluation order is starting to spin out of our
1639 control.  (We'll get it back later, don't worry.)
1640
1641 In any case, in the second main recursive call, we evaluate `II`.  The
1642 result is `I`.  The third main recursive call tests whether this
1643 result needs any further processing; it doesn't.
1644
1645 At this point, we have constructed `t' == FA(I,K)`.  Since that's a
1646 redex, we ship it off to reduce_one_step, getting the term `K` as a
1647 result.  The fourth recursive call checks that there is no more
1648 reduction work to be done (there isn't), and that's our final result.
1649
1650 You can see in more detail what is going on by tracing both reduce2
1651 and reduce_one_step, but that makes for some long traces.
1652
1653 So we've solved our first problem: reduce2 recognizes that `IIK ~~>
1654 K`, as desired.
1655
1656 Because the OCaml interpreter evaluates the rightmost expression
1657 in the course of building `t'`, however, it will always evaluate the
1658 right hand subexpression, whether it needs to or not.  And sure
1659 enough,
1660
1661     # reduce2 (FA(FA(K,I),skomega));;
1662       C-c C-cInterrupted.
1663
1664 instead of performing the leftmost reduction first, and recognizing
1665 that this term reduces to the normal form `I`, we get lost endlessly
1666 trying to reduce skomega.
1667
1668 To emphasize that our evaluation order here is at the mercy of the
1669 evaluation order of OCaml, here is the exact same program translated
1670 into Haskell.  We'll put them side by side to emphasize the exact parallel.
1671
1672 <pre>
1673 OCaml                                                          Haskell
1674 ==========================================================     =========================================================
1675
1676 type term = I | S | K | FA of (term * term)                    data Term = I | S | K | FA Term Term deriving (Eq, Show)
1677
1678 let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))          skomega = (FA (FA (FA S I) I) (FA (FA S I) I))
1679
1680                                                                reduce_one_step :: Term -> Term
1681 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
1682     FA(I,a) -> a                                                 FA I a -> a
1683   | FA(FA(K,a),b) -> a                                           FA (FA K a) b -> a
1684   | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))                   FA (FA (FA S a) b) c -> FA (FA a c) (FA b c)
1685   | _ -> t                                                       _ -> t
1686
1687                                                                is_redex :: Term -> Bool
1688 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
1689
1690                                                                reduce2 :: Term -> Term
1691 let rec reduce2 (t:term):term = match t with                   reduce2 t = case t of
1692     I -> I                                                       I -> I
1693   | K -> K                                                       K -> K
1694   | S -> S                                                       S -> S
1695   | FA (a, b) ->                                                 FA a b ->
1696       let t' = FA (reduce2 a, reduce2 b) in                        let t' = FA (reduce2 a) (reduce2 b) in
1697         if (is_redex t') then reduce2 (reduce_one_step t')           if (is_redex t') then reduce2 (reduce_one_step t')
1698                          else t'                                                      else t'
1699 </pre>
1700
1701 There are some differences in the way types are made explicit, and in
1702 the way terms are specified (`FA(a,b)` for Ocaml versus `FA a b` for
1703 Haskell).  But the two programs are essentially identical.
1704
1705 Yet the Haskell program finds the normal form for KI -->
1706
1707 [[!toc levels=2]]
1708
1709 # Reasoning about evaluation order in Combinatory Logic
1710
1711 We've discussed evaluation order before, primarily in connection with
1712 the untyped lambda calculus.  Whenever a term has more than one redex,
1713 we have to choose which one to reduce, and this choice can make a
1714 difference.  For instance, in the term `((\x.I)(ωω)`, if we reduce the
1715 leftmost redex first, the term reduces to the normal form `I` in one
1716 step.  But if we reduce the left most redex instead (namely, `(ωω)`),
1717 we do not arrive at a normal form, and are in danger of entering an
1718 infinite loop.
1719
1720 Thanks to the introduction of sum types (disjoint union), we are now
1721 in a position to gain a deeper understanding by writing a program that
1722 allows us to experiment with different evaluation order strategies.
1723
1724 One thing we'll see is that it is all too easy for the evaluation
1725 order properties of an evaluator to depend on the evaluation order
1726 properties of the programming language in which the evaluator is
1727 written.  We will seek to write an evaluator in which the order of
1728 evaluation is insensitive to the evaluator language.  The goal is to
1729 find an order-insensitive way to reason about evaluation order.
1730
1731 The first evaluator we develop will evaluate terms in Combinatory
1732 Logic.  That will significantly simplify the program, since we won't
1733 need to worry about variables or substitution.  As we develop and
1734 extend our evaluator in future weeks, we'll switch to lambdas, but for
1735 now, working with the elegant simplicity of Combinatory Logic will
1736 make the evaluation order issues easier to grasp.
1737
1738 A brief review: a term in CL is the combination of three basic
1739 expressions, `S`, `K`, and `I`, governed by the following reduction
1740 rules:
1741
1742     Ia ~~> a
1743     Kab ~~> b
1744     Sabc ~~> ac(bc)
1745
1746 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
1747 that how to embed the untyped lambda calculus in CL, so it's no
1748 surprise that the same evaluation order issues arise in CL.
1749
1750     skomega = SII(SII)
1751             ~~> I(SII)(I(SII))
1752             ~~> SII(SII)
1753
1754 Instead of the lambda term `Ω`, we'll use the CL term skomega, defined
1755 here, though we'll use the same symbol, `Ω`.  If we consider the term
1756
1757     KIΩ == KI(SII(SII))
1758
1759 we can choose to reduce the leftmost redex by firing the reduction
1760 rule for `K`, in which case the term reduces to the normal form `I` in
1761 one step; or we can choose to reduce the skomega part, by firing the
1762 reduction rule fo `S`, in which case we do not get a normal form,
1763 we're headed towards an infinite loop.
1764
1765 With sum types, we can define terms in CL in OCaml as follows:
1766
1767     type term = I | S | K | FA of (term * term)
1768
1769     let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))
1770     let test1 = FA (FA (K,I), skomega)
1771
1772 This recursive type definition says that a term in CL is either one of
1773 the three simple expressions, or else a pair of CL expressions.
1774 Following Heim and Kratzer, `FA` stands for Functional Application.
1775 With this type definition, we can encode skomega, as well as other
1776 terms whose reduction behavior we want to control.
1777
1778 Using pattern matching, it is easy to code the one-step reduction
1779 rules for CL:
1780
1781     let reduce_one_step (t:term):term = match t with
1782         FA(I,a) -> a
1783       | FA(FA(K,a),b) -> a
1784       | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))
1785       | _ -> t
1786
1787     # reduce_one_step (FA(FA(K,S),I));;
1788     - : term = S
1789     # reduce_one_step skomega;;
1790     - : term = FA (FA (I, FA (FA (S, I), I)), FA (I, FA (FA (S, I), I)))
1791
1792 The need to explicitly insert the type constructor `FA` obscures
1793 things a bit, but it's still possible to see how the one-step
1794 reduction function is just the reduction rules for CL.  The
1795 OCaml interpreter shows us that the function faithfully recognizes
1796 that `KSI ~~> S`, and `skomega ~~> I(SII)(I(SII))`.
1797
1798 We can now say precisely what it means to be a redex in CL.
1799
1800     let is_redex (t:term):bool = not (t = reduce_one_step t)
1801
1802     # is_redex K;;
1803     - : bool = false
1804     # is_redex (FA(K,I));;
1805     - : bool = false
1806     # is_redex (FA(FA(K,I),S));;
1807     - : bool = true
1808     # is_redex skomega;;
1809     - : book = true
1810
1811 Warning: this definition relies on the fact that the one-step
1812 reduction of a CL term is never identical to the original term.  This
1813 would not work for the untyped lambda calculus, since
1814 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.  Note that in
1815 order to decide whether two terms are equal, OCaml has to recursively
1816 compare the elements of complex CL terms.  It is able to figure out
1817 how to do this because we provided an explicit definition of the
1818 datatype Term.
1819
1820 As you would expect, a term in CL is in normal form when it contains
1821 no redexes.
1822
1823 In order to fully reduce a term, we need to be able to reduce redexes
1824 that are not at the top level of the term.
1825
1826     (KKI)SI ~~> KSI ~~> S
1827
1828 That is, we want to be able to first evaluate the redex `KKI` that is
1829 a proper subpart of the larger term to produce a new intermediate term
1830 that we can then evaluate to the final normal form.
1831
1832 Because we need to process subparts, and because the result after
1833 processing a subpart may require further processing, the recursive
1834 structure of our evaluation function has to be quite subtle.  To truly
1835 understand it, you will need to do some sophisticated thinking about
1836 how recursion works.  We'll show you how to keep track of what is
1837 going on by constructing an recursive execution trace of inputs and
1838 outputs.
1839
1840 We'll develop our full reduction function in stages.  Once we have it
1841 working, we'll then consider some variants.  Just to be precise, we'll
1842 distinguish each microvariant with its own index number embedded in
1843 its name.
1844
1845     let rec reduce1 (t:term):term =
1846       if (is_redex t) then reduce1 (reduce_one_step t)
1847                       else t
1848
1849 If the input is a redex, we ship it off to `reduce_one_step` for
1850 processing.  But just in case the result of the one-step reduction is
1851 itself a redex, we recursively call `reduce1`.  The recursion will
1852 continue until the result is no longer a redex.
1853
1854     #trace reduce1;;
1855     reduce1 is now traced.
1856     # reduce1 (FA (I, FA (I, K)));;
1857     reduce1 <-- FA (I, FA (I, K))
1858       reduce1 <-- FA (I, K)
1859         reduce1 <-- K
1860         reduce1 --> K
1861       reduce1 --> K
1862     reduce1 --> K
1863     - : term = K
1864
1865 Since the initial input (`I(IK)`) is a redex, the result after the
1866 one-step reduction is `IK`.  We recursively call `reduce1` on this
1867 input.  Since `IK` is itself a redex, the result after one-step
1868 reduction is `K`.  We recursively call `reduce1` on this input.  Since
1869 `K` is not a redex, the recursion bottoms out, and we simply return
1870 it.
1871
1872 But this function doesn't do enough reduction.
1873
1874     # reduce1 (FA (FA (I, I), K));;
1875     - : term = FA (FA (I, I), K)
1876
1877 Because the top-level term is not a redex, `reduce1` returns it
1878 without any evaluation.  What we want is to evaluate the subparts of a
1879 complex term.
1880
1881     let rec reduce2 (t:term):term = match t with
1882         I -> I
1883       | K -> K
1884       | S -> S
1885       | FA (a, b) ->
1886           let t' = FA (reduce2 a, reduce2 b) in
1887             if (is_redex t') then reduce2 (reduce_one_step t')
1888                              else t'
1889
1890 Since we need access to the subterms, we do pattern matching on the
1891 input term.  If the input is simple, we return it.  If the input is
1892 complex, we first process the subexpressions, and only then see if we
1893 have a redex.  To understand how this works, follow the trace
1894 carefully:
1895
1896     # reduce2 (FA(FA(I,I),K));;
1897     reduce2 <-- FA (FA (I, I), K)
1898
1899       reduce2 <-- K          ; first main recursive call
1900       reduce2 --> K
1901
1902       reduce2 <-- FA (I, I)  ; second main recursive call
1903         reduce2 <-- I
1904         reduce2 --> I
1905         reduce2 <-- I
1906         reduce2 --> I
1907       reduce2 <-- I
1908
1909       reduce2 --> I           ; third main recursive call
1910       reduce2 --> I
1911
1912       reduce2 <-- K           ; fourth
1913       reduce2 --> K
1914     reduce2 --> K
1915     - : term = K
1916
1917 Ok, there's a lot going on here.  Since the input is complex, the
1918 first thing the function does is construct `t'`.  In order to do this,
1919 it must reduce the two main subexpressions, `II` and `K`.  But we see
1920 from the trace that it begins with the right-hand expression, `K`.  We
1921 didn't explicitly tell it to begin with the right-hand subexpression,
1922 so control over evaluation order is starting to spin out of our
1923 control.  (We'll get it back later, don't worry.)
1924
1925 In any case, in the second main recursive call, we evaluate `II`.  The
1926 result is `I`.  The third main recursive call tests whether this
1927 result needs any further processing; it doesn't.
1928
1929 At this point, we have constructed `t' == FA(I,K)`.  Since that's a
1930 redex, we ship it off to reduce_one_step, getting the term `K` as a
1931 result.  The fourth recursive call checks that there is no more
1932 reduction work to be done (there isn't), and that's our final result.
1933
1934 You can see in more detail what is going on by tracing both reduce2
1935 and reduce_one_step, but that makes for some long traces.
1936
1937 So we've solved our first problem: reduce2 recognizes that `IIK ~~>
1938 K`, as desired.
1939
1940 Because the OCaml interpreter evaluates the rightmost expression
1941 in the course of building `t'`, however, it will always evaluate the
1942 right hand subexpression, whether it needs to or not.  And sure
1943 enough,
1944
1945     # reduce2 (FA(FA(K,I),skomega));;
1946       C-c C-cInterrupted.
1947
1948 instead of performing the leftmost reduction first, and recognizing
1949 that this term reduces to the normal form `I`, we get lost endlessly
1950 trying to reduce skomega.
1951
1952 To emphasize that our evaluation order here is at the mercy of the
1953 evaluation order of OCaml, here is the exact same program translated
1954 into Haskell.  We'll put them side by side to emphasize the exact parallel.
1955
1956 <pre>
1957 OCaml                                                          Haskell
1958 ==========================================================     =========================================================
1959
1960 type term = I | S | K | FA of (term * term)                    data Term = I | S | K | FA Term Term deriving (Eq, Show)
1961
1962 let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))          skomega = (FA (FA (FA S I) I) (FA (FA S I) I))
1963
1964                                                                reduce_one_step :: Term -> Term
1965 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
1966     FA(I,a) -> a                                                 FA I a -> a
1967   | FA(FA(K,a),b) -> a                                           FA (FA K a) b -> a
1968   | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))                   FA (FA (FA S a) b) c -> FA (FA a c) (FA b c)
1969   | _ -> t                                                       _ -> t
1970
1971                                                                is_redex :: Term -> Bool
1972 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
1973
1974                                                                reduce2 :: Term -> Term
1975 let rec reduce2 (t:term):term = match t with                   reduce2 t = case t of
1976     I -> I                                                       I -> I
1977   | K -> K                                                       K -> K
1978   | S -> S                                                       S -> S
1979   | FA (a, b) ->                                                 FA a b ->
1980       let t' = FA (reduce2 a, reduce2 b) in                        let t' = FA (reduce2 a) (reduce2 b) in
1981         if (is_redex t') then reduce2 (reduce_one_step t')           if (is_redex t') then reduce2 (reduce_one_step t')
1982                          else t'                                                      else t'
1983 </pre>
1984
1985 There are some differences in the way types are made explicit, and in
1986 the way terms are specified (`FA(a,b)` for Ocaml versus `FA a b` for
1987 Haskell).  But the two programs are essentially identical.
1988
1989 Yet the Haskell program finds the normal form for KI -->
1990
1991 [[!toc levels=2]]
1992
1993 # Reasoning about evaluation order in Combinatory Logic
1994
1995 We've discussed evaluation order before, primarily in connection with
1996 the untyped lambda calculus.  Whenever a term has more than one redex,
1997 we have to choose which one to reduce, and this choice can make a
1998 difference.  For instance, in the term `((\x.I)(ωω)`, if we reduce the
1999 leftmost redex first, the term reduces to the normal form `I` in one
2000 step.  But if we reduce the left most redex instead (namely, `(ωω)`),
2001 we do not arrive at a normal form, and are in danger of entering an
2002 infinite loop.
2003
2004 Thanks to the introduction of sum types (disjoint union), we are now
2005 in a position to gain a deeper understanding by writing a program that
2006 allows us to experiment with different evaluation order strategies.
2007
2008 One thing we'll see is that it is all too easy for the evaluation
2009 order properties of an evaluator to depend on the evaluation order
2010 properties of the programming language in which the evaluator is
2011 written.  We will seek to write an evaluator in which the order of
2012 evaluation is insensitive to the evaluator language.  The goal is to
2013 find an order-insensitive way to reason about evaluation order.
2014
2015 The first evaluator we develop will evaluate terms in Combinatory
2016 Logic.  That will significantly simplify the program, since we won't
2017 need to worry about variables or substitution.  As we develop and
2018 extend our evaluator in future weeks, we'll switch to lambdas, but for
2019 now, working with the elegant simplicity of Combinatory Logic will
2020 make the evaluation order issues easier to grasp.
2021
2022 A brief review: a term in CL is the combination of three basic
2023 expressions, `S`, `K`, and `I`, governed by the following reduction
2024 rules:
2025
2026     Ia ~~> a
2027     Kab ~~> b
2028     Sabc ~~> ac(bc)
2029
2030 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
2031 that how to embed the untyped lambda calculus in CL, so it's no
2032 surprise that the same evaluation order issues arise in CL.
2033
2034     skomega = SII(SII)
2035             ~~> I(SII)(I(SII))
2036             ~~> SII(SII)
2037
2038 Instead of the lambda term `Ω`, we'll use the CL term skomega, defined
2039 here, though we'll use the same symbol, `Ω`.  If we consider the term
2040
2041     KIΩ == KI(SII(SII))
2042
2043 we can choose to reduce the leftmost redex by firing the reduction
2044 rule for `K`, in which case the term reduces to the normal form `I` in
2045 one step; or we can choose to reduce the skomega part, by firing the
2046 reduction rule fo `S`, in which case we do not get a normal form,
2047 we're headed towards an infinite loop.
2048
2049 With sum types, we can define terms in CL in OCaml as follows:
2050
2051     type term = I | S | K | FA of (term * term)
2052
2053     let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))
2054     let test1 = FA (FA (K,I), skomega)
2055
2056 This recursive type definition says that a term in CL is either one of
2057 the three simple expressions, or else a pair of CL expressions.
2058 Following Heim and Kratzer, `FA` stands for Functional Application.
2059 With this type definition, we can encode skomega, as well as other
2060 terms whose reduction behavior we want to control.
2061
2062 Using pattern matching, it is easy to code the one-step reduction
2063 rules for CL:
2064
2065     let reduce_one_step (t:term):term = match t with
2066         FA(I,a) -> a
2067       | FA(FA(K,a),b) -> a
2068       | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))
2069       | _ -> t
2070
2071     # reduce_one_step (FA(FA(K,S),I));;
2072     - : term = S
2073     # reduce_one_step skomega;;
2074     - : term = FA (FA (I, FA (FA (S, I), I)), FA (I, FA (FA (S, I), I)))
2075
2076 The need to explicitly insert the type constructor `FA` obscures
2077 things a bit, but it's still possible to see how the one-step
2078 reduction function is just the reduction rules for CL.  The
2079 OCaml interpreter shows us that the function faithfully recognizes
2080 that `KSI ~~> S`, and `skomega ~~> I(SII)(I(SII))`.
2081
2082 We can now say precisely what it means to be a redex in CL.
2083
2084     let is_redex (t:term):bool = not (t = reduce_one_step t)
2085
2086     # is_redex K;;
2087     - : bool = false
2088     # is_redex (FA(K,I));;
2089     - : bool = false
2090     # is_redex (FA(FA(K,I),S));;
2091     - : bool = true
2092     # is_redex skomega;;
2093     - : book = true
2094
2095 Warning: this definition relies on the fact that the one-step
2096 reduction of a CL term is never identical to the original term.  This
2097 would not work for the untyped lambda calculus, since
2098 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.  Note that in
2099 order to decide whether two terms are equal, OCaml has to recursively
2100 compare the elements of complex CL terms.  It is able to figure out
2101 how to do this because we provided an explicit definition of the
2102 datatype Term.
2103
2104 As you would expect, a term in CL is in normal form when it contains
2105 no redexes.
2106
2107 In order to fully reduce a term, we need to be able to reduce redexes
2108 that are not at the top level of the term.
2109
2110     (KKI)SI ~~> KSI ~~> S
2111
2112 That is, we want to be able to first evaluate the redex `KKI` that is
2113 a proper subpart of the larger term to produce a new intermediate term
2114 that we can then evaluate to the final normal form.
2115
2116 Because we need to process subparts, and because the result after
2117 processing a subpart may require further processing, the recursive
2118 structure of our evaluation function has to be quite subtle.  To truly
2119 understand it, you will need to do some sophisticated thinking about
2120 how recursion works.  We'll show you how to keep track of what is
2121 going on by constructing an recursive execution trace of inputs and
2122 outputs.
2123
2124 We'll develop our full reduction function in stages.  Once we have it
2125 working, we'll then consider some variants.  Just to be precise, we'll
2126 distinguish each microvariant with its own index number embedded in
2127 its name.
2128
2129     let rec reduce1 (t:term):term =
2130       if (is_redex t) then reduce1 (reduce_one_step t)
2131                       else t
2132
2133 If the input is a redex, we ship it off to `reduce_one_step` for
2134 processing.  But just in case the result of the one-step reduction is
2135 itself a redex, we recursively call `reduce1`.  The recursion will
2136 continue until the result is no longer a redex.
2137
2138     #trace reduce1;;
2139     reduce1 is now traced.
2140     # reduce1 (FA (I, FA (I, K)));;
2141     reduce1 <-- FA (I, FA (I, K))
2142       reduce1 <-- FA (I, K)
2143         reduce1 <-- K
2144         reduce1 --> K
2145       reduce1 --> K
2146     reduce1 --> K
2147     - : term = K
2148
2149 Since the initial input (`I(IK)`) is a redex, the result after the
2150 one-step reduction is `IK`.  We recursively call `reduce1` on this
2151 input.  Since `IK` is itself a redex, the result after one-step
2152 reduction is `K`.  We recursively call `reduce1` on this input.  Since
2153 `K` is not a redex, the recursion bottoms out, and we simply return
2154 it.
2155
2156 But this function doesn't do enough reduction.
2157
2158     # reduce1 (FA (FA (I, I), K));;
2159     - : term = FA (FA (I, I), K)
2160
2161 Because the top-level term is not a redex, `reduce1` returns it
2162 without any evaluation.  What we want is to evaluate the subparts of a
2163 complex term.
2164
2165     let rec reduce2 (t:term):term = match t with
2166         I -> I
2167       | K -> K
2168       | S -> S
2169       | FA (a, b) ->
2170           let t' = FA (reduce2 a, reduce2 b) in
2171             if (is_redex t') then reduce2 (reduce_one_step t')
2172                              else t'
2173
2174 Since we need access to the subterms, we do pattern matching on the
2175 input term.  If the input is simple, we return it.  If the input is
2176 complex, we first process the subexpressions, and only then see if we
2177 have a redex.  To understand how this works, follow the trace
2178 carefully:
2179
2180     # reduce2 (FA(FA(I,I),K));;
2181     reduce2 <-- FA (FA (I, I), K)
2182
2183       reduce2 <-- K          ; first main recursive call
2184       reduce2 --> K
2185
2186       reduce2 <-- FA (I, I)  ; second main recursive call
2187         reduce2 <-- I
2188         reduce2 --> I
2189         reduce2 <-- I
2190         reduce2 --> I
2191       reduce2 <-- I
2192
2193       reduce2 --> I           ; third main recursive call
2194       reduce2 --> I
2195
2196       reduce2 <-- K           ; fourth
2197       reduce2 --> K
2198     reduce2 --> K
2199     - : term = K
2200
2201 Ok, there's a lot going on here.  Since the input is complex, the
2202 first thing the function does is construct `t'`.  In order to do this,
2203 it must reduce the two main subexpressions, `II` and `K`.  But we see
2204 from the trace that it begins with the right-hand expression, `K`.  We
2205 didn't explicitly tell it to begin with the right-hand subexpression,
2206 so control over evaluation order is starting to spin out of our
2207 control.  (We'll get it back later, don't worry.)
2208
2209 In any case, in the second main recursive call, we evaluate `II`.  The
2210 result is `I`.  The third main recursive call tests whether this
2211 result needs any further processing; it doesn't.
2212
2213 At this point, we have constructed `t' == FA(I,K)`.  Since that's a
2214 redex, we ship it off to reduce_one_step, getting the term `K` as a
2215 result.  The fourth recursive call checks that there is no more
2216 reduction work to be done (there isn't), and that's our final result.
2217
2218 You can see in more detail what is going on by tracing both reduce2
2219 and reduce_one_step, but that makes for some long traces.
2220
2221 So we've solved our first problem: reduce2 recognizes that `IIK ~~>
2222 K`, as desired.
2223
2224 Because the OCaml interpreter evaluates the rightmost expression
2225 in the course of building `t'`, however, it will always evaluate the
2226 right hand subexpression, whether it needs to or not.  And sure
2227 enough,
2228
2229     # reduce2 (FA(FA(K,I),skomega));;
2230       C-c C-cInterrupted.
2231
2232 instead of performing the leftmost reduction first, and recognizing
2233 that this term reduces to the normal form `I`, we get lost endlessly
2234 trying to reduce skomega.
2235
2236 To emphasize that our evaluation order here is at the mercy of the
2237 evaluation order of OCaml, here is the exact same program translated
2238 into Haskell.  We'll put them side by side to emphasize the exact parallel.
2239
2240 <pre>
2241 OCaml                                                          Haskell
2242 ==========================================================     =========================================================
2243
2244 type term = I | S | K | FA of (term * term)                    data Term = I | S | K | FA Term Term deriving (Eq, Show)
2245
2246 let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))          skomega = (FA (FA (FA S I) I) (FA (FA S I) I))
2247
2248                                                                reduce_one_step :: Term -> Term
2249 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
2250     FA(I,a) -> a                                                 FA I a -> a
2251   | FA(FA(K,a),b) -> a                                           FA (FA K a) b -> a
2252   | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))                   FA (FA (FA S a) b) c -> FA (FA a c) (FA b c)
2253   | _ -> t                                                       _ -> t
2254
2255                                                                is_redex :: Term -> Bool
2256 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
2257
2258                                                                reduce2 :: Term -> Term
2259 let rec reduce2 (t:term):term = match t with                   reduce2 t = case t of
2260     I -> I                                                       I -> I
2261   | K -> K                                                       K -> K
2262   | S -> S                                                       S -> S
2263   | FA (a, b) ->                                                 FA a b ->
2264       let t' = FA (reduce2 a, reduce2 b) in                        let t' = FA (reduce2 a) (reduce2 b) in
2265         if (is_redex t') then reduce2 (reduce_one_step t')           if (is_redex t') then reduce2 (reduce_one_step t')
2266                          else t'                                                      else t'
2267 </pre>
2268
2269 There are some differences in the way types are made explicit, and in
2270 the way terms are specified (`FA(a,b)` for Ocaml versus `FA a b` for
2271 Haskell).  But the two programs are essentially identical.
2272
2273 Yet the Haskell program finds the normal form for KI -->
2274
2275 [[!toc levels=2]]
2276
2277 # Reasoning about evaluation order in Combinatory Logic
2278
2279 We've discussed evaluation order before, primarily in connection with
2280 the untyped lambda calculus.  Whenever a term has more than one redex,
2281 we have to choose which one to reduce, and this choice can make a
2282 difference.  For instance, in the term `((\x.I)(ωω)`, if we reduce the
2283 leftmost redex first, the term reduces to the normal form `I` in one
2284 step.  But if we reduce the left most redex instead (namely, `(ωω)`),
2285 we do not arrive at a normal form, and are in danger of entering an
2286 infinite loop.
2287
2288 Thanks to the introduction of sum types (disjoint union), we are now
2289 in a position to gain a deeper understanding by writing a program that
2290 allows us to experiment with different evaluation order strategies.
2291
2292 One thing we'll see is that it is all too easy for the evaluation
2293 order properties of an evaluator to depend on the evaluation order
2294 properties of the programming language in which the evaluator is
2295 written.  We will seek to write an evaluator in which the order of
2296 evaluation is insensitive to the evaluator language.  The goal is to
2297 find an order-insensitive way to reason about evaluation order.
2298
2299 The first evaluator we develop will evaluate terms in Combinatory
2300 Logic.  That will significantly simplify the program, since we won't
2301 need to worry about variables or substitution.  As we develop and
2302 extend our evaluator in future weeks, we'll switch to lambdas, but for
2303 now, working with the elegant simplicity of Combinatory Logic will
2304 make the evaluation order issues easier to grasp.
2305
2306 A brief review: a term in CL is the combination of three basic
2307 expressions, `S`, `K`, and `I`, governed by the following reduction
2308 rules:
2309
2310     Ia ~~> a
2311     Kab ~~> b
2312     Sabc ~~> ac(bc)
2313
2314 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
2315 that how to embed the untyped lambda calculus in CL, so it's no
2316 surprise that the same evaluation order issues arise in CL.
2317
2318     skomega = SII(SII)
2319             ~~> I(SII)(I(SII))
2320             ~~> SII(SII)
2321
2322 Instead of the lambda term `Ω`, we'll use the CL term skomega, defined
2323 here, though we'll use the same symbol, `Ω`.  If we consider the term
2324
2325     KIΩ == KI(SII(SII))
2326
2327 we can choose to reduce the leftmost redex by firing the reduction
2328 rule for `K`, in which case the term reduces to the normal form `I` in
2329 one step; or we can choose to reduce the skomega part, by firing the
2330 reduction rule fo `S`, in which case we do not get a normal form,
2331 we're headed towards an infinite loop.
2332
2333 With sum types, we can define terms in CL in OCaml as follows:
2334
2335     type term = I | S | K | FA of (term * term)
2336
2337     let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))
2338     let test1 = FA (FA (K,I), skomega)
2339
2340 This recursive type definition says that a term in CL is either one of
2341 the three simple expressions, or else a pair of CL expressions.
2342 Following Heim and Kratzer, `FA` stands for Functional Application.
2343 With this type definition, we can encode skomega, as well as other
2344 terms whose reduction behavior we want to control.
2345
2346 Using pattern matching, it is easy to code the one-step reduction
2347 rules for CL:
2348
2349     let reduce_one_step (t:term):term = match t with
2350         FA(I,a) -> a
2351       | FA(FA(K,a),b) -> a
2352       | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))
2353       | _ -> t
2354
2355     # reduce_one_step (FA(FA(K,S),I));;
2356     - : term = S
2357     # reduce_one_step skomega;;
2358     - : term = FA (FA (I, FA (FA (S, I), I)), FA (I, FA (FA (S, I), I)))
2359
2360 The need to explicitly insert the type constructor `FA` obscures
2361 things a bit, but it's still possible to see how the one-step
2362 reduction function is just the reduction rules for CL.  The
2363 OCaml interpreter shows us that the function faithfully recognizes
2364 that `KSI ~~> S`, and `skomega ~~> I(SII)(I(SII))`.
2365
2366 We can now say precisely what it means to be a redex in CL.
2367
2368     let is_redex (t:term):bool = not (t = reduce_one_step t)
2369
2370     # is_redex K;;
2371     - : bool = false
2372     # is_redex (FA(K,I));;
2373     - : bool = false
2374     # is_redex (FA(FA(K,I),S));;
2375     - : bool = true
2376     # is_redex skomega;;
2377     - : book = true
2378
2379 Warning: this definition relies on the fact that the one-step
2380 reduction of a CL term is never identical to the original term.  This
2381 would not work for the untyped lambda calculus, since
2382 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.  Note that in
2383 order to decide whether two terms are equal, OCaml has to recursively
2384 compare the elements of complex CL terms.  It is able to figure out
2385 how to do this because we provided an explicit definition of the
2386 datatype Term.
2387
2388 As you would expect, a term in CL is in normal form when it contains
2389 no redexes.
2390
2391 In order to fully reduce a term, we need to be able to reduce redexes
2392 that are not at the top level of the term.
2393
2394     (KKI)SI ~~> KSI ~~> S
2395
2396 That is, we want to be able to first evaluate the redex `KKI` that is
2397 a proper subpart of the larger term to produce a new intermediate term
2398 that we can then evaluate to the final normal form.
2399
2400 Because we need to process subparts, and because the result after
2401 processing a subpart may require further processing, the recursive
2402 structure of our evaluation function has to be quite subtle.  To truly
2403 understand it, you will need to do some sophisticated thinking about
2404 how recursion works.  We'll show you how to keep track of what is
2405 going on by constructing an recursive execution trace of inputs and
2406 outputs.
2407
2408 We'll develop our full reduction function in stages.  Once we have it
2409 working, we'll then consider some variants.  Just to be precise, we'll
2410 distinguish each microvariant with its own index number embedded in
2411 its name.
2412
2413     let rec reduce1 (t:term):term =
2414       if (is_redex t) then reduce1 (reduce_one_step t)
2415                       else t
2416
2417 If the input is a redex, we ship it off to `reduce_one_step` for
2418 processing.  But just in case the result of the one-step reduction is
2419 itself a redex, we recursively call `reduce1`.  The recursion will
2420 continue until the result is no longer a redex.
2421
2422     #trace reduce1;;
2423     reduce1 is now traced.
2424     # reduce1 (FA (I, FA (I, K)));;
2425     reduce1 <-- FA (I, FA (I, K))
2426       reduce1 <-- FA (I, K)
2427         reduce1 <-- K
2428         reduce1 --> K
2429       reduce1 --> K
2430     reduce1 --> K
2431     - : term = K
2432
2433 Since the initial input (`I(IK)`) is a redex, the result after the
2434 one-step reduction is `IK`.  We recursively call `reduce1` on this
2435 input.  Since `IK` is itself a redex, the result after one-step
2436 reduction is `K`.  We recursively call `reduce1` on this input.  Since
2437 `K` is not a redex, the recursion bottoms out, and we simply return
2438 it.
2439
2440 But this function doesn't do enough reduction.
2441
2442     # reduce1 (FA (FA (I, I), K));;
2443     - : term = FA (FA (I, I), K)
2444
2445 Because the top-level term is not a redex, `reduce1` returns it
2446 without any evaluation.  What we want is to evaluate the subparts of a
2447 complex term.
2448
2449     let rec reduce2 (t:term):term = match t with
2450         I -> I
2451       | K -> K
2452       | S -> S
2453       | FA (a, b) ->
2454           let t' = FA (reduce2 a, reduce2 b) in
2455             if (is_redex t') then reduce2 (reduce_one_step t')
2456                              else t'
2457
2458 Since we need access to the subterms, we do pattern matching on the
2459 input term.  If the input is simple, we return it.  If the input is
2460 complex, we first process the subexpressions, and only then see if we
2461 have a redex.  To understand how this works, follow the trace
2462 carefully:
2463
2464     # reduce2 (FA(FA(I,I),K));;
2465     reduce2 <-- FA (FA (I, I), K)
2466
2467       reduce2 <-- K          ; first main recursive call
2468       reduce2 --> K
2469
2470       reduce2 <-- FA (I, I)  ; second main recursive call
2471         reduce2 <-- I
2472         reduce2 --> I
2473         reduce2 <-- I
2474         reduce2 --> I
2475       reduce2 <-- I
2476
2477       reduce2 --> I           ; third main recursive call
2478       reduce2 --> I
2479
2480       reduce2 <-- K           ; fourth
2481       reduce2 --> K
2482     reduce2 --> K
2483     - : term = K
2484
2485 Ok, there's a lot going on here.  Since the input is complex, the
2486 first thing the function does is construct `t'`.  In order to do this,
2487 it must reduce the two main subexpressions, `II` and `K`.  But we see
2488 from the trace that it begins with the right-hand expression, `K`.  We
2489 didn't explicitly tell it to begin with the right-hand subexpression,
2490 so control over evaluation order is starting to spin out of our
2491 control.  (We'll get it back later, don't worry.)
2492
2493 In any case, in the second main recursive call, we evaluate `II`.  The
2494 result is `I`.  The third main recursive call tests whether this
2495 result needs any further processing; it doesn't.
2496
2497 At this point, we have constructed `t' == FA(I,K)`.  Since that's a
2498 redex, we ship it off to reduce_one_step, getting the term `K` as a
2499 result.  The fourth recursive call checks that there is no more
2500 reduction work to be done (there isn't), and that's our final result.
2501
2502 You can see in more detail what is going on by tracing both reduce2
2503 and reduce_one_step, but that makes for some long traces.
2504
2505 So we've solved our first problem: reduce2 recognizes that `IIK ~~>
2506 K`, as desired.
2507
2508 Because the OCaml interpreter evaluates the rightmost expression
2509 in the course of building `t'`, however, it will always evaluate the
2510 right hand subexpression, whether it needs to or not.  And sure
2511 enough,
2512
2513     # reduce2 (FA(FA(K,I),skomega));;
2514       C-c C-cInterrupted.
2515
2516 instead of performing the leftmost reduction first, and recognizing
2517 that this term reduces to the normal form `I`, we get lost endlessly
2518 trying to reduce skomega.
2519
2520 To emphasize that our evaluation order here is at the mercy of the
2521 evaluation order of OCaml, here is the exact same program translated
2522 into Haskell.  We'll put them side by side to emphasize the exact parallel.
2523
2524 <pre>
2525 OCaml                                                          Haskell
2526 ==========================================================     =========================================================
2527
2528 type term = I | S | K | FA of (term * term)                    data Term = I | S | K | FA Term Term deriving (Eq, Show)
2529
2530 let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))          skomega = (FA (FA (FA S I) I) (FA (FA S I) I))
2531
2532                                                                reduce_one_step :: Term -> Term
2533 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
2534     FA(I,a) -> a                                                 FA I a -> a
2535   | FA(FA(K,a),b) -> a                                           FA (FA K a) b -> a
2536   | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))                   FA (FA (FA S a) b) c -> FA (FA a c) (FA b c)
2537   | _ -> t                                                       _ -> t
2538
2539                                                                is_redex :: Term -> Bool
2540 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
2541
2542                                                                reduce2 :: Term -> Term
2543 let rec reduce2 (t:term):term = match t with                   reduce2 t = case t of
2544     I -> I                                                       I -> I
2545   | K -> K                                                       K -> K
2546   | S -> S                                                       S -> S
2547   | FA (a, b) ->                                                 FA a b ->
2548       let t' = FA (reduce2 a, reduce2 b) in                        let t' = FA (reduce2 a) (reduce2 b) in
2549         if (is_redex t') then reduce2 (reduce_one_step t')           if (is_redex t') then reduce2 (reduce_one_step t')
2550                          else t'                                                      else t'
2551 </pre>
2552
2553 There are some differences in the way types are made explicit, and in
2554 the way terms are specified (`FA(a,b)` for Ocaml versus `FA a b` for
2555 Haskell).  But the two programs are essentially identical.
2556
2557 Yet the Haskell program finds the normal form for KI -->
2558
2559 [[!toc levels=2]]
2560
2561 # Reasoning about evaluation order in Combinatory Logic
2562
2563 We've discussed evaluation order before, primarily in connection with
2564 the untyped lambda calculus.  Whenever a term has more than one redex,
2565 we have to choose which one to reduce, and this choice can make a
2566 difference.  For instance, in the term `((\x.I)(ωω)`, if we reduce the
2567 leftmost redex first, the term reduces to the normal form `I` in one
2568 step.  But if we reduce the left most redex instead (namely, `(ωω)`),
2569 we do not arrive at a normal form, and are in danger of entering an
2570 infinite loop.
2571
2572 Thanks to the introduction of sum types (disjoint union), we are now
2573 in a position to gain a deeper understanding by writing a program that
2574 allows us to experiment with different evaluation order strategies.
2575
2576 One thing we'll see is that it is all too easy for the evaluation
2577 order properties of an evaluator to depend on the evaluation order
2578 properties of the programming language in which the evaluator is
2579 written.  We will seek to write an evaluator in which the order of
2580 evaluation is insensitive to the evaluator language.  The goal is to
2581 find an order-insensitive way to reason about evaluation order.
2582
2583 The first evaluator we develop will evaluate terms in Combinatory
2584 Logic.  That will significantly simplify the program, since we won't
2585 need to worry about variables or substitution.  As we develop and
2586 extend our evaluator in future weeks, we'll switch to lambdas, but for
2587 now, working with the elegant simplicity of Combinatory Logic will
2588 make the evaluation order issues easier to grasp.
2589
2590 A brief review: a term in CL is the combination of three basic
2591 expressions, `S`, `K`, and `I`, governed by the following reduction
2592 rules:
2593
2594     Ia ~~> a
2595     Kab ~~> b
2596     Sabc ~~> ac(bc)
2597
2598 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
2599 that how to embed the untyped lambda calculus in CL, so it's no
2600 surprise that the same evaluation order issues arise in CL.
2601
2602     skomega = SII(SII)
2603             ~~> I(SII)(I(SII))
2604             ~~> SII(SII)
2605
2606 Instead of the lambda term `Ω`, we'll use the CL term skomega, defined
2607 here, though we'll use the same symbol, `Ω`.  If we consider the term
2608
2609     KIΩ == KI(SII(SII))
2610
2611 we can choose to reduce the leftmost redex by firing the reduction
2612 rule for `K`, in which case the term reduces to the normal form `I` in
2613 one step; or we can choose to reduce the skomega part, by firing the
2614 reduction rule fo `S`, in which case we do not get a normal form,
2615 we're headed towards an infinite loop.
2616
2617 With sum types, we can define terms in CL in OCaml as follows:
2618
2619     type term = I | S | K | FA of (term * term)
2620
2621     let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))
2622     let test1 = FA (FA (K,I), skomega)
2623
2624 This recursive type definition says that a term in CL is either one of
2625 the three simple expressions, or else a pair of CL expressions.
2626 Following Heim and Kratzer, `FA` stands for Functional Application.
2627 With this type definition, we can encode skomega, as well as other
2628 terms whose reduction behavior we want to control.
2629
2630 Using pattern matching, it is easy to code the one-step reduction
2631 rules for CL:
2632
2633     let reduce_one_step (t:term):term = match t with
2634         FA(I,a) -> a
2635       | FA(FA(K,a),b) -> a
2636       | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))
2637       | _ -> t
2638
2639     # reduce_one_step (FA(FA(K,S),I));;
2640     - : term = S
2641     # reduce_one_step skomega;;
2642     - : term = FA (FA (I, FA (FA (S, I), I)), FA (I, FA (FA (S, I), I)))
2643
2644 The need to explicitly insert the type constructor `FA` obscures
2645 things a bit, but it's still possible to see how the one-step
2646 reduction function is just the reduction rules for CL.  The
2647 OCaml interpreter shows us that the function faithfully recognizes
2648 that `KSI ~~> S`, and `skomega ~~> I(SII)(I(SII))`.
2649
2650 We can now say precisely what it means to be a redex in CL.
2651
2652     let is_redex (t:term):bool = not (t = reduce_one_step t)
2653
2654     # is_redex K;;
2655     - : bool = false
2656     # is_redex (FA(K,I));;
2657     - : bool = false
2658     # is_redex (FA(FA(K,I),S));;
2659     - : bool = true
2660     # is_redex skomega;;
2661     - : book = true
2662
2663 Warning: this definition relies on the fact that the one-step
2664 reduction of a CL term is never identical to the original term.  This
2665 would not work for the untyped lambda calculus, since
2666 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.  Note that in
2667 order to decide whether two terms are equal, OCaml has to recursively
2668 compare the elements of complex CL terms.  It is able to figure out
2669 how to do this because we provided an explicit definition of the
2670 datatype Term.
2671
2672 As you would expect, a term in CL is in normal form when it contains
2673 no redexes.
2674
2675 In order to fully reduce a term, we need to be able to reduce redexes
2676 that are not at the top level of the term.
2677
2678     (KKI)SI ~~> KSI ~~> S
2679
2680 That is, we want to be able to first evaluate the redex `KKI` that is
2681 a proper subpart of the larger term to produce a new intermediate term
2682 that we can then evaluate to the final normal form.
2683
2684 Because we need to process subparts, and because the result after
2685 processing a subpart may require further processing, the recursive
2686 structure of our evaluation function has to be quite subtle.  To truly
2687 understand it, you will need to do some sophisticated thinking about
2688 how recursion works.  We'll show you how to keep track of what is
2689 going on by constructing an recursive execution trace of inputs and
2690 outputs.
2691
2692 We'll develop our full reduction function in stages.  Once we have it
2693 working, we'll then consider some variants.  Just to be precise, we'll
2694 distinguish each microvariant with its own index number embedded in
2695 its name.
2696
2697     let rec reduce1 (t:term):term =
2698       if (is_redex t) then reduce1 (reduce_one_step t)
2699                       else t
2700
2701 If the input is a redex, we ship it off to `reduce_one_step` for
2702 processing.  But just in case the result of the one-step reduction is
2703 itself a redex, we recursively call `reduce1`.  The recursion will
2704 continue until the result is no longer a redex.
2705
2706     #trace reduce1;;
2707     reduce1 is now traced.
2708     # reduce1 (FA (I, FA (I, K)));;
2709     reduce1 <-- FA (I, FA (I, K))
2710       reduce1 <-- FA (I, K)
2711         reduce1 <-- K
2712         reduce1 --> K
2713       reduce1 --> K
2714     reduce1 --> K
2715     - : term = K
2716
2717 Since the initial input (`I(IK)`) is a redex, the result after the
2718 one-step reduction is `IK`.  We recursively call `reduce1` on this
2719 input.  Since `IK` is itself a redex, the result after one-step
2720 reduction is `K`.  We recursively call `reduce1` on this input.  Since
2721 `K` is not a redex, the recursion bottoms out, and we simply return
2722 it.
2723
2724 But this function doesn't do enough reduction.
2725
2726     # reduce1 (FA (FA (I, I), K));;
2727     - : term = FA (FA (I, I), K)
2728
2729 Because the top-level term is not a redex, `reduce1` returns it
2730 without any evaluation.  What we want is to evaluate the subparts of a
2731 complex term.
2732
2733     let rec reduce2 (t:term):term = match t with
2734         I -> I
2735       | K -> K
2736       | S -> S
2737       | FA (a, b) ->
2738           let t' = FA (reduce2 a, reduce2 b) in
2739             if (is_redex t') then reduce2 (reduce_one_step t')
2740                              else t'
2741
2742 Since we need access to the subterms, we do pattern matching on the
2743 input term.  If the input is simple, we return it.  If the input is
2744 complex, we first process the subexpressions, and only then see if we
2745 have a redex.  To understand how this works, follow the trace
2746 carefully:
2747
2748     # reduce2 (FA(FA(I,I),K));;
2749     reduce2 <-- FA (FA (I, I), K)
2750
2751       reduce2 <-- K          ; first main recursive call
2752       reduce2 --> K
2753
2754       reduce2 <-- FA (I, I)  ; second main recursive call
2755         reduce2 <-- I
2756         reduce2 --> I
2757         reduce2 <-- I
2758         reduce2 --> I
2759       reduce2 <-- I
2760
2761       reduce2 --> I           ; third main recursive call
2762       reduce2 --> I
2763
2764       reduce2 <-- K           ; fourth
2765       reduce2 --> K
2766     reduce2 --> K
2767     - : term = K
2768
2769 Ok, there's a lot going on here.  Since the input is complex, the
2770 first thing the function does is construct `t'`.  In order to do this,
2771 it must reduce the two main subexpressions, `II` and `K`.  But we see
2772 from the trace that it begins with the right-hand expression, `K`.  We
2773 didn't explicitly tell it to begin with the right-hand subexpression,
2774 so control over evaluation order is starting to spin out of our
2775 control.  (We'll get it back later, don't worry.)
2776
2777 In any case, in the second main recursive call, we evaluate `II`.  The
2778 result is `I`.  The third main recursive call tests whether this
2779 result needs any further processing; it doesn't.
2780
2781 At this point, we have constructed `t' == FA(I,K)`.  Since that's a
2782 redex, we ship it off to reduce_one_step, getting the term `K` as a
2783 result.  The fourth recursive call checks that there is no more
2784 reduction work to be done (there isn't), and that's our final result.
2785
2786 You can see in more detail what is going on by tracing both reduce2
2787 and reduce_one_step, but that makes for some long traces.
2788
2789 So we've solved our first problem: reduce2 recognizes that `IIK ~~>
2790 K`, as desired.
2791
2792 Because the OCaml interpreter evaluates the rightmost expression
2793 in the course of building `t'`, however, it will always evaluate the
2794 right hand subexpression, whether it needs to or not.  And sure
2795 enough,
2796
2797     # reduce2 (FA(FA(K,I),skomega));;
2798       C-c C-cInterrupted.
2799
2800 instead of performing the leftmost reduction first, and recognizing
2801 that this term reduces to the normal form `I`, we get lost endlessly
2802 trying to reduce skomega.
2803
2804 To emphasize that our evaluation order here is at the mercy of the
2805 evaluation order of OCaml, here is the exact same program translated
2806 into Haskell.  We'll put them side by side to emphasize the exact parallel.
2807
2808 <pre>
2809 OCaml                                                          Haskell
2810 ==========================================================     =========================================================
2811
2812 type term = I | S | K | FA of (term * term)                    data Term = I | S | K | FA Term Term deriving (Eq, Show)
2813
2814 let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))          skomega = (FA (FA (FA S I) I) (FA (FA S I) I))
2815
2816                                                                reduce_one_step :: Term -> Term
2817 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
2818     FA(I,a) -> a                                                 FA I a -> a
2819   | FA(FA(K,a),b) -> a                                           FA (FA K a) b -> a
2820   | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))                   FA (FA (FA S a) b) c -> FA (FA a c) (FA b c)
2821   | _ -> t                                                       _ -> t
2822
2823                                                                is_redex :: Term -> Bool
2824 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
2825
2826                                                                reduce2 :: Term -> Term
2827 let rec reduce2 (t:term):term = match t with                   reduce2 t = case t of
2828     I -> I                                                       I -> I
2829   | K -> K                                                       K -> K
2830   | S -> S                                                       S -> S
2831   | FA (a, b) ->                                                 FA a b ->
2832       let t' = FA (reduce2 a, reduce2 b) in                        let t' = FA (reduce2 a) (reduce2 b) in
2833         if (is_redex t') then reduce2 (reduce_one_step t')           if (is_redex t') then reduce2 (reduce_one_step t')
2834                          else t'                                                      else t'
2835 </pre>
2836
2837 There are some differences in the way types are made explicit, and in
2838 the way terms are specified (`FA(a,b)` for Ocaml versus `FA a b` for
2839 Haskell).  But the two programs are essentially identical.
2840
2841 Yet the Haskell program finds the normal form for KI
2842  -->
2843
2844 [[!toc levels=2]]
2845
2846 # Reasoning about evaluation order in Combinatory Logic
2847
2848 We've discussed evaluation order before, primarily in connection with
2849 the untyped lambda calculus.  Whenever a term has more than one redex,
2850 we have to choose which one to reduce, and this choice can make a
2851 difference.  For instance, in the term `((\x.I)(ωω)`, if we reduce the
2852 leftmost redex first, the term reduces to the normal form `I` in one
2853 step.  But if we reduce the left most redex instead (namely, `(ωω)`),
2854 we do not arrive at a normal form, and are in danger of entering an
2855 infinite loop.
2856
2857 Thanks to the introduction of sum types (disjoint union), we are now
2858 in a position to gain a deeper understanding by writing a program that
2859 allows us to experiment with different evaluation order strategies.
2860
2861 One thing we'll see is that it is all too easy for the evaluation
2862 order properties of an evaluator to depend on the evaluation order
2863 properties of the programming language in which the evaluator is
2864 written.  We will seek to write an evaluator in which the order of
2865 evaluation is insensitive to the evaluator language.  The goal is to
2866 find an order-insensitive way to reason about evaluation order.
2867
2868 The first evaluator we develop will evaluate terms in Combinatory
2869 Logic.  That will significantly simplify the program, since we won't
2870 need to worry about variables or substitution.  As we develop and
2871 extend our evaluator in future weeks, we'll switch to lambdas, but for
2872 now, working with the elegant simplicity of Combinatory Logic will
2873 make the evaluation order issues easier to grasp.
2874
2875 A brief review: a term in CL is the combination of three basic
2876 expressions, `S`, `K`, and `I`, governed by the following reduction
2877 rules:
2878
2879     Ia ~~> a
2880     Kab ~~> b
2881     Sabc ~~> ac(bc)
2882
2883 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
2884 that how to embed the untyped lambda calculus in CL, so it's no
2885 surprise that the same evaluation order issues arise in CL.
2886
2887     skomega = SII(SII)
2888             ~~> I(SII)(I(SII))
2889             ~~> SII(SII)
2890
2891 Instead of the lambda term `Ω`, we'll use the CL term skomega, defined
2892 here, though we'll use the same symbol, `Ω`.  If we consider the term
2893
2894     KIΩ == KI(SII(SII))
2895
2896 we can choose to reduce the leftmost redex by firing the reduction
2897 rule for `K`, in which case the term reduces to the normal form `I` in
2898 one step; or we can choose to reduce the skomega part, by firing the
2899 reduction rule fo `S`, in which case we do not get a normal form,
2900 we're headed towards an infinite loop.
2901
2902 With sum types, we can define terms in CL in OCaml as follows:
2903
2904     type term = I | S | K | FA of (term * term)
2905
2906     let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))
2907     let test1 = FA (FA (K,I), skomega)
2908
2909 This recursive type definition says that a term in CL is either one of
2910 the three simple expressions, or else a pair of CL expressions.
2911 Following Heim and Kratzer, `FA` stands for Functional Application.
2912 With this type definition, we can encode skomega, as well as other
2913 terms whose reduction behavior we want to control.
2914
2915 Using pattern matching, it is easy to code the one-step reduction
2916 rules for CL:
2917
2918     let reduce_one_step (t:term):term = match t with
2919         FA(I,a) -> a
2920       | FA(FA(K,a),b) -> a
2921       | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))
2922       | _ -> t
2923
2924     # reduce_one_step (FA(FA(K,S),I));;
2925     - : term = S
2926     # reduce_one_step skomega;;
2927     - : term = FA (FA (I, FA (FA (S, I), I)), FA (I, FA (FA (S, I), I)))
2928
2929 The need to explicitly insert the type constructor `FA` obscures
2930 things a bit, but it's still possible to see how the one-step
2931 reduction function is just the reduction rules for CL.  The
2932 OCaml interpreter shows us that the function faithfully recognizes
2933 that `KSI ~~> S`, and `skomega ~~> I(SII)(I(SII))`.
2934
2935 We can now say precisely what it means to be a redex in CL.
2936
2937     let is_redex (t:term):bool = not (t = reduce_one_step t)
2938
2939     # is_redex K;;
2940     - : bool = false
2941     # is_redex (FA(K,I));;
2942     - : bool = false
2943     # is_redex (FA(FA(K,I),S));;
2944     - : bool = true
2945     # is_redex skomega;;
2946     - : book = true
2947
2948 Warning: this definition relies on the fact that the one-step
2949 reduction of a CL term is never identical to the original term.  This
2950 would not work for the untyped lambda calculus, since
2951 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.  Note that in
2952 order to decide whether two terms are equal, OCaml has to recursively
2953 compare the elements of complex CL terms.  It is able to figure out
2954 how to do this because we provided an explicit definition of the
2955 datatype Term.
2956
2957 As you would expect, a term in CL is in normal form when it contains
2958 no redexes.
2959
2960 In order to fully reduce a term, we need to be able to reduce redexes
2961 that are not at the top level of the term.
2962
2963     (KKI)SI ~~> KSI ~~> S
2964
2965 That is, we want to be able to first evaluate the redex `KKI` that is
2966 a proper subpart of the larger term to produce a new intermediate term
2967 that we can then evaluate to the final normal form.
2968
2969 Because we need to process subparts, and because the result after
2970 processing a subpart may require further processing, the recursive
2971 structure of our evaluation function has to be quite subtle.  To truly
2972 understand it, you will need to do some sophisticated thinking about
2973 how recursion works.  We'll show you how to keep track of what is
2974 going on by constructing an recursive execution trace of inputs and
2975 outputs.
2976
2977 We'll develop our full reduction function in stages.  Once we have it
2978 working, we'll then consider some variants.  Just to be precise, we'll
2979 distinguish each microvariant with its own index number embedded in
2980 its name.
2981
2982     let rec reduce1 (t:term):term =
2983       if (is_redex t) then reduce1 (reduce_one_step t)
2984                       else t
2985
2986 If the input is a redex, we ship it off to `reduce_one_step` for
2987 processing.  But just in case the result of the one-step reduction is
2988 itself a redex, we recursively call `reduce1`.  The recursion will
2989 continue until the result is no longer a redex.
2990
2991     #trace reduce1;;
2992     reduce1 is now traced.
2993     # reduce1 (FA (I, FA (I, K)));;
2994     reduce1 <-- FA (I, FA (I, K))
2995       reduce1 <-- FA (I, K)
2996         reduce1 <-- K
2997         reduce1 --> K
2998       reduce1 --> K
2999     reduce1 --> K
3000     - : term = K
3001
3002 Since the initial input (`I(IK)`) is a redex, the result after the
3003 one-step reduction is `IK`.  We recursively call `reduce1` on this
3004 input.  Since `IK` is itself a redex, the result after one-step
3005 reduction is `K`.  We recursively call `reduce1` on this input.  Since
3006 `K` is not a redex, the recursion bottoms out, and we simply return
3007 it.
3008
3009 But this function doesn't do enough reduction.
3010
3011     # reduce1 (FA (FA (I, I), K));;
3012     - : term = FA (FA (I, I), K)
3013
3014 Because the top-level term is not a redex, `reduce1` returns it
3015 without any evaluation.  What we want is to evaluate the subparts of a
3016 complex term.
3017
3018     let rec reduce2 (t:term):term = match t with
3019         I -> I
3020       | K -> K
3021       | S -> S
3022       | FA (a, b) ->
3023           let t' = FA (reduce2 a, reduce2 b) in
3024             if (is_redex t') then reduce2 (reduce_one_step t')
3025                              else t'
3026
3027 Since we need access to the subterms, we do pattern matching on the
3028 input term.  If the input is simple, we return it.  If the input is
3029 complex, we first process the subexpressions, and only then see if we
3030 have a redex.  To understand how this works, follow the trace
3031 carefully:
3032
3033     # reduce2 (FA(FA(I,I),K));;
3034     reduce2 <-- FA (FA (I, I), K)
3035
3036       reduce2 <-- K          ; first main recursive call
3037       reduce2 --> K
3038
3039       reduce2 <-- FA (I, I)  ; second main recursive call
3040         reduce2 <-- I
3041         reduce2 --> I
3042         reduce2 <-- I
3043         reduce2 --> I
3044       reduce2 <-- I
3045
3046       reduce2 --> I           ; third main recursive call
3047       reduce2 --> I
3048
3049       reduce2 <-- K           ; fourth
3050       reduce2 --> K
3051     reduce2 --> K
3052     - : term = K
3053
3054 Ok, there's a lot going on here.  Since the input is complex, the
3055 first thing the function does is construct `t'`.  In order to do this,
3056 it must reduce the two main subexpressions, `II` and `K`.  But we see
3057 from the trace that it begins with the right-hand expression, `K`.  We
3058 didn't explicitly tell it to begin with the right-hand subexpression,
3059 so control over evaluation order is starting to spin out of our
3060 control.  (We'll get it back later, don't worry.)
3061
3062 In any case, in the second main recursive call, we evaluate `II`.  The
3063 result is `I`.  The third main recursive call tests whether this
3064 result needs any further processing; it doesn't.
3065
3066 At this point, we have constructed `t' == FA(I,K)`.  Since that's a
3067 redex, we ship it off to reduce_one_step, getting the term `K` as a
3068 result.  The fourth recursive call checks that there is no more
3069 reduction work to be done (there isn't), and that's our final result.
3070
3071 You can see in more detail what is going on by tracing both reduce2
3072 and reduce_one_step, but that makes for some long traces.
3073
3074 So we've solved our first problem: reduce2 recognizes that `IIK ~~>
3075 K`, as desired.
3076
3077 Because the OCaml interpreter evaluates the rightmost expression
3078 in the course of building `t'`, however, it will always evaluate the
3079 right hand subexpression, whether it needs to or not.  And sure
3080 enough,
3081
3082     # reduce2 (FA(FA(K,I),skomega));;
3083       C-c C-cInterrupted.
3084
3085 instead of performing the leftmost reduction first, and recognizing
3086 that this term reduces to the normal form `I`, we get lost endlessly
3087 trying to reduce skomega.
3088
3089 To emphasize that our evaluation order here is at the mercy of the
3090 evaluation order of OCaml, here is the exact same program translated
3091 into Haskell.  We'll put them side by side to emphasize the exact parallel.
3092
3093 <pre>
3094 OCaml                                                          Haskell
3095 ==========================================================     =========================================================
3096
3097 type term = I | S | K | FA of (term * term)                    data Term = I | S | K | FA Term Term deriving (Eq, Show)
3098
3099 let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))          skomega = (FA (FA (FA S I) I) (FA (FA S I) I))
3100
3101                                                                reduce_one_step :: Term -> Term
3102 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
3103     FA(I,a) -> a                                                 FA I a -> a
3104   | FA(FA(K,a),b) -> a                                           FA (FA K a) b -> a
3105   | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))                   FA (FA (FA S a) b) c -> FA (FA a c) (FA b c)
3106   | _ -> t                                                       _ -> t
3107
3108                                                                is_redex :: Term -> Bool
3109 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
3110
3111                                                                reduce2 :: Term -> Term
3112 let rec reduce2 (t:term):term = match t with                   reduce2 t = case t of
3113     I -> I                                                       I -> I
3114   | K -> K                                                       K -> K
3115   | S -> S                                                       S -> S
3116   | FA (a, b) ->                                                 FA a b ->
3117       let t' = FA (reduce2 a, reduce2 b) in                        let t' = FA (reduce2 a) (reduce2 b) in
3118         if (is_redex t') then reduce2 (reduce_one_step t')           if (is_redex t') then reduce2 (reduce_one_step t')
3119                          else t'                                                      else t'
3120 </pre>
3121
3122 There are some differences in the way types are made explicit, and in
3123 the way terms are specified (`FA(a,b)` for Ocaml versus `FA a b` for
3124 Haskell).  But the two programs are essentially identical.
3125
3126 Yet the Haskell program finds the normal form for KI -->
3127
3128 [[!toc levels=2]]
3129
3130 # Reasoning about evaluation order in Combinatory Logic
3131
3132 We've discussed evaluation order before, primarily in connection with
3133 the untyped lambda calculus.  Whenever a term has more than one redex,
3134 we have to choose which one to reduce, and this choice can make a
3135 difference.  For instance, in the term `((\x.I)(ωω)`, if we reduce the
3136 leftmost redex first, the term reduces to the normal form `I` in one
3137 step.  But if we reduce the left most redex instead (namely, `(ωω)`),
3138 we do not arrive at a normal form, and are in danger of entering an
3139 infinite loop.
3140
3141 Thanks to the introduction of sum types (disjoint union), we are now
3142 in a position to gain a deeper understanding by writing a program that
3143 allows us to experiment with different evaluation order strategies.
3144
3145 One thing we'll see is that it is all too easy for the evaluation
3146 order properties of an evaluator to depend on the evaluation order
3147 properties of the programming language in which the evaluator is
3148 written.  We will seek to write an evaluator in which the order of
3149 evaluation is insensitive to the evaluator language.  The goal is to
3150 find an order-insensitive way to reason about evaluation order.
3151
3152 The first evaluator we develop will evaluate terms in Combinatory
3153 Logic.  That will significantly simplify the program, since we won't
3154 need to worry about variables or substitution.  As we develop and
3155 extend our evaluator in future weeks, we'll switch to lambdas, but for
3156 now, working with the elegant simplicity of Combinatory Logic will
3157 make the evaluation order issues easier to grasp.
3158
3159 A brief review: a term in CL is the combination of three basic
3160 expressions, `S`, `K`, and `I`, governed by the following reduction
3161 rules:
3162
3163     Ia ~~> a
3164     Kab ~~> b
3165     Sabc ~~> ac(bc)
3166
3167 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
3168 that how to embed the untyped lambda calculus in CL, so it's no
3169 surprise that the same evaluation order issues arise in CL.
3170
3171     skomega = SII(SII)
3172             ~~> I(SII)(I(SII))
3173             ~~> SII(SII)
3174
3175 Instead of the lambda term `Ω`, we'll use the CL term skomega, defined
3176 here, though we'll use the same symbol, `Ω`.  If we consider the term
3177
3178     KIΩ == KI(SII(SII))
3179
3180 we can choose to reduce the leftmost redex by firing the reduction
3181 rule for `K`, in which case the term reduces to the normal form `I` in
3182 one step; or we can choose to reduce the skomega part, by firing the
3183 reduction rule fo `S`, in which case we do not get a normal form,
3184 we're headed towards an infinite loop.
3185
3186 With sum types, we can define terms in CL in OCaml as follows:
3187
3188     type term = I | S | K | FA of (term * term)
3189
3190     let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))
3191     let test1 = FA (FA (K,I), skomega)
3192
3193 This recursive type definition says that a term in CL is either one of
3194 the three simple expressions, or else a pair of CL expressions.
3195 Following Heim and Kratzer, `FA` stands for Functional Application.
3196 With this type definition, we can encode skomega, as well as other
3197 terms whose reduction behavior we want to control.
3198
3199 Using pattern matching, it is easy to code the one-step reduction
3200 rules for CL:
3201
3202     let reduce_one_step (t:term):term = match t with
3203         FA(I,a) -> a
3204       | FA(FA(K,a),b) -> a
3205       | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))
3206       | _ -> t
3207
3208     # reduce_one_step (FA(FA(K,S),I));;
3209     - : term = S
3210     # reduce_one_step skomega;;
3211     - : term = FA (FA (I, FA (FA (S, I), I)), FA (I, FA (FA (S, I), I)))
3212
3213 The need to explicitly insert the type constructor `FA` obscures
3214 things a bit, but it's still possible to see how the one-step
3215 reduction function is just the reduction rules for CL.  The
3216 OCaml interpreter shows us that the function faithfully recognizes
3217 that `KSI ~~> S`, and `skomega ~~> I(SII)(I(SII))`.
3218
3219 We can now say precisely what it means to be a redex in CL.
3220
3221     let is_redex (t:term):bool = not (t = reduce_one_step t)
3222
3223     # is_redex K;;
3224     - : bool = false
3225     # is_redex (FA(K,I));;
3226     - : bool = false
3227     # is_redex (FA(FA(K,I),S));;
3228     - : bool = true
3229     # is_redex skomega;;
3230     - : book = true
3231
3232 Warning: this definition relies on the fact that the one-step
3233 reduction of a CL term is never identical to the original term.  This
3234 would not work for the untyped lambda calculus, since
3235 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.  Note that in
3236 order to decide whether two terms are equal, OCaml has to recursively
3237 compare the elements of complex CL terms.  It is able to figure out
3238 how to do this because we provided an explicit definition of the
3239 datatype Term.
3240
3241 As you would expect, a term in CL is in normal form when it contains
3242 no redexes.
3243
3244 In order to fully reduce a term, we need to be able to reduce redexes
3245 that are not at the top level of the term.
3246
3247     (KKI)SI ~~> KSI ~~> S
3248
3249 That is, we want to be able to first evaluate the redex `KKI` that is
3250 a proper subpart of the larger term to produce a new intermediate term
3251 that we can then evaluate to the final normal form.
3252
3253 Because we need to process subparts, and because the result after
3254 processing a subpart may require further processing, the recursive
3255 structure of our evaluation function has to be quite subtle.  To truly
3256 understand it, you will need to do some sophisticated thinking about
3257 how recursion works.  We'll show you how to keep track of what is
3258 going on by constructing an recursive execution trace of inputs and
3259 outputs.
3260
3261 We'll develop our full reduction function in stages.  Once we have it
3262 working, we'll then consider some variants.  Just to be precise, we'll
3263 distinguish each microvariant with its own index number embedded in
3264 its name.
3265
3266     let rec reduce1 (t:term):term =
3267       if (is_redex t) then reduce1 (reduce_one_step t)
3268                       else t
3269
3270 If the input is a redex, we ship it off to `reduce_one_step` for
3271 processing.  But just in case the result of the one-step reduction is
3272 itself a redex, we recursively call `reduce1`.  The recursion will
3273 continue until the result is no longer a redex.
3274
3275     #trace reduce1;;
3276     reduce1 is now traced.
3277     # reduce1 (FA (I, FA (I, K)));;
3278     reduce1 <-- FA (I, FA (I, K))
3279       reduce1 <-- FA (I, K)
3280         reduce1 <-- K
3281         reduce1 --> K
3282       reduce1 --> K
3283     reduce1 --> K
3284     - : term = K
3285
3286 Since the initial input (`I(IK)`) is a redex, the result after the
3287 one-step reduction is `IK`.  We recursively call `reduce1` on this
3288 input.  Since `IK` is itself a redex, the result after one-step
3289 reduction is `K`.  We recursively call `reduce1` on this input.  Since
3290 `K` is not a redex, the recursion bottoms out, and we simply return
3291 it.
3292
3293 But this function doesn't do enough reduction.
3294
3295     # reduce1 (FA (FA (I, I), K));;
3296     - : term = FA (FA (I, I), K)
3297
3298 Because the top-level term is not a redex, `reduce1` returns it
3299 without any evaluation.  What we want is to evaluate the subparts of a
3300 complex term.
3301
3302     let rec reduce2 (t:term):term = match t with
3303         I -> I
3304       | K -> K
3305       | S -> S
3306       | FA (a, b) ->
3307           let t' = FA (reduce2 a, reduce2 b) in
3308             if (is_redex t') then reduce2 (reduce_one_step t')
3309                              else t'
3310
3311 Since we need access to the subterms, we do pattern matching on the
3312 input term.  If the input is simple, we return it.  If the input is
3313 complex, we first process the subexpressions, and only then see if we
3314 have a redex.  To understand how this works, follow the trace
3315 carefully:
3316
3317     # reduce2 (FA(FA(I,I),K));;
3318     reduce2 <-- FA (FA (I, I), K)
3319
3320       reduce2 <-- K          ; first main recursive call
3321       reduce2 --> K
3322
3323       reduce2 <-- FA (I, I)  ; second main recursive call
3324         reduce2 <-- I
3325         reduce2 --> I
3326         reduce2 <-- I
3327         reduce2 --> I
3328       reduce2 <-- I
3329
3330       reduce2 --> I           ; third main recursive call
3331       reduce2 --> I
3332
3333       reduce2 <-- K           ; fourth
3334       reduce2 --> K
3335     reduce2 --> K
3336     - : term = K
3337
3338 Ok, there's a lot going on here.  Since the input is complex, the
3339 first thing the function does is construct `t'`.  In order to do this,
3340 it must reduce the two main subexpressions, `II` and `K`.  But we see
3341 from the trace that it begins with the right-hand expression, `K`.  We
3342 didn't explicitly tell it to begin with the right-hand subexpression,
3343 so control over evaluation order is starting to spin out of our
3344 control.  (We'll get it back later, don't worry.)
3345
3346 In any case, in the second main recursive call, we evaluate `II`.  The
3347 result is `I`.  The third main recursive call tests whether this
3348 result needs any further processing; it doesn't.
3349
3350 At this point, we have constructed `t' == FA(I,K)`.  Since that's a
3351 redex, we ship it off to reduce_one_step, getting the term `K` as a
3352 result.  The fourth recursive call checks that there is no more
3353 reduction work to be done (there isn't), and that's our final result.
3354
3355 You can see in more detail what is going on by tracing both reduce2
3356 and reduce_one_step, but that makes for some long traces.
3357
3358 So we've solved our first problem: reduce2 recognizes that `IIK ~~>
3359 K`, as desired.
3360
3361 Because the OCaml interpreter evaluates the rightmost expression
3362 in the course of building `t'`, however, it will always evaluate the
3363 right hand subexpression, whether it needs to or not.  And sure
3364 enough,
3365
3366     # reduce2 (FA(FA(K,I),skomega));;
3367       C-c C-cInterrupted.
3368
3369 instead of performing the leftmost reduction first, and recognizing
3370 that this term reduces to the normal form `I`, we get lost endlessly
3371 trying to reduce skomega.
3372
3373 To emphasize that our evaluation order here is at the mercy of the
3374 evaluation order of OCaml, here is the exact same program translated
3375 into Haskell.  We'll put them side by side to emphasize the exact parallel.
3376
3377 <pre>
3378 OCaml                                                          Haskell
3379 ==========================================================     =========================================================
3380
3381 type term = I | S | K | FA of (term * term)                    data Term = I | S | K | FA Term Term deriving (Eq, Show)
3382
3383 let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))          skomega = (FA (FA (FA S I) I) (FA (FA S I) I))
3384
3385                                                                reduce_one_step :: Term -> Term
3386 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
3387     FA(I,a) -> a                                                 FA I a -> a
3388   | FA(FA(K,a),b) -> a                                           FA (FA K a) b -> a
3389   | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))                   FA (FA (FA S a) b) c -> FA (FA a c) (FA b c)
3390   | _ -> t                                                       _ -> t
3391
3392                                                                is_redex :: Term -> Bool
3393 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
3394
3395                                                                reduce2 :: Term -> Term
3396 let rec reduce2 (t:term):term = match t with                   reduce2 t = case t of
3397     I -> I                                                       I -> I
3398   | K -> K                                                       K -> K
3399   | S -> S                                                       S -> S
3400   | FA (a, b) ->                                                 FA a b ->
3401       let t' = FA (reduce2 a, reduce2 b) in                        let t' = FA (reduce2 a) (reduce2 b) in
3402         if (is_redex t') then reduce2 (reduce_one_step t')           if (is_redex t') then reduce2 (reduce_one_step t')
3403                          else t'                                                      else t'
3404 </pre>
3405
3406 There are some differences in the way types are made explicit, and in
3407 the way terms are specified (`FA(a,b)` for Ocaml versus `FA a b` for
3408 Haskell).  But the two programs are essentially identical.
3409
3410 Yet the Haskell program finds the normal form for KI -->
3411
3412 [[!toc levels=2]]
3413
3414 # Reasoning about evaluation order in Combinatory Logic
3415
3416 We've discussed evaluation order before, primarily in connection with
3417 the untyped lambda calculus.  Whenever a term has more than one redex,
3418 we have to choose which one to reduce, and this choice can make a
3419 difference.  For instance, in the term `((\x.I)(ωω)`, if we reduce the
3420 leftmost redex first, the term reduces to the normal form `I` in one
3421 step.  But if we reduce the left most redex instead (namely, `(ωω)`),
3422 we do not arrive at a normal form, and are in danger of entering an
3423 infinite loop.
3424
3425 Thanks to the introduction of sum types (disjoint union), we are now
3426 in a position to gain a deeper understanding by writing a program that
3427 allows us to experiment with different evaluation order strategies.
3428
3429 One thing we'll see is that it is all too easy for the evaluation
3430 order properties of an evaluator to depend on the evaluation order
3431 properties of the programming language in which the evaluator is
3432 written.  We will seek to write an evaluator in which the order of
3433 evaluation is insensitive to the evaluator language.  The goal is to
3434 find an order-insensitive way to reason about evaluation order.
3435
3436 The first evaluator we develop will evaluate terms in Combinatory
3437 Logic.  That will significantly simplify the program, since we won't
3438 need to worry about variables or substitution.  As we develop and
3439 extend our evaluator in future weeks, we'll switch to lambdas, but for
3440 now, working with the elegant simplicity of Combinatory Logic will
3441 make the evaluation order issues easier to grasp.
3442
3443 A brief review: a term in CL is the combination of three basic
3444 expressions, `S`, `K`, and `I`, governed by the following reduction
3445 rules:
3446
3447     Ia ~~> a
3448     Kab ~~> b
3449     Sabc ~~> ac(bc)
3450
3451 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
3452 that how to embed the untyped lambda calculus in CL, so it's no
3453 surprise that the same evaluation order issues arise in CL.
3454
3455     skomega = SII(SII)
3456             ~~> I(SII)(I(SII))
3457             ~~> SII(SII)
3458
3459 Instead of the lambda term `Ω`, we'll use the CL term skomega, defined
3460 here, though we'll use the same symbol, `Ω`.  If we consider the term
3461
3462     KIΩ == KI(SII(SII))
3463
3464 we can choose to reduce the leftmost redex by firing the reduction
3465 rule for `K`, in which case the term reduces to the normal form `I` in
3466 one step; or we can choose to reduce the skomega part, by firing the
3467 reduction rule fo `S`, in which case we do not get a normal form,
3468 we're headed towards an infinite loop.
3469
3470 With sum types, we can define terms in CL in OCaml as follows:
3471
3472     type term = I | S | K | FA of (term * term)
3473
3474     let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))
3475     let test1 = FA (FA (K,I), skomega)
3476
3477 This recursive type definition says that a term in CL is either one of
3478 the three simple expressions, or else a pair of CL expressions.
3479 Following Heim and Kratzer, `FA` stands for Functional Application.
3480 With this type definition, we can encode skomega, as well as other
3481 terms whose reduction behavior we want to control.
3482
3483 Using pattern matching, it is easy to code the one-step reduction
3484 rules for CL:
3485
3486     let reduce_one_step (t:term):term = match t with
3487         FA(I,a) -> a
3488       | FA(FA(K,a),b) -> a
3489       | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))
3490       | _ -> t
3491
3492     # reduce_one_step (FA(FA(K,S),I));;
3493     - : term = S
3494     # reduce_one_step skomega;;
3495     - : term = FA (FA (I, FA (FA (S, I), I)), FA (I, FA (FA (S, I), I)))
3496
3497 The need to explicitly insert the type constructor `FA` obscures
3498 things a bit, but it's still possible to see how the one-step
3499 reduction function is just the reduction rules for CL.  The
3500 OCaml interpreter shows us that the function faithfully recognizes
3501 that `KSI ~~> S`, and `skomega ~~> I(SII)(I(SII))`.
3502
3503 We can now say precisely what it means to be a redex in CL.
3504
3505     let is_redex (t:term):bool = not (t = reduce_one_step t)
3506
3507     # is_redex K;;
3508     - : bool = false
3509     # is_redex (FA(K,I));;
3510     - : bool = false
3511     # is_redex (FA(FA(K,I),S));;
3512     - : bool = true
3513     # is_redex skomega;;
3514     - : book = true
3515
3516 Warning: this definition relies on the fact that the one-step
3517 reduction of a CL term is never identical to the original term.  This
3518 would not work for the untyped lambda calculus, since
3519 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.  Note that in
3520 order to decide whether two terms are equal, OCaml has to recursively
3521 compare the elements of complex CL terms.  It is able to figure out
3522 how to do this because we provided an explicit definition of the
3523 datatype Term.
3524
3525 As you would expect, a term in CL is in normal form when it contains
3526 no redexes.
3527
3528 In order to fully reduce a term, we need to be able to reduce redexes
3529 that are not at the top level of the term.
3530
3531     (KKI)SI ~~> KSI ~~> S
3532
3533 That is, we want to be able to first evaluate the redex `KKI` that is
3534 a proper subpart of the larger term to produce a new intermediate term
3535 that we can then evaluate to the final normal form.
3536
3537 Because we need to process subparts, and because the result after
3538 processing a subpart may require further processing, the recursive
3539 structure of our evaluation function has to be quite subtle.  To truly
3540 understand it, you will need to do some sophisticated thinking about
3541 how recursion works.  We'll show you how to keep track of what is
3542 going on by constructing an recursive execution trace of inputs and
3543 outputs.
3544
3545 We'll develop our full reduction function in stages.  Once we have it
3546 working, we'll then consider some variants.  Just to be precise, we'll
3547 distinguish each microvariant with its own index number embedded in
3548 its name.
3549
3550     let rec reduce1 (t:term):term =
3551       if (is_redex t) then reduce1 (reduce_one_step t)
3552                       else t
3553
3554 If the input is a redex, we ship it off to `reduce_one_step` for
3555 processing.  But just in case the result of the one-step reduction is
3556 itself a redex, we recursively call `reduce1`.  The recursion will
3557 continue until the result is no longer a redex.
3558
3559     #trace reduce1;;
3560     reduce1 is now traced.
3561     # reduce1 (FA (I, FA (I, K)));;
3562     reduce1 <-- FA (I, FA (I, K))
3563       reduce1 <-- FA (I, K)
3564         reduce1 <-- K
3565         reduce1 --> K
3566       reduce1 --> K
3567     reduce1 --> K
3568     - : term = K
3569
3570 Since the initial input (`I(IK)`) is a redex, the result after the
3571 one-step reduction is `IK`.  We recursively call `reduce1` on this
3572 input.  Since `IK` is itself a redex, the result after one-step
3573 reduction is `K`.  We recursively call `reduce1` on this input.  Since
3574 `K` is not a redex, the recursion bottoms out, and we simply return
3575 it.
3576
3577 But this function doesn't do enough reduction.
3578
3579     # reduce1 (FA (FA (I, I), K));;
3580     - : term = FA (FA (I, I), K)
3581
3582 Because the top-level term is not a redex, `reduce1` returns it
3583 without any evaluation.  What we want is to evaluate the subparts of a
3584 complex term.
3585
3586     let rec reduce2 (t:term):term = match t with
3587         I -> I
3588       | K -> K
3589       | S -> S
3590       | FA (a, b) ->
3591           let t' = FA (reduce2 a, reduce2 b) in
3592             if (is_redex t') then reduce2 (reduce_one_step t')
3593                              else t'
3594
3595 Since we need access to the subterms, we do pattern matching on the
3596 input term.  If the input is simple, we return it.  If the input is
3597 complex, we first process the subexpressions, and only then see if we
3598 have a redex.  To understand how this works, follow the trace
3599 carefully:
3600
3601     # reduce2 (FA(FA(I,I),K));;
3602     reduce2 <-- FA (FA (I, I), K)
3603
3604       reduce2 <-- K          ; first main recursive call
3605       reduce2 --> K
3606
3607       reduce2 <-- FA (I, I)  ; second main recursive call
3608         reduce2 <-- I
3609         reduce2 --> I
3610         reduce2 <-- I
3611         reduce2 --> I
3612       reduce2 <-- I
3613
3614       reduce2 --> I           ; third main recursive call
3615       reduce2 --> I
3616
3617       reduce2 <-- K           ; fourth
3618       reduce2 --> K
3619     reduce2 --> K
3620     - : term = K
3621
3622 Ok, there's a lot going on here.  Since the input is complex, the
3623 first thing the function does is construct `t'`.  In order to do this,
3624 it must reduce the two main subexpressions, `II` and `K`.  But we see
3625 from the trace that it begins with the right-hand expression, `K`.  We
3626 didn't explicitly tell it to begin with the right-hand subexpression,
3627 so control over evaluation order is starting to spin out of our
3628 control.  (We'll get it back later, don't worry.)
3629
3630 In any case, in the second main recursive call, we evaluate `II`.  The
3631 result is `I`.  The third main recursive call tests whether this
3632 result needs any further processing; it doesn't.
3633
3634 At this point, we have constructed `t' == FA(I,K)`.  Since that's a
3635 redex, we ship it off to reduce_one_step, getting the term `K` as a
3636 result.  The fourth recursive call checks that there is no more
3637 reduction work to be done (there isn't), and that's our final result.
3638
3639 You can see in more detail what is going on by tracing both reduce2
3640 and reduce_one_step, but that makes for some long traces.
3641
3642 So we've solved our first problem: reduce2 recognizes that `IIK ~~>
3643 K`, as desired.
3644
3645 Because the OCaml interpreter evaluates the rightmost expression
3646 in the course of building `t'`, however, it will always evaluate the
3647 right hand subexpression, whether it needs to or not.  And sure
3648 enough,
3649
3650     # reduce2 (FA(FA(K,I),skomega));;
3651       C-c C-cInterrupted.
3652
3653 instead of performing the leftmost reduction first, and recognizing
3654 that this term reduces to the normal form `I`, we get lost endlessly
3655 trying to reduce skomega.
3656
3657 To emphasize that our evaluation order here is at the mercy of the
3658 evaluation order of OCaml, here is the exact same program translated
3659 into Haskell.  We'll put them side by side to emphasize the exact parallel.
3660
3661 <pre>
3662 OCaml                                                          Haskell
3663 ==========================================================     =========================================================
3664
3665 type term = I | S | K | FA of (term * term)                    data Term = I | S | K | FA Term Term deriving (Eq, Show)
3666
3667 let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))          skomega = (FA (FA (FA S I) I) (FA (FA S I) I))
3668
3669                                                                reduce_one_step :: Term -> Term
3670 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
3671     FA(I,a) -> a                                                 FA I a -> a
3672   | FA(FA(K,a),b) -> a                                           FA (FA K a) b -> a
3673   | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))                   FA (FA (FA S a) b) c -> FA (FA a c) (FA b c)
3674   | _ -> t                                                       _ -> t
3675
3676                                                                is_redex :: Term -> Bool
3677 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
3678
3679                                                                reduce2 :: Term -> Term
3680 let rec reduce2 (t:term):term = match t with                   reduce2 t = case t of
3681     I -> I                                                       I -> I
3682   | K -> K                                                       K -> K
3683   | S -> S                                                       S -> S
3684   | FA (a, b) ->                                                 FA a b ->
3685       let t' = FA (reduce2 a, reduce2 b) in                        let t' = FA (reduce2 a) (reduce2 b) in
3686         if (is_redex t') then reduce2 (reduce_one_step t')           if (is_redex t') then reduce2 (reduce_one_step t')
3687                          else t'                                                      else t'
3688 </pre>
3689
3690 There are some differences in the way types are made explicit, and in
3691 the way terms are specified (`FA(a,b)` for Ocaml versus `FA a b` for
3692 Haskell).  But the two programs are essentially identical.
3693
3694 Yet the Haskell program finds the normal form for KI -->
3695
3696 [[!toc levels=2]]
3697
3698 # Reasoning about evaluation order in Combinatory Logic
3699
3700 We've discussed evaluation order before, primarily in connection with
3701 the untyped lambda calculus.  Whenever a term has more than one redex,
3702 we have to choose which one to reduce, and this choice can make a
3703 difference.  For instance, in the term `((\x.I)(ωω)`, if we reduce the
3704 leftmost redex first, the term reduces to the normal form `I` in one
3705 step.  But if we reduce the left most redex instead (namely, `(ωω)`),
3706 we do not arrive at a normal form, and are in danger of entering an
3707 infinite loop.
3708
3709 Thanks to the introduction of sum types (disjoint union), we are now
3710 in a position to gain a deeper understanding by writing a program that
3711 allows us to experiment with different evaluation order strategies.
3712
3713 One thing we'll see is that it is all too easy for the evaluation
3714 order properties of an evaluator to depend on the evaluation order
3715 properties of the programming language in which the evaluator is
3716 written.  We will seek to write an evaluator in which the order of
3717 evaluation is insensitive to the evaluator language.  The goal is to
3718 find an order-insensitive way to reason about evaluation order.
3719
3720 The first evaluator we develop will evaluate terms in Combinatory
3721 Logic.  That will significantly simplify the program, since we won't
3722 need to worry about variables or substitution.  As we develop and
3723 extend our evaluator in future weeks, we'll switch to lambdas, but for
3724 now, working with the elegant simplicity of Combinatory Logic will
3725 make the evaluation order issues easier to grasp.
3726
3727 A brief review: a term in CL is the combination of three basic
3728 expressions, `S`, `K`, and `I`, governed by the following reduction
3729 rules:
3730
3731     Ia ~~> a
3732     Kab ~~> b
3733     Sabc ~~> ac(bc)
3734
3735 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
3736 that how to embed the untyped lambda calculus in CL, so it's no
3737 surprise that the same evaluation order issues arise in CL.
3738
3739     skomega = SII(SII)
3740             ~~> I(SII)(I(SII))
3741             ~~> SII(SII)
3742
3743 Instead of the lambda term `Ω`, we'll use the CL term skomega, defined
3744 here, though we'll use the same symbol, `Ω`.  If we consider the term
3745
3746     KIΩ == KI(SII(SII))
3747
3748 we can choose to reduce the leftmost redex by firing the reduction
3749 rule for `K`, in which case the term reduces to the normal form `I` in
3750 one step; or we can choose to reduce the skomega part, by firing the
3751 reduction rule fo `S`, in which case we do not get a normal form,
3752 we're headed towards an infinite loop.
3753
3754 With sum types, we can define terms in CL in OCaml as follows:
3755
3756     type term = I | S | K | FA of (term * term)
3757
3758     let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))
3759     let test1 = FA (FA (K,I), skomega)
3760
3761 This recursive type definition says that a term in CL is either one of
3762 the three simple expressions, or else a pair of CL expressions.
3763 Following Heim and Kratzer, `FA` stands for Functional Application.
3764 With this type definition, we can encode skomega, as well as other
3765 terms whose reduction behavior we want to control.
3766
3767 Using pattern matching, it is easy to code the one-step reduction
3768 rules for CL:
3769
3770     let reduce_one_step (t:term):term = match t with
3771         FA(I,a) -> a
3772       | FA(FA(K,a),b) -> a
3773       | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))
3774       | _ -> t
3775
3776     # reduce_one_step (FA(FA(K,S),I));;
3777     - : term = S
3778     # reduce_one_step skomega;;
3779     - : term = FA (FA (I, FA (FA (S, I), I)), FA (I, FA (FA (S, I), I)))
3780
3781 The need to explicitly insert the type constructor `FA` obscures
3782 things a bit, but it's still possible to see how the one-step
3783 reduction function is just the reduction rules for CL.  The
3784 OCaml interpreter shows us that the function faithfully recognizes
3785 that `KSI ~~> S`, and `skomega ~~> I(SII)(I(SII))`.
3786
3787 We can now say precisely what it means to be a redex in CL.
3788
3789     let is_redex (t:term):bool = not (t = reduce_one_step t)
3790
3791     # is_redex K;;
3792     - : bool = false
3793     # is_redex (FA(K,I));;
3794     - : bool = false
3795     # is_redex (FA(FA(K,I),S));;
3796     - : bool = true
3797     # is_redex skomega;;
3798     - : book = true
3799
3800 Warning: this definition relies on the fact that the one-step
3801 reduction of a CL term is never identical to the original term.  This
3802 would not work for the untyped lambda calculus, since
3803 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.  Note that in
3804 order to decide whether two terms are equal, OCaml has to recursively
3805 compare the elements of complex CL terms.  It is able to figure out
3806 how to do this because we provided an explicit definition of the
3807 datatype Term.
3808
3809 As you would expect, a term in CL is in normal form when it contains
3810 no redexes.
3811
3812 In order to fully reduce a term, we need to be able to reduce redexes
3813 that are not at the top level of the term.
3814
3815     (KKI)SI ~~> KSI ~~> S
3816
3817 That is, we want to be able to first evaluate the redex `KKI` that is
3818 a proper subpart of the larger term to produce a new intermediate term
3819 that we can then evaluate to the final normal form.
3820
3821 Because we need to process subparts, and because the result after
3822 processing a subpart may require further processing, the recursive
3823 structure of our evaluation function has to be quite subtle.  To truly
3824 understand it, you will need to do some sophisticated thinking about
3825 how recursion works.  We'll show you how to keep track of what is
3826 going on by constructing an recursive execution trace of inputs and
3827 outputs.
3828
3829 We'll develop our full reduction function in stages.  Once we have it
3830 working, we'll then consider some variants.  Just to be precise, we'll
3831 distinguish each microvariant with its own index number embedded in
3832 its name.
3833
3834     let rec reduce1 (t:term):term =
3835       if (is_redex t) then reduce1 (reduce_one_step t)
3836                       else t
3837
3838 If the input is a redex, we ship it off to `reduce_one_step` for
3839 processing.  But just in case the result of the one-step reduction is
3840 itself a redex, we recursively call `reduce1`.  The recursion will
3841 continue until the result is no longer a redex.
3842
3843     #trace reduce1;;
3844     reduce1 is now traced.
3845     # reduce1 (FA (I, FA (I, K)));;
3846     reduce1 <-- FA (I, FA (I, K))
3847       reduce1 <-- FA (I, K)
3848         reduce1 <-- K
3849         reduce1 --> K
3850       reduce1 --> K
3851     reduce1 --> K
3852     - : term = K
3853
3854 Since the initial input (`I(IK)`) is a redex, the result after the
3855 one-step reduction is `IK`.  We recursively call `reduce1` on this
3856 input.  Since `IK` is itself a redex, the result after one-step
3857 reduction is `K`.  We recursively call `reduce1` on this input.  Since
3858 `K` is not a redex, the recursion bottoms out, and we simply return
3859 it.
3860
3861 But this function doesn't do enough reduction.
3862
3863     # reduce1 (FA (FA (I, I), K));;
3864     - : term = FA (FA (I, I), K)
3865
3866 Because the top-level term is not a redex, `reduce1` returns it
3867 without any evaluation.  What we want is to evaluate the subparts of a
3868 complex term.
3869
3870     let rec reduce2 (t:term):term = match t with
3871         I -> I
3872       | K -> K
3873       | S -> S
3874       | FA (a, b) ->
3875           let t' = FA (reduce2 a, reduce2 b) in
3876             if (is_redex t') then reduce2 (reduce_one_step t')
3877                              else t'
3878
3879 Since we need access to the subterms, we do pattern matching on the
3880 input term.  If the input is simple, we return it.  If the input is
3881 complex, we first process the subexpressions, and only then see if we
3882 have a redex.  To understand how this works, follow the trace
3883 carefully:
3884
3885     # reduce2 (FA(FA(I,I),K));;
3886     reduce2 <-- FA (FA (I, I), K)
3887
3888       reduce2 <-- K          ; first main recursive call
3889       reduce2 --> K
3890
3891       reduce2 <-- FA (I, I)  ; second main recursive call
3892         reduce2 <-- I
3893         reduce2 --> I
3894         reduce2 <-- I
3895         reduce2 --> I
3896       reduce2 <-- I
3897
3898       reduce2 --> I           ; third main recursive call
3899       reduce2 --> I
3900
3901       reduce2 <-- K           ; fourth
3902       reduce2 --> K
3903     reduce2 --> K
3904     - : term = K
3905
3906 Ok, there's a lot going on here.  Since the input is complex, the
3907 first thing the function does is construct `t'`.  In order to do this,
3908 it must reduce the two main subexpressions, `II` and `K`.  But we see
3909 from the trace that it begins with the right-hand expression, `K`.  We
3910 didn't explicitly tell it to begin with the right-hand subexpression,
3911 so control over evaluation order is starting to spin out of our
3912 control.  (We'll get it back later, don't worry.)
3913
3914 In any case, in the second main recursive call, we evaluate `II`.  The
3915 result is `I`.  The third main recursive call tests whether this
3916 result needs any further processing; it doesn't.
3917
3918 At this point, we have constructed `t' == FA(I,K)`.  Since that's a
3919 redex, we ship it off to reduce_one_step, getting the term `K` as a
3920 result.  The fourth recursive call checks that there is no more
3921 reduction work to be done (there isn't), and that's our final result.
3922
3923 You can see in more detail what is going on by tracing both reduce2
3924 and reduce_one_step, but that makes for some long traces.
3925
3926 So we've solved our first problem: reduce2 recognizes that `IIK ~~>
3927 K`, as desired.
3928
3929 Because the OCaml interpreter evaluates the rightmost expression
3930 in the course of building `t'`, however, it will always evaluate the
3931 right hand subexpression, whether it needs to or not.  And sure
3932 enough,
3933
3934     # reduce2 (FA(FA(K,I),skomega));;
3935       C-c C-cInterrupted.
3936
3937 instead of performing the leftmost reduction first, and recognizing
3938 that this term reduces to the normal form `I`, we get lost endlessly
3939 trying to reduce skomega.
3940
3941 To emphasize that our evaluation order here is at the mercy of the
3942 evaluation order of OCaml, here is the exact same program translated
3943 into Haskell.  We'll put them side by side to emphasize the exact parallel.
3944
3945 <pre>
3946 OCaml                                                          Haskell
3947 ==========================================================     =========================================================
3948
3949 type term = I | S | K | FA of (term * term)                    data Term = I | S | K | FA Term Term deriving (Eq, Show)
3950
3951 let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))          skomega = (FA (FA (FA S I) I) (FA (FA S I) I))
3952
3953                                                                reduce_one_step :: Term -> Term
3954 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
3955     FA(I,a) -> a                                                 FA I a -> a
3956   | FA(FA(K,a),b) -> a                                           FA (FA K a) b -> a
3957   | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))                   FA (FA (FA S a) b) c -> FA (FA a c) (FA b c)
3958   | _ -> t                                                       _ -> t
3959
3960                                                                is_redex :: Term -> Bool
3961 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
3962
3963                                                                reduce2 :: Term -> Term
3964 let rec reduce2 (t:term):term = match t with                   reduce2 t = case t of
3965     I -> I                                                       I -> I
3966   | K -> K                                                       K -> K
3967   | S -> S                                                       S -> S
3968   | FA (a, b) ->                                                 FA a b ->
3969       let t' = FA (reduce2 a, reduce2 b) in                        let t' = FA (reduce2 a) (reduce2 b) in
3970         if (is_redex t') then reduce2 (reduce_one_step t')           if (is_redex t') then reduce2 (reduce_one_step t')
3971                          else t'                                                      else t'
3972 </pre>
3973
3974 There are some differences in the way types are made explicit, and in
3975 the way terms are specified (`FA(a,b)` for Ocaml versus `FA a b` for
3976 Haskell).  But the two programs are essentially identical.
3977
3978 Yet the Haskell program finds the normal form for KI -->
3979
3980 [[!toc levels=2]]
3981
3982 # Reasoning about evaluation order in Combinatory Logic
3983
3984 We've discussed evaluation order before, primarily in connection with
3985 the untyped lambda calculus.  Whenever a term has more than one redex,
3986 we have to choose which one to reduce, and this choice can make a
3987 difference.  For instance, in the term `((\x.I)(ωω)`, if we reduce the
3988 leftmost redex first, the term reduces to the normal form `I` in one
3989 step.  But if we reduce the left most redex instead (namely, `(ωω)`),
3990 we do not arrive at a normal form, and are in danger of entering an
3991 infinite loop.
3992
3993 Thanks to the introduction of sum types (disjoint union), we are now
3994 in a position to gain a deeper understanding by writing a program that
3995 allows us to experiment with different evaluation order strategies.
3996
3997 One thing we'll see is that it is all too easy for the evaluation
3998 order properties of an evaluator to depend on the evaluation order
3999 properties of the programming language in which the evaluator is
4000 written.  We will seek to write an evaluator in which the order of
4001 evaluation is insensitive to the evaluator language.  The goal is to
4002 find an order-insensitive way to reason about evaluation order.
4003
4004 The first evaluator we develop will evaluate terms in Combinatory
4005 Logic.  That will significantly simplify the program, since we won't
4006 need to worry about variables or substitution.  As we develop and
4007 extend our evaluator in future weeks, we'll switch to lambdas, but for
4008 now, working with the elegant simplicity of Combinatory Logic will
4009 make the evaluation order issues easier to grasp.
4010
4011 A brief review: a term in CL is the combination of three basic
4012 expressions, `S`, `K`, and `I`, governed by the following reduction
4013 rules:
4014
4015     Ia ~~> a
4016     Kab ~~> b
4017     Sabc ~~> ac(bc)
4018
4019 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
4020 that how to embed the untyped lambda calculus in CL, so it's no
4021 surprise that the same evaluation order issues arise in CL.
4022
4023     skomega = SII(SII)
4024             ~~> I(SII)(I(SII))
4025             ~~> SII(SII)
4026
4027 Instead of the lambda term `Ω`, we'll use the CL term skomega, defined
4028 here, though we'll use the same symbol, `Ω`.  If we consider the term
4029
4030     KIΩ == KI(SII(SII))
4031
4032 we can choose to reduce the leftmost redex by firing the reduction
4033 rule for `K`, in which case the term reduces to the normal form `I` in
4034 one step; or we can choose to reduce the skomega part, by firing the
4035 reduction rule fo `S`, in which case we do not get a normal form,
4036 we're headed towards an infinite loop.
4037
4038 With sum types, we can define terms in CL in OCaml as follows:
4039
4040     type term = I | S | K | FA of (term * term)
4041
4042     let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))
4043     let test1 = FA (FA (K,I), skomega)
4044
4045 This recursive type definition says that a term in CL is either one of
4046 the three simple expressions, or else a pair of CL expressions.
4047 Following Heim and Kratzer, `FA` stands for Functional Application.
4048 With this type definition, we can encode skomega, as well as other
4049 terms whose reduction behavior we want to control.
4050
4051 Using pattern matching, it is easy to code the one-step reduction
4052 rules for CL:
4053
4054     let reduce_one_step (t:term):term = match t with
4055         FA(I,a) -> a
4056       | FA(FA(K,a),b) -> a
4057       | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))
4058       | _ -> t
4059
4060     # reduce_one_step (FA(FA(K,S),I));;
4061     - : term = S
4062     # reduce_one_step skomega;;
4063     - : term = FA (FA (I, FA (FA (S, I), I)), FA (I, FA (FA (S, I), I)))
4064
4065 The need to explicitly insert the type constructor `FA` obscures
4066 things a bit, but it's still possible to see how the one-step
4067 reduction function is just the reduction rules for CL.  The
4068 OCaml interpreter shows us that the function faithfully recognizes
4069 that `KSI ~~> S`, and `skomega ~~> I(SII)(I(SII))`.
4070
4071 We can now say precisely what it means to be a redex in CL.
4072
4073     let is_redex (t:term):bool = not (t = reduce_one_step t)
4074
4075     # is_redex K;;
4076     - : bool = false
4077     # is_redex (FA(K,I));;
4078     - : bool = false
4079     # is_redex (FA(FA(K,I),S));;
4080     - : bool = true
4081     # is_redex skomega;;
4082     - : book = true
4083
4084 Warning: this definition relies on the fact that the one-step
4085 reduction of a CL term is never identical to the original term.  This
4086 would not work for the untyped lambda calculus, since
4087 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.  Note that in
4088 order to decide whether two terms are equal, OCaml has to recursively
4089 compare the elements of complex CL terms.  It is able to figure out
4090 how to do this because we provided an explicit definition of the
4091 datatype Term.
4092
4093 As you would expect, a term in CL is in normal form when it contains
4094 no redexes.
4095
4096 In order to fully reduce a term, we need to be able to reduce redexes
4097 that are not at the top level of the term.
4098
4099     (KKI)SI ~~> KSI ~~> S
4100
4101 That is, we want to be able to first evaluate the redex `KKI` that is
4102 a proper subpart of the larger term to produce a new intermediate term
4103 that we can then evaluate to the final normal form.
4104
4105 Because we need to process subparts, and because the result after
4106 processing a subpart may require further processing, the recursive
4107 structure of our evaluation function has to be quite subtle.  To truly
4108 understand it, you will need to do some sophisticated thinking about
4109 how recursion works.  We'll show you how to keep track of what is
4110 going on by constructing an recursive execution trace of inputs and
4111 outputs.
4112
4113 We'll develop our full reduction function in stages.  Once we have it
4114 working, we'll then consider some variants.  Just to be precise, we'll
4115 distinguish each microvariant with its own index number embedded in
4116 its name.
4117
4118     let rec reduce1 (t:term):term =
4119       if (is_redex t) then reduce1 (reduce_one_step t)
4120                       else t
4121
4122 If the input is a redex, we ship it off to `reduce_one_step` for
4123 processing.  But just in case the result of the one-step reduction is
4124 itself a redex, we recursively call `reduce1`.  The recursion will
4125 continue until the result is no longer a redex.
4126
4127     #trace reduce1;;
4128     reduce1 is now traced.
4129     # reduce1 (FA (I, FA (I, K)));;
4130     reduce1 <-- FA (I, FA (I, K))
4131       reduce1 <-- FA (I, K)
4132         reduce1 <-- K
4133         reduce1 --> K
4134       reduce1 --> K
4135     reduce1 --> K
4136     - : term = K
4137
4138 Since the initial input (`I(IK)`) is a redex, the result after the
4139 one-step reduction is `IK`.  We recursively call `reduce1` on this
4140 input.  Since `IK` is itself a redex, the result after one-step
4141 reduction is `K`.  We recursively call `reduce1` on this input.  Since
4142 `K` is not a redex, the recursion bottoms out, and we simply return
4143 it.
4144
4145 But this function doesn't do enough reduction.
4146
4147     # reduce1 (FA (FA (I, I), K));;
4148     - : term = FA (FA (I, I), K)
4149
4150 Because the top-level term is not a redex, `reduce1` returns it
4151 without any evaluation.  What we want is to evaluate the subparts of a
4152 complex term.
4153
4154     let rec reduce2 (t:term):term = match t with
4155         I -> I
4156       | K -> K
4157       | S -> S
4158       | FA (a, b) ->
4159           let t' = FA (reduce2 a, reduce2 b) in
4160             if (is_redex t') then reduce2 (reduce_one_step t')
4161                              else t'
4162
4163 Since we need access to the subterms, we do pattern matching on the
4164 input term.  If the input is simple, we return it.  If the input is
4165 complex, we first process the subexpressions, and only then see if we
4166 have a redex.  To understand how this works, follow the trace
4167 carefully:
4168
4169     # reduce2 (FA(FA(I,I),K));;
4170     reduce2 <-- FA (FA (I, I), K)
4171
4172       reduce2 <-- K          ; first main recursive call
4173       reduce2 --> K
4174
4175       reduce2 <-- FA (I, I)  ; second main recursive call
4176         reduce2 <-- I
4177         reduce2 --> I
4178         reduce2 <-- I
4179         reduce2 --> I
4180       reduce2 <-- I
4181
4182       reduce2 --> I           ; third main recursive call
4183       reduce2 --> I
4184
4185       reduce2 <-- K           ; fourth
4186       reduce2 --> K
4187     reduce2 --> K
4188     - : term = K
4189
4190 Ok, there's a lot going on here.  Since the input is complex, the
4191 first thing the function does is construct `t'`.  In order to do this,
4192 it must reduce the two main subexpressions, `II` and `K`.  But we see
4193 from the trace that it begins with the right-hand expression, `K`.  We
4194 didn't explicitly tell it to begin with the right-hand subexpression,
4195 so control over evaluation order is starting to spin out of our
4196 control.  (We'll get it back later, don't worry.)
4197
4198 In any case, in the second main recursive call, we evaluate `II`.  The
4199 result is `I`.  The third main recursive call tests whether this
4200 result needs any further processing; it doesn't.
4201
4202 At this point, we have constructed `t' == FA(I,K)`.  Since that's a
4203 redex, we ship it off to reduce_one_step, getting the term `K` as a
4204 result.  The fourth recursive call checks that there is no more
4205 reduction work to be done (there isn't), and that's our final result.
4206
4207 You can see in more detail what is going on by tracing both reduce2
4208 and reduce_one_step, but that makes for some long traces.
4209
4210 So we've solved our first problem: reduce2 recognizes that `IIK ~~>
4211 K`, as desired.
4212
4213 Because the OCaml interpreter evaluates the rightmost expression
4214 in the course of building `t'`, however, it will always evaluate the
4215 right hand subexpression, whether it needs to or not.  And sure
4216 enough,
4217
4218     # reduce2 (FA(FA(K,I),skomega));;
4219       C-c C-cInterrupted.
4220
4221 instead of performing the leftmost reduction first, and recognizing
4222 that this term reduces to the normal form `I`, we get lost endlessly
4223 trying to reduce skomega.
4224
4225 To emphasize that our evaluation order here is at the mercy of the
4226 evaluation order of OCaml, here is the exact same program translated
4227 into Haskell.  We'll put them side by side to emphasize the exact parallel.
4228
4229 <pre>
4230 OCaml                                                          Haskell
4231 ==========================================================     =========================================================
4232
4233 type term = I | S | K | FA of (term * term)                    data Term = I | S | K | FA Term Term deriving (Eq, Show)
4234
4235 let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))          skomega = (FA (FA (FA S I) I) (FA (FA S I) I))
4236
4237                                                                reduce_one_step :: Term -> Term
4238 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
4239     FA(I,a) -> a                                                 FA I a -> a
4240   | FA(FA(K,a),b) -> a                                           FA (FA K a) b -> a
4241   | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))                   FA (FA (FA S a) b) c -> FA (FA a c) (FA b c)
4242   | _ -> t                                                       _ -> t
4243
4244                                                                is_redex :: Term -> Bool
4245 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
4246
4247                                                                reduce2 :: Term -> Term
4248 let rec reduce2 (t:term):term = match t with                   reduce2 t = case t of
4249     I -> I                                                       I -> I
4250   | K -> K                                                       K -> K
4251   | S -> S                                                       S -> S
4252   | FA (a, b) ->                                                 FA a b ->
4253       let t' = FA (reduce2 a, reduce2 b) in                        let t' = FA (reduce2 a) (reduce2 b) in
4254         if (is_redex t') then reduce2 (reduce_one_step t')           if (is_redex t') then reduce2 (reduce_one_step t')
4255                          else t'                                                      else t'
4256 </pre>
4257
4258 There are some differences in the way types are made explicit, and in
4259 the way terms are specified (`FA(a,b)` for Ocaml versus `FA a b` for
4260 Haskell).  But the two programs are essentially identical.
4261
4262 Yet the Haskell program finds the normal form for KI -->
4263
4264 [[!toc levels=2]]
4265
4266 # Reasoning about evaluation order in Combinatory Logic
4267
4268 We've discussed evaluation order before, primarily in connection with
4269 the untyped lambda calculus.  Whenever a term has more than one redex,
4270 we have to choose which one to reduce, and this choice can make a
4271 difference.  For instance, in the term `((\x.I)(ωω)`, if we reduce the
4272 leftmost redex first, the term reduces to the normal form `I` in one
4273 step.  But if we reduce the left most redex instead (namely, `(ωω)`),
4274 we do not arrive at a normal form, and are in danger of entering an
4275 infinite loop.
4276
4277 Thanks to the introduction of sum types (disjoint union), we are now
4278 in a position to gain a deeper understanding by writing a program that
4279 allows us to experiment with different evaluation order strategies.
4280
4281 One thing we'll see is that it is all too easy for the evaluation
4282 order properties of an evaluator to depend on the evaluation order
4283 properties of the programming language in which the evaluator is
4284 written.  We will seek to write an evaluator in which the order of
4285 evaluation is insensitive to the evaluator language.  The goal is to
4286 find an order-insensitive way to reason about evaluation order.
4287
4288 The first evaluator we develop will evaluate terms in Combinatory
4289 Logic.  That will significantly simplify the program, since we won't
4290 need to worry about variables or substitution.  As we develop and
4291 extend our evaluator in future weeks, we'll switch to lambdas, but for
4292 now, working with the elegant simplicity of Combinatory Logic will
4293 make the evaluation order issues easier to grasp.
4294
4295 A brief review: a term in CL is the combination of three basic
4296 expressions, `S`, `K`, and `I`, governed by the following reduction
4297 rules:
4298
4299     Ia ~~> a
4300     Kab ~~> b
4301     Sabc ~~> ac(bc)
4302
4303 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
4304 that how to embed the untyped lambda calculus in CL, so it's no
4305 surprise that the same evaluation order issues arise in CL.
4306
4307     skomega = SII(SII)
4308             ~~> I(SII)(I(SII))
4309             ~~> SII(SII)
4310
4311 Instead of the lambda term `Ω`, we'll use the CL term skomega, defined
4312 here, though we'll use the same symbol, `Ω`.  If we consider the term
4313
4314     KIΩ == KI(SII(SII))
4315
4316 we can choose to reduce the leftmost redex by firing the reduction
4317 rule for `K`, in which case the term reduces to the normal form `I` in
4318 one step; or we can choose to reduce the skomega part, by firing the
4319 reduction rule fo `S`, in which case we do not get a normal form,
4320 we're headed towards an infinite loop.
4321
4322 With sum types, we can define terms in CL in OCaml as follows:
4323
4324     type term = I | S | K | FA of (term * term)
4325
4326     let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))
4327     let test1 = FA (FA (K,I), skomega)
4328
4329 This recursive type definition says that a term in CL is either one of
4330 the three simple expressions, or else a pair of CL expressions.
4331 Following Heim and Kratzer, `FA` stands for Functional Application.
4332 With this type definition, we can encode skomega, as well as other
4333 terms whose reduction behavior we want to control.
4334
4335 Using pattern matching, it is easy to code the one-step reduction
4336 rules for CL:
4337
4338     let reduce_one_step (t:term):term = match t with
4339         FA(I,a) -> a
4340       | FA(FA(K,a),b) -> a
4341       | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))
4342       | _ -> t
4343
4344     # reduce_one_step (FA(FA(K,S),I));;
4345     - : term = S
4346     # reduce_one_step skomega;;
4347     - : term = FA (FA (I, FA (FA (S, I), I)), FA (I, FA (FA (S, I), I)))
4348
4349 The need to explicitly insert the type constructor `FA` obscures
4350 things a bit, but it's still possible to see how the one-step
4351 reduction function is just the reduction rules for CL.  The
4352 OCaml interpreter shows us that the function faithfully recognizes
4353 that `KSI ~~> S`, and `skomega ~~> I(SII)(I(SII))`.
4354
4355 We can now say precisely what it means to be a redex in CL.
4356
4357     let is_redex (t:term):bool = not (t = reduce_one_step t)
4358
4359     # is_redex K;;
4360     - : bool = false
4361     # is_redex (FA(K,I));;
4362     - : bool = false
4363     # is_redex (FA(FA(K,I),S));;
4364     - : bool = true
4365     # is_redex skomega;;
4366     - : book = true
4367
4368 Warning: this definition relies on the fact that the one-step
4369 reduction of a CL term is never identical to the original term.  This
4370 would not work for the untyped lambda calculus, since
4371 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.  Note that in
4372 order to decide whether two terms are equal, OCaml has to recursively
4373 compare the elements of complex CL terms.  It is able to figure out
4374 how to do this because we provided an explicit definition of the
4375 datatype Term.
4376
4377 As you would expect, a term in CL is in normal form when it contains
4378 no redexes.
4379
4380 In order to fully reduce a term, we need to be able to reduce redexes
4381 that are not at the top level of the term.
4382
4383     (KKI)SI ~~> KSI ~~> S
4384
4385 That is, we want to be able to first evaluate the redex `KKI` that is
4386 a proper subpart of the larger term to produce a new intermediate term
4387 that we can then evaluate to the final normal form.
4388
4389 Because we need to process subparts, and because the result after
4390 processing a subpart may require further processing, the recursive
4391 structure of our evaluation function has to be quite subtle.  To truly
4392 understand it, you will need to do some sophisticated thinking about
4393 how recursion works.  We'll show you how to keep track of what is
4394 going on by constructing an recursive execution trace of inputs and
4395 outputs.
4396
4397 We'll develop our full reduction function in stages.  Once we have it
4398 working, we'll then consider some variants.  Just to be precise, we'll
4399 distinguish each microvariant with its own index number embedded in
4400 its name.
4401
4402     let rec reduce1 (t:term):term =
4403       if (is_redex t) then reduce1 (reduce_one_step t)
4404                       else t
4405
4406 If the input is a redex, we ship it off to `reduce_one_step` for
4407 processing.  But just in case the result of the one-step reduction is
4408 itself a redex, we recursively call `reduce1`.  The recursion will
4409 continue until the result is no longer a redex.
4410
4411     #trace reduce1;;
4412     reduce1 is now traced.
4413     # reduce1 (FA (I, FA (I, K)));;
4414     reduce1 <-- FA (I, FA (I, K))
4415       reduce1 <-- FA (I, K)
4416         reduce1 <-- K
4417         reduce1 --> K
4418       reduce1 --> K
4419     reduce1 --> K
4420     - : term = K
4421
4422 Since the initial input (`I(IK)`) is a redex, the result after the
4423 one-step reduction is `IK`.  We recursively call `reduce1` on this
4424 input.  Since `IK` is itself a redex, the result after one-step
4425 reduction is `K`.  We recursively call `reduce1` on this input.  Since
4426 `K` is not a redex, the recursion bottoms out, and we simply return
4427 it.
4428
4429 But this function doesn't do enough reduction.
4430
4431     # reduce1 (FA (FA (I, I), K));;
4432     - : term = FA (FA (I, I), K)
4433
4434 Because the top-level term is not a redex, `reduce1` returns it
4435 without any evaluation.  What we want is to evaluate the subparts of a
4436 complex term.
4437
4438     let rec reduce2 (t:term):term = match t with
4439         I -> I
4440       | K -> K
4441       | S -> S
4442       | FA (a, b) ->
4443           let t' = FA (reduce2 a, reduce2 b) in
4444             if (is_redex t') then reduce2 (reduce_one_step t')
4445                              else t'
4446
4447 Since we need access to the subterms, we do pattern matching on the
4448 input term.  If the input is simple, we return it.  If the input is
4449 complex, we first process the subexpressions, and only then see if we
4450 have a redex.  To understand how this works, follow the trace
4451 carefully:
4452
4453     # reduce2 (FA(FA(I,I),K));;
4454     reduce2 <-- FA (FA (I, I), K)
4455
4456       reduce2 <-- K          ; first main recursive call
4457       reduce2 --> K
4458
4459       reduce2 <-- FA (I, I)  ; second main recursive call
4460         reduce2 <-- I
4461         reduce2 --> I
4462         reduce2 <-- I
4463         reduce2 --> I
4464       reduce2 <-- I
4465
4466       reduce2 --> I           ; third main recursive call
4467       reduce2 --> I
4468
4469       reduce2 <-- K           ; fourth
4470       reduce2 --> K
4471     reduce2 --> K
4472     - : term = K
4473
4474 Ok, there's a lot going on here.  Since the input is complex, the
4475 first thing the function does is construct `t'`.  In order to do this,
4476 it must reduce the two main subexpressions, `II` and `K`.  But we see
4477 from the trace that it begins with the right-hand expression, `K`.  We
4478 didn't explicitly tell it to begin with the right-hand subexpression,
4479 so control over evaluation order is starting to spin out of our
4480 control.  (We'll get it back later, don't worry.)
4481
4482 In any case, in the second main recursive call, we evaluate `II`.  The
4483 result is `I`.  The third main recursive call tests whether this
4484 result needs any further processing; it doesn't.
4485
4486 At this point, we have constructed `t' == FA(I,K)`.  Since that's a
4487 redex, we ship it off to reduce_one_step, getting the term `K` as a
4488 result.  The fourth recursive call checks that there is no more
4489 reduction work to be done (there isn't), and that's our final result.
4490
4491 You can see in more detail what is going on by tracing both reduce2
4492 and reduce_one_step, but that makes for some long traces.
4493
4494 So we've solved our first problem: reduce2 recognizes that `IIK ~~>
4495 K`, as desired.
4496
4497 Because the OCaml interpreter evaluates the rightmost expression
4498 in the course of building `t'`, however, it will always evaluate the
4499 right hand subexpression, whether it needs to or not.  And sure
4500 enough,
4501
4502     # reduce2 (FA(FA(K,I),skomega));;
4503       C-c C-cInterrupted.
4504
4505 instead of performing the leftmost reduction first, and recognizing
4506 that this term reduces to the normal form `I`, we get lost endlessly
4507 trying to reduce skomega.
4508
4509 To emphasize that our evaluation order here is at the mercy of the
4510 evaluation order of OCaml, here is the exact same program translated
4511 into Haskell.  We'll put them side by side to emphasize the exact parallel.
4512
4513 <pre>
4514 OCaml                                                          Haskell
4515 ==========================================================     =========================================================
4516
4517 type term = I | S | K | FA of (term * term)                    data Term = I | S | K | FA Term Term deriving (Eq, Show)
4518
4519 let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))          skomega = (FA (FA (FA S I) I) (FA (FA S I) I))
4520
4521                                                                reduce_one_step :: Term -> Term
4522 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
4523     FA(I,a) -> a                                                 FA I a -> a
4524   | FA(FA(K,a),b) -> a                                           FA (FA K a) b -> a
4525   | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))                   FA (FA (FA S a) b) c -> FA (FA a c) (FA b c)
4526   | _ -> t                                                       _ -> t
4527
4528                                                                is_redex :: Term -> Bool
4529 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
4530
4531                                                                reduce2 :: Term -> Term
4532 let rec reduce2 (t:term):term = match t with                   reduce2 t = case t of
4533     I -> I                                                       I -> I
4534   | K -> K                                                       K -> K
4535   | S -> S                                                       S -> S
4536   | FA (a, b) ->                                                 FA a b ->
4537       let t' = FA (reduce2 a, reduce2 b) in                        let t' = FA (reduce2 a) (reduce2 b) in
4538         if (is_redex t') then reduce2 (reduce_one_step t')           if (is_redex t') then reduce2 (reduce_one_step t')
4539                          else t'                                                      else t'
4540 </pre>
4541
4542 There are some differences in the way types are made explicit, and in
4543 the way terms are specified (`FA(a,b)` for Ocaml versus `FA a b` for
4544 Haskell).  But the two programs are essentially identical.
4545
4546 Yet the Haskell program finds the normal form for KI -->
4547
4548 [[!toc levels=2]]
4549
4550 # Reasoning about evaluation order in Combinatory Logic
4551
4552 We've discussed evaluation order before, primarily in connection with
4553 the untyped lambda calculus.  Whenever a term has more than one redex,
4554 we have to choose which one to reduce, and this choice can make a
4555 difference.  For instance, in the term `((\x.I)(ωω)`, if we reduce the
4556 leftmost redex first, the term reduces to the normal form `I` in one
4557 step.  But if we reduce the left most redex instead (namely, `(ωω)`),
4558 we do not arrive at a normal form, and are in danger of entering an
4559 infinite loop.
4560
4561 Thanks to the introduction of sum types (disjoint union), we are now
4562 in a position to gain a deeper understanding by writing a program that
4563 allows us to experiment with different evaluation order strategies.
4564
4565 One thing we'll see is that it is all too easy for the evaluation
4566 order properties of an evaluator to depend on the evaluation order
4567 properties of the programming language in which the evaluator is
4568 written.  We will seek to write an evaluator in which the order of
4569 evaluation is insensitive to the evaluator language.  The goal is to
4570 find an order-insensitive way to reason about evaluation order.
4571
4572 The first evaluator we develop will evaluate terms in Combinatory
4573 Logic.  That will significantly simplify the program, since we won't
4574 need to worry about variables or substitution.  As we develop and
4575 extend our evaluator in future weeks, we'll switch to lambdas, but for
4576 now, working with the elegant simplicity of Combinatory Logic will
4577 make the evaluation order issues easier to grasp.
4578
4579 A brief review: a term in CL is the combination of three basic
4580 expressions, `S`, `K`, and `I`, governed by the following reduction
4581 rules:
4582
4583     Ia ~~> a
4584     Kab ~~> b
4585     Sabc ~~> ac(bc)
4586
4587 where `a`, `b`, and `c` stand for an arbitrary term of CL.  We've seen
4588 that how to embed the untyped lambda calculus in CL, so it's no
4589 surprise that the same evaluation order issues arise in CL.
4590
4591     skomega = SII(SII)
4592             ~~> I(SII)(I(SII))
4593             ~~> SII(SII)
4594
4595 Instead of the lambda term `Ω`, we'll use the CL term skomega, defined
4596 here, though we'll use the same symbol, `Ω`.  If we consider the term
4597
4598     KIΩ == KI(SII(SII))
4599
4600 we can choose to reduce the leftmost redex by firing the reduction
4601 rule for `K`, in which case the term reduces to the normal form `I` in
4602 one step; or we can choose to reduce the skomega part, by firing the
4603 reduction rule fo `S`, in which case we do not get a normal form,
4604 we're headed towards an infinite loop.
4605
4606 With sum types, we can define terms in CL in OCaml as follows:
4607
4608     type term = I | S | K | FA of (term * term)
4609
4610     let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))
4611     let test1 = FA (FA (K,I), skomega)
4612
4613 This recursive type definition says that a term in CL is either one of
4614 the three simple expressions, or else a pair of CL expressions.
4615 Following Heim and Kratzer, `FA` stands for Functional Application.
4616 With this type definition, we can encode skomega, as well as other
4617 terms whose reduction behavior we want to control.
4618
4619 Using pattern matching, it is easy to code the one-step reduction
4620 rules for CL:
4621
4622     let reduce_one_step (t:term):term = match t with
4623         FA(I,a) -> a
4624       | FA(FA(K,a),b) -> a
4625       | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))
4626       | _ -> t
4627
4628     # reduce_one_step (FA(FA(K,S),I));;
4629     - : term = S
4630     # reduce_one_step skomega;;
4631     - : term = FA (FA (I, FA (FA (S, I), I)), FA (I, FA (FA (S, I), I)))
4632
4633 The need to explicitly insert the type constructor `FA` obscures
4634 things a bit, but it's still possible to see how the one-step
4635 reduction function is just the reduction rules for CL.  The
4636 OCaml interpreter shows us that the function faithfully recognizes
4637 that `KSI ~~> S`, and `skomega ~~> I(SII)(I(SII))`.
4638
4639 We can now say precisely what it means to be a redex in CL.
4640
4641     let is_redex (t:term):bool = not (t = reduce_one_step t)
4642
4643     # is_redex K;;
4644     - : bool = false
4645     # is_redex (FA(K,I));;
4646     - : bool = false
4647     # is_redex (FA(FA(K,I),S));;
4648     - : bool = true
4649     # is_redex skomega;;
4650     - : book = true
4651
4652 Warning: this definition relies on the fact that the one-step
4653 reduction of a CL term is never identical to the original term.  This
4654 would not work for the untyped lambda calculus, since
4655 `((\x.xx)(\x.xx)) ~~> ((\x.xx)(\x.xx))` in one step.  Note that in
4656 order to decide whether two terms are equal, OCaml has to recursively
4657 compare the elements of complex CL terms.  It is able to figure out
4658 how to do this because we provided an explicit definition of the
4659 datatype Term.
4660
4661 As you would expect, a term in CL is in normal form when it contains
4662 no redexes.
4663
4664 In order to fully reduce a term, we need to be able to reduce redexes
4665 that are not at the top level of the term.
4666
4667     (KKI)SI ~~> KSI ~~> S
4668
4669 That is, we want to be able to first evaluate the redex `KKI` that is
4670 a proper subpart of the larger term to produce a new intermediate term
4671 that we can then evaluate to the final normal form.
4672
4673 Because we need to process subparts, and because the result after
4674 processing a subpart may require further processing, the recursive
4675 structure of our evaluation function has to be quite subtle.  To truly
4676 understand it, you will need to do some sophisticated thinking about
4677 how recursion works.  We'll show you how to keep track of what is
4678 going on by constructing an recursive execution trace of inputs and
4679 outputs.
4680
4681 We'll develop our full reduction function in stages.  Once we have it
4682 working, we'll then consider some variants.  Just to be precise, we'll
4683 distinguish each microvariant with its own index number embedded in
4684 its name.
4685
4686     let rec reduce1 (t:term):term =
4687       if (is_redex t) then reduce1 (reduce_one_step t)
4688                       else t
4689
4690 If the input is a redex, we ship it off to `reduce_one_step` for
4691 processing.  But just in case the result of the one-step reduction is
4692 itself a redex, we recursively call `reduce1`.  The recursion will
4693 continue until the result is no longer a redex.
4694
4695     #trace reduce1;;
4696     reduce1 is now traced.
4697     # reduce1 (FA (I, FA (I, K)));;
4698     reduce1 <-- FA (I, FA (I, K))
4699       reduce1 <-- FA (I, K)
4700         reduce1 <-- K
4701         reduce1 --> K
4702       reduce1 --> K
4703     reduce1 --> K
4704     - : term = K
4705
4706 Since the initial input (`I(IK)`) is a redex, the result after the
4707 one-step reduction is `IK`.  We recursively call `reduce1` on this
4708 input.  Since `IK` is itself a redex, the result after one-step
4709 reduction is `K`.  We recursively call `reduce1` on this input.  Since
4710 `K` is not a redex, the recursion bottoms out, and we simply return
4711 it.
4712
4713 But this function doesn't do enough reduction.
4714
4715     # reduce1 (FA (FA (I, I), K));;
4716     - : term = FA (FA (I, I), K)
4717
4718 Because the top-level term is not a redex, `reduce1` returns it
4719 without any evaluation.  What we want is to evaluate the subparts of a
4720 complex term.
4721
4722     let rec reduce2 (t:term):term = match t with
4723         I -> I
4724       | K -> K
4725       | S -> S
4726       | FA (a, b) ->
4727           let t' = FA (reduce2 a, reduce2 b) in
4728             if (is_redex t') then reduce2 (reduce_one_step t')
4729                              else t'
4730
4731 Since we need access to the subterms, we do pattern matching on the
4732 input term.  If the input is simple, we return it.  If the input is
4733 complex, we first process the subexpressions, and only then see if we
4734 have a redex.  To understand how this works, follow the trace
4735 carefully:
4736
4737     # reduce2 (FA(FA(I,I),K));;
4738     reduce2 <-- FA (FA (I, I), K)
4739
4740       reduce2 <-- K          ; first main recursive call
4741       reduce2 --> K
4742
4743       reduce2 <-- FA (I, I)  ; second main recursive call
4744         reduce2 <-- I
4745         reduce2 --> I
4746         reduce2 <-- I
4747         reduce2 --> I
4748       reduce2 <-- I
4749
4750       reduce2 --> I           ; third main recursive call
4751       reduce2 --> I
4752
4753       reduce2 <-- K           ; fourth
4754       reduce2 --> K
4755     reduce2 --> K
4756     - : term = K
4757
4758 Ok, there's a lot going on here.  Since the input is complex, the
4759 first thing the function does is construct `t'`.  In order to do this,
4760 it must reduce the two main subexpressions, `II` and `K`.  But we see
4761 from the trace that it begins with the right-hand expression, `K`.  We
4762 didn't explicitly tell it to begin with the right-hand subexpression,
4763 so control over evaluation order is starting to spin out of our
4764 control.  (We'll get it back later, don't worry.)
4765
4766 In any case, in the second main recursive call, we evaluate `II`.  The
4767 result is `I`.  The third main recursive call tests whether this
4768 result needs any further processing; it doesn't.
4769
4770 At this point, we have constructed `t' == FA(I,K)`.  Since that's a
4771 redex, we ship it off to reduce_one_step, getting the term `K` as a
4772 result.  The fourth recursive call checks that there is no more
4773 reduction work to be done (there isn't), and that's our final result.
4774
4775 You can see in more detail what is going on by tracing both reduce2
4776 and reduce_one_step, but that makes for some long traces.
4777
4778 So we've solved our first problem: reduce2 recognizes that `IIK ~~>
4779 K`, as desired.
4780
4781 Because the OCaml interpreter evaluates the rightmost expression
4782 in the course of building `t'`, however, it will always evaluate the
4783 right hand subexpression, whether it needs to or not.  And sure
4784 enough,
4785
4786     # reduce2 (FA(FA(K,I),skomega));;
4787       C-c C-cInterrupted.
4788
4789 instead of performing the leftmost reduction first, and recognizing
4790 that this term reduces to the normal form `I`, we get lost endlessly
4791 trying to reduce skomega.
4792
4793 To emphasize that our evaluation order here is at the mercy of the
4794 evaluation order of OCaml, here is the exact same program translated
4795 into Haskell.  We'll put them side by side to emphasize the exact parallel.
4796
4797 <pre>
4798 OCaml                                                          Haskell
4799 ==========================================================     =========================================================
4800
4801 type term = I | S | K | FA of (term * term)                    data Term = I | S | K | FA Term Term deriving (Eq, Show)
4802
4803 let skomega = FA (FA (FA (S,I), I), FA (FA (S,I), I))          skomega = (FA (FA (FA S I) I) (FA (FA S I) I))
4804
4805                                                                reduce_one_step :: Term -> Term
4806 let reduce_one_step (t:term):term = match t with               reduce_one_step t = case t of
4807     FA(I,a) -> a                                                 FA I a -> a
4808   | FA(FA(K,a),b) -> a                                           FA (FA K a) b -> a
4809   | FA(FA(FA(S,a),b),c) -> FA(FA(a,c),FA(b,c))                   FA (FA (FA S a) b) c -> FA (FA a c) (FA b c)
4810   | _ -> t                                                       _ -> t
4811
4812                                                                is_redex :: Term -> Bool
4813 let is_redex (t:term):bool = not (t = reduce_one_step t)       is_redex t = not (t == reduce_one_step t)
4814
4815                                                                reduce2 :: Term -> Term
4816 let rec reduce2 (t:term):term = match t with                   reduce2 t = case t of
4817     I -> I                                                       I -> I
4818   | K -> K                                                       K -> K
4819   | S -> S                                                       S -> S
4820   | FA (a, b) ->                                                 FA a b ->
4821       let t' = FA (reduce2 a, reduce2 b) in                        let t' = FA (reduce2 a) (reduce2 b) in
4822         if (is_redex t') then reduce2 (reduce_one_step t')           if (is_redex t') then reduce2 (reduce_one_step t')
4823                          else t'                                                      else t'
4824 </pre>
4825
4826 There are some differences in the way types are specified, and in
4827 the way terms are specified (`FA(a,b)` for Ocaml versus `FA a b` for
4828 Haskell).  But the two programs are essentially identical.
4829
4830 Yet the Haskell program finds the normal form for KIΩ.
4831
4832     *Main> reduce2 (FA (FA K I) skomega)
4833     I
4834
4835 Woa!  First of all, this is wierd.  Haskell's evaluation strategy is
4836 called "lazy".  Apparently, Haskell is so lazy that even after we've
4837 asked it to construct t' by evaluating `reduce2 a` and `reduce2 b`, it
4838 doesn't bother computing `reduce2 b`.  Instead, it waits to see if we
4839 ever really need to use the result.
4840
4841 So despite intuitions, the program as written does NOT determine
4842 evaluation order behavior.  At this stage, we have defined an
4843 evaluation order that still depends on the evaluation order of the
4844 underlying interpreter.
4845
4846 There are two questions we could ask: Can we adjust the OCaml
4847 evaluator to exhibit lazy behavior?  and Can we adjust the Haskell
4848 evaluator to exhibit eager behavior?  The answer to the first question
4849 is easy and interesting, and we'll give it right away.  The answer to
4850 the second question is also interesting, but not easy.  There are
4851 various tricks available in Haskell we could use (such as the `seq`
4852 operator), but a fully general, satisifying resolution will have to
4853 wait until we have Continuation Passing Transforms.
4854
4855 The answer to the first question (Can we adjust the OCaml evaluator to
4856 exhibit lazy behavior?) is quite simple:
4857
4858 <pre>
4859 let rec reduce3 (t:term):term = match t with
4860     I -> I
4861   | K -> K
4862   | S -> S
4863   | FA (a, b) ->
4864       let t' = FA (reduce3 a, b) in
4865         if (is_redex t') then reduce3 (reduce_one_step t')
4866                          else t'
4867 </pre>
4868
4869 There is only one small difference: instead of setting `t'` to
4870 `FA (reduce a, reduce b)`, we have `FA (reduce a, b)`.  That is, we
4871 don't evaluate the right-hand subexpression at all.  Ever!  The only
4872 way to get evaluated is to somehow get into functor position.
4873
4874     # reduce3 (FA(FA(K,I),skomega));;
4875     - : term = I
4876     # reduce3 skomega;;
4877     C-c C-cInterrupted.
4878
4879 The evaluator now has no trouble finding the normal form for `KIΩ`,
4880 but evaluating skomega still gives an infinite loop.
4881
4882 As a final note, we can clarify the larger question at the heart of
4883 this discussion:
4884
4885 *How can we can we specify the evaluation order of a computational
4886 system in a way that is completely insensitive to the evaluation order
4887 of the specification language?*