topics/_cps.mdwn

   1 **Note to Chris**: [[don't forget this material to be merged in somehow|/topics/_cps_and_continuation_operators]]. I marked where I cut some material to put into week13_control_operators, but that page is still a work in progress in my browser...
   2
   3
   4 Gaining control over order of evaluation
   5 ----------------------------------------
   6
   7 We know that evaluation order matters.  We're beginning to learn how
   8 to gain some control over order of evaluation (think of Jim's abort handler).
   9 We continue to reason about order of evaluation.
  10
  11 A lucid discussion of evaluation order in the
  12 context of the lambda calculus can be found here:
  13 [Sestoft: Demonstrating Lambda Calculus Reduction](http://www.itu.dk/~sestoft/papers/mfps2001-sestoft.pdf).
  14 Sestoft also provides a lovely on-line lambda evaluator:
  15 [Sestoft: Lambda calculus reduction workbench](http://www.itu.dk/~sestoft/lamreduce/index.html),
  16 which allows you to select multiple evaluation strategies,
  17 and to see reductions happen step by step.
  18
  19 Evaluation order matters
  20 ------------------------
  21
  22 We've seen this many times.  For instance, consider the following
  23 reductions.  It will be convenient to use the abbreviation `w =
  24 \x.xx`.  I'll
  25 indicate which lambda is about to be reduced with a * underneath:
  26
  27 <pre>
  28 (\x.y)(ww)
  29  *
  30 y
  31 </pre>
  32
  33 Done!  We have a normal form.  But if we reduce using a different
  34 strategy, things go wrong:
  35
  36 <pre>
  37 (\x.y)(ww) =
  38 (\x.y)((\x.xx)w) =
  39         *
  40 (\x.y)(ww) =
  41 (\x.y)((\x.xx)w) =
  42         *
  43 (\x.y)(ww)
  44 </pre>
  45
  46 Etc.
  47
  48 As a second reminder of when evaluation order matters, consider using
  49 `Y = \f.(\h.f(hh))(\h.f(hh))` as a fixed point combinator to define a recursive function:
  50
  51 <pre>
  52 Y (\f n. blah) =
  53 (\f.(\h.f(hh))(\h.f(hh))) (\f n. blah)
  54      *
  55 (\f.f((\h.f(hh))(\h.f(hh)))) (\f n. blah)
  56        *
  57 (\f.f(f((\h.f(hh))(\h.f(hh))))) (\f n. blah)
  58          *
  59 (\f.f(f(f((\h.f(hh))(\h.f(hh)))))) (\f n. blah)
  60 </pre>
  61
  62 And we never get the recursion off the ground.
  63
  64
  65 Using a Continuation Passing Style transform to control order of evaluation
  66 ---------------------------------------------------------------------------
  67
  68 We'll present a technique for controlling evaluation order by transforming a lambda term
  69 using a Continuation Passing Style transform (CPS), then we'll explore
  70 what the CPS is doing, and how.
  71
  72 In order for the CPS to work, we have to adopt a new restriction on
  73 beta reduction: beta reduction does not occur underneath a lambda.
  74 That is, `(\x.y)z` reduces to `z`, but `\u.(\x.y)z` does not reduce to
  75 `\u.z`, because the `\u` protects the redex in the body from
  76 reduction.  (In this context, a "redex" is a part of a term that matches
  77 the pattern `...((\xM)N)...`, i.e., something that can potentially be
  78 the target of beta reduction.)
  79
  80 Start with a simple form that has two different reduction paths:
  81
  82 reducing the leftmost lambda first: `(\x.y)((\x.z)u)  ~~> y`
  83
  84 reducing the rightmost lambda first: `(\x.y)((\x.z)u)  ~~> (\x.y)z ~~> y`
  85
  86 After using the following call-by-name CPS transform---and assuming
  87 that we never evaluate redexes protected by a lambda---only the first
  88 reduction path will be available: we will have gained control over the
  89 order in which beta reductions are allowed to be performed.
  90
  91 Here's the CPS transform defined:
  92
  93     [x] = x
  94     [\xM] = \k.k(\x[M])
  95     [MN] = \k.[M](\m.m[N]k)
  96
  97 Here's the result of applying the transform to our simple example:
  98
  99     [(\x.y)((\x.z)u)] =
 100     \k.[\x.y](\m.m[(\x.z)u]k) =
 101     \k.(\k.k(\x.[y]))(\m.m(\k.[\x.z](\m.m[u]k))k) =
 102     \k.(\k.k(\x.y))(\m.m(\k.(\k.k(\x.z))(\m.muk))k)
 103
 104 Because the initial `\k` protects (i.e., takes scope over) the entire
 105 transformed term, we can't perform any reductions.  In order to watch
 106 the computation unfold, we have to apply the transformed term to a
 107 trivial continuation, usually the identity function `I = \x.x`.
 108
 109     [(\x.y)((\x.z)u)] I =
 110     (\k.[\x.y](\m.m[(\x.z)u]k)) I
 111      *
 112     [\x.y](\m.m[(\x.z)u] I) =
 113     (\k.k(\x.y))(\m.m[(\x.z)u] I)
 114      *           *
 115     (\x.y)[(\x.z)u] I           --A--
 116      *
 117     y I
 118
 119 The application to `I` unlocks the leftmost functor.  Because that
 120 functor (`\x.y`) throws away its argument (consider the reduction in the
 121 line marked (A)), we never need to expand the
 122 CPS transform of the argument.  This means that we never bother to
 123 reduce redexes inside the argument.
 124
 125 Compare with a call-by-value xform:
 126
 127     {x} = \k.kx
 128     {\aM} = \k.k(\a{M})
 129     {MN} = \k.{M}(\m.{N}(\n.mnk))
 130
 131 This time the reduction unfolds in a different manner:
 132
 133     {(\x.y)((\x.z)u)} I =
 134     (\k.{\x.y}(\m.{(\x.z)u}(\n.mnk))) I
 135      *
 136     {\x.y}(\m.{(\x.z)u}(\n.mnI)) =
 137     (\k.k(\x.{y}))(\m.{(\x.z)u}(\n.mnI))
 138      *             *
 139     {(\x.z)u}(\n.(\x.{y})nI) =
 140     (\k.{\x.z}(\m.{u}(\n.mnk)))(\n.(\x.{y})nI)
 141      *
 142     {\x.z}(\m.{u}(\n.mn(\n.(\x.{y})nI))) =
 143     (\k.k(\x.{z}))(\m.{u}(\n.mn(\n.(\x.{y})nI)))
 144      *             *
 145     {u}(\n.(\x.{z})n(\n.(\x.{y})nI)) =
 146     (\k.ku)(\n.(\x.{z})n(\n.(\x.{y})nI))
 147      *      *
 148     (\x.{z})u(\n.(\x.{y})nI)       --A--
 149      *
 150     {z}(\n.(\x.{y})nI) =
 151     (\k.kz)(\n.(\x.{y})nI)
 152      *      *
 153     (\x.{y})zI
 154      *
 155     {y}I =
 156     (\k.ky)I
 157      *
 158     I y
 159
 160 In this case, the argument does get evaluated: consider the reduction
 161 in the line marked (A).
 162
 163 Both xforms make the following guarantee: as long as redexes
 164 underneath a lambda are never evaluated, there will be at most one
 165 reduction available at any step in the evaluation.
 166 That is, all choice is removed from the evaluation process.
 167
 168 Now let's verify that the CBN CPS avoids the infinite reduction path
 169 discussed above (remember that `w = \x.xx`):
 170
 171     [(\x.y)(ww)] I =
 172     (\k.[\x.y](\m.m[ww]k)) I
 173      *
 174     [\x.y](\m.m[ww]I) =
 175     (\k.k(\x.y))(\m.m[ww]I)
 176      *             *
 177     (\x.y)[ww]I
 178      *
 179     y I
 180
 181
 182 Questions and exercises:
 183
 184 1. Prove that {(\x.y)(ww)} does not terminate.
 185
 186 2. Why is the CBN xform for variables `[x] = x` instead of something
 187 involving kappas (i.e., `k`'s)?
 188
 189 3. Write an Ocaml function that takes a lambda term and returns a
 190 CPS-xformed lambda term.  You can use the following data declaration:
 191
 192     type form = Var of char | Abs of char * form | App of form * form;;
 193
 194 4. The discussion above talks about the "leftmost" redex, or the
 195 "rightmost".  But these words apply accurately only in a special set
 196 of terms.  Characterize the order of evaluation for CBN (likewise, for
 197 CBV) more completely and carefully.
 198
 199 5. What happens (in terms of evaluation order) when the application
 200 rule for CBV CPS is changed to `{MN} = \k.{N}(\n.{M}(\m.mnk))`?
 201
 202 6. A term and its CPS xform are different lambda terms.  Yet in some
 203 sense they "do" the same thing computationally.  Make this sense
 204 precise.
 205
 206
 207 Thinking through the types
 208 --------------------------
 209
 210 This discussion is based on [Meyer and Wand 1985](http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.44.7943&rep=rep1&type=pdf).
 211
 212 Let's say we're working in the simply-typed lambda calculus.
 213 Then if the original term is well-typed, the CPS xform will also be
 214 well-typed.  But what will the type of the transformed term be?
 215
 216 The transformed terms all have the form `\k.blah`.  The rule for the
 217 CBN xform of a variable appears to be an exception, but instead of
 218 writing `[x] = x`, we can write `[x] = \k.xk`, which is
 219 eta-equivalent.  The `k`'s are continuations: functions from something
 220 to a result.  Let's use &sigma; as the result type.  The each `k` in
 221 the transform will be a function of type &rho; --> &sigma; for some
 222 choice of &rho;.
 223
 224 We'll need an ancilliary function ': for any ground type a, a' = a;
 225 for functional types a->b, (a->b)' = ((a' -> &sigma;) -> &sigma;) -> (b' -> &sigma;) -> &sigma;.
 226
 227     Call by name transform
 228
 229     Terms                            Types
 230
 231     [x] = \k.xk                      [a] = (a'->o)->o
 232     [\xM] = \k.k(\x[M])              [a->b] = ((a->b)'->o)->o
 233     [MN] = \k.[M](\m.m[N]k)          [b] = (b'->o)->o
 234
 235 Remember that types associate to the right.  Let's work through the
 236 application xform and make sure the types are consistent.  We'll have
 237 the following types:
 238
 239     M:a->b
 240     N:a
 241     MN:b
 242     k:b'->o
 243     [N]:(a'->o)->o
 244     m:((a'->o)->o)->(b'->o)->o
 245     m[N]:(b'->o)->o
 246     m[N]k:o
 247     [M]:((a->b)'->o)->o = ((((a'->o)->o)->(b'->o)->o)->o)->o
 248     [M](\m.m[N]k):o
 249     [MN]:(b'->o)->o
 250
 251 Be aware that even though the transform uses the same symbol for the
 252 translation of a variable (i.e., `[x] = x`), in general the variable
 253 in the transformed term will have a different type than in the source
 254 term.
 255
 256 Excercise: what should the function ' be for the CBV xform?  Hint:
 257 see the Meyer and Wand abstract linked above for the answer.
 258
 259
 260 Other CPS transforms
 261 --------------------
 262
 263 It is easy to think that CBN and CBV are the only two CPS transforms.
 264 (We've already seen a variant on call-by-value one of the excercises above.)
 265
 266 In fact, the number of distinct transforms is unbounded.  For
 267 instance, here is a variant of CBV that uses the same types as CBN:
 268
 269     <x> = x
 270     <\xM> = \k.k(\x<M>)
 271     <MN> = \k.<M>(\m.<N>(\n.m(\k.kn)k))
 272
 273 Try reducing `<(\x.x) ((\y.y) (\z.z))> I` to convince yourself that
 274 this is a version of call-by-value.
 275
 276 Once we have two evaluation strategies that rely on the same types, we
 277 can mix and match:
 278
 279     [x] = x
 280     <x> = x
 281     [\xM] = \k.k(\x<M>)
 282     <\xM] = \k.k(\x[M])
 283     [MN] = \k.<M>(\m.m<N>k)
 284     <MN> = \k.[M](\m.[N](\n.m(\k.kn)k))
 285
 286 This xform interleaves call-by-name and call-by-value in layers,
 287 according to the depth of embedding.
 288 (Cf. page 4 of Reynold's 1974 paper ftp://ftp.cs.cmu.edu/user/jcr/reldircont.pdf (equation (4) and the
 289 explanation in the paragraph below.)