changes

[lambda.git] / cps.mdwn
diff --git a/cps.mdwn b/cps.mdwn

index 73edf0d..6668b48 100644 (file)
--- a/cps.mdwn
+++ b/cps.mdwn
@@ -9,8 +9,7 @@ A lucid discussion of evaluation order in the
  context of the lambda calculus can be found here:
  [Sestoft: Demonstrating Lambda Calculus Reduction](http://www.itu.dk/~sestoft/papers/mfps2001-sestoft.pdf).
  Sestoft also provides a lovely on-line lambda evaluator:
  context of the lambda calculus can be found here:
  [Sestoft: Demonstrating Lambda Calculus Reduction](http://www.itu.dk/~sestoft/papers/mfps2001-sestoft.pdf).
  Sestoft also provides a lovely on-line lambda evaluator:
-[Sestoft: Lambda calculus reduction workbench]
-(http://www.itu.dk/~sestoft/lamreduce/index.html),
+[Sestoft: Lambda calculus reduction workbench](http://www.itu.dk/~sestoft/lamreduce/index.html),
  which allows you to select multiple evaluation strategies, 
  and to see reductions happen step by step.
  
  which allows you to select multiple evaluation strategies, 
  and to see reductions happen step by step.
  
@@ -19,8 +18,8 @@ Evaluation order matters
  
  We've seen this many times.  For instance, consider the following
  reductions.  It will be convenient to use the abbreviation `w =
  
  We've seen this many times.  For instance, consider the following
  reductions.  It will be convenient to use the abbreviation `w =
-\x.xx`.  I'll indicate which lambda is about to be reduced with a *
-underneath:
+\x.xx`.  I'll
+indicate which lambda is about to be reduced with a * underneath:
  
  <pre>
  (\x.y)(ww)
  
  <pre>
  (\x.y)(ww)
@@ -63,105 +62,139 @@ And we never get the recursion off the ground.
  Using a Continuation Passing Style transform to control order of evaluation
  ---------------------------------------------------------------------------
  
  Using a Continuation Passing Style transform to control order of evaluation
  ---------------------------------------------------------------------------
  
-We'll exhibit and explore the technique of transforming a lambda term
+We'll present a technique for controlling evaluation order by transforming a lambda term
  using a Continuation Passing Style transform (CPS), then we'll explore
  what the CPS is doing, and how.
  
  In order for the CPS to work, we have to adopt a new restriction on
  beta reduction: beta reduction does not occur underneath a lambda.
  using a Continuation Passing Style transform (CPS), then we'll explore
  what the CPS is doing, and how.
  
  In order for the CPS to work, we have to adopt a new restriction on
  beta reduction: beta reduction does not occur underneath a lambda.
-That is, `(\x.y)z` reduces to `z`, but `\w.(\x.y)z` does not, because
-the `\w` protects the redex in the body from reduction.  
+That is, `(\x.y)z` reduces to `z`, but `\u.(\x.y)z` does not reduce to
+`\u.z`, because the `\u` protects the redex in the body from
+reduction.  (In this context, a "redex" is a part of a term that matches
+the pattern `...((\xM)N)...`, i.e., something that can potentially be
+the target of beta reduction.)
  
  Start with a simple form that has two different reduction paths:
  
  
  Start with a simple form that has two different reduction paths:
  
-reducing the leftmost lambda first: `(\x.y)((\x.z)w)  ~~> y'
+reducing the leftmost lambda first: `(\x.y)((\x.z)u)  ~~> y`
  
  
-reducing the rightmost lambda first: `(\x.y)((\x.z)w)  ~~> (x.y)z ~~> y'
+reducing the rightmost lambda first: `(\x.y)((\x.z)u)  ~~> (\x.y)z ~~> y`
  
  After using the following call-by-name CPS transform---and assuming
  that we never evaluate redexes protected by a lambda---only the first
  reduction path will be available: we will have gained control over the
  order in which beta reductions are allowed to be performed.
  
  
  After using the following call-by-name CPS transform---and assuming
  that we never evaluate redexes protected by a lambda---only the first
  reduction path will be available: we will have gained control over the
  order in which beta reductions are allowed to be performed.
  
-Here's the CPS transform:
+Here's the CPS transform defined:
  
  
-    [x] => x
-    [\xM] => \k.k(\x[M])
-    [MN] => \k.[M](\m.m[N]k)
+    [x] = x
+    [\xM] = \k.k(\x[M])
+    [MN] = \k.[M](\m.m[N]k)
  
  
-Here's the result of applying the transform to our problem term:
+Here's the result of applying the transform to our simple example:
  
  
-    [(\x.y)((\x.z)w)]
-    \k.[\x.y](\m.m[(\x.z)w]k)
-    \k.(\k.k(\x.[y]))(\m.m(\k.[\x.z](\m.m[w]k))k)
-    \k.(\k.k(\x.y))(\m.m(\k.(\k.k(\x.z))(\m.mwk))k)
+    [(\x.y)((\x.z)u)] =
+    \k.[\x.y](\m.m[(\x.z)u]k) =
+    \k.(\k.k(\x.[y]))(\m.m(\k.[\x.z](\m.m[u]k))k) =
+    \k.(\k.k(\x.y))(\m.m(\k.(\k.k(\x.z))(\m.muk))k)
  
  
-Because the initial `\k` protects the entire transformed term, 
-we can't perform any reductions.  In order to see the computation
-unfold, we have to apply the transformed term to a trivial
-continuation, usually the identity function `I = \x.x`.
+Because the initial `\k` protects (i.e., takes scope over) the entire
+transformed term, we can't perform any reductions.  In order to watch
+the computation unfold, we have to apply the transformed term to a
+trivial continuation, usually the identity function `I = \x.x`.
  
  
-    [(\x.y)((\x.z)w)] I
-    \k.[\x.y](\m.m[(\x.z)w]k) I
-    [\x.y](\m.m[(\x.z)w] I)
-    (\k.k(\x.y))(\m.m[(\x.z)w] I)
-    (\x.y)[(\x.z)w] I
+    [(\x.y)((\x.z)u)] I =
+    (\k.[\x.y](\m.m[(\x.z)u]k)) I
+     *
+    [\x.y](\m.m[(\x.z)u] I) =
+    (\k.k(\x.y))(\m.m[(\x.z)u] I)
+     *           *
+    (\x.y)[(\x.z)u] I           --A--
+     *
      y I
  
  The application to `I` unlocks the leftmost functor.  Because that
      y I
  
  The application to `I` unlocks the leftmost functor.  Because that
-functor (`\x.y`) throws away its argument, we never need to expand the
-CPS transform of the argument.
+functor (`\x.y`) throws away its argument (consider the reduction in the
+line marked (A)), we never need to expand the
+CPS transform of the argument.  This means that we never bother to
+reduce redexes inside the argument.
  
  Compare with a call-by-value xform:
  
  
  Compare with a call-by-value xform:
  
-    <x> => \k.kx
-    <\aM> => \k.k(\a<M>)
-    <MN> => \k.<M>(\m.<N>(\n.mnk))
+    {x} = \k.kx
+    {\aM} = \k.k(\a{M})
+    {MN} = \k.{M}(\m.{N}(\n.mnk))
  
  This time the reduction unfolds in a different manner:
  
  
  This time the reduction unfolds in a different manner:
  
-    <(\x.y)((\x.z)w)> I
-    (\k.<\x.y>(\m.<(\x.z)w>(\n.mnk))) I
-    <\x.y>(\m.<(\x.z)w>(\n.mnI))
-    (\k.k(\x.<y>))(\m.<(\x.z)w>(\n.mnI))
-    <(\x.z)w>(\n.(\x.<y>)nI)
-    (\k.<\x.z>(\m.<w>(\n.mnk)))(\n.(\x.<y>)nI)
-    <\x.z>(\m.<w>(\n.mn(\n.(\x.<y>)nI)))
-    (\k.k(\x.<z>))(\m.<w>(\n.mn(\n.(\x.<y>)nI)))
-    <w>(\n.(\x.<z>)n(\n.(\x.<y>)nI))
-    (\k.kw)(\n.(\x.<z>)n(\n.(\x.<y>)nI))
-    (\x.<z>)w(\n.(\x.<y>)nI)
-    <z>(\n.(\x.<y>)nI)
-    (\k.kz)(\n.(\x.<y>)nI)
-    (\x.<y>)zI
-    <y>I
+    {(\x.y)((\x.z)u)} I =
+    (\k.{\x.y}(\m.{(\x.z)u}(\n.mnk))) I
+     *
+    {\x.y}(\m.{(\x.z)u}(\n.mnI)) =
+    (\k.k(\x.{y}))(\m.{(\x.z)u}(\n.mnI))
+     *             *
+    {(\x.z)u}(\n.(\x.{y})nI) =
+    (\k.{\x.z}(\m.{u}(\n.mnk)))(\n.(\x.{y})nI)
+     *
+    {\x.z}(\m.{u}(\n.mn(\n.(\x.{y})nI))) =
+    (\k.k(\x.{z}))(\m.{u}(\n.mn(\n.(\x.{y})nI)))
+     *             *
+    {u}(\n.(\x.{z})n(\n.(\x.{y})nI)) =
+    (\k.ku)(\n.(\x.{z})n(\n.(\x.{y})nI))
+     *      *
+    (\x.{z})u(\n.(\x.{y})nI)       --A--
+     *
+    {z}(\n.(\x.{y})nI) =
+    (\k.kz)(\n.(\x.{y})nI)
+     *      *
+    (\x.{y})zI
+     *
+    {y}I =
      (\k.ky)I
      (\k.ky)I
+     *
      I y
  
      I y
  
+In this case, the argument does get evaluated: consider the reduction
+in the line marked (A).
+
  Both xforms make the following guarantee: as long as redexes
  underneath a lambda are never evaluated, there will be at most one
  Both xforms make the following guarantee: as long as redexes
  underneath a lambda are never evaluated, there will be at most one
-reduction avaialble at any step in the evaluation.
+reduction available at any step in the evaluation.
  That is, all choice is removed from the evaluation process.
  
  That is, all choice is removed from the evaluation process.
  
-Questions and excercises:
+Now let's verify that the CBN CPS avoids the infinite reduction path
+discussed above (remember that `w = \x.xx`):
  
  
-1. Why is the CBN xform for variables `[x] = x' instead of something
+    [(\x.y)(ww)] I =
+    (\k.[\x.y](\m.m[ww]k)) I
+     *
+    [\x.y](\m.m[ww]I) =
+    (\k.k(\x.y))(\m.m[ww]I)
+     *             *
+    (\x.y)[ww]I
+     *
+    y I
+
+
+Questions and exercises:
+
+1. Prove that {(\x.y)(ww)} does not terminate.
+
+2. Why is the CBN xform for variables `[x] = x' instead of something
  involving kappas?  
  
  involving kappas?  
  
-2. Write an Ocaml function that takes a lambda term and returns a
-CPS-xformed lambda term.
+3. Write an Ocaml function that takes a lambda term and returns a
+CPS-xformed lambda term.  You can use the following data declaration:
  
      type form = Var of char | Abs of char * form | App of form * form;;
  
  
      type form = Var of char | Abs of char * form | App of form * form;;
  
-3. What happens (in terms of evaluation order) when the application
-rule for CBN CPS is changed to `[MN] = \k.[N](\n.[M]nk)`?  Likewise,
-What happens when the application rule for CBV CPS is changed to `<MN>
-= \k.[N](\n.[M](\m.mnk))'?
-
-4. What happens when the application rules for the CPS xforms are changed to
+4. The discussion above talks about the "leftmost" redex, or the
+"rightmost".  But these words apply accurately only in a special set
+of terms.  Characterize the order of evaluation for CBN (likewise, for
+CBV) more completely and carefully.
  
  
-    [MN] = \k.<M>(\m.m<N>k)
-    <MN> = \k.[M](\m.[N](\n.mnk))
+5. What happens (in terms of evaluation order) when the application
+rule for CBV CPS is changed to `{MN} = \k.{N}(\n.{M}(\m.mnk))`?
  
  
  Thinking through the types
  
  
  Thinking through the types
@@ -175,22 +208,22 @@ well-typed.  But what will the type of the transformed term be?
  
  The transformed terms all have the form `\k.blah`.  The rule for the
  CBN xform of a variable appears to be an exception, but instead of
  
  The transformed terms all have the form `\k.blah`.  The rule for the
  CBN xform of a variable appears to be an exception, but instead of
-writing `[x] => x`, we can write `[x] => \k.xk`, which is
+writing `[x] = x`, we can write `[x] = \k.xk`, which is
  eta-equivalent.  The `k`'s are continuations: functions from something
  eta-equivalent.  The `k`'s are continuations: functions from something
-to a result.  Let's use $sigma; as the result type.  The each `k` in
-the transform will be a function of type `&rho; --> &sigma;` for some
+to a result.  Let's use &sigma; as the result type.  The each `k` in
+the transform will be a function of type &rho; --> &sigma; for some
  choice of &rho;.
  
  We'll need an ancilliary function ': for any ground type a, a' = a;
  choice of &rho;.
  
  We'll need an ancilliary function ': for any ground type a, a' = a;
-for functional types a->b, (a->b)' = a' -> (b' -> o) -> o.
+for functional types a->b, (a->b)' = ((a' -> &sigma;) -> &sigma;) -> (b' -> &sigma;) -> &sigma;.
  
      Call by name transform
  
      Terms                            Types
  
  
      Call by name transform
  
      Terms                            Types
  
-    [x] => \k.xk                     [a] => (a'->o)->o
-    [\xM] => \k.k(\x[M])             [a->b] => ((a->b)'->o)->o
-    [MN] => \k.[M](\m.m[N]k)         [b] => (b'->o)->o
+    [x] = \k.xk                      [a] = (a'->o)->o
+    [\xM] = \k.k(\x[M])              [a->b] = ((a->b)'->o)->o
+    [MN] = \k.[M](\m.m[N]k)          [b] = (b'->o)->o
  
  Remember that types associate to the right.  Let's work through the
  application xform and make sure the types are consistent.  We'll have
  
  Remember that types associate to the right.  Let's work through the
  application xform and make sure the types are consistent.  We'll have
@@ -200,15 +233,18 @@ the following types:
      N:a
      MN:b 
      k:b'->o
      N:a
      MN:b 
      k:b'->o
-    [N]:a'
-    m:a'->(b'->o)->o
+    [N]:(a'->o)->o
+    m:((a'->o)->o)->(b'->o)->o
      m[N]:(b'->o)->o
      m[N]k:o 
      m[N]:(b'->o)->o
      m[N]k:o 
-    [M]:((a->b)'->o)->o = ((a'->(b'->o)->o)->o)->o
+    [M]:((a->b)'->o)->o = ((((a'->o)->o)->(b'->o)->o)->o)->o
      [M](\m.m[N]k):o
      [MN]:(b'->o)->o
  
      [M](\m.m[N]k):o
      [MN]:(b'->o)->o
  
-Note that even though the transform uses the same symbol for the
-translation of a variable, in general it will have a different type in
-the transformed term.
+Be aware that even though the transform uses the same symbol for the
+translation of a variable (i.e., `[x] = x`), in general the variable
+in the transformed term will have a different type than in the source
+term.
  
  
+Excercise: what should the function ' be for the CBV xform?  Hint: 
+see the Meyer and Wand abstract linked above for the answer.