switched the map from lambda to CL back to Barendregt's

diff --git a/topics/week3_combinatory_logic.mdwn b/topics/week3_combinatory_logic.mdwn

index d24e6c1..d54f9f8 100644
--- a/topics/week3_combinatory_logic.mdwn
+++ b/topics/week3_combinatory_logic.mdwn
@@ -39,11 +39,12 @@ over the first two arguments.
  >   **ω** (that is, lower-case omega) is defined to be: `\x. x x`. Sometimes this combinator is called **M**. It and `W` both duplicate arguments, just in different ways. <!-- L is \uv.u(vv) -->
  
  
-It's possible to build a logical system equally powerful as the Lambda Calculus
-(and readily intertranslatable with it) using just combinators, considered as
-*primitive operations*. (That is, we refrain from defining them in terms of lambda expressions, as we did above.)
-Such a language doesn't have any variables in it: not just
-no free variables, but no variables (or "bound positions") at all.
+It's possible to build a logical system as powerful as the Lambda
+Calculus (and readily intertranslatable with it) using just
+combinators, considered as *primitive operations*. (That is, we
+refrain from defining them in terms of lambda expressions, as we did
+above.)  Such a language doesn't have any variables in it: not just no
+free variables, but no variables (or "bound positions") at all.
  
  One can do that with a very spare set of basic combinators. These days
  the standard base is just three combinators: `S`, `K`, and `I`.
@@ -93,6 +94,12 @@ Instead of defining combinators in terms of antecedently understood lambda terms
  
      IX ~~> X
  
+That is, assume that `X` stands in for any expression.  Then if `X`
+happens to be the expression `I`, this schematic pattern guarantees
+that `II ~~> I`; if `X` happens to be the expression `SK`, the pattern
+guarantees that `I(SK) ~~> SK`; and so on.  In other words, `X` here
+is a metavariable over expressions.
+
  Thinking of this as a reduction rule, we can perform the following computation:
  
      II(IX) ~~> I(IX) ~~> IX ~~> X
@@ -130,7 +137,10 @@ Logic are considerably more simple than, say, beta reduction.  Also, since
  there are no variables in Combinatory Logic, there is no need to worry
  about variables colliding when we substitute.
  
-Combinatory Logic is what you have when you choose a set of combinators and regulate their behavior with a set of reduction rules. As we said, the most common system uses `S`, `K`, and `I` as defined here.
+Combinatory Logic is what you have when you choose a set of
+combinators and regulate their behavior with a set of reduction
+rules. As we said, the most common system uses `S`, `K`, and `I` as
+defined here.
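
To make "a set of combinators plus a set of reduction rules" concrete, here is a minimal Python sketch of a reducer for the `S`, `K`, `I` system. The representation (strings for atoms, pairs for application) and the names `app`, `step`, and `normalize` are our own illustrative choices, not part of these notes. The sketch also assumes that reduction may apply to embedded expressions, and it always picks the leftmost redex.

```python
# Terms: a string is an atom ("S", "K", "I", or any other symbol standing in
# for an arbitrary expression); a pair (f, x) is the application of f to x.
# This encoding is an assumption made for illustration only.

def app(*terms):
    """Left-associative application: app(a, b, c) builds ((a b) c)."""
    result = terms[0]
    for t in terms[1:]:
        result = (result, t)
    return result

def step(term):
    """Do one leftmost reduction step; return None if no rule applies."""
    if isinstance(term, str):
        return None
    f, x = term
    if f == "I":                                   # IX ~~> X
        return x
    if isinstance(f, tuple):
        g, y = f
        if g == "K":                               # KXY ~~> X (term is ((K y) x))
            return y
        if isinstance(g, tuple) and g[0] == "S":   # SXYZ ~~> XZ(YZ)
            a = g[1]
            return ((a, x), (y, x))
    # No rule applies at the head: look inside the function part, then the argument.
    reduced = step(f)
    if reduced is not None:
        return (reduced, x)
    reduced = step(x)
    if reduced is not None:
        return (f, reduced)
    return None

def normalize(term, limit=1000):
    """Reduce until no rule applies, giving up after `limit` steps."""
    for _ in range(limit):
        nxt = step(term)
        if nxt is None:
            return term
        term = nxt
    raise RuntimeError("no normal form found within the step limit")
```

For instance, `normalize(app("I", "I", ("I", "X")))` walks through `II(IX) ~~> I(IX) ~~> IX ~~> X` and returns `"X"`. The step limit is there because some CL terms have no normal form.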
  
  ###The equivalence of the untyped Lambda Calculus and Combinatory Logic###
  
@@ -165,32 +175,75 @@ used to establish a correspondence between two natural language grammars, one
  of which is based on lambda-like abstraction, the other of which is based on
  Combinatory Logic-like manipulations.
  
-Assume that for any lambda term T, [T] is the equivalent Combinatory Logic term.  Then we can define the [.] mapping as follows:
+In order to establish the correspondence, we need to get a bit more
+official about what counts as an expression in CL.  We'll endow CL
+with an infinite stock of variable symbols, just like the lambda
+calculus, including `x`, `y`, and `z`.  In addition, `S`, `K`, and `I`
+are expressions in CL.  Finally, `(XY)` is in CL for any CL
+expressions `X` and `Y`.  So examples of CL expressions include
+`x`, `(xy)`, `Sx`, `SK`, `(x(SK))`, `(K(IS))`, and so on.  When we 
+omit parentheses, the assumption will be left associativity, so that
+`XYZ == ((XY)Z)`.
+
+It may seem weird to allow variables in CL.  This is necessary
+because we're trying to show that every lambda term can
+be translated into an equivalent CL term.  Since some lambda terms
+contain free variables, we need to provide a translation for free
+variables.  As you might expect, it will turn out that whenever the
+lambda term in question contains no free variables (i.e., is a
+combinator), its translation in CL will also contain no variables.
+
+Assume that for any lambda term T, [T] is the equivalent Combinatory
+Logic term.  Then we can define the [.] mapping as follows:
+
+     1. [a]              a
+     2. [\aX]            @a[X]
+     3. [(XY)]           ([X][Y])
+
+     4. @aa              I
+     5. @aX              KX           if a is not in X
+     6. @a(XY)           S(@aX)(@aY)
+
+Think of `@aX` as a pseudo-lambda abstract.
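
The six rules can be sketched directly as two mutually dependent Python functions. This is only an illustration under assumptions of our own: lambda terms are encoded as strings (variables), `("lam", a, body)` and `("app", X, Y)` tuples; variables are lower-case so they never collide with `S`, `K`, `I`; and the names `translate`, `abstract`, `occurs`, and `show` are hypothetical.

```python
# Assumed encoding (not from the notes): a variable or combinator is a string,
# an abstract \aX is ("lam", a, X), an application (XY) is ("app", X, Y).

def occurs(a, term):
    """True if the symbol a occurs anywhere in the lambda-free term."""
    if isinstance(term, str):
        return term == a
    _, x, y = term
    return occurs(a, x) or occurs(a, y)

def abstract(a, term):
    """Rules 4-6: eliminate the pseudo-lambda @a from a lambda-free term."""
    if term == a:                                # 4. @aa    = I
        return "I"
    if not occurs(a, term):                      # 5. @aX    = KX, a not in X
        return ("app", "K", term)
    _, x, y = term                               # 6. @a(XY) = S(@aX)(@aY)
    return ("app", ("app", "S", abstract(a, x)), abstract(a, y))

def translate(term):
    """Rules 1-3: the [.] mapping from lambda terms to CL terms."""
    if isinstance(term, str):                    # 1. [a]    = a
        return term
    if term[0] == "lam":                         # 2. [\aX]  = @a[X]
        _, a, body = term
        return abstract(a, translate(body))
    _, x, y = term                               # 3. [(XY)] = ([X][Y])
    return ("app", translate(x), translate(y))

def show(term):
    """Print a CL term, omitting parentheses under left associativity."""
    if isinstance(term, str):
        return term
    _, x, y = term
    right = show(y) if isinstance(y, str) else "(" + show(y) + ")"
    return show(x) + right
```

With this encoding, `show(translate(("lam", "t", ("lam", "f", "f"))))` gives `KI`. Because `translate` handles the body before calling `abstract`, the `@` never has to look inside a lambda, which is why `occurs` only needs to handle variables and applications.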
  
-     1. [a]               a
-     2. [(M N)]           ([M][N])
-     3. [\a.a]            I
-     4. [\a.M]            K[M]                 when a does not occur free in M
-     5. [\a.(M N)]        S[\a.M][\a.N]
-     6. [\a\b.M]          [\a[\b.M]]
+It's easy to understand these rules based on what `S`, `K` and `I` do.
  
-If the recursive unpacking of these rules ever direct you to "translate" an `S` or a `K` or an `I`, introduced at an earlier stage of translation, those symbols translate themselves.
+Rule (1) says that variables are mapped to themselves. If the original
+lambda expression had no free variables in it, then any such
+translations will only be temporary. The variable will later get
+eliminated by the application of other rules.
  
-It's easy to understand these rules based on what `S`, `K` and `I` do.
+Rule (2) says that the way to translate an abstract is to
+first translate the body (i.e., `[X]`), and then prefix a kind of
+temporary pseudo-lambda built from `@` and the original variable.
  
-The first rule says that variables are mapped to themselves. If the original lambda expression had no free variables in it, then any such translations will only be temporary. The variable will later get eliminated by the application of other rules. (If the original lambda term *does* have free variables in it, so too will the final Combinatory Logic translation.  Feel free to worry about this, though you should be confident that it makes sense.)
+Rule (3) says that the translation of an application of `X` to `Y` is
+the application of the translation of `X` to the translation of `Y`.
  
-The second rule says that the way to translate an application is to translate the first element and the second element separately.
+Rules (4) through (6) tell us how to eliminate all the `@`'s.
  
-The third rule should be obvious.
+In rule (4), if we have `@aa`, we need a CL expression that behaves
+like the lambda term `\aa`.  Obviously, `I` is the right choice here.
  
-The fourth rule should also be fairly self-evident: since what a lambda term such as `\x. y` does it throw away its first argument and return `y`, that's exactly what the Combinatory Logic translation should do.  And indeed, `K y` is a function that throws away its argument and returns `y`.
+In rule (5), if we're binding into an expression that doesn't contain
+any variables that need binding, then we need a CL term that behaves
+the same as `\aX` would if `X` didn't contain `a` as a free variable.
+Well, how does `\aX` behave?  When `\aX` occurs in the head position
+of a redex, then no matter what argument it occurs with, it throws
+away its argument and returns `X`.  In other words, `\aX` is a
+constant function returning `X`, which is exactly the behavior
+we get by prefixing `K`.
  
-The fifth rule deals with an abstract whose body is an application: the `S` combinator takes its next argument (which will fill the role of the original variable a) and copies it, feeding one copy to the translation of `\a. M`, and the other copy to the translation of `\a. N`.  This ensures that any free occurrences of a inside `M` or `N` will end up taking on the appropriate value.
+The easiest way to grasp rule (6) is to consider the following claim:
  
-Finally, the last rule says that if the body of an abstract is itself an abstract, translate the inner abstract first, and then do the outermost.  (Since the translation of `[\b. M]` will have eliminated any inner lambdas, we can be sure that we won't end up applying rule 6 again in an infinite loop.)
+    \a(XY) <~~> S(\aX)(\aY) 
  
-Persuade yourself that if the original lambda term contains no free variables --- i.e., is a combinator --- then the translation will consist only of `S`, `K`, and `I` (plus parentheses).
+To prove it to yourself, just substitute `(\xyz.xz(yz))` in for `S`
+and reduce.
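
In case it helps, here is one way the reduction might go, written out in LaTeX (assuming `amsmath` and `amssymb` for `align*` and `\rightsquigarrow`). The substitution notation `X[a := z]` is our own shorthand for "`X` with `z` in place of free `a`", and we assume the bound variables of `S` have been renamed so that `z` occurs in neither `X` nor `Y`.

```latex
\begin{align*}
S(\lambda a.X)(\lambda a.Y)
  &\equiv (\lambda x y z.\, x z (y z))(\lambda a.X)(\lambda a.Y) \\
  &\rightsquigarrow (\lambda y z.\, (\lambda a.X) z (y z))(\lambda a.Y) \\
  &\rightsquigarrow \lambda z.\, (\lambda a.X) z ((\lambda a.Y) z) \\
  &\rightsquigarrow \lambda z.\, X[a := z]\,((\lambda a.Y) z) \\
  &\rightsquigarrow \lambda z.\, X[a := z]\,(Y[a := z]) \\
  &\equiv_{\alpha} \lambda a.\,(X Y)
\end{align*}
```

The last line is just a renaming of the bound variable `z` back to `a`, so the two sides of the claim are convertible.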
+
+Persuade yourself that if the original lambda term contains no free
+variables --- i.e., is a combinator --- then the translation will
+consist only of `S`, `K`, and `I` (plus parentheses).
  
  (Fussy note: this translation algorithm builds intermediate expressions that combine lambdas with primitive combinators.  For instance, the translation of our boolean `false` (`\x y. y`) is `[\x [\y. y]] = [\x. I] = KI`.  In the intermediate stage, we have `\x. I`, which has a combinator in the body of a lambda abstract.  It's possible to avoid this if you want to,  but it takes some careful thought.  See, e.g., Barendregt 1984, page 156.)
  
@@ -205,25 +258,45 @@ strengthened with axioms beyond anything we've here described in order to make
  convertible.  But then, we've been a bit cavalier about giving the full set of
  reduction rules for the Lambda Calculus in a similar way.  <!-- FIXME -->
  
-For instance, one
-issue we mentioned in the notes on [[Reduction Strategies|week3_reduction_strategies]] is whether reduction rules (in either the Lambda Calculus or Combinatory Logic) apply to embedded expressions.  Often, we do want that to happen, but
-making it happen requires adding explicit axioms.
-
+For instance, one issue we mentioned in the notes on [[Reduction
+Strategies|week3_reduction_strategies]] is whether reduction rules (in
+either the Lambda Calculus or Combinatory Logic) apply to embedded
+expressions.  Often, we do want that to happen, but making it happen
+requires adding explicit axioms.
+
+Let's see the translation rules in action.  We'll start by translating
+the combinator we use to represent false:
+
+       [\t\ff] 
+    == @t[\ff]      rule 2
+    == @t(@ff)      rule 2
+    == @tI          rule 4
+    == KI           rule 5
+   
  Let's check that the translation of the `false` boolean behaves as expected by feeding it two arbitrary arguments:
  
      KIXY ~~> IY ~~> Y
  
  Throws away the first argument, returns the second argument---yep, it works.
  
-Here's a more elaborate example of the translation.  Let's say we want to establish that combinators can reverse order, so we use the **T** combinator (`\x y. y x`):
-
-    [\x y. y x] =
-    [\x [\y. y x]] =
-    [\x. S [\y. y] [\y. x]] = 
-    [\x. (SI) (K x)] =
-    S [\x. SI] [\x. K x] =
-    S (K(SI)) (S [\x. K] [\x. x]) =
-    S (K(SI)) (S(KK)I)
+Here's a more elaborate example of the translation.  Let's say we want
+to establish that combinators can reverse order, so we set out to
+translate the **T** combinator (`\x y. y x`):
+
+       [\x\y(yx)]
+    == @x[\y(yx)]
+    == @x(@y[(yx)])
+    == @x(@y([y][x]))
+    == @x(@y(yx))
+    == @x(S(@yy)(@yx))
+    == @x(SI(@yx))
+    == @x(SI(Kx))
+    == S (@x(SI)) (@x(Kx))
+    == S (K(SI)) (S (@xK) (@xx))
+    == S (K(SI)) (S (KK) I)
+
+By now, you should realize that all that rules (1) through (3) do is
+sweep through the lambda term turning lambdas into @'s.
  
  We can test this translation by seeing if it behaves like the original lambda term does.
  The original lambda term lifts its first argument (think of it as reversing the order of its two arguments):