added discussion of computation

[lambda.git] / topics / week3_what_is_computation.mdwn
diff --git a/topics/week3_what_is_computation.mdwn b/topics/week3_what_is_computation.mdwn

new file mode 100644 (file)

index 0000000..32a4572
--- /dev/null
+++ b/topics/week3_what_is_computation.mdwn
@@ -0,0 +1,166 @@
+#What is computation?#
+
+The folk notion of computation involves taking a complicated
+expression and replacing it with an equivalent simpler expression.
+
+    3 + 4 == 7
+
+This equation can be interpreted as expressing the thought that the
+<<<<<<< HEAD:topics/_week3_what_is_computation.mdwn
+complex expression `3 + 4` evaluates to `7`.  The evaluation of the
+expression computing a sum.  There is a clear sense in which the
+expression `7` is simpler than the expression `3 + 4`: `7` is
+syntactically simple, and `3 + 4` is syntactically complex.
+
+Now let's take this folk notion of computation, and put some pressure
+on it.
+=======
+complex expression `3 + 4` evaluates to `7`.  In this case, the
+evaluation of the expression involves computing a sum.  There is a
+clear sense in which the expression `7` is simpler than the expression
+`3 + 4`: `7` is syntactically simple, and `3 + 4` is syntactically
+complex.
+
+It's worth pausing a moment and wondering why we feel that replacing a
+complex expression like `3 + 4` with a simplex expression like `7`
+feels like we've accomplished something.  If they really are
+equivalent, why shouldn't we consider them to be equally valuable, or
+even to prefer the longer expression?  For instance, should we prefer
+2^9, or 512?  Likewise, in the realm of logic, why shold we ever
+prefer `B` to the conjunction of `A` with `A --> B`?
+
+The question to ask here is whether our intuitions about what counts
+as more evaluated always tracks simplicity of expression, or whether
+it tracks what is more useful to us in a given larger situation.
+
+But even deciding which expression ought to count as simpler is not
+always so clear.
+>>>>>>> working:topics/week3_what_is_computation.mdwn
+
+##Church arithmetic##
+
+In our system of computing with Church encodings, we have the
+following representations:
+
+    3 :=  \f z . f (f (f z))
+    4 :=  \f z . f (f (f (f z)))
+    7 :=  \f z . f (f (f (f (f (f (f z))))))
+
+[By the way, those of you who did the extra credit problem on homework
+2, solving the reverse function with Oleg's holes---do you see the
+resemblance?]
+
+The addition of 3 and 4, then, clearly needs to just tack 4's string
+of f's onto the end of 3's string of f's.  This suggests the following
+implementation of the arithmetic addition:
+
+    + := \lrfz.lf(rfz)
+
+    + 3 4 == (\lrfz.lf(rfz)) 3 4
+          ~~> \fz.3f(4fz)
+          ==  \fz.(\fz.f(f(fz))) f (4fz)
+          ~~> \fz.f(f(f(4fz)))
+          ==  \fz.f(f(f((\fz.f(f(f(fz))) f z))))
+          ~~> \fz.f(f(f(f(f(f(fz))))))
+          ==  7
+
+as desired.  Is there still a sense in which the encoded version of `3
++ 4` is simpler than the encoded version of `7`?  Well, yes: once the
+numerals `3` and `4` have been replaced with their encodings, the
+encoding of `3 + 4` contains more symbols than the encoding of `7`.
+
+But now consider multiplication:
+
+    * := \lrf.l(rf)
+    * 3 4 :=  (\lrf.l(rf)) 3 4
+          ~~> (\rf.3(rf)) 4
+          ~~> \f.3(4f)
+          ==  \f.(\fz.f(f(fz)))(4f)
+          ~~> \fz.(4f)((4f)((4f)z))
+          ==  \fz.((\fz.f(f(f(fz))))f) (((\fz.f(f(f(fz))))f) (((\fz.f(f(f(fz))))f) z))
+          ~~> \fz.((\z.f(f(f(fz)))) ((\z.f(f(f(fz)))) ((\z.f(f(f(fz))))z)))
+          ~~> \fz.((\z.f(f(f(fz)))) ((\z.f(f(f(fz)))) (f(f(f(fz))))))
+          ~~> \fz.((\z.f(f(f(fz)))) (f(f(f(f(f(f(f(fz)))))))))
+          ~~> \fz.f(f(f(f(f(f(f(f(f(f(f(fz)))))))))))
+          ==  12
+
+Is the final result simpler?  This time, the answer is not so straightfoward.
+Compare the starting expression with the final expression:
+
+        *           3             4
+        (\lrf.l(rf))(\fz.f(f(fz)))(\fz.f(f(f(fz))))
+    ~~> 12
+        (\fz.f(f(f(f(f(f(f(f(f(f(f(fz))))))))))))
+
+And if we choose different numbers, the result is even less clear:
+
+        *           3             6
+        (\lrf.l(rf))(\fz.f(f(fz)))(\fz.f(f(f(f(f(fz))))))
+    ~~> 18
+        (\fz.f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(fz))))))))))))))))))
+
+On the on hand, there are more symbols in the encoding of `18` than in
+the encoding of `* 3 6`.  On the other hand, there is more complexity
+(speaking pre-theoretically) in the encoding of `* 3 6`, since the
+encoding of `18` is just a uniform sequence of nested `f`'s.
+
+This example shows that computation can't be just simplicity as
+measured by the number of symbols in the representation.  There is
+still some sense in which the evaluated expression is simpler, but the
+right way to characterize "simpler" is elusive.
+
+One possibility is to define simpler in terms of irreversability.  The
+reduction rules of the lambda calculus define an asymmetric relation
+over terms: for any given redex, there is a unique reduced term (up to
+alphabetic variance).  But for any given term, there are many redexes
+that reduce to that term.
+
+    ((\x.xx)y) ~~> yy
+    ((\xx)yy) ~~> yy
+    (y((\xx)y)) ~~> yy
+    etc.
+
+<<<<<<< HEAD:topics/_week3_what_is_computation.mdwn
+In the arithmetic example, there is only one number that corresponds
+to the sum of 3 and 4 (namely, 7).  But there are many sums that add
+up to 7: 3+4, 4+3, 5+2, 2+5, 6+1, 1+6, etc.
+=======
+Likewise, in the arithmetic example, there is only one number that
+corresponds to the sum of 3 and 4 (namely, 7).  But there are many
+sums that add up to 7: 3+4, 4+3, 5+2, 2+5, 6+1, 1+6, etc.
+>>>>>>> working:topics/week3_what_is_computation.mdwn
+
+So the unevaluated expression contains information that is missing
+from the evaluated value: information about *how* that value was
+arrived at.  So this suggests the following way of thinking about what
+counts as evaluated:
+
+    Given two expressions such that one reduces to the other,
+    the more evaluated one is the one that contains less information.
+
+This definition is problematic, though, if we try to define the amount
+of information using, say, [[!wikipedia Komolgorov complexity]].
+
+Ultimately, we have to decide that the reduction rules determine what
+counts as evaluated.  If we're lucky, that will align well with our
+intuitive desires about what should count as simpler.  But we're not
+always lucky.  In particular, although beta reduction in general lines
+up well with our intuitive sense of simplification, there are
+pathological examples where the results do not align so well:
+
+    (\x.xx)(\x.xx) ~~> (\x.xx)(\x.xx) ~~> (\x.xx)(\x.xx)
+
+In this example, reduction returns the exact same lambda term.  There
+is no simplification at all.
+
+    (\x.xxx)(\x.xxx) ~~> ((\x.xxxx)(\x.xxxx)(\x.xxxx))
+
+Even worse, in this case, the "reduced" form is longer and more
+complex by any measure.
+
+We may have to settle for the idea that a well-chosen reduction system
+will characterize our intuitive notion of evaluation in most cases, or
+in some useful class of non-pathological cases.  
+
+These are some of the deeper issues to keep in mind as we discuss the
+ins and outs of reduction strategies.