topics/week3_what_is_computation.mdwn

   1 #What is computation?#
   2
   3 The folk notion of computation involves taking a complicated
   4 expression and replacing it with an equivalent simpler expression.
   5
   6     4 + 3 == 7
   7
   8 This equation can be interpreted as expressing the thought that the
   9 complex expression `4 + 3` evaluates to `7`.  In this case, the
  10 evaluation of the expression involves computing a sum.  There is a
  11 clear sense in which the expression `7` is simpler than the expression
  12 `4 + 3`: `7` is syntactically simple, and `4 + 3` is syntactically
  13 complex.
  14
  15 It's worth pausing a moment and wondering why we feel that replacing a
  16 complex expression like `4 + 3` with a simpler expression like `7`
  17 feels like we've accomplished something.  If they really are
  18 equivalent, why shouldn't we consider them to be equally valuable, or
  19 even to prefer the longer expression?  For instance, should we prefer
  20 `2^9`, or `512`?  Likewise, in the realm of logic, why should we ever
  21 prefer `B` to the conjunction of `A` with <code>A &sup; B</code>?
  22
  23 The question to ask here is whether our intuitions about what counts
  24 as more evaluated always tracks simplicity of expression, or whether
  25 it tracks what is more useful to us in a given larger situation.
  26
  27 But even deciding which expression ought to count as simpler is not
  28 always so clear.
  29
  30 ##Church arithmetic##
  31
  32 In our system of computing with Church encodings, we have the
  33 following representations:
  34
  35     3 ≡  \f z. f (f (f z))
  36     4 ≡  \f z. f (f (f (f z)))
  37     7 ≡  \f z. f (f (f (f (f (f (f z))))))
  38
  39 <!--
  40 By the way, those of you who solved the `reverse` function in homework 2 using the "holes" idea: do you see the resemblance?
  41 -->
  42
  43 The addition of `4` and `3`, then, clearly needs to just insert `3`'s string
  44 of `f`s in front of `4`'s string of `f`s.  That guides our
  45 implementation of addition:
  46
  47     add ≡ \l r. \f z. r f (l f z)
  48
  49     add 4 3 ≡ (\l r. \f z. r f (l f z)) 4 3
  50           ~~> \f z. 3 f (4 f z)
  51             ≡ \f z. (\f z. f (f (f z))) f (4 f z)
  52           ~~> \f z. f (f (f (4 f z)))
  53             ≡ \f z. f (f (f ((\f z. f (f (f (f z))) f z))))
  54           ~~> \f z. f (f (f (f (f (f (f z))))))
  55             ≡ 7
  56
  57 as desired.  Is there still a sense in which the encoded version of `add 4 3`
  58 is simpler than the encoded version of `7`?  Well, yes: once the
  59 numerals `4` and `3` have been replaced with their encodings, the
  60 encoding of `add 4 3` contains more symbols than the encoding of `7`.
  61
  62 But now consider multiplication:
  63
  64     mul ≡ \l r. \f z. r (l f) z
  65
  66     mul 4 3 ≡ (\l r. \f z. r (l f) z) 4 3
  67           ~~> \f z. 3 (4 f) z
  68             ≡ \f z. (\f z. f (f (f z))) (4 f) z
  69           ~~> \f z. (4 f) ((4 f) (4 f z))
  70             ≡ \f z. ((\f z. f (f (f (f z)))) f) (((\f z. f (f (f (f z)))) f) ((\f z. f (f (f (f z)))) f z))
  71           ~~> \f z. (\z. f (f (f (f z)))) ((\z. f (f (f (f z)))) (f (f (f (f z)))))
  72           ~~> \f z. (\z. f (f (f (f z)))) (f (f (f (f (f (f (f (f z))))))))
  73           ~~> \f z. f (f (f (f (f (f (f (f (f (f (f (f z)))))))))))
  74             ≡ 12
  75
  76 Is the final result simpler?  This time, the answer is not so straightforward.
  77 Compare the starting expression with the final expression:
  78
  79         mul                     4                       3
  80         (\l r. \f z. r (l f) z) (\f z. f (f (f (f z)))) (\f z. f (f (f z))))
  81     ~~> 12
  82         (\f z. f (f (f (f (f (f (f (f (f (f (f (f z))))))))))))
  83
  84 And if we choose different numbers, the result is even less clear:
  85
  86         mul                     6                                3
  87         (\l r. \f z. r (l f) z) (\f z. f ( f (f (f (f (f z)))))) (\f z. f (f (f z))))
  88     ~~> 18
  89         (\f z. f (f (f (f (f (f (f (f (f (f (f (f (f (f (f (f (f (f z))))))))))))))))))
  90
  91 On the on hand, there are more symbols in the encoding of `18` than in
  92 the encoding of `mul 6 3`.  On the other hand, there is more complexity
  93 (speaking pre-theoretically) in the encoding of `mul 6 3`, since the
  94 encoding of `18` is just a uniform sequence of nested `f`s.
  95
  96 This example shows that computation can't be just simplicity as
  97 measured by the number of symbols in the representation.  There is
  98 still some sense in which the evaluated expression is simpler, but the
  99 right way to characterize "simpler" is elusive.
 100
 101 One possibility is to define simpler in terms of irreversability.  The
 102 reduction rules of the lambda calculus define an non-symmetric relation
 103 over terms: for any given redex, there is a unique reduced term (up to
 104 alphabetic variance).  But for any given term, there will be many redexes
 105 that reduce to that term.
 106
 107     ((\x. x x) y) ~~> y y
 108     ((\x x) y y) ~~> y y
 109     (y ((\x x) y)) ~~> y y
 110     etc.
 111
 112 Likewise, in the arithmetic example, there is only one number that
 113 corresponds to the sum of `4` and `3` (namely, `7`).  But there are many
 114 ways to add numbers to get `7`: `4+3`, `3+4`, `5+2`, `2+5`, `6+1`, `1+6`, `7+0` and `0+7`.
 115 And that's only looking at sums.
 116
 117 So the unevaluated expression contains information that is missing
 118 from the evaluated value: information about *how* that value was
 119 arrived at.  So this suggests the following way of thinking about what
 120 counts as evaluated:
 121
 122 > Given two expressions such that one reduces to the other,
 123 the more evaluated one is the one that contains less information.
 124
 125 This definition is problematic, though, if we try to define the amount
 126 of information using, say, [[!wikipedia Komolgorov complexity]].
 127
 128 Ultimately, we have to decide that the reduction rules determine what
 129 counts as evaluated.  If we're lucky, that will align well with our
 130 intuitive desires about what should count as simpler.  But we're not
 131 always lucky.  In particular, although beta reduction in general lines
 132 up well with our intuitive sense of simplification, there are
 133 pathological examples where the results do not align so well:
 134
 135     (\x. x x) (\x. x x) ~~> (\x. x x) (\x. x x) ~~> (\x. x x) (\x. x x) ~~> ...
 136
 137 In this example, reduction returns the exact same lambda term.  There
 138 is no simplification at all. (As we mentioned in class, the term `(\x. x x)` is often referred to in these discussions as (little) <b>&omega;</b> or "omega", or sometimes **M**; and its self-application <code>&omega; &omega;</code>, displayed above, is called (big) <b>&Omega;</b> or "Omega".)
 139
 140 Even worse, consider this term:
 141
 142     (\x. x x x) (\x. x x x) ~~> (\x. x x x) (\x. x x x) (\x. x x x) ~~> ...
 143
 144 Here, the "reduced" form is longer and more
 145 complex by any reasonable measure.
 146
 147 We may have to settle for the idea that a well-chosen reduction system
 148 will characterize our intuitive notion of evaluation in most cases, or
 149 in some useful class of non-pathological cases.
 150
 151 These are some of the deeper issues to keep in mind as we discuss the
 152 ins and outs of reduction strategies.