topics/week2_encodings.mdwn

   1 # Encoding Booleans, Tuples, Lists, and Numbers #
   2
   3 The Lambda Calculus can represent any computable function?
   4
   5 We need to do some work to show how to represent some of the functions
   6 we've become acquainted with.
   7
   8
   9 ## Booleans ##
  10
  11 We'll start with the `if ... then
  12 ... else...` construction we saw last week:
  13
  14     if M then N else L
  15
  16 For a boolean-valued expression `M`, the displayed expression should evaluate to whatever `N` does
  17 if `M` evaluates to `'true`, and to whatever `L` does if `M` evaluates to `'false`.
  18 In order to encode or simulate such an `if` clause in the Lambda Calculus, we
  19 need to settle on a way to represent `'true` and `'false`.
  20
  21 We could simply add the constants `'true` and `'false` to the
  22 language, since it does make sense to extend the Lambda Calculus by adding constants.
  23 However, that would also require complicating the
  24 interpretation of the language; at the very least, we would need more
  25 than just beta-reduction as our engine of computation.  In the spirit
  26 of computational minimalism, let's see how far we can get with the
  27 "pure" lambda calculus, without any special constants.
  28
  29 Let's get rid of the parts of the `if` statement that are
  30 just syntactic window-dressing.  That is, let's get rid of the
  31 `if`, the `then`, and the `else`:
  32
  33     M  N  L
  34
  35 Recall that our convention is that values associate left to right, so
  36 this series of terms would be evaluated as:
  37
  38     ((M N) L)
  39
  40 If this expression is supposed to have the meaning of `if M then N
  41 else L`, then we need to find a value for `'true` such that when it is
  42 substituted in place of `M`, the expression evaluates to `N`.  The
  43 function we're looking for should take two arguments: first `N`, then `L`,
  44 and it should throw away `L` and return `N`.
  45
  46 We've already seen such a function.  We called it **K**: `(\x y. x)`.
  47 Let's test:
  48
  49     ((K N) L) == (((\x y. x) N) L) ~~> ((\y. N) L) ~~> N
  50
  51 Sucess! In the same spirit, `'false` could be **K I**, which reduces to `(\y x. x)` (also written as `(\y. (\x x))`):
  52
  53     (((K I) N) L) == ((((\x y. x) (\x x)) N) L)
  54                      ~~> (((\y. (\x x)) N) L)
  55                      ~~> ((\x x) L)
  56                      ~~> L
  57
  58 So have seen our first major encoding in the Lambda Calculus:
  59 "true" is represented by **K**, and "false" is represented by **K I**.
  60 We'll be building up a lot of representations in the weeks to come,
  61 and they will all maintain the discipline that if a expression is
  62 meant to be interpreted as a truth value (i.e. as a Boolean), it will
  63 be equivalent to (convertible with) **K** or **K I**.
  64
  65 In class, we explained how "and" could be thought of as the function, here written
  66 in Kapulet syntax:
  67
  68     lambda p q. if p then q else false
  69
  70 or:
  71
  72     lambda p q. if p then q else p
  73
  74 Given that we know how to express `if ... then ...` in terms of our encoded Booleans,
  75 we can represent this easily in the Lambda Calculus as either:
  76
  77     \p q. p q (\y x. x)
  78
  79 or:
  80
  81     \p q. p q p
  82
  83 Typically there will be more than one way to encode a given function into the Lambda Calculus, even holding fixed
  84 all your other decisions about how to encode other functions. In this case, we can use either representation
  85 because we're assuming that the `p` only gets bound to arguments that are either `\x y. x` or `\y x. x`. We don't make any
  86 promises about what will happen if our encodings are abused, that is, supplied with arguments they weren't designed to accept.
  87 If `p` is bound to `\x y. x`, then the result of both of the above displayed expressions will be whatever `q` is bound to.
  88 If on the other hand, `p` is bound to `\y x. x`, then the two displayed expressions will return the same result, namely that function.
  89
  90 Despite this, the two displayed expressions are two different lambda terms, and
  91 are not convertible. It's only within the frame of our assumptions about the
  92 restricted arguments we're thinking of supplying them that they behave exactly
  93 alike.
  94
  95 You can try out these expressions in the [[lambda evaluator|code/lambda evaluator]]:
  96
  97     let true = \x y. x in
  98     let false = \y x. x in
  99     let and = \p q. p q p in
 100     and true false
 101
 102 will reduce or "normalize" to `\y x. x`, or false, just as you'd predict. The `let true =` stuff isn't officially part of the Lambda Calculus language. It's just a convenient shorthand that lets you use the lambda evaluator more easily. Behind the scenes, the displayed expression above gets translated to:
 103
 104     (\true.
 105       (\false.
 106         (\and. and true false)))
 107         (\x y. x) ; this gets bound to variable "true"
 108         (\y x. x) ; this gets bound to variable "false"
 109         (\p q. p q p) ; this gets bound to variable "and"
 110
 111 We expect you'll agree that the first is easier to write and to read.
 112
 113 In Scheme (Racket), all of these notions can be defined like this:
 114
 115     (define true  (lambda (x) (lambda (y) x)))
 116     (define false (lambda (y) (lambda (x) x)))
 117     (define lambda-and (lambda (p) (lambda (q) ((p q) p))))
 118
 119 and then used like this:
 120
 121     ((lambda-and true) false)
 122
 123 which will evaluate to a function, that happens to be the same function `false` is bound to. Most Scheme interpreters like Racket will helpfully display the function with the name, if any, that was first defined to be bound to it. So we'll see the result as:
 124
 125     #<procedure:false>
 126
 127 The funny calling pattern, where we write `((lambda-and true) false)` instead of just `(lambda-and true false)`, is because that's how you have to write curried functions in Scheme. Similarly for why we have `(lambda (x) (lambda (y) x))` instead of just `(lambda (x y) x)`.
 128
 129 It's possible to do the next few weeks of assignment without using a Scheme interpreter, however
 130 we do recommend you [get Scheme installed on your
 131 computer](/how_to_get_the_programming_languages_running_on_your_computer), and
 132 [get started learning Scheme](/learning_scheme). It will help you understand the ideas better to experiment with it.
 133
 134 There's also a (slow, bare-bones, but perfectly adequate) version of Scheme available for online use at <http://tryscheme.sourceforge.net/>. (Unfortunately, though, that Scheme implementation will only display the result of `((lambda-and true) false)` as `#<procedure lambda>`. You won't be able to see if it's the function assigned to `true` or the function assigned to `false`. You'd have to feed that result some more arguments to see how it behaved.)
 135
 136 You should also be experimenting with this site's [[lambda evaluator|code/lambda evaluator]].
 137
 138
 139 ## Tuples ##
 140
 141 In class, we also showed you how to encode a tuple in the Lambda Calculus. We did it with an ordered triple, but the strategy generalizes in a straightforward way. (Some authors just use this strategy to define *pairs*, then define triples as pairs whose second member is another pair, and so on. Yech. If you keep a firm grip on your wits, that can also be made to work, but it's extremely likely that people who code in that way are going to lose their grip at some point and get themselves in a corner where they'll regret having made that decision about how to encode triples. And they will be forced to add further complexities at later points, that they're probably not anticipating now. The strategy presented here is as elegant as it first looks, and will help you program more hygienically even when your attention lapses.)
 142
 143 Our proposal was to define the triple `(a, b, c)` as:
 144
 145     \f. f a b c
 146
 147 To extract the first element of this, you'd write:
 148
 149     (\f. f a b c) fst_of_three
 150
 151 where `fst_of_three` is the function `\x y z. x`:
 152
 153     (\f. f a b c) (\x y z. x) ~~>
 154     (\x y z. x) a b c ~~>
 155     (\y z. a) b c ~~>
 156     (\z. a) c ~~>
 157     a
 158
 159 Here are the corresponding definitions in Scheme (Racket):
 160
 161     (define make-triple (lambda (a) (lambda (b) (lambda (c)
 162                           (lambda (f) (((f a) b) c))))))
 163
 164     (define fst_of_three (lambda (x) (lambda (y) (lambda (z) x))))
 165     (define snd_of_three (lambda (x) (lambda (y) (lambda (z) y))))
 166     (define trd_of_three (lambda (x) (lambda (y) (lambda (z) z))))
 167
 168 Then:
 169
 170     (define p (((make-triple 10) 20) 30))
 171     (p fst_of_three)  ; will evaluate to 10
 172     (p snd_of_three)  ; will evaluate to 20
 173
 174 If you're puzzled by having the triple to the left and the function that "uses" or "consumes" or operates on it to the right, think about why it's being done this way: the triple is a package that will be used in coordination with some function for operating on its elements. We don't know in advance what that function will be. And it's not obvious how to make the triple be some structure that the function can "look inside" and extract the elements from. We're still trying to *figure out* how to define such structures! But what we can do is make *the triple take the function as an argument*, and return *the result of* operating on its elements with that function. In other words, the triple is a higher-order function: a function that expects another function as an argument.
 175
 176 (Consider the similarities between this definition of a triple and a generalized quantifier. This is in fact our first taste of "continuations" in the course, which are a systematic pattern for inverting the naive order of who-is-the-argument? and who-is-the-operator?)
 177
 178 If you really want to, you can disguise what's going on like this:
 179
 180     (define lifted-fst_of_three (lambda (p) (p fst_of_three)))
 181
 182 Then you could say:
 183
 184     (lifted-fst_of_three p)
 185
 186 instead of:
 187
 188     (p fst_of_three)
 189
 190 Of course, the last is still what's happening under the hood.
 191
 192 (Remark: `(lifted-f (((make-triple 10) 20) 30))` stands to `((((make-triple 10) 20) 30) f)` as
 193 `((((make-triple 10) 20) 30) f)` stands to `(((f 10) 20) 30)`.)
 194
 195 ## Lists ##
 196
 197 There are multiple ways to encode lists, and also multiple ways to encode numbers. We are going to start with what we think are the most natural and elegant encodings. Historically these were the first encodings of numbers but not of lists.
 198
 199 In seminar we discussed two further functions for working with lists or sequences. Reverting to Kapulet syntax, these functions work like this:
 200
 201     fold_right (f, z) [10, 20, 30]
 202
 203 That will evaluate to whatever this does:
 204
 205     f (10, f (20, f (30, z)))
 206
 207 For example, if we let `f` be `(+)` and `z` be `0`, then it will be `10 + (20 + (30 + 0))` or `60`. Another example, if we let `f` be `(&)` and `z` be `[]`, then `fold_right ((&), []) [10, 20, 30]` will be `10 & (20 & (30 & []))` or `[10, 20, 30]`, the same sequence we began with.
 208
 209 The other function works like this:
 210
 211     fold_left (f, z) [10, 20, 30]
 212
 213 That will evaluate to whatever this does:
 214
 215     f (f (f (z, 10), 20), 30)
 216
 217 With a commutative operator like `(+)`, it makes no difference whether you say `fold_right ((+), z) xs` or `fold_left ((+), z) xs`. But with other operators it will make a difference. We can't say `fold_left ((&), []) [10, 20, 30]`, since that would start by trying to evaluate `[] & 10`, which would crash. But we could do this:
 218
 219     let
 220       flipped_cons match lambda (xs, x). x & xs  # also expressible as `uncurried_flip (&)`
 221     in fold_left (flipped_cons, []) [10, 20, 30]
 222
 223 and that would evaluate to `flipped_cons (flipped_cons (flipped_cons ([], 10), 20), 30)`, in other words, to `30 & (20 & (10 & []))`, or `[30, 20, 10]`. So this reverses the sequence we began with.
 224
 225 In class we considered how to express other list operations in terms of these. For example, we saw that we could define `length xs` as:
 226
 227     fold_right ((lambda (_, z). 1 + z), 0) xs
 228
 229 And we could define `map g xs` as:
 230
 231     fold_right ((lambda (x, zs). g x & zs), []) xs
 232
 233 Here is [a nice website](http://stevekrouse.github.io/hs.js) we found that lets you evaluate these things step by step. Just click on the subexpressions to evaluate them.
 234
 235 We also saw in seminar how to define `fold_right`. We could do this in Kapulet like this:
 236
 237     letrec
 238       fold_right (f, z) xs = case xs of
 239                                [] then z;
 240                                y & ys then f (y, fold_right (f, z) ys)
 241                              end
 242     in fold_right
 243
 244 (In some presentations you may see this expecting `f` to be a curried function `lambda y z. ...` rather than the uncurried `lambda (y, z). ...` that I'm expecting here. We will be shifting to the curried form ourselves momentarily.)
 245
 246 We suggested in class that these functions were very powerful, and could be deployed to do most everything you might want to do with a list. Given how our strategy for encoding booleans turned out, this ought to suggest to you the idea that *folding is what we fundamentally do* with lists, just as *conditional branching is what we fundamentally do* with booleans. So we can try to encode lists in terms of lambda expressions that will let us perform folds on them. We could try to do this with either right-folds or left-folds. Either is viable. Some things are more natural if you use right-folds, though, so let's do that.
 247
 248 What should the encoding look like? We don't know *what* function and *what* starting value someone might want to fold over our list!
 249
 250 Wait, does that remind you of anything?
 251
 252 Good. I knew it would.
 253
 254 Indeed, we'll make our encoded lists consist of higher-order *functions* that take the `f` and the starting value `z` to be folded *as arguments*. So the list `[a, b, c]` should look something like this:
 255
 256     \f z. SOMETHING
 257
 258 Now, what should the `SOMETHING` be? Well, when we supply an `f` and a `z` we should get back the right-fold of those over `[a, b, c]`, so the answer should evidently be:
 259
 260     \f z. f a (f b (f c z))
 261
 262 Here we work with curried functions, because that's how the Lambda Calculus does things. You wouldn't want to build up a tuple using the mechanisms described above, and then supply `f` as an argument to that tuple, and so on. That would be a lot of red tape for no benefit. In the Lambda Calculus, it's simpler to just work with curried functions as our natural idiom.
 263
 264 So if `[a, b, c]` should be the displayed higher-order function above, what should `[c]` be? Evidently:
 265
 266     \f z. f c z
 267
 268 Now what should the empty list `[]` be? Think about it...
 269
 270 Did you arrive at an answer?
 271
 272 I hope it was this one: `\f z. z`. Because when we fold a function `f` and a starting value `z` over the empty list, we just get back whatever the starting value `z` was.
 273
 274 We saw before what a `make-triple` function would look like. What about a `make-list` function, or as we've been calling it, "cons" (`&` in Kapulet)? Well, we've already seen what the representation of `[]` and `[c]` are. In that simplest case, the `cons` function should take us from the encoding of `[]`, namely `\f z. z` to the encoding of `[c]`, namely `\f z. f c z`, as a result. Here is how we can define this:
 275
 276     let cons = \d ds. \f z. f d (ds f z)
 277     in ...
 278
 279 Let's walk through this. We supply this function with our `c` and `[]` as arguments. So its `d` gets bound to our `c`, and its `ds` gets bound to our `[]`, namely `\f z. z`. Then our result is a higher order function of the form `\f z. f c SOMETHING`. That looks good, what we're after is just this, except the `SOMETHING` should end up as `z`. If we just simply *substituted* `z` for `SOMETHING`, then our `cons` function would *always* end up giving us back a singleton list of the form `\f z. f X z` for some `X`. That is the form we want in the present case, computing `cons c []`, but not in the general case. What we'd like instead is to *do something to* the second argument we fed to `cons`, here `[]` or `\f z. z`, and have *it* reduce to `z`, with the hope that the same operation would make *more complex* second arguments reduce to *other* appropriate values. Well, `\f z. z` expects an `f` and a `z` as arguments, and hey we happen to have an `f` and a `z` handy, since we're inside something of the form `\f z. f c SOMETHING`. The `f` and `z` will be supplied by the consumer or user of the sequence `[c]` that we're building. So let's just give *those* `f` and `z` to `\f z. z` as *its* arguments, and we end up with simply `z`. In other words, where `ds` is the second argument we fed to `cons`, we make `SOMETHING` in `\f z. f c SOMETHING` be `ds f z`. That is why we make `cons` be:
 280
 281     \d ds. \f z. f d (ds f z)
 282
 283 Let's step through it formally. Given our definitions:
 284
 285     cons c []
 286
 287 is:
 288
 289     (\d ds. \f z. f d (ds f z)) c (\f z. z) ~~>
 290     (\ds. \f z. f c (ds f z)) (\f z. z) ~~>
 291     \f z. f c ((\f z. z) f z) ~~>
 292     \f z. f c ((\z. z) z) ~~>
 293     \f z. f c z
 294
 295 which is just the representation of `[c]` we were after.
 296
 297 Will this work when more complex values are supplied as the second argument to `cons`? Let's see:
 298
 299     cons b [c]
 300
 301 is:
 302
 303     (\d ds. \f z. f d (ds f z)) b (\f z. f c z) ~~>
 304     (\ds. \f z. f b (ds f z)) (\f z. f c z) ~~>
 305     \f z. f b ((\f z. f c z) f z) ~~>
 306     \f z. f b ((\z. f c z) z) ~~>
 307     \f z. f b (f c z)
 308
 309 which looks right. Persuade yourself that the `cons` we've defined here, together with the representation `\f z. z` for the empty list, always give us the correct encoding for complex lists, in terms of a function that takes an `f` and a `z` as arguments, and then returns the result of right-folding those arguments over the list elements in the appropriate order.
 310
 311 You may notice that the encoding we're proposing for `[]` is the same encoding we proposed above for `false`. There's nothing deep to this. If we had adopted other conventions, we could easily have made it instead be `true` and `[]` that had the same encoding.
 312
 313 Now we saw above how to define `map` in terms of `fold_right`. In Kapulet syntax, `map g xs` was:
 314
 315     fold_right ((lambda (x, zs). g x & zs), []) xs
 316
 317 In our Lambda Calculus encoding, `fold_right (f, z) xs` gets translated to `xs f z`. That is, the list itself is the operator, just as we saw triples being. So we just need to know how to represent `lambda (x, zs). g x & zs`, on the one hand, and `[]` on the other, into the Lambda Calculus, and then we can also express `map`. Well, in the Lambda Calculus we're working with curried functions, and there's no infix syntax, so we'll replace the first by `lambda x zs. cons (g x) zs`. But we just defined `cons`, and the lambda is straightforward. And we also just defined `[]`. So we already have all the pieces to do this. Namely:
 318
 319     map (g, z) xs
 320
 321 in Kapulet syntax, turns into this in our lambda evaluator:
 322
 323     let empty = \f z. z in
 324     let cons = \d ds. \f z. f d (ds f z) in
 325     xs (\x zs. cons (g x) zs) empty
 326
 327 Try out this:
 328
 329     let empty = \f z. z in
 330     let cons = \d ds. \f z. f d (ds f z) in
 331     let map = \g xs. xs (\x zs. cons (g x) zs) empty in
 332     let abc = \f z. f a (f b (f c z)) in
 333     map g abc
 334
 335 That will evaluate to:
 336
 337     \f z. f (g a) (f (g b) (f (g c) z))
 338
 339 which looks like what we want, a higher-order function that will take an `f` and a `z` as arguments and then return the right fold of those arguments over `[g a, g b, g c]`, which is `map g [a, b, c]`.
 340
 341
 342 ## Numbers ##
 343
 344 Armed with the encoding we developed for lists above, Church's method for encoding numbers in the Lambda Calculus is very natural. But this is not the order that these ideas were historically developed.
 345
 346 It's noteworthy that when he was developing the Lambda Calculus, Church had not predicted it would be possible to encode the numbers. It came as a welcome surprise.
 347
 348 The idea here is that a number is something like a list, only without any slot for an interesting, possibly varying head element at each level. Whereas in a list like:
 349
 350     [a, b, c]
 351
 352 we have:
 353
 354     head=a, sublist=(head=b, sublist=(head=c, sublist=[]))
 355
 356 with the number `3` we instead only have:
 357
 358     head=(), subnumber=(head=(), subnumber=(head=(), subnumber=zero))
 359
 360 where `()` is the zero-length tuple, the one and only member of its type. That is, the head here is a type of thing that cannot interestingly vary. Or we could just eliminate the head, and have:
 361
 362     subnumber=(subnumber=(subnumber=zero))
 363
 364 probably more familiar to you as:
 365
 366     succ (succ (succ zero))
 367
 368 In other words, where a list is inductively built up out of some interesting new datum and its "sublist" or tail, a number is inductively built up out of no interesting new datum and its "subnumber" or predecessor. In this light, a number can be seen as a kind of stunted or undernourished list.
 369
 370 Now, we had a strategy for encoding the list `[a, b, c]` as:
 371
 372     \f z. f a (f b (f c z))
 373
 374 So perhaps we can apply the same kind of idea to the number `3`, and encode it as:
 375
 376     \f z. f   (f   (f   z))
 377
 378 In fact, there's a way of looking at this that makes it look incredibly natural. You may have encountered the idea of a composition of two functions `f` and `g`, written as <code>f &cir; g</code>, or in Kapulet as `f comp g`. This is defined to be:
 379
 380     \x. f (g x)
 381
 382 The composition of a function `f` with itself, namely:
 383
 384     \x. f (f x)
 385
 386 is sometimes abbreviated as <code>f<sup>2</sup></code>. Similarly, <code>f &cir; f &cir; f</code> or `\x. f (f (f x))` is sometimes abbreviated as <code>f<sup>3</sup></code>. <code>f<sup>1</sup></code> is understood to be just `f` itself, and <code>f<sup>0</sup></code> is understood to be the identity function.
 387
 388 So in proposing to encode the number `3` as:
 389
 390     \f z. f   (f   (f   z))
 391
 392 we are proposing to encode it as:
 393
 394 <code>\f z. f<sup>3</sup> z</code>
 395
 396 (The <code>f<sup><em>n</em></sup></code> isn't some bit of new Lambda Calculus syntax. It's just our metalanguage abbreviation for the same expressions we had before. It does help focus our attention on what we're essentially doing here.)
 397
 398 And indeed this is the Church encoding of the numbers:
 399
 400 <code>0 &equiv; \f z. I z &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;  or \f z. f<sup>0</sup> z</code>
 401 <code>1 &equiv; \f z. f z &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;  or \f z. f<sup>1</sup> z</code>
 402 <code>2 &equiv; \f z. f (f z) &nbsp;&nbsp;&nbsp;&nbsp;;  or \f z. f<sup>2</sup> z</code>
 403 <code>3 &equiv; \f z. f (f (f z)) ;  or \f z. f<sup>3</sup> z</code>
 404 <code>...</code>
 405
 406 The encoding for `0` can also be written as `\f z. z`, which we've also proposed as the encoding for `[]` and for `false`. Don't read too much into this.
 407
 408 Given the above, can you figure out how to define the `succ` function? We already worked through the definition of `cons`, and this is just a simplification of that, so you should be able to do it. We'll make it a homework.
 409
 410 We saw that using the `cons` operation (Kapulet's `&`) and `[]` as arguments to `fold_right` over a list give us back that very same list. In the Lambda Calculus, that comes to the following:
 411
 412     [a, b, c] cons []
 413
 414 with the appropriate lambda terms substituted throughout, just reduces to the same lambda term we used to encode `[a, b, c]`.
 415
 416 In the same spirit, what do you expect this will reduce to:
 417
 418     3 succ 0
 419
 420 Since `3` is `\f z. f (f (f z))`, it should reduce to whatever:
 421
 422     succ (succ (succ 0))
 423
 424 does. And if we define `succ` properly, we should expect that to give us the same lambda term that we used to encode `3` in the first place.
 425
 426
 427 We'll look at other encodings of lists and numbers later.