add comments

diff --git a/topics/week3_lists.mdwn b/topics/week3_lists.mdwn

index 08b3516..bea4c5b 100644 (file)
--- a/topics/week3_lists.mdwn
+++ b/topics/week3_lists.mdwn
@@ -5,17 +5,17 @@
  
  We know you are already familiar with the following kind of notation for designating sets:
  
-{ <em>x</em> + 1 | <em>x</em> &isin; Primes and &phi;<em>x</em> }
+{ <em>x</em> + 1 | <em>x</em> &isin; Primes and φ<em>x</em> }
  
  This kind of notation is called a **set comprehension.**
-Here Primes is assumed to be some, presumably larger set. &phi; expresses some condition that members of Primes might conceivably fail to satisfy.
+Here Primes is assumed to be some, presumably larger set. φ expresses some condition that members of Primes might conceivably fail to satisfy.
  
  
  Some of the functional programming languages permit you to specify data structures in this same way. Kapulet comes closest, in that it also has set comprehension notation. In Kapulet one writes:
  
-<code>{ x + 1 | x from Primes, &phi; x }</code>
+    { x + 1 | x from Primes, φ x }
  
-The changes are only that we write `x from Primes`, with `Primes` being an expression that evaluates to a set, and we separate that clause from the test clause with a comma. <code>&phi; x</code> can be any expression that evaluates to a boolean. Moreover, such clauses can come in any order, and there can be any number of them, though the above is the most useful pattern. But you can also write:
+The changes are only that we write `x from Primes`, with `Primes` being an expression that evaluates to a set, and we separate that clause from the test clause with a comma. `φ x` can be any expression that evaluates to a boolean. Moreover, such clauses can (syntactically) come in any order, and there can be any number of them, though the above is the most useful pattern, because the `x` introduced by the first clause is then bound (over the specified domain) when evaluating the clause `φ x`. But you can also write:
  
      { 1 | 'true }
  
@@ -39,7 +39,7 @@ and that evaluates to `[14, 15, 24, 25, 34, 35]`. Notice that Haskell's syntax d
  
  That evaluates to `[14, 24, 34, 15, 25, 35]`.
  
-You can also mix in test clauses:
+You can also mix in test or "guard" clauses:
  
      [ 10*x + y | y <- [4, 5], odd y, x <- [1, 2, 3] ]
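
For comparison, here is a rough Python counterpart of the guarded comprehension above. Python is not one of the languages used in these notes; it is chosen here only because its comprehensions follow the same left-to-right clause order, and the name `guarded` is made up for this sketch.

```python
# Generator clauses and the guard are read left to right, as in the Haskell version:
# y is drawn first, then tested, and only then is x drawn.
guarded = [10 * x + y for y in [4, 5] if y % 2 == 1 for x in [1, 2, 3]]
print(guarded)  # [15, 25, 35]
```

Only `y = 5` passes the guard, so the inner iteration over `x` runs once.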
  
@@ -79,7 +79,7 @@ To translate that, first let's handle the iteration over the final list, that `x
  
      map (lambda x. 10*x + y) [1, 2, 3]
  
-This looks like what we had before, except that now we have this free variable `y` in our lambda expression. Perhaps we can bind that variable inside a *larger* lambda expression, and then map (and filter) *that* larger lambda expression over the list that `y` is drawn from:
+This looks like what we had before, except that now we have this free variable `y` in our lambda expression. Perhaps we can bind that variable inside a *larger* lambda expression, and then map *that* larger lambda expression over the (filtered) list that `y` is drawn from:
  
      let
        f match lambda y. map (lambda x. 10*x + y) [1, 2, 3]
@@ -89,9 +89,9 @@ This gives us nearly what we want. It evaluates to:
  
      [[14, 24, 34], [15, 25, 35]]
  
-Why? Because the `filter` expression at the end is restricting the domain that `y` ranges over to `[4, 5]`. Over this domain we are selecting a value to bind `y` to, and then evaluating the `map` expression inside `f` with `y` so bound. With `y` bound to `4`, we get the result `[14, 24, 34]`. With `y` bound to `5`, we get the result `[15, 25, 35]`. These two results, in order, are the elements that make up the sequence which is the result of the outermost `map` expression.
+Why? Because the `filter` expression at the end is restricting the domain that `y` ranges over to `[4, 5]`. Over this domain we are selecting a value to bind `y` to, and then evaluating the `map` expression inside `f` with `y` so bound. With `y` bound to `4`, we get the result `[14, 24, 34]`. With `y` bound to `5`, we get the result `[15, 25, 35]`. These two results, in order, are the elements that make up the sequence which is the result of the final, outermost `map` expression.
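
The same map-over-a-filtered-list shape can be sketched in Python (not a language used in these notes; the names `xs`, `ys`, and `nested` are made up for this illustration, and the lists are the ones from the comprehension with `y < 6` as the test):

```python
xs = [1, 2, 3]
ys = [4, 5, 6]

# f binds y, then maps the inner lambda over the list that x is drawn from.
f = lambda y: list(map(lambda x: 10 * x + y, xs))

# filter restricts y's domain to [4, 5]; the outer map applies f to each survivor.
nested = list(map(f, filter(lambda y: y < 6, ys)))
print(nested)  # [[14, 24, 34], [15, 25, 35]]
```

Each element of the outer result is itself a list, one per value that `y` takes.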
  
-One final twist is that our original list comprehension gives us a "flatter" result. In both Kapulet (and Haskell, modulo a few syntax adjustments), the list comprehension:
+One last twist is that our original list comprehension gives us a "flatter" result. In Kapulet (and in Haskell, modulo a few syntax adjustments), the list comprehension:
  
      [ 10*x + y | y from [4, 5, 6], y < 6, x from [1, 2, 3] ]
  
@@ -155,4 +155,134 @@ Try it out in the lambda evaluator. After the code above, you can write:
  and the result will be `\f z. f b (f c z)`, our encoding of `[b, c]`.
  
  
-<a id=v2-lists></a>
+<a id=v3-lists></a>
+A possible refinement on our right-fold encoding of lists would be to keep track of the tails as the lists are constructed, and feed them to the folding functions `f` as an additional argument. We've been encoding `[a, b, c]` like this:
+
+    \f z. f a (f b (f c z))
+
+or in other words:
+
+    \f z. f a MORE
+
+where `MORE` is the result of folding `f` and `z` over `[a, b, c]`'s tail. Instead, we might encode `[a, b, c]` like this:
+
+    \g z. g a [b,c] (g b [c] (g c [] z))
+
+or in other words:
+
+    \g z. g a [b,c] MORE
+
+The difference is that this new encoding uses a fold function `g` expecting *three* arguments, and the encoding not only passes the current head as a first argument to that function (as before), but *also* passes the current tail as a new middle argument to that function. As before, we continue to pass the result of the fold *applied to* the tail as the last argument.
+
+With this encoding scheme, each list would be represented by a somewhat more complex function than before. On the other hand, it would now become as easy to query the list's tail as it is to query its head. Before the refinement, querying the tail of a list required us to *build up* the tail afresh each time we wanted to extract it. This refinement instead *saves a copy* of the tail *for direct access*, as a new middle argument to the fold function, when the list is first constructed.
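
Here is a minimal sketch of this refined encoding, written with curried Python lambdas standing in for lambda terms. Python is not used elsewhere in these notes, and the names `empty`, `cons`, `head`, `tail`, and `to_py` are assumptions made for the illustration. So are the values returned for the empty list (`"no head"` and `empty`), which the text above does not fix.

```python
# empty ≡ \g z. z
empty = lambda g: lambda z: z

# cons ≡ \x xs. \g z. g x xs (xs g z)
# g receives the head, the saved tail, and the result of folding over the tail.
cons = lambda x: lambda xs: lambda g: lambda z: g(x)(xs)(xs(g)(z))

# head and tail just select g's first or second argument.
head = lambda xs: xs(lambda h: lambda t: lambda rest: h)("no head")
tail = lambda xs: xs(lambda h: lambda t: lambda rest: t)(empty)

# Inspection helper only: fold an encoded list into a native Python list.
to_py = lambda xs: xs(lambda h: lambda t: lambda rest: [h] + rest)([])

abc = cons("a")(cons("b")(cons("c")(empty)))
print(head(abc))         # a
print(to_py(tail(abc)))  # ['b', 'c']
```

Because Python evaluates arguments eagerly, the fold over the tail is still computed even when `tail` ignores it. What the sketch shows is only that the tail is available directly, with no rebuilding.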
+
+(Oleg discusses making a parallel refinement in the encoding of numbers [here](http://okmij.org/ftp/Computation/lambda-calc.html#p-numerals).)
+
+
+<a id=other-lists></a>
+## Other Encoding Strategies for Lists ##
+
+If you're keeping track, we've now seen three different encoding strategies for lists: in terms of their right-folds, their left-folds, and the somewhat more complex right-fold encoding just discussed.
+
+Let's consider one or two more, that take different, apparently simpler strategies, avoiding the use of folds in the encoding.
+
+<a id=v4-lists></a>
+Isn't the simplest approach just to represent a non-empty list as an *ordered pair* of its head and its tail? In the first week, we mentioned that the fancier functional programming languages like OCaml and Haskell sharply distinguish between lists and tuples. Lists had to be type-homogeneous, and their type was insensitive to their length. Tuples on the other hand could be (though needn't be) type-heterogeneous, and their own types *were* determined by the number (and order) of the types of their elements. But we don't have types in the Lambda Calculus. (At least, we don't have a *variety* of types.) So maybe in this context we can identify lists with certain tuples, without getting in trouble.
+
+(In fact, as we discuss elsewhere, this is how Scheme also implements its lists.)
+
+We'll still need to choose some lambda term to encode the empty list. It's not straightaway obvious what that should be. What else will we need for our lists? Well, let's think about what we're going to *do* with them. One thing we did with the lists again and again in the functional programming languages was *pattern-match* on them. What that amounted to was: we checked if the list was empty, and if so we took one program branch, and if not we bound some variables to the list's head and to its tail, and then took a second program branch. We already know how to take an ordered pair apart in that way. What we don't have yet is a way to tell whether an arbitrary list is empty.
+
+One idea would be to represent the non-empty lists as a *triple* instead of a pair, with their first element being the boolean `false`, announcing that they aren't empty, then their second element being their head, and their third element being their tail. Then the empty list could be a triple whose first element is `true`. It's not obvious yet what *its* second and third elements should be.
+
+The conventional implementation of this idea does essentially this, only instead of triples, it uses *pairs*, whose first element is the same boolean flag just described, and whose second element in the case of non-empty lists is *another pair* holding the list's head and its tail. So the list `[a, b, c]` would be encoded as:
+
+    (false, (a, [b,c]))
+
+where `(x, y)` represents the Lambda Calculus encoding of an ordered pair (we're already familiar with this, `\f. f x y`), and `[b,c]` represents the tail of `[a, b, c]`, encoded by this same scheme, namely:
+
+    (false, (b, [c]))
+
+Here `[c]` represents the tail of `[b, c]` encoded by this same scheme, namely:
+
+    (false, (c, []))
+
+Here `[]` represents the empty list, which we'll encode as:
+
+    (true, ???)
+
+What should stand in for `???`? No particular choice seems forced here. One strategy would be to go ahead and build your family of list operations, and see whether any particular choice for that slot in the empty list made some of the other operations easier to define or more elegant. Here's an example. We shouldn't expect the result of querying the head of `[]` to be meaningful; but what about querying its tail? Perhaps that should also be meaningless. Or you could make a case that `[]` is most naturally thought to be its own tail. If we went that way, it would be nice to let `???` in our encoding of `[]` be some value, such that, when we tried to extract the empty list's tail using the same operation that worked straightforwardly and naturally for non-empty lists, we got back the empty list itself. In fact it is possible to do this. (But it requires a fixed-point combinator, which we won't be discussing until next week.)
+
+For the time being, let's not worry about what stands in for `???` in our encoding of `[]`. Let's get clear instead on what our other basic list operations should look like.
+
+Building a list from element `x` and an existing list `xs` isn't hard: we just build a pair whose first element is `false` and whose second element is the pair `(x, xs)`:
+
+    cons ≡ \x xs. \f. f false (\g. g x xs)
+
+Determining whether a list is empty is just a matter of extracting the first element of the outer pair:
+
+    empty? ≡ \xs. xs (\a b. a)
+
+Given a non-empty list, extracting its head is just a matter of extracting the first element of the inner pair that is its second element:
+
+    head ≡ \xs. xs (\a b. b (\c d. c))
+
+and so on.
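
To make the "and so on" concrete, here is a sketch of the pair-based encoding using curried Python lambdas in place of lambda terms. Python is not a language these notes use. The `tail` definition is our own extrapolation from the pattern of `head`, and the `None` in the empty list is only a placeholder for the undecided `???`. The helper `to_bool` is likewise made up, and exists only to print results.

```python
true = lambda y: lambda n: y
false = lambda y: lambda n: n
pair = lambda x: lambda y: lambda f: f(x)(y)

# cons ≡ \x xs. \f. f false (\g. g x xs)
cons = lambda x: lambda xs: pair(false)(pair(x)(xs))

# [] ≡ (true, ???); None is an arbitrary placeholder for ???
empty = pair(true)(None)

# empty? ≡ \xs. xs (\a b. a)
is_empty = lambda xs: xs(lambda a: lambda b: a)
# head ≡ \xs. xs (\a b. b (\c d. c))
head = lambda xs: xs(lambda a: lambda b: b(lambda c: lambda d: c))
# tail, by analogy: select the second element of the inner pair
tail = lambda xs: xs(lambda a: lambda b: b(lambda c: lambda d: d))

# Inspection helper only: convert an encoded boolean to a Python bool.
to_bool = lambda b: b(True)(False)

abc = cons("a")(cons("b")(cons("c")(empty)))
print(head(abc))                     # a
print(head(tail(abc)))               # b
print(to_bool(is_empty(abc)))        # False
print(to_bool(is_empty(empty)))      # True
```

With this placeholder, asking for the head or tail of `empty` raises a Python error, which matches the thought that those queries need not be meaningful.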
+
+Now think about how you'd define recursive operations like `length` or `map`, or any of the other operations we've recently been expressing as special cases of folds. It's not at all obvious how to do this. With a list like `[a, b, c]`, my implementation of the `length` function could see that the list isn't empty, and so the length has to be at least `1`, and it's easy enough to extract the tail `[b, c]` of this list. But then I am going to want to have the `length` function call itself recursively on that tail. And we don't know yet how to do that. We don't know how to get the effect of `letrec` inside the Lambda Calculus. We don't know yet how to have any lambda terms that refer to themselves.
+
+This is in fact a formidable obstacle. The present encoding of lists makes some things easier than our (original) right-fold encoding of lists: it's easier to extract the tail, plus the whole system just seems simpler. But the compensating disadvantage is that we don't know how to perform recursive operations on the lists so encoded. At least, not until we work out a general strategy for expressing `letrec` in the Lambda Calculus. With the list encodings we looked at earlier, that "baked" the fold operation into the list's very construction, we didn't need any such general-purpose `letrec`. The natural recursive operations we wanted to perform on lists were already in our reach.
+
+<a id=v5-lists></a>
+Let's consider one more encoding strategy for lists. This will have the same serious shortcoming as the simple encoding we just considered: we won't be able to do recursive operations with it until we have a general-purpose `letrec`. But in other respects it may be an improvement on that encoding. That encoding might seem a bit ad hoc. Plus there's that matter of the `???` in our construction of `[]`, where we don't know what it should be. If we proceed a bit differently, it will be easier to see some systematic rationale for our choices.
+
+We've already seen some **enumerations**. These are "data structures" that consist of a fixed, finite number of discrete values. Such as `true` and `false`. Sometimes enumerations are understood to have a meaningful intrinsic order, but that's not important for our purposes here.
+
+We've already seen how to encode data structures like this. We encode `true` as:
+
+    \y n. y
+
+The idea is that the data structure is represented as a function. We pass that function several arguments. One of the arguments, bound to `y`, will capture what we want if we've got one variant (the *true* variant) of the data structure, and the other argument, bound to `n`, will capture what we want if we've got the other variant (the *false* variant). Our encoding of `true` just chooses the argument `y` that's understood to be associated with it, and returns that one, discarding the `n`.
+
+We could easily extend this strategy to data structures that have more than two variants. For instance, suppose I want a system of colors, where the choices are only *red*, *green*, or *blue*. I might then encode `red` in the Lambda Calculus as:
+
+    \r g b. r
+
+The idea is that if some program has ahold of some as-yet-unknown rgb-color, it can pass that color three arguments, and if the color is `red`, the result will be the first of those passed arguments. Of course the order of which argument goes with *red*, which with *green*, and which with *blue*, is entirely conventional --- just as it was with the booleans. We only need to stay consistent.
+
+Another idea we've seen is to have data structures that have *parameters*. They are a kind of container that can hold one (or more) other values. Lists are a more complex example of this than I want to focus on yet. The simpler examples are just our ordered pairs, and triples, and so on. An ordered triple `(a, b, c)` is encoded as:
+
+    \f. f a b c
+
+The idea here is that if some program has ahold of some as-yet-unknown triple, it can pass that triple a single *function* argument `f`, and the triple will apply that function argument to its members.
+
+This represents a second kind of building block for data structures. We can combine these two building blocks. We can have data structures that have one or more variants, each of which has 0 or more parameters.
+
+The booleans represent the case of two variants, each having 0 parameters or members. The rgb-colors we just sketched give us a third variant, but still each variant has 0 parameters or members. The triples have only a single variant, but in this case each instance of that variant has 3 parameters. We can combine these building blocks in more complex ways. For instance, let's define a different kind of color data structure. This one will have four variants: *cyan*, *magenta*, *yellow*, and *gray*. Each of the first three variants takes 0 parameters. But the *gray* variant takes two parameters. (Suppose one is a natural number representing amount of brightness, with `0` being black, and the other is a natural number representing amount of glossiness. I realize these aren't the most realistic examples. Play along for pedagogy.) Then we'd encode `cyan` like this:
+
+    \c m y g. c
+
+Similarly for `magenta` and `yellow`. We'd encode an instance of *gray* whose two parameters are `a` and `b` like this:
+
+    \c m y g. g a b
+
+The idea here is that if some program has ahold of some as-yet-unknown cmyg-color, it can pass that color four arguments, the last of which must be a curried function accepting (at least) two arguments. In the case that the color is an instance of *gray*, it will apply that function argument to its brightness and its glossiness parameters.
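
A small sketch of the cmyg-colors, again with curried Python lambdas standing in for lambda terms. Python is not used in these notes. The `describe` function is a hypothetical consumer invented for the example, and native Python integers stand in for the encoded natural numbers to keep it short.

```python
# Variants with 0 parameters just select one of the four arguments.
cyan = lambda c: lambda m: lambda y: lambda g: c
magenta = lambda c: lambda m: lambda y: lambda g: m
yellow = lambda c: lambda m: lambda y: lambda g: y

# gray a b ≡ \c m y g. g a b: the variant applies its handler to its two parameters.
gray = lambda a: lambda b: lambda c: lambda m: lambda y: lambda g: g(a)(b)

# A consumer passes four arguments; the last is a curried two-argument function.
describe = lambda color: color("cyan")("magenta")("yellow")(
    lambda brightness: lambda gloss: f"gray({brightness}, {gloss})"
)

print(describe(cyan))        # cyan
print(describe(gray(3)(7)))  # gray(3, 7)
```

The consumer never inspects the color directly. It supplies one continuation per variant and lets the color pick.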
+
+I hope this all seems natural and systematic. When we get to discussing types, you'll see that the strategy deployed here has great generality. (You may even see the encoding strategy discussed above, in terms of pairs of pairs, as an approximate implementation of it.)
+
+Now, what about lists? A list is basically just like our cmyg-colors, with just a few slight changes. First, with lists there is only one variant that has no parameters, namely `[]`. So that takes the place of one of *cyan*, *magenta*, and *yellow*, and we don't need the other two. Second, with lists the remaining variant also takes two parameters, but the first parameter needn't necessarily be a number. We haven't said what the heads of lists can be. Or rather, we haven't done anything to preclude you from consing *any* lambda term onto an existing list. Third, the second parameter of a non-empty list is also understood to not be a number, but rather to be another list --- perhaps another non-empty list, but eventually it would have to be the empty list. (Else what you've got is not understood to be an encoded list at all, but rather just some possibly garbled lambda term that only resembles a list. Consider Scheme's notion of "improper lists," that we discuss elsewhere.)
+
+That last modification would be as if, in our encoding of cmyg-colors, the second parameter of the gray variant weren't a number, but rather another cmyg-color. Perhaps people sometimes paint multiple layers of gray paint (of possibly varying brightness) on top of already-painted walls. The base coat is always cyan, magenta, or yellow. On top of that is some finite number of different coats of gray. Each layer of gray paint keeps track of its own brightness, and what layer of paint lies directly beneath it. That would be a structure more like what we have with lists.
+
+Here are some proposed list definitions based on these ideas:
+
+    []     ≡ \f n. n
+    [b]    ≡ \f n. f b []   ≡ \f n. f b (\f n. n)
+    [a, b] ≡ \f n. f a [b]  ≡ \f n. f a (\f n. f b (\f n. n))
+    cons   ≡ \x xs. \f n. f x xs
+    empty? ≡ \xs. xs (\y ys. false) true
+    tail   ≡ \xs. xs (\y ys. ys) err
+
+Or perhaps we should make `tail` return `[]` when applied to `[]`, rather than `err`.
+
+As we said, with this encoding system too, we'd still need some general strategy for expressing `letrec`, before we'd be able to define functions like `length` and `map` and so on.
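
The definitions above can be tried out in a sketch with curried Python lambdas (Python is not a language these notes use). Two choices here are assumptions: `tail` returns `[]` on `[]`, the second of the two options just mentioned, and `length` relies on Python's own named recursion. That host-language recursion plays the role of `letrec`. It is exactly what the pure Lambda Calculus does not yet give us, so this is no solution to the problem described above.

```python
true = lambda y: lambda n: y
false = lambda y: lambda n: n

# [] ≡ \f n. n
empty = lambda f: lambda n: n
# cons ≡ \x xs. \f n. f x xs
cons = lambda x: lambda xs: lambda f: lambda n: f(x)(xs)

# empty? ≡ \xs. xs (\y ys. false) true
is_empty = lambda xs: xs(lambda y: lambda ys: false)(true)
# tail ≡ \xs. xs (\y ys. ys) [], choosing [] rather than err for the empty case
tail = lambda xs: xs(lambda y: lambda ys: ys)(empty)

def length(xs):
    # The self-reference comes from Python's def, not from the encoding.
    return xs(lambda y: lambda ys: 1 + length(ys))(0)

ab = cons("a")(cons("b")(empty))
print(length(ab))                       # 2
print(length(tail(ab)))                 # 1
print(is_empty(empty)(True)(False))     # True
```

Notice that the list itself does none of the iterating here, unlike the fold-based encodings. All of the repetition lives in `length`.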