topics/week3_lists.mdwn

   1 # More on Lists #
   2
   3 <a id=comprehensions></a>
   4 ## Comprehensions ##
   5
   6 We know you are already familiar with the following kind of notation for designating sets:
   7
   8 { <em>x</em> + 1 | <em>x</em> &isin; Primes and φ<em>x</em> }
   9
  10 This kind of notation is called a **set comprehension.**
  11 Here Primes is assumed to be some, presumably larger set. φ expresses some condition that members of Primes might conceivably fail to satisfy.
  12
  13
  14 Some of the functional programming languages permit you to specify data structures in this same way. Kapulet comes closest, in that it also has set comprehension notation. In Kaupulet one writes:
  15
  16     { x + 1 | x from Primes, φ x }
  17
  18 The changes are only that we write `x from Primes`, with `Primes` being an expression that evaluates to a set, and we separate that clause from the test clause with a comma. `φ x` can be any expression that evaluates to a boolean. Moreover, such clauses can (syntactically) come in any order, and there can be any number of them, though the above is the most useful pattern, because the `x` introduced by the first clause is then bound (over the specified domain) when evaluating the clause `φ x`. But you can also write:
  19
  20     { 1 | 'true }
  21
  22 which evaluates to `{ 1 }`, or:
  23
  24     { 1 | 'false }
  25
  26 which evaluates to the empty set `{ }`. What if you have multiple `from` clauses? This is possible, and iterates over the *cross-product* of the two sets you're drawing from. So:
  27
  28     { 10*x + y | x from {1, 2, 3}, y from {4, 5} }
  29
  30 evaluates to the set `{ 14, 15, 24, 25, 34, 35 }`.
  31
  32 Haskell doesn't have set literals like Kapulet does, but it also allows this kind of notation with lists, that is, it has **list comprehensions**. (And so does Kapulet.) Thus in Haskell you can write:
  33
  34     [ 10*x + y | x <- [1, 2, 3], y <- [4, 5] ]
  35
  36 and that evaluates to `[14, 15, 24, 25, 34, 35]`. Notice that Haskell's syntax differs slightly. Changing the order of the `from`/`<-` clauses changes the order in which the elements will be added to the result list:
  37
  38     [ 10*x + y | y <- [4, 5], x <- [1, 2, 3] ]
  39
  40 That evaluates to `[14, 24, 34, 15, 25, 35]`.
  41
  42 You can also mix in test or "guard" clauses:
  43
  44     [ 10*x + y | y <- [4, 5], odd y, x <- [1, 2, 3] ]
  45
  46 evaluates to `[15, 25, 35]`.
  47
  48 Haskell also has an extension that permits you to iterate over multiple lists *in parallel* rather than to iterate over their cross-product. If you type `:set -XParallelListComp` in the ghci interpreter, that will enable this extension, and then:
  49
  50     [ 10*x + y | y <- [4, 5, 6] | x <- [1, 2, 3] ]
  51
  52 will evaluate to `[14, 25, 36]`. If the lists are of unequal length, Haskell stops when it exhausts the shortest. These behaviors are similar to the `map2` function you defined in the week 1 homework. That also took an argument from each of several sequences in parallel. (The corresponding functions in Haskell are called `zip` and `zipWith`.)
  53
  54 OCaml [permits lists comprehensions as an extension](http://stackoverflow.com/questions/27652428/list-comprehension-in-ocaml), and [so too does Scheme](http://srfi.schemers.org/srfi-42/srfi-42.html), but these are a bit harder to use.
  55
  56 All of these things can also be expressed in these languages without using the comprehension syntax. For example, this list comprehension (in Kapulet syntax):
  57
  58     [ 10*x | x from [1, 2, 3, 4, 5] ]
  59
  60 can be expressed as:
  61
  62     map (lambda x. 10*x) [1, 2, 3, 4, 5]
  63
  64 and this:
  65
  66     [ 10*x | x from [1, 2, 3, 4, 5], odd? x ]
  67
  68 can be expressed as:
  69
  70     map (lambda x. 10*x) $ filter odd? [1, 2, 3, 4, 5]
  71
  72 (We explained the `$` notation in [[week 1's advanced notes|week1_kapulet_advanced/#dollar]]. This is equivalent to `map (lambda x. 10*x) (filter odd? [1, 2, 3, 4, 5])`.)
  73
  74 Iterating over the cross-product of several lists is a bit harder. Consider:
  75
  76     [ 10*x + y | y from [4, 5, 6], y < 6, x from [1, 2, 3] ]
  77
  78 To translate that, first let's handle the iteration over the final list, that `x` is drawn from:
  79
  80     map (lambda x. 10*x + y) [1, 2, 3]
  81
  82 This looks like what we had before, except that now we have this free variable `y` in our lambda expression. Perhaps we can bind that variable inside a *larger* lambda expression, and then map *that* larger lambda expression over the (filtered) list that `y` is drawn from:
  83
  84     let
  85       f match lambda y. map (lambda x. 10*x + y) [1, 2, 3]
  86     in map f $ filter (lambda y. y < 6) [4, 5, 6]
  87
  88 This gives us nearly what we want. It evaluates to:
  89
  90     [[14, 24, 34], [15, 25, 35]]
  91
  92 Why? Because the `filter` expression at the end is restricting the domain that `y` ranges over to `[4, 5]`. Over this domain we are selecting a value to bind `y` to, and then evaluating the `map` expression inside `f` with `y` so bound. With `y` bound to `4`, we get the result `[14, 24, 34]`. With `y` bound to `5`, we get the result `[15, 25, 35]`. These two results, in order, are the elements that make up the sequence which is the result of the final, outermost `map` expression.
  93
  94 One last twist is that our original list comprehension gives us a "flatter" result. In both Kapulet (and Haskell, modulo a few syntax adjustments), the list comprehension:
  95
  96     [ 10*x + y | y from [4, 5, 6], y < 6, x from [1, 2, 3] ]
  97
  98 evaluates to:
  99
 100     [14, 24, 34, 15, 25, 35]
 101
 102 We can turn the preceding result into this result with the Kapulet function `join` (Haskell calls it `concat` or `Control.Monad.join`):
 103
 104     join [[14, 24, 34], [15, 25, 35]]
 105
 106 evaluates to the "flatter" list displayed above.
 107
 108 By the way, this `join` operation only affects a single layer of `[ ]`s. This:
 109
 110     join [ [[10,20], [30], []],  [[40], [50,60]] ]
 111
 112 evaluates to:
 113
 114     [[10,20], [30], [], [40], [50,60]]
 115
 116 not to:
 117
 118     [10, 20, 30, 40, 50, 60]
 119
 120 To get the latter, you'd need to apply `join` twice.
 121
 122
 123
 124 <a id=tails></a>
 125 ## Tails ##
 126
 127 For the Lambda Calculus, we've proposed to encode lists in terms of higher-order functions that perform right-folds on (what we intuitively regard as) the real list. Thus, the list we'd write in Kapulet or Haskell as:
 128
 129     [a, b, c]
 130
 131 for some expressions `a`, `b`, and `c`, would be encoded in the Lambda Calculus as:
 132
 133     \f z. f a (f b (f c z))
 134
 135 With that choice of encoding, it's not difficult to write a `head` function. You did this for one of the week 2 homeworks. However, it is more challenging to write a `tail` function. Here is the intuitive idea behind one way we could do this. Our "starting value" --- what gets bound to `z` in the above lambda expression --- will be *a pair* of two values. I'll write it as `([], err)` for the moment, while we fix our intuitions, rather than using the more verbose Lambda Calculus representation of pairs and `[]`. The `err` will be whatever we decide should be the `tail` of an empty list. Perhaps it should be `[]`, but I'll just leave it as `err` for this exercise. Now, when we combine the rightmost element of the list with this, by evaluating `f c ([], err)`, we want the result to be `([c], [])`. That is, we throw away the second member of the pair, copy the first member over into the second slot, and `cons` the `c` onto the first member in the first slot. At the next stage, the result will be `([b, c], [c])`. And at the final stage, the result will be `([a, b, c], [b, c])`. Now we just have to extract the second member of this pair, and that will be the tail of our list.
 136
 137 If you've followed that intuitive presentation, then here is how you can write it in the lambda evaluator:
 138
 139     let empty = \f z. z in                   ; as before
 140     let cons = \d ds. \f z. f d (ds f z) in  ; as before
 141     let pair = \x y. \f. f x y in
 142     let snd = \x y. y in
 143     let shift = \h p. p (\x y. pair (cons h x) x) in
 144     let tail = \xs. (xs shift (pair empty err)) snd in
 145     ...
 146
 147 Here `shift` is our fold-function, and takes two arguments, the current list element `h` and the pair `p` that we've built up from the starting value by folding over the more rightward portion of the list, if any. The `shift` function binds the two members of the pair to `x` and `y`, disregarding the second. It returns a new pair whose first member is `cons h x` and whose second member is `x`. Our starting value is `pair empty err`. And at the end of our fold we're left with a pair, and want to extract its second member; that's why `tail` is of the form `\xs. (xs shift ...) snd`.
 148
 149 Try it out in the lambda evaluator. After the code above, you can write:
 150
 151     ...
 152     let abc = cons a (cons b (cons c empty)) in  ; encoding of [a, b, c]
 153     tail abc
 154
 155 and the result will be `\f z. f b (f c z)`, our encoding of `[b, c]`.
 156
 157
 158 <a id=v3-lists></a>
 159 A possible refinement on our right-fold encoding of lists would be to keep track of the tails as the lists are constructed, and feed them to the folding functions `f` as an additional argument. We've been encoding `[a, b, c]` like this:
 160
 161     \f z. f a (f b (f c z))
 162
 163 or in other words:
 164
 165     \f z. f a MORE
 166
 167 where `MORE` is the result of folding `f` and `z` over `[a, b, c]`'s tail. Instead, we might encode `[a, b, c]` like this:
 168
 169     \g z. g a [b,c] (g b [c] (g c [] z))
 170
 171 or in other words:
 172
 173     \g z. g a [b,c] MORE
 174
 175 The difference is that this new encoding uses a fold function `g` expecting *three* arguments, and the encoding not only passes the current head as a first argument to that function (as before), but *also* passes the current tail as a new middle argument to that function. As before, we continue to pass the result of the fold *applied to* the tail as the last argument.
 176
 177 With this encoding scheme, each list would be represented by a somewhat more complex function than before. On the other hand, it would now become as easy to query the list's tail as it is to query its head. Before the refinement, querying the tail of a list required us to *build up* the tail afresh each time we wanted to extract it. Whereas this refinement *saves a copy* of the tail *for direct access*, as a new middle argument to the fold function, when the list is first constructed.
 178
 179 (Oleg discusses making a parallel refinement in the encoding of numbers [here](http://okmij.org/ftp/Computation/lambda-calc.html#p-numerals).)
 180
 181
 182 <a id=other-lists></a>
 183 ## Other Encoding Strategies for Lists ##
 184
 185 If you're keeping track, we've now seen three different encoding strategies for lists: in terms of their right-folds, their left-folds, and the somewhat more complex right-fold encoding just discussed.
 186
 187 Let's consider one or two more, that take different, apparently simpler strategies, avoiding the use of folds in the encoding.
 188
 189 <a id=v4-lists></a>
 190 Isn't the simplest approach just to represent a non-empty list as an *ordered pair* of its head and its tail? In the first week, we mentioned that the fancier functional programming languages like OCaml and Haskell sharply distinguish between lists and tuples. Lists had to be type-homogenous, and their type was insensitive to their length. Tuples on the other hand could be (though needn't be) type-heterogenous, and their own types *were* determined by the number (and order) of the types of their elements. But we don't have types in the Lambda Calculus. (At least, we don't have a *variety* of types.) So maybe in this context we can identify lists with certain tuples, without getting in trouble.
 191
 192 (In fact, as we discuss elsewhere, this is how Scheme also implements its lists.)
 193
 194 We'll still need to choose some lambda term to encode the empty list. It's not straightaway obvious what that should be. What else will need for our lists? Well, let's think about what we're going to *do* with them. One thing we did with the lists again and again in the functional programming languages was *pattern-match* on them. What that amounted to was: we checked if the list was empty, and if so we took one program branch, and if not we bound some variables to the list's head and to its tail, and then took a second program branch. We already know how to take an ordered pair apart in that way. What we don't have yet is is a way to tell whether an arbitrary list is empty.
 195
 196 One idea would be to represent the non-empty lists as a *triple* instead of a pair, with their first element being the boolean `false`, announcing that they aren't empty, then their second element being their head, and their third element being their tail. Then the empty list could be a triple whose first element is `true`. It's not obvious yet what *its* second and third elements should be.
 197
 198 The conventional implementation of this idea does essentially this, only instead of triples, they use *pairs*, whose first element is the same boolean flag just described, and whose second element in the case of non-empty lists are *another pair* holding the list's head and its tail. So the list `[a, b, c]` would be encoded as:
 199
 200     (false, (a, [b,c]))
 201
 202 where `(x, y)` represents the Lambda Calculus encoding of an ordered tuple (we're already familiar with this, `\f. f x y`), and `[b,c]` represents the tail of `[a, b, c]`, encoded by this same scheme, namely:
 203
 204     (false, (b, [c]))
 205
 206 Here `[c]` represents the tail of `[b, c]` encoded by this same scheme, namely:
 207
 208     (false, (c, []))
 209
 210 Here `[]` represents the empty list, which we'll encode as:
 211
 212     (true, ???)
 213
 214 What should stand in for `???` No particular choice seems forced here. One strategy would be to go ahead and build your family of list operations, and see whether any particular choice for that slot in the empty list made some of the other operations easier to define or more elegant. Here's an example. We shouldn't expect the result of querying the head of `[]` to be meaningful; but what about querying its tail? Perhaps that should also be meaningless. Or you could make a case that `[]` is most naturally thought to be its own tail. If we went that way, it would be nice to let `???` in our encoding of `[]` be some value, such that, when we tried to extract the empty list's tail using the same operation that worked straightforwardly and naturally for non-empty lists, we got back the empty list itself. In fact it is possible to do this. (But it requires a fixed-point combinator, which we won't be discussing until next week.)
 215
 216 For the time being, let's not worry about what stands in for `???` in our encoding of `[]`. Let's get clear instead on what our other basic list operations should look like.
 217
 218 Building a list from element `x` and an existing list `xs` isn't hard: we just build a pair whose first element is `false` and whose second element is the pair `(x, xs)`:
 219
 220     cons ≡ \x xs. \f. f false (\g. g x xs)
 221
 222 Determining whether a list is empty is just a matter of extracting the first element of the outer pair:
 223
 224     empty? ≡ \xs. xs (\a b. a)
 225
 226 Given a non-empty list, extracting its head is just a matter of extracting the first element of the inner pair that is its second element:
 227
 228     head ≡ \xs. xs (\a b. b (\c d. c))
 229
 230 and so on.
 231
 232 Now think about how you'd define recursive operations like `length` or `map`, or any of the other operations we've recently been expressing as special cases of folds. It's not at all obvious how to do this. With a list like `[a, b, c]`, my implementation of the `length` function could see that the list isn't empty, and so the length has to be at least `1`, and it's easy enough to extract the tail `[b, c]` of this list. But then I am going to want to have the `length` function call itself recursively on that tail. And we don't know yet how to do that. We don't know how to get the effect of `letrec` inside the Lambda Calculus. We don't know yet how to have any lambda terms that refer to themselves.
 233
 234 This is in fact a formidable obstacle. The present encoding of lists makes some things easier than our (original) right-fold encoding of lists: it's easier to extract the tail, plus the whole system just seems simpler. But the compensating disadvantage is that we don't know how to perform recursive operations on the lists so encoded. At least, not until we work out a general strategy for expressing `letrec` in the Lambda Calculus. With the list encodings we looked at earlier, that "baked" the fold operation into the list's very construction, we didn't need any such general-purpose `letrec`. The natural recursive operations we wanted to perform on lists were already in our reach.
 235
 236 <a id=v5-lists></a>
 237 Let's consider one more encoding strategy for lists. This will have the same serious shortcoming as the simple encoding we just considered: we won't be able to do recursive operations with it until we have a general-purpose `letrec`. But in other respects it may be improvement on that encoding. That encoding might seem a bit ad hoc. Plus there's that matter of the `???` in our construction of `[]`, where we don't know what it should be. If we proceed a bit differently, it will be easier to see some systematic rationale for our choices.
 238
 239 We've already seen some **enumerations**. These are "data structures" that consist of a fixed, finite number of discrete values. Such as `true` and `false`. Sometimes enumerations are understood to have a meaningful intrinsic order, but that's not important for our purposes here.
 240
 241 We've already seen how to encode data structures like this. We encode `true` as:
 242
 243     \y n. y
 244
 245 The idea is that the data structure is represented as a function. We pass that function several arguments. One of the arguments, bound to `y`, will capture what we want if we've got one variant (the *true* variant) of the data structure, and the other argument, bound to `n`, will capture what we want if we've got the other variant (the *false* variant). Our encoding of `true` just chooses the argument `y` that's understood to be associated with it, and returns that one, discarding the `n`.
 246
 247 We could easily extend this strategy to data structures that have more than two variants. For instance, suppose I want a system of colors, where the choices are only *red*, *green*, or *blue*. I might then encode `red` in the Lambda Calculus as:
 248
 249     \r g b. r
 250
 251 The idea is that if some program has ahold of some as-yet-unknown rgb-color, it can pass that color three arguments, and if the color is `red`, the result will be the first of those passed arguments. Of course the order of which argument goes with *red*, which with *green*, and which with *blue*, is entirely conventional --- just as it was with the booleans. We only need to stay consistent.
 252
 253 Another idea we've seen is to have data structures that have *parameters*. They are a kind of container that can hold one (or more) other values. Lists are a more complex example of this than I want to focus on yet. The simpler examples are just our ordered pairs, and triples, and so on. An ordered triple `(a, b, c)` is encoded as:
 254
 255     \f. f a b c
 256
 257 The idea here is that if some program has ahold of some as-yet-unknown triple, it can pass that triple a single *function* argument `f`, and the triple will apply that function argument to its members.
 258
 259 This represents a second kind of building block for data structures. We can combine these two building blocks. We can have data structures that have one or more variants, each of which variant has 0 or more parameters.
 260
 261 The booleans represent the case of two variants, each having 0 parameters or members. The rgb-colors we just sketched give us a third variant, but still each variant has 0 parameters or members. The triples have only a single variant, but in this case each instance of that variant has 3 parameters. We can combine these building blocks in more complex ways. For instance, let's define a different kind of color data structure. This one will have four variants: *cyan*, *magenta*, *yellow*, and *gray*. Each of the first three variants takes 0 parameters. But the *gray* variant takes two parameters. (Suppose one is a natural number representing amount of brightness, with `0` being black, and the other is a natural number representing amount of glossiness. I realize these aren't the most realistic examples. Play along for pedagogy.) Then we'd encode `cyan` like this:
 262
 263     \c m y g. c
 264
 265 Similarly for `magenta` and `yellow`. We'd encode an instance of *gray* whose two parameters are `a` and `b` like this:
 266
 267     \c m y g. g a b
 268
 269 The idea here is that if some program has ahold of some as-yet-unknown cmyg-color, it can pass that color four arguments, the last of which must be a curried function accepting (at least) two arguments. In the case that the color is an instance of *gray*, it will apply that function argument to its brightness and its glossiness parameters.
 270
 271 I hope this all seems natural and systematic. When we get to discussing types, you'll see that the strategy deployed here has great generality. (You may even see the encoding strategy discussed above, in terms of pairs of pairs, as an approximate implementation of it.)
 272
 273 Now, what about lists? A list is basically just like our cmyg-colors, with just a few slight changes. First, with lists there is only one variant that has no parameters, namely `[]`. So that takes the place of one of *cyan*, *magenta*, and *yellow*, and we don't need the other two. Second, with lists the remaining variant also takes two parameters, but the first parameter needn't necessarily be a number. We haven't said what the heads of lists can be. Or rather, we haven't done anything to preclude you from consing *any* lambda term onto an existing list. Third, the second parameter of a non-empty list is also understood to not be a number, but rather to be another list --- perhaps another non-empty list, but eventually it would have to be the empty list. (Else what you've got is not understood to be an encoded list at all, but rather just some possibly garbled lambda term that only resembles a list. Consider Scheme's notion of "improper lists," that we discuss elsewhere.)
 274
 275 That last modification would be like, if in our encoding of cmyg-colors, the second parameter of the gray variant weren't a number, but rather another cmyg-color. Perhaps people sometimes paint multiple layers of gray paint (of possibly varying brightness) on top of already-painted walls. The base coat is always cyan, magenta, or yellow. On top of that is some finite number of different coats of gray. Each layer of gray paint keeps track of its own brightness, and what layer of paint lies directly beneath it. That would be a structure more like what we have with lists.
 276
 277 Here are some proposed list definitions based on these ideas:
 278
 279     []     ≡ \f n. n
 280     [b]    ≡ \f n. f b []   ≡ \f n. f b (\f n. n)
 281     [a, b] ≡ \f n. f a [b]  ≡ \f n. f a (\f n. f b (\f n. n))
 282     cons   ≡ \x xs. \f n. f x xs
 283     empty? ≡ \xs. xs (\y ys. false) true
 284     tail   ≡ \xs. xs (\y ys. ys) err
 285
 286 Or perhaps we should make `tail` return `[]` when applied to `[]`, rather than `err`.
 287
 288 As we said, with this encoding system too, we'd still need some general strategy for expressing `letrec`, before we'd be able to define functions like `length` and `map` and so on.