topics/week3_lists.mdwn

   1 # More on Lists #
   2
   3 <a id=comprehensions></a>
   4 ## Comprehensions ##
   5
   6 We know you are already familiar with the following kind of notation for designating sets:
   7
   8 { <em>x</em> + 1 | <em>x</em> &isin; Primes and &phi;<em>x</em> }
   9
  10 This kind of notation is called a **set comprehension.**
  11 Here Primes is assumed to be some set, presumably larger than the one being defined. &phi; expresses some condition that members of Primes might fail to satisfy.
  12
  13
  14 Some of the functional programming languages permit you to specify data structures in this same way. Kapulet comes closest, in that it also has set comprehension notation. In Kapulet one writes:
  15
  16 <code>{ x + 1 | x from Primes, &phi; x }</code>
  17
  18 The changes are only that we write `x from Primes`, with `Primes` being an expression that evaluates to a set, and we separate that clause from the test clause with a comma. <code>&phi; x</code> can be any expression that evaluates to a boolean. Moreover, such clauses can come in any order, and there can be any number of them, though the above is the most useful pattern. But you can also write:
  19
  20     { 1 | 'true }
  21
  22 which evaluates to `{ 1 }`, or:
  23
  24     { 1 | 'false }
  25
  26 which evaluates to the empty set `{ }`. What if you have multiple `from` clauses? This is possible, and the comprehension then iterates over the *cross-product* of the sets you're drawing from. So:
  27
  28     { 10*x + y | x from {1, 2, 3}, y from {4, 5} }
  29
  30 evaluates to the set `{ 14, 15, 24, 25, 34, 35 }`.
  31
  32 Haskell doesn't have set literals like Kapulet does, but it does allow this kind of notation with lists; that is, it has **list comprehensions**. (And so does Kapulet.) Thus in Haskell you can write:
  33
  34     [ 10*x + y | x <- [1, 2, 3], y <- [4, 5] ]
  35
  36 and that evaluates to `[14, 15, 24, 25, 34, 35]`. Notice that Haskell's syntax differs slightly. Changing the order of the `from`/`<-` clauses changes the order in which the elements will be added to the result list:
  37
  38     [ 10*x + y | y <- [4, 5], x <- [1, 2, 3] ]
  39
  40 That evaluates to `[14, 24, 34, 15, 25, 35]`.
  41
  42 You can also mix in test clauses:
  43
  44     [ 10*x + y | y <- [4, 5], odd y, x <- [1, 2, 3] ]
  45
  46 evaluates to `[15, 25, 35]`.
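
As a small sketch of how test clauses behave in Haskell, the definitions below mirror the earlier Kapulet examples `{ 1 | 'true }` and `{ 1 | 'false }`, using lists in place of sets. The names `guardTrue`, `guardFalse`, `oddOnly`, and `oddOnlyLate` are ours, chosen only for illustration.

```haskell
-- A comprehension may consist of a test clause alone.
guardTrue :: [Int]
guardTrue = [ 1 | True ]     -- the test passes, so the one element is kept

guardFalse :: [Int]
guardFalse = [ 1 | False ]   -- the test fails, so the result is empty

-- The example from the text: the test sits between the two generators.
oddOnly :: [Int]
oddOnly = [ 10*x + y | y <- [4, 5], odd y, x <- [1, 2, 3] ]

-- Moving the test after both generators selects the same elements here,
-- although y = 4 is now paired with every x before being rejected.
oddOnlyLate :: [Int]
oddOnlyLate = [ 10*x + y | y <- [4, 5], x <- [1, 2, 3], odd y ]
```

Loading these into ghci and evaluating each name should give `[1]`, `[]`, `[15,25,35]`, and `[15,25,35]` respectively.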
  47
  48 Haskell also has an extension that permits you to iterate over multiple lists *in parallel* rather than to iterate over their cross-product. If you type `:set -XParallelListComp` in the ghci interpreter, that will enable this extension, and then:
  49
  50     [ 10*x + y | y <- [4, 5, 6] | x <- [1, 2, 3] ]
  51
  52 will evaluate to `[14, 25, 36]`. If the lists are of unequal length, Haskell stops when it exhausts the shortest. These behaviors are similar to the `map2` function you defined in the week 1 homework. That also took an argument from each of several sequences in parallel. (The corresponding functions in Haskell are called `zip` and `zipWith`.)
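
The parallel iteration just described can also be sketched with `zipWith`, which needs no language extension. The names `parallelSum`, `truncated`, and `paired` are illustrative only.

```haskell
-- The parallel comprehension from the text, written with zipWith:
-- the first list supplies y, the second supplies x.
parallelSum :: [Int]
parallelSum = zipWith (\y x -> 10*x + y) [4, 5, 6] [1, 2, 3]

-- With lists of unequal length, zipWith stops at the end of the shorter one.
truncated :: [Int]
truncated = zipWith (\y x -> 10*x + y) [4, 5, 6, 7] [1, 2]

-- zip pairs the elements up without combining them.
paired :: [(Int, Int)]
paired = zip [4, 5, 6] [1, 2, 3]
```

Here `parallelSum` is `[14,25,36]`, `truncated` is `[14,25]`, and `paired` is `[(4,1),(5,2),(6,3)]`.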
  53
  54 OCaml [permits list comprehensions as an extension](http://stackoverflow.com/questions/27652428/list-comprehension-in-ocaml), and [so too does Scheme](http://srfi.schemers.org/srfi-42/srfi-42.html), but these are a bit harder to use.
  55
  56 All of these things can also be expressed in these languages without using the comprehension syntax. For example, this list comprehension (in Kapulet syntax):
  57
  58     [ 10*x | x from [1, 2, 3, 4, 5] ]
  59
  60 can be expressed as:
  61
  62     map (lambda x. 10*x) [1, 2, 3, 4, 5]
  63
  64 and this:
  65
  66     [ 10*x | x from [1, 2, 3, 4, 5], odd? x ]
  67
  68 can be expressed as:
  69
  70     map (lambda x. 10*x) $ filter odd? [1, 2, 3, 4, 5]
  71
  72 (We explained the `$` notation in [[week 1's advanced notes|week1_kapulet_advanced/#dollar]]. This is equivalent to `map (lambda x. 10*x) (filter odd? [1, 2, 3, 4, 5])`.)
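
For comparison, here is a sketch of the same two formulations in Haskell, where the test is spelled `odd` rather than `odd?` and lambdas are written `\x -> ...`. The names `filteredComprehension` and `filteredMap` are ours.

```haskell
-- The comprehension form.
filteredComprehension :: [Int]
filteredComprehension = [ 10*x | x <- [1, 2, 3, 4, 5], odd x ]

-- The same list built by filtering first and then mapping.
filteredMap :: [Int]
filteredMap = map (\x -> 10*x) $ filter odd [1, 2, 3, 4, 5]
```

Both evaluate to `[10,30,50]`.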
  73
  74 Iterating over the cross-product of several lists is a bit harder. Consider:
  75
  76     [ 10*x + y | y from [4, 5, 6], y < 6, x from [1, 2, 3] ]
  77
  78 To translate that, first let's handle the iteration over the final list, that `x` is drawn from:
  79
  80     map (lambda x. 10*x + y) [1, 2, 3]
  81
  82 This looks like what we had before, except that now we have this free variable `y` in our lambda expression. Perhaps we can bind that variable inside a *larger* lambda expression, and then map (and filter) *that* larger lambda expression over the list that `y` is drawn from:
  83
  84     let
  85       f match lambda y. map (lambda x. 10*x + y) [1, 2, 3]
  86     in map f $ filter (lambda y. y < 6) [4, 5, 6]
  87
  88 This gives us nearly what we want. It evaluates to:
  89
  90     [[14, 24, 34], [15, 25, 35]]
  91
  92 Why? Because the `filter` expression at the end is restricting the domain that `y` ranges over to `[4, 5]`. Over this domain we are selecting a value to bind `y` to, and then evaluating the `map` expression inside `f` with `y` so bound. With `y` bound to `4`, we get the result `[14, 24, 34]`. With `y` bound to `5`, we get the result `[15, 25, 35]`. These two results, in order, are the elements that make up the sequence which is the result of the outermost `map` expression.
  93
  94 One final twist is that our original list comprehension gives us a "flatter" result. In both Kapulet and Haskell (modulo a few syntax adjustments), the list comprehension:
  95
  96     [ 10*x + y | y from [4, 5, 6], y < 6, x from [1, 2, 3] ]
  97
  98 evaluates to:
  99
 100     [14, 24, 34, 15, 25, 35]
 101
 102 We can turn the preceding result into this result with the Kapulet function `join` (Haskell calls it `concat` or `Control.Monad.join`):
 103
 104     join [[14, 24, 34], [15, 25, 35]]
 105
 106 evaluates to the "flatter" list displayed above.
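
Putting the pieces together, the whole translation might be sketched in Haskell as follows, with `concat` playing the role of Kapulet's `join`. The names `nested`, `flattened`, `viaConcatMap`, and `crossComprehension` are illustrative; `concatMap` is a Prelude function that fuses the map and the flattening into one step.

```haskell
-- The intermediate, nested result: one inner list per surviving y.
nested :: [[Int]]
nested = map f $ filter (\y -> y < 6) [4, 5, 6]
  where f y = map (\x -> 10*x + y) [1, 2, 3]

-- Removing one layer of brackets gives the comprehension's result.
flattened :: [Int]
flattened = concat nested

-- The same computation with the map and the flattening combined.
viaConcatMap :: [Int]
viaConcatMap =
  concatMap (\y -> map (\x -> 10*x + y) [1, 2, 3]) (filter (< 6) [4, 5, 6])

-- The original comprehension, for comparison.
crossComprehension :: [Int]
crossComprehension = [ 10*x + y | y <- [4, 5, 6], y < 6, x <- [1, 2, 3] ]
```

Here `nested` is `[[14,24,34],[15,25,35]]`, and the other three all evaluate to `[14,24,34,15,25,35]`.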
 107
 108 By the way, this `join` operation only affects a single layer of `[ ]`s. This:
 109
 110     join [ [[10,20], [30], []],  [[40], [50,60]] ]
 111
 112 evaluates to:
 113
 114     [[10,20], [30], [], [40], [50,60]]
 115
 116 not to:
 117
 118     [10, 20, 30, 40, 50, 60]
 119
 120 To get the latter, you'd need to apply `join` twice.
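
In Haskell, with `concat` standing in for `join`, the one-layer behavior can be checked with a sketch like this (the names `deep`, `oneLayer`, and `twoLayers` are ours):

```haskell
-- A list of lists of lists.
deep :: [[[Int]]]
deep = [ [[10, 20], [30], []], [[40], [50, 60]] ]

-- One application removes only the outermost layer of nesting.
oneLayer :: [[Int]]
oneLayer = concat deep

-- A second application removes the next layer.
twoLayers :: [Int]
twoLayers = concat (concat deep)
```

Here `oneLayer` is `[[10,20],[30],[],[40],[50,60]]` and `twoLayers` is `[10,20,30,40,50,60]`.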
 121
 122
 123
 124 <a id=tails></a>
 125 ## Tails ##
 126
 127 For the Lambda Calculus, we've proposed to encode lists in terms of higher-order functions that perform right-folds on (what we intuitively regard as) the real list. Thus, the list we'd write in Kapulet or Haskell as:
 128
 129     [a, b, c]
 130
 131 for some expressions `a`, `b`, and `c`, would be encoded in the Lambda Calculus as:
 132
 133     \f z. f a (f b (f c z))
 134
 135 With that choice of encoding, it's not difficult to write a `head` function. You did this for one of the week 2 homeworks. However, it is more challenging to write a `tail` function. Here is the intuitive idea behind one way we could do this. Our "starting value" --- what gets bound to `z` in the above lambda expression --- will be *a pair* of two values. I'll write it as `([], err)` for the moment, while we fix our intuitions, rather than using the more verbose Lambda Calculus representation of pairs and `[]`. The `err` will be whatever we decide should be the `tail` of an empty list. Perhaps it should be `[]`, but I'll just leave it as `err` for this exercise. Now, when we combine the rightmost element of the list with this, by evaluating `f c ([], err)`, we want the result to be `([c], [])`. That is, we throw away the second member of the pair, copy the first member over into the second slot, and `cons` the `c` onto the first member in the first slot. At the next stage, the result will be `([b, c], [c])`. And at the final stage, the result will be `([a, b, c], [b, c])`. Now we just have to extract the second member of this pair, and that will be the tail of our list.
 136
 137 If you've followed that intuitive presentation, then here is how you can write it in the lambda evaluator:
 138
 139     let empty = \f z. z in                   ; as before
 140     let cons = \d ds. \f z. f d (ds f z) in  ; as before
 141     let pair = \x y. \f. f x y in
 142     let snd = \x y. y in
 143     let shift = \h p. p (\x y. pair (cons h x) x) in
 144     let tail = \xs. (xs shift (pair empty err)) snd in
 145     ...
 146
 147 Here `shift` is our fold-function, and takes two arguments, the current list element `h` and the pair `p` that we've built up from the starting value by folding over the more rightward portion of the list, if any. The `shift` function binds the two members of the pair to `x` and `y`, disregarding the second. It returns a new pair whose first member is `cons h x` and whose second member is `x`. Our starting value is `pair empty err`. And at the end of our fold we're left with a pair, and want to extract its second member; that's why `tail` is of the form `\xs. (xs shift ...) snd`.
 148
 149 Try it out in the lambda evaluator. After the code above, you can write:
 150
 151     ...
 152     let abc = cons a (cons b (cons c empty)) in  ; encoding of [a, b, c]
 153     tail abc
 154
 155 and the result will be `\f z. f b (f c z)`, our encoding of `[b, c]`.
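
The same strategy can be sketched in Haskell using ordinary lists, built-in pairs, and `foldr` in place of the Lambda Calculus encodings. This is only an illustration of the idea, not a replacement for the encoding above. It assumes that `err` is `[]`, that is, that the tail of the empty list is the empty list; the name `tailViaFold` is ours.

```haskell
-- One step of the fold: discard the old second member, copy the old first
-- member into the second slot, and cons h onto the first member.
shift :: a -> ([a], [a]) -> ([a], [a])
shift h (x, _) = (h : x, x)

-- Fold from the right starting at ([], err), then extract the second member.
-- Assumption: err is [], so the tail of an empty list is the empty list.
tailViaFold :: [a] -> [a]
tailViaFold xs = snd (foldr shift ([], []) xs)
```

For example, `tailViaFold "abc"` evaluates to `"bc"`, passing through the intermediate pairs `("c","")`, `("bc","c")`, and `("abc","bc")`.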
 156
 157
 158 <a id=v2-lists></a>