topics/week3_lists.mdwn

   1 # More on Lists #
   2
   3 ## Comprehensions ##
   4
   5 We know you are already familiar with the following kind of notation for designating sets:
   6
   7 { <em>x</em> + 1 | <em>x</em> &isin; Primes and &phi;<em>x</em> }
   8
   9 This kind of notation is called a **set comprehension.**
  10 Here Primes is assumed to be some, presumably larger set. &phi; expresses some condition that members of Primes might conceivably fail to satisfy.
  11
  12
  13 Some of the functional programming languages permit you to specify data structures in this same way. Kapulet comes closest, in that it also has set comprehension notation. In Kaupulet one writes:
  14
  15 <code>{ x + 1 | x from Primes, &phi; x }</code>
  16
  17 The changes are only that we write `x from Primes`, with `Primes` being an expression that evaluates to a set, and we separate that clause from the test clause with a comma. <code>&phi; x</code> can be any expression that evaluates to a boolean. Moreover, such clauses can come in any order, and there can be any number of them, though the above is the most useful pattern. But you can also write:
  18
  19     { 1 | 'true }
  20
  21 which evaluates to `{ 1 }`, or:
  22
  23     { 1 | 'false }
  24
  25 which evaluates to the empty set `{ }`. What if you have multiple `from` clauses? This is possible, and iterates over the *cross-product* of the two sets you're drawing from. So:
  26
  27     { 10*x + y | x from {1, 2, 3}, y from {4, 5} }
  28
  29 evaluates to the set `{ 14, 15, 24, 25, 34, 35 }`.
  30
  31 Haskell doesn't have set literals like Kapulet does, but it also allows this kind of notation with lists, that is, it has **list comprehensions**. (And so does Kapulet.) Thus in Haskell you can write:
  32
  33     [ 10*x + y | x <- [1, 2, 3], y <- [4, 5] ]
  34
  35 and that evaluates to `[14, 15, 24, 25, 34, 35]`. Notice that Haskell's syntax differs slightly. Changing the order of the `from`/`<-` clauses changes the order in which the elements will be added to the result list:
  36
  37     [ 10*x + y | y <- [4, 5], x <- [1, 2, 3] ]
  38
  39 That evaluates to `[14, 24, 34, 15, 25, 35]`.
  40
  41 You can also mix in test clauses:
  42
  43     [ 10*x + y | y <- [4, 5], odd y, x <- [1, 2, 3] ]
  44
  45 evaluates to `[15, 25, 35]`.
  46
  47 Haskell also has an extension that permits you to iterate over multiple lists *in parallel* rather than to iterate over their cross-product. If you type `:set -XParallelListComp` in the ghci interpreter, that will enable this extension, and then:
  48
  49     [ 10*x + y | y <- [4, 5, 6] | x <- [1, 2, 3] ]
  50
  51 will evaluate to `[14, 25, 36]`. If the lists are of unequal length, Haskell stops when it exhausts the first. These behaviors are similar to the `map2` function you defined in the week 1 homework. That also took an argument from each of several sequences in parallel. (The corresponding functions in Haskell are called `zip` and `zipWith`.)
  52
  53 OCaml [permits lists comprehensions as an extension](http://stackoverflow.com/questions/27652428/list-comprehension-in-ocaml), and [so too does Scheme](http://srfi.schemers.org/srfi-42/srfi-42.html), but these are a bit harder to use.
  54
  55 All of these things can also be expressed in these languages without using the comprehension syntax. For example, this list comprehension (in Kapulet syntax):
  56
  57     [ 10*x | x from [1, 2, 3, 4, 5] ]
  58
  59 can be expressed as:
  60
  61     map (lambda x. 10*x) [1, 2, 3, 4, 5]
  62
  63 and this:
  64
  65     [ 10*x | x from [1, 2, 3, 4, 5], odd? x ]
  66
  67 can be expressed as:
  68
  69     map (lambda x. 10*x) $ filter odd? [1, 2, 3, 4, 5]
  70
  71 (We explained the `$` notation in [[week 1's advanced notes|week1_kapulet_advanced/#dollar]]. This is equivalent to `map (lambda x. 10*x) (filter odd? [1, 2, 3, 4, 5])`.)
  72
  73 Iterating over the cross-product of several lists is a bit harder. Consider:
  74
  75     [ 10*x + y | y from [4, 5, 6], y < 6, x from [1, 2, 3] ]
  76
  77 To translate that, first let's handle the iteration over the final list, that `x` is drawn from:
  78
  79     map (lambda x. 10*x + y) [1, 2, 3]
  80
  81 This looks like what we had before, except that now we have this free variable `y` in our lambda expression. Perhaps we can bind that variable inside a *larger* lambda expression, and then map (and filter) *that* larger lambda expression over the list that `y` is drawn from:
  82
  83     map (lambda y. map (lambda x. 10*x + y) [1, 2, 3]) $ filter (lambda y. y < 6) [4, 5, 6]
  84
  85 This gives us nearly what we want. It evaluates to:
  86
  87     [[14, 24, 34], [15, 25, 35]]
  88
  89 Why? Because the `filter` expression at the end is restricting the domain that `y` ranges over to `[4, 5]`. Over this domain we are selecting a value to bind `y` to, and then evaluating the inner `map` expression with `y` so bound. With `y` bound to `4`, we get the result `[14, 24, 34]`. With `y` bound to `5`, we get the result `[15, 25, 35]`. These two results, in order, are the elements that make up the sequence which is the result of the outermost `map` expression.
  90
  91 One final twist is that our original list comprehension gives us a "flatter" result. In both Kapulet (and Haskell, modulo a few syntax adjustments), the list comprehension:
  92
  93     [ 10*x + y | y from [4, 5, 6], y < 6, x from [1, 2, 3] ]
  94
  95 evaluates to:
  96
  97     [14, 24, 34, 15, 25, 35]
  98
  99 We can turn the preceding result into this result with the Kapulet function `join` (Haskell calls it `concat` or `Control.Monad.join`):
 100
 101     join [[14, 24, 34], [15, 25, 35]]
 102
 103 evaluates to the "flatter" list displayed above.
 104
 105 By the way, this `join` operation only affects a single layer of `[ ]`s. This:
 106
 107     join [ [[10,20], [30], []],  [[40], [50,60]] ]
 108
 109 evaluates to:
 110
 111     [[10,20], [30], [], [40], [50,60]]
 112
 113 not to:
 114
 115     [10, 20, 30, 40, 50, 60]
 116
 117 To get the latter, you'd need to apply `join` twice.
 118
 119
 120 ## Tails ##
 121
 122 For the Lambda Calculus, we've proposed to encode lists in terms of higher-order functions that perform right-folds on (what we intuitively regard as) the real list. Thus, the list we'd write in Kapulet or Haskell as:
 123
 124     [a, b, c]
 125
 126 for some expressions `a`, `b`, and `c`, would be encoded in the Lambda Calculus as:
 127
 128     \f z. f a (f b (f c z))
 129
 130 With that choice of encoding, it's not difficult to write a `head` function. You did this for one of the week 2 homeworks. However, it is more challenging to write a `tail` function. Here is the intuitive idea behind one way we could do this. Our "starting value" --- what gets bound to `z` in the above lambda expression --- will be *a pair* of two values. I'll write it as `([], err)` for the moment, while we fix our intuitions, rather than using the more verbose Lambda Calculus representation of pairs and `[]`. The `err` will be whatever we decide should be the `tail` of an empty list. Perhaps it should be `[]`, but I'll just leave it as `err` for this exercise. Now, when we combine the rightmost element of the list with this, by evaluating `f c ([], err)`, we want the result to be `([c], [])`. That is, we throw away the second member of the pair, copy the first member over into the second slot, and `cons` the `c` onto the first member in the first slot. At the next stage, the result will be `([b, c], [c])`. And at the final stage, the result will be `([a, b, c], [b, c])`. Now we just have to extract the second member of this pair, and that will be the tail of our list.
 131
 132 If you've followed that intuitive presentation, then here is how you can write it in the lambda evaluator:
 133
 134     let empty = \f z. z in                   ; as before
 135     let cons = \d ds. \f z. f d (ds f z) in  ; as before
 136     let pair = \x y. \f. f x y in
 137     let snd = \x y. y in
 138     let shift = \h p. p (\x y. pair (cons h x) x) in
 139     let tail = \xs. (xs shift (pair empty err)) snd in
 140     ...
 141
 142 Here `shift` is our fold-function, and takes two arguments, the current list element `h` and the pair `p` that we've built up from the starting value by folding over the more rightward portion of the list, if any. The `shift` function binds the two members of the pair to `x` and `y`, disregarding the second. It returns a new pair whose first member is `cons h x` and whose second member is `x`. Our starting value is `pair empty err`. And at the end of our fold we're left with a pair, and want to extract its second member; that's why `tail` is of the form `\xs. (xs shift ...) snd`.
 143
 144 Try it out in the lambda evaluator. After the code above, you can write:
 145
 146     ...
 147     let abc = cons a (cons b (cons c empty)) in  ; encoding of [a, b, c]
 148     tail abc
 149
 150 and the result will be `\f z. f b (f c z)`, our encoding of `[b, c]`.
 151