edits

[lambda.git] / lists_and_numbers.mdwn
diff --git a/lists_and_numbers.mdwn b/lists_and_numbers.mdwn

index 01dc76c..571ed7b 100644 (file)
--- a/lists_and_numbers.mdwn
+++ b/lists_and_numbers.mdwn
@@ -1,19 +1,71 @@
-(list?)
-nil
-cons
-nil?, (pair?)
-head
-tail
-
-Chris's lists:
-       nil = (t,N)  = \f. f true N
-       [a] = (f,(a,nil))
-       [b,a] = (f,(b,[a]))
-
-isnil = get-first
-head = L get-second get-first
-tail = L get-second get-second
+[[!toc]]
  
+Building Lists
+==============
+
+To build a data-structure, you begin by deciding what the data-structure needs to do. When we built booleans, what they needed to do was select between two choices. When we built ordered pairs, what we needed was a way to wrap two elements into the pair, and ways to operate on the wrapped elements, especially a way to extract a specified one of them.
+
+Now we're going to try to build lists. First, let's explain what is the difference between a list and a pair.
+
+A list can have two elements, but it can also have more elements, or fewer. A list can even have zero elements: this is called the empty list. Sometimes this is written `nil`. In Scheme it's also written `'()` and `(list)`, and in OCaml it's written `[]`. Those languages are nice and have list structures pre-built into them. But we're going to build lists ourselves, from scratch.
+
+OK, so a list doesn't have to have two elements, but still, what's the difference between a two-element list and a pair? And the difference between a three-element list and a triple?
+
+The difference has to do with types. In the untyped lambda calculus we don't explicitly refer to or manipulate types, however even here we'll need to pay attention to which types of arguments we're giving to which functions. For instance, if you wrote:
+
+       and get-third-of-triple make-pair
+
+This would evaluate to something, but it's not immediately obvious what it would be, and it's not likely to be especially useful. Even in the untyped lambda calculus, if we want computations that are easy to work with and reason about, we're going to want to pay some attention to the types we're treating formulas to have. For example, we're only going to want to pass formulas that represent boolean values as arguments to `and`. The computation may well terminate even when we don't. But the result won't be one we're in a position to make any good use of.
+
+Moreover, lists and pairs (and triples, and so on), are data structures we'll find in many typed languages as well. Thinking about types is the way to understand what makes these data structures different.
+
+The differences are:
+
+*      A list is type-homegeneous, that is, all its elements must be (or be treatable as being) of the same type.
+
+*      Not so a pair: its elements need not be of the same type.
+
+We regard two pairs as being of the same type when their corresponding members are of the same type.
+
+Some programming languages permit type-heterogenous lists. Some imperative languages further permit a kind of *mutable* list. We'll consider such things later. For now, we regard these as frills. What we're discussing here is just the prototypical, meat-and-potatoes list.
+
+Another difference between lists and pairs:
+
+*      The length of a list is not essential to its type. A two-element list can be of the same type as a three-element list (whose members are of the right type).
+
+*      Not so a pair: no pair is of the same type as any triple, no matter what the types of their elements.
+
+Q: Sometimes mathematicians <em>identify</em> the triple (1,2,3) with the pair (1,(2,3)), whose first member is 1 and whose second member is the pair (2,3). Wouldn't then a triple have the same type as pair, namely the pair it's identical to?
+
+A: This is not an identity but an implementation. The claim that the triple (1,2,3) is not the same type as any pair is akin to the claim that the integer 1 is not the same type as any set. It allows you to go on and build set-theoretic constructions whose structure matches the desired behavior of the integers, and so is a reasonable implementation of them. This is exactly what we did when building higher-order functions to implement pairs, in the first place.
+
+It is true, there are interesting and difficult questions here in the philosophical foundations of mathematics. But I hope we can proceed nonetheless.
+
+
+Version 1
+---------
+
+OK, then, what sort of behavior do we want out of lists?
+
+Well we need something to serve as the empty list. And we need a way to take an arbitrary object `hd`, and a list (possibly the empty list) `tl`, and return the list whose **head** is `hd` and whose **tail** is `tl`. And we need a way to tell, of an arbitrary list, whether it's the empty list. And we need a way to extract, from a non-empty list, that list's head and its tail. That's it. Given these basic operations, we should be able to do whatever else we want to do with lists: count their length, reverse them, or whatever. We'll explore how to do those more complex operations in the assignment.
+
+It's natural to help ourselves to pairs as bricks to use in constructing lists. The way we described building a new list out of a head `hd` and a tail `tl`, for instance, sounds a lot like building a new pair `(hd, tl)`. But we also have to think about what the empty list will be---and more importantly, how to tell the empty list apart from other lists.
+
+We'll work through a couple of different ways to do this. They'll get more principled as we proceed.
+
+First, one way to handle the issue of "Is this the empty list or not?" is to have every list contain a boolean flag set to `true` if it's the empty list, and `false` if it isn't. Perhaps the lists could be, not pairs, but triples. The conventional implementation of this idea makes them instead a pair whose first member is the boolean flag, and whose second member, when the list is non-empty, is the list's head and tail. So a (non-empty) list whose head is `hd` and whose tail is `tl` would be represented by:
+
+       (false, (hd, tl))
+
+What about the empty list? Well we know it will be:
+
+       (true, N)
+
+for some `N`. What should stand in for `N`?
+
+No particular choice seems forced here. One strategy would be to go ahead and build your family of list operations, and see whether any particular choice for `N` made some of the other operations easier to define, or more elegant. Heres's an example. We shouldn't expect the result of extracting the head of the empty list to be meaningful. But what about the result of extracting its tail? You could argue that this operation should also be meaningless. Or you could argue that the empty list should be its own tail. If we went the latter way, it would be nice to let `N` in our construction of the empty list be some value, such that, when we tried to extract the empty list's tail in the same way we try to extract other lists' tails, we got back the empty list itself. And in fact it's possible to do this. (However, it requires a fixed-point combinator, which we won't discuss until next week.)
+
+<!--
  would be nice if nil tail = nil
         nil tail = (t, N) get-second get-second = N get-second
         so N get-second should be (t,N)
@@ -33,48 +85,278 @@ would be nice if nil tail = nil
                 (\n. K (t, n)) (Y (\n. K (t,n))) get-second ~~>
                 K (t, Y (\n. K (t,n))) get-second ~~>
                 (t, Y (\n. K (t,n)))
+-->
+
+For the time being, though, let's not worry about what stands in for `N` in our construction of the empty list. What should our other primitive list operations look like?
+
+Well, building a list from a new element `hd` and an existing list `tl` isn't hard: we just build a pair whose first value is false and whose second value is a pair of `hd` and `tl`:
+
+       make-pair false (make-pair hd tl)
+
+Determining whether a list is the empty list is just a matter of extracting the first element of the (outer) pair:
+
+       some-list get-first
+
+Given a non-empty list, extracting its head is just a matter of extracting the pair that is its second element, and extracting the first element of that pair:
+
+       some-non-empty-list get-second get-first
+
+and so on.
+
+
+Version 2
+---------
+
+OK, version 1 works. But it might look ad hoc. Plus there's that matter of the `N` in the construction of the empty list that we don't know what should be.
+
+If we do things just a bit differently, it will be easier to see some systematic rationale for them.
+
+We've already seen some **enumerations**, These are data-structures that consist of several discrete values: such as true and false, or black and white, or red and green and blue. (Sometimes these groups are understood to have an order, but that's not important for our purposes.)
+
+We've already seen how to build up data structures like this. For instance, red could be:
+
+       \rd gn bl. rd
+
+that is a function that waits to be supplied with three choices: the "if-you-are-red" choice, the "if-you-are-blue" choice, and the "if-you-are-green" choice, and then yields the "if-you-are-red" choice.
+
+Now what if we wanted one of the enumerated possibilities to be associated with some further parameter. For instance, we wanted a structure that represented things as being (just) blue, or (just) green, or as *having some specific degree* of redness. How could we do that?
+
+We might do it by adopting the convention that the "if-you-are-red" selection be not just an arbitrary value, but specifically a function that expected to be supplied with the degree of redness we're dealing with.
+
+In other words, given an member `colored` of this new data structure, we'd use it like this:
+
+       colored (\deg. colored-is-red-to-degree-deg)  colored-is-instead-green colored-is-instead-blue
+
+and then if `colored` had the blue value from our data structure, this would evaluate to `colored-is-instead-blue`. If `colored` had a red-to-degree-deg value, it would evaluate to the result of supplying the relevant degree `deg` to the function `(\deg. colored-is-red-to-degree-deg)`.
+
+To build a value of red-to-degree-deg, we'd replace our original:
+
+       \rd gn bl. rd
+
+with:
+
+       \rd gn bl. rd deg
+
+Understand?
+
+If so, then you should be able to understand the underlying rationale of a list. We've just considered a data-structure that models an exclusive choice from among being-red-to-a-given-degree, just being green, or just being blue. Eliminate the blue choice. And let's associate a second parameter with being red, so that we have being-red-to-a-given-degree-and-illuminated-to-a-different-degree. Our red value would then look like this:
+
+       \rd gn. rd deg illum
+
+And our green value would look like this:
+
+       \rd gn. gn
+
+Why are we talking about this? Can you anticipate?
+
+Answer: For "green", substitute "empty list". For "red", substitute "non-empty list". For "degree-of-redness", substitute "head of the non-empty list." for "degree-of-illumination," substitute "tail of the non-eempty list." And voil&agrave;!
+
+Spelling it out explicitly, we say:
+
+>      **nil** is defined to be `\if-non-empty if-empty. if-empty`
+
+>      **make-list** is defined to be `\hd tl. \if-non-empty if-empty. if-non-empty hd tl`
+
+Defining **isnil** and the head- and tail- extractors takes some more thought. When operating with any list implemented as we're proposing, we have to pass the list an "if-you're-non-empty" handler and a "if-you're-empty handler." If the list is non-empty, this will evaluate to the result of supplying the list's head and tail as arguments to the handler. If the list is empty, it will instead give us back the empty-handler, unprocessed.
+
+So to check whether the list is empty, we could pass it an empty-handler of `true`, and a non-empty handler which accepts a head and tail argument, and then just returns the constant false:
+
+       some-list (\hd tl. false) true
  
+What about extracting the head of a list? It only makes sense to do this when the list is known to be non-empty. In that case, the empty-handler can be anything since we know it's going to be discarded. We'll designate this dummy handler `H`. On the other hand, it's easy to see what our non-empty handler should be:
  
-Lists 2:
-       nil = false
-       [a] = (a,nil)
+       some-list (\hd tl. hd) H
  
-L (\h\t.K deal_with_h_and_t) if-nil
+Similarly for extracting the tail of a list.
  
-We've already seen enumerations: true | false, red | green | blue
-What if you want one or more of the elements to have associated data? e.g. red | green | blue <how-blue>
+When we get to discussing types, you'll see that the strategy deployed here has great generality. (Moreover, you can see the version 1 strategy as an approximate implementation of it.)
  
-could handle like this:
-       the-value if-red if-green (\n. handler-if-blue-to-degree-n)
  
-       then red = \r \g \b-handler. r
-                green = \r \g \b-handler. g
-                make-blue = \degree. \r \g \b-handler. b-handler degree
+There are other reasonable choices you could make for how to implement lists. We'll come back later and discuss a third. If you're creative, you'll be able to design more yourself. The hard part is making the design principled and minimizing extraneous cruft, like the `N` and the `H` in our above discussion.
  
-A list is basically: empty | non-empty <head,tail>
  
-               empty = \non-empty-handler \if-empty. if-empty = false
-               cons = \h \t. \non-empty-handler \if-empty. non-empty-handler h t
+Building Numbers
+================
  
-               so [a] = cons a empty = \non-empty-handler \_. non-empty-handler a empty
+Now how might we go about building numbers? We'll just try to build the natural numbers: 0, 1, 2, ...
  
+If you think about lists and numbers, you should start to see some interesting similarities between them. In each case there's a base value (the empty list, 0). And then further values are always the result of some operation (appending a new head to, taking the successor of) on an existing value.
  
+Because of this underlying similarity, we could in fact use either of the strategies described above to implement numbers.
+
+Following the version 1 strategy for lists, we could let 0 be:
+
+       (true, Z)
+
+for some useful---or, if need be, arbitrary---value `Z`. And given a number `n`, we could let the successor of `n` be:
+
+       (false, n)
+
+Given this implementation of the numbers, it would be an easy matter to determine whether a given number was zero. (How would you do it?) And it would also be an easy matter to determine the predecessor of any number that wasn't zero. (How would you do it?) Other arithmetic operations, however, would be more complicated. We haven't yet learned the tools that would be needed to determine whether two numbers were equal, or to add two numbers.
+
+Following the version 2 strategy for lists, we could adopt the convention that we'd operate on numbers by passing them an "if-you're-non-zero" handler and a "if-you're-zero" handler. If the number is non-zero, it will evaluate to the result of supplying the number's predecessor as an argument to the handler. If the number is zero, it will instead give us back the zero handler, unprocessed.
+
+With that convention, we could let 0 be:
+
+       \if-non-zero if-zero. if-zero
+
+and we could let the successor function be:
+
+       \n. \if-non-zero if-zero. if-non-zero n
+
+This is a more principled implementation, and would again make some arithmetic operations easy to implement. But as before, others would be more difficult.
+
+
+Composition
+===========
+
+We're going now to describe a third strategy, which goes in a different direction.
+
+The **composition** of two functions is the operation that first applies one of them, and then applies the second. For instance, the arithmetic operation that maps a real number *r* to *r<sup>2</sup>+1* is the composition of the squaring function and the successor function. This complex function is standardly written:
+
+<pre><code>successor &#8728; square</code></pre> 
+
+and in general:
+
+<pre><code>(s &#8728; f) z</code></pre>
+
+should be understood as:
+
+       s (f z)
+
+Now consider the following series:
+
+       z
+       s z
+       s (s z)
+       s (s (s z))
+       ...
+
+Remembering that I is the identity combinator, this could also be written:
+
+<pre><code>(I) z
+(s) z
+(s &#8728; s) z
+(s &#8728; s &#8728; s) z
+...</code></pre>
+
+And we might adopt the following natural shorthand for this:
+
+<pre><code>s<sup>0</sup> z
+s<sup>1</sup> z
+s<sup>2</sup> z
+s<sup>3</sup> z
+...</code></pre>
+
+We haven't introduced any new constants 0, 1, 2 into the object language, nor any new form of syntactic combination. This is all just a metalanguage abbreviation for:
+
+       z
+       s z
+       s (s z)
+       s (s (s z))
+       ...
+
+Church had the idea to implement the number *n* by an operation that accepted an arbitrary function `s` and base value `z` as arguments, and returned <code>s<sup><em>n</em></sup> z</code> as a result. In other words:
+
+<pre><code>zero &equiv; \s z. s<sup>0</sup> z &equiv; \s z. z
+one &equiv; \s z. s<sup>1</sup> z &equiv; \s z. s z
+two &equiv; \s z. s<sup>2</sup> z &equiv; \s z. s (s z)
+three &equiv; \s z. s<sup>3</sup> z &equiv; \s z. s (s (s z))
+...</code></pre>
+
+This is a very elegant idea. Implementing numbers this way, we'd let the successor function be:
+
+<pre><code>succ &equiv; \n. \s z. s (n s z)</code></pre>
+
+So, for example:
+
+<pre><code>    succ two
+  &equiv; (\n. \s z. s (n s z)) (\s z. s (s z))
+~~> \s z. s ((\s z, s (s z)) s z)
+~~> \s z. s (s (s z))</code></pre> 
+
+Adding *m* to *n* is a matter of applying the successor function to *n* *m* times. And we know how to apply an arbitrary function s to *n* *m* times: we just give that function s, and the base-value *n*, to *m* as arguments. Because that's what the function we're using to implement *m* *does*. Hence **add** can be defined to be, simply:
+
+       \m n. m succ n
+
+Isn't that nice?
+
+Alternatively, one could do:
+
+       \m n. \s z. m s (n s z)
+
+How would we tell whether a number was 0? Well, look again at the implementations of the first few numbers:
+
+<pre><code>zero &equiv; \s z. s<sup>0</sup> z &equiv; \s z. z
+one &equiv; \s z. s<sup>1</sup> z &equiv; \s z. s z
+two &equiv; \s z. s<sup>2</sup> z &equiv; \s z. s (s z)
+three &equiv; \s z. s<sup>3</sup> z &equiv; \s z. s (s (s z))
+...</code></pre>
+
+We can see that with the non-zero numbers, the function s is always applied to an argument at least once. With zero, on the other hand, we just get back the base-value. Hence we can determine whether a number is zero as follows:
+
+       some-number (\x. false) true
+
+If some-number is zero, this will evaluate to the base value true. If some-number is non-zero, then it will evaluate to the result of applying (\x. false) to the result of applying ... to the result of applying (\x. false) to the base value true. But the result of applying (\x. false) to any argument is always false. So when some-number is non-zero, this expressions evaluates to false.
+
+Perhaps not as elegant as addition, but still decently principled.
+
+Multiplication is even more elegant. Consider that applying an arbitrary function s to a base value z *m &times; n* times is a matter of applying s to z *n* times, and then doing that again, and again, and so on...for *m* repetitions. In other words, it's a matter of applying the function (\z. n s z) to z *m* times. In other words, *m &times; n* can be represented as:
+
+       \s z. m (\z. n s z) z
+
+which can be eta-reduced to:
+
+       \s. m (n s)
+
+and we might abbreviate that as:
+
+<pre><code>m &#8728; n</code></pre>
+
+
+Isn't that nice?
+
+And if we *apply* `m` to `n` instead of composing it, we get a implementation of exponentiation.
+
+However, at this point the elegance gives out. The predecessor function is substantially more difficult to construct on this implementation. As with all of these operations, there are several ways to do it, but they all take at least a bit of ingenuity. If you're only first learning programming right now, it would be unreasonable to expect you to be able to figure out how to do it.
+
+However, if on the other hand you do have some experience programming, consider how you might construct a predecessor function for numbers implemented in this way. Using only the resources we've so far discussed. (So you have no general facility for performing recursion, for instance.)
+
+
+Lists, version 3
+----------------
+
+It's possible to follow the same design for implementing lists, too. To see this, let's first step back and consider some of the more complex things you might do with a list. We don't need to think specifically inside the confines of the lambda calculus right now. These are general reflections.
+
+Assume you have a list of five integers, which I'll write using the OCaml notation: `[1; 2; 3; 4; 5]`.
+
+Now one thing you might want to do with the list is to double every member. Another thing you might want to do is to increment every number. More generally, given an arbitrary function `f`, you might want to get the list which is `[f 1; f 2; f 3; f 4; f 5]`. Computer scientists call this **mapping** the function `f` over the list `[1; 2; 3; 4; 5]`.
+
+Another thing you might want to do with the list is to retrieve every member which is even. Or every member which is prime. Or, given an arbitrary function f, you might want to **filter** the original list to a shorter list containing only those elements `x` for which `f x` evaluates to true.
+
+These are very basic, frequently-used operations on lists.
+
+Another operation on lists is a bit harder to get a mental hold of, but is even more fundamental than the two just mentioned. An example of this operation would be if you were to **sum up** the members of the list. What would you do? We'll you'd start with the first element of the list. Actually, for generality, let's say you start with a *seed value*. In this case the seed value can be 0. Then you take the first element of the list and add it to the seed value. Now you have 1. You take the second element of the list, and add it to the result so far. Now you have 3. You take the third element of the list, and add it to the result so far. And so on.
+
+This general form of operation is known as **folding** an operation---in this case, the addition operation---over the list. Addition is symmetric, so it doesn't matter whether you start at the left side of the list or the right. But we can't in general rely on the operations to be symmetric. So there are two notions. This is the **left-fold** of an operation f over our list `[1; 2; 3; 4; 5]` given a seed value z:
+
+       f (f (f (f (f z 1) 2) 3) 4) 5
+
+and this is the **right-fold**:
+
+       f 1 (f 2 (f 3 (f 4 (f 5 z))))
+       
+Church's proposal for implementing the numbers identified the essential behavior of a number *m* to be applying an arbitary function s to a base value z *m* times. In a similar spirit, we can identify the essential behavior of a list to be folding an arbitrary operation f over the elements of the list and a seed value z. In other words, we could represent the list `[1; 2; 3; 4; 5]` as a function that accepted arbitrary `f` and `z` as arguments, and returned one of the folds above.
  
-Lists 3:
-[a; tl] isnil == (\f. f a tl) (\h \t.false) a b ~~> false a b
+You could do this using either sort of fold, but choosing the right fold gives us an implementation closest to Church's encoding of the numbers. Then we'd define `[1; 2; 3; 4; 5]` to be:
  
-nil isnil == (\f. M) (\h \t. false) a b ~~> M[f:=isnil] a b == a
+       \f z. f 1 (f 2 (f 3 (f 4 (f 5 z))))
  
-       so M could be \a \b. a, i.e. true
-       so nil = \f. true == K true == K K = \_ K
+Compare Church's definition of the number five:
  
-       nil = K true
-       [a] = (a,nil)
-       [b,a] = (b,[a])
+       \s z. s   (s   (s   (s   (s   z))))
  
-isnil = (\x\y.false)
+This has real elegance, and it makes it easy to implement a number of primitive list operatioons. For example, checking whether a list implemented in this way is empty is easy. So too is extracting the head of a list known to be non-empty. However, other operations require some ingenuity. Extracting the tail of a list is about as difficult as retrieving the predecessor of a Church number. (This should not be surprising, given how similar in design these implementations are.)
  
-nil tail = K true tail = true = \x\y.x = \f.f? such that f? = Kx. there is no such.
-nil head = K true head = true. could mislead.