fix links to v4 lists

[lambda.git] / week4.mdwn
diff --git a/week4.mdwn b/week4.mdwn

index b7ff879..e3f0fb9 100644 (file)
--- a/week4.mdwn
+++ b/week4.mdwn
@@ -6,11 +6,11 @@ A: That's easy: let `T` be an arbitrary term in the lambda calculus.  If
  `T` has a fixed point, then there exists some `X` such that `X <~~>
  TX` (that's what it means to *have* a fixed point).
  
-<pre>
-let W = \x.T(xx) in
-let X = WW in
-X = WW = (\x.T(xx))W = T(WW) = TX
-</pre>
+<pre><code>
+let L = \x. T (x x) in
+let X = L L in
+X &equiv; L L &equiv; (\x. T (x x)) L ~~> T (L L) &equiv; T X
+</code></pre>
  
  Please slow down and make sure that you understand what justified each
  of the equalities in the last line.
@@ -173,129 +173,399 @@ so A 4 x is to A 3 x as hyper-exponentiation is to exponentiation...
       Is leastness important?
  
  
-##The simply-typed lambda calculus##
-
-The uptyped lambda calculus is pure computation.  It is much more
-common, however, for practical programming languages to be typed.
-Likewise, systems used to investigate philosophical or linguistic
-issues are almost always typed.  Types will help us reason about our
-computations.  They will also facilitate a connection between logic
-and computation.
  
-Soon we will consider polymorphic type systems.  First, however, we
-will consider the simply-typed lambda calculus.  There's good news and
-bad news: the good news is that the simply-type lambda calculus is
-strongly normalizing: every term has a normal form.  We shall see that
-self-application is outlawed, so &Omega; can't even be written, let
-alone undergo reduction.  The bad news is that fixed-point combinators
-are also forbidden, so recursion is neither simple nor direct.
+#Sets#
+
+You're now already in a position to implement sets: that is, collections with
+no intrinsic order where elements can occur at most once. Like lists, we'll
+understand the basic set structures to be *type-homogenous*. So you might have
+a set of integers, or you might have a set of pairs of integers, but you
+wouldn't have a set that mixed both types of elements. Something *like* the
+last option is also achievable, but it's more difficult, and we won't pursue it
+now. In fact, we won't talk about sets of pairs, either. We'll just talk about
+sets of integers. The same techniques we discuss here could also be applied to
+sets of pairs of integers, or sets of triples of booleans, or sets of pairs
+whose first elements are booleans, and whose second elements are triples of
+integers. And so on.
+
+(You're also now in a position to implement *multi*sets: that is, collections
+with no intrinsic order where elements can occur multiple times: the multiset
+{a,a} is distinct from the multiset {a}. But we'll leave these as an exercise.)
+
+The easiest way to implement sets of integers would just be to use lists. When
+you "add" a member to a set, you'd get back a list that was either identical to
+the original list, if the added member already was present in it, or consisted
+of a new list with the added member prepended to the old list. That is:
+
+       let empty_set = empty  in
+       ; see the library for definitions of any and eq
+       let make_set = \new_member old_set. any (eq new_member) old_set
+                                               ; if any element in old_set was eq new_member
+                                               old_set
+                                               ; else
+                                               make_list new_member old_set
+
+Think about how you'd implement operations like `set_union`,
+`set_intersection`, and `set_difference` with this implementation of sets.
+
+The implementation just described works, and it's the simplest to code.
+However, it's pretty inefficient. If you had a 100-member set, and you wanted
+to create a set which had all those 100-members and some possibly new element
+`e`, you might need to check all 100 members to see if they're equal to `e`
+before concluding they're not, and returning the new list. And comparing for
+numeric equality is a moderately expensive operation, in the first place.
+
+(You might say, well, what's the harm in just prepending `e` to the list even
+if it already occurs later in the list. The answer is, if you don't keep track
+of things like this, it will likely mess up your implementations of
+`set_difference` and so on. You'll have to do the book-keeping for duplicates
+at some point in your code. It goes much more smoothly if you plan this from
+the very beginning.)
+
+How might we make the implementation more efficient? Well, the *semantics* of
+sets says that they have no intrinsic order. That means, there's no difference
+between the set {a,b} and the set {b,a}; whereas there is a difference between
+the *list* `[a;b]` and the list `[b;a]`. But this semantic point can be respected
+even if we *implement* sets with something ordered, like list---as we're
+already doing. And we might *exploit* the intrinsic order of lists to make our
+implementation of sets more efficient.
+
+What we could do is arrange it so that a list that implements a set always
+keeps in elements in some specified order. To do this, there'd have *to be*
+some way to order its elements. Since we're talking now about sets of numbers,
+that's easy. (If we were talking about sets of pairs of numbers, we'd use
+"lexicographic" ordering, where `(a,b) < (c,d)` iff `a < c or (a == c and b <
+d)`.)
+
+So, if we were searching the list that implements some set to see if the number
+`5` belonged to it, once we get to elements in the list that are larger than `5`,
+we can stop. If we haven't found `5` already, we know it's not in the rest of the
+list either.
+
+This is an improvement, but it's still a "linear" search through the list.
+There are even more efficient methods, which employ "binary" searching. They'd
+represent the set in such a way that you could quickly determine whether some
+element fell in one half, call it the left half, of the structure that
+implements the set, if it belonged to the set at all. Or that it fell in the
+right half, it it belonged to the set at all. And then the same sort of
+determination could be made for whichever half you were directed to. And then
+for whichever quarter you were directed to next. And so on. Until you either
+found the element or exhausted the structure and could then conclude that the
+element in question was not part of the set. These sorts of structures are done
+using **binary trees** (see below).
+
+
+#Aborting a search through a list#
+
+We said that the sorted-list implementation of a set was more efficient than
+the unsorted-list implementation, because as you were searching through the
+list, you could come to a point where you knew the element wasn't going to be
+found. So you wouldn't have to continue the search.
+
+If your implementation of lists was, say v1 lists plus the Y-combinator, then
+this is exactly right. When you get to a point where you know the answer, you
+can just deliver that answer, and not branch into any further recursion. If
+you've got the right evaluation strategy in place, everything will work out
+fine.
+
+But what if you're using v3 lists? What options would you have then for
+aborting a search?
+
+Well, suppose we're searching through the list `[5;4;3;2;1]` to see if it
+contains the number `3`. The expression which represents this search would have
+something like the following form:
+
+       ..................<eq? 1 3>  ~~>
+       .................. false     ~~>
+       .............<eq? 2 3>       ~~>
+       ............. false          ~~>
+       .........<eq? 3 3>           ~~>
+       ......... true               ~~>
+       ?
+
+Of course, whether those reductions actually followed in that order would
+depend on what reduction strategy was in place. But the result of folding the
+search function over the part of the list whose head is `3` and whose tail is `[2;
+1]` will *semantically* depend on the result of applying that function to the
+more rightmost pieces of the list, too, regardless of what order the reduction
+is computed by. Conceptually, it will be easiest if we think of the reduction
+happening in the order displayed above.
+
+Well, once we've found a match between our sought number `3` and some member of
+the list, we'd like to avoid any further unnecessary computations and just
+deliver the answer `true` as "quickly" or directly as possible to the larger
+computation in which the search was embedded.
+
+With a Y-combinator based search, as we said, we could do this by just not
+following a recursion branch.
+
+But with the v3 lists, the fold is "pre-programmed" to continue over the whole
+list. There is no way for us to bail out of applying the search function to the
+parts of the list that have head `4` and head `5`, too.
  
-#Types#
+We *can* avoid *some* unneccessary computation. The search function can detect
+that the result we've accumulated so far during the fold is now `true`, so we
+don't need to bother comparing `4` or `5` to `3` for equality. That will simplify the
+computation to some degree, since as we said, numerical comparison in the
+system we're working in is moderately expensive.
+
+However, we're still going to have to traverse the remainder of the list. That
+`true` result will have to be passed along all the way to the leftmost head of
+the list. Only then can we deliver it to the larger computation in which the
+search was embedded.
+
+It would be better if there were some way to "abort" the list traversal. If,
+having found the element we're looking for (or having determined that the
+element isn't going to be found), we could just immediately stop traversing the
+list with our answer. **Continuations** will turn out to let us do that.
  
-We will have at least one ground type, `o`.  From a linguistic point
-of view, think of the ground types as the bar-level 0 categories, that
-is, the lexical types, such as Noun, Verb, Preposition (glossing over
-the internal complexity of those categories in modern theories).
+We won't try yet to fully exploit the terrible power of continuations. But
+there's a way that we can gain their benefits here locally, without yet having
+a fully general machinery or understanding of what's going on.
  
-In addition, there will be a recursively-defined class of complex
-types `T`, the smallest set such that
+The key is to recall how our implementations of booleans and pairs worked.
+Remember that with pairs, we supply the pair "handler" to the pair as *an
+argument*, rather than the other way around:
  
-*    ground types, including `o`, are in `T`
+       pair (\x y. add x y)
  
-*    for any types &sigma; and &tau; in `T`, the type &sigma; -->
-     &tau; is in `T`.
+or:
  
-For instance, here are some types in `T`:
+       pair (\x y. x)
  
-     o
-     o --> o
-     o --> o --> o
-     (o --> o) --> o
-     (o --> o) --> o --> o
+to get the first element of the pair. Of course you can lift that if you want:
  
-and so on.
+<pre><code>extract_fst &equiv; \pair. pair (\x y. x)</code></pre>
  
-#Typed lambda terms#
+but at a lower level, the pair is still accepting its handler as an argument,
+rather than the handler taking the pair as an argument. (The handler gets *the
+pair's elements*, not the pair itself, as arguments.)
  
-Given a set of types `T`, we define the set of typed lambda terms <code>&Lambda;_T</code>,
-which is the smallest set such that
+>      *Terminology*: we'll try to use names of the form `get_foo` for handlers, and
+names of the form `extract_foo` for lifted versions of them, that accept the
+lists (or whatever data structure we're working with) as arguments. But we may
+sometimes forget.
  
-*    each type `t` has an infinite set of distinct variables, {x^t}_1,
-     {x^t}_2, {x^t}_3, ...
+The v2 implementation of lists followed a similar strategy:
  
-*    If a term `M` has type &sigma; --> &tau;, and a term `N` has type
-     &sigma;, then the application `(M N)` has type &tau;.
+       v2list (\h t. do_something_with_h_and_t) result_if_empty
  
-*    If a variable `a` has type &sigma;, and term `M` has type &tau;, 
-     then the abstract <code>&lambda; a M</code> has type &sigma; --> &tau;.
+If the `v2list` here is not empty, then this will reduce to the result of
+supplying the list's head and tail to the handler `(\h t.
+do_something_with_h_and_t)`.
  
-The definitions of types and of typed terms should be highly familiar
-to semanticists, except that instead of writing &sigma; --> &tau;,
-linguists (following Montague, who followed Church) write <&sigma;,
-&tau;>.  We will use the arrow notation, since it is more iconic.
+Now, what we've been imagining ourselves doing with the search through the v3
+list is something like this:
  
-Some examples (assume that `x` has type `o`):
  
-      x            o
-      \x.x         o --> o
-      ((\x.x) x)   o
+       larger_computation (search_through_the_list_for_3) other_arguments
  
-Excercise: write down terms that have the following types:
+That is, the result of our search is supplied as an argument (perhaps together
+with other arguments) to the "larger computation". Without knowing the
+evaluation order/reduction strategy, we can't say whether the search is
+evaluated before or after it's substituted into the larger computation. But
+semantically, the search is the argument and the larger computation is the
+function to which it's supplied.
  
-                   o --> o --> o
-                   (o --> o) --> o --> o
-                   (o --> o --> o) --> o
+What if, instead, we did the same kind of thing we did with pairs and v2
+lists? That is, what if we made the larger computation a "handler" that we
+passed as an argument to the search?
  
-#Associativity of types versus terms#
+       the_search (\search_result. larger_computation search_result other_arguments)
  
-As we have seen many times, in the lambda calculus, function
-application is left associative, so that `f x y z == (((f x) y) z)`.
-Types, *THEREFORE*, are right associative: if `f`, `x`, `y`, and `z`
-have types `a`, `b`, `c`, and `d`, respectively, then `f` has type `a
---> b --> c --> d == (a --> (b --> (c --> d)))`.
+What's the advantage of that, you say. Other than to show off how cleverly
+you can lift.
  
-It is a serious faux pas to associate to the left for types.  You may
-as well use your salad fork to stir your tea.
+Well, think about it. Think about the difficulty we were having aborting the
+search. Does this switch-around offer us anything useful?
  
-#The simply-typed lambda calculus is strongly normalizing#
+It could.
  
-If `M` is a term with type &tau; in &Lambda;_T, then `M` has a
-normal form.  The proof is not particularly complex, but we will not
-present it here; see Berendregt or Hankin.
+What if the way we implemented the search procedure looked something like this?
  
-Since &Omega; does not have a normal form, it follows that &Omega;
-cannot have a type in &Lambda;_T.  We can easily see why:
+At a given stage in the search, we wouldn't just apply some function `f` to the
+head at this stage and the result accumulated so far (from folding the same
+function, and a base value, to the tail at this stage)...and then pass the result
+of that application to the embedding, more leftward computation.
  
-     &Omega; = (\x.xx)(\x.xx)
+We'd *instead* give `f` a "handler" that expects the result of the current
+stage *as an argument*, and then evaluates to what you'd get by passing that
+result leftwards up the list, as before. 
  
-Assume &Omega; has type &tau;, and `\x.xx` has type &sigma;.  Then
-because `\x.xx` takes an argument of type &sigma; and returns
-something of type &tau;, `\x.xx` must also have type `&sigma; -->
-&tau;`.  By repeating this reasoning, `\x.xx` must also have type
-`(&sigma; --> &tau;) --> &tau;`; and so on.  Since variables have
-finite types, there is no way to choose a type for the variable `x`
-that can satisfy all of the requirements imposed on it.
+Why would we do that, you say? Just more flamboyant lifting?
  
-In general, there is no way for a function to have a type that can
-take itself for an argument.  It follows that there is no way to
-define the identity function in such a way that it can take itself as
-an argument.  Instead, there must be many different identity
-functions, one for each type.
+Well, no, there's a real point here. If we give the function a "handler" that
+encodes the normal continuation of the fold leftwards through the list, we can
+also give it other "handlers" too. For example, we can also give it the underlined handler:
  
-#Typing numerals#
  
-Version 1 type numerals are not a good choice for the simply-typed
-lambda calculus.  The reason is that each different numberal has a
-different type!  For instance, if zero has type &sigma;, then `false`
-has type &tau; --> &tau; --> &tau, for some &tau;.  Since one is
-represented by the function `\x.x false 0`, one must have type (&tau;
---> &tau; --> &tau;) --> &sigma; --> &sigma;.  But this is a different
-type than zero!  Because each number has a different type, it becomes
-impossible to write arithmetic operations that can combine zero with
-one.  We would need as many different addition operations as we had
-pairs of numbers that we wanted to add.
+       the_search (\search_result. larger_computation search_result other_arguments)
+                          ------------------------------------------------------------------
  
-Fortunately, the Church numberals are well behaved with respect to
-types.  They can all be given the type (&sigma; --> &sigma;) -->
-&sigma; --> &sigma;.
+This "handler" encodes the search's having finished, and delivering a final
+answer to whatever else you wanted your program to do with the result of the
+search. If you like, at any stage in the search you might just give an argument
+to *this* handler, instead of giving an argument to the handler that continues
+the list traversal leftwards. Semantically, this would amount to *aborting* the
+list traversal! (As we've said before, whether the rest of the list traversal
+really gets evaluated will depend on what evaluation order is in place. But
+semantically we'll have avoided it. Our larger computation  won't depend on the
+rest of the list traversal having been computed.)
+
+Do you have the basic idea? Think about how you'd implement it. A good
+understanding of the v2 lists will give you a helpful model.
+
+In broad outline, a single stage of the search would look like before, except
+now f would receive two extra, "handler" arguments.
+
+       f 3 <result of folding f and z over [2; 1]> <handler to continue folding leftwards> <handler to abort the traversal>
+
+`f`'s job would be to check whether `3` matches the element we're searching for
+(here also `3`), and if it does, just evaluate to the result of passing `true` to
+the abort handler. If it doesn't, then evaluate to the result of passing
+`false` to the continue-leftwards handler.
+
+In this case, `f` wouldn't need to consult the result of folding `f` and `z` over `[2;
+1]`, since if we had found the element `3` in more rightward positions of the
+list, we'd have called the abort handler and this application of `f` to `3` etc
+would never be needed. However, in other applications the result of folding `f`
+and `z` over the more rightward parts of the list would be needed. Consider if
+you were trying to multiply all the elements of the list, and were going to
+abort (with the result `0`) if you came across any element in the list that was
+zero. If you didn't abort, you'd need to know what the more rightward elements
+of the list multiplied to, because that would affect the answer you passed
+along to the continue-leftwards handler.
+
+A **version 5** list encodes the kind of fold operation we're envisaging here, in
+the same way that v3 (and [v4](/advanced/#index1h1)) lists encoded the simpler fold operation.
+Roughly, the list `[5;4;3;2;1]` would look like this:
+
+
+       \f z continue_leftwards_handler abort_handler.
+               <fold f and z over [4;3;2;1]>
+               (\result_of_fold_over_4321. f 5 result_of_fold_over_4321  continue_leftwards_handler abort_handler)
+               abort_handler
+
+       ; or, expanding the fold over [4;3;2;1]:
+
+       \f z continue_leftwards_handler abort_handler.
+               (\continue_leftwards_handler abort_handler.
+                       <fold f and z over [3;2;1]>
+                       (\result_of_fold_over_321. f 4 result_of_fold_over_321 continue_leftwards_handler abort_handler)
+                       abort_handler
+               )
+               (\result_of_fold_over_4321. f 5 result_of_fold_over_4321  continue_leftwards_handler abort_handler)
+               abort_handler
+
+       ; and so on             
+       
+Remarks: the `larger_computation` handler should be supplied as both the
+`continue_leftwards_handler` and the `abort_handler` for the leftmost
+application, where the head `5` is supplied to `f`; because the result of this
+application should be passed to the larger computation, whether it's a "fall
+off the left end of the list" result or it's a "I'm finished, possibly early"
+result. The `larger_computation` handler also then gets passed to the next
+rightmost stage, where the head `4` is supplied to `f`, as the `abort_handler` to
+use if that stage decides it has an early answer.
+
+Finally, notice that we don't have the result of applying `f` to `4` etc given as
+an argument to the application of `f` to `5` etc. Instead, we pass
+
+       (\result_of_fold_over_4321. f 5 result_of_fold_over_4321 <one_handler> <another_handler>)
+
+*to* the application of `f` to `4` as its "continue" handler. The application of `f`
+to `4` can decide whether this handler, or the other, "abort" handler, should be
+given an argument and constitute its result.
+
+
+I'll say once again: we're using temporally-loaded vocabulary throughout this,
+but really all we're in a position to mean by that are claims about the result
+of the complex expression semantically depending only on this, not on that. A
+demon evaluator who custom-picked the evaluation order to make things maximally
+bad for you could ensure that all the semantically unnecessary computations got
+evaluated anyway. We don't have any way to prevent that. Later,
+we'll see ways to *semantically guarantee* one evaluation order rather than
+another. Though even then the demonic evaluation-order-chooser could make it
+take unnecessarily long to compute the semantically guaranteed result. Of
+course, in any real computing environment you'll know you're dealing with a
+fixed evaluation order and you'll be able to program efficiently around that.
+
+In detail, then, here's what our v5 lists will look like:
+
+       let empty = \f z continue_handler abort_handler. continue_handler z  in
+       let make_list = \h t. \f z continue_handler abort_handler.
+               t f z (\sofar. f h sofar continue_handler abort_handler) abort_handler  in
+       let isempty = \lst larger_computation. lst
+                       ; here's our f
+                       (\hd sofar continue_handler abort_handler. abort_handler false)
+                       ; here's our z
+                       true
+                       ; here's the continue_handler for the leftmost application of f
+                       larger_computation
+                       ; here's the abort_handler
+                       larger_computation  in
+       let extract_head = \lst larger_computation. lst
+                       ; here's our f
+                       (\hd sofar continue_handler abort_handler. continue_handler hd)
+                       ; here's our z
+                       junk
+                       ; here's the continue_handler for the leftmost application of f
+                       larger_computation
+                       ; here's the abort_handler
+                       larger_computation  in
+       let extract_tail = ; left as exercise
+
+These functions are used like this:
+
+       let my_list = make_list a (make_list b (make_list c empty) in
+       extract_head my_list larger_computation
+
+If you just want to see `my_list`'s head, the use `I` as the
+`larger_computation`.
+
+What we've done here does take some work to follow. But it should be within
+your reach. And once you have followed it, you'll be well on your way to
+appreciating the full terrible power of continuations.
+
+<!-- (Silly [cultural reference](http://www.newgrounds.com/portal/view/33440).) -->
+
+Of course, like everything elegant and exciting in this seminar, [Oleg
+discusses it in much more
+detail](http://okmij.org/ftp/Streams.html#enumerator-stream).
+
+*Comments*:
+
+1.     The technique deployed here, and in the v2 lists, and in our implementations
+       of pairs and booleans, is known as **continuation-passing style** programming.
+
+2.     We're still building the list as a right fold, so in a sense the
+       application of `f` to the leftmost element `5` is "outermost". However,
+       this "outermost" application is getting lifted, and passed as a *handler*
+       to the next right application. Which is in turn getting lifted, and
+       passed to its next right application, and so on. So if you
+       trace the evaluation of the `extract_head` function to the list `[5;4;3;2;1]`,
+       you'll see `1` gets passed as a "this is the head sofar" answer to its
+       `continue_handler`; then that answer is discarded and `2` is
+       passed as a "this is the head sofar" answer to *its* `continue_handler`,
+       and so on. All those steps have to be evaluated to finally get the result
+       that `5` is the outer/leftmost head of the list. That's not an efficient way
+       to get the leftmost head.
+
+       We could improve this by building lists as left folds when implementing them
+       as continuation-passing style folds. We'd just replace above:
+
+               let make_list = \h t. \f z continue_handler abort_handler.
+                       f h z (\z. t f z continue_handler abort_handler) abort_handler
+
+       now `extract_head` should return the leftmost head directly, using its `abort_handler`:
+
+               let extract_head = \lst larger_computation. lst
+                               (\hd sofar continue_handler abort_handler. abort_handler hd)
+                               junk
+                               larger_computation
+                               larger_computation
+
+3.     To extract tails efficiently, too, it'd be nice to fuse the apparatus developed
+       in these v5 lists with the ideas from [v4](/advanced/#index1h1) lists.
+       But that also is left as an exercise.