Refunctionalizing zippers: from lists to continuations
------------------------------------------------------

If zippers are continuations reified (defunctionalized), then one route
to continuations is to re-functionalize a zipper.  The concreteness and
understandability of the zipper then provide a way of understanding an
equivalent treatment using continuations.

Let's work with lists of `char`s for a change.  We'll sometimes write
"abSd" as an abbreviation for
`['a'; 'b'; 'S'; 'd']`.

We will set out to perform a deceptively simple-seeming **task: given a
string, replace each occurrence of 'S' in that string with a copy of
the string up to that point.**

We'll define a function `t` (for "task") that maps strings to their
updated version.

Expected behavior:

        t "abSd" ~~> "ababd"

In linguistic terms, this is a kind of anaphora
resolution, where `'S'` is functioning like an anaphoric element, and
the preceding string portion is the antecedent.

This task can give rise to considerable complexity.
Note that it matters which 'S' you target first (the position of the `*`
indicates the targeted 'S'):

            t "aSbS"
                *
        ~~> t "aabS"
                  *
        ~~> "aabaab"

versus

            t "aSbS"
                  *
        ~~> t "aSbaSb"
                *
        ~~> t "aabaSb"
                   *
        ~~> "aabaaabab"

versus

            t "aSbS"
                  *
        ~~> t "aSbaSb"
                   *
        ~~> t "aSbaaSbab"
                    *
        ~~> t "aSbaaaSbaabab"
                     *
        ~~> ...

Evidently this task, simple as it is, is a form of computation,
and the order in which the `'S'`s get evaluated matters: different
orders can yield different results, or fail to terminate at all.

For now, we'll agree to always evaluate the leftmost `'S'`.  Since the
string to the left of the leftmost `'S'` contains no `'S'`, each step
removes exactly one `'S'`, which guarantees termination and a final
string without any `'S'` in it.
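
As a reference point, the leftmost-first rule can be sketched directly on OCaml strings. This is only an illustration of the specification, not the zipper-based solution developed in the text; it assumes `String.index_opt`, which is available in OCaml 4.05 and later.

```ocaml
(* Repeatedly replace the leftmost 'S' with a copy of the prefix before it. *)
let rec t (s : string) : string =
  match String.index_opt s 'S' with
  | None -> s                                  (* no 'S' left: done *)
  | Some i ->
    let prefix = String.sub s 0 i in           (* everything before the 'S' *)
    let rest = String.sub s (i + 1) (String.length s - i - 1) in
    t (prefix ^ prefix ^ rest)                 (* the prefix contains no 'S' *)
```

Each recursive call operates on a string with one fewer `'S'`, so the recursion bottoms out.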

This is a task well-suited to using a zipper.  We'll define a function
`tz` (for task with zippers), which accomplishes the task by mapping a
`char list_zipper` to a `char list`.  We'll call the two parts of the
zipper `unzipped` and `zipped`; we start with a fully zipped list, and
move elements to the unzipped part by pulling the zipper down until the
entire list has been unzipped, at which point the zipped half of the
zipper will be empty.

        type 'a list_zipper = ('a list) * ('a list);;

        let rec tz (z : char list_zipper) =
          match z with
            | (unzipped, []) -> List.rev(unzipped) (* Done! *)
            | (unzipped, 'S'::zipped) -> tz ((List.append unzipped unzipped), zipped) (* Copy prefix *)
            | (unzipped, target::zipped) -> tz (target::unzipped, zipped);; (* Pull zipper *)

        # tz ([], ['a'; 'b'; 'S'; 'd']);;
        - : char list = ['a'; 'b'; 'a'; 'b'; 'd']

        # tz ([], ['a'; 'S'; 'b'; 'S']);;
        - : char list = ['a'; 'a'; 'b'; 'a'; 'a'; 'b']

Note that the direction in which the zipper unzips enforces the
evaluate-leftmost rule.  Task completed.
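
To connect `tz` with the string abbreviations used in the text, one could wrap it with conversions between strings and `char list`s. The helpers `explode`, `implode`, and `tz_string` below are hypothetical names introduced only for this sketch, and `List.init` assumes OCaml 4.06 or later.

```ocaml
type 'a list_zipper = 'a list * 'a list

(* Same function as in the text, repeated so the sketch is self-contained. *)
let rec tz (z : char list_zipper) : char list =
  match z with
  | (unzipped, []) -> List.rev unzipped
  | (unzipped, 'S' :: zipped) -> tz (List.append unzipped unzipped, zipped)
  | (unzipped, target :: zipped) -> tz (target :: unzipped, zipped)

(* Hypothetical helpers: convert between strings and char lists. *)
let explode (s : string) : char list =
  List.init (String.length s) (String.get s)

let implode (cs : char list) : string =
  String.concat "" (List.map (String.make 1) cs)

(* Start from a fully zipped list: nothing has been unzipped yet. *)
let tz_string (s : string) : string = implode (tz ([], explode s))
```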

One way to see exactly what is going on is to watch the zipper in
action by tracing the execution of `tz`.  When we use the `#trace`
directive in the OCaml interpreter, the system prints out the
argument to `tz` each time it is called, including when it is called
recursively within one of the `match` clauses.  Note that the
lines with left-facing arrows (`<--`) show (both initial and recursive) calls to `tz`,
giving the value of its argument (a zipper), and the lines with
right-facing arrows (`-->`) show the output of each call, a
simple list.

        # #trace tz;;
        tz is now traced.
        # tz ([], ['a'; 'b'; 'S'; 'd']);;
        tz <-- ([], ['a'; 'b'; 'S'; 'd'])       (* Initial call *)
        tz <-- (['a'], ['b'; 'S'; 'd'])         (* Pull zipper *)
        tz <-- (['b'; 'a'], ['S'; 'd'])         (* Pull zipper *)
        tz <-- (['b'; 'a'; 'b'; 'a'], ['d'])    (* Special 'S' step *)
        tz <-- (['d'; 'b'; 'a'; 'b'; 'a'], [])  (* Pull zipper *)
        tz --> ['a'; 'b'; 'a'; 'b'; 'd']        (* Output reversed *)
        tz --> ['a'; 'b'; 'a'; 'b'; 'd']
        tz --> ['a'; 'b'; 'a'; 'b'; 'd']
        tz --> ['a'; 'b'; 'a'; 'b'; 'd']
        tz --> ['a'; 'b'; 'a'; 'b'; 'd']
        - : char list = ['a'; 'b'; 'a'; 'b'; 'd']

The nice thing about computations involving lists is that it's so easy
to visualize them as a data structure.  Eventually, we want to get to
a place where we can talk about more abstract computations.  In order
to get there, we'll first do the exact same thing we just did with
the concrete zipper, using procedures instead.

Think of a list as a procedural recipe: `['a'; 'b'; 'c'; 'd']` is the result of
the computation `'a'::('b'::('c'::('d'::[])))` (or, in our old style,
`make_list 'a' (make_list 'b' (make_list 'c' (make_list 'd' empty)))`). The
recipe for constructing the list goes like this:

>       (0)  Start with the empty list []
>       (1)  make a new list whose first element is 'd' and whose tail is the list constructed in step (0)
>       (2)  make a new list whose first element is 'c' and whose tail is the list constructed in step (1)
>       -----------------------------------------
>       (3)  make a new list whose first element is 'b' and whose tail is the list constructed in step (2)
>       (4)  make a new list whose first element is 'a' and whose tail is the list constructed in step (3)

What is the type of each of these steps?  Well, it will be a function
from the result of the previous step (a list) to a new list: it will
be a function of type `char list -> char list`.  We'll call each step
(or group of steps) a **continuation** of the previous steps.  So in this
context, a continuation is a function of type `char list -> char
list`.  For instance, the continuation corresponding to the portion of
the recipe below the horizontal line is the function `fun (tail : char
list) -> 'a'::('b'::tail)`.

What is the continuation of the 4th step? That is, after we've built up `'a'::('b'::('c'::('d'::[])))`, what more has to happen to that for it to become the list `['a'; 'b'; 'c'; 'd']`? Nothing! Its continuation is the function that does nothing: `fun tail -> tail`.
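
The recipe can be written out as a minimal sketch in which each group of steps is an ordinary OCaml value. The names `above_line`, `below_line`, and `identity` are introduced here for illustration only.

```ocaml
(* Steps (0) through (2): the list built above the horizontal line. *)
let above_line : char list = 'c' :: ('d' :: [])

(* Steps (3) and (4), packaged as one continuation of the steps above. *)
let below_line : char list -> char list =
  fun tail -> 'a' :: ('b' :: tail)

(* The continuation of step (4): nothing more needs to happen. *)
let identity : char list -> char list = fun tail -> tail

(* Running the whole recipe yields the finished list. *)
let result : char list = identity (below_line above_line)
```

Here `result` is `['a'; 'b'; 'c'; 'd']`: feeding a partial result to its continuation finishes the recipe.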

In what follows, we'll be thinking about the result list that we're building up in this procedural way. We'll treat our input list as a plain old static list data structure that we recurse through in the usual way. We won't need a zipper data structure, because the continuation-based representation of our result list will take over the same role.

So our new function `tc` (for task with continuations) takes an input list (not a zipper) and also takes a continuation `k` (it's conventional to use `k` for continuation variables). `k` is a function that represents how the result list is going to continue being built up after this invocation of `tc` delivers up a value. When we invoke `tc` for the first time, we expect it to deliver as a value the very de-S'd list we're seeking, so the way for the list to continue being built up is for nothing to happen to it. That is, our initial invocation of `tc` will supply `fun tail -> tail` as the value for `k`. Here is the whole `tc` function. Its structure and behavior follow `tz` from above, which we've repeated here to facilitate detailed comparison:

        let rec tz (z : char list_zipper) =
            match z with
            | (unzipped, []) -> List.rev(unzipped) (* Done! *)
            | (unzipped, 'S'::zipped) -> tz ((List.append unzipped unzipped), zipped)
            | (unzipped, target::zipped) -> tz (target::unzipped, zipped);; (* Pull zipper *)

        let rec tc (l: char list) (k: (char list) -> (char list)) =
            match l with
            | [] -> List.rev (k []) (* Done! *)
            | 'S'::zipped -> tc zipped (fun tail -> k (k tail)) (* Compose k with itself *)
            | target::zipped -> tc zipped (fun tail -> target::(k tail));; (* Extend k *)

        # tc ['a'; 'b'; 'S'; 'd'] (fun tail -> tail);;
        - : char list = ['a'; 'b'; 'a'; 'b'; 'd']

        # tc ['a'; 'S'; 'b'; 'S'] (fun tail -> tail);;
        - : char list = ['a'; 'a'; 'b'; 'a'; 'a'; 'b']

To emphasize the parallel, we've re-used the names `zipped` and
`target`.  The trace of the procedure will show that these variables
take on the same values in the same series of steps as they did during
the execution of `tz` above: there will once again be one initial and
four recursive calls to `tc`, and `zipped` will take on the values
`"bSd"`, `"Sd"`, `"d"`, and `""` (and, once again, on the final call,
the first `match` clause will fire, so the variable `zipped` will
not be instantiated).

We have not named the continuation argument `unzipped`, although that is
what the parallel would suggest.  The reason is that `unzipped` (in
`tz`) is a list, but `k` (in `tc`) is a function.  That's the most crucial
difference between the solutions; it's the
point of the exercise, and it should be emphasized.  For instance,
you can see this difference in the fact that in `tz`, we have to glue
together the two instances of `unzipped` with an explicit (and,
computationally speaking, relatively inefficient) `List.append`.
In the `tc` version of the task, we simply compose `k` with itself:
`k o k = fun tail -> k (k tail)`.
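
The correspondence between appending lists and composing continuations can be checked on a small example. This is a sketch under stated assumptions: `to_k` and `of_k` are hypothetical helper names, and `to_k` uses `List.append` only to set up the comparison (`tc` itself never calls it).

```ocaml
(* Hypothetical helper: the continuation that prepends a fixed list. *)
let to_k (prefix : char list) : char list -> char list =
  fun tail -> List.append prefix tail

(* Going back: recover the list by running the continuation on []. *)
let of_k (k : char list -> char list) : char list = k []

let prefix = ['x'; 'y'; 'z']
let k = to_k prefix

(* What tz does with lists ... *)
let doubled_list = List.append prefix prefix

(* ... and what tc does with functions. *)
let doubled_k = fun tail -> k (k tail)
```

Running `of_k doubled_k` gives the same list as `doubled_list`, which is the sense in which composing `k` with itself plays the role of `List.append unzipped unzipped`.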

A call `tc ['a'; 'b'; 'S'; 'd']` would yield a partially-applied function; it would still wait for another argument, a continuation of type `char list -> char list`. So we have to give it an "initial continuation" to get started. As mentioned above, we supply *the identity function* as the initial continuation. Why did we choose that? Again, if
you have already constructed the result list `"ababd"`, what's the desired continuation? What's the next step in the recipe to produce the desired result, i.e., the very same list, `"ababd"`?  Clearly, the identity function.

A good way to test your understanding is to figure out what the
continuation function `k` must be at the point in the computation when
`tc` is applied to the argument `"Sd"`.  Two choices: is it
`fun tail -> 'a'::'b'::tail`, or is it `fun tail -> 'b'::'a'::tail`?  The way to see if you're right is to execute the following command and see what happens:

        tc ['S'; 'd'] (fun tail -> 'a'::'b'::tail);;

There are a number of interesting directions we can go with this task.
This task was chosen because the task itself (as opposed
to the functions used to implement the task) can be viewed as a
simplified picture of a computation using continuations, where `'S'`
plays the role of a continuation operator. (It works like the Scheme
operators `shift` or `control`; the differences between them don't
manifest themselves in this example.
See Ken Shan's paper [Shift to control](http://www.cs.rutgers.edu/~ccshan/recur/recur.pdf),
which inspired some of the discussion in this topic.)
In the analogy, the input list portrays a
sequence of functional applications, where `[f1; f2; f3; x]` represents
`f1(f2(f3 x))`.  The limitation of the analogy is that it is only
possible to represent computations in which the applications are
always right-branching, i.e., the computation `((f1 f2) f3) x` cannot
be directly represented.
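
The right-branching reading of a list can be illustrated with a fold. Because OCaml lists are homogeneous, this sketch keeps the functions in a list and passes the argument `x` separately; `apply_all` and the sample functions are hypothetical names chosen for the example.

```ocaml
(* Read [f1; f2; f3] applied to x as f1 (f2 (f3 x)). *)
let apply_all (fs : ('a -> 'a) list) (x : 'a) : 'a =
  List.fold_right (fun f acc -> f acc) fs x

let f1 n = n + 1
let f2 n = n * 2
let f3 n = n - 3

(* f1 (f2 (f3 10)) = f1 (f2 7) = f1 14 = 15 *)
let result = apply_all [f1; f2; f3] 10
```

A fold of this shape can only nest applications to the right, which is the limitation the text describes.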

One way to extend this exercise would be to add a special symbol `'#'`,
and then the task would be to copy from the target `'S'` only back to
the closest `'#'`.  This would allow our task to simulate delimited
continuations with embedded `prompt`s (also called `reset`s).
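
One possible zipper-based reading of this extension is sketched below. It rests on two assumptions the text does not settle: an `'S'` with no preceding `'#'` copies the whole prefix, as before, and the `'#'` markers are dropped from the final output. The names `upto_prompt` and `tz_prompt` are hypothetical.

```ocaml
(* The unzipped part is stored in reverse, so the closest '#' is the first
   one we meet. Return the elements before it. *)
let rec upto_prompt (unzipped : char list) : char list =
  match unzipped with
  | [] -> []
  | '#' :: _ -> []
  | c :: rest -> c :: upto_prompt rest

let rec tz_prompt ((unzipped, zipped) : char list * char list) : char list =
  match zipped with
  | [] ->
    (* Assumption: strip the '#' markers from the result. *)
    List.rev (List.filter (fun c -> c <> '#') unzipped)
  | 'S' :: rest ->
    (* Copy only the material after the closest '#'. *)
    tz_prompt (List.append (upto_prompt unzipped) unzipped, rest)
  | c :: rest -> tz_prompt (c :: unzipped, rest)
```

Under these assumptions, "a#bSd" comes out as "abbd" (only the "b" is copied), while "abSd" still yields "ababd".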

The task is well-suited to the list zipper in part
because the List monad has an intimate connection with continuations.
We'll explore this next.