topics/week12_list_and_tree_zippers.mdwn

   1 [[!toc]]
   2
   3 ##List Zippers##
   4
   5 Say you've got some moderately-complex function for searching through a list, for example:
   6
   7         let find_nth (test : 'a -> bool) (n : int) (lst : 'a list) : (int * 'a) ->
   8             let rec helper (position : int) n lst =
   9                 match lst with
  10                 | [] -> failwith "not found"
  11                 | x :: xs when test x -> (if n = 1
  12                     then (position, x)
  13                     else helper (position + 1) (n - 1) xs
  14                 )
  15                 | x :: xs -> helper (position + 1) n xs
  16             in helper 0 n lst;;
  17
  18 This searches for the `n`th element of a list that satisfies the predicate `test`, and returns a pair containing the position of that element, and the element itself. (We follow the dominant convention of counting list positions from the left starting at 0.) Good. But now what if you wanted to retrieve a different kind of information, such as the `n`th element matching `test`, together with its preceding and succeeding elements? In a real situation, you'd want to develop some good strategy for reporting when the target element doesn't have a predecessor and successor; but we'll just simplify here and report them as having some default value:
  19
  20         let find_nth' (test : 'a -> bool) (n : int) (lst : 'a list) (default : 'a) : ('a * 'a * 'a) ->
  21             let rec helper (predecessor : 'a) n lst =
  22                 match lst with
  23                 | [] -> failwith "not found"
  24                 | x :: xs when test x -> (if n = 1
  25                     then (predecessor, x, match xs with [] -> default | y::ys -> y)
  26                     else helper x (n - 1) xs
  27                 )
  28                 | x :: xs -> helper x n xs
  29             in helper default n lst;;
  30
  31 This duplicates a lot of the structure of `find_nth`; it just has enough different code to retrieve different information when the matching element is found. But now what if you wanted to retrieve yet a different kind of information...?
  32
  33 Ideally, there should be some way to factor out the code to find the target element---the `n`th element of the list satisfying the predicate `test`---from the code that retrieves the information you want once the target is found. We might build upon the initial `find_nth` function, since that returns the *position* of the matching element. We could hand that result off to some other function that's designed to retrieve information of a specific sort surrounding that position. But suppose our list has millions of elements, and the target element is at position 600512. The search function will already have traversed 600512 elements of the list looking for the target, then the retrieval function would have to *start again from the beginning* and traverse those same 600512 elements again. It could go a bit faster, since it doesn't have to check each element against `test` as it traverses. It already knows how far it has to travel. But still, this should seem a bit wasteful.
  34
  35 Here's an idea. What if we had some way of representing a list as "broken" at a specific point. For example, if our base list is:
  36
  37         [10; 20; 30; 40; 50; 60; 70; 80; 90]
  38
  39 we might imagine the list "broken" at position 3 like this:
  40
  41                     40;
  42                 30;     50;
  43             20;             60;
  44         [10;                    70;
  45                                     80;
  46                                         90]
  47
  48 Then if we move one step forward in the list, it would be "broken" at position 4:
  49
  50                         50;
  51                     40;     60;
  52                 30;             70;
  53             20;                     80;
  54         [10;                            90]
  55
  56 If we had some convenient representation of these "broken" lists, then our search function could hand *that* off to the retrieval function, and the retrieval function could start right at the position where the list was broken, without having to start at the beginning and traverse many elements to get there. The retrieval function would also be able to inspect elements both forwards and backwards from the position where the list was "broken".
  57
  58 The kind of data structure we're looking for here is called a **list zipper**. To represent our first broken list, we'd use two lists: (1) containing the elements in the left branch, preceding the target or focused element, *in the order reverse to their appearance in the base list*. (2) containing the target or focus element and the rest of the list, in normal order. So:
  59
  60                     40;
  61                 30;     50;
  62             20;             60;
  63         [10;                    70;
  64                                     80;
  65                                         90]
  66
  67 would be represented as `([30; 20; 10], [40; 50; 60; 70; 80; 90])`. To move forward in the base list, we pop the head element `40` off of the head element of the second list in the zipper, and push it onto the first list, getting `([40; 30; 20; 10], [50; 60; 70; 80; 90])`. To move backwards again, we pop off of the first list, and push it onto the second. To reconstruct the base list, we just "moved backwards" until the first list is empty. (This is supposed to evoke the image of zipping up a zipper; hence the data structure's name.)
  68
  69 Last time we gave the class, we had some discussion of what's the right way to apply the "zipper" metaphor. I suggest it's best to think of the tab of the zipper being here:
  70
  71                  t
  72                   a
  73                    b
  74                     40;
  75                 30;     50;
  76             20;             60;
  77         [10;                    70;
  78                                     80;
  79                                         90]
  80
  81 And imagine that you're just seeing the left half of a real-zipper, rotated 60 degrees counter-clockwise. When the list is all "zipped up", we've "move backwards" to the state where the first element is targeted or in focus:
  82
  83     ([], [10; 20; 30; 40; 50; 60; 70; 80; 90])
  84
  85 However you understand the "zipper" metaphor, this is a very handy data structure, and it will become even more handy when we translate it over to more complicated base structures, like trees. To help get a good conceptual grip on how to do that, it's useful to introduce a kind of symbolism for talking about zippers. This is just a metalanguage notation, for us theorists.
  86
  87     [10; 20; 30; *; 50; 60; 70; 80; 90], * filled by 40
  88
  89 would represent a list zipper where the break is at position 3, and the element occupying that position is `40`. For a list zipper, this could be implemented using the pairs-of-lists structure described above.
  90
  91 Alternatively, we could present it in a form more like we used in the seminar for tree zippers:
  92
  93     in_focus = 40, context = (before = [30; 20; 10], after = [50; 60; 70; 80; 90])
  94
  95 In order to facilitate the discussion of tree zippers,
  96 let's consolidate a bit with a concrete implementation.
  97
  98 <pre>
  99 type int_list_zipper = int * (int list) * (int list)
 100 let zip_open (z:int_list_zipper):int_list_zipper = match z with
 101     focus, ls, r::rs -> r, focus::ls, rs
 102   | _ -> z
 103 let zip_closed (z:int_list_zipper):int_list_zipper = match z with
 104     focus, l::ls, rs -> l, ls, focus::rs
 105 </pre>
 106
 107 Here, an int list zipper is an int list with one element in focus.
 108 The context of that element is divided into two subparts: the left
 109 context, which gives the elements adjacent to the focussed element
 110 first (so looks reversed relative to the original list); and the right
 111 context, which is just the remainder of the list to the right of the
 112 focussed element.
 113
 114 Then we have the following behavior:
 115
 116 <pre>
 117 # let z1:int_list_zipper = 1, [], [2;3;4];;
 118 val z1 : int_list_zipper = (1, [], [2; 3; 4])
 119 # let z2 = zip_open z1;;
 120 val z2 : int_list_zipper = (2, [1], [3; 4])
 121 # let z3 = zip_open z2;;
 122 val z3 : int_list_zipper = (3, [2; 1], [4])
 123 # let z4 = zip_closed (zip_closed z3);;
 124 val z4 : int_list_zipper = (1, [], [2; 3; 4])
 125 # z4 = z1;;
 126 # - : bool = true
 127 </pre>
 128
 129
 130
 131 ##Tree Zippers##
 132
 133 Now how could we translate a zipper-like structure over to trees? What we're aiming for is a way to keep track of where we are in a tree, in the same way that the "broken" lists let us keep track of where we are in the base list.
 134
 135 Thus tree zippers are analogous to list zippers, but with one
 136 additional dimension to deal with: in addition to needing to shift
 137 focus to the left or to the right, we want to be able to shift the
 138 focus up or down.
 139
 140 In order to emphasize the similarity with list zippers, we'll use
 141 trees that are conceived of as lists of lists:
 142
 143     type tree = Leaf of int | Branch of tree list
 144
 145 On this conception, a tree is nothing more than a list of subtrees.
 146 For instance, we might have
 147
 148     let t1 = Branch [Leaf 1; Branch [Branch [Leaf 2; Leaf 3]; Leaf 4]];;
 149
 150     _|__
 151     |  |
 152     1  |
 153       _|__
 154       |  |
 155       |  4
 156      _|__
 157      |  |
 158      2  3
 159
 160 For simplicity, we'll work with trees that don't have labels on their
 161 internal nodes.  Note that there can be any number of siblings, though
 162 we'll work with binary trees here to prevent clutter.
 163
 164     _*__
 165     |  |
 166     1  |
 167       _|__
 168       |  |
 169       |  4
 170      _|__
 171      |  |
 172      2  3
 173
 174 How should we represent a tree with the starred subtree in focus?
 175 Well, that's easy: the focussed element is the entire tree, and the
 176 context is null.  We'll represent this as the ordered pair (t1, Root).
 177
 178     _|__
 179     |  |
 180     1  |
 181       _*__
 182       |  |
 183       |  4
 184      _|__
 185      |  |
 186      2  3
 187
 188 How should we represent a tree with this other starred subtree in
 189 focus?  Well, viewing this tree as a list of subtrees, we've merely
 190 put the second element of the list in focus.  We can almost just use
 191 the list zipper technique from the previous section:
 192
 193     Branch (Branch (Leaf 2, Leaf 3), Leaf 4), ([Leaf 1], [])
 194
 195 This is just a list zipper where the list elements are trees instead
 196 of ints.
 197
 198 But this won't be quite enough if we go one more level down.
 199
 200     _|__
 201     |  |
 202     1  |
 203       _|__
 204       |  |
 205       |  4
 206      _*__
 207      |  |
 208      2  3
 209
 210 The focussed element is the subtree Branch (Leaf 2, Leaf 3).
 211 And we know how to represent the part of the context that involves the
 212 siblings of the focussed tree:
 213
 214     Branch (Leaf 2, Leaf 3), ([], [Leaf 4])
 215
 216 We still need to add the rest of the context.  But just computed that
 217 context a minute ago.  It was ([Leaf 1], []).  If we add it here, we get:
 218
 219     Branch (Leaf 2, Leaf 3), ([], [Leaf 4], ([Leaf 1], [])
 220
 221 Here's the type suggested by this idea:
 222
 223     type context = Root | Context of (tree list) * (tree list) * context
 224     type zipper = tree * context
 225
 226 We can gloss `Context of (tree list) * (tree list) * context` as
 227 `Context of (left siblings) * (right siblings) * (context of parent)`.
 228
 229 Here, then, is the full tree zipper we've been looking for:
 230
 231   (Branch [Leaf 2; Leaf 3],
 232    Context ([], [Leaf 4], Context ([Leaf 1], [], Root)))
 233
 234 Just as with the simple list zipper, note that elements that are near
 235 the focussed element in the tree are near the focussed element in the
 236 zipper representation.  This is what makes it easy to shift the
 237 focus to nearby elements in the tree.
 238
 239 It should be clear that moving left and right in the tree zipper is
 240 just like moving left and right in a list zipper.
 241
 242 Moving down requires looking inside the tree in focus, and grabbing
 243 hold of one of its subtrees.  Since that subtree is the new focus, its
 244 context will be the list zipper consisting of its siblings (also
 245 recovered from the original focus).
 246
 247     let downleft (z:zipper):zipper = match z with
 248         (Branch (l::rest)), context -> l, Context ([], rest, context)
 249
 250 Moving up involves gathering up the left and right siblings and
 251 re-building the original subtree.  It's easiest to do this when the
 252 sibling zipper is fully closed, i.e., when the list of left siblings
 253 is empty:
 254
 255     let rec up (z:zipper):zipper = match z with
 256         focus, Context ([], rs, rest) -> Branch (focus::rs), rest
 257       | focus, Context (l::ls, _, _) -> up (left z)
 258
 259 The second match says that if the list of left siblings isn't empty,
 260 we just shift focus left and try again.
 261
 262 This tree zipper works for trees with arbitrary numbers of siblings
 263 per subtree.  If one were designing a tree zipper for a more
 264 restricted kind of tree, however, such as a binary tree, one would
 265 probably not represent siblings with a list zipper, but with something
 266 more special-purpose and economical.
 267
 268 With these functions, we can refocus on any part of the tree.
 269 Here's a complete tour:
 270
 271 <pre>
 272 # let z1 = (t1, Root);;
 273 val z1 : zipper =
 274   (Branch [Leaf 1; Branch [Branch [Leaf 2; Leaf 3]; Leaf 4]], Root)
 275 # let z2 = downleft z1;;
 276 val z2 : zipper =
 277   (Leaf 1, Context ([], [Branch [Branch [Leaf 2; Leaf 3]; Leaf 4]], Root))
 278 # let z3 = right z2;;
 279 val z3 : zipper =
 280   (Branch [Branch [Leaf 2; Leaf 3]; Leaf 4], Context ([Leaf 1], [], Root))
 281 # let z4 = downleft z3;;
 282 val z4 : zipper =
 283   (Branch [Leaf 2; Leaf 3],
 284    Context ([], [Leaf 4], Context ([Leaf 1], [], Root)))
 285 # let z5 = downleft z4;;
 286 val z5 : zipper =
 287   (Leaf 2,
 288    Context ([], [Leaf 3],
 289     Context ([], [Leaf 4], Context ([Leaf 1], [], Root))))
 290 # let z6 = right z5;;
 291 val z6 : zipper =
 292   (Leaf 3,
 293    Context ([Leaf 2], [],
 294     Context ([], [Leaf 4], Context ([Leaf 1], [], Root))))
 295 # let z7 = up z6;;
 296 val z7 : zipper =
 297   (Branch [Leaf 2; Leaf 3],
 298    Context ([], [Leaf 4], Context ([Leaf 1], [], Root)))
 299 # let z8 = right z7;;
 300 val z8 : zipper =
 301   (Leaf 4,
 302    Context ([Branch [Leaf 2; Leaf 3]], [], Context ([Leaf 1], [], Root)))
 303 # let z9 = up z8;;
 304 val z9 : zipper =
 305   (Branch [Branch [Leaf 2; Leaf 3]; Leaf 4], Context ([Leaf 1], [], Root))
 306 # let z10 = up z9;;
 307 val z10 : zipper =
 308   (Branch [Leaf 1; Branch [Branch [Leaf 2; Leaf 3]; Leaf 4]], Root)
 309 # z10 = z1;;
 310 - : bool = true
 311 # z10 == z1;;
 312 - : bool = false
 313 </pre>
 314
 315 Here's more on zippers:
 316
 317 *       [[!wikipedia Zipper (data structure)]]
 318 *       [Haskell wikibook on zippers](http://en.wikibooks.org/wiki/Haskell/Zippers)
 319 *       Huet, Gerard. ["Functional Pearl: The Zipper"](http://www.st.cs.uni-sb.de/edu/seminare/2005/advanced-fp/docs/huet-zipper.pdf) Journal of Functional Programming 7 (5): 549-554, September 1997.
 320 *       As always, [Oleg](http://okmij.org/ftp/continuations/Continuations.html#zipper) takes this a few steps deeper.