topics/week12_list_and_tree_zippers.mdwn

   1 <!-- λ ◊ ≠ ∃ Λ ∀ ≡ α β γ ρ ω φ ψ Ω ○ μ η δ ζ ξ ⋆ ★ • ∙ ● ⚫ 𝟎 𝟏 𝟐 𝟘 𝟙 𝟚 𝟬 𝟭 𝟮 ⇧ (U+2e17) ¢ -->
   2
   3 [[!toc]]
   4
   5 ##List Zippers##
   6
   7 Say you've got some moderately-complex function for searching through a list, for example:
   8
   9         let find_nth (test : 'a -> bool) (n : int) (lst : 'a list) : (int * 'a) ->
  10             let rec helper (position : int) n lst =
  11                 match lst with
  12                 | [] -> failwith "not found"
  13                 | x :: xs when test x -> (if n = 1
  14                     then (position, x)
  15                     else helper (position + 1) (n - 1) xs
  16                 )
  17                 | x :: xs -> helper (position + 1) n xs
  18             in helper 0 n lst;;
  19
  20 This searches for the `n`th element of a list that satisfies the predicate `test`, and returns a pair containing the position of that element, and the element itself. (We follow the dominant convention of counting list positions from the left starting at 0.) Good. But now what if you wanted to retrieve a different kind of information, such as the `n`th element matching `test`, together with its preceding and succeeding elements? In a real situation, you'd want to develop some good strategy for reporting when the target element doesn't have a predecessor and successor; but we'll just simplify here and report them as having some default value:
  21
  22         let find_nth' (test : 'a -> bool) (n : int) (lst : 'a list) (default : 'a) : ('a * 'a * 'a) ->
  23             let rec helper (predecessor : 'a) n lst =
  24                 match lst with
  25                 | [] -> failwith "not found"
  26                 | x :: xs when test x -> (if n = 1
  27                     then (predecessor, x, match xs with [] -> default | y::ys -> y)
  28                     else helper x (n - 1) xs
  29                 )
  30                 | x :: xs -> helper x n xs
  31             in helper default n lst;;
  32
  33 This duplicates a lot of the structure of `find_nth`; it just has enough different code to retrieve different information when the matching element is found. But now what if you wanted to retrieve yet a different kind of information...?
  34
  35 Ideally, there should be some way to factor out the code to find the target element---the `n`th element of the list satisfying the predicate `test`---from the code that retrieves the information you want once the target is found. We might build upon the initial `find_nth` function, since that returns the *position* of the matching element. We could hand that result off to some other function that's designed to retrieve information of a specific sort surrounding that position. But suppose our list has millions of elements, and the target element is at position 600512. The search function will already have traversed 600512 elements of the list looking for the target, then the retrieval function would have to *start again from the beginning* and traverse those same 600512 elements again. It could go a bit faster, since it doesn't have to check each element against `test` as it traverses. It already knows how far it has to travel. But still, this should seem a bit wasteful.
  36
  37 Here's an idea. What if we had some way of representing a list as "broken" at a specific point. For example, if our base list is:
  38
  39         [10; 20; 30; 40; 50; 60; 70; 80; 90]
  40
  41 we might imagine the list "broken" at position 3 like this:
  42
  43                     40;
  44                 30;     50;
  45             20;             60;
  46         [10;                    70;
  47                                     80;
  48                                         90]
  49
  50 Then if we move one step forward in the list, it would be "broken" at position 4:
  51
  52                         50;
  53                     40;     60;
  54                 30;             70;
  55             20;                     80;
  56         [10;                            90]
  57
  58 If we had some convenient representation of these "broken" lists, then our search function could hand *that* off to the retrieval function, and the retrieval function could start right at the position where the list was broken, without having to start at the beginning and traverse many elements to get there. The retrieval function would also be able to inspect elements both forwards and backwards from the position where the list was "broken".
  59
  60 The kind of data structure we're looking for here is called a **list zipper**. To represent our first broken list, we'd use two lists: (1) containing the elements in the left branch, preceding the target or focused element, *in the order reverse to their appearance in the base list*. (2) containing the target or focus element and the rest of the list, in normal order. So:
  61
  62                     40;
  63                 30;     50;
  64             20;             60;
  65         [10;                    70;
  66                                     80;
  67                                         90]
  68
  69 would be represented as `([30; 20; 10], [40; 50; 60; 70; 80; 90])`. To move forward in the base list, we pop the head element `40` off of the head element of the second list in the zipper, and push it onto the first list, getting `([40; 30; 20; 10], [50; 60; 70; 80; 90])`. To move backwards again, we pop off of the first list, and push it onto the second. To reconstruct the base list, we just "moved backwards" until the first list is empty. (This is supposed to evoke the image of zipping up a zipper; hence the data structure's name.)
  70
  71 Last time we gave the class, we had some discussion of what's the right way to apply the "zipper" metaphor. I suggest it's best to think of the tab of the zipper being here:
  72
  73                  t
  74                   a
  75                    b
  76                     40;
  77                 30;     50;
  78             20;             60;
  79         [10;                    70;
  80                                     80;
  81                                         90]
  82
  83 And imagine that you're just seeing the left half of a real-zipper, rotated 60 degrees counter-clockwise. When the list is all "zipped up", we've "move backwards" to the state where the first element is targeted or in focus:
  84
  85     ([], [10; 20; 30; 40; 50; 60; 70; 80; 90])
  86
  87 However you understand the "zipper" metaphor, this is a very handy data structure, and it will become even more handy when we translate it over to more complicated base structures, like trees. To help get a good conceptual grip on how to do that, it's useful to introduce a kind of symbolism for talking about zippers. This is just a metalanguage notation, for us theorists.
  88
  89     [10; 20; 30; *; 50; 60; 70; 80; 90], * filled by 40
  90
  91 would represent a list zipper where the break is at position 3, and the element occupying that position is `40`. For a list zipper, this could be implemented using the pairs-of-lists structure described above.
  92
  93 Alternatively, we could present it in a form more like we used in the seminar for tree zippers:
  94
  95     in_focus = 40, context = (before = [30; 20; 10], after = [50; 60; 70; 80; 90])
  96
  97 In order to facilitate the discussion of tree zippers,
  98 let's consolidate a bit with a concrete implementation.
  99
 100 <pre>
 101 type int_list_zipper = int * (int list) * (int list)
 102 let zip_open (z:int_list_zipper):int_list_zipper = match z with
 103     focus, ls, r::rs -> r, focus::ls, rs
 104   | _ -> z
 105 let zip_closed (z:int_list_zipper):int_list_zipper = match z with
 106     focus, l::ls, rs -> l, ls, focus::rs
 107 </pre>
 108
 109 Here, an int list zipper is an int list with one element in focus.
 110 The context of that element is divided into two subparts: the left
 111 context, which gives the elements adjacent to the focussed element
 112 first (so looks reversed relative to the original list); and the right
 113 context, which is just the remainder of the list to the right of the
 114 focussed element.
 115
 116 Then we have the following behavior:
 117
 118 <pre>
 119 # let z1:int_list_zipper = 1, [], [2;3;4];;
 120 val z1 : int_list_zipper = (1, [], [2; 3; 4])
 121 # let z2 = zip_open z1;;
 122 val z2 : int_list_zipper = (2, [1], [3; 4])
 123 # let z3 = zip_open z2;;
 124 val z3 : int_list_zipper = (3, [2; 1], [4])
 125 # let z4 = zip_closed (zip_closed z3);;
 126 val z4 : int_list_zipper = (1, [], [2; 3; 4])
 127 # z4 = z1;;
 128 # - : bool = true
 129 </pre>
 130
 131
 132
 133 ##Tree Zippers##
 134
 135 Now how could we translate a zipper-like structure over to trees? What we're aiming for is a way to keep track of where we are in a tree, in the same way that the "broken" lists let us keep track of where we are in the base list.
 136
 137 Thus tree zippers are analogous to list zippers, but with one
 138 additional dimension to deal with: in addition to needing to shift
 139 focus to the left or to the right, we want to be able to shift the
 140 focus up or down.
 141
 142 In order to emphasize the similarity with list zippers, we'll use
 143 trees that are conceived of as lists of lists:
 144
 145     type tree = Leaf of int | Branch of tree list
 146
 147 On this conception, a tree is nothing more than a list of subtrees.
 148 For instance, we might have
 149
 150     let t1 = Branch [Leaf 1; Branch [Branch [Leaf 2; Leaf 3]; Leaf 4]];;
 151
 152     _|__
 153     |  |
 154     1  |
 155       _|__
 156       |  |
 157       |  4
 158      _|__
 159      |  |
 160      2  3
 161
 162 For simplicity, we'll work with trees that don't have labels on their
 163 internal nodes.  Note that there can be any number of siblings, though
 164 we'll work with binary trees here to prevent clutter.
 165
 166     _*__
 167     |  |
 168     1  |
 169       _|__
 170       |  |
 171       |  4
 172      _|__
 173      |  |
 174      2  3
 175
 176 How should we represent a tree with the starred subtree in focus?
 177 Well, that's easy: the focussed element is the entire tree, and the
 178 context is null.  We'll represent this as the ordered pair (t1, Root).
 179
 180     _|__
 181     |  |
 182     1  |
 183       _*__
 184       |  |
 185       |  4
 186      _|__
 187      |  |
 188      2  3
 189
 190 How should we represent a tree with this other starred subtree in
 191 focus?  Well, viewing this tree as a list of subtrees, we've merely
 192 put the second element of the list in focus.  We can almost just use
 193 the list zipper technique from the previous section:
 194
 195     Branch (Branch (Leaf 2, Leaf 3), Leaf 4), ([Leaf 1], [])
 196
 197 This is just a list zipper where the list elements are trees instead
 198 of ints.
 199
 200 But this won't be quite enough if we go one more level down.
 201
 202     _|__
 203     |  |
 204     1  |
 205       _|__
 206       |  |
 207       |  4
 208      _*__
 209      |  |
 210      2  3
 211
 212 The focussed element is the subtree Branch (Leaf 2, Leaf 3).
 213 And we know how to represent the part of the context that involves the
 214 siblings of the focussed tree:
 215
 216     Branch (Leaf 2, Leaf 3), ([], [Leaf 4])
 217
 218 We still need to add the rest of the context.  But we just computed that
 219 context a minute ago.  It was ([Leaf 1], []).  If we add it here, we get:
 220
 221     Branch (Leaf 2, Leaf 3), ([], [Leaf 4], ([Leaf 1], [])
 222
 223 Here's the type suggested by this idea:
 224
 225     type context = Root | Context of (tree list) * (tree list) * context
 226     type zipper = tree * context
 227
 228 We can gloss the triple `(tree list) * (tree list) * context` as
 229 `(left siblings) * (right siblings) * (context of parent)`.
 230
 231 Here, then, is the full tree zipper we've been looking for:
 232
 233     (Branch [Leaf 2; Leaf 3],
 234      Context ([], [Leaf 4], Context ([Leaf 1], [], Root)))
 235
 236 Just as with the simple list zipper, note that elements that are near
 237 the focussed element in the tree are near the focussed element in the
 238 zipper representation.  This is what makes it easy to shift the
 239 focus to nearby elements in the tree.
 240
 241 It should be clear that moving left and right in the tree zipper is
 242 just like moving left and right in a list zipper.
 243
 244 Moving down requires looking inside the tree in focus, and grabbing
 245 hold of one of its subtrees.  Since that subtree is the new focus, its
 246 context will be the list zipper consisting of its siblings (also
 247 recovered from the original focus).
 248
 249     let downleft (z:zipper):zipper = match z with
 250         (Branch (l::rest)), context -> l, Context ([], rest, context)
 251
 252 Moving up involves gathering up the left and right siblings and
 253 re-building the original subtree.  It's easiest to do this when the
 254 sibling zipper is fully closed, i.e., when the list of left siblings
 255 is empty:
 256
 257     let rec up (z:zipper):zipper = match z with
 258         focus, Context ([], rs, rest) -> Branch (focus::rs), rest
 259       | focus, Context (l::ls, _, _) -> up (left z)
 260
 261 The second match says that if the list of left siblings isn't empty,
 262 we just shift focus left and try again.
 263
 264 This tree zipper works for trees with arbitrary numbers of siblings
 265 per subtree.  If one were designing a tree zipper for a more
 266 restricted kind of tree, however, such as a binary tree, one would
 267 probably not represent siblings with a list zipper, but with something
 268 more special-purpose and economical.
 269
 270 With these functions, we can refocus on any part of the tree.
 271 Let's abbreviate a tree zipper like this:
 272
 273     [2;3], ([] [4] ([1], [], Root))
 274
 275     ≡ (Branch [Leaf 2; Leaf 3],
 276        Context ([], [Leaf 4], Context ([Leaf 1], [], Root)))
 277
 278 Then we can take a tour of the original tree like this:
 279
 280 <pre>
 281 _|__
 282 |  |
 283 1  |
 284   _|__
 285   |  |
 286   |  4
 287  _|__
 288  |  |
 289  2  3
 290
 291     [1;[[2;3];4]],Root           =                                                              [1;[[2;3];4]],Root
 292   downleft                                                                                               up
 293 1, ([],[[2;3];4],Root) right [[2;3];4],([1],[],Root)                                            [[2;3];4],([1],[],Root)
 294                            downleft                                                                      up
 295                         [2;3],([],[4],([1],[],Root))         [2;3],([],[4],([1],[],Root)) right 4,([2;3],[],([1],[],Root))
 296                       downleft                                           up
 297                     2, ([],[3],([],[4],([1],[],Root))) right 3, ([2],[],([],[4],([1],[],Root)))
 298 </pre>
 299
 300 Here's more on zippers:
 301
 302 *       [[!wikipedia Zipper (data structure)]]
 303 *       [Haskell wikibook on zippers](http://en.wikibooks.org/wiki/Haskell/Zippers)
 304 *       Huet, Gerard. ["Functional Pearl: The Zipper"](http://www.st.cs.uni-sb.de/edu/seminare/2005/advanced-fp/docs/huet-zipper.pdf) Journal of Functional Programming 7 (5): 549-554, September 1997.
 305 *       As always, [Oleg](http://okmij.org/ftp/continuations/Continuations.html#zipper) takes this a few steps deeper.