manipulating_trees_with_monads.mdwn

   1 [[!toc]]
   2
   3 Manipulating trees with monads
   4 ------------------------------
   5
   6 This topic develops an idea based on a detailed suggestion of Ken
   7 Shan's.  We'll build a series of functions that operate on trees,
   8 doing various things, including replacing leaves, counting nodes, and
   9 converting a tree to a list of leaves.  The end result will be an
  10 application for continuations.
  11
  12 From an engineering standpoint, we'll build a tree transformer that
  13 deals in monads.  We can modify the behavior of the system by swapping
  14 one monad for another.  We've already seen how adding a monad can add
  15 a layer of funtionality without disturbing the underlying system, for
  16 instance, in the way that the Reader monad allowed us to add a layer
  17 of intensionality to an extensional grammar, but we have not yet seen
  18 the utility of replacing one monad with other.
  19
  20 First, we'll be needing a lot of trees for the remainder of the
  21 course.  Here again is a type constructor for leaf-labeled, binary trees:
  22
  23     type 'a tree = Leaf of 'a | Node of ('a tree * 'a tree)
  24
  25 [How would you adjust the type constructor to allow for labels on the
  26 internal nodes?]
  27
  28 We'll be using trees where the nodes are integers, e.g.,
  29
  30
  31         let t1 = Node (Node (Leaf 2, Leaf 3),
  32                        Node (Leaf 5, Node (Leaf 7,
  33                                               Leaf 11)))
  34             .
  35          ___|___
  36          |     |
  37          .     .
  38         _|_   _|__
  39         |  |  |  |
  40         2  3  5  .
  41                 _|__
  42                 |  |
  43                 7  11
  44
  45 Our first task will be to replace each leaf with its double:
  46
  47         let rec tree_map (leaf_modifier : 'a -> 'b) (t : 'a tree) : 'b tree =
  48           match t with
  49             | Leaf i -> Leaf (leaf_modifier i)
  50             | Node (l, r) -> Node (tree_map leaf_modifier l,
  51                                    tree_map leaf_modifier r);;
  52
  53 `tree_map` takes a function that transforms old leaves into new leaves,
  54 and maps that function over all the leaves in the tree, leaving the
  55 structure of the tree unchanged.  For instance:
  56
  57         let double i = i + i;;
  58         tree_map double t1;;
  59         - : int tree =
  60         Node (Node (Leaf 4, Leaf 6), Node (Leaf 10, Node (Leaf 14, Leaf 22)))
  61
  62             .
  63          ___|____
  64          |      |
  65          .      .
  66         _|__  __|__
  67         |  |  |   |
  68         4  6  10  .
  69                 __|___
  70                 |    |
  71                 14   22
  72
  73 We could have built the doubling operation right into the `tree_map`
  74 code.  However, because we've left what to do to each leaf as a parameter, we can
  75 decide to do something else to the leaves without needing to rewrite
  76 `tree_map`.  For instance, we can easily square each leaf instead by
  77 supplying the appropriate `int -> int` operation in place of `double`:
  78
  79         let square i = i * i;;
  80         tree_map square t1;;
  81         - : int tree =ppp
  82         Node (Node (Leaf 4, Leaf 9), Node (Leaf 25, Node (Leaf 49, Leaf 121)))
  83
  84 Note that what `tree_map` does is take some unchanging contextual
  85 information---what to do to each leaf---and supplies that information
  86 to each subpart of the computation.  In other words, `tree_map` has the
  87 behavior of a Reader monad.  Let's make that explicit.
  88
  89 In general, we're on a journey of making our `tree_map` function more and
  90 more flexible.  So the next step---combining the tree transformer with
  91 a Reader monad---is to have the `tree_map` function return a (monadized)
  92 tree that is ready to accept any `int -> int` function and produce the
  93 updated tree.
  94
  95 \tree (. (. (f 2) (f 3)) (. (f 5) (. (f 7) (f 11))))
  96
  97         \f      .
  98            _____|____
  99            |        |
 100            .        .
 101          __|___   __|___
 102          |    |   |    |
 103         f 2  f 3  f 5  .
 104                      __|___
 105                      |    |
 106                     f 7  f 11
 107
 108 That is, we want to transform the ordinary tree `t1` (of type `int
 109 tree`) into a reader object of type `(int -> int) -> int tree`: something
 110 that, when you apply it to an `int -> int` function `f` returns an `int
 111 tree` in which each leaf `i` has been replaced with `f i`.
 112
 113 With previous readers, we always knew which kind of environment to
 114 expect: either an assignment function (the original calculator
 115 simulation), a world (the intensionality monad), an integer (the
 116 Jacobson-inspired link monad), etc.  In the present case, we expect that our "environment" will be some function of type `int -> int`. "Looking up" some `int` in the environment will return us the `int` that comes out the other side of that function.
 117
 118         type 'a reader = (int -> int) -> 'a;;  (* mnemonic: e for environment *)
 119         let reader_unit (a : 'a) : 'a reader = fun _ -> a;;
 120         let reader_bind (u: 'a reader) (f : 'a -> 'b reader) : 'b reader = fun e -> f (u e) e;;
 121
 122 It would be a simple matter to turn an *integer* into an `int reader`:
 123
 124         let int_readerize : int -> int reader = fun (a : int) -> fun (modifier : int -> int) -> modifier a;;
 125         int_readerize 2 (fun i -> i + i);;
 126         - : int = 4
 127
 128 But how do we do the analagous transformation when our `int`s are scattered over the leaves of a tree? How do we turn an `int tree` into a reader?
 129 A tree is not the kind of thing that we can apply a
 130 function of type `int -> int` to.
 131
 132 But we can do this:
 133
 134         let rec tree_monadize (f : 'a -> 'b reader) (t : 'a tree) : 'b tree reader =
 135             match t with
 136             | Leaf a -> reader_bind (f a) (fun b -> reader_unit (Leaf b))
 137             | Node (l, r) -> reader_bind (tree_monadize f l) (fun l' ->
 138                                reader_bind (tree_monadize f r) (fun r' ->
 139                                  reader_unit (Node (l', r'))));;
 140
 141 This function says: give me a function `f` that knows how to turn
 142 something of type `'a` into an `'b reader`---this is a function of the same type that you could bind an `'a reader` to---and I'll show you how to
 143 turn an `'a tree` into an `'b tree reader`.  That is, if you show me how to do this:
 144
 145                       ------------
 146           1     --->  |    1     |
 147                       ------------
 148
 149 then I'll give you back the ability to do this:
 150
 151                       ____________
 152           .           |    .     |
 153         __|___  --->  |  __|___  |
 154         |    |        |  |    |  |
 155         1    2        |  1    2  |
 156                       ------------
 157
 158 And how will that boxed tree behave? Whatever actions you perform on it will be transmitted down to corresponding operations on its leaves. For instance, our `int reader` expects an `int -> int` environment. If supplying environment `e` to our `int reader` doubles the contained `int`:
 159
 160                       ------------
 161           1     --->  |    1     |  applied to e  ~~>  2
 162                       ------------
 163
 164 Then we can expect that supplying it to our `int tree reader` will double all the leaves:
 165
 166                       ____________
 167           .           |    .     |                      .
 168         __|___  --->  |  __|___  | applied to e  ~~>  __|___
 169         |    |        |  |    |  |                    |    |
 170         1    2        |  1    2  |                    2    4
 171                       ------------
 172
 173 In more fanciful terms, the `tree_monadize` function builds plumbing that connects all of the leaves of a tree into one connected monadic network; it threads the
 174 `'b reader` monad through the original tree's leaves.
 175
 176         # tree_monadize int_readerize t1 double;;
 177         - : int tree =
 178         Node (Node (Leaf 4, Leaf 6), Node (Leaf 10, Node (Leaf 14, Leaf 22)))
 179
 180 Here, our environment is the doubling function (`fun i -> i + i`).  If
 181 we apply the very same `int tree reader` (namely, `tree_monadize
 182 int_readerize t1`) to a different `int -> int` function---say, the
 183 squaring function, `fun i -> i * i`---we get an entirely different
 184 result:
 185
 186         # tree_monadize int_readerize t1 square;;
 187         - : int tree =
 188         Node (Node (Leaf 4, Leaf 9), Node (Leaf 25, Node (Leaf 49, Leaf 121)))
 189
 190 Now that we have a tree transformer that accepts a *reader* monad as a
 191 parameter, we can see what it would take to swap in a different monad.
 192
 193 For instance, we can use a State monad to count the number of leaves in
 194 the tree.
 195
 196         type 'a state = int -> 'a * int;;
 197         let state_unit a = fun s -> (a, s);;
 198         let state_bind u f = fun s -> let (a, s') = u s in f a s';;
 199
 200 Gratifyingly, we can use the `tree_monadize` function without any
 201 modification whatsoever, except for replacing the (parametric) type
 202 `'b reader` with `'b state`, and substituting in the appropriate unit and bind:
 203
 204         let rec tree_monadize (f : 'a -> 'b state) (t : 'a tree) : 'b tree state =
 205             match t with
 206             | Leaf a -> state_bind (f a) (fun b -> state_unit (Leaf b))
 207             | Node (l, r) -> state_bind (tree_monadize f l) (fun l' ->
 208                                state_bind (tree_monadize f r) (fun r' ->
 209                                  state_unit (Node (l', r'))));;
 210
 211 Then we can count the number of leaves in the tree:
 212
 213         # tree_monadize (fun a -> fun s -> (a, s+1)) t1 0;;
 214         - : int tree * int =
 215         (Node (Node (Leaf 2, Leaf 3), Node (Leaf 5, Node (Leaf 7, Leaf 11))), 5)
 216
 217             .
 218          ___|___
 219          |     |
 220          .     .
 221         _|__  _|__
 222         |  |  |  |
 223         2  3  5  .
 224                 _|__
 225                 |  |
 226                 7  11
 227
 228 Why does this work? Because the operation `fun a -> fun s -> (a, s+1)` takes an `int` and wraps it in an `int state` monadic box that increments the state. When we give that same operations to our `tree_monadize` function, it then wraps an `int tree` in a box, one that does the same state-incrementing for each of its leaves.
 229
 230 One more revealing example before getting down to business: replacing
 231 `state` everywhere in `tree_monadize` with `list` gives us
 232
 233         # tree_monadize (fun i -> [ [i; square i] ]) t1;;
 234         - : int list tree list =
 235         [Node
 236           (Node (Leaf [2; 4], Leaf [3; 9]),
 237            Node (Leaf [5; 25], Node (Leaf [7; 49], Leaf [11; 121])))]
 238
 239 Unlike the previous cases, instead of turning a tree into a function
 240 from some input to a result, this transformer replaces each `int` with
 241 a list of `int`'s. We might also have done this with a Reader monad, though then our environments would need to be of type `int -> int list`. Experiment with what happens if you supply the `tree_monadize` based on the List monad an operation like `fun -> [ i; [2*i; 3*i] ]`. Use small trees for your experiment.
 242
 243
 244 <!--
 245 FIXME: We don't make it clear why the fun has to be int -> int list list, instead of int -> int list
 246 -->
 247
 248
 249 Now for the main point.  What if we wanted to convert a tree to a list
 250 of leaves?
 251
 252         type ('a, 'r) continuation = ('a -> 'r) -> 'r;;
 253         let continuation_unit a = fun k -> k a;;
 254         let continuation_bind u f = fun k -> u (fun a -> f a k);;
 255
 256         let rec tree_monadize (f : 'a -> ('b, 'r) continuation) (t : 'a tree) : ('b tree, 'r) continuation =
 257             match t with
 258             | Leaf a -> continuation_bind (f a) (fun b -> continuation_unit (Leaf b))
 259             | Node (l, r) -> continuation_bind (tree_monadize f l) (fun l' ->
 260                                continuation_bind (tree_monadize f r) (fun r' ->
 261                                  continuation_unit (Node (l', r'))));;
 262
 263 We use the Continuation monad described above, and insert the
 264 `continuation` type in the appropriate place in the `tree_monadize` code. Then if we give the `tree_monadize` function an operation that converts `int`s into `'b`-wrapping Continuation monads, it will give us back a way to turn `int tree`s into corresponding `'b tree`-wrapping Continuation monads.
 265
 266 So for example, we compute:
 267
 268         # tree_monadize (fun a -> fun k -> a :: k a) t1 (fun t -> []);;
 269         - : int list = [2; 3; 5; 7; 11]
 270
 271 We have found a way of collapsing a tree into a list of its leaves. Can you trace how this is working? Think first about what the operation `fun a -> fun k -> a :: k a` does when you apply it to a plain `int`, and the continuation `fun _ -> []`. Then given what we've said about `tree_monadize`, what should we expect `tree_monadize (fun a -> fun k -> a :: k a` to do?
 272
 273 The Continuation monad is amazingly flexible; we can use it to
 274 simulate some of the computations performed above.  To see how, first
 275 note that an interestingly uninteresting thing happens if we use
 276 `continuation_unit` as our first argument to `tree_monadize`, and then
 277 apply the result to the identity function:
 278
 279         # tree_monadize continuation_unit t1 (fun t -> t);;
 280         - : int tree =
 281         Node (Node (Leaf 2, Leaf 3), Node (Leaf 5, Node (Leaf 7, Leaf 11)))
 282
 283 That is, nothing happens.  But we can begin to substitute more
 284 interesting functions for the first argument of `tree_monadize`:
 285
 286         (* Simulating the tree reader: distributing a operation over the leaves *)
 287         # tree_monadize (fun a -> fun k -> k (square a)) t1 (fun t -> t);;
 288         - : int tree =
 289         Node (Node (Leaf 4, Leaf 9), Node (Leaf 25, Node (Leaf 49, Leaf 121)))
 290
 291         (* Simulating the int list tree list *)
 292         # tree_monadize (fun a -> fun k -> k [a; square a]) t1 (fun t -> t);;
 293         - : int list tree =
 294         Node
 295          (Node (Leaf [2; 4], Leaf [3; 9]),
 296           Node (Leaf [5; 25], Node (Leaf [7; 49], Leaf [11; 121])))
 297
 298         (* Counting leaves *)
 299         # tree_monadize (fun a -> fun k -> 1 + k a) t1 (fun t -> 0);;
 300         - : int = 5
 301
 302 We could simulate the tree state example too, but it would require
 303 generalizing the type of the Continuation monad to
 304
 305         type ('a, 'b, 'c) continuation = ('a -> 'b) -> 'c;;
 306
 307 If you want to see how to parameterize the definition of the `tree_monadize` function, so that you don't have to keep rewriting it for each new monad, see [this code](/code/tree_monadize.ml).
 308
 309
 310 The Binary Tree monad
 311 ---------------------
 312
 313 Of course, by now you may have realized that we have discovered a new
 314 monad, the Binary Tree monad:
 315
 316         type 'a tree = Leaf of 'a | Node of ('a tree) * ('a tree);;
 317         let tree_unit (a: 'a) : 'a tree = Leaf a;;
 318         let rec tree_bind (u : 'a tree) (f : 'a -> 'b tree) : 'b tree =
 319             match u with
 320             | Leaf a -> f a
 321             | Node (l, r) -> Node (tree_bind l f, tree_bind r f);;
 322
 323 For once, let's check the Monad laws.  The left identity law is easy:
 324
 325     Left identity: bind (unit a) f = bind (Leaf a) f = f a
 326
 327 To check the other two laws, we need to make the following
 328 observation: it is easy to prove based on `tree_bind` by a simple
 329 induction on the structure of the first argument that the tree
 330 resulting from `bind u f` is a tree with the same strucure as `u`,
 331 except that each leaf `a` has been replaced with `f a`:
 332
 333 \tree (. (f a1) (. (. (. (f a2) (f a3)) (f a4)) (f a5)))
 334
 335                         .                         .
 336                       __|__                     __|__
 337                       |   |                     |   |
 338                       a1  .                   f a1  .
 339                          _|__                     __|__
 340                          |  |                     |   |
 341                          .  a5                    .  f a5
 342            bind         _|__       f   =        __|__
 343                         |  |                    |   |
 344                         .  a4                   .  f a4
 345                       __|__                   __|___
 346                       |   |                   |    |
 347                       a2  a3                f a2  f a3
 348
 349 Given this equivalence, the right identity law
 350
 351         Right identity: bind u unit = u
 352
 353 falls out once we realize that
 354
 355         bind (Leaf a) unit = unit a = Leaf a
 356
 357 As for the associative law,
 358
 359         Associativity: bind (bind u f) g = bind u (\a. bind (f a) g)
 360
 361 we'll give an example that will show how an inductive proof would
 362 proceed.  Let `f a = Node (Leaf a, Leaf a)`.  Then
 363
 364 \tree (. (. (. (. (a1) (a2)))))
 365 \tree (. (. (. (. (a1) (a1)) (. (a1) (a1)))))
 366
 367                                                    .
 368                                                ____|____
 369                   .               .            |       |
 370         bind    __|__   f  =    __|_    =      .       .
 371                 |   |           |   |        __|__   __|__
 372                 a1  a2        f a1 f a2      |   |   |   |
 373                                              a1  a1  a1  a1
 374
 375 Now when we bind this tree to `g`, we get
 376
 377                     .
 378                _____|______
 379                |          |
 380                .          .
 381              __|__      __|__
 382              |   |      |   |
 383            g a1 g a1  g a1 g a1
 384
 385 At this point, it should be easy to convince yourself that
 386 using the recipe on the right hand side of the associative law will
 387 built the exact same final tree.
 388
 389 So binary trees are a monad.
 390
 391 Haskell combines this monad with the Option monad to provide a monad
 392 called a
 393 [SearchTree](http://hackage.haskell.org/packages/archive/tree-monad/0.2.1/doc/html/src/Control-Monad-SearchTree.html#SearchTree)
 394 that is intended to represent non-deterministic computations as a tree.
 395
 396
 397 What's this have to do with tree\_mondadize?
 398 --------------------------------------------
 399
 400 So we've defined a Tree monad:
 401
 402         type 'a tree = Leaf of 'a | Node of ('a tree) * ('a tree);;
 403         let tree_unit (a: 'a) : 'a tree = Leaf a;;
 404         let rec tree_bind (u : 'a tree) (f : 'a -> 'b tree) : 'b tree =
 405             match u with
 406             | Leaf a -> f a
 407             | Node (l, r) -> Node (tree_bind l f, tree_bind r f);;
 408
 409 What's this have to do with the `tree_monadize` functions we defined earlier?
 410
 411         let rec tree_monadize (f : 'a -> 'b reader) (t : 'a tree) : 'b tree reader =
 412             match t with
 413             | Leaf a -> reader_bind (f a) (fun b -> reader_unit (Leaf b))
 414             | Node (l, r) -> reader_bind (tree_monadize f l) (fun l' ->
 415                                reader_bind (tree_monadize f r) (fun r' ->
 416                                  reader_unit (Node (l', r'))));;
 417
 418 ... and so on for different monads?
 419
 420 The answer is that each of those `tree_monadize` functions is adding a Tree monad *layer* to a pre-existing Reader (and so on) monad. So far, we've defined monads as single-layered things. Though in the Groenendijk, Stokhoff, and Veltmann homework, we had to figure out how to combine Reader, State, and Set monads in an ad-hoc way. In practice, one often wants to combine the abilities of several monads. Corresponding to each monad like Reader, there's a corresponding ReaderT **monad transformer**. That takes an existing monad M and adds a Reader monad layer to it. The way these are defined parallels the way the single-layer versions are defined. For example, here's the Reader monad:
 421
 422         (* monadic operations for the Reader monad *)
 423
 424         type 'a reader =
 425                 env -> 'a;;
 426         let unit (a : 'a) : 'a reader =
 427                 fun e -> a;;
 428         let bind (u: 'a reader) (f : 'a -> 'b reader) : 'b reader =
 429                 fun e -> (fun v -> f v e) (u e);;
 430
 431 We've just beta-expanded the familiar `f (u e) e` into `(fun v -> f v e) (u e)`, in order to factor out the parts where any Reader monad is being supplied as an argument to another function. Then if we want instead to add a Reader layer to some arbitrary other monad M, with its own M.unit and M.bind, here's how we do it:
 432
 433         (* monadic operations for the ReaderT monadic transformer *)
 434
 435         (* We're not giving valid OCaml code, but rather something
 436          * that's conceptually easier to digest.
 437          * How you really need to write this in OCaml is more circuitous...
 438          * see http://lambda.jimpryor.net/code/tree_monadize.ml for some details. *)
 439
 440         type ('a, M) readerT =
 441                 env -> 'a M;;
 442         (* this is just an 'a M reader; but don't rely on that pattern to generalize *)
 443
 444         let unit (a : 'a) : ('a, M) readerT =
 445                 fun e -> M.unit a;;
 446
 447         let bind (u : ('a, M) readerT) (f : 'a -> ('b, M) readerT) : ('b, M) readerT =
 448                 fun e -> M.bind (u e) (fun v -> f v e);;
 449
 450 Notice the key differences: where before we just returned `a`, now we instead return `M.unit a`. Where before we just supplied value `u e` of type `'a reader` as an argument to a function, now we instead `M.bind` the `'a reader` to that function. Notice also the differences in the types.
 451
 452 What is the relation between Reader and ReaderT? Well, suppose you started with the Identity monad:
 453
 454         type 'a identity = 'a;;
 455         let unit (a : 'a) : 'a = a;;
 456         let bind (u : 'a) (f : 'a -> 'b) : 'b = f u;;
 457
 458 and you used the ReaderT transformer to add a Reader monad layer to the Identity monad. What do you suppose you would get?
 459
 460 The relations between the State monad and the StateT monadic transformer are parallel:
 461
 462         (* monadic operations for the State monad *)
 463
 464         type 'a state =
 465                 store -> ('a * store);;
 466
 467         let unit (a : 'a) : 'a state =
 468                 fun s -> (a, s);;
 469
 470         let bind (u : 'a state) (f : 'a -> 'b state) : 'b state =
 471                 fun s -> (fun (a, s') -> f a s') (u s);;
 472
 473 We've used `(fun (a, s') -> f a s') (u s)` instead of the more familiar `let (a, s') = u s in f a s'` in order to factor out the part where a value of type `'a state` is supplied as an argument to a function. Now StateT will be:
 474
 475         (* monadic operations for the StateT monadic transformer *)
 476
 477         type ('a, M) stateT =
 478                 store -> ('a * store) M;;
 479         (* notice this is not an 'a M state *)
 480
 481         let unit (a : 'a) : ('a, M) stateT =
 482                 fun s -> M.unit (a, s);;
 483
 484         let bind (u : ('a, M) stateT) (f : 'a -> ('b, M) stateT) : ('b, M) stateT =
 485                 fun s -> M.bind (u s) (fun (a, s') -> f a s');;
 486
 487 Do you see the pattern? Where ordinarily we'd return an `'a` value, now we instead return an `'a M` value. Where ordinarily we'd supply a `'a state` value as an argument to a function, now we instead `M.bind` it to that function.
 488
 489 Okay, now let's do the same thing for our Tree monad.
 490
 491         (* monadic operations for the Tree monad *)
 492
 493         type 'a tree =
 494                 Leaf of 'a | Node of ('a tree) * ('a tree);;
 495
 496         let unit (a: 'a) : 'a tree =
 497                 Leaf a;;
 498
 499         let rec bind (u : 'a tree) (f : 'a -> 'b tree) : 'b tree =
 500             match u with
 501             | Leaf a -> f a;;
 502             | Node (l, r) -> (fun l' r' -> Node (l', r')) (bind l f) (bind r f);;
 503
 504         (* monadic operations for the TreeT monadic transformer *)
 505         (* NOTE THIS IS NOT YET WORKING --- STILL REFINING *)
 506
 507         type ('a, M) treeT =
 508                 'a tree M;;
 509
 510         let unit (a: 'a) : ('a, M) tree =
 511                 M.unit (Leaf a);;
 512
 513         let rec bind (u : ('a, M) tree) (f : 'a -> ('b, M) tree) : ('b, M) tree =
 514             match u with
 515             | Leaf a -> M.bind (f a) (fun b -> M.unit (Leaf b))
 516             | Node (l, r) -> M.bind (bind l f) (fun l' ->
 517                                                         M.bind (bind r f) (fun r' ->
 518                                                                 M.unit (Node (l', r'));;
 519
 520 Compare this definition of `bind` for the TreeT monadic transformer to our earlier definition of `tree_monadize`, specialized for the Reader monad:
 521
 522         let rec tree_monadize (f : 'a -> 'b reader) (t : 'a tree) : 'b tree reader =
 523             match t with
 524             | Leaf a -> reader_bind (f a) (fun b -> reader_unit (Leaf b))
 525             | Node (l, r) -> reader_bind (tree_monadize f l) (fun l' ->
 526                                reader_bind (tree_monadize f r) (fun r' ->
 527                                  reader_unit (Node (l', r'))));;
 528
 529