topics/week15_continuation_applications.mdwn

   1 <!-- λ ◊ ≠ ∃ Λ ∀ ≡ α β γ ρ ω φ ψ Ω ○ μ η δ ζ ξ ⋆ ★ • ∙ ● ⚫ 𝟎 𝟏 𝟐 𝟘 𝟙 𝟚 𝟬 𝟭 𝟮 ⇧ (U+2e17) ¢ -->
   2 [[!toc levels=2]]
   3
   4 # Applications of continuations to natural language
   5
   6 We've seen a number of applications of monads to natural language,
   7 including presupposition projection, binding, intensionality, and the
   8 dynamics of the GSV fragment.
   9
  10 In the past couple of weeks, we've introduced continuations, first as
  11 a functional programming technique, then in terms of list and tree
  12 zippers, then as a monad.  In this lecture, we will generalize
  13 continuations slightly beyond a monad, and then begin to outline some
  14 of the applications of the generalized continuations.
  15
  16 Many (though not all) of the applications are discussed in detail in
  17 Barker and Shan 2014, *Continuations in Natural Language*, OUP.
  18
  19 To review, in terms of list zippers, the continuation of a focused
  20 element in the list is the front part of the list.
  21
  22     list zipper for the list [a;b;c;d;e;f] with focus on d:
  23
  24         ([c;b;a], [d;e;f])
  25          -------
  26      defunctionalized
  27      continuation
  28
  29 In terms of tree zippers, the continuation is the entire context of
  30 the focused element--the entire rest of the tree.
  31
  32 [drawing of a tree zipper]
  33
  34 We explored continuations first in a list setting, then in a tree
  35 setting, using the doubling task as an example.
  36
  37     "abSd" ~~> "ababd"
  38     "ab#deSfg" ~~> "abdedefg"
  39
  40 The "S" functions like a shifty operator, and "#" functions like a reset.
  41
  42 Although the list version of the doubling task was easy to understand
  43 thoroughly, the tree version was significantly more challenging.  In
  44 particular, it remained unclear why
  45
  46     "aScSe" ~~> "aacaceecaacaceecee"
  47
  48 We'll burn through that conceptual fog today by learning more about
  49 how to work with continuations.
  50
  51 The natural thing to try would have been to defunctionalize the
  52 continuation-based solution using a tree zipper.  But that would not
  53 have been easy, since the natural way to implement the doubling
  54 behavior of the shifty operator would have been to simply copy the
  55 context provided by the zipper.  This would have produced two
  56 uncoordinated copies of the other shifty operator, and we'd have been
  57 in the situation described in class of having a reduction strategy
  58 that never reduced the number of shifty operators below 2.  The
  59 limitation is that zippers by themselves don't provide a natural way
  60 to establish a dependency between two distant elements of a data
  61 structure.  (There are ways around this limitation of tree zippers,
  62 but they are essentially equivalent to the technique given just
  63 below.)
  64
  65 Instead, we'll re-interpreting what the continuation monad was doing
  66 in more or less defunctionalized terms by using Quantifier Raising, a
  67 technique from linguistics.
  68
  69 But first, motivating quantifier scope as a linguistic application.
  70
  71 # The primary application of continuations to natural language: scope-taking
  72
  73 We have seen that continuations allow a deeply-embedded element to
  74 take control over (a portion of) the entire computation that contains
  75 it.  In natural language semantics, this is exactly what it means for
  76 a scope-taking expression to take scope.
  77
  78     1. [Ann put a copy of [everyone]'s homeworks in her briefcase]
  79
  80     2. For every x, [Ann put a copy of x's homeworks in her briefcase]
  81
  82 The sentence in (1) can be paraphrased as in (2), in which the
  83 quantificational DP *everyone* takes scope over the rest of the sentence.
  84 Even if you suspect that there could be an analysis of (2) on which
  85 "every student's term paper" could denote some kind of mereological
  86 fusion of a set of papers, it is much more difficult to be satisfied
  87 with a referential analysis when *every student* is replaced with
  88 *no student*, or *fewer than three students*, and so on---see any
  89 semantics text book for abundant discussion.
  90
  91 We can arrive at an analysis by expressing the meaning of
  92 quantificational DP such as *everyone* using continuations:
  93
  94     3. everyone = shift (\k.∀x.kx)
  95
  96 Assuming there is an implicit reset at the top of the sentence (we'll
  97 explicitly address determining where there is or isn't a reset), the
  98 reduction rules for `shift` will apply the handler function (\k.∀x.kx)
  99 to the remainder of the sentence after abstracting over the position
 100 of the shift expression:
 101
 102     [Ann put a copy of [shift (\k.∀x.kx)]'s homeworks in her briefcase]
 103     ~~> (\k.∀x.kx) (\v. Ann put a copy of v's homeworks in her briefcase)
 104     ~~> ∀x. Ann put a copy of x's homeworks in her briefcase
 105
 106 (To be a bit pedantic, this reduction sequence is more suitable for
 107 shift0 than for shift, but we're not being fussy here about subflavors
 108 of shifty operators.)
 109
 110 The standard technique for handling scope-taking in linguistics is
 111 Quantifier Raising (QR).  As you might suppose, the rule for Quantifier
 112 Raising closely resembles the reduction rule for shift:
 113
 114     Quantifier Raising: given a sentence of the form
 115
 116              [... [QDP] ...],
 117
 118     build a new sentence of the form
 119
 120     [QDP (\x.[... [x] ...])].
 121
 122 Here, QDP is a scope-taking quantificational DP.
 123
 124 Just to emphasize the similarity between QR and shift, we can use QR
 125 to provide insight into the tree version of the doubling task that
 126 mystified us earlier.  Here's the starting point:
 127
 128 <!--
 129 \tree (. (a)((S)((d)((S)(e)))))
 130 -->
 131
 132 <pre>
 133   .
 134 __|___
 135 |    |
 136 a  __|___
 137    |    |
 138    S  __|__
 139       |   |
 140       d  _|__
 141          |  |
 142          S  e
 143 </pre>
 144
 145 First we QR the lower shift operator, replacing it with a variable and
 146 abstracting over that variable.
 147
 148 <!--
 149 \tree (. (S) ((\\x) ((a)((S)((d)((x)(e)))))))
 150 -->
 151
 152 <pre>
 153    .
 154 ___|___
 155 |     |
 156 S  ___|___
 157    |     |
 158    \x  __|___
 159        |    |
 160        a  __|___
 161           |    |
 162           S  __|__
 163              |   |
 164              d  _|__
 165                 |  |
 166                 x  e
 167 </pre>
 168
 169 Next, we QR the upper shift operator
 170
 171 <!--
 172 \tree (. (S) ((\\y) ((S) ((\\x) ((a)((y)((d)((x)(e)))))))))
 173 -->
 174
 175 <pre>
 176    .
 177 ___|___
 178 |     |
 179 S  ___|____
 180    |      |
 181    \y  ___|___
 182        |     |
 183        S  ___|___
 184           |     |
 185           \x  __|___
 186               |    |
 187               a  __|___
 188                  |    |
 189                  y  __|__
 190                     |   |
 191                     d  _|__
 192                        |  |
 193                        x  e
 194 </pre>
 195
 196 We then evaluate, using the same value for the shift operator proposed before:
 197
 198     S = shift = \k.k(k "")
 199
 200 It will be easiest to begin evaluating this tree with the lower shift
 201 operator (we get the same result if we start with the upper one).
 202 The relevant value for k is (\x.a(y(d(x e)))).  Then k "" is
 203 a(y(d(""(e)))), and k(k "") is a(y(d((a(y(d(""(e)))))(e)))).  In tree
 204 form:
 205
 206 <!--
 207 \tree (. (S) ((\\y) ((a)((y)((d)(((a)((y)((d)(("")(e)))))(e)))))))
 208 -->
 209
 210 <pre>
 211    .
 212 ___|___
 213 |     |
 214 S  ___|____
 215    |      |
 216    \y  ___|___
 217        |     |
 218        a  ___|___
 219           |     |
 220           y  ___|___
 221              |     |
 222              d  ___|___
 223                 |     |
 224               __|___  e
 225               |    |
 226               a  __|___
 227                  |    |
 228                  y  __|___
 229                     |    |
 230                     d  __|__
 231                        |   |
 232                        ""  e
 233 </pre>
 234
 235
 236 Repeating the process for the upper shift operator replaces each
 237 occurrence of y with a copy of the whole tree.
 238
 239 <!--
 240 \tree (. ((a)((((a)(("")((d)(((a)(("")((d)(("")(e)))))(e))))))((d)(((a)((((a)(("")((d)(((a)(("")((d)(("")(e)))))(e))))))((d)(("")(e)))))(e))))))
 241 -->
 242
 243 <pre>
 244       .
 245       |
 246 ______|______
 247 |           |
 248 a  _________|__________
 249    |                  |
 250    |               ___|___
 251 ___|___            |     |
 252 |     |            d  ___|____
 253 a  ___|____           |      |
 254    |      |        ___|____  e
 255    ""  ___|___     |      |
 256        |     |     a  ____|_____
 257        d  ___|___     |        |
 258           |     |     |      __|___
 259        ___|___  e  ___|___   |    |
 260        |     |     |     |   d  __|__
 261        a  ___|___  a  ___|____  |   |
 262           |     |     |      |  ""  e
 263           ""  __|___  ""  ___|___
 264               |    |      |     |
 265               d  __|__    d  ___|___
 266                  |   |       |     |
 267                  ""  e    ___|___  e
 268                           |     |
 269                           a  ___|___
 270                              |     |
 271                              ""  __|___
 272                                  |    |
 273                                  d  __|__
 274                                     |   |
 275                                     ""  e
 276 </pre>
 277
 278 The yield of this tree (the sequence of leaf nodes) is
 279 aadadeedaadadeedee, which is the expected output of the double-shifted tree.
 280
 281 Exercise: the result is different, by the way, if the QR occurs in the
 282 opposite order.
 283
 284 Three lessons:
 285
 286 * Generalizing from one-sided, list-based continuation
 287   operators to two-sided, tree-based continuation operators is a
 288   dramatic increase in power and complexity.  (De Groote's dynamic
 289   montague semantics continuations are the one-sided, list-based variety.)
 290
 291 * Operators that
 292   compose multiple copies of a context can be hard to understand
 293   (though keep this in mind when we see the continuations-based
 294   analysis of coordination, which involves context doubling).
 295
 296 * When considering two-sided, tree-based continuation operators,
 297   quantifier raising is a good tool for visualizing (defunctionalizing)
 298   the computation.
 299
 300 ## Tower notation
 301
 302 At this point, we have three ways of representing computations
 303 involving control operators such as shift and reset: using a CPS
 304 transform, lifting into a continuation monad, and by using QR.
 305
 306 QR is the traditional system in linguistics, but it will not be
 307 adequate for us in general.  The reason has to do with evaluation
 308 order.  As we've discussed, especially with respect to the CPS
 309 transform, continuations allow fine-grained control over the order of
 310 evaluation.  One of the main empirical claims of Barker and Shan 2014
 311 is that natural language is sensitive to evaluation order.  Unlike
 312 other presentations of continuations, QR does not lend itself to
 313 reasoning about evaluation order, so we will need to use a different
 314 strategy.
 315
 316 [Note to self: it is interesting to consider what it would take to
 317 reproduce the analyses giving in Barker and Shan in purely QR terms.
 318 Simple quantificational binding using parasitic scope should be easy,
 319 but how reconstruction would work is not so clear.]
 320
 321 ## Introducting the tower notation
 322
 323 We'll present tower notation, then comment and motivate several of its
 324 features as we consider various applications.  For now, we'll motivate
 325 the tower notation by thinking about box types.  In the discussion of
 326 monads, we've thought of monadic types as values inside of a box.  The
 327 box will often contain information in addition to the core object.
 328 For instance, in the Reader monad, a boxed int contains an expression
 329 of type int as the payload, but also contains a function that
 330 manipulates a list of information.  It is natural to imagine
 331 separating a box into two regions, the payload and the hidden scratch
 332 space:
 333
 334 <pre>
 335     _______________               _______________            _______________
 336     | [x->2, y->3] |              | [x->2, y->3] |          | [x->2, y->3] |
 337   -------------------            ------------------        ------------------
 338     |              |     ¢        |              |    =     |              |
 339     |    +2        |              |     y        |          |     5        |
 340     |______________|              |______________|          |______________|
 341 </pre>
 342
 343 For people who are familiar with Discourse Representation Theory (Kamp
 344 1981, Kamp and Reyle 1993), this separation of boxes into payload and
 345 discourse scorekeeping will be familiar (although many details differ).
 346
 347 The general pattern is that monadic treatments separate computation
 348 into an at-issue (pre-monadic) computation with a layer at which
 349 side-effects occur.
 350
 351 The tower notation is a precise way of articulating continuation-based
 352 computations into a payload and (potentially multiple) layers of
 353 side-effects.  Visually, we won't keep the outer box, but we will keep
 354 the horizontal line dividing main effects from side-effects.
 355
 356 Tower convention for types:
 357 <pre>
 358                                               γ | β
 359     (α -> β) -> γ can be equivalently written -----
 360                                                 α
 361 </pre>
 362
 363 Read these types counter-clockwise starting at the bottom.
 364
 365 Tower convention for values:
 366 <pre>
 367                                            g[]
 368     \k.g[k(x)] can be equivalently written ---
 369                                             x
 370 </pre>
 371
 372 If \k.g[k(x)] has type (α -> β) -> γ, then k has type (α -> β).
 373
 374 Here "g[ ]" is a *context*, that is, an expression with (exactly) one
 375 hole in it.  For instance, we might have g[x] = \forall x.P[x].
 376
 377 We'll use a simply-typed system with two atomic types, DP (the type of
 378 individuals) and S (the type of truth values).
 379
 380 ## LIFT (mid)
 381
 382 Then in the spirit of monadic thinking, we'll have a way of lifting an
 383 arbitrary value into the tower system:
 384
 385                                            []   β|β
 386     LIFT (x:α) = \k.kx : (α -> β) -> β ==  -- : ---
 387                                            x     α
 388
 389 Obviously, LIFT is exactly the midentity (the unit) for the continuation monad.
 390 Notice that LIFT requires the result type of the continuation argument
 391 and the result type of the overall expression to match (here, both are β).
 392
 393 The name LIFT comes from Partee's 1987 theory of type-shifters for
 394 determiner phrases.  Importantly, LIFT applied to an
 395 individual-denoting expression yields the generalized quantifier
 396 proposed by Montague as the denotation for proper names:
 397
 398                                             []   S|S
 399     LIFT (j:DP) = \k.kj : (DP -> S) -> S == -- : ---
 400                                             j    DP
 401
 402 So if the proper name *John* denotes the individual j, LIFT(j) is the
 403 generalized quantifier that maps each property k of type DP -> S to true
 404 just in case kj is true.
 405
 406 Crucially for the discussion here, LIFT does not apply only to DPs, as
 407 in Montague and Partee, but to any expression whatsoever.  For
 408 instance, here is LIFT applied to a lexical verb phrase:
 409
 410                                                         []    S|S
 411     LIFT (left:DP->S) = \k.kx : ((DP->S) -> S) -> S == ---- : ---
 412                                                        left   DP
 413
 414 Once we have expressions of type (α -> β) -> γ, we'll need to combine
 415 them.  We'll use the ¢ operator from the continuation monad:
 416
 417     g[]    γ | δ      h[]   δ | ρ    g[h[]]   γ | ρ
 418     --- : -------  ¢  --- : ----- == ------ : -----
 419     f     α -> β      x       α        fx       β
 420
 421 Note that the types below the horizontal line combine just like
 422 functional application (i.e, f:(α->β) (x:α) = fx:β).
 423
 424 ## Not quite a monad
 425
 426 This discussion is based on Wadler's paper `Composable continuations'.
 427
 428 The unit and the combination rule work very much like we are used to
 429 from the various monads we've studied.
 430
 431 In particular, we can easily see that the ¢ operator defined just
 432 above is exactly the same as the ¢ operator from the continuation monad:
 433
 434       ¢ (\k.g[kf]) (\k.h[kx])
 435     = (\MNk.M(\m.N(\n.k(mn)))) (\k.g[kf]) (\k.h[kx])
 436     ~~> \k.(\k.g[kf])(\m.(\k.h[kx])(\n.k(mn))
 437     ~~> \k.g[(\k.h[kx])(\n.k(fn))
 438     ~~> \k.g[h[k(fx)]]
 439
 440        g[h[]]
 441     == ------
 442          fx
 443
 444 However, these continuations do not form an official monad.
 445 One way to see this is to consider the type of the bind operator.
 446 The type of the bind operator in a genuine monad is
 447
 448     mbind : Mα -> (α -> Mβ) -> Mβ
 449
 450 In particular, the result type of the second argument (Mβ) must be
 451 identical to the result type of the bind operator overall.  For the
 452 continuation monad, this means that mbind has the following type:
 453
 454                      ((α -> γ) -> ρ)
 455              -> α -> ((β -> δ) -> ρ)
 456                   -> ((β -> δ) -> ρ)
 457
 458 But the type of the bind operator in our generalized continuation
 459 system (call it "kbind") is
 460
 461     kbind :
 462                      ((α -> γ) -> ρ)
 463              -> α -> ((β -> δ) -> γ)
 464                   -> ((β -> δ) -> ρ)
 465
 466 Note that `(β -> δ) -> γ` is not the same as `(β -> δ) -> ρ`.
 467
 468 These more general types work out fine when plugged into the
 469 continuation monad's bind operator:
 470
 471     kbind u f = \k.   u           (\x.     f              x    k      )
 472                  β->δ (α->γ)->ρ     α      α->(β->δ)->γ   α    β->δ
 473                                            -----------------
 474                                               (β->δ)->γ
 475                                               ------------------------
 476                                                       γ
 477                                    --------------------
 478                                         α->γ
 479                        ---------------------
 480                               ρ
 481                 ---------------
 482                    (β->δ)->ρ
 483
 484 Neverthless, it's easy to see that the generalized continuation system
 485 obeys the monad laws.  We haven't spent much time proving monad laws,
 486 so this seems like a worthy occasion on which to give some details.
 487 Since we're working with bind here, we'll use the version of the monad
 488 laws that are expressed in terms of bind:
 489
 490     Prove u >>= ⇧ == u:
 491
 492        u >>= (\ak.ka)
 493     == (\ufk.u(\x.fxk)) u (\ak.ka)
 494     ~~> \k.u(\x.(\ak.ka)xk)
 495     ~~> \k.u(\x.kx)
 496     ~~> \k.uk
 497     ~~> u
 498
 499 The last two steps are eta reductions.
 500
 501     Prove ⇧a >>= f == f a:
 502
 503        ⇧a >>= f
 504     == (\ak.ka)a >>= f
 505     ~~> (\k.ka) >>= f
 506     == (\ufk.u(\x.fxk)) (\k.ka) f
 507     ~~> \k.(\k.ka)(\x.fxk)
 508     ~~> \k.fak
 509     ~~> fa
 510
 511     Prove u >>= (\a.fa >>= g) == (u >>= f) >>= g:
 512
 513        u >>= (\a.fa >>= g)
 514     == u >>= (\a.(\k.fa(\x.gxk)))
 515     == (\ufk.u(\x.fxk)) u (\a.(\k.fa(\x.gxk)))
 516     == \k.u(\x.(\a.(\k.fa(\x.gxk)))xk)
 517     ~~> \k.u(\x.fx(\y.gyk))
 518
 519        (u >>= f) >>= g
 520     == (\ufk.u(\x.fxk)) u f >>= g
 521     ~~> (\k.u(\x.fxk)) >>= g
 522     == (\ufk.u(\x.fxk)) (\k.u(\x.fxk)) g
 523     ~~> \k.(\k.u(\x.fxk))(\y.gyk)
 524     ~~> \k.u(\x.fx(\y.gyk))
 525
 526 The fact that the monad laws hold means that we can rely on any
 527 reasoning that depends on the monad laws.
 528
 529 ## Syntactic refinements: subtypes of implication
 530
 531 Because natural language allows the functor to be on the left or on
 532 the right, we replace the type arrow -> with a left-leaning version \
 533 and a right-leaning version, as follows:
 534
 535     α/β   β    = α
 536       β   β\α  = α
 537
 538 This means (without adding some fancy footwork, as in Charlow 2014) we
 539 need two versions of ¢ too, one for each direction for the unmonadized types.
 540
 541 Just to be explicit, here are the two versions:
 542
 543     g[]    γ | δ      h[]   δ | ρ    g[h[]]   γ | ρ
 544     --- : -------  ¢  --- : ----- == ------ : -----
 545     f       α/β        x       α        fx       β
 546
 547     h[]   δ | ρ      g[]    γ | δ       g[h[]]   γ | ρ
 548     --- : -----  ¢   --- : -------   == ------ : -----
 549      x      α         f      β\α          fx       β
 550
 551 With respect to types, they differ only in replacing α -> β with α/β
 552 (top version) and with β\α (bottom version).  With respect to
 553 syntactic order, they differ in a parallel way with respect to whether
 554 the function f is on the left of its argument x (top version) or on
 555 the right (bottom version).
 556
 557 Logically, separating -> into \\ and / corresponds to rejecting the
 558 structural rule of exchange.
 559
 560 Note that the \\ and / only govern types at the bottom of the tower.
 561 That is, we currently still have arrow at the higher-order levels of
 562 the types, if we undo the tower notation:
 563
 564     γ|δ
 565     --- == ((α/β) -> δ) -> γ
 566     α/β
 567
 568 We'll change these arrows into left-leaning and right-leaning versions
 569 too, according to the following scheme:
 570
 571     γ|δ
 572     --- == γ//((α/β) \\ δ)
 573     α/β
 574
 575 As we'll see in a minute, each of these for implications (\\, /, \\\\,
 576 //) will have a syntactic interpretation:
 577
 578     \   argument is adjacent on the left of the functor
 579     /    argument is adjacent on the right of the functor
 580     \\   argument is surrounded by the functor
 581     //  argument surrounds the functor
 582
 583 ## LOWER (reset)
 584
 585 One more ingredient: one thing that shifty continuation operators
 586 require is an upper bound, a reset to show the extent of the remainder
 587 of the computation that the shift operator captures.  In the list
 588 version of the doubling task, we have
 589
 590     "a#bdeSfg" ~~> "abdebdefg"   continuation captured: bde
 591     "ab#deSfg" ~~> "abdedefg"    continuation captured:  de
 592
 593 In programming languages, resets are encoded in the computation
 594 explicitly.  In natural language, resets are always invisible.
 595 We'll deal with this in the natural language setting by allowing
 596 spontaneous resets, as long as the types match the following recipe:
 597
 598            g[] α|S
 599     LOWER (---:---) == g[p]:α
 600             p   S
 601
 602 At this point, it should be clear how the approach in the seminar
 603 relates to the system developed in Barker and Shan 2014.  Many
 604 applications of continuations to natural langauge are developed in
 605 detail there, including
 606
 607 * Scope-taking
 608 * Quantificational binding
 609 * Weak crossover
 610 * Generalized coordination
 611 * Dynamic Binding
 612 * WH-movement as delayed binding
 613 * Semantic reconstruction effects
 614 * Linear order effects in negative polarity licensing
 615 * Donkey anaphora
 616
 617 and much more.
 618