topics/_week15_continuation_applications.mdwn

   1 <!-- λ ◊ ≠ ∃ Λ ∀ ≡ α β γ ρ ω φ ψ Ω ○ μ η δ ζ ξ ⋆ ★ • ∙ ● ⚫ 𝟎 𝟏 𝟐 𝟘 𝟙 𝟚 𝟬 𝟭 𝟮 ⇧ (U+2e17) ¢ -->
   2 [[!toc]]
   3
   4 # Applications of continuations to natural language
   5
   6 We've seen a number of applications of monads to natural language,
   7 including presupposition projection, binding, intensionality, and the
   8 dynamics of the GSV fragment.
   9
  10 In the past couple of weeks, we've introduced continuations, first as
  11 a functional programming technique, then in terms of list and tree
  12 zippers, then as a monad.  In this lecture, we will generalize
  13 continuations slightly beyond a monad, and then begin to outline some
  14 of the applications of monads.  In brief, the generalization can be
  15 summarized in terms of types: instead of using a Kleisli arrow mapping
  16 a type α to a continuized type (α -> ρ) -> ρ, we'll allow the result
  17 types to differ, i.e., we'll map α to (α -> β) -> γ.  This will be
  18 crucial for some natural language applications.
  19
  20 Many (though not all) of the applications are discussed in detail in
  21 Barker and Shan 2014, *Continuations in Natural Language*, OUP.
  22
  23 In terms of list zippers, the continuation of a focused element in
  24 the list is the front part of the list.
  25
  26     list zipper for the list [a;b;c;d;e;f] with focus on d:
  27
  28         ([c;b;a], [d;e;f])
  29          -------
  30      defunctionalized
  31      continuation
  32
  33 In terms of tree zippers, the continuation is the entire context of
  34 the focused element--the entire rest of the tree.
  35
  36 [drawing of a broken tree]
  37
  38 Last week we had trouble computing the doubling task when there was more
  39 than one shifty operator after moving from a list perspective to a
  40 tree perspective.  That is, it remained unclear why "aScSe" was
  41
  42     "aacaceecaacaceecee"
  43
  44 We'll burn through that conceptual fog today.  The natural thing to
  45 try would have been to defunctionalize the continuation-based solution
  46 using a tree zipper.  But that would not have been easy, since the
  47 natural way to implement the doubling behavior of the shifty operator
  48 would have been to simply copy the context provided by the zipper.
  49 This would have produced two uncoordinated copies of the other shifty
  50 operator, and we'd have been in the situation described in class of
  51 having a reduction strategy that never reduced the number of shifty
  52 operators below 2. (There are ways around this limitation of tree zippers,
  53 but they are essentially equivalent to the technique given just below.)
  54
  55 Instead, we'll re-interpreting what the continuation monad was doing
  56 in more or less defunctionalized terms by using Quantifier Raising, a technique
  57 from linguistics.
  58
  59 But first, motivating quantifier scope as a linguistic application.
  60
  61 # The primary application of continuations to natural language: scope-taking
  62
  63 We have seen that continuations allow a deeply-embedded element to
  64 take control over (a portion of) the entire computation that contains
  65 it.  In natural language semantics, this is exactly what it means for
  66 a scope-taking expression to take scope.
  67
  68     1. [Ann put a copy of [everyone]'s homeworks in her briefcase]
  69
  70     2. For every x, [Ann put a copy of x's homeworks in her briefcase]
  71
  72 The sentence in (1) can be paraphrased as in (2), in which the
  73 quantificational DP *everyone* takes scope over the rest of the sentence.
  74 Even if you suspect that there could be an analysis of (2) on which
  75 "every student's term paper" could denote some kind of mereological
  76 fusion of a set of papers, it is much more difficult to be satisfied
  77 with a referential analysis when *every student* is replaced with
  78 *no student*, or *fewer than three students*, and so on---see any
  79 semantics text book for abundant discussion.
  80
  81 We can arrive at an analysis by expressing the meaning of
  82 quantificational DP such as *everyone* using continuations:
  83
  84     3. everyone = shift (\k.∀x.kx)
  85
  86 Assuming there is an implicit reset at the top of the sentence (we'll
  87 explicitly address determining where there is or isn't a reset), the
  88 reduction rules for `shift` will apply the handler function (\k.∀x.kx)
  89 to the remainder of the sentence after abstracting over the position
  90 of the shift expression:
  91
  92     [Ann put a copy of [shift (\k.∀x.kx)]'s homeworks in her briefcase]
  93     ~~> (\k.∀x.kx) (\v. Ann put a copy of v's homeworks in her briefcase)
  94     ~~> ∀x. Ann put a copy of x's homeworks in her briefcase
  95
  96 (To be a bit pedantic, this reduction sequence is more suitable for
  97 shift0 than for shift, but we're not being fussy here about subflavors
  98 of shifty operators.)
  99
 100 The standard technique for handling scope-taking in linguistics is
 101 Quantifier Raising (QR).  As you might suppose, the rule for Quantifier
 102 Raising closely resembles the reduction rule for shift:
 103
 104     Quantifier Raising: given a sentence [... [QDP] ...], build a new
 105     sentence [QDP (\x.[... [x] ...])].
 106
 107 Here, QDP is a scope-taking quantificational DP.
 108
 109 Just to emphasize the similarity between QR and shift, we can use QR
 110 to provide insight into the tree task that mystified us earlier.
 111
 112 <!--
 113 \tree (. (a)((S)((d)((S)(e)))))
 114 -->
 115
 116 <pre>
 117   .
 118 __|___
 119 |    |
 120 a  __|___
 121    |    |
 122    S  __|__
 123       |   |
 124       d  _|__
 125          |  |
 126          S  e
 127 </pre>
 128
 129 First we QR the lower shift operator, replacing it with a variable and
 130 abstracting over that variable.
 131
 132 <!--
 133 \tree (. (S) ((\\x) ((a)((S)((d)((x)(e)))))))
 134 -->
 135
 136 <pre>
 137    .
 138 ___|___
 139 |     |
 140 S  ___|___
 141    |     |
 142    \x  __|___
 143        |    |
 144        a  __|___
 145           |    |
 146           S  __|__
 147              |   |
 148              d  _|__
 149                 |  |
 150                 x  e
 151 </pre>
 152
 153 Next, we QR the upper shift operator
 154
 155 <!--
 156 \tree (. (S) ((\\y) ((S) ((\\x) ((a)((y)((d)((x)(e)))))))))
 157 -->
 158
 159 <pre>
 160    .
 161 ___|___
 162 |     |
 163 S  ___|____
 164    |      |
 165    \y  ___|___
 166        |     |
 167        S  ___|___
 168           |     |
 169           \x  __|___
 170               |    |
 171               a  __|___
 172                  |    |
 173                  y  __|__
 174                     |   |
 175                     d  _|__
 176                        |  |
 177                        x  e
 178 </pre>
 179
 180 We then evaluate, using the same value for the shift operator proposed before:
 181
 182     S = shift = \k.k(k "")
 183
 184 It will be easiest to begin evaluating this tree with the lower shift
 185 operator (we get the same result if we start with the upper one).
 186 The relevant value for k is (\x.a(y(d(x e)))).  Then k "" is
 187 a(y(d(""(e)))), and k(k "") is a(y(d((a(y(d(""(e)))))(e)))).  In tree
 188 form:
 189
 190 <!--
 191 \tree (. (S) ((\\y) ((a)((y)((d)(((a)((y)((d)(("")(e)))))(e)))))))
 192 -->
 193
 194 <pre>
 195    .
 196 ___|___
 197 |     |
 198 S  ___|____
 199    |      |
 200    \y  ___|___
 201        |     |
 202        a  ___|___
 203           |     |
 204           y  ___|___
 205              |     |
 206              d  ___|___
 207                 |     |
 208               __|___  e
 209               |    |
 210               a  __|___
 211                  |    |
 212                  y  __|___
 213                     |    |
 214                     d  __|__
 215                        |   |
 216                        ""  e
 217 </pre>
 218
 219
 220 Repeating the process for the upper shift operator replaces each
 221 occurrence of y with a copy of the whole tree.
 222
 223 <!--
 224 \tree (. ((a)((((a)(("")((d)(((a)(("")((d)(("")(e)))))(e))))))((d)(((a)((((a)(("")((d)(((a)(("")((d)(("")(e)))))(e))))))((d)(("")(e)))))(e))))))
 225 -->
 226
 227 <pre>
 228       .
 229       |
 230 ______|______
 231 |           |
 232 a  _________|__________
 233    |                  |
 234    |               ___|___
 235 ___|___            |     |
 236 |     |            d  ___|____
 237 a  ___|____           |      |
 238    |      |        ___|____  e
 239    ""  ___|___     |      |
 240        |     |     a  ____|_____
 241        d  ___|___     |        |
 242           |     |     |      __|___
 243        ___|___  e  ___|___   |    |
 244        |     |     |     |   d  __|__
 245        a  ___|___  a  ___|____  |   |
 246           |     |     |      |  ""  e
 247           ""  __|___  ""  ___|___
 248               |    |      |     |
 249               d  __|__    d  ___|___
 250                  |   |       |     |
 251                  ""  e    ___|___  e
 252                           |     |
 253                           a  ___|___
 254                              |     |
 255                              ""  __|___
 256                                  |    |
 257                                  d  __|__
 258                                     |   |
 259                                     ""  e
 260 </pre>
 261
 262 The yield of this tree (the sequence of leaf nodes) is
 263 aadadeedaadadeedee, which is the expected output of the double-shifted tree.
 264
 265 Exercise: the result is different, by the way, if the QR occurs in the
 266 opposite order.
 267
 268 Three lessons:
 269
 270 * Generalizing from one-sided, list-based continuation
 271   operators to two-sided, tree-based continuation operators is a
 272   dramatic increase in power and complexity.
 273
 274 * Operators that
 275   compose multiple copies of a context can be hard to understand
 276   (though keep this in mind when we see the continuations-based
 277   analysis of coordination, which involves context doubling).
 278
 279 * When considering two-sided, tree-based continuation operators,
 280   quantifier raising is a good tool for visualizing (defunctionalizing)
 281   the computation.
 282
 283 ## Tower notation
 284
 285 At this point, we have three ways of representing computations
 286 involving control operators such as shift and reset: using a CPS
 287 transform, lifting into a continuation monad, and by using QR.
 288
 289 QR is the traditional system in linguistics, but it will not be
 290 adequate for us in general.  The reason has to do with order.  As
 291 we've discussed, especially with respect to the CPS transform,
 292 continuations allow fine-grained control over the order of evaluation.
 293 One of the main empirical claims of Barker and Shan 2014 is that
 294 natural language is sensitive to evaluation order.  Unlike other
 295 presentations of continuations, QR does not lend itself to reasoning
 296 about evaluation order, so we will need to use a different strategy.
 297
 298 [Note to self: it is interesting to consider what it would take to
 299 reproduce the analyses giving in Barker and Shan in purely QR terms.
 300 Simple quantificational binding using parasitic scope should be easy,
 301 but how reconstruction would work is not so clear.]
 302
 303 ## Introducting the tower notation
 304
 305 We'll present tower notation, then comment and motivate several of its
 306 features as we consider various applications.  For now, we'll motivate
 307 the tower notation by thinking about box types.  In the discussion of
 308 monads, we've thought of monadic types as values inside of a box.  The
 309 box will often contain information in addition to the core object.
 310 For instance, in the Reader monad, a boxed int contains an expression
 311 of type int as the payload, but also contains a function that
 312 manipulates a list of information.  It is natural to imagine
 313 separating a box into two regions, the payload and the hidden scratch
 314 space:
 315
 316 <pre>
 317     _______________               _______________            _______________
 318     | [x->2, y->3] |              | [x->2, y->3] |          | [x->2, y->3] |
 319   -------------------            ------------------        ------------------
 320     |              |     ¢        |              |    =     |              |
 321     |    +2        |              |     y        |          |     5        |
 322     |______________|              |______________|          |______________|
 323 </pre>
 324
 325 For people who are familiar with Discourse Representation Theory (Kamp
 326 1981, Kamp and Reyle 1993), this separation of boxes into payload and
 327 discourse scorekeeping will be familiar (although many details differ).
 328
 329 The general pattern is that monadic treatments separate computation
 330 into an at-issue (pre-monadic) computation with a layer at which
 331 side-effects occur.
 332
 333 The tower notation is a precise way of articulating continuation-based
 334 computations into a payload and (potentially multiple) layers of side-effects.
 335 We won't keep the outer box, but we will keep the horizontal line
 336 dividing main effects from side-effects.
 337
 338 Tower convention for types:
 339 <pre>
 340                                               γ | β
 341     (α -> β) -> γ can be equivalently written -----
 342                                                 α
 343 </pre>
 344
 345 Tower convention for values:
 346 <pre>
 347                                            g[]
 348     \k.g[k(x)] can be equivalently written ---
 349                                             x
 350 </pre>
 351
 352 If \k.g[k(x)] has type (α -> β) -> γ, then k has type (α -> β).
 353
 354 Here "g[ ]" is a *context*, that is, an expression with (exactly) one
 355 hole in it.  For instance, we might have g[x] = \forall x.P[x].
 356
 357 We'll use a simply-typed system with two atomic types, DP (the type of
 358 individuals) and S (the type of truth values).
 359
 360 ## LIFT
 361
 362 Then in the spirit of monadic thinking, we'll have a way of lifting an
 363 arbitrary value into the tower system:
 364
 365                                            []   β|β
 366     LIFT (x:α) = \k.kx : (α -> β) -> β ==  -- : ---
 367                                            x     α
 368
 369 Obviously, LIFT is exactly the midentity (the unit) for the continuation monad.
 370 Notice that LIFT requires the result type of the continuation argument
 371 and the result type of the overall expression to match (here, both are β).
 372
 373 The name LIFT comes from Partee's 1987 theory of type-shifters for
 374 determiner phrases.  Importantly, LIFT applied to an
 375 individual-denoting expression yields the generalized quantifier
 376 proposed by Montague as the denotation for proper names:
 377
 378                                             []   S|S
 379     LIFT (j:DP) = \k.kx : (DP -> S) -> S == -- : ---
 380                                             j    DP
 381
 382 So if the proper name *John* denotes the individual j, LIFT(j) is the
 383 generalized quantifier that maps each property k of type DP -> S to true
 384 just in case kj is true.
 385
 386 Crucially for the discussion here, LIFT does not apply only to DPs, as
 387 in Montague and Partee, but to any expression whatsoever.  For
 388 instance, here is LIFT applied to a lexical verb phrase:
 389
 390                                                         []    S|S
 391     LIFT (left:DP->S) = \k.kx : ((DP->S) -> S) -> S == ---- : ---
 392                                                        left   DP
 393
 394 Once we have expressions of type (α -> β) -> γ, we'll need to combine
 395 them.  We'll use the ¢ operator from the continuation monad:
 396
 397     g[]    γ | δ      h[]   δ | ρ    g[h[]]   γ | ρ
 398     --- : -------  ¢  --- : ----- == ------ : -----
 399     f     α -> β      x       α        fx       β
 400
 401 Note that the types below the horizontal line combine just like
 402 functional application (i.e, f:(α->β) (x:α) = fx:β).
 403
 404 ## Not quite a monad
 405
 406 To demonstrate that this is indeed the continuation monad's ¢
 407 operator:
 408
 409       ¢ (\k.g[kf]) (\k.h[kx])
 410     = (\MNk.M(\m.N(\n.k(mn)))) (\k.g[kf]) (\k.h[kx])
 411     ~~> \k.(\k.g[kf])(\m.(\k.h[kx])(\n.k(mn))
 412     ~~> \k.g[(\k.h[kx])(\n.k(fn))
 413     ~~> \k.g[h[k(fx)]]
 414
 415        g[h[]]
 416     == ------
 417          fx
 418
 419 However, these continuations do not form an official monad.  The
 420 reason is that (see Wadler's paper `Composable continuations' for details).
 421
 422 Neverthless, obeys the monad laws.
 423
 424 Oh, one more thing: because natural language allows the functor to be
 425 on the left or on the right, we replace the type arrow -> with a
 426 left-leaning version \ and a right-leaning version, as follows:
 427
 428     α/β  β = α
 429     β  β\α = α
 430
 431 This means we need two versions of ¢ too (see Barker and Shan 2014
 432 chapter 1 for full details).
 433
 434 This is (almost) all we need to get some significant linguistic work
 435 done.
 436