# Applications of continuations to natural language
We've seen a number of applications of monads to natural language,
including presupposition projection, binding, intensionality, and the
dynamics of the GSV fragment.
In the past couple of weeks, we've introduced continuations, first as
a functional programming technique, then in terms of list and tree
zippers, then as a monad. In this lecture, we will generalize
continuations slightly beyond a monad, and then begin to outline some
of the applications of the generalized continuations.
Many (though not all) of the applications are discussed in detail in
Barker and Shan 2014, *Continuations in Natural Language*, OUP.
To review, in terms of list zippers, the continuation of a focused
element in the list is the front part of the list.

list zipper for the list [a;b;c;d;e;f] with focus on d:

    ([c; b; a], [d; e; f])
In terms of tree zippers, the continuation is the entire context of
the focused element--the entire rest of the tree.

[drawing of a tree zipper]
We explored continuations first in a list setting, then in a tree
setting, using the doubling task as an example.

    "ab#deSfg" ~~> "abdedefg"

The "S" functions like a shifty operator, and "#" functions like a reset.
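To make the list version concrete, here is a small sketch in Python (my own reconstruction, not code from the seminar): the continuation captured by "S" is the front part of the delimited region, and the shifty operator applies that continuation twice to the empty string.

```python
# Sketch of the list doubling task. Assumes (hypothetically) that the
# input contains one "#" reset followed by one "S" shift.

def doubled(s):
    before, _, rest = s.partition("#")     # material outside the reset
    prefix, _, suffix = rest.partition("S")
    k = lambda x: prefix + x               # captured continuation: the front part
    return before + k(k("")) + suffix      # S = \k. k(k "")
```

For example, `doubled("ab#deSfg")` yields `"abdedefg"`, matching the reduction above.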
Although the list version of the doubling task was easy to understand
thoroughly, the tree version was significantly more challenging. In
particular, it remained unclear why

    "aScSe" ~~> "aacaceecaacaceecee"

We'll burn through that conceptual fog today by learning more about
how to work with continuations.
The natural thing to try would have been to defunctionalize the
continuation-based solution using a tree zipper. But that would not
have been easy, since the natural way to implement the doubling
behavior of the shifty operator would have been simply to copy the
context provided by the zipper. This would have produced two
uncoordinated copies of the other shifty operator, and we'd have been
in the situation described in class of having a reduction strategy
that never reduced the number of shifty operators below 2. The
limitation is that zippers by themselves don't provide a natural way
to establish a dependency between two distant elements of a data
structure. (There are ways around this limitation of tree zippers,
but they are essentially equivalent to the technique given just
below.)
Instead, we'll reinterpret what the continuation monad was doing
in more or less defunctionalized terms by using Quantifier Raising, a
technique from linguistics.
But first, let's motivate quantifier scope as a linguistic application.
# The primary application of continuations to natural language: scope-taking
We have seen that continuations allow a deeply-embedded element to
take control over (a portion of) the entire computation that contains
it. In natural language semantics, this is exactly what it means for
a scope-taking expression to take scope.
1. [Ann put a copy of [everyone]'s homeworks in her briefcase]

2. For every x, [Ann put a copy of x's homeworks in her briefcase]
The sentence in (1) can be paraphrased as in (2), in which the
quantificational DP *everyone* takes scope over the rest of the sentence.
Even if you suspect that there could be an analysis of (1) on which
*everyone's homeworks* could denote some kind of mereological
fusion of a set of homeworks, it is much more difficult to be satisfied
with a referential analysis when *everyone* is replaced with
*no student*, or *fewer than three students*, and so on---see any
semantics textbook for abundant discussion.
We can arrive at an analysis by expressing the meaning of a
quantificational DP such as *everyone* using continuations:

3. everyone = shift (\k.∀x.kx)
Assuming there is an implicit reset at the top of the sentence (we'll
explicitly address below where there is or isn't a reset), the
reduction rules for `shift` will apply the handler function (\k.∀x.kx)
to the remainder of the sentence after abstracting over the position
of the shift expression:
    [Ann put a copy of [shift (\k.∀x.kx)]'s homeworks in her briefcase]
    ~~> (\k.∀x.kx) (\v. Ann put a copy of v's homeworks in her briefcase)
    ~~> ∀x. Ann put a copy of x's homeworks in her briefcase
(To be a bit pedantic, this reduction sequence is more suitable for
shift0 than for shift, but we're not being fussy here about subflavors
of shifty operators.)
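The reduction can be mimicked in a toy extensional model; the domain and the predicate below are invented for illustration.

```python
# everyone = shift (\k.∀x.kx): take the abstracted remainder of the
# sentence (the continuation k) and quantify universally over the domain.

domain = ["ann", "bill", "carol"]
everyone = lambda k: all(k(x) for x in domain)

# Invented extension for "Ann put a copy of v's homeworks in her briefcase"
in_briefcase = {"ann": True, "bill": True, "carol": True}
remainder = lambda v: in_briefcase[v]

result = everyone(remainder)   # ∀x. Ann put a copy of x's homeworks ...
```

Here `result` is true just in case the remainder holds of every member of the domain.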
The standard technique for handling scope-taking in linguistics is
Quantifier Raising (QR). As you might suppose, the rule for Quantifier
Raising closely resembles the reduction rule for shift:
Quantifier Raising: given a sentence of the form

    [... [QDP] ...],

build a new sentence of the form

    [QDP (\x.[... [x] ...])].

Here, QDP is a scope-taking quantificational DP.
Just to emphasize the similarity between QR and shift, we can use QR
to provide insight into the tree version of the doubling task that
mystified us earlier. Here's the starting point:

\tree (. (a)((S)((d)((S)(e)))))
First we QR the lower shift operator, replacing it with a variable and
abstracting over that variable:

\tree (. (S) ((\\x) ((a)((S)((d)((x)(e)))))))
Next, we QR the upper shift operator:

\tree (. (S) ((\\y) ((S) ((\\x) ((a)((y)((d)((x)(e)))))))))
We then evaluate, using the same value for the shift operator proposed before:

    S = shift = \k.k(k "")
It will be easiest to begin evaluating this tree with the lower shift
operator (we get the same result if we start with the upper one).
The relevant value for k is (\x.a(y(d(x e)))). Then k "" is
a(y(d(""(e)))), and k(k "") is a(y(d((a(y(d(""(e)))))(e)))). In tree
form:

\tree (. (S) ((\\y) ((a)((y)((d)(((a)((y)((d)(("")(e)))))(e)))))))
Repeating the process for the upper shift operator replaces each
occurrence of y with a copy of the whole tree:

\tree (. ((a)((((a)(("")((d)(((a)(("")((d)(("")(e)))))(e))))))((d)(((a)((((a)(("")((d)(((a)(("")((d)(("")(e)))))(e))))))((d)(("")(e)))))(e))))))
[drawing of the fully expanded tree]
The yield of this tree (the sequence of leaf nodes) is
aadadeedaadadeedee, which is the expected output of the double-shifted tree.
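We can spot-check this result in code (a sketch in my own notation, treating leaves as strings and the yield of a tree as string concatenation):

```python
# After QR-ing both operators, the tree denotes S (\y. S (\x. a(y(d(x e))))).

S = lambda k: k(k(""))   # the shifty operator: \k. k(k "")

inner = lambda y: S(lambda x: "a" + y + "d" + x + "e")
result = S(inner)        # == "aadadeedaadadeedee"
```

Evaluating the lower shift first and then the upper one, exactly as in the prose above, produces the expected yield.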
Exercise: the result is different, by the way, if the QR is applied to
the two shift operators in the opposite order.
* Generalizing from one-sided, list-based continuation
operators to two-sided, tree-based continuation operators is a
dramatic increase in power and complexity. (De Groote's dynamic
Montague semantics continuations are the one-sided, list-based variety.)
* Operators that compose multiple copies of a context can be hard to
understand (though keep this in mind when we see the
continuations-based analysis of coordination, which involves context
doubling).
* When considering two-sided, tree-based continuation operators,
quantifier raising is a good tool for visualizing (defunctionalizing)
their behavior.
At this point, we have three ways of representing computations
involving control operators such as shift and reset: using a CPS
transform, lifting into a continuation monad, and by using QR.
QR is the traditional system in linguistics, but it will not be
adequate for us in general. The reason has to do with evaluation
order. As we've discussed, especially with respect to the CPS
transform, continuations allow fine-grained control over the order of
evaluation. One of the main empirical claims of Barker and Shan 2014
is that natural language is sensitive to evaluation order. Unlike
other presentations of continuations, QR does not lend itself to
reasoning about evaluation order, so we will need to use a different
strategy.
[Note to self: it is interesting to consider what it would take to
reproduce the analyses given in Barker and Shan in purely QR terms.
Simple quantificational binding using parasitic scope should be easy,
but how reconstruction would work is not so clear.]
## Introducing the tower notation
We'll present the tower notation, then comment on and motivate several
of its features as we consider various applications. For now, we'll
motivate the tower notation by thinking about box types. In the
discussion of monads, we've thought of monadic types as values inside
of a box. The box will often contain information in addition to the
core object. For instance, in the Reader monad, a boxed int contains
an expression of type int as the payload, but also contains a function
that manipulates a list of information. It is natural to imagine
separating a box into two regions, the payload and the hidden scratch
space:
     _______________
    | [x->2, y->3]  |    <- hidden scratch space
    |---------------|
    |    payload    |
    |_______________|
For people who are familiar with Discourse Representation Theory (Kamp
1981, Kamp and Reyle 1993), this separation of boxes into payload and
discourse scorekeeping will be familiar (although many details differ).
The general pattern is that monadic treatments separate computation
into an at-issue (pre-monadic) computation with a layer at which
side effects are managed.
The tower notation is a precise way of articulating continuation-based
computations into a payload and (potentially multiple) layers of
side-effects. Visually, we won't keep the outer box, but we will keep
the horizontal line dividing main effects from side-effects.
Tower convention for types:

                                                   γ | β
    (α -> β) -> γ   can be equivalently written    -----
                                                     α

Read these types counter-clockwise starting at the bottom: α, then β, then γ.
Tower convention for values:

                                                  g[]
    \k.g[k(x)]   can be equivalently written      ---
                                                   x

If \k.g[k(x)] has type (α -> β) -> γ, then k has type (α -> β).

Here "g[ ]" is a *context*, that is, an expression with (exactly) one
hole in it. For instance, we might have g[x] = ∀x.P[x].
We'll use a simply-typed system with two atomic types, DP (the type of
individuals) and S (the type of truth values).
Then in the spirit of monadic thinking, we'll have a way of lifting an
arbitrary value into the tower system:

                                                []      β | β
    LIFT (x:α)  =  \k.kx : (α -> β) -> β   ==   --  :   -----
                                                x         α

Obviously, LIFT is exactly the midentity (the unit) for the continuation monad.
Notice that LIFT requires the result type of the continuation argument
and the result type of the overall expression to match (here, both are β).
The name LIFT comes from Partee's 1987 theory of type-shifters for
determiner phrases. Importantly, LIFT applied to an
individual-denoting expression yields the generalized quantifier
proposed by Montague as the denotation for proper names:

                                                 []      S | S
    LIFT (j:DP)  =  \k.kj : (DP -> S) -> S  ==   --  :   -----
                                                 j        DP

So if the proper name *John* denotes the individual j, LIFT(j) is the
generalized quantifier that maps each property k of type DP -> S to true
just in case kj is true.
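In code (the extension of *left* is invented for the example):

```python
lift = lambda x: (lambda k: k(x))    # LIFT = \x.\k.kx

john = "j"
left = lambda v: v in {"j", "m"}     # invented extension for 'left'

gq = lift(john)                      # Montague's generalized quantifier for John
result = gq(left)                    # kj: true iff John left in this toy model
```

`gq` maps each property to whatever that property says about j, which is exactly Montague's treatment of proper names.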
Crucially for the discussion here, LIFT does not apply only to DPs, as
in Montague and Partee, but to any expression whatsoever. For
instance, here is LIFT applied to a lexical verb phrase:

                                                               []        S | S
    LIFT (left:DP->S)  =  \k.k(left) : ((DP->S) -> S) -> S  ==  ----  :  -------
                                                               left       DP->S
Once we have expressions of type (α -> β) -> γ, we'll need to combine
them. We'll use the ¢ operator from the continuation monad:

    g[]   γ | δ        h[]   δ | ρ       g[h[]]   γ | ρ
    --- : ------   ¢   --- : -----   ==  ------ : -----
     f    α -> β        x      α           fx       β

Note that the types below the horizontal line combine just like
functional application (i.e., f:(α->β) applied to x:α yields fx:β).
This discussion is based on Wadler's paper "Composable continuations".
The unit and the combination rule work very much like we are used to
from the various monads we've studied.
In particular, we can easily see that the ¢ operator defined just
above is exactly the same as the ¢ operator from the continuation monad:

        ¢ (\k.g[kf]) (\k.h[kx])
    ==  (\MNk.M(\m.N(\n.k(mn)))) (\k.g[kf]) (\k.h[kx])
    ~~> \k.(\k.g[kf])(\m.(\k.h[kx])(\n.k(mn)))
    ~~> \k.g[(\k.h[kx])(\n.k(fn))]
    ~~> \k.g[h[k(fx)]]
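Transcribed into code (the functor and argument here are invented arithmetic examples):

```python
# ¢ = \MNk.M(\m.N(\n.k(mn)))
combine = lambda M, N: (lambda k: M(lambda m: N(lambda n: k(m(n)))))
lift = lambda x: (lambda k: k(x))    # the unit, as above

f = lambda n: n + 1
r = combine(lift(f), lift(3))        # LIFT(f) ¢ LIFT(3)
final = r(lambda v: v)               # feed the trivial continuation: f(3) = 4
```

Combining two lifted values and then feeding the identity continuation gives the same result as plain functional application, as the derivation predicts.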
However, these continuations do not form an official monad.
One way to see this is to consider the type of the bind operator.
The type of the bind operator in a genuine monad is

    mbind : Mα -> (α -> Mβ) -> Mβ
In particular, the result type of the second argument (Mβ) must be
identical to the result type of the bind operator overall. For the
continuation monad, this means that mbind has the following type:

    mbind : ((α -> γ) -> ρ)
            -> (α -> ((β -> δ) -> ρ))
            -> ((β -> δ) -> ρ)
But the type of the bind operator in our generalized continuation
system (call it "kbind") is

    kbind : ((α -> γ) -> ρ)
            -> (α -> ((β -> δ) -> γ))
            -> ((β -> δ) -> ρ)
Note that `(β -> δ) -> γ` is not the same as `(β -> δ) -> ρ`.
These more general types work out fine when plugged into the
continuation monad's bind operator:

    kbind u f = \k. u (\x. f x k)

    where  u : (α -> γ) -> ρ
           f : α -> ((β -> δ) -> γ)
           k : β -> δ
           x : α

    f x           : (β -> δ) -> γ
    f x k         : γ
    \x. f x k     : α -> γ
    u (\x. f x k) : ρ
    kbind u f     : (β -> δ) -> ρ
Nevertheless, it's easy to see that the generalized continuation system
obeys the monad laws. We haven't spent much time proving monad laws,
so this seems like a worthy occasion on which to give some details.
Since we're working with bind here, we'll use the version of the monad
laws that is expressed in terms of bind:
Prove u >>= ⇧ == u:

        u >>= ⇧
    ==  (\ufk.u(\x.fxk)) u (\ak.ka)
    ~~> \k.u(\x.(\ak.ka)xk)
    ~~> \k.u(\x.kx)
    ~~> \k.uk
    ~~> u

The last two steps are eta reductions.
Prove ⇧a >>= f == f a:

        ⇧a >>= f
    ==  (\ufk.u(\x.fxk)) (\k.ka) f
    ~~> \k.(\k.ka)(\x.fxk)
    ~~> \k.(\x.fxk)a
    ~~> \k.fak
    ~~> fa

The last step is an eta reduction.
Prove u >>= (\a.fa >>= g) == (u >>= f) >>= g:

        u >>= (\a.fa >>= g)
    ==  u >>= (\a.(\k.fa(\x.gxk)))
    ==  (\ufk.u(\x.fxk)) u (\a.(\k.fa(\x.gxk)))
    ==  \k.u(\x.(\a.(\k.fa(\x.gxk)))xk)
    ~~> \k.u(\x.fx(\y.gyk))

        (u >>= f) >>= g
    ==  ((\ufk.u(\x.fxk)) u f) >>= g
    ~~> (\k.u(\x.fxk)) >>= g
    ==  (\ufk.u(\x.fxk)) (\k.u(\x.fxk)) g
    ~~> \k.(\k.u(\x.fxk))(\y.gyk)
    ~~> \k.u(\x.fx(\y.gyk))
The fact that the monad laws hold means that we can rely on any
reasoning that depends on the monad laws.
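The laws can also be spot-checked on concrete values (the values are invented; "running" a computation means feeding it the identity continuation):

```python
kbind = lambda u, f: (lambda k: u(lambda x: f(x)(k)))   # \ufk.u(\x.fxk)
mid = lambda a: (lambda k: k(a))                        # ⇧, the unit
run = lambda m: m(lambda v: v)                          # feed the identity continuation

u = lambda k: k(10)
f = lambda a: (lambda k: k(a * 2))
g = lambda a: (lambda k: k(a + 1))

law1 = run(kbind(u, mid)) == run(u)                     # u >>= ⇧ == u
law2 = run(kbind(mid(5), f)) == run(f(5))               # ⇧a >>= f == fa
law3 = run(kbind(u, lambda a: kbind(f(a), g))) == run(kbind(kbind(u, f), g))
```

All three comparisons come out true, as the derivations above guarantee.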
## Syntactic refinements: subtypes of implication
Because natural language allows the functor to be on the left or on
the right of its argument, we replace the type arrow -> with a
left-leaning version \ and a right-leaning version /.

This means (without adding some fancy footwork, as in Charlow 2014) we
need two versions of ¢ too, one for each direction for the unmonadized types.
Just to be explicit, here are the two versions:

    g[]   γ | δ        h[]   δ | ρ       g[h[]]   γ | ρ
    --- : -----    ¢   --- : -----   ==  ------ : -----
     f     α/β          x      α           fx       β

    h[]   δ | ρ        g[]   γ | δ       g[h[]]   γ | ρ
    --- : -----    ¢   --- : -----   ==  ------ : -----
     x      α           f     β\α          fx       β
With respect to types, they differ only in replacing α -> β with α/β
(top version) and with β\α (bottom version). With respect to
syntactic order, they differ in a parallel way with respect to whether
the functor f is on the left of its argument x (top version) or on
the right (bottom version).
Logically, separating -> into \\ and / corresponds to rejecting the
structural rule of exchange.
Note that the \\ and / only govern types at the bottom of the tower.
That is, we currently still have arrows at the higher-order levels of
the types, if we undo the tower notation:

    γ | δ
    -----  ==  ((α/β) -> δ) -> γ
     α/β
We'll change these arrows into left-leaning and right-leaning versions
too, according to the following scheme:

    γ | δ
    -----  ==  γ // ((α/β) \\ δ)
     α/β
As we'll see in a minute, each of these four implications (\\, /, \\\\,
//) will have a syntactic interpretation:

    \    argument is adjacent on the left of the functor
    /    argument is adjacent on the right of the functor
    \\   argument is surrounded by the functor
    //   argument surrounds the functor
One more ingredient: shifty continuation operators require an upper
bound, a reset marking the extent of the remainder of the computation
that the shift operator captures. In the list version of the doubling
task, we have

    "a#bdeSfg" ~~> "abdebdefg"     continuation captured: bde
    "ab#deSfg" ~~> "abdedefg"      continuation captured: de
In programming languages, resets are encoded in the computation
explicitly. In natural language, resets are always invisible.
We'll deal with this in the natural language setting by allowing
spontaneous resets, as long as the types match the following recipe:

            g[]   α | β
    LOWER ( --- : ----- )  ==  g[p] : α
             p      β
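In code, LOWER just feeds the tower the trivial continuation. Here it combines with the toy *everyone* and an invented extension for *left* (a sketch only; these combinators don't track word order):

```python
combine = lambda M, N: (lambda k: M(lambda m: N(lambda n: k(m(n)))))
lift = lambda x: (lambda k: k(x))
lower = lambda m: m(lambda p: p)     # LOWER: apply to the identity continuation

domain = ["j", "m", "s"]
left = lambda v: v in {"j", "m"}     # invented extension: s didn't leave
everyone = lambda k: all(k(x) for x in domain)

sentence = combine(lift(left), everyone)   # LIFT(left) ¢ everyone
result = lower(sentence)                   # spontaneous reset at the top
```

Lowering the combined tower evaluates the sentence "everyone left", which is false in this toy model.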
At this point, it should be clear how the approach in the seminar
relates to the system developed in Barker and Shan 2014. Many
applications of continuations to natural language are developed in
detail there, including
* Quantificational binding
* Generalized coordination
* WH-movement as delayed binding
* Semantic reconstruction effects
* Linear order effects in negative polarity licensing