[[!toc]]

# Applications of continuations to natural language

We've seen a number of applications of monads to natural language,
including presupposition projection, binding, intensionality, and the
dynamics of the GSV fragment.

In the past couple of weeks, we've introduced continuations, first as
a functional programming technique, then in terms of list and tree
zippers, then as a monad. In this lecture, we will generalize
continuations slightly beyond a monad, and then begin to outline some
of the applications of the generalized continuations.

Many (though not all) of the applications are discussed in detail in
Barker and Shan 2014, *Continuations in Natural Language*, OUP.

To review, in terms of list zippers, the continuation of a focused
element in the list is the front part of the list.

    list zipper for the list [a;b;c;d;e;f] with focus on d:

        ([c;b;a], [d;e;f])
         -------
         defunctionalized
         continuation

In terms of tree zippers, the continuation is the entire context of
the focused element--the entire rest of the tree.

[drawing of a tree zipper]

We explored continuations first in a list setting, then in a tree
setting, using the doubling task as an example.

    "abSd" ~~> "ababd"
    "ab#deSfg" ~~> "abdedefg"

The "S" functions like a shifty operator, and "#" functions like a reset.
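
Here is a small defunctionalized sketch of the list task in Haskell
(an illustration of the behavior just described, not the CPS solution
from the earlier weeks; the name `doubleTask` is ours). The string
accumulated since the last reset plays the role of the captured
continuation: "S" doubles it, and "#" commits it to the output.

    -- a defunctionalized sketch of the list doubling task
    -- `done` is output already sealed off by a reset; `cur` is the string
    -- built up since the last "#", i.e. the continuation an "S" captures
    doubleTask :: String -> String
    doubleTask = go "" ""
      where
        go done cur []         = done ++ cur
        go done cur ('#' : cs) = go (done ++ cur) "" cs    -- reset: commit the prefix
        go done cur ('S' : cs) = go done (cur ++ cur) cs   -- S: double the captured prefix
        go done cur (c   : cs) = go done (cur ++ [c]) cs   -- ordinary letters accumulate

    -- doubleTask "abSd"     == "ababd"
    -- doubleTask "ab#deSfg" == "abdedefg"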

Although the list version of the doubling task was easy to understand
thoroughly, the tree version was significantly more challenging. In
particular, it remained unclear why

    "aScSe" ~~> "aacaceecaacaceecee"

We'll burn through that conceptual fog today by learning more about
how to work with continuations.

The natural thing to try would have been to defunctionalize the
continuation-based solution using a tree zipper. But that would not
have been easy, since the natural way to implement the doubling
behavior of the shifty operator would have been to simply copy the
context provided by the zipper. This would have produced two
uncoordinated copies of the other shifty operator, and we'd have been
in the situation described in class of having a reduction strategy
that never reduced the number of shifty operators below 2. The
limitation is that zippers by themselves don't provide a natural way
to establish a dependency between two distant elements of a data
structure. (There are ways around this limitation of tree zippers,
but they are essentially equivalent to the technique given just
below.)

Instead, we'll re-interpret what the continuation monad was doing in
more or less defunctionalized terms by using Quantifier Raising, a
technique from linguistics.

But first, let's motivate quantifier scope as a linguistic application.

# The primary application of continuations to natural language: scope-taking

We have seen that continuations allow a deeply-embedded element to
take control over (a portion of) the entire computation that contains
it. In natural language semantics, this is exactly what it means for
a scope-taking expression to take scope.

    1. [Ann put a copy of [everyone]'s homeworks in her briefcase]

    2. For every x, [Ann put a copy of x's homeworks in her briefcase]

The sentence in (1) can be paraphrased as in (2), in which the
quantificational DP *everyone* takes scope over the rest of the
sentence. Even if you suspect that there could be an analysis of (1)
on which *everyone's homeworks* denotes some kind of mereological
fusion of a set of homeworks, it is much more difficult to be
satisfied with a referential analysis when *everyone* is replaced with
*no one*, or *fewer than three students*, and so on---see any
semantics textbook for abundant discussion.

We can arrive at an analysis by expressing the meaning of a
quantificational DP such as *everyone* using continuations:

    3. everyone = shift (\k.∀x.kx)

Assuming there is an implicit reset at the top of the sentence (we'll
explicitly address where there is and isn't a reset later on), the
reduction rules for `shift` will apply the handler function (\k.∀x.kx)
to the remainder of the sentence after abstracting over the position
of the shift expression:

    [Ann put a copy of [shift (\k.∀x.kx)]'s homeworks in her briefcase]
    ~~> (\k.∀x.kx) (\v. Ann put a copy of v's homeworks in her briefcase)
    ~~> ∀x. Ann put a copy of x's homeworks in her briefcase

(To be a bit pedantic, this reduction sequence is more suitable for
shift0 than for shift, but we're not being fussy here about subflavors
of shifty operators.)
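
To see the shifty entry in (3) at work, here is a minimal Haskell
sketch using `shift`, `reset`, and `evalCont` from the transformers
package's `Control.Monad.Trans.Cont`. The finite `domain`, the
stand-in predicate `left`, and the `Entity` type are toy assumptions
of ours; ∀ is modeled as a conjunction over the domain.

    import Control.Monad.Trans.Cont (Cont, evalCont, reset, shift)

    type Entity = String

    -- a toy domain of individuals
    domain :: [Entity]
    domain = ["ann", "bill", "carol"]

    -- everyone = shift (\k. ∀x. kx), with ∀ as a conjunction over the domain
    everyone :: Cont Bool Entity
    everyone = shift (\k -> return (all k domain))

    -- a stand-in for "the remainder of the sentence"
    left :: Entity -> Bool
    left x = x /= "bill"

    -- [everyone left], with the implicit reset at the top of the sentence
    sentence :: Bool
    sentence = evalCont . reset $ do
      x <- everyone
      return (left x)     -- ~~> all left domain ~~> False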

The standard technique for handling scope-taking in linguistics is
Quantifier Raising (QR). As you might suppose, the rule for Quantifier
Raising closely resembles the reduction rule for shift:

    Quantifier Raising: given a sentence of the form

        [... [QDP] ...],

    build a new sentence of the form

        [QDP (\x.[... [x] ...])].

Here, QDP is a scope-taking quantificational DP.

Just to emphasize the similarity between QR and shift, we can use QR
to provide insight into the tree version of the doubling task that
mystified us earlier. Here's the starting point:

<pre>
  .
__|___
|    |
a  __|___
   |    |
   S  __|__
      |   |
      d  _|__
         |  |
         S  e
</pre>

First we QR the lower shift operator, replacing it with a variable and
abstracting over that variable.

<pre>
   .
___|___
|     |
S  ___|___
   |     |
   \x  __|___
       |    |
       a  __|___
          |    |
          S  __|__
             |   |
             d  _|__
                |  |
                x  e
</pre>

Next, we QR the upper shift operator:

<pre>
   .
___|___
|     |
S  ___|____
   |      |
   \y  ___|___
       |     |
       S  ___|___
          |     |
          \x  __|___
              |    |
              a  __|___
                 |    |
                 y  __|__
                    |   |
                    d  _|__
                       |  |
                       x  e
</pre>

We then evaluate, using the same value for the shift operator proposed before:

    S = shift = \k.k(k "")

It will be easiest to begin evaluating this tree with the lower shift
operator (we get the same result if we start with the upper one).
The relevant value for k is (\x.a(y(d(x e)))). Then k "" is
a(y(d("" e))), and k(k "") is a(y(d((a(y(d("" e)))) e))). In tree
form:

<pre>
   .
___|___
|     |
S  ___|____
   |      |
   \y  ___|___
       |     |
       a  ___|___
          |     |
          y  ___|___
             |     |
             d  ___|___
                |     |
              __|___  e
              |    |
              a  __|___
                 |    |
                 y  __|___
                    |    |
                    d  __|__
                       |   |
                       ""  e
</pre>

Repeating the process for the upper shift operator replaces each
occurrence of y with a copy of the whole tree.

<pre>
      .
      |
______|______
|           |
a  _________|__________
   |                  |
   |               ___|___
___|___            |     |
|     |            d  ___|____
a  ___|____           |      |
   |      |        ___|____  e
   ""  ___|___     |      |
       |     |     a  ____|_____
       d  ___|___     |        |
          |     |     |      __|___
       ___|___  e  ___|___   |    |
       |     |     |     |   d  __|__
       a  ___|___  a  ___|____  |   |
          |     |     |      |  ""  e
          ""  __|___  ""  ___|___
              |    |      |     |
              d  __|__    d  ___|___
                 |   |       |     |
                 ""  e    ___|___  e
                          |     |
                          a  ___|___
                             |     |
                             ""  __|___
                                 |    |
                                 d  __|__
                                    |   |
                                    ""  e
</pre>

The yield of this tree (the sequence of leaf nodes) is
aadadeedaadadeedee, which is the expected output of the double-shifted
tree.
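
This result can be double-checked mechanically. Below is a small
Haskell sketch (purely illustrative, with names of our own choosing):
`viaQR` evaluates the doubly QR'd term directly, treating the shift
operator's handler \k.k(k "") as an ordinary function; `viaShift`
evaluates the original tree with `shift` and `reset` from the
transformers package's `Control.Monad.Trans.Cont`, with the implicit
reset at the top. Both yield the string computed above.

    import Control.Monad.Trans.Cont (Cont, evalCont, reset, shift)

    -- the handler of the shift operator: \k. k (k "")
    s :: (String -> String) -> String
    s k = k (k "")

    -- the doubly QR'd tree: S (\y. S (\x. a (y (d (x e)))))
    viaQR :: String
    viaQR = s (\y -> s (\x -> "a" ++ y ++ "d" ++ x ++ "e"))

    -- the original tree read with shift/reset, S = shift (\k. k (k ""))
    shifty :: Cont String String
    shifty = shift (\k -> return (k (k "")))

    viaShift :: String
    viaShift = evalCont . reset $ do
      y <- shifty              -- the upper S
      x <- shifty              -- the lower S
      return ("a" ++ y ++ "d" ++ x ++ "e")

    -- viaQR == viaShift == "aadadeedaadadeedee"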

Exercise: the result is different, by the way, if the QR occurs in the
opposite order.

Three lessons:

* Generalizing from one-sided, list-based continuation
  operators to two-sided, tree-based continuation operators is a
  dramatic increase in power and complexity. (De Groote's dynamic
  Montague semantics continuations are the one-sided, list-based variety.)

* Operators that
  compose multiple copies of a context can be hard to understand
  (keep this in mind when we see the continuations-based
  analysis of coordination, which involves context doubling).

* When considering two-sided, tree-based continuation operators,
  quantifier raising is a good tool for visualizing (defunctionalizing)
  the computation.

## Tower notation

At this point, we have three ways of representing computations
involving control operators such as shift and reset: using a CPS
transform, lifting into a continuation monad, and using QR.

QR is the traditional system in linguistics, but it will not be
adequate for us in general. The reason has to do with evaluation
order. As we've discussed, especially with respect to the CPS
transform, continuations allow fine-grained control over the order of
evaluation. One of the main empirical claims of Barker and Shan 2014
is that natural language is sensitive to evaluation order. Unlike
other presentations of continuations, QR does not lend itself to
reasoning about evaluation order, so we will need to use a different
strategy.

[Note to self: it is interesting to consider what it would take to
reproduce the analyses given in Barker and Shan in purely QR terms.
Simple quantificational binding using parasitic scope should be easy,
but how reconstruction would work is not so clear.]

## Introducing the tower notation

We'll present the tower notation, then comment on and motivate several
of its features as we consider various applications. For now, we'll
motivate the tower notation by thinking about box types. In the
discussion of monads, we've thought of monadic types as values inside
of a box. The box will often contain information in addition to the
core object. For instance, in the Reader monad, a boxed int contains
an expression of type int as the payload, but also contains a function
that manipulates a list of information. It is natural to imagine
separating a box into two regions, the payload and the hidden scratch
space:

     ______________          ______________          ______________
    | [x->2, y->3] |        | [x->2, y->3] |        | [x->2, y->3] |
    |--------------|        |--------------|        |--------------|
    |              |   ¢    |              |   =    |              |
    |      +2      |        |      y       |        |      5       |
    |______________|        |______________|        |______________|
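
Here is the same picture as a toy Haskell sketch (the names and the
default-to-0 lookup are invented for illustration): the payload sits
below the line, the assignment plays the role of the scratch space,
and the ¢ step applies the function payload to the argument payload
while threading the shared scratch space through both.

    -- the hidden scratch space: an assignment like [x->2, y->3]
    type Env = [(String, Int)]

    -- a boxed value: a payload computed against the scratch space
    newtype Boxed a = Boxed { runBoxed :: Env -> a }

    -- the left-hand box: payload +2
    plusTwo :: Boxed (Int -> Int)
    plusTwo = Boxed (\_ -> (+ 2))

    -- the middle box: payload y, looked up in the scratch space (0 if absent)
    yVal :: Boxed Int
    yVal = Boxed (\env -> maybe 0 id (lookup "y" env))

    -- the ¢ step: apply the function payload to the argument payload,
    -- passing the same scratch space to both
    combine :: Boxed (a -> b) -> Boxed a -> Boxed b
    combine (Boxed f) (Boxed x) = Boxed (\env -> f env (x env))

    -- runBoxed (combine plusTwo yVal) [("x", 2), ("y", 3)]  ~~>  5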

For people who are familiar with Discourse Representation Theory (Kamp
1981, Kamp and Reyle 1993), this separation of boxes into payload and
discourse scorekeeping will be familiar (although many details differ).

The general pattern is that monadic treatments separate computation
into an at-issue (pre-monadic) computation plus a layer at which
side-effects occur.

The tower notation is a precise way of articulating continuation-based
computations into a payload and (potentially multiple) layers of
side-effects. Visually, we won't keep the outer box, but we will keep
the horizontal line dividing main effects from side-effects.

Tower convention for types:

                                              γ | β
    (α -> β) -> γ can be equivalently written -----
                                                α

Read these types counter-clockwise starting at the bottom.

Tower convention for values:

                                           g[]
    \k.g[k(x)] can be equivalently written ---
                                            x

If \k.g[k(x)] has type (α -> β) -> γ, then k has type (α -> β).

Here "g[ ]" is a *context*, that is, an expression with (exactly) one
hole in it. For instance, we might have g[x] = ∀x.P[x].

We'll use a simply-typed system with two atomic types, DP (the type of
individuals) and S (the type of truth values).

## LIFT (mid)

Then in the spirit of monadic thinking, we'll have a way of lifting an
arbitrary value into the tower system:

                                             []    β|β
    LIFT (x:α) = \k.kx : (α -> β) -> β   ==  --  : ---
                                             x      α

Obviously, LIFT is exactly the midentity (the unit) for the continuation monad.
Notice that LIFT requires the result type of the continuation argument
and the result type of the overall expression to match (here, both are β).

The name LIFT comes from Partee's 1987 theory of type-shifters for
determiner phrases. Importantly, LIFT applied to an
individual-denoting expression yields the generalized quantifier
proposed by Montague as the denotation for proper names:

                                              []    S|S
    LIFT (j:DP) = \k.kj : (DP -> S) -> S  ==  --  : ---
                                              j     DP

So if the proper name *John* denotes the individual j, LIFT(j) is the
generalized quantifier that maps each property k of type DP -> S to true
just in case kj is true.

Crucially for the discussion here, LIFT does not apply only to DPs, as
in Montague and Partee, but to any expression whatsoever. For
instance, here is LIFT applied to a lexical verb phrase:

                                                             []      S|S
    LIFT (left:DP->S) = \k.k(left) : ((DP->S) -> S) -> S ==  ----  : -----
                                                             left    DP->S

Once we have expressions of type (α -> β) -> γ, we'll need to combine
them. We'll use the ¢ operator from the continuation monad:

    g[]   γ | δ       h[]   δ | ρ        g[h[]]   γ | ρ
    --- : -------  ¢  --- : -----   ==   ------ : -----
     f    α -> β       x     α             fx       β

Note that the types below the horizontal line combine just like
functional application (i.e., f:(α->β) (x:α) = fx:β).
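
If it helps to see these types executed, here is one way to transcribe
the machinery into Haskell (a sketch; the names `K`, `lift'`, and
`combineC` are ours, not Barker and Shan's). `K c d a` encodes the
tower with c over d above a, i.e. the type (a -> d) -> c; `lift'` is
LIFT, and `combineC` is the ¢ rule just given, with the function on
the left.

    -- K c d a encodes the tower  c | d  over  a,  i.e. the type (a -> d) -> c
    newtype K c d a = K { runK :: (a -> d) -> c }

    -- LIFT, the midentity: note that the two answer types must match
    lift' :: a -> K d d a
    lift' x = K (\k -> k x)

    -- the ¢ rule (function on the left): below the line, ordinary application;
    -- above the line, g's layer wraps around h's layer
    combineC :: K c d (a -> b) -> K d e a -> K c e b
    combineC (K g) (K h) = K (\k -> g (\f -> h (\x -> k (f x))))

    -- for example: runK (combineC (lift' not) (lift' True)) show  ~~>  "False"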

## Not quite a monad

This discussion is based on Wadler's paper "Composable continuations".

The unit and the combination rule work very much like we are used to
from the various monads we've studied.

In particular, we can easily see that the ¢ operator defined just
above is exactly the same as the ¢ operator from the continuation monad:

      ¢ (\k.g[kf]) (\k.h[kx])

    = (\MNk.M(\m.N(\n.k(mn)))) (\k.g[kf]) (\k.h[kx])

    ~~> \k.(\k.g[kf])(\m.(\k.h[kx])(\n.k(mn)))

    ~~> \k.g[(\k.h[kx])(\n.k(fn))]

    ~~> \k.g[h[k(fx)]]

         g[h[]]
    ==   ------
           fx

However, these continuations do not form an official monad.
One way to see this is to consider the type of the bind operator.
The type of the bind operator in a genuine monad is

    mbind : Mα -> (α -> Mβ) -> Mβ

In particular, the result type of the second argument (Mβ) must be
identical to the result type of the bind operator overall. For the
continuation monad, this means that mbind has the following type:

         ((α -> γ) -> ρ)
      -> (α -> ((β -> δ) -> ρ))
      -> ((β -> δ) -> ρ)

But the type of the bind operator in our generalized continuation
system (call it "kbind") is

    kbind :
         ((α -> γ) -> ρ)
      -> (α -> ((β -> δ) -> γ))
      -> ((β -> δ) -> ρ)

Note that `(β -> δ) -> γ` is not the same as `(β -> δ) -> ρ`.

These more general types work out fine when plugged into the
continuation monad's bind operator:

    kbind u f = \k. u (\x. f x k)

        k              :  β -> δ
        u              :  (α -> γ) -> ρ
        x              :  α
        f              :  α -> (β -> δ) -> γ
        f x            :  (β -> δ) -> γ
        f x k          :  γ
        \x. f x k      :  α -> γ
        u (\x. f x k)  :  ρ
        kbind u f      :  (β -> δ) -> ρ
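
In the toy Haskell encoding sketched above, kbind is exactly the
definition just given, with the chaining of answer types recorded in
the (invented) type signature; when the three indices coincide it is
the ordinary continuation monad's bind.

    -- kbind in the K encoding: u's inner answer type must match the outer
    -- answer type of the tower that f produces
    kbind' :: K r g a -> (a -> K g d b) -> K r d b
    kbind' (K u) f = K (\k -> u (\x -> runK (f x) k))

    -- when r, g, and d are all the same type, this is just (>>=) for Cont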

Nevertheless, it's easy to see that the generalized continuation system
obeys the monad laws. We haven't spent much time proving monad laws,
so this seems like a worthy occasion on which to give some details.
Since we're working with bind here, we'll use the version of the monad
laws that is expressed in terms of bind:

    Prove u >>= ⇧ == u:

       u >>= (\ak.ka)
    == (\ufk.u(\x.fxk)) u (\ak.ka)
    ~~> \k.u(\x.(\ak.ka)xk)
    ~~> \k.u(\x.kx)
    ~~> \k.uk
    ~~> u

The last two steps are eta reductions.

    Prove ⇧a >>= f == f a:

       ⇧a >>= f
    == (\ak.ka)a >>= f
    ~~> (\k.ka) >>= f
    == (\ufk.u(\x.fxk)) (\k.ka) f
    ~~> \k.(\k.ka)(\x.fxk)
    ~~> \k.fak
    ~~> fa

    Prove u >>= (\a.fa >>= g) == (u >>= f) >>= g:

       u >>= (\a.fa >>= g)
    == u >>= (\a.(\k.fa(\x.gxk)))
    == (\ufk.u(\x.fxk)) u (\a.(\k.fa(\x.gxk)))
    == \k.u(\x.(\a.(\k.fa(\x.gxk)))xk)
    ~~> \k.u(\x.fx(\y.gyk))

       (u >>= f) >>= g
    == (\ufk.u(\x.fxk)) u f >>= g
    ~~> (\k.u(\x.fxk)) >>= g
    == (\ufk.u(\x.fxk)) (\k.u(\x.fxk)) g
    ~~> \k.(\k.u(\x.fxk))(\y.gyk)
    ~~> \k.u(\x.fx(\y.gyk))

The fact that the monad laws hold means that we can rely on any
reasoning that depends on the monad laws.

## Syntactic refinements: subtypes of implication

Because natural language allows the functor to be on the left or on
the right, we replace the type arrow -> with a left-leaning version \
and a right-leaning version /, as follows:

    α/β  β  =  α
    β  β\α  =  α

This means (without adding some fancy footwork, as in Charlow 2014) we
need two versions of ¢ too, one for each direction of the unmonadized
types.

Just to be explicit, here are the two versions:

    g[]   γ | δ       h[]   δ | ρ        g[h[]]   γ | ρ
    --- : -------  ¢  --- : -----   ==   ------ : -----
     f     α/β         x     α             fx       β

    h[]   δ | ρ       g[]   γ | δ        g[h[]]   γ | ρ
    --- : -----    ¢  --- : -------  ==  ------ : -----
     x     α           f     β\α           fx       β

With respect to types, they differ only in replacing α -> β with α/β
(top version) and with β\α (bottom version). With respect to
syntactic order, they differ in a parallel way with respect to whether
the function f is on the left of its argument x (top version) or on
the right (bottom version).

Logically, separating -> into \\ and / corresponds to rejecting the
structural rule of exchange.

Note that the \\ and / only govern types at the bottom of the tower.
That is, we currently still have arrows at the higher-order levels of
the types, if we undo the tower notation:

    γ|δ
    ---  ==  ((α/β) -> δ) -> γ
    α/β

We'll change these arrows into left-leaning and right-leaning versions
too, according to the following scheme:

    γ|δ
    ---  ==  γ//((α/β) \\ δ)
    α/β

As we'll see in a minute, each of these four implications (\\, /, \\\\,
//) will have a syntactic interpretation:

    \   argument is adjacent on the left of the functor
    /   argument is adjacent on the right of the functor
    \\  argument is surrounded by the functor
    //  argument surrounds the functor

## LOWER (reset)

One more ingredient: shifty continuation operators require an upper
bound, a reset marking the extent of the remainder of the computation
that the shift operator captures. In the list version of the doubling
task, we have

    "a#bdeSfg" ~~> "abdebdefg"     continuation captured: bde
    "ab#deSfg" ~~> "abdedefg"      continuation captured: de

In programming languages, resets are encoded in the computation
explicitly. In natural language, resets are always invisible. We'll
deal with this in the natural language setting by allowing spontaneous
resets, as long as the types match the following recipe:

            g[]   α|S
    LOWER ( --- : --- )  ==  g[p] : α
             p     S

This will be easiest to explain by presenting our first complete
example from natural language:
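
(In the toy Haskell encoding used earlier, LOWER amounts to feeding in
the identity continuation; the following is our illustrative sketch of
that recipe, not the official statement of the rule, and not the
natural-language example itself.)

    -- LOWER: when the payload type matches the inner answer type,
    -- feed in the trivial continuation and collapse the tower
    lower :: K c b b -> c
    lower (K m) = m id

    -- lower (lift' 5)                            ~~>  5
    -- lower (combineC (lift' not) (lift' True))  ~~>  False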