X-Git-Url: http://lambda.jimpryor.net/git/gitweb.cgi?p=lambda.git;a=blobdiff_plain;f=topics%2F_week15_continuation_applications.mdwn;h=0000000000000000000000000000000000000000;hp=328b506ee9b7fc429e634f491b748b0a3455baa8;hb=a57131771bf2f69f35614eb4b41cc43bf00540a2;hpb=451f9f331fd36467309386c576ddcc9449eb249f diff --git a/topics/_week15_continuation_applications.mdwn b/topics/_week15_continuation_applications.mdwn deleted file mode 100644 index 328b506e..00000000 --- a/topics/_week15_continuation_applications.mdwn +++ /dev/null @@ -1,605 +0,0 @@ - -[[!toc]] - -# Applications of continuations to natural language - -We've seen a number of applications of monads to natural language, -including presupposition projection, binding, intensionality, and the -dynamics of the GSV fragment. - -In the past couple of weeks, we've introduced continuations, first as -a functional programming technique, then in terms of list and tree -zippers, then as a monad. In this lecture, we will generalize -continuations slightly beyond a monad, and then begin to outline some -of the applications of the generalized continuations. - -Many (though not all) of the applications are discussed in detail in -Barker and Shan 2014, *Continuations in Natural Language*, OUP. - -To review, in terms of list zippers, the continuation of a focused -element in the list is the front part of the list. - - list zipper for the list [a;b;c;d;e;f] with focus on d: - - ([c;b;a], [d;e;f]) - ------- - defunctionalized - continuation - -In terms of tree zippers, the continuation is the entire context of -the focused element--the entire rest of the tree. - -[drawing of a tree zipper] - -We explored continuations first in a list setting, then in a tree -setting, using the doubling task as an example. - - "abSd" ~~> "ababd" - "ab#deSfg" ~~> "abdedefg" - -The "S" functions like a shifty operator, and "#" functions like a reset. - -Although the list version of the doubling task was easy to understand -thoroughly, the tree version was significantly more challenging. In -particular, it remained unclear why - - "aScSe" ~~> "aacaceecaacaceecee" - -We'll burn through that conceptual fog today by learning more about -how to work with continuations. - -The natural thing to try would have been to defunctionalize the -continuation-based solution using a tree zipper. But that would not -have been easy, since the natural way to implement the doubling -behavior of the shifty operator would have been to simply copy the -context provided by the zipper. This would have produced two -uncoordinated copies of the other shifty operator, and we'd have been -in the situation described in class of having a reduction strategy -that never reduced the number of shifty operators below 2. The -limitation is that zippers by themselves don't provide a natural way -to establish a dependency between two distant elements of a data -structure. (There are ways around this limitation of tree zippers, -but they are essentially equivalent to the technique given just -below.) - -Instead, we'll re-interpreting what the continuation monad was doing -in more or less defunctionalized terms by using Quantifier Raising, a -technique from linguistics. - -But first, motivating quantifier scope as a linguistic application. - -# The primary application of continuations to natural language: scope-taking - -We have seen that continuations allow a deeply-embedded element to -take control over (a portion of) the entire computation that contains -it. In natural language semantics, this is exactly what it means for -a scope-taking expression to take scope. - - 1. [Ann put a copy of [everyone]'s homeworks in her briefcase] - - 2. For every x, [Ann put a copy of x's homeworks in her briefcase] - -The sentence in (1) can be paraphrased as in (2), in which the -quantificational DP *everyone* takes scope over the rest of the sentence. -Even if you suspect that there could be an analysis of (2) on which -"every student's term paper" could denote some kind of mereological -fusion of a set of papers, it is much more difficult to be satisfied -with a referential analysis when *every student* is replaced with -*no student*, or *fewer than three students*, and so on---see any -semantics text book for abundant discussion. - -We can arrive at an analysis by expressing the meaning of -quantificational DP such as *everyone* using continuations: - - 3. everyone = shift (\k.∀x.kx) - -Assuming there is an implicit reset at the top of the sentence (we'll -explicitly address determining where there is or isn't a reset), the -reduction rules for `shift` will apply the handler function (\k.∀x.kx) -to the remainder of the sentence after abstracting over the position -of the shift expression: - - [Ann put a copy of [shift (\k.∀x.kx)]'s homeworks in her briefcase] - ~~> (\k.∀x.kx) (\v. Ann put a copy of v's homeworks in her briefcase) - ~~> ∀x. Ann put a copy of x's homeworks in her briefcase - -(To be a bit pedantic, this reduction sequence is more suitable for -shift0 than for shift, but we're not being fussy here about subflavors -of shifty operators.) - -The standard technique for handling scope-taking in linguistics is -Quantifier Raising (QR). As you might suppose, the rule for Quantifier -Raising closely resembles the reduction rule for shift: - - Quantifier Raising: given a sentence of the form - - [... [QDP] ...], - - build a new sentence of the form - - [QDP (\x.[... [x] ...])]. - -Here, QDP is a scope-taking quantificational DP. - -Just to emphasize the similarity between QR and shift, we can use QR -to provide insight into the tree version of the doubling task that -mystified us earlier. Here's the starting point: - - - -
-  .
-__|___
-|    |
-a  __|___
-   |    |
-   S  __|__
-      |   |
-      d  _|__
-         |  |
-         S  e
-
- -First we QR the lower shift operator, replacing it with a variable and -abstracting over that variable. - - - -
-   .
-___|___
-|     |
-S  ___|___
-   |     |
-   \x  __|___
-       |    |
-       a  __|___
-          |    |
-          S  __|__
-             |   |
-             d  _|__
-                |  |
-                x  e
-
- -Next, we QR the upper shift operator - - - -
-   .
-___|___
-|     |
-S  ___|____
-   |      |
-   \y  ___|___
-       |     |
-       S  ___|___
-          |     |
-          \x  __|___
-              |    |
-              a  __|___
-                 |    |
-                 y  __|__
-                    |   |
-                    d  _|__
-                       |  |
-                       x  e
-
- -We then evaluate, using the same value for the shift operator proposed before: - - S = shift = \k.k(k "") - -It will be easiest to begin evaluating this tree with the lower shift -operator (we get the same result if we start with the upper one). -The relevant value for k is (\x.a(y(d(x e)))). Then k "" is -a(y(d(""(e)))), and k(k "") is a(y(d((a(y(d(""(e)))))(e)))). In tree -form: - - - -
-   .
-___|___
-|     |
-S  ___|____
-   |      |
-   \y  ___|___
-       |     |
-       a  ___|___
-          |     |
-          y  ___|___
-             |     |
-             d  ___|___
-                |     |
-              __|___  e
-              |    |
-              a  __|___
-                 |    |
-                 y  __|___
-                    |    |
-                    d  __|__
-                       |   |
-                       ""  e
-
- - -Repeating the process for the upper shift operator replaces each -occurrence of y with a copy of the whole tree. - - - -
-      .
-      |
-______|______
-|           |
-a  _________|__________
-   |                  |
-   |               ___|___
-___|___            |     |
-|     |            d  ___|____
-a  ___|____           |      |
-   |      |        ___|____  e
-   ""  ___|___     |      |
-       |     |     a  ____|_____
-       d  ___|___     |        |
-          |     |     |      __|___
-       ___|___  e  ___|___   |    |
-       |     |     |     |   d  __|__
-       a  ___|___  a  ___|____  |   |
-          |     |     |      |  ""  e
-          ""  __|___  ""  ___|___
-              |    |      |     |
-              d  __|__    d  ___|___
-                 |   |       |     |
-                 ""  e    ___|___  e
-                          |     |
-                          a  ___|___
-                             |     |
-                             ""  __|___
-                                 |    |
-                                 d  __|__
-                                    |   |
-                                    ""  e
-
- -The yield of this tree (the sequence of leaf nodes) is -aadadeedaadadeedee, which is the expected output of the double-shifted tree. - -Exercise: the result is different, by the way, if the QR occurs in the -opposite order. - -Three lessons: - -* Generalizing from one-sided, list-based continuation - operators to two-sided, tree-based continuation operators is a - dramatic increase in power and complexity. (De Groote's dynamic - montague semantics continuations are the one-sided, list-based variety.) - -* Operators that - compose multiple copies of a context can be hard to understand - (though keep this in mind when we see the continuations-based - analysis of coordination, which involves context doubling). - -* When considering two-sided, tree-based continuation operators, - quantifier raising is a good tool for visualizing (defunctionalizing) - the computation. - -## Tower notation - -At this point, we have three ways of representing computations -involving control operators such as shift and reset: using a CPS -transform, lifting into a continuation monad, and by using QR. - -QR is the traditional system in linguistics, but it will not be -adequate for us in general. The reason has to do with evaluation -order. As we've discussed, especially with respect to the CPS -transform, continuations allow fine-grained control over the order of -evaluation. One of the main empirical claims of Barker and Shan 2014 -is that natural language is sensitive to evaluation order. Unlike -other presentations of continuations, QR does not lend itself to -reasoning about evaluation order, so we will need to use a different -strategy. - -[Note to self: it is interesting to consider what it would take to -reproduce the analyses giving in Barker and Shan in purely QR terms. -Simple quantificational binding using parasitic scope should be easy, -but how reconstruction would work is not so clear.] - -## Introducting the tower notation - -We'll present tower notation, then comment and motivate several of its -features as we consider various applications. For now, we'll motivate -the tower notation by thinking about box types. In the discussion of -monads, we've thought of monadic types as values inside of a box. The -box will often contain information in addition to the core object. -For instance, in the Reader monad, a boxed int contains an expression -of type int as the payload, but also contains a function that -manipulates a list of information. It is natural to imagine -separating a box into two regions, the payload and the hidden scratch -space: - -
-    _______________               _______________            _______________ 
-    | [x->2, y->3] |	          | [x->2, y->3] |          | [x->2, y->3] |
-  ------------------- 	         ------------------        ------------------
-    |              |     ¢        |              |    =     |              |
-    |    +2        |	          |     y        |          |     5        |
-    |______________|	          |______________|          |______________|
-
- -For people who are familiar with Discourse Representation Theory (Kamp -1981, Kamp and Reyle 1993), this separation of boxes into payload and -discourse scorekeeping will be familiar (although many details differ). - -The general pattern is that monadic treatments separate computation -into an at-issue (pre-monadic) computation with a layer at which -side-effects occur. - -The tower notation is a precise way of articulating continuation-based -computations into a payload and (potentially multiple) layers of -side-effects. Visually, we won't keep the outer box, but we will keep -the horizontal line dividing main effects from side-effects. - -Tower convention for types: -
-                                              γ | β
-    (α -> β) -> γ can be equivalently written ----- 
-                                                α
-
- -Read these types counter-clockwise starting at the bottom. - -Tower convention for values: -
-                                           g[] 
-    \k.g[k(x)] can be equivalently written ---
-                                            x
-
- -If \k.g[k(x)] has type (α -> β) -> γ, then k has type (α -> β). - -Here "g[ ]" is a *context*, that is, an expression with (exactly) one -hole in it. For instance, we might have g[x] = \forall x.P[x]. - -We'll use a simply-typed system with two atomic types, DP (the type of -individuals) and S (the type of truth values). - -## LIFT (mid) - -Then in the spirit of monadic thinking, we'll have a way of lifting an -arbitrary value into the tower system: - - [] β|β - LIFT (x:α) = \k.kx : (α -> β) -> β == -- : --- - x α - -Obviously, LIFT is exactly the midentity (the unit) for the continuation monad. -Notice that LIFT requires the result type of the continuation argument -and the result type of the overall expression to match (here, both are β). - -The name LIFT comes from Partee's 1987 theory of type-shifters for -determiner phrases. Importantly, LIFT applied to an -individual-denoting expression yields the generalized quantifier -proposed by Montague as the denotation for proper names: - - [] S|S - LIFT (j:DP) = \k.kj : (DP -> S) -> S == -- : --- - j DP - -So if the proper name *John* denotes the individual j, LIFT(j) is the -generalized quantifier that maps each property k of type DP -> S to true -just in case kj is true. - -Crucially for the discussion here, LIFT does not apply only to DPs, as -in Montague and Partee, but to any expression whatsoever. For -instance, here is LIFT applied to a lexical verb phrase: - - [] S|S - LIFT (left:DP->S) = \k.kx : ((DP->S) -> S) -> S == ---- : --- - left DP - -Once we have expressions of type (α -> β) -> γ, we'll need to combine -them. We'll use the ¢ operator from the continuation monad: - - g[] γ | δ h[] δ | ρ g[h[]] γ | ρ - --- : ------- ¢ --- : ----- == ------ : ----- - f α -> β x α fx β - -Note that the types below the horizontal line combine just like -functional application (i.e, f:(α->β) (x:α) = fx:β). - -## Not quite a monad - -This discussion is based on Wadler's paper `Composable continuations'. - -The unit and the combination rule work very much like we are used to -from the various monads we've studied. - -In particular, we can easily see that the ¢ operator defined just -above is exactly the same as the ¢ operator from the continuation monad: - - ¢ (\k.g[kf]) (\k.h[kx]) - = (\MNk.M(\m.N(\n.k(mn)))) (\k.g[kf]) (\k.h[kx]) - ~~> \k.(\k.g[kf])(\m.(\k.h[kx])(\n.k(mn)) - ~~> \k.g[(\k.h[kx])(\n.k(fn)) - ~~> \k.g[h[k(fx)]] - - g[h[]] - == ------ - fx - -However, these continuations do not form an official monad. -One way to see this is to consider the type of the bind operator. -The type of the bind operator in a genuine monad is - - mbind : Mα -> (α -> Mβ) -> Mβ - -In particular, the result type of the second argument (Mβ) must be -identical to the result type of the bind operator overall. For the -continuation monad, this means that mbind has the following type: - - ((α -> γ) -> ρ) - -> α -> ((β -> δ) -> ρ) - -> ((β -> δ) -> ρ) - -But the type of the bind operator in our generalized continuation -system (call it "kbind") is - - kbind : - ((α -> γ) -> ρ) - -> α -> ((β -> δ) -> γ) - -> ((β -> δ) -> ρ) - -Note that `(β -> δ) -> γ` is not the same as `(β -> δ) -> ρ`. - -These more general types work out fine when plugged into the -continuation monad's bind operator: - - kbind u f = \k. u (\x. f x k ) - β->δ (α->γ)->ρ α α->(β->δ)->γ α β->δ - ----------------- - (β->δ)->γ - ------------------------ - γ - -------------------- - α->γ - --------------------- - ρ - --------------- - (β->δ)->ρ - -Neverthless, it's easy to see that the generalized continuation system -obeys the monad laws. We haven't spent much time proving monad laws, -so this seems like a worthy occasion on which to give some details. -Since we're working with bind here, we'll use the version of the monad -laws that are expressed in terms of bind: - - Prove u >>= ⇧ == u: - - u >>= (\ak.ka) - == (\ufk.u(\x.fxk)) u (\ak.ka) - ~~> \k.u(\x.(\ak.ka)xk) - ~~> \k.u(\x.kx) - ~~> \k.uk - ~~> u - -The last two steps are eta reductions. - - Prove ⇧a >>= f == f a: - - ⇧a >>= f - == (\ak.ka)a >>= f - ~~> (\k.ka) >>= f - == (\ufk.u(\x.fxk)) (\k.ka) f - ~~> \k.(\k.ka)(\x.fxk) - ~~> \k.fak - ~~> fa - - Prove u >>= (\a.fa >>= g) == (u >>= f) >>= g: - - u >>= (\a.fa >>= g) - == u >>= (\a.(\k.fa(\x.gxk))) - == (\ufk.u(\x.fxk)) u (\a.(\k.fa(\x.gxk))) - == \k.u(\x.(\a.(\k.fa(\x.gxk)))xk) - ~~> \k.u(\x.fx(\y.gyk)) - - (u >>= f) >>= g - == (\ufk.u(\x.fxk)) u f >>= g - ~~> (\k.u(\x.fxk)) >>= g - == (\ufk.u(\x.fxk)) (\k.u(\x.fxk)) g - ~~> \k.(\k.u(\x.fxk))(\y.gyk) - ~~> \k.u(\x.fx(\y.gyk)) - -The fact that the monad laws hold means that we can rely on any -reasoning that depends on the monad laws. - -## Syntactic refinements: subtypes of implication - -Because natural language allows the functor to be on the left or on -the right, we replace the type arrow -> with a left-leaning version \ -and a right-leaning version, as follows: - - α/β β = α - β β\α = α - -This means (without adding some fancy footwork, as in Charlow 2014) we -need two versions of ¢ too, one for each direction for the unmonadized types. - -Just to be explicit, here are the two versions: - - g[] γ | δ h[] δ | ρ g[h[]] γ | ρ - --- : ------- ¢ --- : ----- == ------ : ----- - f α/β x α fx β - - h[] δ | ρ g[] γ | δ g[h[]] γ | ρ - --- : ----- ¢ --- : ------- == ------ : ----- - x α f β\α fx β - -With respect to types, they differ only in replacing α -> β with α/β -(top version) and with β\α (bottom version). With respect to -syntactic order, they differ in a parallel way with respect to whether -the function f is on the left of its argument x (top version) or on -the right (bottom version). - -Logically, separating -> into \\ and / corresponds to rejecting the -structural rule of exchange. - -Note that the \\ and / only govern types at the bottom of the tower. -That is, we currently still have arrow at the higher-order levels of -the types, if we undo the tower notation: - - γ|δ - --- == ((α/β) -> δ) -> γ - α/β - -We'll change these arrows into left-leaning and right-leaning versions -too, according to the following scheme: - - γ|δ - --- == γ//((α/β) \\ δ) - α/β - -As we'll see in a minute, each of these for implications (\\, /, \\\\, -//) will have a syntactic interpretation: - - \ argument is adjacent on the left of the functor - / argument is adjacent on the right of the functor - \\ argument is surrounded by the functor - // argument surrounds the functor - -## LOWER (reset) - -One more ingredient: one thing that shifty continuation operators -require is an upper bound, a reset to show the extent of the remainder -of the computation that the shift operator captures. In the list -version of the doubling task, we have - - "a#bdeSfg" ~~> "abdebdefg" continuation captured: bde - "ab#deSfg" ~~> "abdedefg" continuation captured: de - -In programming languages, resets are encoded in the computation -explicitly. In natural language, resets are always invisible. -We'll deal with this in the natural language setting by allowing -spontaneous resets, as long as the types match the following recipe: - - g[] α|S - LOWER (---:---) == g[p]:α - p S - -This will be easiest to explain by presenting our first complete -example from natural language: - -