Applications of continuations to natural language

We've seen a number of applications of monads to natural language, including presupposition projection, binding, intensionality, and the dynamics of the GSV fragment.

In the past couple of weeks, we've introduced continuations, first as a functional programming technique, then in terms of list and tree zippers, then as a monad. In this lecture, we will generalize continuations slightly beyond a monad, and then begin to outline some of the applications of the generalized continuations.

Many (though not all) of the applications are discussed in detail in Barker and Shan 2014, Continuations in Natural Language, OUP.

To review, in terms of list zippers, the continuation of a focused element in the list is the front part of the list.

list zipper for the list [a;b;c;d;e;f] with focus on d:

    ([c;b;a], [d;e;f])

In terms of tree zippers, the continuation is the entire context of the focused element--the entire rest of the tree.

[drawing of a tree zipper]

We explored continuations first in a list setting, then in a tree setting, using the doubling task as an example.

"abSd" ~~> "ababd"
"ab#deSfg" ~~> "abdedefg"

The "S" functions like a shifty operator, and "#" functions like a reset.

Although the list version of the doubling task was easy to understand thoroughly, the tree version was significantly more challenging. In particular, it remained unclear why

"aScSe" ~~> "aacaceecaacaceecee"

We'll burn through that conceptual fog today by learning more about how to work with continuations.

The natural thing to try would have been to defunctionalize the continuation-based solution using a tree zipper. But that would not have been easy, since the natural way to implement the doubling behavior of the shifty operator would have been to simply copy the context provided by the zipper. This would have produced two uncoordinated copies of the other shifty operator, and we'd have been in the situation described in class of having a reduction strategy that never reduced the number of shifty operators below 2. The limitation is that zippers by themselves don't provide a natural way to establish a dependency between two distant elements of a data structure. (There are ways around this limitation of tree zippers, but they are essentially equivalent to the technique given just below.)

Instead, we'll re-interpreting what the continuation monad was doing in more or less defunctionalized terms by using Quantifier Raising, a technique from linguistics.

But first, motivating quantifier scope as a linguistic application.

The primary application of continuations to natural language: scope-taking

We have seen that continuations allow a deeply-embedded element to take control over (a portion of) the entire computation that contains it. In natural language semantics, this is exactly what it means for a scope-taking expression to take scope.

1. [Ann put a copy of [everyone]'s homeworks in her briefcase]

2. For every x, [Ann put a copy of x's homeworks in her briefcase]

The sentence in (1) can be paraphrased as in (2), in which the quantificational DP everyone takes scope over the rest of the sentence. Even if you suspect that there could be an analysis of (2) on which "every student's term paper" could denote some kind of mereological fusion of a set of papers, it is much more difficult to be satisfied with a referential analysis when every student is replaced with no student, or fewer than three students, and so on---see any semantics text book for abundant discussion.

We can arrive at an analysis by expressing the meaning of quantificational DP such as everyone using continuations:

3. everyone = shift (\k.∀x.kx)

Assuming there is an implicit reset at the top of the sentence (we'll explicitly address determining where there is or isn't a reset), the reduction rules for shift will apply the handler function (\k.∀x.kx) to the remainder of the sentence after abstracting over the position of the shift expression:

[Ann put a copy of [shift (\k.∀x.kx)]'s homeworks in her briefcase]
~~> (\k.∀x.kx) (\v. Ann put a copy of v's homeworks in her briefcase)
~~> ∀x. Ann put a copy of x's homeworks in her briefcase

(To be a bit pedantic, this reduction sequence is more suitable for shift0 than for shift, but we're not being fussy here about subflavors of shifty operators.)

The standard technique for handling scope-taking in linguistics is Quantifier Raising (QR). As you might suppose, the rule for Quantifier Raising closely resembles the reduction rule for shift:

Quantifier Raising: given a sentence of the form

         [... [QDP] ...],

build a new sentence of the form

[QDP (\x.[... [x] ...])].  

Here, QDP is a scope-taking quantificational DP.

Just to emphasize the similarity between QR and shift, we can use QR to provide insight into the tree version of the doubling task that mystified us earlier. Here's the starting point:

|    |
a  __|___
   |    |
   S  __|__
      |   |
      d  _|__
         |  |
         S  e

First we QR the lower shift operator, replacing it with a variable and abstracting over that variable.

|     |
S  ___|___
   |     |
   \x  __|___
       |    |
       a  __|___
          |    |
          S  __|__
             |   |
             d  _|__
                |  |
                x  e

Next, we QR the upper shift operator

|     |
S  ___|____
   |      |
   \y  ___|___
       |     |
       S  ___|___
          |     |
          \x  __|___
              |    |
              a  __|___
                 |    |
                 y  __|__
                    |   |
                    d  _|__
                       |  |
                       x  e

We then evaluate, using the same value for the shift operator proposed before:

S = shift = \k.k(k "")

It will be easiest to begin evaluating this tree with the lower shift operator (we get the same result if we start with the upper one). The relevant value for k is (\x.a(y(d(x e)))). Then k "" is a(y(d(""(e)))), and k(k "") is a(y(d((a(y(d(""(e)))))(e)))). In tree form:

|     |
S  ___|____
   |      |
   \y  ___|___
       |     |
       a  ___|___
          |     |
          y  ___|___
             |     |
             d  ___|___
                |     |
              __|___  e
              |    |
              a  __|___
                 |    |
                 y  __|___
                    |    |
                    d  __|__
                       |   |
                       ""  e

Repeating the process for the upper shift operator replaces each occurrence of y with a copy of the whole tree.

|           |
a  _________|__________
   |                  |
   |               ___|___
___|___            |     |
|     |            d  ___|____
a  ___|____           |      |
   |      |        ___|____  e
   ""  ___|___     |      |
       |     |     a  ____|_____
       d  ___|___     |        |
          |     |     |      __|___
       ___|___  e  ___|___   |    |
       |     |     |     |   d  __|__
       a  ___|___  a  ___|____  |   |
          |     |     |      |  ""  e
          ""  __|___  ""  ___|___
              |    |      |     |
              d  __|__    d  ___|___
                 |   |       |     |
                 ""  e    ___|___  e
                          |     |
                          a  ___|___
                             |     |
                             ""  __|___
                                 |    |
                                 d  __|__
                                    |   |
                                    ""  e

The yield of this tree (the sequence of leaf nodes) is aadadeedaadadeedee, which is the expected output of the double-shifted tree.

Exercise: the result is different, by the way, if the QR occurs in the opposite order.

Three lessons:

  • Generalizing from one-sided, list-based continuation operators to two-sided, tree-based continuation operators is a dramatic increase in power and complexity. (De Groote's dynamic montague semantics continuations are the one-sided, list-based variety.)

  • Operators that compose multiple copies of a context can be hard to understand (though keep this in mind when we see the continuations-based analysis of coordination, which involves context doubling).

  • When considering two-sided, tree-based continuation operators, quantifier raising is a good tool for visualizing (defunctionalizing) the computation.

Tower notation

At this point, we have three ways of representing computations involving control operators such as shift and reset: using a CPS transform, lifting into a continuation monad, and by using QR.

QR is the traditional system in linguistics, but it will not be adequate for us in general. The reason has to do with evaluation order. As we've discussed, especially with respect to the CPS transform, continuations allow fine-grained control over the order of evaluation. One of the main empirical claims of Barker and Shan 2014 is that natural language is sensitive to evaluation order. Unlike other presentations of continuations, QR does not lend itself to reasoning about evaluation order, so we will need to use a different strategy.

[Note to self: it is interesting to consider what it would take to reproduce the analyses giving in Barker and Shan in purely QR terms. Simple quantificational binding using parasitic scope should be easy, but how reconstruction would work is not so clear.]

Introducting the tower notation

We'll present tower notation, then comment and motivate several of its features as we consider various applications. For now, we'll motivate the tower notation by thinking about box types. In the discussion of monads, we've thought of monadic types as values inside of a box. The box will often contain information in addition to the core object. For instance, in the Reader monad, a boxed int contains an expression of type int as the payload, but also contains a function that manipulates a list of information. It is natural to imagine separating a box into two regions, the payload and the hidden scratch space:

    _______________               _______________            _______________ 
    | [x->2, y->3] |              | [x->2, y->3] |          | [x->2, y->3] |
  -------------------            ------------------        ------------------
    |              |     ¢        |              |    =     |              |
    |    +2        |              |     y        |          |     5        |
    |______________|              |______________|          |______________|

For people who are familiar with Discourse Representation Theory (Kamp 1981, Kamp and Reyle 1993), this separation of boxes into payload and discourse scorekeeping will be familiar (although many details differ).

The general pattern is that monadic treatments separate computation into an at-issue (pre-monadic) computation with a layer at which side-effects occur.

The tower notation is a precise way of articulating continuation-based computations into a payload and (potentially multiple) layers of side-effects. Visually, we won't keep the outer box, but we will keep the horizontal line dividing main effects from side-effects.

Tower convention for types:

                                              γ | β
    (α -> β) -> γ can be equivalently written ----- 

Read these types counter-clockwise starting at the bottom.

Tower convention for values:

    \k.g[k(x)] can be equivalently written ---

If \k.g[k(x)] has type (α -> β) -> γ, then k has type (α -> β).

Here "g[ ]" is a context, that is, an expression with (exactly) one hole in it. For instance, we might have g[x] = \forall x.P[x].

We'll use a simply-typed system with two atomic types, DP (the type of individuals) and S (the type of truth values).

LIFT (mid)

Then in the spirit of monadic thinking, we'll have a way of lifting an arbitrary value into the tower system:

                                       []   β|β
LIFT (x:α) = \k.kx : (α -> β) -> β ==  -- : ---
                                       x     α

Obviously, LIFT is exactly the midentity (the unit) for the continuation monad. Notice that LIFT requires the result type of the continuation argument and the result type of the overall expression to match (here, both are β).

The name LIFT comes from Partee's 1987 theory of type-shifters for determiner phrases. Importantly, LIFT applied to an individual-denoting expression yields the generalized quantifier proposed by Montague as the denotation for proper names:

                                        []   S|S 
LIFT (j:DP) = \k.kj : (DP -> S) -> S == -- : ---
                                        j    DP

So if the proper name John denotes the individual j, LIFT(j) is the generalized quantifier that maps each property k of type DP -> S to true just in case kj is true.

Crucially for the discussion here, LIFT does not apply only to DPs, as in Montague and Partee, but to any expression whatsoever. For instance, here is LIFT applied to a lexical verb phrase:

                                                    []    S|S 
LIFT (left:DP->S) = \k.kx : ((DP->S) -> S) -> S == ---- : ---
                                                   left   DP

Once we have expressions of type (α -> β) -> γ, we'll need to combine them. We'll use the ¢ operator from the continuation monad:

g[]    γ | δ      h[]   δ | ρ    g[h[]]   γ | ρ
--- : -------  ¢  --- : ----- == ------ : -----
f     α -> β      x       α        fx       β

Note that the types below the horizontal line combine just like functional application (i.e, f:(α->β) (x:α) = fx:β).

Not quite a monad

This discussion is based on Wadler's paper `Composable continuations'.

The unit and the combination rule work very much like we are used to from the various monads we've studied.

In particular, we can easily see that the ¢ operator defined just above is exactly the same as the ¢ operator from the continuation monad:

  ¢ (\k.g[kf]) (\k.h[kx])
= (\MNk.M(\m.N(\n.k(mn)))) (\k.g[kf]) (\k.h[kx])
~~> \k.(\k.g[kf])(\m.(\k.h[kx])(\n.k(mn))
~~> \k.g[(\k.h[kx])(\n.k(fn))
~~> \k.g[h[k(fx)]]

== ------

However, these continuations do not form an official monad. One way to see this is to consider the type of the bind operator. The type of the bind operator in a genuine monad is

mbind : Mα -> (α -> Mβ) -> Mβ

In particular, the result type of the second argument (Mβ) must be identical to the result type of the bind operator overall. For the continuation monad, this means that mbind has the following type:

                 ((α -> γ) -> ρ)
         -> α -> ((β -> δ) -> ρ)
              -> ((β -> δ) -> ρ)

But the type of the bind operator in our generalized continuation system (call it "kbind") is

kbind : 
                 ((α -> γ) -> ρ)
         -> α -> ((β -> δ) -> γ)
              -> ((β -> δ) -> ρ)

Note that (β -> δ) -> γ is not the same as (β -> δ) -> ρ.

These more general types work out fine when plugged into the continuation monad's bind operator:

kbind u f = \k.   u           (\x.     f              x    k      )
             β->δ (α->γ)->ρ     α      α->(β->δ)->γ   α    β->δ

Neverthless, it's easy to see that the generalized continuation system obeys the monad laws. We haven't spent much time proving monad laws, so this seems like a worthy occasion on which to give some details. Since we're working with bind here, we'll use the version of the monad laws that are expressed in terms of bind:

Prove u >>= ⇧ == u:

   u >>= (\ak.ka)
== (\ufk.u(\x.fxk)) u (\ak.ka)
~~> \k.u(\x.(\ak.ka)xk)
~~> \k.u(\x.kx)
~~> \
~~> u

The last two steps are eta reductions.

Prove ⇧a >>= f == f a:

   ⇧a >>= f
== (\ak.ka)a >>= f
~~> (\k.ka) >>= f
== (\ufk.u(\x.fxk)) (\k.ka) f
~~> \k.(\k.ka)(\x.fxk)
~~> \k.fak
~~> fa

Prove u >>= (\a.fa >>= g) == (u >>= f) >>= g:

   u >>= (\a.fa >>= g)
== u >>= (\a.(\k.fa(\x.gxk)))
== (\ufk.u(\x.fxk)) u (\a.(\k.fa(\x.gxk)))
== \k.u(\x.(\a.(\k.fa(\x.gxk)))xk)
~~> \k.u(\x.fx(\y.gyk))

   (u >>= f) >>= g
== (\ufk.u(\x.fxk)) u f >>= g
~~> (\k.u(\x.fxk)) >>= g
== (\ufk.u(\x.fxk)) (\k.u(\x.fxk)) g
~~> \k.(\k.u(\x.fxk))(\y.gyk)
~~> \k.u(\x.fx(\y.gyk))

The fact that the monad laws hold means that we can rely on any reasoning that depends on the monad laws.

Syntactic refinements: subtypes of implication

Because natural language allows the functor to be on the left or on the right, we replace the type arrow -> with a left-leaning version \ and a right-leaning version, as follows:

α/β   β    = α
  β   β\α  = α

This means (without adding some fancy footwork, as in Charlow 2014) we need two versions of ¢ too, one for each direction for the unmonadized types.

Just to be explicit, here are the two versions:

g[]    γ | δ      h[]   δ | ρ    g[h[]]   γ | ρ
--- : -------  ¢  --- : ----- == ------ : -----
f       α/β        x       α        fx       β

h[]   δ | ρ      g[]    γ | δ       g[h[]]   γ | ρ
--- : -----  ¢   --- : -------   == ------ : -----
 x      α         f      β\α          fx       β

With respect to types, they differ only in replacing α -> β with α/β (top version) and with β\α (bottom version). With respect to syntactic order, they differ in a parallel way with respect to whether the function f is on the left of its argument x (top version) or on the right (bottom version).

Logically, separating -> into \ and / corresponds to rejecting the structural rule of exchange.

Note that the \ and / only govern types at the bottom of the tower. That is, we currently still have arrow at the higher-order levels of the types, if we undo the tower notation:

--- == ((α/β) -> δ) -> γ

We'll change these arrows into left-leaning and right-leaning versions too, according to the following scheme:

--- == γ//((α/β) \\ δ)

As we'll see in a minute, each of these for implications (\, /, \\, //) will have a syntactic interpretation:

\   argument is adjacent on the left of the functor
/    argument is adjacent on the right of the functor
\\   argument is surrounded by the functor
//  argument surrounds the functor

LOWER (reset)

One more ingredient: one thing that shifty continuation operators require is an upper bound, a reset to show the extent of the remainder of the computation that the shift operator captures. In the list version of the doubling task, we have

"a#bdeSfg" ~~> "abdebdefg"   continuation captured: bde
"ab#deSfg" ~~> "abdedefg"    continuation captured:  de

In programming languages, resets are encoded in the computation explicitly. In natural language, resets are always invisible. We'll deal with this in the natural language setting by allowing spontaneous resets, as long as the types match the following recipe:

       g[] α|S
LOWER (---:---) == g[p]:α
        p   S

At this point, it should be clear how the approach in the seminar relates to the system developed in Barker and Shan 2014. Many applications of continuations to natural langauge are developed in detail there, including

  • Scope-taking
  • Quantificational binding
  • Weak crossover
  • Generalized coordination
  • Dynamic Binding
  • WH-movement as delayed binding
  • Semantic reconstruction effects
  • Linear order effects in negative polarity licensing
  • Donkey anaphora

and much more.