X-Git-Url: http://lambda.jimpryor.net/git/gitweb.cgi?p=lambda.git;a=blobdiff_plain;f=week2.mdwn;h=49a7e8d5af7632592a80b80b6351058bb97c1a7c;hp=9bb13d463c86245d6eb956f81b6528b359eb6b47;hb=f5b1ef726e2cfa444df2b5fc50562822a90f0b03;hpb=7e0ffa15ed30fe95e208e4086d4cbcb92ee3d237 diff --git a/week2.mdwn b/week2.mdwn index 9bb13d46..49a7e8d5 100644 --- a/week2.mdwn +++ b/week2.mdwn @@ -1,700 +1,444 @@ +[[!toc]] + +Substitution and Alpha-Conversion +================================= + +Intuitively, (a) and (b) express the application of the same function to the argument `y`: + +
+
1. (\x. \z. z x) y +
2. (\x. \y. y x) y +
+ +One can't just rename variables freely. (a) and (b) are different than what's expressed by: + +
+
3. (\z. (\z. z z)) y
+ + +Substituting `y` into the body of (a) `(\x. \z. z x)` is unproblematic: + + (\x. \z. z x) y ~~> \z. z y + +However, with (b) we have to be more careful. If we just substituted blindly, then we might take the result to be `\y. y y`. But this is the self-application function, not the function which accepts an arbitrary argument and applies that argument to the free variable `y`. In fact, the self-application function is what (c) reduces to. So if we took (b) to reduce to `\y. y y`, we'd wrongly be counting (b) to be equivalent to (c), instead of (a). + +To reduce (b), then, we need to be careful to that no free variables in what we're substituting in get captured by binding λs that they shouldn't be captured by. + +In practical terms, you'd just replace (b) with (a) and do the unproblematic substitution into (a). + +How should we think about the explanation and justification for that practical procedure? + +One way to think about things here is to identify expressions of the lambda calculus with *particular alphabetic sequences*. Then (a) and (b) would be distinct expressions, and we'd have to have an explicit rule permitting us to do the kind of variable-renaming that takes us from (a) to (b) (or vice versa). This kind of renaming is called "alpha-conversion." Look in the standard treatments of the lambda calculus for detailed discussion of this. + +Another way to think of it is to identify expressions not with particular alphabetic sequences, but rather with *classes* of alphabetic sequences, which stand to each other in the way that (a) and (b) do. That's the way we'll talk. We say that (a) and (b) are just typographically different notations for a *single* lambda formula. As we'll say, the lambda formula written with (a) and the lambda formula written with (b) are literally syntactically identical. 
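However one settles that bookkeeping question, the rename-then-substitute procedure described above is mechanical enough to model directly. Here is a minimal Python sketch of capture-avoiding substitution; the tuple encoding of terms and all of the helper names are our own illustration, not anything from these notes:

```python
# Toy lambda terms: ('var', x), ('lam', x, body), ('app', f, a).
# This encoding is ours, chosen only for illustration.

def free_vars(t):
    tag = t[0]
    if tag == 'var':
        return {t[1]}
    if tag == 'lam':
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])   # application

def fresh(base, avoid):
    # Prime the variable name until it no longer clashes.
    while base in avoid:
        base += "'"
    return base

def subst(t, x, s):
    """Capture-avoiding substitution t[x := s]."""
    tag = t[0]
    if tag == 'var':
        return s if t[1] == x else t
    if tag == 'app':
        return ('app', subst(t[1], x, s), subst(t[2], x, s))
    y, body = t[1], t[2]
    if y == x:                  # x is shadowed inside this abstract
        return t
    if y in free_vars(s):       # substituting would capture: rename first
        z = fresh(y, free_vars(s) | free_vars(body))
        body = subst(body, y, ('var', z))
        y = z
    return ('lam', y, subst(body, x, s))

# (b) is (\x. \y. y x) y; perform one beta step by substituting the
# argument y for x in the body \y. y x.
fun = ('lam', 'x', ('lam', 'y', ('app', ('var', 'y'), ('var', 'x'))))
arg = ('var', 'y')
result = subst(fun[2], fun[1], arg)
print(result)   # ('lam', "y'", ('app', ('var', "y'"), ('var', 'y')))
```

The bound `y` is renamed to `y'` before the free `y` goes in, so the result is an alpha-variant of `\z. z y`, not the self-application function `\y. y y`.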
+ +A third way to think is to identify the lambda formula not with classes of alphabetic sequences, but rather with abstract structures that we might draw like this: + +

+	(λ. λ. _ _) y
+     ^  ^  | |
+     |  |__| |
+     |_______|
+
+ +Here there are no bound variables, but there are *bound positions*. We can regard formula like (a) and (b) as just helpfully readable ways to designate these abstract structures. + +A version of this last approach is known as **de Bruijn notation** for the lambda calculus. + +It doesn't seem to matter which of these approaches one takes; the logical properties of the systems are exactly the same. It just affects the particulars of how one states the rules for substitution, and so on. And whether one talks about expressions being literally "syntactically identical," or whether one instead counts them as "equivalent modulu alpha-conversion." + +(Linguistic trivia: however, some linguistic discussions do suppose that alphabetic variance has important linguistic consequences; see Ivan Sag's dissertation.) + +In a bit, we'll discuss other systems that lack variables. Those systems will not just lack variables in the sense that de Bruijn notation does; they will furthermore lack any notion of a bound position. + + Syntactic equality, reduction, convertibility ============================================= -Define T to be `(\x. x y) z`. Then T and `(\x. x y) z` are syntactically equal, and we're counting them as syntactically equal to `(\z. z y) z` as well. We write: +Define N to be `(\x. x y) z`. Then N and `(\x. x y) z` are syntactically equal, and we're counting them as syntactically equal to `(\z. z y) z` as well, which we will write as: -
-T ≡ `(\x. x y) z` ≡ `(\z. z y) z`
-
+
N ≡ (\x. x y) z ≡ (\z. z y) z
+
This: -
-T ~~> `z y`
-
+ N ~~> z y -means that T beta-reduces to `z y`. This: +means that N beta-reduces to `z y`. This: -
-M <~~> T
-
+ M <~~> N -means that M and T are beta-convertible, that is, that there's something they both reduce to in zero or more steps. +means that M and N are beta-convertible, that is, that there's something they both reduce to in zero or more steps. Combinators and Combinatorial Logic =================================== Lambda expressions that have no free variables are known as **combinators**. Here are some common ones: -
-**I** is defined to be `\x x`

-**K** is defined to be `\x y. x`, That is, it throws away its second argument. So `K x` is a constant function from any (further) argument to `x`. ("K" for "constant".) Compare K to our definition of **true**.

-**get-first** was our function for extracting the first element of an ordered pair: `\fst snd. fst`. Compare this to K and true as well.

-**get-second** was our function for extracting the second element of an ordered pair: `\fst snd. snd`. Compare this to our definition of false.

-**ω** is defined to be: `\x. x x`

-

+> **I** is defined to be `\x x` -It's possible to build a logical system equally powerful as the lambda calculus (and straightforwardly intertranslatable with it) using just combinators, considered as atomic operations. Such a language doesn't have any variables in it: not just no free variables, but no variables at all. +> **K** is defined to be `\x y. x`. That is, it throws away its + second argument. So `K x` is a constant function from any + (further) argument to `x`. ("K" for "constant".) Compare K + to our definition of `true`. -One can do that with a very spare set of basic combinators. These days the standard base is just three combinators: K and I from above, and also one more, S, which behaves the same as the lambda expression `\f g x. f x (g x)`. behaves.But it's possible to be even more minimalistic, and get by with only a single combinator. (And there are different single-combinator bases you can choose.) +> **get-first** was our function for extracting the first element of an ordered pair: `\fst snd. fst`. Compare this to K and `true` as well. -These systems are Turing complete. In other words: every computation we know how to describe can be represented in a logical system consisting of only a single primitive operation! +> **get-second** was our function for extracting the second element of an ordered pair: `\fst snd. snd`. Compare this to our definition of `false`. -Here's more to read about combinatorial logic: +> **B** is defined to be: `\f g x. f (g x)`. (So `B f g` is the composition `\x. f (g x)` of `f` and `g`.) -MORE +> **C** is defined to be: `\f x y. f y x`. (So `C f` is a function like `f` except it expects its first two arguments in swapped order.) -Evaluation strategies -===================== +> **W** is defined to be: `\f x . f x x`. (So `W f` accepts one argument and gives it to `f` twice. What is the meaning of `W multiply`?) 
-In the assignment we asked you to reduce various expressions until it wasn't possible to reduce them any further. For two of those expressions, this was impossible to do. One of them was this: +> **ω** (that is, lower-case omega) is defined to be: `\x. x x` - (\x. x x) (\x. x x) +It's possible to build a logical system equally powerful as the lambda calculus (and readily intertranslatable with it) using just combinators, considered as atomic operations. Such a language doesn't have any variables in it: not just no free variables, but no variables at all. -As we saw above, each of the halves of this formula are the combinator ω; so this can also be written: +One can do that with a very spare set of basic combinators. These days the standard base is just three combinators: K and I from above, and also one more, **S**, which behaves the same as the lambda expression `\f g x. f x (g x)`. behaves. But it's possible to be even more minimalistic, and get by with only a single combinator. (And there are different single-combinator bases you can choose.) -
ω ω
+There are some well-known linguistic applications of Combinatory +Logic, due to Anna Szabolcsi, Mark Steedman, and Pauline Jacobson. +They claim that natural language semantics is a combinatory system: that every +natural language denotation is a combinator. -This compound expression---the self-application of ω---is named Ω. It has the form of an application of an abstract (ω) to an argument (which also happens to be ω), so it's a redex and can be reduced. But when we reduce it, we get ω ω again. So there's no stage at which this expression has been reduced to a point where it can't be reduced any further. In other words, evaluation of this expression "never terminates." (This is standard language, however it has the unfortunate connotation that evaluation is a process or operation that is performed in time. You shouldn't think of it like that. Evaluation of this expression "never terminates" in the way that the decimal expansion of π never terminates. This are static, atemporal facts about their mathematical properties.) +For instance, Szabolcsi argues that reflexive pronouns are argument +duplicators. -There are infinitely many formulas in the lambda calculus that have this same property. Ω is the syntactically simplest of them. In our meta-theory, it's common to assign such formula a special value, , pronounced "bottom." When we get to discussing types, you'll see that this value is counted as belonging to every type. To say that a formula has the bottom value means that the computation that formula represents never terminates and so doesn't evaluate to any orthodox value. +![reflexive](http://lambda.jimpryor.net/szabolcsi-reflexive.jpg) -From a "Fregean" or "weak Kleene" perspective, if any component of an expression fails to be evaluable (to an orthodox, computed value), then the whole expression should be unevaluable as well. +Notice that the semantic value of *himself* is exactly `W`. 
+The reflexive pronoun in direct object position combines with the transitive verb. The result is an intransitive verb phrase that takes a subject argument, duplicates that argument, and feeds the two copies to the transitive verb meaning. -However, in some such cases it seems *we could* sensibly carry on evaluation. For instance, consider: +Note that `W <~~> S(CI)`: -

-(\x. y) (ω ω)
-
+
S(CI) ≡
+S((\fxy.fyx)(\x.x)) ~~>
+S(\xy.(\x.x)yx) ~~>
+S(\xy.yx) ≡
+(\fgx.fx(gx))(\xy.yx) ~~>
+\gx.(\xy.yx)x(gx) ~~>
+\gx.(gx)x ≡
+W
-Should we count this as unevaluable, because the reduction of ω ω never terminates? Or should we count it as evaluating to `y`? +Ok, here comes a shift in thinking. Instead of defining combinators as equivalent to certain lambda terms, +we can define combinators by what they do. If we have the I combinator followed by any expression X, +I will take that expression as its argument and return that same expression as the result. In pictures, -This question highlights that there are different choices to make about how evaluation or computation proceeds. It's helpful to think of three questions in this neighborhood: + IX ~~> X -> Q1. When arguments are complex, as ω ω, do we reduce them before or after substituting them into the abstracts to which they are arguments? +Thinking of this as a reduction rule, we can perform the following computation -> Q2. Are we allowed to reduce inside abstracts? That is, can we reduce: + II(IX) ~~> IIX ~~> IX ~~> X -> (\x y. x z) (\x. x) +The reduction rule for K is also straightforward: -> only this far: + KXY ~~> X -> \y. (\x. x) z +That is, K throws away its second argument. The reduction rule for S can be constructed by examining +the defining lambda term: -> or can we continue reducing to: +
S ≡ \fgx.fx(gx)

-(\x. y) (ω ω)
-
ω ω
- ((\x (y x)) z) ~~> (y z) +This compound expression---the self-application of ω---is named Ω. It has the form of an application of an abstract (ω) to an argument (which also happens to be ω), so it's a redex and can be reduced. But when we reduce it, we get ω ω again. So there's no stage at which this expression has been reduced to a point where it can't be reduced any further. In other words, evaluation of this expression "never terminates." (This is the standard language, however it has the unfortunate connotation that evaluation is a process or operation that is performed in time. You shouldn't think of it like that. Evaluation of this expression "never terminates" in the way that the decimal expansion of π never terminates. These are static, atemporal facts about their mathematical properties.) -Different authors use different notations. Some authors use the term "contraction" for a single reduction step, and reserve the term "reduction" for the reflexive transitive closure of that, that is, for zero or more reduction steps. Informally, it seems easiest to us to say "reduction" for one or more reduction steps. So when we write: +There are infinitely many formulas in the lambda calculus that have this same property. Ω is the syntactically simplest of them. In our meta-theory, it's common to assign such formulas a special value, , pronounced "bottom." When we get to discussing types, you'll see that this value is counted as belonging to every type. To say that a formula has the bottom value means that the computation that formula represents never terminates and so doesn't evaluate to any orthodox, computed value. - M ~~> N +From a "Fregean" or "weak Kleene" perspective, if any component of an expression fails to be evaluable (to an orthodox, computed value), then the whole expression should be unevaluable as well. -We'll mean that you can get from M to N by one or more reduction steps. 
Hankin uses the symbol for one-step contraction, and the symbol for zero-or-more step reduction. Hindley and Seldin use 1 and . +However, in some such cases it seems *we could* sensibly carry on evaluation. For instance, consider: -When M and N are such that there's some P that M reduces to by zero or more steps, and that N also reduces to by zero or more steps, then we say that M and N are **beta-convertible**. We'll write that like this: +

+(\x. y) (ω ω)
+
- M <~~> N +Should we count this as unevaluable, because the reduction of (ω ω) never terminates? Or should we count it as evaluating to `y`? + +This question highlights that there are different choices to make about how evaluation or computation proceeds. It's helpful to think of three questions in this neighborhood: + +> Q1. When arguments are complex, as (ω ω) is, do we reduce them before substituting them into the abstracts to which they are arguments, or later? + +> Q2. Are we allowed to reduce inside abstracts? That is, can we reduce: + +> (\x y. x z) (\x. x) + +> only this far: -This is what plays the role of equality in the lambda calculus. Hankin uses the symbol `=` for this. So too do Hindley and Seldin. Personally, I keep confusing that with the relation to be described next, so let's use this notation instead. Note that `M <~~> N` doesn't mean that each of `M` and `N` are reducible to each other; that only holds when `M` and `N` are the same expression. (Or, with our convention of only saying "reducible" for one or more reduction steps, it never holds.) +> \y. (\x. x) z -In the metatheory, it's also sometimes useful to talk about formulas that are syntactically equivalent *before any reductions take place*. Hankin uses the symbol for this. So too do Hindley and Seldin. We'll use that too, and will avoid using `=` when discussing the metatheory. Instead we'll use `<~~>` as we said above. When we want to introduce a stipulative definition, we'll write it out longhand, as in: +> or can we continue reducing to: +> \y. z -combinators as lambda expressions -combinatorial logic +> Q3. Are we allowed to "eta-reduce"? That is, can we reduce expressions of the form: -tuples = possibly type-heterogenous ordered collections, different length -> different type -lists = type-homogenous ordered collections, lists of different lengths >=0 can be of same type +> \x. M x +> where x does not occur free in `M`, to `M`? 
+With regard to Q3, it should be intuitively clear that `\x. M x` and `M` will behave the same with respect to any arguments they are given. It can also be proven that no other functions can behave differently with respect to them. However, the logical system you get when eta-reduction is added to the proof theory is importantly different from the one where only beta-reduction is permitted. +If we answer Q2 by permitting reduction inside abstracts, and we also permit eta-reduction, then where none of y1, ..., yn occur free in M, this: -1. Substitution; using alpha-conversion and other strategies -1. Conversion versus reduction - -1. Different evaluation strategies (call by name, call by value, etc.) -1. Strongly normalizing vs weakly normalizing vs non-normalizing; Church-Rosser Theorem(s) -1. Lambda calculus compared to combinatorial logic

-1. Church-like encodings of numbers, defining addition and multiplication -1. Defining the predecessor function; alternate encodings for the numbers -1. Homogeneous sequences or "lists"; how they differ from pairs, triples, etc. -1. Representing lists as pairs -1. Representing lists as folds -1. Typical higher-order functions: map, filter, fold

-1. Recursion exploiting the fold-like representation of numbers and lists ([[!wikipedia Deforestation (computer science)]], [[!wikipedia Zipper (data structure)]]) -1. General recursion using omega +

\x y1... yn. M y1... yn
-1. Eta reduction and "extensionality" ?? -Undecidability of equivalence +will eta-reduce by n steps to: -There is no algorithm which takes as input two lambda expressions and outputs TRUE or FALSE depending on whether or not the two expressions are equivalent. This was historically the first problem for which undecidability could be proven. As is common for a proof of undecidability, the proof shows that no computable function can decide the equivalence. Church's thesis is then invoked to show that no algorithm can do so. + \x. M -Church's proof first reduces the problem to determining whether a given lambda expression has a normal form. A normal form is an equivalent expression which cannot be reduced any further under the rules imposed by the form. Then he assumes that this predicate is computable, and can hence be expressed in lambda calculus. Building on earlier work by Kleene and constructing a GÃ¶del numbering for lambda expressions, he constructs a lambda expression e which closely follows the proof of GÃ¶del's first incompleteness theorem. If e is applied to its own GÃ¶del number, a contradiction results. +When we add eta-reduction to our proof system, we end up reconstruing the meaning of `~~>` and `<~~>` and "normal form", all in terms that permit eta-reduction as well. Sometimes these expressions will be annotated to indicate whether only beta-reduction is allowed (~~>β) or whether both beta- and eta-reduction is allowed (~~>βη). +The logical system you get when eta-reduction is added to the proof system has the following property: +> if `M`, `N` are normal forms with no free variables, then M ≡ N iff `M` and `N` behave the same with respect to every possible sequence of arguments. -1. The Y combinator(s); more on evaluation strategies

-1. Introducing the notion of a "continuation", which technique we'll now already have used a few times +This implies that, when `M` and `N` are (closed normal forms that are) syntactically distinct, there will always be some sequences of arguments L1, ..., Ln such that: +

M L1 ... Ln x y ~~> x
+N L1 ... Ln x y ~~> y
+
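As a concrete (and deliberately trivial) instance of that separability claim, we can model two syntactically distinct closed normal forms, K (our `true`) and KI (our `false`), as Python closures. Here the separating sequence L1, ..., Ln is empty, and the final arguments `x` and `y` already tell the two terms apart. This is only an illustration of the behavioral claim, not part of the notes:

```python
# Two syntactically distinct closed normal forms, modeled as closures:
#   M = \x y. x    (K, our "true")
#   N = \x y. y    (KI, our "false")
M = lambda x: lambda y: x
N = lambda x: lambda y: y

# With the empty separating sequence (n = 0), applying both terms to the
# same arguments x and y yields different results:
print(M('x')('y'))   # x
print(N('x')('y'))   # y
```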