
   1 These notes will recapitulate, make more precise, and to some degree expand what we did in the last hour of our first meeting, leading up to the definitions of the `factorial` and `length` functions.
   2
   3 ### Getting started ###
   4
   5 We begin with a decidable fragment of arithmetic. Our language has some primitive literal values:
   6
   7     0, 1, 2, 3, ...
   8
   9 In fact we could get by with just the primitive literal `0` and the `succ` function, but we will make things a bit more convenient by allowing literal expressions of any natural number. We won't worry about numbers being too big for our finite computers to handle.
  10
  11 We also have some predefined functions:
  12
  13     succ, +, *, pred, -
  14
  15 Again, we might be able to get by with just `succ`, and define the others in terms of it, but we'll be a bit more relaxed. Since we want to stick with natural numbers, not the whole range of integers, we'll make `pred 0` just be `0`, and `2-4` also be `0`.
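
To make the "stay within the natural numbers" convention concrete, here is a minimal sketch in Python (not our made-up language). The name `monus` is just an illustrative stand-in for our infix `-`; nothing above fixes that name.

```python
def succ(n):
    """Successor: the next natural number."""
    return n + 1

def pred(n):
    """Predecessor, except that pred(0) stays at 0."""
    return n - 1 if n > 0 else 0

def monus(m, n):
    """Subtraction that bottoms out at 0 instead of going negative."""
    return m - n if m > n else 0

print(pred(0))      # 0
print(monus(2, 4))  # 0
print(monus(4, 2))  # 2
```

With these definitions no result ever falls below `0`, which is all the convention is meant to guarantee.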
  16
  17 Here's another set of functions:
  18
  19     ==, <, >, <=, >=, !=
  20
`==` is just what we non-programmers normally express by `=`. It's a relation that holds or not between two values. Here we'll treat it as a function that takes two values as arguments and returns a *boolean* value, that is, a truth-value, as a result. The reason for using the doubled `=` symbol is that the single `=` symbol tends to get used in lots of different roles in programming, so we reserve `==` to express this meaning. I will deliberately try to minimize the uses of single `=` in this made-up language (but not eliminate it entirely), to reduce ambiguity and confusion. The `==` relation (or as we're treating it here, the `==` *function* that returns a boolean value) can at least take two numbers as arguments. Probably it makes sense for it to take other kinds of values as arguments, too. For example, it should operate on two truth-values as well. Maybe we'd want it to operate on a number and a truth-value, too, and always return false in that case? What about operating on two functions? Here we encounter the difficulty that the computer can't in general *decide* when two functions are equivalent. Let's not try to sort this all out just yet. We'll suppose that `==` can at least take two numbers as arguments, or two truth-values.
  22
  23 As mentioned in class, we represent the truth-values like this:
  24
  25     'true, 'false
  26
  27 These are instances of a broader class of literal values that I called *symbolic atoms*. We'll return to them shortly. The reason we write them with an initial `'` will also be explained shortly. For now, it's enough to note that the expression:
  28
  29     1 + 2 == 3
  30
  31 evaluates to `'true`, and the expression:
  32
  33     1 + 0 == 3
  34
  35 evaluates to `'false`. Something else that evaluates to `'false` is the simple expression:
  36
  37     'false
  38
That is, literal values are a limiting case of expressions: they evaluate to just themselves. More complex expressions like `1 + 0` don't evaluate to themselves, but rather down to literal values.
  40
  41 The functions `succ` and `pred` come before their arguments, like this:
  42
  43     succ 1
  44
  45 On the other hand, the functions `+`, `*`, `-`, `==`, and so on come in between their arguments, like this:
  46
  47     x < y
  48
  49 Functions of this latter sort are said to have an "infix" syntax. This is just a convenience for how we write them. Our language will have to keep rigorous track of which functions have infix syntax and which don't, but we'll just rely on context and our brains to make sense of this for now. Functions with the ordinary, non-infix syntax can take two arguments, as well. If we had defined the less-than relation (boolean function) in that style, we'd write it like this instead:
  50
  51     lessthan? (x, y)
  52
  53 or perhaps like this:
  54
  55     lessthan? x y
  56
  57 We'll get more acquainted with the difference between these next week. For now, I'll just stick to the first form.
  58
  59 Another set of operations we have are:
  60
  61     and, or, not
  62
The first two of these are infix functions that expect two boolean arguments, and give a boolean result. The third is a function that expects only one boolean argument. Our earlier function `!=` means "doesn't equal", and:
  64
  65     x != y
  66
  67 will in general be just another way to write:
  68
  69     not (x == y)
  70
  71 You see that you can use parentheses in the standard way.
  72
I've started throwing in some variables. We'll say a variable is any expression that starts with a lower-case letter, followed by a sequence of 0 or more upper- or lower-case letters, digits, or underscores (`_`). Then at the end you can optionally have a `?` or `!` or a sequence of `'`s, understood as "prime" symbols. Hence, all of these are legal variables:
  74
  75     x
  76     x1
  77     x_not_y
  78     xUBERANT
  79     x'
  80     x''
  81     x?
  82     xs
  83
We'll follow a *convention* of using variables with short names and a final `s` to represent collections like sequences (to be discussed below). But this is just a convention to help us remember what we're up to, not a strict rule of the language. We'll also follow a convention of only using variables ending in `?` to represent functions that return a boolean value. Thus, for example, `zero?` will be a function that expects a single number argument and returns a boolean corresponding to whether that number is `0`. `odd?` will be a function that expects a single number argument and returns a boolean corresponding to whether that number is odd. Above, I suggested we might use `lessthan?` to represent a function that expects *two* number arguments, and again returns a boolean result.
  85
  86 We also conventionally reserve variables ending in `!` for a different special class of functions, that we will explain later in the course.
  87
  88 In fact you can think of `succ` and `pred` and `not` and all the rest as also being variables; it's just that these variables have been pre-defined in our language to be bound to special functions we designated in advance. You can even think of `==` and `<` as being variables, too, bound to other functions. But I haven't given you rules yet which would make them legal variables, because they don't start with a lower-case letter. We can make the rules more liberal later.
  89
  90 Only a few things in our language aren't variables. These include the **keywords** like `let` and `case` and so on that we'll discuss below. You can't use `let` as a variable, else the syntax of our language would become too hard to mechanically parse. (And probably too hard for our meager brains to parse, too.)
  91
The rule for symbolic atoms is that a single quote `'` followed by any single word that could be a legal variable is a symbolic atom. Thus `'false` is a symbolic atom, but so too are `'x` and `'succ`. For the time being, I'll restrict myself to only talking about the symbolic atoms `'true` and `'false`. These are a special subgroup of symbolic atoms that we call the *booleans* or *truth-values*. Nothing deep hangs on these being a subclass of a larger category in this way; it just seems elegant. Other languages sometimes make booleans their own special type, not a subclass of any other limited type. Others make them a subclass of the numbers (yuck). We will think of them as a subclass of the symbolic atoms.
  93
  94 Note that in symbolic atoms there is no closing `'`, just a `'` at the beginning. That's enough to make the whole word, up to the next space (or whatever) count as naming a symbolic atom.
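
The lexical rules for variables and symbolic atoms stated above can be sketched as a regular expression in Python. This is only an illustration under a couple of assumptions: digits are allowed after the first letter (since `x1` is listed among the legal variables), and the keyword list contains only the two keywords named so far. The helper names `is_variable` and `is_symbolic_atom` are made up for this sketch.

```python
import re

# Assumption: digits may follow the first letter, because x1 counts as legal.
VARIABLE = re.compile(r"[a-z][A-Za-z0-9_]*(?:[?!]|'+)?")

# Only the keywords the notes have named so far; the full list isn't given yet.
KEYWORDS = {"let", "case"}

def is_variable(word):
    """A lower-case start, a letter/digit/underscore body, an optional ?, ! or primes."""
    return VARIABLE.fullmatch(word) is not None and word not in KEYWORDS

def is_symbolic_atom(word):
    """A single opening quote followed by something that could be a variable."""
    return word.startswith("'") and is_variable(word[1:])

for w in ["x", "x1", "x_not_y", "xUBERANT", "x'", "x''", "x?", "xs"]:
    assert is_variable(w)

print(is_variable("let"))          # False, it's a keyword
print(is_variable("=="))           # False under the rules given so far
print(is_symbolic_atom("'false"))  # True
```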
  95
  96 We call these things symbolic *atoms* because they aren't collections. Thus numbers are also atoms, just not symbolic ones. And functions are also atoms, but again, not symbolic ones.
  97
  98 Functions are another class of values we'll have in our language. They aren't "literal" values, though. Numbers and symbolic atoms are simple expressions in the language that evaluate to themselves. That's what we mean by calling them "literals." Functions aren't expressions in the language at all; they have to be generated from the evaluation of more complex expressions.
  99
 100 (By the way, I really am serious about thinking of *the numbers themselves* as being expressions in this language; rather than some "numerals" that aren't themselves numbers. We can talk about this down the road. For now, don't worry about it too much.)
 101
 102 I said we wanted to be starting with a fragment of arithmetic, so we'll keep the function values off-stage for the moment, and also all the symbolic atoms except for `'true` and `'false`. So we've got numbers, truth-values, and some functions and relations (that is, boolean functions) defined on them. We also help ourselves to a notion of bounded quantification, as in &forall;`x < M.` &phi;, where `M` and &phi; are (simple or complex) expressions that evaluate to a number and a boolean, respectively. We limit ourselves to *bounded* quantification so that the fragment we're dealing with can be "effectively" or mechanically decided. (As we extend the language, we will lose that property, but it will be a topic for later discussion exactly when that happens.)
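
Here is a rough Python sketch of why bounded quantification stays mechanically decidable: checking it only requires looping over finitely many candidates. The function name `forall_below` is an assumption of this sketch, and &phi; is modeled as a Python function from a number to a boolean.

```python
def forall_below(bound, phi):
    """Bounded universal quantification: phi holds of every x with 0 <= x < bound."""
    return all(phi(x) for x in range(bound))

print(forall_below(10, lambda x: x * x < 100))  # True
print(forall_below(11, lambda x: x * x < 100))  # False, it fails at x = 10
```

Because `range(bound)` is finite, the check always terminates, provided that evaluating &phi; itself terminates.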
 103
As I mentioned in class, I will sometimes write &forall; x : &psi; . &phi; in my informal metalanguage, where the &psi; clause represents the quantifier's *restrictor*. Other people write this like `[`&forall; x : &psi; `]` &phi;, or in various other ways. My notation is meant to parallel the notation some linguists (for example, Heim &amp; Kratzer) use in writing &lambda; x : &psi; . &phi;, where the &psi; clause restricts the range of arguments over which the function designated by the &lambda;-expression is defined. Later we will see the colon used in a somewhat similar (but also somewhat different) way in our programming languages. But that's just foreshadowing.
 105
 106
 107 ### Let and lambda ###
 108
 109 So we have bounded quantification as in &forall; `x < 10.` &phi;. Obviously we could also make sense of &forall; `x == 5.` &phi; in just the same way. This would evaluate &phi; but with the variable `x` now bound to the value `5`, ignoring whatever it may be bound to in broader contexts. I will express this idea in a more perspicuous vocabulary, like this: `let x be 5 in` &phi;. (I say `be` rather than `=` because, as I mentioned before, it's too easy for the `=` sign to get used for too many subtly different jobs.)
 110
As one of you was quick to notice in class, though, when I shifted to the `let`-vocabulary, I no longer restricted myself to just the case where &phi; evaluates to a boolean. I also permitted myself expressions like this:
 112
 113     let x be 5 in x + 1
 114
 115 which evaluates to `6`. Okay, fair enough, so I am moving beyond the &forall; `x==5.` &phi; idea when I do this. But the rules for how to interpret this are just a straightforward generalization of our existing understanding for how to interpret bound variables. So there's nothing fundamentally novel here.
 116
 117 We can have multiple `let`-expressions embedded, as in:
 118
 119     let y be (let x be 5 in x + 1) in 2 * y
 120
 121     let x be 5 in let y be x + 1 in 2 * y
 122
 123 both of which evaluate to `12`. When we have a stack of `let`-expressions as in the second example, I will write it like this:
 124
 125     let
 126       x be 5;
 127       y be x + 1
 128     in 2 * y
 129
 130 It's okay to also write it all inline, like so: `let x be 5; y be x + 1 in 2 * y`. The `;` represents that we have a couple of `let`-bindings coming in sequence. The earlier bindings in the sequence are considered to be in effect for the later right-hand expressions in the sequence. Thus in:
 131
 132     let x be 0 in (let x be 5; y be x + 1 in 2 * y)
 133
 134 The `x + 1` that is evaluated to give the value that `y` gets bound to uses the (more local) binding of `x` to `5`, not the (previous, less local) binding of `x` to `0`. By the way, the parentheses in that displayed expression were just to focus your attention. It would have parsed and meant the same without them.
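
The way sequenced `let`-bindings shadow outer bindings can be sketched in Python with explicit environments. This is a toy model, not how our language is actually implemented: the helper `let` and the representation of expressions as functions from an environment (a `dict`) to a value are both assumptions of the sketch.

```python
def let(bindings, body, env=None):
    """Evaluate `body` under `bindings`, processed in order on top of `env`.

    bindings: list of (name, rhs) pairs, where rhs maps an environment to a value
    body:     function from an environment to a value
    """
    env = dict(env or {})      # copy, so outer bindings are left untouched
    for name, rhs in bindings:
        env[name] = rhs(env)   # later right-hand sides see earlier bindings
    return body(env)

# let x be 5; y be x + 1 in 2 * y
inner = lambda env: let([("x", lambda e: 5),
                         ("y", lambda e: e["x"] + 1)],
                        lambda e: 2 * e["y"],
                        env)

# let x be 0 in (let x be 5; y be x + 1 in 2 * y)
print(let([("x", lambda e: 0)], inner))  # 12
```

The inner binding of `x` to `5` is the one in force when `x + 1` is evaluated, so the result is `12` rather than `2`.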
 135
 136 Now we can allow ourselves to introduce &lambda;-expressions in the following way. If a &lambda;-expression is applied to an argument, as in: `(`&lambda; `x.` &phi;`) M`, for any (simple or complex) expressions &phi; and `M`, this means the same as: `let x be M in` &phi;. That is, the argument to the &lambda;-expression provides (when evaluated) a value for the variable `x` to be bound to, and then the result of the whole thing is whatever &phi; evaluates to, under that binding to `x`.
 137
 138 If we restricted ourselves to only that usage of &lambda;-expressions, that is when they were applied to all the arguments they're expecting, then we wouldn't have moved very far from the decidable fragment of arithmetic we began with.
 139
However, it's tempting to help ourselves to the notion of (at least partly) *unapplied* &lambda;-expressions, too. If I can make sense of what:
 141
 142 `(`&lambda; `x. x + 1) 5`
 143
 144 means, then I can make sense of what:
 145
 146 `(`&lambda; `x. x + 1)`
 147
 148 means, too. It's just *the function* that waits for an argument and then returns the result of `x + 1` with `x` bound to that argument.
 149
 150 This does take us beyond our (first-order) fragment of arithmetic, at least if we allow the bodies and arguments of &lambda;-expressions to be any expressible value, including other &lambda;-expressions. But we're having too much fun, so why should we hold back?
 151
 152 So now we have a new kind of value our language can work with, alongside numbers and booleans. We now have function values, too. We can bind these function values to variables just like other values:
 153
 154 `let id be` &lambda; `x. x; y be id 5 in y`
 155
 156 will evaluate to `5`. In reaching that result, the variable `id` was temporarily bound to the identity function, that expects an argument, binds it to the variable `x`, and then returns the result of evaluating `x` under that binding.
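
For comparison, here is how the last few examples look in Python, whose `lambda` behaves much like our &lambda; for these simple cases. The names `add_one`, `identity`, and `apply_twice` are made up for this sketch.

```python
# (λ x. x + 1) 5 means the same as: let x be 5 in x + 1
print((lambda x: x + 1)(5))     # 6

# Unapplied, the λ-expression is a value in its own right:
# a function that waits for an argument.
add_one = lambda x: x + 1
print(add_one(5))               # 6

# let id be λ x. x; y be id 5 in y
identity = lambda x: x
y = identity(5)
print(y)                        # 5

# Arguments and results can themselves be functions.
apply_twice = lambda f: lambda x: f(f(x))
print(apply_twice(add_one)(5))  # 7
```

The last example is the kind of thing that takes us beyond the first-order fragment: `apply_twice` takes a function as its argument and returns another function.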
 157
 158 This is what is going on, behind the scenes, with all the expressions like `succ` and `+` that I said could really be understood as variables. They have just been pre-bound to certain agreed-upon functions rather than others.
 159
 160
 161 ### Containers ###
 162
So far, we've only been talking about *atomic* values. Our language will also have some *container* values, that have other values as members. One example is the **ordered sequence**, like:
 164
 165     [10, 20, 30]
 166
 167 This is a sequence of length 3. It's the result of *cons*ing the value `10` onto the front of the shorter, length-2 sequence `[20, 30]`. In this made-up language, we'll represent the sequence-consing operation like this:
 168
 169     10 & [20, 30]
 170
 171 If you want to know why we call it "cons", that's because this is what the operation is called in Scheme, and they call it that as shorthand for "constructing" the longer list (they call it a "list" rather than a "sequence") out of the components `10` and `[20, 30]`. The name is a bit unfortunate, though, because other structured values besides lists also get "constructed", but we don't say "cons" about them. Still, this is the tradition. Let's just take "cons" to be a nonsense label with an interesting back-history.
 172
 173 The sequence `[20, 30]` in turn is the result of:
 174
 175     20 & [30]
 176
 177 and the sequence `[30]` is the result of consing `30` onto the empty sequence `[]`. Note that the sequence `[30]` is not the same as the number `30`. The former is a container value, with one element. The latter is an atomic value, and as such won't have any elements. If you try to do this:
 178
 179     [30] + 1
 180
 181 it won't work. We haven't discussed what happens with illegal expressions like that, or like `'true + 1`. For the time being, I'll just say these "don't work", or that they "crash". We'll discuss the variety of ways these illegalities might be handled later.
 182
 183 Also, if you try to do this:
 184
 185     20 & 30
 186
 187 it won't work. The consing operator `&` always requires a container (here, a sequence) on its right-hand side. And `30` is not a container.
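
Here is a small Python sketch of consing, with Python lists standing in for our sequences. The function name `cons` and the choice to signal the illegal case with a `TypeError` are assumptions of the sketch; how our own language handles such "crashes" is left open above.

```python
def cons(head, tail):
    """Sketch of `head & tail` for sequences: put `head` on the front of `tail`."""
    if not isinstance(tail, list):
        raise TypeError("the right-hand side of & must be a sequence")
    return [head] + tail

print(cons(10, cons(20, cons(30, []))))  # [10, 20, 30]
print([30] == 30)                        # False: a one-element sequence is not a number

try:
    cons(20, 30)                         # 30 is not a container
except TypeError as err:
    print("crash:", err)
```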
 188
 189 We've said that:
 190
 191     [10, 20, 30]
 192
is the same as:
 194
 195     10 & (20 & (30 & []))
 196
 197 and the latter can also be written without the parentheses. Our language knows that `&` should always be understood as "implicitly associating to the right", that is, that:
 198
 199     10 & 20 & 30 & []
 200
 201 should be interpreted like the expression displayed before. Other operators like `-` should be understood as "implicitly associating to the left." If we write:
 202
 203     30 - 2 - 1
 204
 205 we presumably want it to be understood as:
 206
 207     (30 - 2) - 1
 208
 209 not as:
 210
 211     30 - (2 - 1)
 212
 213 Other operators don't implicitly associate at all. For example, you may understand the expression:
 214
 215     10 < x < 20
 216
 217 because we have familiar conventions about what it means. But what it means is not:
 218
 219     (10 < x) < 20
 220
 221 The result of the parenthesized expression is either `'true` or `'false`, assuming `x` evaluates to a number. But `'true < 20` doesn't mean anything, much less what we expect `10 < x < 20` to mean. So `<` doesn't implicitly associate to the left. Neither does it implicitly associate to the right. If you want expressions like `10 < x < 20` to be meaningful, they will need their own special rules.
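
The difference between associating to the right and to the left can be sketched in Python with two folds. The helpers `foldr` and `foldl` and the list-based `cons` are assumptions of this sketch. The last two lines show how Python itself treats `10 < x < 20`: it has a special chaining rule of its own, which is a fact about Python, not about our made-up language.

```python
from functools import reduce

def cons(head, tail):
    return [head] + tail

def foldr(op, items):
    """Right-associated: a op (b op (c op d))."""
    *init, last = items
    return reduce(lambda acc, x: op(x, acc), reversed(init), last)

def foldl(op, items):
    """Left-associated: ((a op b) op c) op d."""
    return reduce(op, items)

minus = lambda a, b: a - b

print(foldr(cons, [10, 20, 30, []]))  # [10, 20, 30]
print(foldl(minus, [30, 2, 1]))       # 27, that is (30 - 2) - 1
print(foldr(minus, [30, 2, 1]))       # 29, that is 30 - (2 - 1)

x = 25
print(10 < x < 20)    # False: Python reads this as (10 < x) and (x < 20)
print((10 < x) < 20)  # True in Python, only because its booleans double as numbers
```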
 222
Sequences are containers that keep track of the order of their elements, and also those elements' multiplicity (how many times each one appears). Other containers might also keep track of these things, and more structural properties too, or they might keep track of less. Let's say we also have **set containers**, like this:
 224
 225     {10, 20, 30}
 226
 227 Whereas the sequences `[10, 20, 10]`, `[10, 20]`, and `[20, 10]` are three different sequences, `{10, 20, 10}`, `{10,20}`, and `{20, 10}` would just be different ways of expressing a single set.
 228
 229 We can let the `&` operator do extra-duty, and express the "consing" relation for sets, too:
 230
 231     10 & {20}
 232
 233 would evaluate to `{10, 20}`, and so too would:
 234
 235     10 & {10, 20}
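
Python's built-in lists and sets happen to behave the same way with respect to order and multiplicity, so they make a convenient illustration. The helper `cons_set` is a made-up name for our `&` when its right-hand side is a set.

```python
def cons_set(x, s):
    """Sketch of `x & s` for sets: the set containing x and everything in s."""
    return frozenset(s) | {x}

# Sequences care about order and multiplicity.
print([10, 20, 10] == [10, 20])              # False
print([10, 20] == [20, 10])                  # False

# Sets don't.
print({10, 20, 10} == {10, 20} == {20, 10})  # True
print(cons_set(10, {20}) == {10, 20})        # True
print(cons_set(10, {10, 20}) == {10, 20})    # True
```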
 236
 237 As I mentioned in class, we'll let `&&` express the operation by which two sequences are appended or concatenated to each other:
 238
 239     [10, 20] && [30, 40, 50]
 240
 241 will evaluate to `[10, 20, 30, 40, 50]`. For sets, we'll let `and` and `or` and `-` do extra duty, and express set intersection, set union, and set subtraction, when their arguments are sets. If the arguments of `and` and `or` are booleans, on the other hand, or the arguments of `-` are numbers, then they express the functions we were understanding them to express before.
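
Here is one way to picture operators "doing extra duty", sketched in Python by dispatching on the kind of argument. The function names (`append`, `and_`, `or_`, `minus`) are stand-ins for our `&&`, `and`, `or`, and `-`, and deciding by looking only at the first argument is a simplification of this sketch.

```python
SETS = (set, frozenset)

def append(xs, ys):
    """Sketch of `xs && ys`: concatenate two sequences."""
    return xs + ys

def and_(x, y):
    """Set intersection on sets, conjunction on booleans."""
    return x & y if isinstance(x, SETS) else (x and y)

def or_(x, y):
    """Set union on sets, disjunction on booleans."""
    return x | y if isinstance(x, SETS) else (x or y)

def minus(x, y):
    """Set subtraction on sets, natural-number subtraction on numbers."""
    if isinstance(x, SETS):
        return x - y
    return max(x - y, 0)  # never drops below 0

print(append([10, 20], [30, 40, 50]))             # [10, 20, 30, 40, 50]
print(and_({10, 20}, {20, 30}) == {20})           # True
print(or_({10, 20}, {20, 30}) == {10, 20, 30})    # True
print(minus({10, 20, 30}, {20}) == {10, 30})      # True
print(and_(True, False), or_(True, False))        # False True
print(minus(2, 4))                                # 0
```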
 242
 243 In addition to sequences, there's another kind of expression that might initially be confused with them. We might call these **tuples** or **multivalues**. They are written surrounded by parentheses rather than square brackets. Here's an example:
 244
 245 `(0, 'true,` &lambda;`x. x)`
 246
 247 That's a tuple with 3 elements (also called a "triple").
 248
In the programming languages and other formal systems we'll be looking at, tuples and sequences are often understood and handled differently. This is because we apply different assumptions to them. In the case of sequences, it's assumed that they will have homogeneously-typed elements, and that their length will be irrelevant to their type. So you can have the sequence:
 250
 251     [20, 30]
 252
 253 and the sequence:
 254
 255     [30]
 256
 257 and even the sequence:
 258
 259     []
 260
 261 and these will all be of the same type, namely a sequence of numbers. You can have sequences with other types of elements, too, for example a sequence of booleans:
 262
 263     ['true, 'false, 'true]
 264
 265 or a sequence of sequences of numbers:
 266
 267     [[10,20], [], [30]]
 268
 269 An excellent question that came up in class is "How do we tell whether `[]` expresses the empty sequence of numbers or the empty sequence of something else?" We will discuss that question in later weeks. It's central to some of the developments we'll be exploring. For now, just put that question on a mental shelf and assume that somehow this just works out right.
 270
Now whereas sequences expect homogeneously-typed elements, and their length is irrelevant to their own type, tuples are the opposite in both respects. Tuples may have elements of heterogeneous type, as our example:
 272
 273 `(0, 'true,` &lambda;`x. x)`
 274
did. They need not, but they may. Also, the type of a tuple does depend on its length, and moreover on the specific types of each of its elements. A tuple of length 2 (also called a "pair") whose first element is a number and second element is a boolean is a different type of thing than a tuple whose first element is a boolean and whose second element is a number. Most functions expecting the first as an argument will crash if you give them the second instead.
 276
 277 Earlier I said that we can call these things "tuples" or "multivalues". Here I'll make a technical comment, that in fact I'll understand these slightly differently. Really I'll understand the bare expression `(10, x)` to express a multivalue, and to express a tuple proper, you'll have to write `Pair (10, x)` or something like that. The difference between these is that only the tuple is itself a single value that can be bound to a single variable. The multivalue isn't a single value at all, but rather a plurality of values. This is a bit subtle, and other languages we're looking at this term don't always make this distinction. But the result is that they have to say complicated things elsewhere. If we permit ourselves this fine distinction here, many other things downstream will go more smoothly than they do in the languages that don't make it. Ours is just a made-up language, but I've thought this through carefully, so humor me. We haven't yet introduced the apparatus to make sense of expressions like `Pair (10, x)`, so for the time being I'll just restrict myself to multivalues, not to tuples proper. The result will be that while we can say:
 278
 279     let x be [10, 20] in ...
 280
 281 that is, sequences are first-class values in our language, we can't say:
 282
 283     let x be (10, 'true) in ...
 284
 285 or even:
 286
 287     let x be (10, 20) in ...
 288
 289 However, intuitively it ought to make sense to say:
 290
 291     let (x, y) be (10, 'true) in ...
 292
 293 That should just bind the variable `x` to the value `10` and the variable `y` to the value `'true`, and go on to evaluate the rest of the expression with those bindings in place. In this particular example, we could equally have said:
 294
 295     let x be 10; y be 'true in ...
 296
 297 but in other examples it will be substantially more convenient to be able to bind `x` and `y` simultaneously. Here's an example:
 298
`let`
`  f be` &lambda; `x. (x, 2*x);`
`  (x, y) be f 10`
`in [x, y]`
 303
 304 which will evaluate to `[10, 20]`. Note that we have the function `f` returning two values, rather than just one, just by having its body evaluate to a multivalue rather than to a single value.
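
The same example can be written in Python, which also lets a function hand back two values and binds them simultaneously. One caveat: in Python the pair returned by `f` is an ordinary first-class tuple that could be bound to a single variable, so Python does not draw the multivalue/tuple distinction made above. This is only an analogy for the simultaneous binding.

```python
f = lambda x: (x, 2 * x)  # the body evaluates to two values

x, y = f(10)              # like: let (x, y) be f 10
print([x, y])             # [10, 20]
```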
 305
 306
 307
 308
 309 *More coming*
 310
 311
 312 ### Patterns ###
 313
 314 *More coming*
 315
 316 ### Recursive let ###
 317
 318 *More coming*
 319
 320 ### Comparing recursive-style and iterative-style definitions ###
 321
 322 *More coming*
 323
 324