X-Git-Url: http://lambda.jimpryor.net/git/gitweb.cgi?p=lambda.git;a=blobdiff_plain;f=topics%2Fweek7_untyped_evaluator.mdwn;h=ce9dd683faeecae906826f54e7513862501ed807;hp=119a730de2944af45e06b74c10f68d082d085a3f;hb=0b805960d6aff23b3103ff142963a7200e376767;hpb=7076e0276e4fcbd01657fd2c59cf0f368367fe6d

diff --git a/topics/week7_untyped_evaluator.mdwn b/topics/week7_untyped_evaluator.mdwn
index 119a730d..ce9dd683 100644
--- a/topics/week7_untyped_evaluator.mdwn
+++ b/topics/week7_untyped_evaluator.mdwn
@@ -1,5 +1,7 @@
-A program that interprets untyped lambda terms (and more)
-=========================================================
+[[!toc levels=2]]
+
+Datatypes and complications
+---------------------------

 Let's start with an OCaml datatype for untyped lambda terms:

@@ -145,7 +147,7 @@ However, when we move to the VB/environment-based interpreter, we will need to i

 We'll explain the `Symbol` and `Closure` variants on the `bare_result` datatype below.

-Having these two parallel datatypes is rather annoying, and requires us to insert some translation functions `term_of_result` and `result_of_term` at a few places in the program. But the core, non-fancy parts of OCaml don't supply any more elegant way to specify that one datatype is a subtype of another, so this is simply what we'll need to do.
+Having these two parallel datatypes is rather annoying, and requires us to insert some translation functions `term_of_result` and `result_of_term` at a few places in the program. But the core, non-fancy parts of OCaml don't supply any more elegant way to specify that one datatype overlaps or is a subtype of another, so this is what works best.

 A **third complication** has to do with environments. On the one hand, we don't have any really compelling need for environments in the first phase of the exercise, when we're just making a substitute-and-repeat interpreter. They don't play any role in the fundamental task we're focusing on. But on the other hand, weaving environments into the system when we *will* need them, for the second phase of the exercise, is not simple and would require lots of code changes. So that is a reason to include them from the beginning, just off to the side not doing any important work until we want them.

@@ -209,7 +211,7 @@ The further reduction, to:

 has to come from a subsequent re-invocation of the function.

-Let's think about how we should detect that the term has been reduced as far as we can take it. In the substitute-and-repeat interpreter Chris demonstrated for combinatory logic, we had the `reduce_if_redex` function perform a single reduction *if it could*, and then it was up to the caller to compare the result to the original term to see whether any reduction took place. That worked for the example we had. But it has some disadvantages. One is that it's inefficient. Another is that it's sensitive to the idiosyncrasies of how your programming language handles equality comparisons on complex structures; and these details turn out to be very complex and vary from language to language (and even across different versions of different implementations of a single language). We'd be glad to discuss these subtleties offline, but if you're not prepared to master them, it would be smart to foster an ingrained hesitation to blindly applying a language's `=` operator to complex structures. (Some problem cases: large numbers, set structures, structures that contain functions.) A third difficulty is that it's sensitive to the particular combinators we took as basic. With `S` and `K` and `I`, it can never happen that a term has been reduced, but the output is identical to the input. That can happen in the lambda calculus, though (remember `ω ω`); and it can happen in combinatory logic if other terms are chosen as primitive (`W W1 W2` reduces to `W1 W2 W2`, so let them all just be plain `W`s).
+Let's think about how we should detect that the term has been reduced as far as we can take it. In the substitute-and-repeat interpreter Chris demonstrated for combinatory logic, we had the `reduce_if_redex` function perform a single reduction *if it could*, and then it was up to the caller to compare the result to the original term to see whether any reduction took place. That worked for the example we had. But it has some disadvantages. One is that it's inefficient. Another is that it's sensitive to the idiosyncrasies of how your programming language handles equality comparisons on complex structures; and these details turn out to be very complex and vary from language to language (and even across different versions of different implementations of a single language). We'd be glad to discuss these subtleties offline, but if you're not prepared to master them, it would be smart to foster an ingrained hesitation about blindly applying a language's `=` operator to complex structures. (Some problem cases: large numbers, set structures, structures that contain functions, cyclic structures.) A third difficulty is that it's sensitive to the particular combinators we took as basic. With `S` and `K` and `I`, it can never happen that a term has been reduced, but the output is identical to the input. That can happen in the lambda calculus, though (remember `ω ω`); and it can happen in combinatory logic if other terms are chosen as primitive (`W W1 W2` reduces to `W1 W2 W2`, so let them all just be plain `W`s).

 So let's consider different strategies for how to detect that the term cannot be reduced any further. One possibility is to write a function that traverses the term ahead of time, and just reports whether it's already a result, without trying to perform any reductions itself. Another strategy is to "raise an exception" or error when we ask the `reduce_head_once` function to reduce an irreducible term; then we can use OCaml's error-handling facilities to "catch" the error at an earlier point in our code and we'll know then that we're finished. Pierce's code used a mix of these two strategies.

@@ -302,11 +304,13 @@ Here's a better strategy. Instead of keeping all of the information about which

 Now `Closure`s are not a new kind of lambda _term_: the syntax for our language doesn't have any constituents that get parsed into `Closure`s. `Closure`s are only created _during the course of evaluating_ terms: specifically, when a variable gets bound to an abstract, which may itself contain variables that are locally free (not bound by the abstract itself). This is why we have separate datatypes for _terms_ and for the _results_ that terms can evaluate to. `Closure`s are results, but they aren't terms. `App`s are terms, but not results. Our boolean and number literals, as well as our primitive functions, constructors, and destructors, are both.

+In later weeks, we will see more examples of results that aren't terms, but can only be generated during the course of a computation. (I'm thinking of mutable reference cells. Arguably, partially applied constructors are yet another example, that we're already familiar with.)
+

 Getting, reading, and compiling the source code
 -----------------------------------------------

-You can download the source code for the intepreter [[here|/code/untyped_full-1.3.tgz]]. That link will always give you the latest version. We will update it as we find any issues. Let us know about any difficulties you experience.
+You can download the source code for the interpreter [[here|/code/untyped_full-1.7.tgz]]. That link will always give you the latest version. We will update it as we find any issues. Let us know about any difficulties you experience.

 When you unpack the downloaded source code, you will get a folder with the following contents, sorted here by logical order rather than alphabetically.

@@ -344,11 +348,32 @@ The second file, `types.ml`, contains different implementations for the environm

 Each implementation of that interface is itself pretty simple, though the file `types.ml` does need to have some trickery in it to work around constraints imposed by OCaml. (The trickery is marked as such.)

-TODO
+OCaml's terminology for the _abstract interfaces_ is `module type S = sig ... end`, and its terminology for the _concrete implementations_ of these is `module M = struct ... end`. (By the way, the `*.mli` files get compiled into the former of these, and the `*.ml` files get compiled into the latter.) The implementations have to define (at least) all the types and values declared in the abstract interface. Notice that in the `ENV` interface, we just said `type env`. That means there _has to be_ some type `env`; but different implementations can define it differently. Also, when other parts of the code _use_ the interface, the details of how the `env` type is implemented won't be exposed to them. They have to interact with the `env`s via the declared `shift` and `lookup` functions, and the `empty` environment that every implementation is obliged to provide.
+
+What the function `lookup` does is take an identifier like `"x"` and an existing `env`, and try to return the `result` that this `env` associates with that identifier, if any. Else it returns `None`. (There are some complications, in that we don't really return a `result option`, but rather a `binding option`, where the `binding` type is a small wrapper around a type `bound`, which is identified with the type `result`. The point of the `binding` wrapper is to help handle the toplevel declarations. But you can ignore that and just think of the `binding`s as `bound`/`result`s. The point of having the two identified types `bound` and `result` is to prepare for later developments. The `result` is what a term like `Var "x"` evaluates to (in a context where it's not free). The `bound` is what the environment binds the identifier `"x"` to. In our present system, these are of course the same. But later when we introduce mutable state into our system, they may come apart, depending on design choices we make.)
+
+What the function `shift` does is take an `env` and add a new binding for a given identifier. It returns an `env` with this new binding. The identifier may or may not have already had a binding in the original `env`; but in any case, the new `env` will only return the supplied new `binding` when you `lookup` the `ident`.
+
+As we've said, there are different ways to implement these environments. That's what's in the `types.ml` file. The `Env0` implementation provides the demanded interface, but doesn't do anything. It won't remember any new bindings. You can select this for the VA interpreter, if you like, to demonstrate that the `env`s are inessential to that interpretation strategy. (Though in that case the toplevel declarations won't be remembered.) `Env1` implements the environments as a list of pairs of identifiers and bindings. `Env2` implements the environments instead as functions from identifiers to `Some binding` or to `None`, if the identifier has no binding in that environment. At the end of the file `types.ml` is the line:
+
+    include Env1

-The third file, `engine.ml`, is where the action is. Most of the homework assignment was just a simplified version of this file. At the bottom of the file are also instructions on how to shift the interpreter between using the VA or the VB functions.
+You can change that to whichever of these implementations you'd like to use.

-You can try building it and running the interpreter like this. First, make sure you're in a Terminal and that you're working directory is the folder that the source code unpacked to. Then just type `make`. That should take care of everything. If you see errors that you don't think are your fault, let us know about them.
+The third file, `engine.ml`, is where the action is. Most of the homework assignment was just a simplified version of this file. At the bottom of the file are also instructions on how to shift the interpreter between using the VA or the VB functions:
+
+    (* Put comment (* *)s around exactly one of the following two pairs of lines. *)
+
+    let version = "A (reduce by substituting; " ^ version ^ ")"
+    let interpret = VA.reduce
+
+    (*
+    let version = "B (use environment for local bindings; " ^ version ^ ")"
+    let interpret = VB.evaluate
+    *)
+
+
+You can try building and running the interpreter like this. First, make sure you're in a Terminal and that your working directory is the folder that the source code unpacked to. Then just type `make`. That should take care of everything. If you see errors that you don't think are your fault, let us know about them.

 Possibly some Windows computers that _do_ have OCaml on them might nonetheless fail to have the `make` program. (It isn't OCaml-specific, and will be found by default on Macs and many Linux systems.) In that case, you can try entering the following sequence of commands by hand:

@@ -369,9 +394,8 @@ Possibly some Windows computers that _do_ have OCaml on them might nonetheless f
     ocamlc -c parser.ml
     ocamlc -c lexer.ml
     ocamlc -c main.ml
-    ocamlc -o interp.exe lolevel.cmo types.cmo hilevel.cmo primitives.cmo engine.cmo parser.cmo lexer.cmo main.cmo
-
-If your computer doesn't ... TODO
+    ocamlc -o interp.exe lolevel.cmo types.cmo hilevel.cmo primitives.cmo \
+        engine.cmo parser.cmo lexer.cmo main.cmo


 OK, I built the interpreter. How can I use it?
@@ -451,7 +475,7 @@ Here are some more sample inputs, each of which the parser is happy with:
     /* note that it's not f x (g y) (h z) */
     "strings" /* you can input these and pass them around, but can't perform any operations on them */

-Predefined combinators include: `S`, `K` (same as `const`), `I` (same as `id`), `B` (same as `(o)`, occurring in prefix not infix position), `C` (same as `flip`), `T`, `V` (the Church pairing combinator), `W`, `M` (better known as `ω`), and `L`.
+Predefined combinators include: `S`, `K` (same as `const`), `I` (same as `id`), `B` (same as `(o)`, occurring in prefix not infix position), `C` (same as `flip`), `T` (same as `flip ($)`), `V` (the Church pairing combinator), `W`, `M` (better known as `ω`), and `L`.

 The parser also accepts `letrec ... in ...` terms, but currently there is no implementation for how to reduce/interpret these (that's for a later assignment), so you'll just get an error.
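
To make the `ENV` discussion added in this change a bit more concrete, here is a minimal sketch of the idea. It is not the actual contents of `types.ml`: we assume a simplified `binding` type (just an `int`, instead of the wrapper around `bound`/`result`), the argument order of `shift` and `lookup` is a guess, and the module names `ListEnv`, `FunEnv` and `Demo` are hypothetical stand-ins for implementations in the style of `Env1` and `Env2`.

```ocaml
(* Assumed simplification: the real code binds identifiers to a `binding`
   wrapper around `result`; here a binding is just an int. *)
type ident = string
type binding = int

(* The abstract interface: callers only learn that some type `env` exists. *)
module type ENV = sig
  type env
  val empty : env
  val shift : ident -> binding -> env -> env   (* argument order is assumed *)
  val lookup : ident -> env -> binding option
end

(* Hypothetical implementation in the style of Env1:
   a list of (identifier, binding) pairs. *)
module ListEnv : ENV = struct
  type env = (ident * binding) list
  let empty = []
  let shift x b e = (x, b) :: e        (* the newest pair is found first *)
  let lookup x e = List.assoc_opt x e
end

(* Hypothetical implementation in the style of Env2:
   a function from identifiers to Some binding or None. *)
module FunEnv : ENV = struct
  type env = ident -> binding option
  let empty = fun _ -> None
  let shift x b e = fun y -> if y = x then Some b else e y
  let lookup x e = e x
end

(* Client code written against ENV cannot see how `env` is represented,
   so it works unchanged with either implementation. *)
module Demo (E : ENV) = struct
  let e = E.shift "x" 2 (E.shift "x" 1 E.empty)
  let x = E.lookup "x" e   (* only the newer binding for "x" is visible *)
  let y = E.lookup "y" e   (* "y" was never bound *)
end

module D1 = Demo (ListEnv)
module D2 = Demo (FunEnv)
```

Because both modules are sealed with `: ENV`, swapping one for the other (as with the `include Env1` line) requires no changes in the code that uses them. Shifting `"x"` twice also illustrates the point about `shift`: a later `lookup` only returns the most recently supplied binding.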