There are two ways to approach encoding a list in System F / OCaml. One way thinks of the list in the style of OCaml's datatypes:

    type ('a) mylist = Cons of 'a * ('a) mylist | Nil

Now in general it is possible to encode sum types like this in System F. The idea would be to make the list be a higher-order function that accepted two arguments. One argument would be the Cons-handler, the other the Nil-handler. If the list in question were non-empty, it would ignore the Nil-handler and instead feed its head and its tail to the Cons-handler. If the list were on the other hand empty, it would ignore the Cons handler and instead just return the Nil-handler.

That basic idea is fine, but there are obstacles to encoding it in System F and different obstacles to encoding it (in that style) in OCaml. (The above type declaration works fine in OCaml, though.)

What is the obstacle to encoding it in System F? Well think about what the type would be. Something like this:

    type mylist = forall 'a. forall 'b. ('a -> __ -> 'b) -> 'b -> 'b

Wait, what goes in the blank spot. That has to be the type of mylist's tail. But that should again be the same type as mylist. And we don't know how to make types that refer to themselves, and indeed in System F itself, it is impossible to do so.

Even if you could, you wouldn't be able to *do* too much with lists encoded in this way, because in System F there's no `letrec`.

So the more promising way forward would be to use the other approach to encoding a list in System F / OCaml, which instead encodes the lists in terms of their right-folds. We will explore that further below. You may want to skip to END DIGRESSION the first time you read this (though the end of this discussion will refer back to some of the stuff explained during the digression).

START DIGRESSION

First, a lengthy digression about how we might nonetheless pursue this first encoding strategy in OCaml instead. Of course you don't *have to* encode lists in OCaml. OCaml already gives you its native lists `[10, 20, 30]`, and even if it didn't, you could just write a datatype declaration for mylist like we have above, and then use that, with `letrec` and `match` and so on just as we were doing in the early weeks of class. Only now with pretty types.

Still, it may be instructive to see how we would encode lists in the style we described. It may help us understand better what's going on behind that datatype declaration, and it will also show us some of the constraints that OCaml labors under, and that sometimes one needs to work around.

So the type we want would be something like this:

    type mylist = forall 'a. forall 'b. ('a -> mylist -> 'b) -> 'b -> 'b

One problem though is that OCaml doesn't understand those `forall`s. Well, we can just parameterize the list on the `'a`, since if we have a list of `'a`s, it will always be a list of `'a`s.

    type ('a) mylist = forall 'b. ('a -> ('a) mylist -> 'b) -> 'b -> 'b

We should be a bit reluctant to parameterize on the `'b`s, though, because let's say we encode a list `[10,20,30]`. That can intrinsically be a list where the `'a`s are `int`s, but it shouldn't intrinsically be a list where the `'b` result type is `char` or `bool` or `int` or anything in particular. We want one and the same list to be capable of delivering any of those results depending on what is the type of the handlers passed to it.

If we just omitted the `forall 'b.`, though, OCaml would complain that our type declaration has an unbound type variable in it:

    # type ('a) mylist = ('a -> ('a) mylist -> 'b) -> 'b -> 'b;;
    Error: Unbound type parameter 'b

So let's try parameterizing on the `'b` variable too, and seeing how it works out. If we just try to do it like this:

    type ('a,'b) sysf_list = ('a -> ('a,'b) sysf_list -> 'b) -> 'b -> 'b;;

OCaml won't like that either:

    # type ('a,'b) sysf_list = ('a -> ('a,'b) sysf_list -> 'b) -> 'b -> 'b;;
    Error: The type abbreviation sysf_list is cyclic

Now OCaml can tolerate recursive types, but only if the recursive call isn't "naked": it has to be underneath a constructor or something else of that sort. So we could do this instead, adding a constructor `S` around the whole type:

    type ('a,'b) sysf_list = S of (('a -> ('a,'b) sysf_list -> 'b) -> 'b -> 'b);;

OCaml will accept that. You can't have been expected to know that this would be needed. But then, we didn't intend for you to go down this path anyway. We intended for you to encode the lists in terms of their right-folds, which we discuss below. This long digression is just a learning exercise about how things work out if you stick with the first encoding style.

Now with this type definition, we can define a few list functions. The tricky part is making sure to pop off the `S`'s when we want access to the higher-order function that is the `sysf_list`'s body, and to wrap an `S` back around the result when we want to call it an official `sysf_list` again.

    let sysf_empty = S (fun ht n -> n);;
    let sysf_cons x xs = S (fun ht n -> ht x xs);;
    let sysf_isempty (S xs) = xs (fun h t -> false) true;;
    let sysf_head (S xs) = xs (fun h t -> h) __;;

Well, what should we use as the second value if `sysf_head` is applied to a `sysf_empty` list? If we knew that it was an `(int, _) sysf_list`, we could perhaps return `0`, if a `(char, _) sysf_list`, we could perhaps return a space character, or make some other arbitrary choice. But it's clear we don't have any general, principled solution here.

But at this point, perhaps you can apply some of the things you learned from other parts of the assignment. Look at the type of the `blackhole` function:

    # let rec blackhole () = blackhole ();;
    val blackhole : unit -> 'a = <fun>

It's a function from `unit -> 'a`, for *any `'a` whatsoever*! All of OCaml's functions that don't return to their callers, but which go into infinite loops, and also the functions that raise an error, have this property. Here's another one, one of OCaml's built-in error functions:

    # failwith;;
    - : string -> 'a = <fun>

So perhaps we could define `sysf_head` like this:

    let sysf_head (S xs) = xs (fun h t -> h) (failwith "bad list");;

And indeed that will type perfectly. Unfortunately, though, guess what happens if we apply that function to a good list:

    # let abc = sysf_cons 'a' (sysf_cons 'b' (sysf_cons 'c' sysf_empty));;
    # sysf_head abc;;
    Exception: Failure "bad list".

Why? Because OCaml, like most programming languages, is call-by-value. So the argument `failwith "bad list"` gets evaluated *before* its result is passed into the xs function, and thus the `xs` function never gets invoked.

How can we avoid this? If only we had a way of scheduling that failure, but suspending its evaluation so that it didn't happen right when we scheduled it, but only (perhaps) later, if the `xs` function corresponds to an empty list and so tries to use it. Have you seen any way of doing that?

Yes you have. The idea is to write `sysf_head` like this instead:

    let sysf_head (S xs) = xs (fun h t -> h) (fun () -> failwith "bad list");;

Of course, that won't type right. (OCaml will accept that definition ok, but the types will be screwed up later when you try to combine it with lists built using the `sysf_cons` we've got.) The problem is that now we are thinking that the second, if-you're-empty argument is always a thunk, so we should rewrite all our definitions with this in mind.

    type ('a,'b) sysf_list = S of (('a -> ('a,'b) sysf_list -> 'b) -> (unit -> 'b) -> 'b);;
                                                                       -------
    let sysf_empty = S (fun ht n -> n ());;
                                      --
    let sysf_cons x xs = S (fun ht n -> ht x xs);;
    let sysf_isempty (S xs) = xs (fun h t -> false) (fun () -> true);;
                                                     ---------
    let sysf_head (S xs) = xs (fun h t -> h) (fun () -> failwith "bad list");;
                                              ---------

All the changes have been underlined. Now, this is looking a lot better. In fact, it almost works.

    # let abc = sysf_cons 'a' (sysf_cons 'b' (sysf_cons 'c' sysf_empty));;
    val abc : (char, '_b) sysf_list = S <fun>

Looks good, what's the problem? The hint that there's trouble comes in the type OCaml displays for the list `abc` you just created. Its type is `(char, '_b) sysf_list`. Notice that `'_b` type parameter begins with an underscore. That's OCaml's convention for indicating that it is not polymorphic or generalized over. It's some specific one type; OCaml just hasn't figured out what that type is yet. But as that `abc` list interacts with other pieces of code, OCaml will eventually decide that it has one specific type, and then it will insist that that must always be the second type parameter, and thus must always be the type of the result returned by the handlers.

For example:

    # let S xs = abc in xs (fun h t -> 1) (fun () -> 0);;
    - : int = 1
    # abc;;
    - : (char, int) sysf_list = S <fun>

Now it has decided `abc` only takes handlers returning int results:

    # let S xs = abc in xs (fun h t -> true) (fun () -> false);;
                                       ----
    Error: This expression has type bool but an expression was expected of type int

Whoops. This is exactly the problem I warned we were going to encounter by making the result type a parameter for the list type, on a par with `'a`. Any given list is a list of a *specific type* of `'as`; but it shouldn't be a list that only returns type a specific type `'b` result. It should be able to return any type result, depending on what handlers are passed to it.

There is a way to do this in OCaml, but it has to use some apparatus you haven't been introduced to yet. The lightest-weight way to do it would be to use OCaml's "polymorphic record types". Here is how it looks, if you're curious. The `{ label = ...}` is OCaml's syntax for a "record", something like a tuple but with the field referred to by its label rather than by its position. The `'b.` at the start of the type declaration for the record's field is OCaml's way of saying `forall b.`:

    # type ('a) sysf_list = { sysf_list : 'b. ('a -> ('a) sysf_list -> 'b) -> (unit -> 'b) -> 'b };;
    # let sysf_empty = { sysf_list = fun ht n -> n () };;
    # let sysf_cons (x:'a) (xs:('a) sysf_list) : ('a) sysf_list = { sysf_list = fun ht n -> ht x xs };;
    # let sysf_isempty xs = xs.sysf_list (fun h t -> false) (fun () -> true);;
    # let sysf_head xs = xs.sysf_list (fun h t -> h) (fun () -> failwith "never");;
    # let sysf_tail xs = xs.sysf_list (fun h t -> t) (fun () -> sysf_empty);;

    # let abc = sysf_cons 'a' (sysf_cons 'b' (sysf_cons 'c' sysf_empty));;
    # let { sysf_list = xs } = abc in xs (fun h t -> 1) (fun () -> 0);;
    - : int = 1
    # let { sysf_list = xs } = abc in xs (fun h t -> true) (fun () -> false);;
    - : bool = true

No problem. Again, there was no way you could be expected to know that these contortions would be necessary.

END DIGRESSION

Okay, so if the approach of encoding lists the way we encode sum-types isn't going to fly at all in System F, or easily in OCaml, what are our alternatives. We just encode the structure using its natural recursion, like we did when we encoded lists in terms of their right-folds back in the untyped Lambda Calculus.

In System F, the idea would be this:

    type sysf_list_a = forall 'b. ('a->'b->'b) -> 'b -> 'b

That is, a `sysf_list_a` encodes a list of `'a`s (we don't try to generalize on the list element, this is not possible to do in a satisfying way in System F). For every possible result type `'b`, it accepts a fold function `c:'a->'b->'b` and a seed value `n:'b` keyed to that result type, and returns a result of that type (what you get by right-folding function `c` over the list using that seed value `n`).

    let sysf_empty_a = Lambda 'b. \c:'a->'b->'b. \n:'b. n
    let sysf_cons_a = \x:'a. \xs:sysf_list_a. Lambda 'b. \c:'a->'b->'b. \n:'b. c x (xs ['b] c n)
    let sysf_isempty_a = \xs:sysf_list_a. xs [Boolean] (\_:'a. \_:Boolean. false) true
    let sysf_head_a = \xs:sysf_list_a. xs ['a] (\x:'a. \_:'a. x) (some_arbitrarily_chosen_value_of_type_a)

And if only we had asked you to stop there, all would be good.

Unfortunately, though, we encouraged you to try to get this to work in OCaml. Here is a first attempt. In OCaml we can even parameterize on the type of the list element, so we can write `('a) sysf_list` instead of `sysf_list_a`. Also, in OCaml we omit the `Lambda 'b`s and the `[Type]`s. Also the syntax for lambda expressions is a bit different. But those are the only changes I'm making here from the System F code:

    type ('a) sysf_list = ('a->'b->'b) -> 'b -> 'b

    let sysf_empty : ('a) sysf_list = fun (c:'a->'b->'b) (n:'b) -> n
    let sysf_cons (x:'a) (xs:('a) sysf_list) = fun (c:'a->'b->'b) (n:'b) -> c x (xs c n)
    let sysf_isempty (xs:('a) sysf_list) = xs (fun (_:'a) (_:bool) -> false) true
    let sysf_head (xs:('a) sysf_list) = xs (fun (x:'a) (_:'a) -> x) (some_arbitrarily_chosen_value_of_type_a)

Whoops. Isn't this just going to inherit most of the same problems discussed above in the digression? We can fix the issue of `some_arbitrarily_chosen_value_of_type_a` by using `fun () -> failwith "bad list"` or `fun () -> blackhole ()`. It's a bit trickier here, because the type of the seed value `n` has to be *the same type* as is returned when the fold function `c` is applied to its two arguments (the current element and the result of folding over the rest of the list). So we have to add `fun () ->` to everything returned by the fold functions, too. And then at the end of our functions, we should discharge the `fun () ->` by supplying a `()` argument. But all of that is doable.

Still, the problem of that unbound type parameter `'b`, that we want to be generalized/polymorphic, remains.

If we just decline to specify the types, OCaml will infer them for us:

    let sysf_empty = fun c (n : unit -> 'b) -> n;;
    let sysf_cons x xs = fun c n -> fun () -> c x (xs c n);;
    let sysf_length xs = xs (fun x z -> z () + 1) (fun () -> 0) ();;

Notice that we have `sysf_cons` returning a `fun () ->`, and also that in `sysf_length` we have to apply a `()` to the seed argument to evaluate it and extract its value. That's a good thing, we don't *want* to automatically evaluate the fold over the rest of the list if doing so might invoke a `blackhole` or `failwith`. We only want the fold over the rest of the list to be evaluated when we specifically request it, by feeding the thunk a `()` argument. (This is called "forcing the thunk.") Also, after the fold has traversed the whole list, what we'll have at that point will be a thunk (it has to have the same type as the seed value). So we have to force it to get our answer --- that's why there's a `()` at the end of `sysf_length`.

Now how are we doing?

    # let abc = sysf_cons 'a' (sysf_cons 'b' (sysf_cons 'c' sysf_empty));;
    val abc : (char -> (unit -> '_b) -> '_b) -> (unit -> '_b) -> unit -> '_b =  <fun>

Uh-oh. There are those `'_b` types again. It means OCaml has decided this list can only have a single result type, it just doesn't know what it is yet. But once you pass one handler to the list, it will be settled and can't afterwards be changed.

    # abc (fun x _ -> 1) (fun () -> 0) ();;
    - : int = 1
    # abc;;
    - : (char -> (unit -> int) -> int) -> (unit -> int) -> unit -> int = <fun>
    # abc (fun x _ -> true) (fun () -> false) ();;
                      ----
    Error: This expression has type bool but an expression was expected of type int

The lightest-weight way to fix this issue would again be to use OCaml's "polymorphic record types", discussed above in the digression, and which you can't be expected to know about. So really we should have done this assignment differently, perhaps just in System F. Well, sorry. You will hopefully have learned something from the experience of trying to get it to work, and in the end working through this explanation of why it was harder than it seemed it should be. We hope that the frustration and confusion didn't undermine that learning benefit too much. We in turn learned something too, that this problem wasn't as straightforward as we expected it to be.


For reference, here is how I'd encode lists as right-folds, using OCaml's polymorphic record types:

    type ('a) sysf_list = { sysf_list2 : 'b. ('a->(unit->'b)->'b) -> (unit -> 'b) -> unit -> 'b };;

    let sysf_empty : ('a) sysf_list = { sysf_list2 = fun c n -> n };;
    let sysf_cons (x:'a) (xs:('a) sysf_list) : ('a) sysf_list = { sysf_list2 = fun c n () -> c x (xs.sysf_list2 c n) };;

If you try to specify type annotations on the `c` and `n` arguments in `sysf_empty` and `sysf_cons`, like this:

    let sysf_empty : ('a) sysf_list = { sysf_list2 = fun (c:'a->(unit->'b)->'b) (n:unit->'b) -> n };;
    let sysf_cons (x:'a) (xs:('a) sysf_list) : ('a) sysf_list = { sysf_list2 = fun (c:'a->(unit->'b)->'b) (n:unit->'b) () -> c x (xs.sysf_list2 c n) };;

OCaml will complain that the types aren't general enough. I don't know how to fix this, other than to omit those type annotations, as I did above.

Here are some more list functions:

    let sysf_length (xs:('a) sysf_list) : int = xs.sysf_list2 (fun x z -> z () + 1) (fun () -> 0) ();;
    let sysf_isempty (xs:('a) sysf_list) = xs.sysf_list2 (fun (y:'a) (_:unit->bool) -> false) (fun () -> true) ();;
    let sysf_head (xs:('a) sysf_list) = xs.sysf_list2 (fun (y:'a) (_:unit->'a) -> y) (fun () -> failwith "bad list") ();;

    # let abc = sysf_cons 'a' (sysf_cons 'b' (sysf_cons 'c' sysf_empty));;
    val abc : char sysf_list = {sysf_list2 = <fun>}
    # sysf_isempty abc;;
    - : bool = false
    # sysf_isempty sysf_empty;;
    - : bool = true
    # sysf_length sysf_empty;;
    - : int = 0
    # sysf_length abc;;
    - : int = 3
    # sysf_head abc;;
    - : char = 'a'
    # sysf_head sysf_empty;;
    Exception: Failure "bad list".

Yes, they do work as expected.


By the way, the main problem here (of OCaml not making the functions polymorphic enough) will also afflict your attempts to encode Church numerals, though you might not have noticed it there. But witness:

    # type 'b sysf_nat = ('b -> 'b) -> 'b -> 'b
    let sysf_zero : ('b) sysf_nat = fun s z -> z
    let sysf_one : ('b) sysf_nat = fun s z -> s z
    let sysf_succ : ('b) sysf_nat -> ('b) sysf_nat = fun n -> (fun s z -> s (n s z));;

    # sysf_succ sysf_one;;
    - : '_b sysf_nat = <fun>

Notice the `'_b` type parameter. That means that OCaml thinks the successor of `sysf_one` is not polymorphic, but always will be giving only a single result type. OCaml just doesn't yet know what it is.

There is one easy-to-implement fix to this problem. If instead of `sysf_succ sysf_one` you had instead written the eta-expanded version:

    # fun s -> sysf_succ sysf_one s;;
    - : ('a -> 'a) -> 'a -> 'a = <fun>

then OCaml knows to make the result polymorphic. But it's a pain to force yourself to write `fun s -> ... s` around all of your arithmetical computations. Short of doing that, the only way I know to make OCaml treat these functions as polymorphic enough is to use the "polymorphic record types" discussed above.

This kind of problem doesn't come up *that* often in OCaml. Normally, you wouldn't be encoding your data structures in this way, anyway. You'd use some datatype declaration like we had at the top of this page.

    type ('a) mylist = Cons of 'a * ('a) mylist | Nil

And you won't have any of the kinds of difficulties we're discussing here with that. It's just that some of the topics we're exploring in this course press against the walls where things are hard (or sometimes not even possible) to do in OCaml (and sometimes Haskell too).

By the way, this issue about not-enough-polymorphism doesn't arise in Haskell. Here are the Church numerals:

    > type Church a = (a -> a) -> a -> a
    > let { zero :: Church a; zero = \s z -> z; one :: Church a; one = \s z -> s z; succ n = \s z-> s (n s z) }
    > let two = succ one
    > two ('S':) "0"
    "SS0"
    > :t two
    two :: (a -> a) -> a -> a
    > two (1+) 0
    2

The reason that OCaml has trouble here where Haskell doesn't has to do with some fundamental differences between their type systems, that we haven't yet explored. (Specifically, it has to do with the fact that OCaml has *mutable reference cells* in its type system, and this obliges it to place limits on where it generalizes type variables, else its type system becomes inconsistent.)