More details are also available on these two pages. (But some information is only discussed below; the others aren't supersets of this page.)

Can you summarize the differences between your made-up language and Scheme, OCaml, and Haskell?

The made-up language we wet our toes in in week 1 is called Kapulet. (I'll tell you the story behind its name sometime.) The purpose of starting with this language is that it represents something of a center of gravity between Scheme, OCaml, and Haskell, and also lacks many of their idiosyncratic warts. One downside is that it's not yet implemented in a form that you can run on your computers. So for now, if you want to try out your code on a real mechanical evaluator, you'll need to use one of the other languages.

Also, if you want to read code written outside this seminar, or have others read your code, for these reasons too you'll need to make the shift over to one of the established languages.

We hope, though, that learning Kapulet first puts you in a position to make that shift more effortlessly, and also to more quickly see the places where there's underlying unity to Scheme, OCaml, and Haskell, despite their diverse syntaxes. (And idiosyncratic warts.)

This is a complex document. We don't expect that you will be learning all of these languages simultaneously. But you may find it helpful to read through the whole thing to get a broad overview, then consult it more carefully about the language you're focused on learning at any given point. You may also find it helpful to consult when confronting code you don't understand in one of the other languages. There are important parts of these languages that aren't covered here, especially parts concerning types and monads and continuations, that we will be discussing later in the seminar. We will add additional Rosetta pages for those later. If you master the ideas summarized here, however, you will have a good understanding of the basic skeleton of each of these languages.

Comments

...  # this is a comment in Kapulet, that goes until the end of the line

...  ; this is a comment in Scheme, that goes until the end of the line

...  -- this is a comment in Haskell, that goes until the end of the line

Note that for Haskell's comments, the -- must be immediately followed by something like a space or a letter. --> does not begin a comment; it's a legal operator symbol.

OCaml doesn't have comments of that sort. It only has "block" comments like this:

(* ... *)

which may last for several lines. These comments nest, so that:

(* ... (* inner *) ... *)

is a single comment.

Haskell also has block comments, though it {- writes them differently -}. Haskell's block comments also nest.

Racket and Scheme also have block comments, though they #| write them differently |#. These block comments also nest. Another form of block comments is #;( ... ). Those may contain nested parentheses, and extend until the next matching ). So prefixing #; to a complex parenthesized expression is a way to turn the whole thing into a comment. (These two comment styles only recently became part of the official Scheme standard, but they have been widely implemented.)

Variables

Our syntax for variables in Kapulet is close to that in the other languages. Haskell and OCaml differ only in that they do not permit trailing ? or !; however, they do permit trailing 's (and even permit 's in the middle of a variable too, which Kapulet does not). Scheme permits all of these characters, plus many more punctuation symbols as well, to occur anywhere in a variable. Scheme also permits variables to begin with capital letters, or to consist solely of the single character _; but the other languages reserve these terms for special purposes.

In addition to the variables made of letters (more properly, of alphanumerics), Haskell and OCaml and Kapulet also permit some variables made exclusively of punctuation symbols, like < or Haskell's >=> and <$>. In Haskell, these always have infix syntax, and the variables made of letters never do. (But the former can have their infix syntax suppressed with parentheses, and the latter can be "flagged" to temporarily take on infix syntax, as we'll discuss below.)

In OCaml and Kapulet, some variables made of letters also have infix syntax, such as comp in Kapulet or mod in OCaml. I haven't presented to you the complex mechanisms needed to declare this.

Equality and Booleans

The relation that's written == in Kapulet is also written that way in Haskell. That symbol means something else in OCaml, having to do with mutable reference cells; to get the same notion in OCaml one writes just a single =. The negation of this relation is written != in Kapulet, /= in Haskell, and <> in OCaml. (Again, != means something else in OCaml.)

These comparison operators are "polymorphic". This is a notion we'll discuss later when we get to types, but in the present context it means that you can apply == to two numbers, or to two booleans, and so on. In Kapulet, OCaml, and Haskell, however, you cannot apply that comparison to a number and a boolean at the same time. That will fail as a type error, instead of evaluating to 'false.

Also, these languages (and Scheme too) behave in idiosyncratic ways if you try to compare two function values for equality. The equivalence of function values is not in general recursively decidable; it may be possible in some specific cases to give you a definite yes-or-no answer, but you'll have to look up the specific rules for (each implementation of) each language. I recommend that you in general just avoid comparing function values for equality.

Scheme has a whole bunch of equality functions. First, there are functions restricted to specific kinds of values: = for numbers, symbol=? for symbolic atoms, boolean=? for booleans (this is more familiar to us as "iff"), and so on. Those functions fail if called with arguments that aren't of the expected types. Scheme also has a couple of unrestricted equality functions, which can take arguments of any type, and the arguments need not even be of the same type (but if they're not, they'll always be counted as unequal). The two most fundamental of these are eqv? and equal?. They behave the same for numbers (at least, for "exact" numbers like integers), for symbols, for booleans, and the like. As we'll discuss below, containers in Scheme (lists, pairs, vectors, strings) are generally "mutable", so there's a choice when comparing two such containers whether we're asking if the containers merely happen now to contain corresponding values (including, if their elements are themselves containers, they too containing corresponding values). Or whether we're asking if the containers occupy the same mutable location in memory, so that it'd be impossible for them to become unequal at any stage in the program's evaluation. The first comparison is expressed by equal?; the second by eqv?. (You may also see Scheme programs that use the predicate eq?. This is a variant of eqv? that may sometimes be more efficient.)

The relations that are written and, or, and not in Kapulet are written the same way in Scheme. Note that in Scheme the first two can take zero or more arguments:

; Scheme
(and)
(and bool1)
(and bool1 bool2)
(and bool1 bool2 bool3)

As you'd expect (and bool1) evaluates the same as plain bool1; similarly with (or bool1). What do you think (and) with no arguments should evaluate to? How about (or)?

These relations are written in Haskell and OCaml as &&, ||, and not. (Haskell uses and and or to express other functions, which compute the joint conjunction or disjunction of every Bool value in a List of such. OCaml permits or as an old synonym for ||, but discourages using that spelling. OCaml also permits & as an old, discouraged synonym for &&.)

The values that are written 'true and 'false in Kapulet are written in Haskell as True and False, and in OCaml as just true and false. (It'd be more consistent with OCaml's other naming policies for them to have said True and False, but they didn't.) These are written #t and #f in Scheme, but in Scheme in many contexts any value that isn't #f will behave as though it were #t, even values you might think are more "false-like", like 0 and the empty list. Thus (if 0 'zero 'nope) will evaluate to 'zero.

Some Scheme implementations, such as Racket, permit #true and #false as synonyms for #t and #f. (These aliases are also mandated in "version 7", r7rs, of the Scheme standard.)

Scheme also recognizes the values 'true and 'false, but it treats 'false as distinct from #f, and thus as a "truth-like" value, like all of its other values that aren't #f. Kapulet essentially took Scheme's boolean values and collapsed them into being a subtype of its symbol values.

Infix operators and parentheses

Kapulet, OCaml, and Haskell all understand some expressions like + to be infix operators. So you would write:

1 + 2

not:

+ 1 2

But all three of these languages permits you to enclose an infix operator in parentheses to make a section, which no longer has infix syntax. In Kapulet, ( + ) is the same as λ (x, y). x + y, whereas in OCaml and Haskell it's a curried function, which we can write (in Kapulet syntax) as λ x y. x + y. We'll discuss sections and curried functions below.

Kapulet and OCaml have some variables made of (or spelled with) letters also taking infix syntax, such as comp in Kapulet or mod in OCaml. In Haskell, this is never the case: variables that are made of letters are only treated as function terms being applied to arguments when they're at the start of a list of expressions; and variables that are made of punctuation symbols, and not enclosed in parentheses, will only be treated as infix operators. However, Haskell permits you to temporarily "flag" a function term made of letters to behave like an infix operator, by enclosing it in ` marks. Thus in Haskell you can write:

3 `mod` 2

But without the `, you'd have to write: mod 3 2.

Scheme has no infix operators. It ruthlessly demands that all functions to be applied to arguments come at the start of a list of expressions, regardless of whether the functions are specified by variables made of letters, punctuation symbols, or a mix of the two, or even if the functions are computed by evaluating more complex expressions. Thus in Scheme one always writes:

(+ 3 2)

and the like. Here is an example where the function to be applied is the result of evaluating a more complex expression:

((if #t + *) 3 2)

which will evaluate to 5, not 6.

In Scheme the parentheses are never optional and never redundant. In expressions like (+ 3 2), the parentheses are necessary to express that the function is being applied; + 3 2 on its own is not a complete Scheme expression. And if the + were surrounded by its own parentheses, as in:

((+) 3 2)

what that would mean is that + is first being applied to zero arguments, which is different from not applying it all. (In Kapulet, OCaml, and Haskell, one would write that f is being applied to "zero arguments" like this: f (); see below. We will discuss functions that "take zero arguments" a few weeks into the seminar.) Scheme helpfully defines the result of applying + to zero arguments to be 0. So ((+) 3 2) would evaluate to whatever (0 3 2) does, and that's an error, because 0 is not a function.

Note that (0 3 2), although it is, qua expression, a list of numbers, does not evaluate to a list. To get an expression that evaluates to that list, you'd have to use (list 0 3 2) or '(0 3 2). (Notice the initial '.) More on this below.

In Scheme, you can also write (+ 3 2 10), and so on. You only have to write (+ (+ 3 2) 10) if you really want to.

Parentheses have many other roles in Scheme, too; they're a ubiquitous part of the syntax, and don't always express function application. You might sometimes feel they are overused.

You may sometimes see [ ... ] being used in Scheme, instead of ( ... ). This is just a stylistic variant; they work exactly the same. The official Scheme standard doesn't permit this usage, but most Scheme implementations do. It can help keep track of which closing ] or ) goes with which opening [ or (. The opening and closing symbols always have to correspond.

In Scheme, the default style for defining functions is as taking several arguments simultaneously, that is the uncurried style. In OCaml and Haskell, the default style is to define them curried. Curried functions can easily be partially applied:

(* OCaml *)
let add  = fun x y -> x + y in
let add2 = add 2 in
    add2 3
;;

will result in 5.

In Scheme, the common idiom would be to define add like this:

(define add (lambda (x y) (+ x y)))

(We'll explain define below.) After this, you cannot say (add 2), because add will be expecting two arguments, but you only supplied one. You can however define curried functions in Scheme, it's just more laborious:

(define curried_add (lambda (x) (lambda (y) (+ x y))))
(define add2 (curried_add 2))
(add2 3)

will result in 5. This is the best one can do in official Scheme, but there are various syntax extensions and macros out there to make it possible to write this sort of thing more succinctly.

OCaml and Haskell also permit defining functions in uncurried form:

(* OCaml *)
let add  = fun (x, y) -> x + y (* uncurried*) in
let add2 = add 2 in ...

Here the last displayed line will fail, because add expects as its argument a tuple of two numbers.

Kapulet essentially works like OCaml and Haskell; though for pedagogical reasons we started out by introducing uncurried definitions, rather than the curried definitions those other languages predominantly use.

Here are some interesting functions we can define in Kapulet. See below for the pattern syntax used here.

# Kapulet
let
  curry   match lambda f. lambda  x  y.  f (x, y);
  uncurry match lambda g. lambda (x, y). g  x  y ;
  uncurried_flip match lambda f. lambda (y, x). f (x, y)
  curried_flip   match lambda g. lambda  y  x.  g  x  y;
in ...

The function curry takes as an argument a function f that expects its arguments uncurried, and returns instead lambda x y. f (x, y), a function that expects its arguments curried --- but then does with them whatever f does. Going in the other direction, the function uncurry takes a function g that expects its arguments curried, and returns instead a function that expects its arguments uncurried --- but then does with them whatever g does.

The function uncurried_flip takes as an argument again an uncurried function f, and returns another function that also expects its arguments uncurried, but that expects them in the other order. curried_flip transforms a curried function g in the analogous way. These are both different from the function swap we defined in the course notes as:

lambda (x, y) = (y, x)

That function operates on a tuple and returns another tuple. The ..._flip functions operate on functions, and transform them into other functions that expect their arguments in a different order.

As we mentioned in the course notes, in Kapulet, OCaml, and Haskell, there is a shorthand that enables you to write things like:

# Kapulet
let
  ten_minus match lambda x. 10 - x;
  and_ys    match lambda x. x & ys;
  plus      match lambda (x, y). x + y
in (ten_minus, and_ys)

like this:

# Kapulet
let
  ten_minus match (10 - );
  and_ys    match ( & ys);
  plus      match ( + )
in (ten_minus, and_ys)

There are just minor differences between these languages. First, OCaml doesn't have the ( + 10) or (10 + ) forms, but only the ( + ).

Second, as a special case, OCaml doesn't permit you to do this with its list cons-ing operator ::. You have to write fun x xs -> x :: xs, not ( :: ). Whereas in Kapulet ( & ), (x & ), and ( & xs) are all sections using its sequence cons-ing operator &; and in Haskell, ( : ), (x : ), and ( : xs) are the same.

Third, as mentioned above, OCaml's and Haskell's ( + ) and the like evaluate to curried functions.

Fourth, in Kapulet, ( - 10) expresses λ x. x - 10 (consistently with (10 - )), but Haskell (and OCaml) treat this specific form differently, and interpret it as meaning the integer -10. Here's how to express some things in Kapulet:

# Kapulet
(0 - 2)
( - 2)         # ( - 2) 10 == 8
(0 - )
( - ) (5, 3)

and here are their translations into natural Haskell:

-- Haskell
( -2 )        -- (0 - 2) also works
(subtract 2)  -- subtract 2 10 == 8
negate        -- (0 - ) also works
( - ) 5 3

OCaml expresses (0 - ) or negate as ~-. You can write 3 * (0 - 2) in OCaml either like that, or as 3 * ( -2 ), or as 3 * ~-2.

I know all these languages fairly well, and I still find this fourth issue difficult to keep track of. You may be starting to understand why I spoke of "warts."

Sequences and Lists

In Kapulet, we have a notion I called a "sequence" which has an empty form [] and a cons-ing operator &, so that:

1 & 2 & 3 & []

can also be written:

[1, 2, 3]

Haskell is very similar, except that it calls these Lists, and its cons-ing operator is written :. OCaml also calls them lists, and its cons-operator is written ::. (OCaml also uses Haskell's symbol :, but it uses it to deal with types; and Haskell in turn also uses OCaml's symbol ::, but that's what it uses to deal with types. Grr.)

Kapulet writes the operator that concatenates or appends sequences as &&. Thus:

# Kapulet
[1, 2] && [3, 4, 5]

evaluates to [1, 2, 3, 4, 5]. Haskell writes this operator as ++. In Haskell, a String is just a List of Char, so ++ is also the operator we use to append strings:

-- Haskell
"over" ++ "due"

evaluates to "overdue". In OCaml, strings aren't implemented as lists, so their append operators are different: ^ for strings and @ for lists:

(* OCaml *)
[1; 2] @ [3; 4; 5] ;;
"over" ^ "due" ;;

evaluate to [1; 2; 3; 4; 5] and "overdue". Note that OCaml separates its list elements with semicolons not commas. If you write [1, 2, 3] in OCaml, it will think that's a one-element list whose first element is a triple, that is, what you'd write in Haskell as [(1, 2, 3)].

Here are some list functions in Kapulet:

length
(&&)
# the following were defined in homework
empty?       # can also use ([] == ) or pattern-match against []
tail
drop
take
split
filter
partition
map
map2
# the following were defined in extra credit
unmap2
takewhile
dropwhile
reverse
# new functions
join         # converts [[10, 20], [30], [], [40, 50]]
             # to [10, 20, 30, 40, 50] (but only "joining" a single layer of []s)
(mem)        # infix syntax, 2 mem [1, 2, 3] == 'true
nth          # nth [10, 20, 30] 1 == 20, because 10 occupies position 0
             # fails if the index is out of bounds
all p xs     # all odd? [1, 3, 5] == 'true
any p xs     # any even? [1, 3, 5] == 'false

Here are the corresponding functions in Haskell:

length
(++)
null     -- can also use ([] == ) or pattern-match against []
tail     -- compare head, which fails on []
drop     {- but these are curried functions, so you write `drop n xs`
            not `drop (n, xs)` as in Kapulet -}
take
splitAt
filter
Data.List.partition
map
zipWith  {- zip handles the special case of zipWith where f is the function that forms ordered pairs
            both zipWith and zip stop with the shortest list -}
unzip    {- unlike unmap2, doesn't take an explicit f argument
            just assumes it's (\(x, y) -> (x, y)) -}
takeWhile
dropWhile
reverse
concat   -- corresponding to join
elem     -- not infix syntax, but often written as: 2 `elem` [1, 2, 3]
(!!)     -- infix syntax: [10, 20, 30] !! 1 == 20
         -- fails if the index is out of bounds
all p xs
any p xs

Here they are in OCaml:

List.length
(@)          (* or List.append *)
(* no function predefined for empty?
   can use fun xs -> [] == xs, or function [] -> true | _ -> false *)
List.tl      (* compare List.hd, which fails on [] *)
(* no function predefined for drop or take *)
(* no function predefined for split; OCaml uses List.split to mean something else *)
List.filter  (* also List.find_all *)
List.partition
List.map
List.map2    (* compare List.combine, like Haskell's zip
                both map2 and combine fail if the lists are different lengths *)
List.split   (* like Haskell's unzip, doesn't take an f argument *)
(* no function predefined for takewhile or dropwhile *)
List.rev
List.concat  (* also List.flatten, which still only "joins" a single layer of []s *)
List.mem     (* not infix syntax *)
List.nth     (* List.nth [10; 20; 30] 1 = 20; fails if the index is out of bounds *)
List.for_all p xs
List.exists p xs

Recall that in addition to sequences/lists, Kapulet also has a notion of sets, which can be literally expressed using notation like this:

{'x, x}

That set contains the atomic symbol 'x, and whatever symbol value the variable x is bound to (which need not, but may, be the symbol 'x). Or:

{1, 2, x}

That set contains the numbers 1 and 2, and whatever number the variable x is bound to. Sets in Kapulet, like sequences, must have elements of all the same type.

OCaml and Haskell also have set values (in the Set and Data.Set libraries, respectively), but these are harder to use and can't be literally expressed. In particular, the { ... } notation in these languages has different meanings.

How does all this look in Scheme? Well, Scheme has a notion they call a (proper) list, and also a notion they call a vector. There are also what Scheme calls "improper" lists, with (cons 1 'nonlist) or '(1 . nonlist), where 'nonlist is any non-list (here it's a symbol) being a limiting case. Let's ignore the improper lists. Scheme's (proper) lists and vectors each have a claim to correspond to Kapulet's sequences / Haskell's Lists / OCaml's lists. But they also differ from those. The main differences are:

  1. these structures in Scheme can contain heterogenously-typed elements, including further lists and vectors in some positions but not in others
  2. in the official Scheme standard, lists and vectors are both mutable containers, that is, one and the same persisting list structure can have different elements at different stages in a program's evaluation

Many Scheme implementations also provide immutable versions of lists and vectors, more closely approximating the sequences/lists in Kapulet, Haskell, and OCaml. With some configurations, Racket even makes the immutable versions the defaults. But none of these are yet part of the official Scheme standard. Also, difference 1 is present in all Scheme implementations. This makes Scheme's lists and vectors in some ways more akin to tuples in the other languages (to "proper" tuples in Kapulet) (see below).

(OCaml does have Array values, and Haskell has Data.Array.MArray values, both of which are similar to Scheme's mutable vectors, at least in respect 2. But they are more difficult to use.)

There are also some differences in how lists are specified in Scheme versus the other languages. In Scheme, one writes the empty list like this:

(list)

and lists with more elements like this:

(list 10)
(list 10 x)
(list 10 x 'alpha)
(list 10 x 'alpha (list 'beta 'gamma) 'delta 20)

In the preceding, the x is a variable and is evaluated to be whatever value it's bound to in the context where the displayed expressions are being evaluated. If one has a list specification that contains no variables, no matter how deeply embedded, then a certain shorthand becomes available, using a ' prefix, like this:

'()                          ; same as (list)
'(10)                        ; same as (list 10)
'(10 alpha)                  ; same as (list 10 'alpha)
'(10 alpha (beta gamma) 20)  ; same as (list 10 'alpha (list 'beta 'gamma) 20)

Scheme can also write 'something as (quote something). (The quote is not a function being applied to some argument; this is a special syntax that only superficially looks like a function application.)

Here are the list functions in Scheme corresponding to the functions listed in the other languages:

cons              ; corresponds to Kapulet's ( & ), Haskell's ( : ), OCaml's `::`
length
append            ; corresponds to Kapulet's ( && ), Haskell's ( ++ ), OCaml's ( @ )
                  ; can be applied to one or more arguments
null?             ; corresponds to Kapulet's empty?, Haskell's null
car               ; corresponds to Haskell's head
cdr               ; corresponds to Kapulet's and Haskell's tail
(list-tail xs k)  ; corresponds to Kapulet's drop (k, xs)
                  ; fails if the list has length < k
; no official function predefined for take or split or filter or partition
map               ; corresponds to Kapulet's map and map2
                  ; can take one or more list arguments
; no official function predefined for unmap2 or takewhile or dropwhile
reverse
; no official function prefefined for join/concat
memv, member      ; correspond to Kapulet's (mem) and Haskell's elem
                  ; memv compares elements using eqv?, member using equal?
(list-ref xs k)   ; corresponds to Kapulet's `nth xs k`
                  ; fails if the index k is out of bounds
; no official function predefined for all or any

All of the functions listed as missing from the official Scheme standard can be found in various add-on libraries, or you could define them yourself if you had to.

Tuples

The course notes already mentioned that Kapulet has a "lightweight" notion of tuples, called multivalues and written (10, x), as well as a heavier notion written Pair (10, x). The latter is what corresponds to the tuples in Haskell and OCaml. They don't have any explicit notation for Kapulet's "lightweight" tuples (though they exist behind the scenes in OCaml and explain some of its otherwise puzzling behavior). There are good reasons for introducing this additional complexity in Kapulet, but this is not the place to explain them.

All of these languages have notions of zero-length tuples, as well as pairs, triples, and the like. (In Kapulet's case, there are both the 0-length multivalue () and heavier counterparts.)

Probably the closest approximation to tuples in Scheme is its notion of vectors, though in the case of pairs, Scheme's pairs---which it identifies with short, possibly "improper" lists---are arguably also contenders. The fact that these Scheme structures permit elements of heterogenous type is not a problem, because that is also true for tuples in the other languages. However, Scheme's vectors and pairs are officially mutable, but tuples in the other languages are not. (As mentioned above, many Scheme implementations do also provide immutable versions of these structures.)

What corresponds to the zero-length tuples in Kapulet, OCaml, and Haskell? Perhaps the zero-length vector. Or perhaps a different Scheme value, called void. Different Scheme implementations display this value in different ways. For example, Racket and Chicken may display it as #<void> or as #<unspecified> or may just display nothing. This is the value returned, for example, by a case or a cond construction if there is no else clause and none of the provided clauses successfully match. In many respects, this value more closely approximates in Scheme the behavior that () has in Kapulet, OCaml, and Haskell.

Chars and Strings

Scheme, OCaml, and Haskell all have values they call "characters", and sequences of such characters they call "strings". Haskell and OCaml write the first character of the word "false" like this:

'f'

whereas Scheme writes it like this:

#\f

(Note the difference between the character #\f and the boolean #f.) Scheme gives special characters like #\space funny names.

Sequences of characters are called "strings". All of these languages write the string "false" like this:

"false"

This is not the same as the truth-value, nor is it the same as the atomic symbol 'false (which Kapulet but not Scheme identifies with the truth-value). In Haskell, strings are strictly equivalent to Lists of Chars. In OCaml and Scheme, they are not equivalent to lists (nor to vectors) but merely isomorphic to them. In OCaml and Scheme, some strings are mutable, like Scheme's vectors.

Other functions

These functions are roughly the same in Kapulet, OCaml, and Haskell: succ, pred, fst, snd. The official Scheme standard doesn't include any succ or pred functions, but Racket and Chicken both have add1 and sub1. Depending on what Scheme values you take to correspond to tuples in the other languages, fst and snd may correspond to Scheme's car and cdr. (These also correspond to head and tail when applied to lists.)

Kapulet's (comp), odd?, even?, and swap are Haskell's ( . ), odd, even, and Data.Tuple.swap. None of these are predefined in OCaml.

Kapulet's dup isn't predefined in Haskell but can be easily expressed as \x -> (x, x).

These are the same in Kapulet and Haskell (modulo the differences between Kapulet's multivalues or "lightweight tuples" and Haskell's tuples): id, const, curry, uncurry. Kapulet's curried_flip is Haskell's flip. None of these are predefined in OCaml.

Kapulet and Haskell both have ( $ ), which was explained in the course notes. OCaml expresses this as ( @@ ). (OCaml also uses |> to express the converse operation: f x, f @@ x and x |> f all mean the same.)

Case, Cond, and If ... then ...

The complex expression that's written like this in Kapulet:

# Kapulet
case some_expression of
  0 then result0;
  1 then result1;
  x then resultx
end

is written very similarly in Haskell:

-- Haskell
case some_expression of {
  0 -> result0;
  1 -> result1;
  x -> resultx
}

Unlike the other languages we're discussing, Haskell pays special attention to the whitespace/indentation of what you write. If you've got the indentation right, you can omit the {, ;, and }s in the above. And that's how you will often see Haskell code displayed. On this website, though, I propose to always include the {s and so on when displaying Haskell code, because the indentation rules aren't 100% intuitive. It's easy to read properly-indented Haskell code, but until you've learned and practiced the specific rules, it's not always easy to write it.

The case construction is written only a little bit differently in OCaml:

(* OCaml *)
match some_expression with
  0 -> result0 |
  1 -> result1 |
  x -> resultx

Note there is no closing end or }. You can enclose the whole expression in parentheses if you want to, and when embedding it in some larger expressions (like another match expression), you may need to. Sometimes the | dividers are written at the start of a line, and you are allowed to include an extra one before the first line, so you could also see this written as:

(* OCaml *)
match some_expression with
  | 0 -> result0
  | 1 -> result1
  | x -> resultx

The syntax for guards and as-patterns also only varies slightly between these languages:

# Kapulet
case some_expression of
  pat1   when guard             then result1;
  pat1   when different_guard   then result2;
  ((complex_pat) as var, pat4)  then result3
end

-- Haskell
case some_expression of {
  pat1 | guard              -> result1;
       | different_guard    -> result2;
  (var@(complex_pat), pat4) -> result3
}

(* OCaml *)
match some_expression with
  pat1   when guard             -> result1 |
  pat1   when different_guard   -> result2 |
  ((complex_pat) as var, pat4   -> result3

The official Scheme standard only provides for a limited version of this. There is a case construction, available since at least "version 5" of the Scheme standard (r5rs), but it only accepts literal values as patterns, not any complex patterns containing them or any patterns containing variables. Here is how it looks:

; Scheme
(case some_expression
  ((0) 'result0)
  ((1) 'result1)
  ((2 3 5) 'smallprime)
  (else 'toobig))

The results can be complex expressions; I just used bare symbols here for illustration. Note that the literal patterns in the first two clauses are surrounded by an extra pair of parentheses than you might expect. The reason is shown in the third clause, which begins (2 3 5). This does not mean to match a list containing the values 2 3 and 5. Instead it means to match the simple value 2 or the simple value 3 or the simple value 5. The final else clause is optional. If it's omitted, and none of the other clauses match, the result is Scheme's special void value.

The patterns here can be any literal value (what the Scheme standards call a "datum"). Numbers are permitted, as are boolean literals (#t and #f) and symbolic atoms ('alpha and the like, though inside a pattern position in a case construction, you omit the initial '). You can also use the list literal '() (again, omit the initial ' when writing it as a pattern). Some implementations of Scheme allow more complex list patterns, matching literal lists like '(alpha 0 () #t); others don't.

There are various add-on libraries to Scheme that will permit you to pattern-match in more ambitious ways, approximating what you can do in Kapulet, OCaml, and Haskell. We will explain some of these later in the course, after we've introduced you to the notion of datatypes.

What programmers using standard Scheme tend to do instead is to use predicates that query the type and/or structure of an unknown value, and then take separate evaluation paths depending on the result. This can be done with an if ... then ... else ... construction, or with Scheme's more general cond construction. In Scheme, these two are equivalent:

; Scheme
(if test1 'result1                    ; else what follows:
          (if test2 'result2          ; else what follows:
                    (if test3 'result3 'somethingelse)))

(cond
  (test1 'result1)
  (test2 'result2)
  (test3 'result3)
  (else  'somethingelse))

The tests tend to use predicates like null? (are you the empty list?), pair? (are you a non-empty list, whether proper or improper?), list? (are you a proper list, whether empty or not?), symbol?, boolean?, number?, zero? (you get the idea). The Little Schemer books use their own predicates they call atom? (are you a non-list?) and lat? (are you a list all of whose members are atoms?)

You can also use more complex tests you write on the spot, or your own antecedently-defined functions:

; Scheme...in case the parens left any doubt
(define smallprime? (lambda (x) (if (= x 2) #t (if (= x 3) #t (if (= x 5) #t #f)))))

(cond
  ((= x 0) 'infant)
  ((smallprime? x) 'myfavorite)
  ((and (> x 10) (< x 20)) 'teenaged)
  (else 'unknown))

Remember that in Scheme, an expression doesn't have to evaluate to #t to be treated as "truth-like". Every value other than #f is treated as truth-like. As I said before (if 0 'zero 'nope) evaluates to 'zero.

You may sometimes see Scheme cond constructions written with this kind of clause:

(cond
  ...
  (test-expression => function-value)
  ...)

That's the same as the following:

(cond
  ...
  (test-expression (function-value test-expression))
  ...)

Except that it only evaluates the test-expression once.

The clauses in Scheme's cond expressions can contain multiple expressions after the test. This only becomes useful when you're working with mutable values and side-effects, which we've not gotten to yet. The if expressions only take a single expression for the "then" branch and a single expression for the "else" branch. You can turn a complex series of expressions, which may involve side-effects, into a single expression by wrapping it in a (begin ...) construction. The (begin ...) construction as a whole evaluates to whatever the last expression it contains does.

Scheme standards after r5rs also provide two further conditional constructions, which are for the situations where you want to perform a meaningful action only on the "then" branch, or only on the "else" branch:

(when test-expression
   result-expression1...)

(unless test-expression
   result-expression2...)

If the test-expression evaluates to #f, then the when expression evaluates to Scheme's special void value; mutatis mutandis for the unless expression. This is analogous to () in OCaml, Haskell, and Kapulet.

In the last three languages, the expressions in the then-branch and the else-branch of a conditional have to have the same type. You can't say if test-expression then 0 else []. Also, they expect the test-expression to evaluate specifically to a boolean value, not merely to 'false versus "anything else". They are stricter about types here than Scheme is.

In the special case where an else-branch evaluate to () (and thus so too must the then-branch), and the else-branch does so using no complex expression but merely the literal (), then OCaml permits you to omit that else-branch. So in OCaml you can write this:

 if test_expression then then_result

instead of

 if test_expression then then_result else ()

This is similar to Scheme's when construction. Kapulet and Haskell have no analogue.

Lambda expressions

In Kapulet you write λ expressions (sometimes called "anonymous functions") with a prefix of either λ or the spelled-out lambda. That's followed by one or more patterns, separated by spaces, then a period, then a single expression which makes up the body of the function. When there are multiple patterns, the function expressed is curried, thus:

lambda (x, y) z. result

means the same as:

lambda (x, y). (lambda z. result)

The parentheses could have been omitted around lambda z. result; they're just there to focus your attention.

Haskell and OCaml are very similar to this, they just use some slightly different notation. In Haskell you'd write:

-- Haskell
\(x, y) z -> result

and in OCaml you'd write:

(* OCaml *)
fun (x, y) z -> result

You may sometimes see λ expressions in OCaml written using function instead of fun. These overlap somewhat in their usage. The difference is that function only allocates a position for one argument pattern, so can't straightforwardly define curried functions. (You can however embed function expressions inside other function expressions.) On the other hand, function can take multiple variant patterns for that single position. Thus with function you can say:

(* OCaml *)
function []    -> result1 |
         x::xs -> result2

whereas with fun you'd have to write:

(* OCaml *)
fun ys -> match ys with
            []    -> result1 |
            x::xs -> result2

In Scheme, lambda expressions are written like this:

; Scheme
(lambda (vars...) body-expressions...)

Scheme only permits simple variables as its argument patterns, and the lambda expression can be defined to take zero or more arguments:

; Scheme
(lambda () ...)
(lambda (x) ...)
(lambda (x y) ...)
(lambda (x y z) ...)

As I said before, we will discuss functions that "take zero arguments" a few weeks into the seminar.

There is special syntax for defining functions that may take varying numbers of arguments (recall and and +), where Scheme binds a single variable to a list containing all of the received arguments (or all of the arguments after the nth...). I won't explain that syntax here.

Let, Letrec, and Define

Kapulet has the syntax:

# Kapulet
let
  pat1  match expr1;
  pat2  match expr2;
  pat3  match expr3
in result

which is equivalent to:

# Kapulet
let
  pat1  match expr1
in let
  pat2  match expr2
in let
  pat3  match expr3
in result

There is also a corresponding letrec form. In let, the bindings in pat1 are in effect for the evaluation of all of expr2, expr3, and result (but not any further, if this is part of a more complex expression); similarly for the bindings in pat2 and pat3. In letrec, all of the bindings on the left-hand side are in effect for all of the right-hand side expressions, as well as for the result.

OCaml only has the second, more verbose form of this, and writes it a bit differently:

(* OCaml *)
let
  pat1  = expr1
in let
  pat2  = expr2
in let
  pat3  = expr3
in result

If you want to define some mutually recursive functions with letrec, OCaml uses a special syntax for that, using letrec ... and ... in ...:

(* OCaml *)
letrec
  even  = fun x -> if x = 0 then true else odd x
and
  odd   = fun x -> if x = 0 then false else even x
in ...

Haskell has both of the syntactic forms that Kapulet does, though like OCaml, it uses = rather than match. And it wraps all the binding clauses with { ... } (see earlier remarks about Haskell and whitespace/indentation):

-- Haskell
let {
  pat1  = expr1;
  pat2  = expr2;
  pat3  = expr3
} in result

Also, in Haskell let always means letrec. There is no term in Haskell that means what simple let does in Kapulet and OCaml.

Haskell also has another form, roughly synonymous with its let ... in .... It looks like this:

-- Haskell
result where {
  pat1  = expr1;
  pat2  = expr2;
  pat3  = expr3
}

Here all the new bindings introduced for the variables in the pats are in effect for the evaluation of the exprs (this works like letrec too), and also for the evaluation of result.

There are a few places where you can use let ... in ... but not ... where ..., and a few places where the inverse is true.

Scheme has four (or five) syntactic forms here, including let, let*, letrec, and letrec*. The difference between the last two is subtle and only arises in the presence of continuations; you can just use letrec for ordinary purposes. I won't try to explain the difference between let and let* here, except to say this:

  1. When there's only a single pattern-binding clause, as in (let ((var expression)) result), let and let* work the same.
  2. When there are multiple pattern-binding clauses, as in (let ((var1 expression1) (var2 expression2)) result), then they work somewhat differently and let* is probably the one that works like you're expecting.

The let* form is the one that corresponds to let in Kapulet. I recommend you get in the habit of just always using let* (or letrec) in Scheme, instead of let.

When you're at the "toplevel" of your program, or of a library/module/compilation-unit (the terminology differs), there is also another syntactic form possible. In Kapulet, you'd write:

# Kapulet
let
  pat1  match expr1;
  ...
end
... # rest of program or library

Notice that this form ends with end, not with in result. The above is roughly equivalent to:

# Kapulet
let
  pat1  match expr1;
  ...
in ... # rest of program or library

That is, the bindings initiated by the clauses of the let construction remain in effect until the end of the program or library. They can of course be "hidden" by subsequent bindings to new variables spelled the same way. The program:

# Kapulet
let
  x  match 0
end
let
  x  match 1
end
x

evaluates to 1, just like:

# Kapulet
let
  x  match 0
in let
  x  match 1
in x

does. There's a similar form for letrec.

OCaml can do the same:

let
  x = 0 ;;
let
  x = 1 ;;
x

The double-semicolons are hints to OCaml's "toplevel interpreter" that a syntactic unit has finished. In some contexts they're not needed, but it does no harm to include them if you're not sure.

Haskell's "toplevel interpreter" (ghci) permits a syntactic form that looks superficially quite like these:

let x = 2
x

but under the covers something quite different is happening. (Specifically, you're working "inside the IO Monad", except that in this special context, expressions like x that don't evaluate to monadic values are permitted and evaluated. We don't expect that you will understand yet what any of this means.) If you're writing in a file that you want Haskell to interpret or compile, on the other hand, you have to do something a bit different (which you can't easily also do at the toplevel in ghci). Recall the shortcut by which we permitted:

# Kapulet
let
  f  match lambda pat1. body1;
  g  match lambda pat2 pat3. body2
in ...

to be written more concisely as:

# Kapulet
let
  f pat1      = body1;
  g pat2 pat3 = body2
in ...

OCaml and Haskell permit that same shorthand. And Haskell additionally permits the bare binding clauses of such expressions (that is, without the surrounding let and in) to occur at the toplevel of files. In other words, a Haskell file can look like this:

-- Haskell file.hs
f pat1      = body1

g pat2 pat3 = body2
...

Note there are no semicolons here. These are called "toplevel declarations" of the functions f and g. A single function name can have multiple declarations (within a single scoping context), using different patterns:

-- Haskell file.hs
f [] = 0
f (x:xs) = 1 + f xs

defines f as a function that returns the length of a single List argument. (You can also do that inside Haskell's let constructions, too.) This is what corresponds in Haskell files to let ... end in Kapulet.

Haskell also permits multiple declarations of this sort inside its let and where constructs, too. Moreover, these declarations can also have pattern guards, as in:

-- Haskell file.fs
f [] = 0
f (x:xs) | odd x = 1 + f xs
         | otherwise = f xs

Scheme has a version of letrec ... end, which it writes as define. Thus in Scheme this:

; Scheme
(define var1 expr1)
... ; rest of program

evaluates the same as this:

; Scheme
(letrec ((var1 expr1))
        ... ; rest of program
            )

This is what we can call Scheme's fifth form of the let family.

Some versions of Scheme permit you also to include define inside some (but not all) complex expressions. Thus you can write:

(lambda (x)
  (define var1 expr1)
  ...)

instead of:

(lambda (x)
  (letrec ((var1 expr1))
  ...))

There is no analogue to this in the other languages.

Further Installments ...

We will expand these comparisons (on separate web pages) as we introduce additional ideas in the course, such as types and monads and continuations.

Offsite Readings comparing Scheme, OCaml, and Haskell

Why did you name these pages "Rosetta"?

The Rosetta Stone is a famous slab discovered during Napoleon's invasion of Egypt, that had the same decree written in ancient Greek (which modern scholars understood) and two ancient Egyptian scripts (which they didn't). The slab enabled us to recover understanding of those Egyptian scripts; and has since come to be a symbol for the simultaneous expression of a single idea in multiple languages. A number of websites do this for various programming languages:

Scheme OCaml Haskell
  Rosetta Code Rosetta Code Rosetta Code
PLEAC PLEAC PLEAC
n/a
langref.org
code codex code codex code codex
99 problems 99 problems 99 problems

See also the Project Euler programming challenges.