#Implementing trees#
In [[Assignment3]] we proposed a very ad-hoc-ish implementation of
trees. It had the virtue of constructing trees entirely out of lists,
which meant that there was no need to define any special
tree-construction functions.
Think about how you'd implement them in a more principled way. You could
use any of the version 1 -- version 5 implementation of lists as a model.
To keep things simple, we'll stick to binary trees. A node will either be a
*leaf* of the tree, or it will have exactly two children.
There are two kinds of trees to think about. In one sort of tree, it's only
the tree's leaves that are labeled:
.
/ \
. 3
/ \
1 2
The inner, non-leaf nodes of the tree may have associated values. But if so,
what values they are will be determinable from the structure of the tree and the
values of the node's left and right children. So the inner nodes don't need
their own independent labels.
In another sort of tree, the tree's inner nodes are also labeled:
4
/ \
2 5
/ \
1 3
When you want to efficiently arrange an ordered collection, so that it's
easy to do a binary search through it, this is the way you usually structure
your data.
These latter sorts of trees can helpfully be thought of as ones where
*only* the inner nodes are labeled. Leaves can be thought of as special,
dead-end branches with no label:
.4.
/ \
2 5
/ \ / \
1 3 x x
/ \ / \
x x x x
In our earlier discussion of lists, we said they could be thought of as
data structures of the form:
Empty_list | Non_empty_list (its_head, its_tail)
And that could in turn be implemented in v2 form as:
the_list (\head tail. non_empty_handler) empty_handler
Similarly, the leaf-labeled tree:
.
/ \
. 3
/ \
1 2
can be thought of as a data structure of the form:
Leaf (its_label) | Non_leaf (its_left_subtree, its_right_subtree)
and that could be implemented in v2 form as:
the_tree (\left right. non_leaf_handler) (\label. leaf_handler)
And the node-labeled tree:
.4.
/ \
2 5
/ \ / \
1 3 x x
/ \ / \
x x x x
can be thought of as a data structure of the form:
Leaf | Non_leaf (its_left_subtree, its_label, its_right_subtree)
and that could be implemented in v2 form as:
the_tree (\left label right. non_leaf_handler) leaf_result
What would correspond to "folding" a function `f` and base value `z` over a
tree? Well, if it's an empty tree:
x
we should presumably get back `z`. And if it's a simple, non-empty tree:
1
/ \
x x
we should expect something like `f z 1 z`, or `f label_of_this_node `. (It's not important what order we say `f` has to take its arguments
in.)
A v3-style implementation of node-labeled trees, then, might be:
let empty_tree = \f z. z in
let make_tree = \left label right. \f z. f (left f z) label (right f z) in
...
Think about how you might implement other tree operations, such as getting
the label of the root (topmost node) of a tree; extracting the left subtree of
a node; and so on.
Think about different ways you might implement leaf-labeled trees.
If you had one tree and wanted to make a larger tree out of it, adding in a
new element, how would you do that?
When using trees to represent linguistic structures, one doesn't have
latitude about *how* to build a larger tree. The linguistic structure you're
trying to represent will determine where the new element should be placed, and
where the previous tree should be placed.
However, when using trees as a computational tool, one usually does have
latitude about how to structure a larger tree---in the same way that we had the
freedom to implement our [sets](/week4/#index9h1) with lists whose members were
just appended in the order we built the set up, or instead with lists whose
members were ordered numerically.
When building a new tree, one strategy for where to put the new element and
where to put the existing tree would be to always lean towards a certain side.
For instance, to add the element `2` to the tree:
1
/ \
x x
we might construct the following tree:
1
/ \
x 2
/ \
x x
or perhaps we'd do it like this instead:
2
/ \
x 1
/ \
x x
However, if we always leaned to the right side in this way, then the tree
would get deeper and deeper on that side, but never on the left:
1
/ \
x 2
/ \
x 3
/ \
x 4
/ \
x 5
/ \
x x
and that wouldn't be so useful if you were using the tree as an arrangement
to enable *binary searches* over the elements it holds. For that, you'd prefer
the tree to be relatively "balanced", like this:
.4.
/ \
2 5
/ \ / \
1 3 x x
/ \ / \
x x x x
Do you have any ideas about how you might efficiently keep the new trees
you're building pretty "balanced" in this way?
This is a large topic in computer science. There's no need for you to learn
the various strategies that they've developed for doing this. But
thinking in broad brush-strokes about what strategies might be promising will
help strengthen your understanding of trees, and useful ways to implement them
in a purely functional setting like the lambda calculus.