Encoding Trees

Sets, order relations (including lattices), and strings/finite sequences are different ways that elements can be structured or organized. We’ve also talked about different kinds of “trees,” which are other ways that elements can be structured. Here are some trees:

The elements that are structured by, or occupy/annotate/label positions in, these example trees are strings. But as with sequences and other structures, the elements can be other types of things too (numbers, planets). Sometimes the trees have all of their positions occupied, as in the leftmost example. Sometimes only their “leaf” positions are occupied, as in the two rightmost examples.

Note that the middle example and the rightmost example are distinct trees, because they structure their elements differently.
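
To make that point concrete, here is a minimal Python sketch. The two shapes below are assumptions chosen for illustration (nested tuples standing in for trees where only the leaf positions are occupied), and the names `left_grouped`, `right_grouped`, and `leaves` are hypothetical. The first shape follows the later description of the middle tree; the second is just one possible way a tree over the same leaves could differ.

```python
# Hypothetical shapes: two trees over the same leaves "c", "d", "e".
left_grouped = (("c", "d"), "e")   # "c" and "d" share a parent
right_grouped = ("c", ("d", "e"))  # "d" and "e" share a parent


def leaves(tree):
    """Return the leaves of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    result = []
    for child in tree:
        result.extend(leaves(child))
    return result


# Same elements in the same left-to-right order...
same_leaves = leaves(left_grouped) == leaves(right_grouped)  # True
# ...but structured differently, so the trees are distinct.
same_tree = left_grouped == right_grouped  # False
```

Flattening both trees gives the same sequence of leaves, but the trees themselves compare as unequal, because the grouping is part of what a tree is.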

Think about what conventions we might use if we wanted to “encode” or represent these tree structures as a single string, in the way that Homework 5 talks about encoding sequences into a single string.

One way to encode the leftmost tree would be like this (with the string “displayed” on the next line):

    ((c b d) a e)

(That’s the same as "((c b d) a e)". When I “display” a literal string by indenting it on its own line, I will omit the surrounding quotes.)

We’ll be using an encoding like that later in the course, with "b" and "a" replaced by logical connectives like "∨", "&", or "⊃".
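
Here is one way that encoding could be sketched in Python. It assumes the leftmost tree has "a" at its root, with a "b" position (above "c" and "d") as its left child and "e" as its right child, which is what the string "((c b d) a e)" suggests. The names `Node` and `encode_infix` are hypothetical, and the sketch only handles positions with exactly two children.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Node:
    """An occupied non-leaf position with exactly two children."""
    label: str
    left: "Tree"
    right: "Tree"


# A tree is either a leaf (a plain string) or a Node.
Tree = Union[str, Node]


def encode_infix(tree: Tree) -> str:
    """Encode a tree as one string, writing each label between its two subtrees."""
    if isinstance(tree, str):
        return tree
    return f"({encode_infix(tree.left)} {tree.label} {encode_infix(tree.right)})"


# Assumed shape of the leftmost tree.
leftmost = Node("a", Node("b", "c", "d"), "e")
encoded = encode_infix(leftmost)  # "((c b d) a e)"
```

The parentheses do the work of recording the structure: each matched pair corresponds to one non-leaf position, and the label of that position sits between its two subtrees.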

In Linguistics a different convention is used. They’d represent the middle tree like this:

If they’re working with trees like the leftmost one (as they often do), they’d represent it like this (notice the subscripted "a" and "b"):

Generally, they have labels for syntactic categories like “NP/DP” or “VP” or “S” in place of "a" and "b", and natural language words or phrases in place of "c", "d", and "e". For example, they’d say that the sentence The child swims has a syntactic structure something like this:

Why do I say “NP/DP”? This is because there’s a controversy in syntax about how to understand phrases like the child. Consider all of the following:

the young child who plays D&D
a young child who plays D&D
many young children who play D&D
young children who play D&D

It’s agreed that in these phrases young and who play(s) D&D have a subordinate role. They are supporting the work of the primary structural element of the phrase. But what is that primary structural element? On a traditional picture, it’s the word child (or children), called the “head noun.” But on the dominant contemporary view, the primary structural element is instead the determiner words the, a, and many. The last example is thought to also have a determiner element in the syntax, just one that is silent and unpronounced. Theorists who think the noun is primary call these “noun phrases” (NPs); theorists who think the determiners are primary call them “determiner phrases” (DPs).

The language we’ve introduced for talking about strings doesn’t include formatting choices like subscripting, so we can’t encode trees in exactly that way. But if in some contexts we were always going to be working with trees like the leftmost one, we could just use:

and understand the first word after an opening [ as if it were subscripted in the linguist’s notation. Or if we were sometimes going to be working with trees like the leftmost one, and sometimes like the other two, we might take a special letter "*" which didn’t have any other role, and represent the middle tree like this:

That doesn’t represent a tree like the leftmost one but where the "a" and "b" positions are occupied by the "*" letter; rather it represents a tree where those positions are unoccupied.
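
The two bracket conventions just described can be sketched together in Python. This is only an illustration under some assumptions: the names `Branch`, `encode_brackets`, and `decode_brackets` are hypothetical, labels and leaves are assumed to contain no spaces or brackets, and the tree shapes are the ones suggested by "((c b d) a e)" (an "a" position above a "b" position and "e", with "c" and "d" below "b").

```python
from dataclasses import dataclass
from typing import Optional

UNOCCUPIED = "*"  # the special letter with no other role


@dataclass(frozen=True)
class Branch:
    label: Optional[str]  # None means the position is unoccupied
    children: tuple       # each child is a leaf (str) or another Branch


def encode_brackets(tree) -> str:
    """Write the label (or "*") first after "[", then the children."""
    if isinstance(tree, str):
        return tree
    head = UNOCCUPIED if tree.label is None else tree.label
    parts = [head] + [encode_brackets(child) for child in tree.children]
    return "[" + " ".join(parts) + "]"


def decode_brackets(text: str):
    """Recover the tree from its string encoding."""
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    tree, end = _parse(tokens, 0)
    if end != len(tokens):
        raise ValueError("unexpected text after the end of the tree")
    return tree


def _parse(tokens, i):
    if tokens[i] != "[":
        return tokens[i], i + 1  # a leaf
    head = tokens[i + 1]         # first word after "[" is the label
    label = None if head == UNOCCUPIED else head
    i += 2
    children = []
    while tokens[i] != "]":
        child, i = _parse(tokens, i)
        children.append(child)
    return Branch(label, tuple(children)), i + 1


labeled = Branch("a", (Branch("b", ("c", "d")), "e"))
unlabeled = Branch(None, (Branch(None, ("c", "d")), "e"))

labeled_text = encode_brackets(labeled)      # "[a [b c d] e]"
unlabeled_text = encode_brackets(unlabeled)  # "[* [* c d] e]"
```

Decoding treats "*" as marking an unoccupied position rather than as a label, so `decode_brackets(unlabeled_text)` gives back a tree whose non-leaf positions have no label at all.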

We’re not going to need to rely on any of these specific conventions. I’m just trying to give you a feel for conventions one might reasonably adopt: different ways of using single strings to encode or represent structures of other elements (including other strings).