% Graphs and Trees

\newcommand\concat{\curl}
\newcommand\AB{\mathfrak{S}}
\newcommand\lettera{\mathcal{A}}
\newcommand\letterb{\mathcal{B}}
\newcommand\letterc{\mathcal{C}}

A **graph** is a mathematical structure consisting of 1 or more (but finitely many) **vertices** or **nodes**, on the one hand, and 0 or more **edges** joining two vertices, on the other.

There are several different variations on this idea. In a simple graph, the edges have no direction; in **directed graphs** or **digraphs**, they do. In the latter case the directed edges are sometimes called **arrows** or **arcs**.

In simple graphs and simple digraphs, it is usually forbidden for an edge to join any vertex to itself. For some applications, though, it is useful to allow graphs that permit that. (Consider the next-state diagrams for finite automata that we were just recently working with.) When such edges are allowed, they are called **self-loops** or just **loops**, and sometimes the graphs that include them are called **pseudographs**.

In simple graphs and simple digraphs, it is usually forbidden for more than one edge to join two vertices (in the same direction: a digraph is allowed to have a directed edge from $v_1$ to $v_2$, and then another directed edge from $v_2$ to $v_1$). For some applications, though, it is useful to allow graphs with multiple edges between some pairs of vertices; sometimes these are called **multigraphs**. (For example, the famous [Bridges of Koenigsberg problem](http://en.wikipedia.org/wiki/Bridges-of-Koenigsberg_problem), which launched the discipline of graph theory, makes use of multigraphs rather than simple graphs.)

As is implicit in the above explanations, a graph can have many vertices but no edges at all. Or it may have edges connecting some proper subsets of its vertices, but have several disconnected "pieces." We'll introduce some terminology to describe this more carefully below.

In some applications, it is permitted for there to be edges that "hang free," instead of always joining a pair of vertices. They lack a vertex on one of their ends. (Sometimes even on both ends.) We won't be discussing graphs of this sort, though; nor will we be discussing multigraphs or pseudographs more than we already have.

### Some Terminology ###

*   an edge is **incident** to each of the vertices it joins
*   two **edges are adjacent** if there's any vertex they're both incident to
*   two **vertices are adjacent** if an edge joins them (in a digraph, the edge may go in either direction)
*   the **neighborhood** of a vertex is the set of all vertices adjacent to it
*   the **degree** of a vertex is the number of edges incident to it (in a digraph, we'd speak of the **indegree** and the **outdegree**)
*   a vertex of degree 0 is **isolated**

### Connectedness for undirected graphs ###

Consider a simple (undirected) graph, and let a **walk** be a sequence of adjacent vertices, perhaps visiting the same vertex several times, perhaps even traversing the same edge several times. There is a variety of terminology for different restrictions on this notion. We will say a **path** is a walk of 1 or more vertices, where (as in any walk) successive vertices in the sequence are adjacent, and (now we impose special restrictions) no vertex (and thus no edge) is visited more than once, with the exception that the ending vertex is allowed to be (but needn't be) identical to the starting vertex. A closed path, where the start and end vertices are the same (and the path traverses at least one edge), is called a **cycle**.
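To make these definitions concrete, here is a minimal sketch in Python. (The vertex names and the set-based representation are illustrative conventions of my own, not part of the definitions.)

```python
# One common convention: a simple undirected graph is a set of vertices
# together with a set of edges, each edge being the unordered pair
# (frozenset) of the two vertices it joins.

vertices = {"a", "b", "c", "d"}
edges = {frozenset({"a", "b"}), frozenset({"b", "c"})}   # "d" is isolated

def neighborhood(v):
    """The set of all vertices adjacent to v."""
    return {w for w in vertices if frozenset({v, w}) in edges}

def degree(v):
    """The number of edges incident to v (with no self-loops or
    multi-edges, this equals the size of v's neighborhood)."""
    return len(neighborhood(v))

print(neighborhood("b"))    # {'a', 'c'}
print(degree("b"))          # 2
print(degree("d"))          # 0, so "d" is isolated
```

Because each edge is an *unordered* pair, the representation is automatically direction-free; a digraph would instead use ordered pairs, as we'll see below.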
A path of length 0 would just be a single vertex. Any single edge would give us a path of length 1. Paths of length 2 would include 2 edges; and so on.

Recall the notion of a self-loop, discussed before. If those are allowed, they would count as cycles (on our definitions). They would be cycles of length 1. But there can be longer cycles too. In a *non-multi*graph, where at most a single edge joins any two vertices, the shortest cycle which isn't a self-loop would have length 3. Although it's common to prohibit self-loops from the graphs one is talking about, it's on the other hand common to *allow* longer cycles. For some purposes, though, it is useful to prohibit cycles too, and only consider **acyclic** graphs.

The **components** of a simple (undirected) graph are the maximal subsets of vertices such that for every pair in the set, there is a path between them (together with all the edges joining those vertices). When a graph has only one component --- that is, when there's a path between *any* two vertices in the graph --- then the graph is called **connected**. (Graphs are **complete** when there is, more demandingly, an *edge* between any two vertices.)

A component of a graph is an example of a more general notion, that of a **subgraph**. A subgraph of a graph $G$ would be given by *any* subset of $G$'s vertices, joined by *some* subset of the edges that join those vertices in $G$. $G$'s components are its *maximal connected subgraphs*: that is, subgraphs of $G$ that are connected and aren't proper subgraphs of any other connected subgraphs of $G$.

When we consider the question whether graph $H$ is a subgraph of $G$, or whether it is equivalent to $G$, we ignore the specific identities of the vertices. Graphs count as equivalent so long as there is some isomorphism between them that preserves the relevant edge structure. And to count as a subgraph of $G$, $H$ doesn't have to contain any of the same specific vertices as $G$ does; it's enough that there is a (homo)morphism from $H$ into $G$ that preserves the relevant edge structure.

Deep issues in the philosophy of math lurk here. Ideally we'd like to think of a graph $G$ in a way that's independent of any choices about which specific objects are the vertices. Mathematically, a graph should just be a *pattern* by which *any* set of so-many vertices may be joined. But we won't dig into these issues here.

### Connectedness for directed graphs ###

When it comes to digraphs, there are several different notions of connectedness. It's clear that a digraph like this one is unconnected, in every sense:

![Digraph #1](digraph1.jpg)

On the other extreme, it's clear that a digraph like this one is connected, in the most robust sense. Every pair of vertices is joined by a path (in both directions, in fact), which always travels "with" the direction of the digraph's arrows. This is called a **strongly connected** digraph.

![Digraph #2](digraph2.jpg)

But between these extremes there are different kinds of digraphs that are kind-of, sort-of, in-a-way connected. For example, in this digraph:

![Digraph #3](digraph3.jpg)

for any pair of vertices you choose, you can always find a path that travels "with" the direction of the digraph's arrows, but sometimes this path will only go from the second vertex to the first, and not from the first to the second. There is a path of this sort from $a$ to $c$, but not one from $c$ to $a$.
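Here is a rough Python sketch of checking for a path that travels only "with" the arrows. The little digraph below is invented for illustration (it isn't meant to reproduce Digraph #3's exact shape), but it has the same asymmetry: $a$ reaches $c$, while $c$ doesn't reach $a$.

```python
# Arrows are stored as ordered pairs; we search forward along them only.
arrows = {("a", "b"), ("b", "c"), ("c", "b")}

def reachable(start, goal):
    """Is there a directed path (of length 0 or more) from start to goal?"""
    seen, frontier = {start}, [start]
    while frontier:
        v = frontier.pop()
        if v == goal:
            return True
        for (x, y) in arrows:
            if x == v and y not in seen:
                seen.add(y)
                frontier.append(y)
    return False

print(reachable("a", "c"))   # True:  a -> b -> c
print(reachable("c", "a"))   # False: no arrows lead back to a
```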
The following two digraphs are connected in an even weaker sense:

![Digraphs #4](digraph4.jpg)

In these cases we cannot find a path that travels only "with" the direction of the arrows *either* from $a$ to $c$ *or* from $c$ to $a$. Unlike Digraph #1, though, at least in this case there is a *kind* of path between $a$ and $c$. It just requires sometimes traveling "with" the direction of an arrow, and sometimes traveling "against" it.

These are all legitimate notions of connectedness and of a path. Henceforth, though, when I speak of a **directed path** from $c$ to $a$, I shall mean only the kind of structure we see in Digraph #2.

### Graphs and Relations ###

There are natural correlations between (non-multi)graphs and relations. One way to think of this would associate the holding of the relation with the presence of an edge. Then, when we have a simple graph that excludes self-loops, the relation would have to be irreflexive. When the relation is symmetric, we could think of it as corresponding to either an undirected graph, or to a digraph that just happens to also have a reverse arrow wherever it has an arrow. When the relation is not symmetric, we should understand the graph to be directed.

A different way to correlate graphs and relations would associate the holding of a relation not with the presence of an edge but with the presence of a path. This makes sense when the relation is reflexive and transitive. Then the relation holds between $v_1$ and itself iff there's a path starting with $v_1$ and ending with $v_1$ --- and there always is such a path, of length 0. (We don't need to rely on the "self-loopy" paths of length 1.) It holds between $v_1$ and $v_2$ iff there's a (directed) edge joining $v_1$ to $v_2$, or a chain of (directed) edges joining $v_1$ to $\dots$ to $v_2$. If the relation fails to be symmetric, we should here also understand the graph to be directed.

There are also interesting correlations between graphs and groups, in both directions. But this would take us too far afield.

### Labels ###

The presence or absence of an edge between two vertices carries a single bit of information about how those vertices are related. Sometimes we want to use graphs to represent richer information. For example, we might be using the vertices to represent cities, and want to associate with the edges information about *how far apart* those cities are. We can do this by just adding to our graph a function from edges into whatever kind of information we want to work with. Graphs with labeled edges of this sort, especially when the labels are real numbers, are sometimes called **networks**.

For some purposes, we might also (or instead) want to have labels for vertices. Why might we want to do that? Whatever we were tempted to use as the label for some vertex, couldn't we just let that thing *be* the vertex? For example, if we're graphing distances between cities, can't we just let the cities themselves be our vertices? Why do we need to have some separate set of vertices, that the cities merely label?

Well, in some situations it will be fine just to let the objects you're interested in constitute your vertices. But if there is ever the prospect of wanting one and numerically the same object to simultaneously occupy two different positions in your graph, then you can't do that. Numerical sameness and difference between the vertices is what constitutes sameness and difference of graph positions. So it makes no sense to have two positions constituted by what is numerically just a single object.
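As a rough Python sketch of the picture just described, we can keep the vertices as bare markers of position and attach both kinds of information by separate labeling functions. (The city names and distances below are placeholders of my own.)

```python
# Vertices are bare position markers; label functions (here, plain dicts)
# attach the richer information. All names and numbers are placeholders.

vertices = {0, 1, 2}
edges = {frozenset({0, 1}), frozenset({1, 2})}

vertex_label = {0: "Amsterdam", 1: "Brussels", 2: "Cologne"}
edge_label = {frozenset({0, 1}): 173,    # distances in km, invented
              frozenset({1, 2}): 215}

for e in edges:
    u, v = sorted(e)
    print(f"{vertex_label[u]} - {vertex_label[v]}: {edge_label[e]} km")
```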
On the other hand, it would be completely unproblematic to let one and numerically the same object simultaneously be the label for two distinct vertices. But, you may ask, would such a need ever arise?

Well, suppose we are making a syntax tree of a sentence where the same word appears multiple times. A syntax tree is a kind of graph, which we'll look at more closely in a moment. So now if we think that what gets associated with the vertices of this graph are the *word-types* of the relevant sentence, this would be a case where we'd want to associate the same object (a given word-type) with multiple vertices.

You might protest: why don't we take these trees as inhabited by *word tokens* instead of word types? Then we can be sure that when "the same" word appears multiple times in the sentence, there will always be multiple tokens. And we can let those tokens *be* the vertices, rather than having them *label* some separate set of vertices.

I'm afraid that strategy will not work in general. Consider the following inscription of the sentence *Stupid is as stupid does*:
    Stupid | is as
           | does.

Here there is only a single *token* of the word "stupid," yet it does double duty: read the top row and you get "Stupid is as"; read the bottom row, reusing that same token, and you get "stupid does." One token thus occupies two different positions in the sentence, so we cannot let the tokens themselves *be* the vertices either.
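Put in Python terms (a rough sketch, with integers of my own standing in for tree positions), the upshot is that the vertices must be distinct objects, while one and numerically the same word-type may label several of them:

```python
# Distinct integers stand in for distinct tree positions; one and the same
# word-type object labels two of them. (Edges are omitted for brevity.)

stupid = "stupid"    # a single word-type
word_label = {0: stupid, 1: "is", 2: "as", 3: stupid, 4: "does"}

print(0 == 3)                            # False: two distinct vertices
print(word_label[0] is word_label[3])    # True: numerically the same label
```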