% Automata

\newcommand\emptystring{\varepsilon}
\newcommand\concat{\curl}
\newcommand\AB{\mathfrak{S}}
\newcommand\lettera{\mathcal{A}}
\newcommand\letterb{\mathcal{B}}
\newcommand\letterc{\mathcal{C}}

When we introduced formal grammars, I wrote:

> I spoke of grammars as *generating* a set of strings; it's also possible to think of the grammars as providing patterns that *match* strings in that set (and fail to match other strings). It'll become clearer why both of these perspectives are available as we proceed.

Now I *did* go on to describe certain shortcuts for expressing a grammar, and linked these to the variety of "regular expression patterns" that Epp discusses. But that doesn't really deliver on the promise I was making in that passage. To deliver on that promise, let's first trace through the **derivations** of some strings in two grammars.

Here is a regular grammar that generates strings matching the regex $(\lettera\letterb){+}$:

> $\mathbf{S} \longrightarrow \lettera \: \mathrm{X}$
> $\mathrm{X} \longrightarrow \letterb$
> $\mathrm{X} \longrightarrow \letterb \: \mathrm{S}$

Let's see how it generates the string $\lettera\letterb\lettera\letterb$. On each line, the symbol that will be expanded in order to move to the next line is colored red.

> $\mathrm{\red S} \quad\longrightarrow$
> $\lettera \: \mathrm{\red X} \quad\longrightarrow$
> $\lettera \: \letterb \: \mathrm{\red S} \quad\longrightarrow$
> $\lettera \: \letterb \: \lettera \: \mathrm{\red X} \quad\longrightarrow$
> $\lettera \: \letterb \: \lettera \: \letterb$

This derivation implicitly attributes a **parse tree** to the finished string, which I'll present here using brackets because I don't know a quick way to draw and present the tree on this web page:

> $[_{\mathrm{S}} \lettera \: [_{\mathrm{X}} \letterb \: [_{\mathrm{S}} \lettera \: [_{\mathrm{X}} \letterb \:]]]]$

Imagine each $[_\mathrm{Q} x \: y \:]$ here as a tree branching, with $\mathrm{Q}$ at the top/root vertex, and $x$ and $y$ making up the lower leaves or subtrees.

Now what I'm recommending is that, just as you can think of the grammar as *generating* that string, via the derivation just shown, you can also think of it as *matching* the string, and attributing it the syntactic structure encoded in the parse tree corresponding to that derivation.

In some cases, distinct derivations will yield the same parse tree: for example, there may be two steps in the derivation where it doesn't matter in which order they're performed. In some other cases, distinct derivations will yield *different* parse trees but still generate the same final string. Grammars which permit this are called **ambiguous**. However, just because a given grammar is ambiguous doesn't mean we should conclude the string or language it generates is ambiguous. One and the same language can be generated by different grammars, and it often happens that the languages generated by ambiguous CFGs can also be generated by unambiguous CFGs. So ambiguity is in general relative to which grammar you're talking about, not fixed by the language. (There are some languages, however, that can be generated by ambiguous CFGs but not by any unambiguous CFG.)
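(I'll illustrate some of these ideas with little sketches in Python as we go. They're my own illustrations, with my own choice of names, not anything from the readings.) Here's the derivation of $\lettera\letterb\lettera\letterb$ above done purely mechanically, writing the nonterminals as uppercase `S` and `X`, and the letters $\lettera$ and $\letterb$ as lowercase `a` and `b`:

```python
# The grammar's rules, keyed by nonterminal; X has two alternatives.
RULES = {"S": ["aX"], "X": ["b", "bS"]}

def expand(form, choice):
    """Rewrite the leftmost nonterminal in `form`, using the rule
    alternative numbered `choice` (0-based)."""
    for i, symbol in enumerate(form):
        if symbol.isupper():               # nonterminals are uppercase
            return form[:i] + RULES[symbol][choice] + form[i + 1:]
    return form                            # nothing left to expand

# Rederive abab: expand S, then X (by its 2nd rule), then S, then X.
form = "S"
for choice in [0, 1, 0, 0]:
    print(form, "-->")
    form = expand(form, choice)
print(form)                                # abab
```

Each printed line is one of the sentential forms displayed above.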
Here's an example of another derivation and the parse tree it generates. This time we'll use a CFG, one we've already encountered. (I'm just compressing the two rules of the previous presentation into one.)

> $\mathbf{early} \longrightarrow \lettera \: \mathrm{early} \: \letterb \mid \emptystring$

Let's derive $\lettera\lettera\letterb\letterb$:

> $\mathrm{\red{early}} \quad\longrightarrow$
> $\lettera \: \mathrm{\red{early}} \: \letterb \quad\longrightarrow$
> $\lettera \: \lettera \: \mathrm{\red{early}} \: \letterb \: \letterb \quad\longrightarrow$
> $\lettera \: \lettera \: \emptystring \: \letterb \: \letterb$

The last line is equivalent to $\lettera\lettera\letterb\letterb$, but I'll leave the $\emptystring$ in place to highlight how we got there. That derivation implicitly attributes the following parse tree:

> $[_{\mathrm{early}} \lettera \: [_{\mathrm{early}} \lettera \: [_{\mathrm{early}} \emptystring \: ] \: \letterb \: ] \: \letterb \: ]$

Now, let's come at things from another direction. So far, we've been letting the grammar generate whatever strings it's allowed to. It implicitly attributes parse trees to those strings in the process. But what about parsing some novel string? That is, what if we're first *given* a particular string, and we want to figure out whether it belongs to some formal language. How should we determine that?

That's easy! the mathematician may say. String $S$ belongs to language $L$ just in case:

> $S \in L$.

Yeah, thanks. That's not what we meant. In simple cases we may just be able to intuitively see that the string is generatable by a given grammar, and hence belongs to the language specified by that grammar. But is there some general strategy for settling this? Some *algorithm or recipe* we could rely on in arbitrary cases? (Keep in mind that the set $L$ will often be infinite!)

### Deciding and enumerating ###

What we're looking for is a mechanical or rote procedure, one whose steps are straightforward and unambiguous and don't require any ingenuity to implement. The kind of procedure one could program a computer to perform, without relying on the computer to already have some AI in the background. And we want the procedure to be guaranteed to deliver correct answers. This kind of thing is called **effectively calculating** an answer to our question. Here is how Burgess explains it:

> The instructions must be completely definite and explicit. They should tell you at each step what to do, not tell you to go ask someone else what to do, or to figure out for yourself what to do: the instructions should require no external sources of information, and should require no ingenuity to execute, so that one might hope to automate the process of applying the rules, and have it performed by some mechanical device.

Theorists think about these issues using mathematical models of simple but idealized machines, called **formal automata**. Different kinds of automata are distinguished by formal specifications of which resources they have access to. You may have heard of **Turing machines**. That's one kind of formal automaton, an especially powerful kind. Theoretically it's often interesting to find the *weakest* kind of automaton that would be capable of answering certain questions. When we pursue such questions, we're not trying to actually *build* computers. Automata are ultimately nothing more than *recipes* for how to go about answering a question.

In general, the kinds of *questions* we may be working with will have the form: does such-and-such have so-and-so properties, that is, does it belong to a set defined as blah-de-blah?
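To foreshadow where we're headed: for the $\mathbf{early}$ grammar, such a rote recipe is easy to write down. Here's a minimal sketch, recursing on the grammar's one rule:

```python
# A rote recipe deciding membership in the language of the grammar
# early -> a early b | empty string (the strings a^n b^n).
def early(s):
    if s == "":                     # the empty-string alternative
        return True
    if s.startswith("a") and s.endswith("b"):
        return early(s[1:-1])       # the "a early b" alternative
    return False

assert early("aabb")                # the string we derived above
assert not early("aab")
assert not early("abab")
```

Each recursive call peels one $\lettera$ off the front and one $\letterb$ off the end, mirroring the derivation steps shown above. The harder question is what kinds of recipes exist in general, and how much machinery they need.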
We'll say that an automaton can **enumerate the members** of a set when it can sequentially generate specifications of each of the set's members. We don't place any requirements on what order the members get specified in. We just suppose that, for any $x$ that is a member of the set, eventually --- after some finite number of steps --- the automaton will specify $x$. And the automaton will never specify things that aren't in the set.

We'll say that an automaton can **decide** some question, such as whether some specified item belongs to a specified set, when it can eventually --- after some finite number of steps --- give us a definite and correct answer. We say that a set or property is **decidable** by an automaton if it can decide, for *any* specified item, whether that item belongs to the set or has the property.

Now, from one perspective, it may sound like the task of enumerating all the members of a set is harder than the task of deciding a single item's membership in the set. Wouldn't the former involve a lot more time and work? Theoretically, though, what we're interested in is how mathematically challenging these tasks are, not how long they would take or how expensive the computer would need to be to perform them. Some tasks are formally more demanding than others, in that they can't even in principle be performed by the kind of automata that can perform the easier tasks. And from this perspective, enumerating all the members of a set tends to be easier (and is never harder) than being able to decide, for arbitrary single items, whether they belong to the set.

The kinds of automata capable of enumerating the members of a set may well also be capable of deciding, for many *particular choices*, whether the chosen items belong to the set, yes-or-no. But being able to give a yes-or-no answer for *any* specified item will be harder. The automaton can start making its yes-list, and if the answer *is* yes, then eventually the automaton will be able to stop and say so. But if the answer is in fact no, and the set isn't finite, and there's no order we can assume the members of the set will be specified in, then the automaton may never reach a point where it's able to stop and give the answer no. For all it knows, the item may still be in the part of the yes-set that hasn't yet been listed.

Of course, if the automaton were able to enumerate *both* the members of a set *and*, separately, the non-members, then it *would* always be able to decide the question whether a specified item belongs to the set. It may take a while, but eventually the item would have to show up on one of its lists, so at that point the automaton could stop and give its answer, yes-or-no. But in general, being able to enumerate the members of a set *doesn't entail* also being able to enumerate the non-members. (And vice versa.)

Several times this term we'll be taking up questions about which sets or properties an automaton can decide, which it can merely enumerate, and when it can do neither.

So far I've been talking about specified sets being *decidable by* certain kinds of automata. If a set is decidable by any conceivable kind of effective procedure *at all* (any conceivable kind of formal recipe or automaton), then it's called just **decidable**. You may also see the terms "computable" or "Turing computable" or "recursive" used to mean the same thing. (In some of these cases, it's a substantive but widely accepted thesis that the terms apply to the same sets.) We will discuss this more later.
If a specified set isn't decidable, but some conceivable kind of effective procedure (some conceivable kind of formal recipe or automaton) can enumerate it, the set is called **semi-decidable** or "recursively enumerable." (Again, it's a substantive but widely accepted thesis that these terms coincide. "Recursive" here and in the previous paragraph has a specific technical meaning, concerning what are sometimes called $\mu$-recursive functions.)
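To make the asymmetry between enumerating and deciding vivid, here's a sketch with a toy set (the multiples of $3$; my illustration, not anything from the readings). Given only an enumeration, we can always say *yes* when the answer is yes; but on a *no* instance the recipe below runs forever:

```python
from itertools import count

def enumerate_members():
    """An enumerator for a toy set: yields each member, eventually."""
    for n in count(0):
        yield 3 * n                 # the multiples of 3

def semi_decide(x):
    """Answers yes-questions correctly; a no-instance loops forever."""
    for member in enumerate_members():
        if member == x:
            return True
        # There's no safe point to answer False: for all this recipe
        # knows, x might still show up later in the enumeration.
        # (This toy enumeration happens to be increasing, so here we
        # *could* stop early; in general no order is promised.)

print(semi_decide(9))               # True, after four steps
# semi_decide(10)                   # would run forever
```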
There is also a use of the bare term "enumerable" applied to sets, but this means something different. We'll come back to that later. When I say that a particular automaton or kind of automaton can enumerate a set, that's one thing; when I say that the set is enumerable, that's different. The uses have some relation to each other, but "enumerable" doesn't mean "can be enumerated by some conceivable automaton."

There is also a use of the term "decidable" to talk not about sets but about sentences in a deductive system. That again is somewhat different from the use introduced here. There what's meant is whether the sentence is either provable or refutable in the specified deductive system. This has some connections to the notion of decidability introduced here, but they're not the same thing. A sentence's not being decidable in a specific formal system doesn't in itself settle the question whether there might be other ways to effectively calculate its truth-value. It would probably be best if we used different terms here, like "deductively decidable (in system so-and-so)" vs "effectively decidable."
### Deciding whether a string belongs to some formal language ###

Our present question is, given an alphabet $\Sigma$ and some formal language $L \se \Sigma^*$ --- usually specified by a grammar that generates it --- what will it take to decide membership in $L$? That is, what kinds of automata will be able to decide, for arbitrary strings $S \in \Sigma^*$, whether $S \in L$, yes-or-no? When an automaton can do this, we say that it **decides membership in** the language $L$.

Membership in many formal languages can be decided by automata much weaker than Turing machines. There turn out to be deep and interesting parallels between the categories of grammar we mentioned before (regular grammars, context-free grammars, and some more complex grammars) and what kinds of automata are capable of recognizing the languages those grammars generate.

Earlier I introduced the notion of a **regular language** as a language generatable by some regular grammar.
It's also possible to define these languages in algebraic terms. One definition goes:

* Let $\Sigma$ be a finite alphabet, and $f$ be a monoid (homo)morphism from $\Sigma^*$ to some finite monoid $M$. Let $N$ be any subset of $M$. Then $\set{w\in \Sigma^* \where f(w) \in N}$ is a regular language.

Another definition goes:

* Given a language $L\se\Sigma^*$, define an equivalence relation $\sim$ on $\Sigma^*$, where $u\sim v$ iff replacing the prefix $u$ by $v$ never affects membership in $L$; that is, $\forall w\in\Sigma^*(u\concat w\in L\latin{ iff }v\concat w\in L)$. Then $L$ is regular iff the number of equivalence classes $\sim$ partitions $\Sigma^*$ into is finite.

We'll come back to the notion of an equivalence class in a few weeks.
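Here's a tiny concrete instance of the first definition (my example): let $L$ be the strings over $\set{\lettera, \letterb}$ containing an even number of $\lettera$s, let $M$ be the two-element monoid $\set{0, 1}$ with addition mod $2$, let $f$ count $\lettera$s mod $2$, and let $N = \set{0}$:

```python
# The finite monoid M = {0, 1} under addition mod 2. The morphism f
# sends a string to (number of a's) mod 2, and we take N = {0}.
def f(w):
    return w.count("a") % 2

# f really is a morphism: f(u + v) == (f(u) + f(v)) % 2, and f("") == 0.
N = {0}

def in_language(w):
    return f(w) in N                # membership in L, read off from f

assert in_language("")              # zero a's
assert in_language("abba")          # two a's
assert not in_language("ab")        # one a
```

What matters is that $M$ is *finite*: as $f$ consumes a string letter by letter, it only ever needs to keep track of finitely many distinct summaries of what it has read so far. That's the same intuition behind the finite-state automata we're about to meet.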
It turns out that regular languages are exactly the languages whose membership is decidable by a very weak kind of automaton known as a **finite-state** machine or automaton (sometimes abbreviated FSM or FSA). For those languages, there is always a finite-state automaton capable of giving a definite and correct answer to the question whether an arbitrary string belongs to the language. Of course, I don't mean you could build one and it will never break. As I've said, these are mathematical models or recipes, not real machines. I called them "weak" because they are limited in certain ways; but this category includes automata that are too complex for humans to ever build a real instance of. Don't worry, though. The ones we will work with will be very simple. You'll be able to draw them on a single piece of paper.

Given a regular language, there is always a finite-state automaton capable of deciding which strings belong to it and which don't. The converse is also true; given any finite-state automaton, there will be a regular language whose membership it decides. (It may be that it is a language recognized by distinct finite-state automata, as well.) This is what we learn from Kleene's Theorem, discussed in the Epp reading. Kleene's Theorem explicitly concerns the relation between finite-state automata and languages generated by Kleene's regexes, not languages generated by regular grammars. But it is straightforward to translate between these regexes and regular grammars, and the languages they generate are the same. (For more details, see the optional Sipser Chapter 1 reading.)

In general, automata come in two varieties: **deterministic** and **non-deterministic**. Non-determinism here doesn't have to do with chance or randomness. It's probably a misleading label for philosophers. But this terminology is well-entrenched. I will explain what non-determinism is later. For the moment, we will focus on deterministic finite-state automata. These are often abbreviated as **DFA**. DFAs are the kinds of machines you see drawn in the directed graphs on p. 223 of the Gamut reading, and on pp. 792 and following in the Epp reading.

These automata have a number of *internal states*, represented by the vertices in those graphs. There are only a finite number of such states; that's why they are called *finite*-state automata. Graphs like these are also used in the specification of more powerful automata, but those more powerful automata also have memory registers, or a stack where they can write information to retrieve later. Turing machines have an infinite array of memory. Not the poor finite-state automata. They don't have any memory other than what is implicit in what vertex they happen to "occupy" at a given step of their computation. That is what makes them so weak.

So the DFA has a certain number of internal states. One of these is marked as the *starting state*. The DFA also has a *next-state function*, which specifies what it should do next, depending on what state it happens to be in and what input it then receives. For DFAs, the menu of what to do next is rather limited. There's no additional memory they can write to or read from. All they can do is move to another internal state. (Or they can stay in the state they're in.) How they will choose is specified by their next-state function, and displayed in the directed graphs you see. Suppose there's an arrow leading from vertex $s_1$ to vertex $s_2$, labeled with some letter. This means, if you're in vertex $s_1$ and you get that letter as input, then go to vertex $s_2$. Then you're at the next step of the computation.

At each step, the automaton is fed a new input letter. It can't backtrack and see what it was fed before. It's just force-fed one new letter per step. These are the letters of the string it's trying to recognize, conventionally read from left-to-right. (Given the way I originally defined strings, though, it'd be more natural to go in the other direction, wouldn't it?)

When a DFA has been fed all of the input, that's it. It doesn't do anything further. It just stays in whatever internal state it last moved to. However, we'll have marked some of the states as being "accepting states" and the rest as being "rejecting states." What that means is that if a string is fed to the automaton and when it stops it's in an "accepting state," then its answer to the question whether that string belongs to the language is "yes." Else the answer is "no." That's about it. When there are multiple accepting states, we sometimes attach special significance to which one the automaton stopped in.
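Here's one of those recipes written out as code instead of a graph. This is just my own sketch (the dictionary encoding and the state names are my choices, not Epp's or Gamut's); it decides the language $(\lettera\letterb){+}$ from the start of this page:

```python
# A DFA deciding (ab)+ over the alphabet {a, b}.
START     = "s0"
ACCEPTING = {"s2"}
NEXT = {                         # (current state, input letter) -> next state
    ("s0", "a"): "s1",   ("s0", "b"): "dead",
    ("s1", "a"): "dead", ("s1", "b"): "s2",
    ("s2", "a"): "s1",   ("s2", "b"): "dead",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}

def dfa_accepts(string):
    state = START
    for letter in string:        # force-fed one letter per step
        state = NEXT[(state, letter)]
    return state in ACCEPTING    # verdict depends only on the final state

assert dfa_accepts("ab") and dfa_accepts("abab")
assert not dfa_accepts("") and not dfa_accepts("aba")
```

Notice there's no memory here beyond the single `state` variable: that variable plays the role of "which vertex the automaton currently occupies."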
HOMEWORK EXERCISES:

(51) Work through Epp's examples 12.2.4, 12.2.5, 12.2.6, 12.2.7. Try to answer the questions she poses. She then gives the solutions and talks through an explanation of them, so I don't want you to reproduce that. You can just tell me "Yes I've worked through these examples and think I understand them."

With the automata you just looked at, for every vertex and every possible next input letter, there was always exactly one arrow leading away from that vertex (perhaps back to the same vertex). We might adopt a convention to sometimes allow there to be zero arrows for some input letters. That's to be understood as though there *were* an arrow with that label, but leading to an undisplayed vertex that was "rejecting" and which led back to itself for every possible further input. So a kind of black hole, from which no further input will summon a verdict of "accept." There's a graph at the bottom of Epp's p. 803 that is missing some arrows, so needs to be interpreted like this.

That graph at the bottom of p. 803 has another curious feature as well. It's *not just* that sometimes it has zero arrows for a given input: leading from state $s_0$ we see that there are also *two* arrows labeled with the input $1$. What could this mean? This is what makes that automaton *non-deterministic*. In particular, it's a non-deterministic finite-state automaton, often abbreviated as **NFA**.

You're *not* supposed to understand "non-determinism" here as meaning that the automaton chooses one of the arrows randomly. Rather, you should think that all of the outgoing arrows are (potentially) explored. You could think of all the explorations running simultaneously, on parallel computer chips, each consuming the next piece of input at the same time. (And perhaps in the course of doing so, we'd have to split into multiple processes again... and again...) Or you could think that the NFA just explores one outgoing arrow at a time, but keeps a snapshot of where it was before it started, so that if things go bad it can rewind to this point, including rewinding the input, and start again with another arrow. If *any* of the explorations lead to the input being exhausted when the NFA is in an "accepting" state, then the string counts as accepted. (Any pending snapshots can then just be ignored.) If *none* of the explorations end with the NFA stopped in an "accepting" state, then the string counts as rejected.
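And here's the "all explorations at once" picture, again as my own sketch. Instead of one current state, we keep the *set* of states that some exploration could be in; an absent arrow just means that exploration dies (the black hole above). The toy machine here (my example, simpler than Epp's) accepts strings of $0$s and $1$s that end in $1$:

```python
# An NFA: the next-state map now yields a SET of states, and missing
# entries mean that exploration dies (the implicit "black hole").
START     = "s0"
ACCEPTING = {"s1"}
NEXT = {
    ("s0", "0"): {"s0"},
    ("s0", "1"): {"s0", "s1"},   # two outgoing arrows labeled 1
}

def nfa_accepts(string):
    states = {START}             # every exploration starts together
    for letter in string:
        states = {t for s in states
                    for t in NEXT.get((s, letter), set())}
    return bool(states & ACCEPTING)  # accept if ANY exploration does

assert nfa_accepts("01") and nfa_accepts("111")
assert not nfa_accepts("10") and not nfa_accepts("")
```

Tracking sets of states like this is also the trick behind translating an NFA into a DFA, which I'll come back to in a moment: treat each *set* of NFA states as a single DFA state.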
Sometimes, the arrows in an NFA's decision graph are labeled with an $\emptystring$ instead of a letter from the relevant alphabet. When the automaton enters a state with one or more outgoing arrows labeled with $\emptystring$, then --- before consuming any further input --- it "splits" into multiple exploration paths, one following each of those arrows, and one remaining in the current state. Otherwise this works just as described above.

In the case of *finite-state* automata, there is only a practical difference between the deterministic and non-deterministic varieties. It's possible in principle to translate any NFA into a corresponding DFA (though it may be complicated). In the case of more powerful automata, though, there is sometimes a theoretical difference, and the non-deterministic versions can recognize more languages. Even in the case of NFAs, it can in practice be difficult to construct a corresponding DFA. But the example on Epp's p. 803 is not of this sort. You can easily produce a DFA that recognizes the same input as that NFA.

HOMEWORK EXERCISE:

(52) There are some issues with the example on the bottom of Epp's p. 803. First of all, the displayed automaton lacks outgoing arrows for some possible inputs. But I explained above how we can interpret that. (Let's suppose the possible input letters are just $0$ or $1$.) Another issue is that in the text accompanying her diagram, she says that the automaton "accepts the set of all strings beginning with a 1." But that is not correct; anyway, not if we interpret the absence of arrows in the natural way I've proposed. Ignore Epp's claims about what the automaton does and just look at the diagram. Describe a DFA, and a regex, that accept exactly the same strings as the NFA displayed in that diagram.

### More powerful automata ###

Membership in languages that are non-regular can only be decided by automata that are more powerful than finite-state automata.

* One way to get a more powerful automaton is to equip it with a "stack", where the automaton can leave notes to itself and base decisions upon later. These are called [pushdown automata](http://en.wikipedia.org/wiki/Pushdown_automaton). Given any language generated by a context-free grammar, one can specify a nondeterministic pushdown automaton, needing only finitely many states and a single stack, which can decide membership in that language. (See the end of the optional Sipser Chapter 2 reading for details; there's also a sketch of the idea just after this list.)

* If you equip a pushdown automaton with *two* stacks, then it becomes much more powerful --- in fact, as powerful as a Turing machine. But there are automata intermediate in strength between one-stack pushdown automata and Turing machines. Some of these are called linear bounded automata; they're essentially a Turing machine (see below) handicapped by having only a limited amount of memory, proportional to the length of its input. These are capable of deciding membership in languages generated by context-sensitive grammars, which we've mentioned but haven't discussed (and won't discuss) in detail.
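To give a feel for why a stack helps, here's a sketch (mine, and much simplified: a real pushdown automaton couples a finite-state graph with the stack) of a stack-based recognizer for the language $\lettera^n\letterb^n$ generated by the $\mathbf{early}$ grammar above. No finite-state automaton can decide this language; the stack is doing unbounded remembering that a DFA's finitely many states can't.

```python
# A stack-based recognizer for a^n b^n (n >= 0).
def pda_accepts(string):
    stack = []
    seen_b = False
    for letter in string:
        if letter == "a":
            if seen_b:               # an a after some b: reject
                return False
            stack.append("a")        # push: one more a awaiting its b
        elif letter == "b":
            seen_b = True
            if not stack:            # a b with no a left to match
                return False
            stack.pop()              # pop: match this b with an a
        else:
            return False
    return not stack                 # accept iff every a was matched

assert pda_accepts("") and pda_accepts("aabb")
assert not pda_accepts("aab") and not pda_accepts("abab")
```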
As I mentioned earlier, Turing machines are regarded as fundamentally the most powerful automata we can specify. In the early part of the 20th century, there were a variety of strategies for formalizing the notion of "effectively calculating" some result, and these all turned out to be provably equivalent to Turing machines. (There's no reason why we have to take Turing machines rather than the other formalisms as being more fundamental; it's just typical pedagogy to focus on them.)

At the same time, we've accumulated a variety of questions that we can prove *aren't* answerable, in general, by a Turing machine. (Usually, the questions will be answerable in special cases. But not for arbitrary well-defined inputs.) This has prompted some theorists to explore the notion of [Hypercomputers](http://en.wikipedia.org/wiki/Hypercomputation). Such machines are most likely not physically constructable, but arguably neither are Turing machines. Regardless of that, many theorists aren't inclined to count the methods employed by hypercomputers as still being "effective," that is, the kind of rote or mechanical procedure we're trying to formalize. So generally, Turing machines, and the other formalisms that are equivalent to them, are thought of as constituting the limits of what can be effectively calculated or decided. (This belief is called the **Church-Turing thesis**.)

Like the other enhanced automata we described, Turing machines are equipped with memory apart from the vertices of their decision graph. This memory is thought of as a tape that is potentially infinitely long, in at least one direction. Turing machines read their input from, and write their output to, that tape, rather than being force-fed it one letter at a time, as finite-state automata are.

* For languages generated by the most complex grammars, even Turing machines aren't able to decide whether arbitrary strings are included. They are however able to enumerate all the strings that are included. I believe that no natural category of grammar is known to generate exactly the languages where Turing machines *can* decide membership.

It is provable that there are more languages (sets of strings over a given alphabet) than there are Turing machines. (Each Turing machine has a finite description, so there are only countably many of them; but the languages over $\Sigma$ are all the subsets of $\Sigma^*$, and there are uncountably many of those.) It follows that some languages aren't even enumerated by any Turing machine. We cannot give grammars that generate those languages; but we can formally specify some of them.

For any language $L\se\Sigma^*$ that Turing machines can enumerate but not decide membership in, the language $\Sigma^*-L$ must be one that no Turing machine can enumerate. If some Turing machine *could* enumerate it, then we could specify a machine that emulated the enumeration of both $L$ and its complement, and stopped as soon as a specified string appeared in either list; thereby deciding yes-or-no whether that string belongs to $L$. Which by assumption, no Turing machine can do.
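The argument in that last paragraph is easy to mimic in code. Here's a sketch with stand-in enumerators (the evens and the odds are just toys; imagine instead two enumerations produced by Turing machines):

```python
from itertools import count

def members():                       # enumerates a toy set: the evens
    for n in count(0):
        yield 2 * n

def non_members():                   # enumerates its complement: the odds
    for n in count(0):
        yield 2 * n + 1

def decide(x):
    """Run both enumerations in alternation. Since x must eventually
    appear on exactly one list, this always halts with a verdict."""
    ms, ns = members(), non_members()
    while True:
        if next(ms) == x:
            return True
        if next(ns) == x:
            return False

assert decide(10) is True
assert decide(7) is False
```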

Turing machines can be specified that are capable of taking as input descriptions of *arbitrary other* Turing machines, and the inputs they might be provided, and returning as answer what the other Turing machines would say the answer is --- if the other Turing machine ever *would give* an answer, rather than go on running forever. Turing machines of this sort are called **Universal Turing machines**.

No Turing machine is capable of saying whether arbitrary Turing machines *would* stop when provided with specified input. This is known as the **halting problem**, and it was an important result in the theory of computability that this question is not decidable by a Turing machine. And thus, we presume, not decidable by any effective recipe. It's important to remember we are talking about being able to give an answer for arbitrary cases. There are some Turing machines where you can just see by looking at them that they won't stop, and Turing machines can see it too. What there can't be is a recipe for them to give a definite and correct yes-or-no answer about *any* Turing machine, given *any* well-defined input. To learn more, see [here](http://www.futilitycloset.com/2012/02/20/scooping-the-loop-snooper) and [here](http://en.wikipedia.org/wiki/Halting_problem). This fact is connected to some other important metalogical results of the 20th century, which we will return to later in the course.

### What is and isn't decidable about a grammar ###

Regular grammars and finite-state automata are mathematically very tractable; many questions we'd want to ask about them are effectively decidable. (That is, decidable by some effective recipe, not necessarily by a finite-state automaton.) For instance, given a specification of a regular grammar or of a finite-state automaton, it is decidable whether:

* They match some specified string.
* They match *any* string at all. (There's a sketch at the end of this page of how that can be decided.)
* They match only a finite number of strings.
* They match *every* string in the free monoid over the relevant alphabet.
* They match a subset of the strings matched by another specified regular grammar or finite-state machine.
* They match *exactly the same* strings as another specified regular grammar or finite-state machine.
* They match *at least one string* also matched by another specified regular grammar or finite-state machine.

For CFGs, however, things are less pretty. Some of these questions may be answerable in particular cases, but only the first three of them are decidable in general when we're talking about CFGs rather than regular grammars. It's also undecidable in general whether a CFG is ambiguous; and whether the language a CFG generates is regular, that is, is also generated by some regular grammar.
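Coming back to the regular case, to close: here's a sketch (my illustration) of why the second question in the list above is decidable. Whether a finite-state automaton matches *any* string at all comes down to whether some accepting state is reachable from its starting state; and since there are only finitely many states, a rote graph search settles that in finitely many steps:

```python
# Deciding whether a DFA accepts ANY string at all: check whether some
# accepting state is reachable from the start state, by graph search.
def accepts_something(start, accepting, next_state, alphabet):
    seen, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        if state in accepting:
            return True              # found a reachable accepting state
        for letter in alphabet:
            t = next_state[(state, letter)]
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return False                     # no accepting state is reachable

# The (ab)+ machine from earlier accepts something; a machine whose
# accepting states are unreachable accepts nothing.
NEXT = {("s0", "a"): "s1",   ("s0", "b"): "dead",
        ("s1", "a"): "dead", ("s1", "b"): "s2",
        ("s2", "a"): "s1",   ("s2", "b"): "dead",
        ("dead", "a"): "dead", ("dead", "b"): "dead"}
assert accepts_something("s0", {"s2"}, NEXT, "ab")
assert not accepts_something("s0", {"nowhere"}, NEXT, "ab")
```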