Consider the following simplified version of the grammar for Kestrel expressions:

<expression> ::= <term> { ( '+' | '-' ) <term> } <term> ::= <factor> { ( '*' | '/' ) <factor> } <factor> ::= [ '-' ] ( <number> | '(' <expression> ')' }

(The simplifications we made eliminated comparing and logical operators, variables and non-numeric constants.)

The question we face is, how do we construct a parser that can take an input string and match it to this grammar. A human working at parsing a string might proceed as follows, rewriting the string after each step:

- 4 * 5 + 6 - ( 7 + 8 ) / 9 factor * factor + factor - (factor + factor) / factor term + term - ( term + term ) / factor term + term - ( expression ) / factor term + term - factor / factor term + term - term expression

Unfortunately, this view of the problem does not give us an algorithm. People can take a breadth-first approach to problems like this, doing large numbers of substitutions along the entire string, and then working on the substituted string. Note, however, that what we have shown above can also be described by a tree:

- 4 * 5 + 6 - ( 7 + 8 ) / 9 \ | | | | | | | | | | | | | factor | factor | factor | |factor | factor| | factor \ | / | | | | | | | | | / term | term | | term | term | / / | \ | | | \ | / / / / \ \ \ | \ expression / / / \_______ \ \ \ \_ | _/ / / \ \ \ \ \ | / _/ / \ \ \ | factor / _/ \ \ | | \ / / \ | | | term \ | | | ________/ \ | | | / expression

Such a tree is called a *parse tree*. This parse tree exactly reflects
the syntactic relationships implied by the above grammar, but it says
little about one important issue. Our top-level expression contains two
operators, '+' and '-'. What is the order of evaluation of operators that
occur at the same level in the expression?

The above grammar is loaded with extended BNF constructs, and these
obscure several issues. Let's remove them! There are, unfortunately,
two distinct ways to do this (not counting mixed approaches). We could
expand our grammar to the following, which is referred to as
a *left-recursive*
grammar because the recursion occurs on the left side of each rule:

<expression> ::= <term> | <expression> '+' <term> | <expression> '-' <term> <term> ::= <factor> | <term> '*' <factor> | <term> '/' <factor> <factor> ::= <number> | '-' <number> | '(' <expression> ')' | '-' '(' <expression> ')'

Alternatively, we could produce this grammar, which is *right recursive*:

<expression> ::= <term> | <term> '+' <expression> | <term> '-' <expression> <term> ::= <factor> | <factor> '*' <term> | <factor> '/' <term> <factor> ::= <number> | '-' <number> | '(' <expression> ')' | '-' '(' <expression> ')'

Which is correct? First and foremost, *these two grammars describe
the exact same set of strings* and therefore *they describe the
same language*, so long as you define the language only as the set
of strings, without ascribing any meanngs to the strings.

The problem we face is this: These two grammars give distinctly different
interpretations to such expressions as "`5*6/7`". The left-recursive
grammar above gives the left tree below,
while the right-recursive grammar gives tree to the right:

5 * 6 / 7 5 * 6 / 7 | | | | | | | | | | factor | factor | factor factor | factor | factor | | | | / \ | | | | term | / / / \ \ \ | term \_ | / / / \ \ \ | _/ \ | / / / \ \ \ | / term / / \ \ term \ / __/ \__ \ / \ | / \ | / term term | | expression expression

Unless we apply some very complex semantic rules, the first grammar above and the left parse tree imply left-to-right evaluation of operators, giving the value 4, while the second grammar and left parse tree above imply right to left evaluation, giving the value 0.

(Semantic rules can be formulated that give either value for either parse tree, but to do this, we have to interpret the value of each subtree as something quite different from a simple integer value, for example, as a list, with complex semantics at each level for constructing new lists, and an evaluator at the very top level that actually applies the operators.)

The basic shift-reduce parser operates by scanning the input text from end to end (left to right, in European languages, but right to left in Arabic or Hebrew). In addition to the input text, the parser has a push-down stack, and the ability to examine the top elements of the stack. In the following figures, we will show the stack top to the right, separated from the input text by a double vertical bar. Consider this input text, shown with an empty stack:

|| - 4 * 5 + 6 - ( 7 + 8 ) / 9

We will parse this against the following grammar, the left recursive grammar presented previously. In order to simplify the presentation, we'll number the production rules:

1 <expression> ::= <term> 2 | <expression> '+' <term> 3 | <expression> '-' <term> 4 <term> ::= <factor> 5 | <term> '*' <factor> 6 | <term> '/' <factor> 7 <factor> ::= <number> 8 | '-' <number> 9 | '(' <expression> ')' 10 | '-' '(' <expression> ')'

Our parser begins by reading one symbol from the input and pushing it onto the stack:

- || 4 * 5 + 6 - ( 7 + 8 ) / 9

Our grammar does not have any production rules that match a single minus sign, so we repeat, scanning forward another lexeme:

- <number> || * 5 + 6 - ( 7 + 8 ) / 9

At this point, we can apply rule 8, reducing the stack to this:

<factor> || * 5 + 6 - ( 7 + 8 ) / 9

A factor standing alone is a term, according to rule 4, so we can also do this reduction:

<term> || * 5 + 6 - ( 7 + 8 ) / 9

How did we know not to apply rule 1 to the above?
The answer is, we peeked ahead
at the next lexeme, the `'*'`, and noticed that
`<term> '*'` shows up at the start of another rule, but
`<expression> '*'` does not.

Now, we shift in another symbol:

<term> * || 5 + 6 - ( 7 + 8 ) / 9

No rules match at this point, so we shift in yet another symbol:

<term> * <number> || + 6 - ( 7 + 8 ) / 9

At this point, we can apply rule 7 to reduce the number to a factor:

<term> * <factor> || + 6 - ( 7 + 8 ) / 9

Now, we can apply rule 5:

<term> || + 6 - ( 7 + 8 ) / 9

This time, we can apply rule 1 because
the next lexeme is `'+'` and
`<expression> '+'` does show up at the head of a rule.
This gives us:

<expression> || + 6 - ( 7 + 8 ) / 9

And then shift twice:

<expression> + || 6 - ( 7 + 8 ) / 9 <expression> + <number> || - ( 7 + 8 ) / 9

And then apply rules 7 and 4 and then 2:

<expression> + <factor> || - ( 7 + 8 ) / 9 <expression> + <term> || - ( 7 + 8 ) / 9 <expression> || - ( 7 + 8 ) / 9

Now, we shift 3 times before we find any rules we can apply:

<expression> - || ( 7 + 8 ) / 9 <expression> - ( || 7 + 8 ) / 9 <expression> - ( >number< || + 8 ) / 9

At this point, we can apply rules 7 and 4 and then 1:

<expression> - ( >factor< || + 8 ) / 9 <expression> - ( >term< || + 8 ) / 9 <expression> - ( >expression< || + 8 ) / 9

No rules apply here, so we shift and shift again:

<expression> - ( >expression< + || 8 ) / 9 <expression> - ( >expression< + <number> || ) / 9

At this point, we can apply rules 7, 4 and 2:

No rules apply here, so we shift, allowing us to apply rules 9 and 4:<expression> - ( >expression< + <factor> || ) / 9 <expression> - ( >expression< + <term> || ) / 9 <expression> - ( >expression< || ) / 9

<expression> - ( >expression< ) || / 9 <expression> - >factor< || / 9 <expression> - >term< || / 9

Again, we did not apply rule 1 because we looked ahead and noticed
that `<term> '/'` is the start of a rule, while
`<expression> '/'` is not.

At this point, we shift twice, and then apply rules 7, 6 and 3:

<expression> - >term< / || 9 <expression> - >term< / <number> || <expression> - >term< / <factor> || <expression> - >term< || <expression> ||

And we are done! No more reductions are possible and there is no further input to shift into the stack. If we reach this state with the stack containing the sentence symbol of our grammar, the input was syntactically correct. If we reach this state with anything else on the stack, the input contained a syntax error. Unfortunately, this parsing method does not directly produce any hint about the nature of the error.

If we had not had the idea of looking ahead into the input string, we'd have had to do a backtrack search for a parse. The idea of looking ahead can be formalized by listing, for each production rule, the set next symbols that permit it to be applied. This is a purely mechanical job, and it is possible to write a program that takes a grammar as input and either produces the table giving, for each rule, the next symbol that permits that rule, or rejects the grammar.

Rejecting the grammar does not mean that this method cannot be used to parse the language that this grammar describes, but rather, it means that the grammar itself is not suited to this method. There may be other grammars for the same language that this method can be applied to. If given a "parser construction" tool that produces the parse tables for a shift-reduce parser, the compiler writer will typically need to spend time "massaging" the grammar, reformulating it to meet the constraints of the parser generator.