The Formula Parser

The formula parser sometimes was not able to parse a user's formula quickly, or took a long time to determine that a formula was in error. Ideally, parsing should take no more than ten or twenty seconds, but it can take many minutes. Also, there were interpretation problems that increased the parsing time, or returned erroneous interpretations of formulae.

As the parser makes the initial assumption that the first matching rule it finds in the grammar is the correct one to apply, well formed formulae parse well. It is only when it has to backtrack and generate all the children of a node that it takes a significant amount of time. The backtracking occurs either when there is an error in the formula and it has to exhaust all possible paths of derivation before it determines it is unparsable or, more commonly, the parser makes a mistake in its initial interpretation.

To reduce the chance of the parser incorrectly parsing formulae, the
formula parser can be made more selective about the relationships
between items, making the regions accepted by the geometric tests, as
described in Section 3.1.4, smaller. Thus, the
*x*^{2} in Figure 6.9 would be too far from
the 2 to be combined. Unfortunately, doing this also limits the
parser's ability to cope with the unpredictable nature of handwritten
input. It also does not eliminate the problem, it just reduces it.
Another alternative is to change the shape of the areas in which
symbols are looked for, from the overlapping rectangular regions that
are currently being used to something based more on a more empirical
or intelligent study of writing. It might also help to have regions
defined on a per-operator basis. In this way, the region for an upper
limit on an integral would be differently shaped to the region for an
exponent.

The use of context rules, as done by Lavirotte and Pottier and mentioned in Section , would provide the greatest improvement of the system's performance by reducing the frequency of interpretation errors and increasing the parser's speed. The contextual rules that Lavirotte and Pottier add to the grammar allow such things as reserving the limits for the integral. The additional computation required for the contextual checks would most likely be far outweighed by their gains. It is also possible that using some sort of stochastic grammar would help to rule out the unlikely interpretations returned for some formulae, in favour of more likely ones.

The speed of the parsing process can be improved, from the current
*O*(*n*!) time complexity, by using advanced graph matching techniques,
such as those described by Bunke and
Messmer . If *L* = the
number of rules in the grammar, *g* = the number of nodes in the current
formula's graph, and *n* = total number of nodes in a rule in the
grammar, then the speed of their approach varies between *O*(*L*.*g*.*n*) for
best case: when all the rule-graphs have the same structure, and
*O*(*L*.*g*^{n}.*n*^{2}) for the worst case: where none of the rule-graphs
share any common structure.

The current speed of the parser, even when it works well, is still not fast enough for real time processing of formulae. The formula processing will always have to be an additional step that the user waits for. For real time parsing, the parsing would have to be completed within a second of the user writing a symbol, otherwise it runs the risk of seeming too slow. A sacrifice in speed is the tradeoff for having a system which allows completely arbitrary input order. Systems such as Littin's , that use a modified SLR(1) parser to offer real-time parsing of formulae, restrict the order of entry of symbols in a formula.

, that use a modified SLR(1) parser to offer real-time parsing of formulae, restrict the order of entry of symbols in a formula.