The Formula Processor

The formula processor implemented is a graph rewriting system, similar to that described by Lavirotte and Pottier . Using an input of symbols and their bounding boxes, a graph is constructed with nodes representing the symbols in the formula. Each node has data associated with it, holding the identity of the symbol, the location of its bounding box and a unique ID. As parsing proceeds, nodes will come to represent subexpressions within the formula. Directed arcs are built in the graph, within predetermined restrictions, signifying the geometric relationships between the symbols in the formula. Each arc has a label that stores what geometric relationship it represents between the two nodes it connects.

By making it possible to build more than one arc between any two given nodes, uncertain relationships can be represented. For example, to represent the possibility a symbol may be either to the ``top-right'' or ``right'' of another, two arcs are built between their respective nodes.

The restrictions on the building of arcs limit the arcs to ``sensible'' ones, based on their types and positions relative to each other and to other symbols. This helps the later parsing process by simplifying the graph built.

After being constructed from the symbols in the input formula, the graph is parsed by a graph grammar parser. The grammar used by the parser describes the syntax and layout of mathematical formulae through a number of small, typically one to five node, graphs that are templates for subexpressions that are found in mathematical expressions. Figure

shows some rules from a graph grammar.

**Figure 3.1:** Sample rules from a graph grammar.
$\includegraphics[scale=0.6]{figures/plusrule.eps}$ (a) Addition. $\includegraphics[scale=0.6]{figures/powerrule.eps}$ (b) Superscript. $\includegraphics[scale=0.6]{figures/integralrule.eps}$ (c) Integral with limits.

In Figure 3.1, we can see boxes representing nodes in the graph. Each box has two parts. The lower part is the ``data'' component, and the upper part is the ``type'' component. When rules are being matched to a graph representing a formula, the ``type'' values are considered. When the rule is being applied, the ``data'' components dictate the data component of the resulting node after application. As nodes are combined, a new bounding box is calculated based on the nodes being combined.

The superscript rule in Figure 3.1(b) shows how a superscript operation is defined. Two nodes of type ``element'', with a