next up previous
Next: Data Driven and Knowledge Up: Formula Parsers Previous: Stochastic Grammars

Graph Rewriting

Graph rewriting uses a graph, consisting of nodes and arcs, to represent a formula. ``Rewrite rules'' are used to progressively reduce the graph, repeatedly replacing subgraphs with new graphs. This technique is described by Blostein and Grbavec . Lavirotte and Pottier  present a system they have implemented that uses this technique, processing scanned LATEX formulae, and an optimisation they have developed for speed.

To parse a formula using a graph grammar approach, a graph is first defined that represents the recognised symbols and the geometric relationship between them. Nodes in the graph, one for each symbol, have attributes holding information about the identity, lexical class (letter, digit, etc.) and position of symbols in the formulae. Arcs in the graph represent relative positions of the symbols, above, below, up-right, down-right, etc.

Rules in the grammar are also graphs, typically one to five nodes in size, defining sub-graphs that are searched for in the graph representing the formula. These sub-graphs are typically templates for expressions or parts of expressions found in formulae. As the rules in graph grammars are productions, each rule also contains a second graph that is the result of applying the rule: what the first graph is replaced with. Finally, the rule also contains information on how to transfer the attributes of the nodes of the first graph to the replacement graph. There may also be components in the rule that define how to treat the arcs that were connected to the subgraph that has been removed.

These rules are extended by Lavirotte and Pottier to contain graphs that define the contexts in which it is allowable to apply a particular rule. This increases the efficiency of the parsing process, as the contexts define extra conditions that must be met before a rule is applied. The extra conditions remove some ambiguities from the grammar, and thus reduce the possibility of exploring erroneous derivations due to choosing the wrong rule.

As the parsing process works by successively finding sub-graphs and replacing them with smaller graphs, at the end of the parsing process a single node is left representing the original formula. The original formula can then be determined by examining the attributes of this remaining node.

Graph rewriting is a very general and powerful tool and has been used for a wide range of tasks. Graph rewriting has been used in the recognition and interpretation of schematic diagrams , and the description and processing of ``structured'' pictures, including flow charts, organic chemistry molecules, and images of particle trajectories produced in physics experiments.

Blostein reports that graph grammars are tolerant to irregular symbol positioning, as found in handwritten input. Work by Lavirotte and Pottier has only been on single typeset formulae and works well. Input to their system is scanned images of printed LATEX formulae.

Lavirotte and Pottier optimise the graph reduction process by adding context information to the rules in the graph grammar that avoid ambiguities where two or more rules can be applied in a given situation. These rules are created semi-automatically, also using information such as operator precedence information supplied by the person who builds the grammar.

As a graph rewriting system is used in this thesis, it is described in more detail in Chapter [*].

next up previous
Next: Data Driven and Knowledge Up: Formula Parsers Previous: Stochastic Grammars
Steve Smithies