# The union-find structure and semantic parsing

In mathematics, a set is a collection of objects, regardless of their nature. For example, we can have a set of sets. Two sets are disjoint if they have no elements in common. A *disjoint-set (data) structure* (also called *union-find structure*) is a set of mutually disjoint (i.e. non-overlapping) sets.

A disjoint-set structure is, for example, `{{1}, {2}, {3}, {4}, {5}}`

, whose elements are called singletons because they each have exactly one element (a natural number). Two sets can be merged via the so-called *union* operation. For example, we can merge `{1}`

and `{2}`

, getting a new disjoint-set structure, namely `{{1, 2}, {3}, {4}, {5}}`

.

The simplest way of implementing a disjoint-set structure is like this:

function makeSet(x) x.parent = x function find(x) if (x.parent == x) return x else return find(x.parent) function union(x, y) xRoot = Find(x) yRoot = Find(y) xRoot.parent = yRoot

Each set is implemented as a tree and represented by its root. What are disjoint-set structures good for? The can be used, for example, to semantically analyse sentences. Consider the sentence

Brutus stabbed Caesar.

The words occurring in it can be characterised as follows:

Word | Syntax | Semantics |
---|---|---|

Brutus | NP | Brutus(x1) |

stabbed | V | stab(e1, x2, x3) |

Caesar | NP | Caesar(x4) |

*Brutus* and *Caesar* are noun phrases (NP) and *stabbed* is a verb (V). Semantically, *stab* is a ternary predicate, `stab(e1, x2, x3)`

, where `e1`

is an event representing a stabbing action, `x2`

is the agent of the action (the stabber) and `x3`

the undergoer (the stabbee).

We can use a *chart* (a data structure invented by Alain Colmerauer for efficient parsing) to represent the sentence:

Brutus(x1) stabbed(e1,x2,x3) Caesar(x4) +—————————————————+—————————————————+—————————————————+

The objects figuring in the sentence (or the complex event) can be represented as a disjoint-set structure: `{{e1}, {x1}, {x2}, {x3}, {x4}}`

.
Applying the syntactic rule `VP → V NP`

, we can add a new edge to the chart:

stabbed(e1,x2,x3) & Caesar(x4) +———————————————————————————————————+ Brutus(x1) |stabbed(e1,x2,x3) Caesar(x4) | +—————————————————+—————————————————+—————————————————+

However, the application of this rule also unifies `x3`

and `x4`

so we need to merge the two singletons:

stabbed(e1,x2,x3) & Caesar(x4) x3 = x4 +———————————————————————————————————+ Brutus(x1) |stabbed(e1,x2,x3) Caesar(x4) | +—————————————————+—————————————————+—————————————————+

We now apply the rule `S → NP VP`

, which adds another edge to the chart spanning the whole sentence:

Brutus(x1) & stabbed(e1,x2,x3) & Caesar(x4) x1 = x2 & x3 = x4 +—————————————————————————————————————————————————————+ | stabbed(e1,x2,x3) & Caesar(x4) | | x3 = x4 | | +———————————————————————————————————+ | Brutus(x1) |stabbed(e1,x2,x3) Caesar(x4) | +—————————————————+—————————————————+—————————————————+

This rule application unifies `x1`

and `x2`

. The disjoint-set structure now is `{{e1}, {x1, x2}, {x3, x4}}`

and if we rename the objects that are equal, the sentence is semantically represented by the formula `Brutus(x1) & stab(e1,x1,x3) & Caesar(x3)`

.