The union-find structure and semantic parsing
In mathematics, a set is a collection of objects, regardless of their nature. For example, we can have a set of sets. Two sets are disjoint if they have no elements in common. A disjoint-set (data) structure (also called union-find structure) is a set of mutually disjoint (i.e. non-overlapping) sets.
A disjoint-set structure is, for example, {{1}, {2}, {3}, {4}, {5}}, whose elements are called singletons because they each have exactly one element (a natural number). Two sets can be merged via the so-called union operation. For example, we can merge {1} and {2}, getting a new disjoint-set structure, namely {{1, 2}, {3}, {4}, {5}}.
The simplest way of implementing a disjoint-set structure is like this:
function makeSet(x)
x.parent = x
function find(x)
if (x.parent == x)
return x
else
return find(x.parent)
function union(x, y)
xRoot = Find(x)
yRoot = Find(y)
xRoot.parent = yRoot
Each set is implemented as a tree and represented by its root. What are disjoint-set structures good for? The can be used, for example, to semantically analyse sentences. Consider the sentence
Brutus stabbed Caesar.
The words occurring in it can be characterised as follows:
| Word | Syntax | Semantics |
|---|---|---|
| Brutus | NP | Brutus(x1) |
| stabbed | V | stab(e1, x2, x3) |
| Caesar | NP | Caesar(x4) |
Brutus and Caesar are noun phrases (NP) and stabbed is a verb (V). Semantically, stab is a ternary predicate, stab(e1, x2, x3), where e1 is an event representing a stabbing action, x2 is the agent of the action (the stabber) and x3 the undergoer (the stabbee).
We can use a chart (a data structure invented by Alain Colmerauer for efficient parsing) to represent the sentence:
Brutus(x1) stabbed(e1,x2,x3) Caesar(x4) +—————————————————+—————————————————+—————————————————+
The objects figuring in the sentence (or the complex event) can be represented as a disjoint-set structure: {{e1}, {x1}, {x2}, {x3}, {x4}}.
Applying the syntactic rule VP → V NP, we can add a new edge to the chart:
stabbed(e1,x2,x3) & Caesar(x4)
+———————————————————————————————————+
Brutus(x1) |stabbed(e1,x2,x3) Caesar(x4) |
+—————————————————+—————————————————+—————————————————+
However, the application of this rule also unifies x3 and x4 so we need to merge the two singletons:
stabbed(e1,x2,x3) & Caesar(x4)
x3 = x4
+———————————————————————————————————+
Brutus(x1) |stabbed(e1,x2,x3) Caesar(x4) |
+—————————————————+—————————————————+—————————————————+
We now apply the rule S → NP VP, which adds another edge to the chart spanning the whole sentence:
Brutus(x1) & stabbed(e1,x2,x3) & Caesar(x4)
x1 = x2 & x3 = x4
+—————————————————————————————————————————————————————+
| stabbed(e1,x2,x3) & Caesar(x4) |
| x3 = x4 |
| +———————————————————————————————————+
| Brutus(x1) |stabbed(e1,x2,x3) Caesar(x4) |
+—————————————————+—————————————————+—————————————————+
This rule application unifies x1 and x2. The disjoint-set structure now is {{e1}, {x1, x2}, {x3, x4}} and if we rename the objects that are equal, the sentence is semantically represented by the formula Brutus(x1) & stab(e1,x1,x3) & Caesar(x3).