# The union-find structure and semantic parsing

In mathematics, a set is a collection of objects, regardless of their nature. For example, we can have a set of sets. Two sets are disjoint if they have no elements in common. A disjoint-set (data) structure (also called union-find structure) is a set of mutually disjoint (i.e. non-overlapping) sets.

A disjoint-set structure is, for example, `{{1}, {2}, {3}, {4}, {5}}`, whose elements are called singletons because they each have exactly one element (a natural number). Two sets can be merged via the so-called union operation. For example, we can merge `{1}` and `{2}`, getting a new disjoint-set structure, namely `{{1, 2}, {3}, {4}, {5}}`.

The simplest way of implementing a disjoint-set structure is like this:

```function makeSet(x)
x.parent = x

function find(x)
if (x.parent == x)
return x
else
return find(x.parent)

function union(x, y)
xRoot = Find(x)
yRoot = Find(y)
xRoot.parent = yRoot
```

Each set is implemented as a tree and represented by its root. What are disjoint-set structures good for? The can be used, for example, to semantically analyse sentences. Consider the sentence

Brutus stabbed Caesar.

The words occurring in it can be characterised as follows:

WordSyntaxSemantics
BrutusNPBrutus(x1)
stabbedVstab(e1, x2, x3)
CaesarNPCaesar(x4)

Brutus and Caesar are noun phrases (NP) and stabbed is a verb (V). Semantically, stab is a ternary predicate, `stab(e1, x2, x3)`, where `e1` is an event representing a stabbing action, `x2` is the agent of the action (the stabber) and `x3` the undergoer (the stabbee).

We can use a chart (a data structure invented by Alain Colmerauer for efficient parsing) to represent the sentence:

```     Brutus(x1)    stabbed(e1,x2,x3)    Caesar(x4)
+—————————————————+—————————————————+—————————————————+
```

The objects figuring in the sentence (or the complex event) can be represented as a disjoint-set structure: `{{e1}, {x1}, {x2}, {x3}, {x4}}`. Applying the syntactic rule `VP → V NP`, we can add a new edge to the chart:

```                     stabbed(e1,x2,x3) & Caesar(x4)
+———————————————————————————————————+
Brutus(x1)   |stabbed(e1,x2,x3)    Caesar(x4)    |
+—————————————————+—————————————————+—————————————————+
```

However, the application of this rule also unifies `x3` and `x4` so we need to merge the two singletons:

```                     stabbed(e1,x2,x3) & Caesar(x4)
x3 = x4
+———————————————————————————————————+
Brutus(x1)   |stabbed(e1,x2,x3)    Caesar(x4)    |
+—————————————————+—————————————————+—————————————————+
```

We now apply the rule `S → NP VP`, which adds another edge to the chart spanning the whole sentence:

```     Brutus(x1) & stabbed(e1,x2,x3) & Caesar(x4)
x1 = x2 & x3 = x4
+—————————————————————————————————————————————————————+
|                    stabbed(e1,x2,x3) & Caesar(x4)   |
|                               x3 = x4               |
|                 +———————————————————————————————————+
|    Brutus(x1)   |stabbed(e1,x2,x3)    Caesar(x4)    |
+—————————————————+—————————————————+—————————————————+
```

This rule application unifies `x1` and `x2`. The disjoint-set structure now is `{{e1}, {x1, x2}, {x3, x4}}` and if we rename the objects that are equal, the sentence is semantically represented by the formula `Brutus(x1) & stab(e1,x1,x3) & Caesar(x3)`.

NlpAlgorithmSemantics

#### Written byPetr Homola

Studied physics & CS, PhD in NLP, interested in AI, HPC & PLT