Disjoint Sets

←Back to Tech Tree

inventorycoverage

Disjoint Sets #

Data StructuresDifficulty: ★★★☆☆Depth: 4Unlocks: 0

Union-Find structure. Near-constant time union and find.

Interactive Visualization #

⏮◀◀▶▶STEP0.25x1xZOOM

t=0s

Core Concepts #

Key Symbols & Notation #

parent[x] - pointer/index of x's parent (equals x if x is a root)rank[x] - heuristic measure (approximate tree height) used to decide which root becomes parent

Essential Relationships #

Prerequisites (2) #

Trees5 atomsAmortized Analysis5 atoms

Advanced Learning Details

Graph Position #

55

Depth Cost

0

Fan-Out (ROI)

0

Bottleneck Score

4

Chain Length

Cognitive Load #

6

Atomic Elements

34

Total Elements

L2

Percentile Level

L4

Atomic Level

All Concepts (14) #

Teaching Strategy #

Multi-session curriculum - substantial prior knowledge and complex material. Use mastery gates and deliberate practice.

You’re repeatedly asked questions like “Are a and b in the same group?” while also merging groups over time. Disjoint Sets (Union-Find) is the classic structure that makes those queries fast—so fast that in practice it feels constant-time even for huge inputs.

TL;DR:

Disjoint Sets (Union-Find) maintains a partition of elements into non-overlapping sets. It supports

With path compression + union by rank/size, a sequence of m operations on n elements runs in O(m α(n)) time (α is inverse Ackermann), which is “near-constant” amortized.

What Is Disjoint Sets? (Union-Find) #

Why this data structure exists #

Many problems evolve by merging groups and asking connectivity questions:

A naive approach stores each set explicitly (like a list). Then:

Union-Find is designed for exactly this pattern: lots of operations, interleaving unions and queries, where we want each operation to be almost constant time.

The core abstraction: a disjoint partition #

We maintain a collection of sets S₁, S₂, … such that:

This is a partition of U.

The key idea: each set has a representative (root) #

Instead of storing full sets, we store parent pointers that form a forest of rooted trees.

The representative (root) is a canonical “name” for the set. Two elements are in the same set iff they have the same root.

Formally, define:

What operations we support #

We typically implement three operations:

  1. make-set(x)
  1. find(x)
  1. union(a, b)

Data stored (common variant) #

We’ll use these arrays:

Rank is an estimate of tree height; size is exact set size. Either supports good performance when used correctly.

A small picture in words #

Suppose we have elements 1..7. Initially each is its own root.

After unions, we might have:

Then:

Union-Find’s entire job is to maintain this forest efficiently as it changes.

Core Mechanic 1: Trees of Parent Pointers and the Find Operation #

Why find() is the heart of Union-Find #

Every union needs to know which sets it is merging, and every connectivity query needs to know whether two elements share a set. Both tasks reduce to a single primitive:

Find the representative root.

Without optimizations, find(x) just walks up parent pointers:

That is correct—but could be slow if trees get tall.

Correctness invariant #

Union-Find maintains this invariant:

As long as union() only links root to root, and never creates cycles, the forest remains valid.

Baseline find implementation #

Pseudocode (conceptually):

This runs in O(h), where h is the height of x’s tree. If h becomes O(n), find becomes O(n) and everything collapses.

Path compression: the “secret sauce” for find #

Motivation: If we frequently call find(x), it’s wasteful to traverse the same long path over and over.

Path compression flattens the tree during find:

In a recursive form:

find(x):

Now, the next time we call find on any of those nodes, it returns quickly.

What path compression does (and does not do) #

Important subtlety: path compression alone already helps a lot, but to get the famous near-constant bound we pair it with a good union rule (next section).

A step-by-step mental model #

Suppose we have a chain:

Calling find(1) without compression visits 1,2,3,4,5.

With path compression, after find(1):

Now find(2) is essentially O(1).

Complexity intuition (amortized) #

Path compression can make one find operation do extra work (rewiring pointers), but that work pays off later.

Over a long sequence of operations, the total cost is tiny per operation. The formal result (when combined with union by rank/size) is:

where α(n) is the inverse Ackermann function.

For any realistic n (even n ≈ 10²⁰), α(n) ≤ 5. So in practice, Union-Find behaves like constant time.

Implementation detail: iterative path compression #

Some environments avoid recursion. You can still compress paths iteratively in two passes:

  1. Walk up to find the root.

  2. Walk again and set each visited node’s parent to the root.

Both are common; recursive is compact, iterative avoids stack depth issues.

Core Mechanic 2: Union by Rank/Size (Keeping Trees Shallow) #

Why union() needs a strategy #

A union operation merges two sets. Internally, that means we take two roots r₁ and r₂ and make one the parent of the other.

If we do this carelessly (e.g., always parent[r₂] = r₁), we can create tall trees:

Tall trees make find slow.

So union must be disciplined: always attach the “smaller” or “shallower” tree under the “bigger” or “deeper” one.

Union by size #

Store size[root] = number of nodes in that root’s tree.

Algorithm:

  1. r₁ = find(a), r₂ = find(b)

  2. if r₁ = r₂: already same set

  3. attach smaller to larger:

Why it helps: each time a node’s tree is attached under another, the size of the resulting tree at least doubles. A node can only be “moved upward” O(log n) times.

Union by rank #

Store rank[root], which approximates tree height.

Algorithm:

  1. r₁ = find(a), r₂ = find(b)

  2. if r₁ = r₂: return

  3. compare ranks:

Why rank works: ranks only increase when two equal-rank trees are merged, which is rare per node, and creates a provable bound when combined with path compression.

Rank vs size: which should you use? #

Both are excellent. Rank is slightly more common in textbooks; size can be easier to reason about and gives you set sizes for free.

HeuristicStored valueUnion ruleExtra capabilityTypical use
Union by rankrank[root] ≈ heightattach lower-rank under higher-rank; tie ⇒ incrementnone directlyclassic DSU
Union by sizesize[root] = set sizeattach smaller size under larger sizecan answer component sizesmany contest implementations

Either one + path compression ⇒ near-constant amortized time.

The famous time bound (what it really means) #

With both heuristics:

This is an amortized statement:

“Near-constant” is not marketing—it's math #

α(n) grows so slowly that it’s ≤ 4 for values of n far beyond what you can store in memory.

That’s why Union-Find is one of the few structures where theoretical and practical performance line up remarkably well.

A careful invariant to keep #

Union must connect roots to roots.

That means union(a,b) should do:

If you accidentally do parent[a] = b without finding roots, you can break the forest structure and even create cycles, making find incorrect or non-terminating.

Application/Connection: When and How Disjoint Sets Gets Used #

The pattern: dynamic connectivity under unions #

Union-Find shines when:

It does not directly support efficiently removing connections (dynamic connectivity with deletions is harder).

Kruskal’s algorithm (Minimum Spanning Tree) #

In Kruskal’s algorithm, you sort edges by weight and add them if they don’t create a cycle.

The cycle check is exactly:

So DSU provides cycle detection quickly, making Kruskal efficient.

Cycle detection in an undirected graph #

As you scan edges:

This is a lightweight way to detect cycles without full DFS each time.

Connected components as they evolve #

Suppose you’re building components from interactions:

With union by size, you can also answer:

Percolation / grid connectivity #

In an N×N grid, you may open cells and union adjacent open cells.

Queries like “is there a path from top to bottom?” become connectivity checks between virtual nodes representing borders.

DSU and equivalence relations #

Union-Find is a practical way to maintain an equivalence relation ~ over elements:

Union operations add equivalences; find queries membership in equivalence classes.

How it connects to what you already know #

You already know:

Think of DSU as a case study where an extremely simple structure (parent pointers) becomes powerful when paired with the right heuristics.

Implementation checklist (practical) #

When writing DSU in code:

The resulting code is short, reliable, and performance-friendly.

A note on vectors and math notation #

Union-Find is less about vector math (no v or ‖v‖ needed here), and more about combinatorial structure. Still, it’s grounded in precise invariants and amortized bounds, which is why it belongs in a core data structures track.

Worked Examples (3) #

Worked Example 1: Run DSU by hand (with union by rank + path compression) #

Elements: {1,2,3,4,5,6}. Start with make-set on all.

We perform operations:

  1. union(1,2)

  2. union(2,3)

  3. union(4,5)

  4. find(3)

  5. union(3,4)

Track parent[·] and rank[·].

  1. Initialization:

    parent[x]=x for x=1..6

    rank[x]=0 for x=1..6

    1. union(1,2)

    find(1)=1, find(2)=2 (both roots)

    ranks equal (0 and 0) ⇒ choose 1 as new root

    parent[2]=1

    rank[1]=rank[1]+1=1

    Now: 1 is root of {1,2}

    1. union(2,3)

    First find(2):

    • •parent[2]=1, parent[1]=1 ⇒ find(2)=1

    (With path compression, parent[2] already points to 1, so no change.)

    find(3)=3

    Union roots 1 and 3:

    rank[1]=1 > rank[3]=0 ⇒ parent[3]=1

    Now: root 1 represents {1,2,3}

    1. union(4,5)

    find(4)=4, find(5)=5

    ranks equal ⇒ parent[5]=4 and rank[4]=1

    Now: root 4 represents {4,5}

    1. find(3)

    parent[3]=1, parent[1]=1 ⇒ root is 1

    Path compression step sets parent[3]=1 (already 1)

    So find(3)=1

    1. union(3,4)

    find(3)=1

    find(4)=4

    Compare ranks: rank[1]=1, rank[4]=1 (tie)

    Attach one under the other; pick 1 as root:

    parent[4]=1

    Increment rank[1]: rank[1]=2

    Note: node 5 still has parent[5]=4, but 4 now points to 1.

    After this, a future find(5) will compress 5 → 1.

Insight: Two ideas are visible here: (1) union by rank prevents systematic growth of height; (2) path compression doesn’t need to run on every node immediately—future find calls gradually flatten the structure, yielding the amortized near-constant cost.

Worked Example 2: Cycle detection while scanning edges #

Undirected graph with vertices {A,B,C,D}. Edges arrive in this order:

( A,B ), ( B,C ), ( C,D ), ( D,A )

Use DSU to detect when a cycle first appears.

  1. Initialize:

    parent[A]=A, parent[B]=B, parent[C]=C, parent[D]=D

  2. Edge (A,B):

    find(A)=A, find(B)=B ⇒ different

    union(A,B)

    Now A and B are in same set.

  3. Edge (B,C):

    find(B)=find(A)=A (after compression)

    find(C)=C ⇒ different

    union(B,C) merges C into {A,B,C}

  4. Edge (C,D):

    find(C)=A

    find(D)=D ⇒ different

    union(C,D) merges D into {A,B,C,D}

  5. Edge (D,A):

    find(D)=A

    find(A)=A ⇒ same representative

    Therefore adding (D,A) creates a cycle.

    We can report: the first cycle is detected at edge (D,A).

Insight: Cycle detection is just a same-set query: an undirected edge (u,v) forms a cycle exactly when u and v are already connected. DSU turns that global graph property into two near-constant-time finds.

Worked Example 3: Union by size and component size queries #

Elements {1,2,3,4,5,6,7}. Use union by size + path compression.

Operations:

union(1,2), union(2,3), union(4,5), union(6,7), union(5,6)

Then query: size of component containing 7.

  1. Initialization:

    parent[x]=x

    size[x]=1 for all x

  2. union(1,2):

    find(1)=1, find(2)=2

    sizes tie ⇒ choose 1 as root

    parent[2]=1

    size[1]=2

  3. union(2,3):

    find(2)=find(1)=1

    find(3)=3

    Attach smaller to larger:

    parent[3]=1

    size[1]=3

  4. union(4,5):

    parent[5]=4

    size[4]=2

  5. union(6,7):

    parent[7]=6

    size[6]=2

  6. union(5,6):

    find(5): parent[5]=4 ⇒ root 4

    find(6)=6

    sizes: size[4]=2, size[6]=2 tie ⇒ pick 4

    parent[6]=4

    size[4]=4

    Now 7 still points to 6, and 6 points to 4.

  7. Query size of component containing 7:

    find(7): 7 → 6 → 4 (root)

    Path compression updates parent[7]=4 (and possibly parent[6]=4 already)

    Answer = size[4] = 4

Insight: Union by size gives you component sizes essentially for free. Path compression ensures that repeated size queries become extremely fast because nodes quickly learn their true root.

Key Takeaways #

Common Mistakes #

Practice #

easy

You have elements {1..8}. Perform: union(1,2), union(3,4), union(2,3), union(5,6), union(7,8), union(6,7). After these operations, are 1 and 4 connected? Are 5 and 8 connected?

Hint: Connectivity is determined entirely by representatives: check whether find(x) = find(y). You don’t need the exact parent array if you track which components got merged.

Show solution

union(1,2) ⇒ {1,2}

union(3,4) ⇒ {3,4}

union(2,3) merges {1,2} with {3,4} ⇒ {1,2,3,4}

So 1 and 4 are connected.

union(5,6) ⇒ {5,6}

union(7,8) ⇒ {7,8}

union(6,7) merges {5,6} with {7,8} ⇒ {5,6,7,8}

So 5 and 8 are connected.

medium

Consider a DSU using union by rank (no path compression). Prove that the rank of any node is at most ⌊log₂ n⌋, where n is the number of elements.

Hint: Rank increases only when two roots of equal rank are merged. Track a lower bound on the size of a tree as a function of its rank.

Show solution

Claim: a root of rank r has at least 2ʳ nodes in its tree.

Base: r=0 ⇒ singleton tree has size 1 = 2⁰.

Inductive step: rank increases from r to r+1 only when two trees of rank r are merged. By induction each has size ≥ 2ʳ, so merged size ≥ 2ʳ + 2ʳ = 2ʳ⁺¹.

Thus size ≥ 2ʳ for rank r. Since size ≤ n, we have 2ʳ ≤ n ⇒ r ≤ log₂ n, so rank ≤ ⌊log₂ n⌋.

medium

In an undirected graph with vertices 1..n, edges arrive one by one. Describe an algorithm using DSU to output the first edge that creates a cycle, or report that no cycle is created after all edges arrive. Give the time complexity in terms of m edges and n vertices.

Hint: The cycle condition for an undirected edge (u,v) is that u and v are already connected before adding the edge.

Show solution

Algorithm:

Initialize DSU with make-set(1..n).

For each edge (u,v) in arrival order:

If finished without finding such an edge, report “no cycle.”

Time: Each edge does up to two finds and possibly one union. With path compression + union by rank/size, total time is O(m α(n)).

Connections #

Related nodes:

Quality: A (4.6/5)

← back to treebrowse all →