Binary Search #

AlgorithmsDifficulty: ★★☆☆☆Depth: 3Unlocks: 0

O(log n) search in sorted array. Divide and conquer.

Interactive Visualization #

⏮◀◀▶▶STEP0.25x1xZOOM

t=0s

Core Concepts #

-Array is sorted (nondecreasing order).
-Maintain a search interval [low,high] that, if target exists, always contains its index; stop when interval is empty or target is found.

Key Symbols & Notation #

low (lowest index of current interval)high (highest index of current interval)

Essential Relationships #

-Compute mid = (low + high) div 2; compare A[mid] to target: if equal -> found; if A[mid] < target -> low = mid + 1; else high = mid - 1 (this halves the interval).

Prerequisites (2) #

Arrays5 atoms Big O Notation6 atoms

Referenced by (1) #

Where this concept shows up in the operating-finance and personal-finance graphs.

From Business (1) #

[IRRBusiness

Bisection for IRR is literally binary search on a continuous domain - bracket the rate where NPV changes sign, halve the interval, repeat. The O(log n) convergence logic and divide-and-conquer structure are identical.](/business/irr/)

Advanced Learning Details

Graph Position #

Depth Cost

Fan-Out (ROI)

Bottleneck Score

Chain Length

Cognitive Load #

Atomic Elements

Total Elements

Percentile Level

Atomic Level

All Concepts (13) #

- Sorted array (total order on elements; array must be ordered ascending or descending for binary search to work)
- Divide-and-conquer application: reduce problem by splitting the search interval in two and discarding one half
- Search interval representation: maintaining low and high indices that bound the current candidate subarray
- Midpoint selection: choosing a middle index of the current interval to test
- Halving property: each comparison reduces the size of the candidate interval by about half
- Termination conditions: when midpoint equals target (found) or when interval becomes empty (not found)
- Loop vs recursion control: implementing repeated halving either iteratively or recursively
- Invariants for correctness: e.g., if target exists in the array then it is always within A[low..high]
- Inclusive vs exclusive bounds variants (e.g., [low, high] vs [low, high) semantics) and their impact on updates
- Off-by-one issues: correct index updates (low = mid+1, high = mid-1, or their variants) to avoid skipping elements
- Integer-midpoint rounding: the need to floor/truncate when computing the middle index
- Overflow-safe midpoint computation: using low + (high - low) // 2 instead of (low + high) // 2 in fixed-width integer contexts
- Handling duplicates: distinguishing finding any occurrence versus finding first/last occurrence requires modified update rules

Teaching Strategy #

Self-serve tutorial - low prerequisites, straightforward concepts.

You have a sorted list of a million items and need to know whether a value is present. Scanning from left to right works—but it wastes the fact that the list is sorted. Binary search is the classic way to turn “sorted order” into a dramatic speedup: from O(n) down to O(log n).

TL;DR:

Binary search repeatedly halves a sorted array’s remaining search interval. Maintain an interval [low, high] that always contains the target’s index if it exists. Check mid; if mid is too small, move low up; if too large, move high down. This takes O(log n) time and O(1) extra space.

What Is Binary Search? #

The problem it solves #

You’re given an array A and a target value. You want to find the index of target (or decide it isn’t present).

•With an unsorted array, the safest approach is linear search: check each element until you find target or reach the end. That’s O(n).
•With a sorted array (nondecreasing order), you can do much better by using the ordering information.

Binary search is a divide-and-conquer algorithm for searching in a sorted array. The key idea: each comparison lets you discard half of the remaining candidates.

What “sorted” means here #

Assume the array is sorted in nondecreasing order:

•For indices i < j, we have A[i] ≤ A[j].

Nondecreasing allows duplicates (e.g., [1, 2, 2, 2, 5]). Binary search still works, but you must be clear on what you want when duplicates exist (any occurrence? first? last?). We’ll cover variants later.

The mental model: shrinking an interval #

Binary search maintains a search interval of indices where the target could still be.

•low = lowest index still possible
•high = highest index still possible

At all times, you try to preserve this loop invariant:

Invariant: If target appears in the array, then its index lies within [low, high].

Each step picks the middle index mid, compares A[mid] to target, and throws away the half that can’t contain the target.

Prerequisite note (to avoid hidden gaps) #

Before you code binary search confidently, you should be comfortable with:

1)Integer division / floor: computing mid using floor behavior (e.g., (low + high) // 2).
2)Loop invariants: understanding and maintaining “what is always true” about [low, high] each iteration.
3)Edge cases: empty arrays, single-element arrays, and duplicate values.
4)(Optional but useful) Comparator-based ordering: arrays sorted descending or by a custom key require adjusting comparisons.

These aren’t “extra topics”—they are exactly where most binary search bugs come from.

Why it’s O(log n) #

Each iteration cuts the interval size roughly in half.

If the array length is n:

•after 1 step: ~n/2 candidates remain
•after 2 steps: ~n/4
•after k steps: ~n/2ᵏ

We stop when the interval becomes empty or we find the target. The smallest k such that n/2ᵏ ≤ 1 is k ≥ log₂(n). So the number of iterations is O(log n).

Core Mechanic 1: Maintain the Search Interval [low, high] (Loop Invariant) #

Why the interval matters #

Binary search isn’t just “check middle and move left/right.” It’s a careful promise about what indices are still possible.

If you lose that promise (the invariant), you can:

•skip over the target
•loop forever
•read out of bounds

So we’ll be explicit about the invariant and how each update preserves it.

The standard exact-match version (closed interval) #

We’ll use a closed interval [low, high] inclusive. Initialize:

•low = 0
•high = n - 1

Loop while the interval is non-empty:

•condition: low <= high

Compute middle:

•mid = low + (high - low) // 2

(The low + (high - low) form avoids overflow in languages with fixed-size integers; in Python it’s not necessary, but it’s a good habit.)

Compare:

•If A[mid] == target: return mid
•If A[mid] < target: discard left half including mid → set low = mid + 1
•If A[mid] > target: discard right half including mid → set high = mid - 1

If the loop ends, return “not found” (often -1).

Showing the invariant preservation #

Let’s spell out what each case means.

Assume the array is sorted nondecreasing.

Case 1: A[mid] < target #

Because the array is sorted, for every i ≤ mid, we have A[i] ≤ A[mid] < target.

So no index ≤ mid can hold target.

Therefore, if target exists, it must be in indices mid + 1 ... high. Setting low = mid + 1 preserves:

If target exists, it’s still in [low, high].

Case 2: A[mid] > target #

Similarly, for every i ≥ mid, we have A[i] ≥ A[mid] > target.

So no index ≥ mid can hold target.

Therefore, if target exists, it must be in indices low ... mid - 1. Setting high = mid - 1 preserves the invariant.

Case 3: A[mid] == target #

We’re done.

Termination: why it must end #

Each iteration strictly shrinks the interval:

•low increases (to mid + 1) or
•high decreases (to mid - 1) or
•we return

Since low and high are integers and the interval length can’t shrink forever without becoming empty, the loop terminates.

Concrete pseudocode #

binary_search(A, target):
    low = 0
    high = len(A) - 1

    while low <= high:
        mid = low + (high - low) // 2

        if A[mid] == target:
            return mid
        else if A[mid] < target:
            low = mid + 1
        else:
            high = mid - 1

    return -1

Breathing room: trace one run #

Suppose A = [2, 5, 7, 12, 19, 23, 31], target = 19.

•Start: low=0, high=6
•mid=3 → A[3]=12 < 19 → low=4
•Now low=4, high=6
•mid=5 → A[5]=23 > 19 → high=4
•Now low=4, high=4
•mid=4 → A[4]=19 == 19 → found at 4

Notice how [low, high] always contained index 4 until we found it.

Core Mechanic 2: Getting Boundaries Right (mid, duplicates, and interval conventions) #

Binary search is famous for off-by-one errors. Most bugs come from mixing interval conventions or mishandling duplicates.

1) Computing mid safely and correctly #

The classic formula is:

•mid = (low + high) // 2

In many languages, low + high can overflow if indices are large. A safer version is:

•mid = low + (high - low) // 2

Both choose the “lower middle” when there are two middles.

2) Closed vs half-open intervals #

There are two common interval conventions:

Convention	Interval	Loop condition	Update when going right	Update when going left
Closed	`[low, high]`	`low <= high`	`low = mid + 1`	`high = mid - 1`
Half-open	`[low, high)`	`low < high`	`low = mid + 1` or `low = mid` (depends)	`high = mid`

For beginners, closed intervals are often easier to reason about for exact-match search.

The most important rule is: pick one convention and stay consistent.

3) Duplicates: “any occurrence” vs “first/last occurrence” #

If duplicates exist, the exact-match binary search above can return any index where A[mid] == target.

Sometimes you need a stable boundary, like:

•lower bound: first index i where A[i] ≥ target
•upper bound: first index i where A[i] > target

These are incredibly useful in practice (counting occurrences, insertion positions, range queries).

Lower bound (first position with A[i] ≥ target) #

A common pattern uses a half-open interval [low, high):

•Initialize: low = 0, high = n
•While low < high:
•mid = low + (high - low) // 2
•If A[mid] < target, discard left including mid → low = mid + 1
•Else, keep mid (could be answer) → high = mid

At the end, low is the smallest index with A[low] ≥ target (or n if none).

Why does high = mid (not mid - 1)? Because in half-open intervals, high is exclusive, and we want to keep mid as a candidate.

Upper bound (first position with A[i] > target) #

Same structure, just change the comparison:

•If A[mid] <= target, move low = mid + 1
•Else, high = mid

Then the count of occurrences of target is:

count=upper_bound(A,target)−lower_bound(A,target)\text{count} = \text{upper\_bound}(A, target) - \text{lower\_bound}(A, target)count=upper_bound(A,target)−lower_bound(A,target)

4) Empty arrays and single-element arrays #

Binary search should handle these without special-casing if your loop condition is correct.

•Empty: n = 0 → high = -1 → low <= high is false immediately → return -1.
•Single element: low=0, high=0 → one iteration checks that element.

5) Custom orderings (optional but important) #

If the array is sorted descending, the direction flips:

•If A[mid] < target, you must go left (because values decrease to the right).

If the array is sorted by a key (e.g., objects sorted by age), compare using that key consistently.

A useful way to stay sane is to think: you need a monotonic predicate (something that switches from false to true once). Lower/upper bound are built exactly on that principle.

Applications and Connections (Where Binary Search Shows Up) #

1) Fast lookup in sorted data #

Binary search is the core primitive behind many “fast membership” tasks when data is stored sorted:

•searching in a sorted array
•searching in a sorted list of timestamps
•dictionary-like behavior when updates are rare but queries are frequent

Time complexity is O(log n) per query, with O(1) extra space.

2) Insertion position and maintaining sorted order #

Using lower bound, you can find where to insert an element while maintaining sorted order.

•Find index i = lower_bound(A, x)
•Insert at i (in an array, insertion itself is O(n) due to shifting, but the search for position is O(log n))

This is common in:

•building sorted lists incrementally
•coordinate compression (collect values, sort, then binary search indices)

3) Range counting and frequency queries #

If the array is sorted, you can count how many elements equal target or lie in a range [L, R].

•occurrences of x: ub(x) - lb(x)
•count in [L, R]: lb(R+ε) - lb(L) (conceptually), or more directly ub(R) - lb(L)

4) Binary search on the answer (a bigger idea) #

A powerful pattern is binary searching over a value space when you have a monotonic condition.

Example shape:

•You want the smallest value k such that feasible(k) is true.
•If feasible(k) is true, then feasible(k+1) is also true (monotonic).

Then you can binary search k even if there is no array.

This is used for:

•minimum capacity / maximum speed problems
•scheduling and allocation
•optimization with monotonic constraints

This lesson node focuses on array search, but learning lower/upper bound sets you up for that broader technique.

Worked Examples (3) #

Exact match search (closed interval) with a full trace #

Array A is sorted: A = [1, 4, 6, 9, 13, 18, 21, 27]. Find target = 13 using binary search with [low, high].

Initialize:
low = 0
high = 7 (n-1)
Invariant: if 13 exists, it is in indices [0,7].
Iteration 1:
mid = low + (high - low)//2 = 0 + (7-0)//2 = 3
A[mid] = A[3] = 9
Compare: 9 < 13, so discard indices ≤ 3.
Update: low = mid + 1 = 4
Now interval is [4,7].
Iteration 2:
mid = 4 + (7-4)//2 = 4 + 3//2 = 5
A[5] = 18
Compare: 18 > 13, so discard indices ≥ 5.
Update: high = mid - 1 = 4
Now interval is [4,4].
Iteration 3:
mid = 4 + (4-4)//2 = 4
A[4] = 13
Compare: 13 == 13 → found.
Return index 4.

Insight: The invariant makes the updates feel inevitable: once A[mid] is too small, everything left of mid is too small because the array is sorted.

Lower/upper bound to count duplicates #

A = [1, 2, 2, 2, 3, 5, 5, 9]. Compute how many times target = 2 appears using lower_bound and upper_bound (half-open intervals).

Goal:
count(2) = upper_bound(2) - lower_bound(2)
where:
- •lower_bound finds first index with A[i] ≥ 2
- •upper_bound finds first index with A[i] > 2
Compute lower_bound(2):
Initialize low=0, high=8 (n)
While low < high:
- •mid = 0 + (8-0)//2 = 4, A[4]=3
A[mid] < 2? No (3 < 2 is false) → high = mid = 4
- •mid = 0 + (4-0)//2 = 2, A[2]=2
A[mid] < 2? No (2 < 2 false) → high = 2
- •mid = 0 + (2-0)//2 = 1, A[1]=2
A[mid] < 2? No → high = 1
- •mid = 0 + (1-0)//2 = 0, A[0]=1
A[mid] < 2? Yes → low = mid + 1 = 1
Stop (low == high == 1). So lower_bound(2) = 1.
Compute upper_bound(2):
Initialize low=0, high=8
While low < high:
- •mid = 4, A[4]=3
A[mid] <= 2? No → high=4
- •mid = 2, A[2]=2
A[mid] <= 2? Yes → low = mid + 1 = 3
- •mid = 3, A[3]=2
A[mid] <= 2? Yes → low = 4
Stop (low == high == 4). So upper_bound(2) = 4.
Compute count:
count = upper_bound - lower_bound = 4 - 1 = 3

Insight: Lower/upper bound treat equality differently. That tiny change ( < vs ≤ ) is what turns “find a match” into “find a boundary,” which is often more useful than returning any one index.

Not found case and why the loop ends correctly #

A = [3, 8, 10, 14, 17]. Search for target = 9 using the closed-interval exact-match binary search.

Initialize:
low=0, high=4
Iteration 1:
mid = 0 + (4-0)//2 = 2
A[2]=10
10 > 9 → high = mid - 1 = 1
Interval becomes [0,1]
Iteration 2:
mid = 0 + (1-0)//2 = 0
A[0]=3
3 < 9 → low = mid + 1 = 1
Interval becomes [1,1]
Iteration 3:
mid = 1 + (1-1)//2 = 1
A[1]=8
8 < 9 → low = mid + 1 = 2
Interval becomes [2,1] (empty because low > high)
Loop ends because low <= high is false.
Return -1 (not found).

Insight: The empty interval (low > high) is the precise signal that there is no index left where target could be, so “not found” is justified—not just a guess.

Key Takeaways #

✓
Binary search requires a sorted array (nondecreasing is the usual assumption).
✓
Maintain a search interval: with a closed interval, the loop invariant is “if target exists, it’s in [low, high].”
✓
Use mid = low + (high - low) // 2 to avoid overflow and to get correct integer rounding.
✓
Closed interval exact-match pattern: while low <= high, then update with low = mid + 1 or high = mid - 1.
✓
Binary search runs in O(log n) time because each comparison discards about half the remaining candidates.
✓
With duplicates, exact-match may return any occurrence; use lower_bound / upper_bound to get first/last positions and counts.
✓
Be consistent about interval conventions ([low, high] vs [low, high)) to avoid off-by-one errors.

Common Mistakes #

✗
Using the wrong loop condition for your interval convention (e.g., using low < high with updates meant for [low, high]).
✗
Updating to low = mid or high = mid in a closed-interval exact-match search, which can cause infinite loops when low == mid or high == mid.
✗
Computing mid = (low + high) // 2 in a language where low + high can overflow, leading to negative or incorrect mid.
✗
For duplicates, expecting the exact-match search to return the first (or last) occurrence without implementing the boundary-search variant.

Practice #

easy

Implement the closed-interval binary search that returns the index of target in a sorted array A, or -1 if not found. Then trace it on A = [1, 2, 4, 8, 16] with target = 8.

Hint: Use low=0, high=n-1, loop while low <= high, and update low=mid+1 or high=mid-1.

Show solution

Algorithm:

•low=0, high=n-1
•while low<=high:

mid=low+(high-low)//2

if A[mid]==target return mid

if A[mid]<target low=mid+1 else high=mid-1

•return -1

Trace:

A=[1,2,4,8,16], target=8

low=0 high=4 mid=2 A[2]=4<8 → low=3

low=3 high=4 mid=3 A[3]=8==8 → return 3

medium

Given A = [1, 2, 2, 2, 3, 3, 10], find (a) lower_bound(3) and (b) upper_bound(3), and use them to compute how many 3s are in A.

Hint: Lower bound uses condition A[mid] < target; upper bound uses A[mid] <= target. Use a half-open interval [low, high) with high=n.

Show solution

A length n=7.

•lower_bound(3) returns first index with A[i] ≥ 3. The first 3 is at index 4, so lower_bound(3)=4.
•upper_bound(3) returns first index with A[i] > 3. The first element >3 is 10 at index 6, so upper_bound(3)=6.

Count of 3s = 6 - 4 = 2.

hard

You are given a sorted array A (nondecreasing) and a target. Modify binary search to return the index of the FIRST occurrence of target (or -1 if absent). Describe the rule for updating boundaries when you find equality.

Hint: When A[mid] == target, don’t stop immediately; keep searching left while preserving that the first occurrence is still in the interval.

Show solution

One approach (half-open lower_bound style):

Compute i = lower_bound(A, target) (first index with A[i] ≥ target). If i < n and A[i] == target, return i else return -1.

If you implement directly with a closed interval:

•Keep an ans = -1.
•When A[mid] == target: set ans=mid and move left: high = mid - 1.
•When A[mid] < target: low = mid + 1.
•When A[mid] > target: high = mid - 1.

Return ans at the end.

Key rule: on equality, shrink toward the side where the first occurrence could be (left), rather than terminating.

Connections #

•Prerequisite refresh: Arrays
•Complexity perspective: Big O Notation
•Next useful pattern: Divide and Conquer
•Boundary searching skill: Lower Bound / Upper Bound
•Data structures that use the same idea: Binary Search Trees
•Optimization pattern: Binary Search on Answer

Quality: A (4.5/5)

← back to tree browse all →