Branch prediction is one of the most studied problems in computer architecture. Modern out-of-order processors execute dozens of instructions ahead of where they actually are in program flow, which means a mispredicted branch can flush a pipeline with 15–20 stages and waste significant work. Understanding prediction strategies is fundamental to understanding processor performance.
On a modern superscalar processor:
At 3 GHz with 5% misprediction rate, a 15-cycle penalty costs you ~450 million wasted cycles per second per core — roughly 15% IPC loss for a typical workload.
The simplest useful predictor is the Smith n-bit counter. Each branch maps to a saturating counter with 2ⁿ states. For a 2-bit counter:
States: Strongly Not-Taken (00) → Weakly Not-Taken (01) → Weakly Taken (10) → Strongly Taken (11)
The advantage over a 1-bit counter is that a single anomalous branch outcome (e.g., a loop exit) doesn't flip the prediction. The "strongly" states provide hysteresis.
Limitation: A single counter for the entire program ignores that different branches have different behavior. The Bimodal predictor fixes this.
The Bimodal predictor uses the lower bits of the Program Counter (PC) to index a table of 2-bit counters. Each branch gets its own counter (modulo table size).
index = PC[k:1] // lower k bits of PC
prediction = table[index][1] // MSB of counter = taken/not-taken
Aliasing: Two different branches can map to the same table entry and interfere with each other. Larger tables reduce aliasing but increase hardware cost.
GShare improves on Bimodal by incorporating global branch history into the index:
index = (PC[k:1]) XOR (global_history_register[k:1])
The Global History Register (GHR) is a shift register that records the last k branch outcomes. XORing it with the PC means that the same branch instruction maps to different table entries depending on what branches preceded it. This is powerful for correlated branches — common in loops and if-else chains.
Simulating on SPECint95 traces (from my BranchPredictionSim project):
| Predictor | Bits | Avg Misprediction Rate | |-----------|------|----------------------| | Smith 2-bit | — | ~9.5% | | Bimodal | 13-bit index | ~4.8% | | GShare | 13-bit history+PC | ~3.2% | | Hybrid (chooser) | 13-bit | ~2.9% |
The hybrid predictor dynamically selects between Bimodal and GShare using a meta-predictor, slightly improving over GShare alone.
No single predictor dominates all workloads. GShare excels on correlated branches (loops, nested conditions) but can underperform Bimodal on branches with strong local patterns. The hybrid predictor hedges, but adds hardware cost. Modern processors use more sophisticated TAGE (Tagged Geometric history length) predictors, but the intuitions from these classical designs remain foundational.