CXL (Compute Express Link) is an open standard interconnect built on PCIe that enables high-bandwidth, low-latency communication between host processors and devices such as memory expanders, accelerators, and smart NICs. Simulating CXL-based systems is challenging because it requires modeling both the PCIe physical layer and the CXL protocol stack on top of it.
gem5 is the de-facto standard simulator for computer architecture research. It supports full-system simulation: you can boot a real Linux kernel and run unmodified workloads while observing microarchitectural behavior at any level of detail. For CXL research this is essential, because the operating system's view of CXL memory (device enumeration, NUMA node setup, page placement) is part of the behavior under study and cannot be captured faithfully by user-level simulation alone.
The starting point for CXL memory simulation in gem5 is extending the existing PCIe endpoint model. A CXL Type 3 device (memory expander) exposes additional memory capacity to the host through a CXL.mem protocol layer.
```python
# Example gem5 Python config for a CXL memory expander.
# Note: CXLMemExpander is a custom SimObject built for this project;
# it is not part of stock gem5.
from m5.objects import *

system = System()
system.clk_domain = SrcClockDomain(clock='3GHz',
                                   voltage_domain=VoltageDomain())
system.mem_mode = 'timing'

# Attach the CXL memory expander: 8 GB of device memory mapped at
# host physical address 4 GB.
cxl_mem = CXLMemExpander(
    range=AddrRange('4GB', size='8GB'),
    latency='150ns',  # approximate CXL.mem add-on latency
)
system.cxl_devices = [cxl_mem]
```
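The config above uses a `CXLMemExpander` SimObject that stock gem5 does not provide. As a rough sketch of what its Python declaration might look like (the class name, header path, and parameter set are all assumptions, not gem5 API), one could extend gem5's existing `PciDevice` along these lines:

```python
# Hypothetical SimObject declaration for the CXL memory expander.
# Base class, header path, and parameter set are illustrative only.
from m5.params import Param
from m5.objects.PciDevice import PciDevice

class CXLMemExpander(PciDevice):
    type = 'CXLMemExpander'
    cxx_header = "dev/cxl/cxl_mem_expander.hh"  # hypothetical path
    cxx_class = "gem5::CXLMemExpander"

    # Host physical address range backed by the expander's memory.
    range = Param.AddrRange("address range served via CXL.mem")
    # Extra latency charged to each CXL.mem access, on top of the
    # device-side DRAM latency.
    latency = Param.Latency('150ns', "CXL.mem add-on latency")
```

On the C++ side, the device would advertise `range` through `getAddrRanges()` and delay each memory response by the configured `latency`, following gem5's usual timing-mode port protocol.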
- **Latency:** CXL.mem adds approximately 100–200 ns on top of local DRAM latency. This is a critical parameter to get right because it significantly affects workload performance.
- **Bandwidth:** CXL 1.1 over PCIe Gen 5 x16 provides ~64 GB/s in each direction, though practical bandwidth depends on protocol overhead and request concurrency.
- **NUMA topology:** The OS sees CXL-attached memory as a separate, CPU-less NUMA node. Proper NUMA modeling is important for realistic workload behavior.
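To sanity-check these parameters, it helps to work the numbers. The sketch below derives the Gen 5 x16 per-direction bandwidth from first principles (32 GT/s per lane, 16 lanes, 128b/130b encoding) and estimates the average memory access time when some fraction of accesses land in CXL memory; the local DRAM latency and the 25% access fraction are illustrative assumptions.

```python
# Back-of-envelope numbers for the parameters above.
local_dram_ns = 90.0   # assumed local DRAM load-to-use latency
cxl_addon_ns = 150.0   # CXL.mem add-on latency from the config above

cxl_total_ns = local_dram_ns + cxl_addon_ns

# PCIe Gen 5 x16: 32 GT/s per lane * 16 lanes * 128b/130b encoding,
# divided by 8 bits/byte -> bytes per second, per direction.
link_bw = 32e9 * 16 * (128 / 130) / 8

print(f"CXL access latency ~ {cxl_total_ns:.0f} ns "
      f"({cxl_total_ns / local_dram_ns:.1f}x local DRAM)")
print(f"Gen5 x16 bandwidth ~ {link_bw / 1e9:.0f} GB/s per direction")

# Average memory access time if a fraction f of accesses go to CXL:
f = 0.25  # assumed fraction of accesses served from CXL memory
amat_ns = (1 - f) * local_dram_ns + f * cxl_total_ns
print(f"AMAT with {f:.0%} CXL accesses ~ {amat_ns:.1f} ns")
```

This kind of estimate is useful for validating the simulator: if a simulated pointer-chasing microbenchmark on CXL memory does not show roughly the 2–3x latency ratio computed here, the model's timing path likely has a bug.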
In future posts I'll cover how to model CXL.cache (coherent accelerator memory) and how to use gem5's traffic generators to stress-test your CXL memory subsystem model.