CXL (Compute Express Link) is an open standard interconnect built on PCIe that enables high-bandwidth, low-latency communication between host processors and devices such as memory expanders, accelerators, and smart NICs. Simulating CXL-based systems is challenging because it requires modeling both the PCIe physical layer and the CXL protocol stack on top of it.
gem5 is the de-facto standard simulator for computer architecture research. It supports full-system simulation: you can boot a real Linux kernel and run unmodified workloads while observing microarchitectural behavior at any level of detail. For CXL research this is essential, because the operating system's view of CXL memory (device enumeration, NUMA node setup, page placement) is part of the behavior under study and cannot be captured faithfully by user-level simulation alone.
The starting point for CXL memory simulation in gem5 is extending the existing PCIe endpoint model. A CXL Type 3 device (memory expander) exposes additional memory capacity to the host through a CXL.mem protocol layer.
```python
# Example gem5 Python config for a CXL memory expander.
# Note: CXLMemExpander is a custom SimObject built for this project;
# it is not part of stock gem5.
from m5.objects import *

system = System()
system.clk_domain = SrcClockDomain(clock='3GHz',
                                   voltage_domain=VoltageDomain())
system.mem_mode = 'timing'

# Attach the CXL memory expander: 8 GB of device memory mapped at
# host physical address 4 GB.
cxl_mem = CXLMemExpander(
    range=AddrRange('4GB', size='8GB'),
    latency='150ns',  # approximate CXL.mem add-on latency
)
system.cxl_devices = [cxl_mem]
```
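The config above uses a `CXLMemExpander` SimObject that stock gem5 does not provide. As a rough sketch of what its Python declaration might look like (the class name, header path, and parameter set are all assumptions, not gem5 API), one could extend gem5's existing `PciDevice` along these lines:

```python
# Hypothetical SimObject declaration for the CXL memory expander.
# Base class, header path, and parameter set are illustrative only.
from m5.params import Param
from m5.objects.PciDevice import PciDevice

class CXLMemExpander(PciDevice):
    type = 'CXLMemExpander'
    cxx_header = "dev/cxl/cxl_mem_expander.hh"  # hypothetical path
    cxx_class = "gem5::CXLMemExpander"

    # Host physical address range backed by the expander's memory.
    range = Param.AddrRange("address range served via CXL.mem")
    # Extra latency charged to each CXL.mem access, on top of the
    # device-side DRAM latency.
    latency = Param.Latency('150ns', "CXL.mem add-on latency")
```

On the C++ side, the device would advertise `range` through `getAddrRanges()` and delay each memory response by the configured `latency`, following gem5's usual timing-mode port protocol.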
- **Latency:** CXL.mem adds approximately 100–200 ns on top of local DRAM latency. This is a critical parameter to get right because it significantly affects workload performance.
- **Bandwidth:** CXL 1.1 over PCIe Gen 5 x16 provides ~64 GB/s in each direction, though practical bandwidth depends on protocol overhead and request concurrency.
- **NUMA topology:** The OS sees CXL-attached memory as a separate, CPU-less NUMA node. Proper NUMA modeling is important for realistic workload behavior.
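To sanity-check these parameters, it helps to work the numbers. The sketch below derives the Gen 5 x16 per-direction bandwidth from first principles (32 GT/s per lane, 16 lanes, 128b/130b encoding) and estimates the average memory access time when some fraction of accesses land in CXL memory; the local DRAM latency and the 25% access fraction are illustrative assumptions.

```python
# Back-of-envelope numbers for the parameters above.
local_dram_ns = 90.0   # assumed local DRAM load-to-use latency
cxl_addon_ns = 150.0   # CXL.mem add-on latency from the config above

cxl_total_ns = local_dram_ns + cxl_addon_ns

# PCIe Gen 5 x16: 32 GT/s per lane * 16 lanes * 128b/130b encoding,
# divided by 8 bits/byte -> bytes per second, per direction.
link_bw = 32e9 * 16 * (128 / 130) / 8

print(f"CXL access latency ~ {cxl_total_ns:.0f} ns "
      f"({cxl_total_ns / local_dram_ns:.1f}x local DRAM)")
print(f"Gen5 x16 bandwidth ~ {link_bw / 1e9:.0f} GB/s per direction")

# Average memory access time if a fraction f of accesses go to CXL:
f = 0.25  # assumed fraction of accesses served from CXL memory
amat_ns = (1 - f) * local_dram_ns + f * cxl_total_ns
print(f"AMAT with {f:.0%} CXL accesses ~ {amat_ns:.1f} ns")
```

This kind of estimate is useful for validating the simulator: if a simulated pointer-chasing microbenchmark on CXL memory does not show roughly the 2–3x latency ratio computed here, the model's timing path likely has a bug.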
In future posts I'll cover how to model CXL.cache (coherent accelerator memory) and how to use gem5's traffic generators to stress-test your CXL memory subsystem model.