A simulator result can look right for the wrong reason.
That is where the first chapter of Ramulator-gate begins.
At MICRO 2024, we published our paper on the Mess benchmark, showing how it can be used to evaluate the memory performance of real systems and simulators [1].
A year later, the Ramulator team published a preprint that characterized the Mess paper’s results and conclusions as incorrect due to “multiple trivial human errors” and “gross configuration errors” [2]. In 2026, this work was accepted to ISPASS [3]. The wording about “trivial” and “gross” errors was removed, but the main narrative remained.
We chose not to respond publicly to earlier preprints, despite what I consider serious misrepresentations of the Mess study. However, publication in an IEEE venue changes the situation.
In this series of posts, we will challenge several statements and results presented by the Ramulator team.
The first chapter of Ramulator-gate shows why better-looking results are not always the right results.
On one technical point, we agree with the Ramulator team: to resemble the bandwidth of the actual 8×DDR5-4800 Amazon Graviton3 and Intel Sapphire Rapids systems, Ramulator 2.0 requires a 16-channel DDR5-4800 configuration. With 8 channels, Ramulator 2.0 simulates roughly half of the expected bandwidth.
Where we disagree is whether the 16-channel Ramulator 2.0 configuration was obvious in this context.
This distinction matters.
The Ramulator team does not provide actual system measurements for the evaluated baseline and does not explicitly state that the baseline real systems are used to guide the simulator configuration. As a result, the 16-channel simulator configuration is presented in a way that appears more natural for an unspecified “ARM-based real system” than for Amazon Graviton3 with 8-channel DDR5-4800 main memory.
This is where the narrative starts to diverge from the technical record.
And this is only the beginning.
In the next chapters, we will directly challenge the Ramulator team’s position that “channel” and “subchannel” are not distinct concepts in the DDR5 context.
Stay tuned!
Mess paper evaluation
- Real-system measurements
  - Amazon Graviton 3: 8×DDR5-4800
  - Intel Sapphire Rapids: 8×DDR5-4800
- Ramulator 2.0 in the Mess paper
  - 1×DDR5-4800 simulated in detail, bandwidth scaled ×8
| System | Max BW |
| --- | --- |
| Amazon Graviton 3 · 8×DDR5-4800 | 292 GB/s |
| Intel Sapphire Rapids · 8×DDR5-4800 | 264 GB/s |
| Simulation: 1×DDR5-4800 scaled ×8: gem5 + Ramulator 2.0 | 141 GB/s |
| Simulation: 1×DDR5-4800 scaled ×8: Trace-driven Ramulator 2.0 | 126 GB/s |

Conclusion
Max simulated BW ≈ ½ of the actual one.
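The "roughly half" conclusion can be checked directly from the four bandwidth figures above. The following is a minimal Python sketch. The dictionary and function names are illustrative and not taken from the Mess artifacts, and comparing each simulation against both real systems is an assumption made here for illustration.

```python
# Peak bandwidth figures in GB/s, copied from the Mess paper evaluation above.
measured = {
    "Amazon Graviton 3": 292,
    "Intel Sapphire Rapids": 264,
}
simulated = {
    "gem5 + Ramulator 2.0": 141,
    "Trace-driven Ramulator 2.0": 126,
}


def ratio_table(measured, simulated):
    """Return simulated/measured peak-bandwidth ratios for every pairing.

    Hypothetical helper: pairing every simulation with every real system
    is an illustrative choice, not something the source prescribes.
    """
    return {
        (sim_name, sys_name): sim_bw / sys_bw
        for sim_name, sim_bw in simulated.items()
        for sys_name, sys_bw in measured.items()
    }


ratios = ratio_table(measured, simulated)
for (sim_name, sys_name), ratio in ratios.items():
    print(f"{sim_name} vs {sys_name}: {ratio:.2f}")
```

Under these assumptions, every ratio falls between roughly 0.43 and 0.53, which is what "≈ ½" summarizes.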
Ramulator team re-evaluation
The Ramulator team does not provide actual system measurements for the evaluated baseline, and does not state that the baseline real systems guide the simulator configuration. As a result, the 16-channel configuration is presented as more natural for an unspecified “ARM-based real system” than for the 8×DDR5-4800 Amazon Graviton 3 it actually reproduces.
- Real system under study
  - NOT SPECIFIED
- Real-system measurements
  - NONE
The reproduced “Mess results for an ARM-based real system” correspond to Amazon Graviton 3: 8×DDR5-4800.
| System | Max BW |
| --- | --- |
| Reproduced from the Mess paper: Amazon Graviton 3 · 8×DDR5-4800 | 292 GB/s |
| Simulation: 16×DDR5-4800 Trace-driven Ramulator 2.0 | 281 GB/s |

Conclusion
From the Ramulator re-evaluation paper [3]:
“By correctly configuring Ramulator 2.0, simulated memory performance resembles real system characteristics well.”
Correctly configuring here means using a 16×DDR5-4800 Ramulator 2.0 setup to simulate the actual 8×DDR5-4800 system.
Discussion
On one technical point, we agree: to resemble the bandwidth of the actual 8×DDR5-4800 systems, Ramulator 2.0 needs a 16-channel DDR5-4800 configuration. Where we disagree is whether that 16-channel configuration was obvious in this context. This distinction matters.
- We agree
  - To resemble the bandwidth of the actual Amazon Graviton 3 and Intel Sapphire Rapids systems, Ramulator 2.0 needs a 16-channel DDR5-4800 configuration.
- We disagree
  - On whether the 16-channel Ramulator 2.0 configuration was obvious in this context.
- To date
  - The Ramulator team has not acknowledged that Amazon Graviton 3 and Intel Sapphire Rapids have 8 memory channels, although Intel, Amazon and numerous independent resources state this. See the sources below.
Takeaway
To resemble the bandwidth of the actual 8×DDR5-4800 Amazon Graviton 3 and Intel Sapphire Rapids systems, Ramulator 2.0 needs a 16-channel DDR5-4800 configuration.
With 8 channels, Ramulator 2.0 simulates roughly half of the expected bandwidth.
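The relationship between the 16-channel and 8-channel results can be sketched numerically. The Python snippet below uses only the two figures reported above (292 GB/s measured, 281 GB/s simulated with 16 channels). The helper `scale_linear` is hypothetical, and it assumes that simulated peak bandwidth scales linearly with the number of configured channels, which is a simplification rather than a property confirmed by the source.

```python
MEASURED_GRAVITON3 = 292.0  # GB/s, real 8×DDR5-4800 system (Mess paper)
SIMULATED_16CH = 281.0      # GB/s, trace-driven Ramulator 2.0, 16×DDR5-4800


def scale_linear(bandwidth, from_channels, to_channels):
    """Rescale a peak bandwidth to a different channel count.

    Assumption (not from the source): peak bandwidth is proportional
    to the number of configured channels.
    """
    return bandwidth * to_channels / from_channels


# Estimate what the same simulator would report with 8 channels.
sim_8ch = scale_linear(SIMULATED_16CH, from_channels=16, to_channels=8)

print(f"16-channel simulation / measured: {SIMULATED_16CH / MEASURED_GRAVITON3:.2f}")
print(f"Estimated 8-channel simulation:   {sim_8ch:.1f} GB/s")
print(f"8-channel estimate / measured:    {sim_8ch / MEASURED_GRAVITON3:.2f}")
```

Under the linear-scaling assumption, the 16-channel simulation reaches about 96% of the measured bandwidth, while an 8-channel configuration would reach about 48%. That estimate is in line with the 141 GB/s and 126 GB/s figures from the ×8-scaled runs in the Mess paper.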
References
- [1] Esmaili-Dokht et al., “A Mess of Memory System Benchmarking, Simulation and Application Profiling,” MICRO 2024. Best paper runner-up award.
- [2] Haocong Luo, Ataberk Olgun, Maria Makeenkova, F. Nisa Bostancı, Geraldo F. de Oliveira, A. Giray Yağlıkçı, Onur Mutlu, “Cleaning up the Mess,” arXiv, Oct. 2025.
- [3] F. Nisa Bostancı, Haocong Luo, Ataberk Olgun, Maria Makeenkova, Geraldo F. de Oliveira, A. Giray Yağlıkçı, Onur Mutlu, “Cleaning up the Mess: Re-Evaluating the Real-System Modeling Accuracy of Ramulator 2.0,” ISPASS 2026.
The Mess paper measures two real systems and compares Ramulator 2.0 simulations against them. The detailed single-channel simulation is scaled ×8, and the simulated peak lands at roughly half of the measured bandwidth.