King

Ramulator-Gate

When a benchmark exposes an uncomfortable truth, the community has two choices: examine the evidence, or attack the mirror.

June 2026

What is this?

At MICRO 2024, we published our paper on the Mess benchmark, showing how it can be used to evaluate the memory performance of both real systems and simulators.

The work was built on years of experience in memory-systems research, including numerous industry collaborations. Our evaluation followed a realistic user scenario: we relied on publicly available installation and setup information, and unless we encountered major issues, we did not request additional support from simulator developers.

Some memory models performed well.

Some did not.

Before publication, the work was presented at the MEMSYS 2024 panel. What was planned as a five-minute opening statement turned into a forty-minute technical discussion. I later received the award for “Most controversial, truth-telling panel discussion.”

Many in the community understood the importance of the problem and gave a name to what the results exposed:

"The emperor is naked."

The paper later received the Best Paper Runner-Up Award at MICRO 2024. The message resonated.

We opened a technical discussion with the Ramulator team even before the paper was published. Over the following year, we provided detailed technical support, addressed the technical questions raised, shared the files and instructions needed to reproduce the experiments, and proposed constructive ways forward.

What followed, however, was not merely a technical disagreement. The issue was progressively reframed away from the evidence and into a narrative that misrepresents the technical record.

We chose not to respond publicly to earlier preprints, despite what we consider serious misrepresentations of the Mess study. However, the publication of this work in an IEEE venue changes the situation. The technical record matters.

We will not allow carefully crafted narratives to replace open technical discussion.

It is rebuttal time.

Section 01

Memory bandwidth

A simulator result can look right for the wrong reason.

That is where the first chapter of Ramulator-gate begins.

At MICRO 2024, we published our paper on the Mess benchmark, showing how it can be used to evaluate the memory performance of real systems and simulators [1].

A year later, the Ramulator team published a preprint that characterized the Mess paper’s results and conclusions as incorrect due to “multiple trivial human errors” and “gross configuration errors” [2]. In 2026, this work was accepted to ISPASS [3]. The wording about “trivial” and “gross” errors was removed, but the main narrative remained.

We chose not to respond publicly to earlier preprints, despite what we consider serious misrepresentations of the Mess study. However, publication in an IEEE venue changes the situation.

In this series of posts, we will challenge several statements and results presented by the Ramulator team.

The first chapter of Ramulator-gate shows why better-looking results are not always the right results.

On one technical point, we agree with the Ramulator team: to resemble the bandwidth of the actual 8×DDR5-4800 Amazon Graviton 3 and Intel Sapphire Rapids systems, Ramulator 2.0 requires a 16-channel DDR5-4800 configuration. With 8 channels, Ramulator 2.0 simulates roughly half of the expected bandwidth.

Where we disagree is whether the 16-channel Ramulator 2.0 configuration was obvious in this context.

This distinction matters.

The Ramulator team does not provide actual system measurements for the evaluated baseline and does not explicitly state that the baseline real systems are used to guide the simulator configuration. As a result, the 16-channel simulator configuration is presented in a way that appears more natural for an unspecified “ARM-based real system” than for Amazon Graviton 3 with 8-channel DDR5-4800 main memory.

This is where narrative starts to diverge from the technical record.

And this is only the beginning.

In the next chapters, we will directly challenge the Ramulator team’s position that “channel” and “subchannel” are not distinct concepts in the DDR5 context.

Stay tuned!

  1. Mess paper evaluation

    The Mess paper measures two real systems and evaluates Ramulator 2.0 against them. The bandwidth of the detailed single-channel simulation is scaled ×8, and the simulated peak lands at roughly half of the measured bandwidth.

    Real-system measurements
    Amazon Graviton 3: 8×DDR5-4800
    Intel Sapphire Rapids: 8×DDR5-4800
    Ramulator 2.0 in the Mess paper
    1×DDR5-4800 simulated in detail, bandwidth scaled ×8
    System                                                           Max BW
    Amazon Graviton 3 · 8×DDR5-4800                                  292 GB/s
    Intel Sapphire Rapids · 8×DDR5-4800                              264 GB/s
    Simulation: 1×DDR5-4800 scaled ×8: gem5 + Ramulator 2.0          141 GB/s
    Simulation: 1×DDR5-4800 scaled ×8: Trace-driven Ramulator 2.0    126 GB/s

    Conclusion

    Max simulated BW ≈ ½ of the actual one.
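The "roughly half" figure can be checked directly from the numbers in the table. The following is a minimal sketch, not part of the Mess artifact: the dictionary and function names are illustrative, and because this summary does not say which simulation run was paired with which real system, every pairing is computed.

```python
# Peak bandwidths in GB/s, copied from the table above.
measured = {
    "Amazon Graviton 3": 292,
    "Intel Sapphire Rapids": 264,
}
simulated = {
    "gem5 + Ramulator 2.0": 141,
    "Trace-driven Ramulator 2.0": 126,
}


def bandwidth_ratios(simulated, measured):
    """Return {(simulator, system): simulated / measured} for every pairing."""
    return {
        (sim, system): sim_bw / real_bw
        for sim, sim_bw in simulated.items()
        for system, real_bw in measured.items()
    }


ratios = bandwidth_ratios(simulated, measured)
for (sim, system), ratio in ratios.items():
    print(f"{sim} vs {system}: {ratio:.2f}")
```

Every pairing lands between roughly 0.43 and 0.53, which is consistent with the conclusion stated above.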

  2. Ramulator team re-evaluation

    The Ramulator team does not provide actual system measurements for the evaluated baseline, and does not state that the baseline real systems guide the simulator configuration. As a result, the 16-channel configuration is presented as more natural for an unspecified “ARM-based real system” than for the 8×DDR5-4800 Amazon Graviton 3 it actually reproduces.

    Real system under study
    NOT SPECIFIED
    Real-system measurements
    NONE.
    The reproduced “Mess results for an ARM-based real system” correspond to Amazon Graviton 3: 8×DDR5-4800
    System                                                             Max BW
    Reproduced from the Mess paper: Amazon Graviton 3 · 8×DDR5-4800    292 GB/s
    Simulation: 16×DDR5-4800 Trace-driven Ramulator 2.0                281 GB/s

    Conclusion

    “By correctly configuring Ramulator 2.0, simulated memory performance resembles real system characteristics well.”

    Correctly configuring here means using a 16×DDR5-4800 Ramulator 2.0 setup to simulate the 8×DDR5-4800 actual system.
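One way to see what this configuration implies is to divide each peak bandwidth by its channel count. This is a small sketch using only the two figures in the table above; the helper name is illustrative, and the calculation says nothing about how either paper defines a channel.

```python
def per_channel_gb_s(peak_gb_s: float, channels: int) -> float:
    """Average peak bandwidth contributed by each channel, in GB/s."""
    return peak_gb_s / channels


# Figures from the table above.
graviton3 = per_channel_gb_s(292, 8)        # measured, 8 channels
ramulator_16ch = per_channel_gb_s(281, 16)  # simulated, 16 channels

print(f"Graviton 3 (measured):         {graviton3:.2f} GB/s per channel")
print(f"Ramulator 2.0 (16 ch, sim.):   {ramulator_16ch:.2f} GB/s per channel")
print(f"Per-channel ratio:             {ramulator_16ch / graviton3:.2f}")
print(f"Total-bandwidth ratio:         {281 / 292:.2f}")
```

The totals match to within about 4%, while each simulated channel delivers a little under half of what each real channel delivers.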

    [Figure: excerpt from the Ramulator re-evaluation paper]

  3. Discussion

    On one technical point, we agree: to resemble the bandwidth of the actual 8×DDR5-4800 systems, Ramulator 2.0 needs a 16-channel DDR5-4800 configuration. Where we disagree is whether that 16-channel configuration was obvious in this context. This distinction matters.

    We agree
    To resemble the bandwidth of the actual Amazon Graviton 3 and Intel Sapphire Rapids systems, Ramulator 2.0 needs a 16-channel DDR5-4800 configuration.
    We disagree
    On whether the 16-channel Ramulator 2.0 configuration was obvious in this context.
    To date
    The Ramulator team has not acknowledged that Amazon Graviton 3 and Intel Sapphire Rapids have 8 memory channels, although Intel, Amazon and numerous independent resources state this. See the sources below.

    Evidence library

    Public sources on the evaluated platforms

Takeaway

To resemble the bandwidth of the actual 8×DDR5-4800 Amazon Graviton 3 and Intel Sapphire Rapids systems, Ramulator 2.0 needs a 16-channel DDR5-4800 configuration.

With 8 channels, Ramulator 2.0 simulates roughly half of the expected bandwidth.
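For orientation, the theoretical peak of a DDR5-4800 configuration follows from transfer rate × data-bus width × channel count. The sketch below is illustrative only: the function name is hypothetical, and the data widths are assumptions (64 data bits per DIMM channel, 32 data bits per sub-channel, ECC bits excluded). It makes no claim about how Ramulator 2.0 defines a channel internally.

```python
def peak_bandwidth_gb_s(transfers_mt_s: float, bus_width_bits: int, channels: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


# Assumption: 64 data bits per DIMM channel, 32 data bits per sub-channel.
print(peak_bandwidth_gb_s(4800, 64, 8))   # 8 channels, 64 bits each
print(peak_bandwidth_gb_s(4800, 32, 8))   # 8 units, 32 bits each
print(peak_bandwidth_gb_s(4800, 32, 16))  # 16 units, 32 bits each
```

Under these assumptions, eight 64-bit channels and sixteen 32-bit units both give about 307 GB/s, while eight 32-bit units give about 154 GB/s. The measured 292 GB/s and 264 GB/s are about 95% and 86% of the former.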

References

  [1] Esmaili-Dokht et al., “A Mess of Memory System Benchmarking, Simulation and Application Profiling,” MICRO 2024. Best Paper Runner-Up Award.
  [2] Haocong Luo, Ataberk Olgun, Maria Makeenkova, F. Nisa Bostancı, Geraldo F. de Oliveira, A. Giray Yağlıkçı, Onur Mutlu, “Cleaning up the Mess,” arXiv, Oct. 2025.
  [3] F. Nisa Bostancı, Haocong Luo, Ataberk Olgun, Maria Makeenkova, Geraldo F. de Oliveira, A. Giray Yağlıkçı, Onur Mutlu, “Cleaning up the Mess: Re-Evaluating the Real-System Modeling Accuracy of Ramulator 2.0,” ISPASS 2026.
Coming next

Next chapter: DDR5 DIMMs: Channel vs. sub-channels

The full response is written. We are making it public one claim at a time, each documented to the same standard of evidence. Check back for the next one.