
Mess Benchmark

v2.0

Memory System Stress Framework

What is the Mess Benchmark?

The Mess (Memory stress) benchmark provides a holistic and detailed memory system characterization. It describes memory system performance with a family of bandwidth-latency curves that cover the full range of memory traffic intensity, from unloaded to fully-saturated memory systems. Unlike traditional benchmarks that report only maximum bandwidth or unloaded latency, Mess considers numerous compositions of read and write operations, providing a comprehensive view of how memory systems behave under different workloads.

Mess is designed for close-to-hardware characterization. It is implemented directly in assembly to minimize compiler intervention and system software overhead, so that measurements reflect actual hardware behavior rather than software artifacts. The benchmark detects and quantifies aspects of memory system behavior not discussed in previous studies, such as the impact of the read/write composition of memory traffic on performance, and discrepancies between different memory systems.

Mess Benchmark is part of the broader Mess framework, which also includes the Mess simulator for analytical memory performance simulation and Mess application profiling for correlating application behavior with memory system characteristics.

Mess 2.0: The Redesigned Benchmark

Mess Benchmark v2.0 is a complete redesign of the original benchmark, focused on ease of use and performance. The new version is 84x faster than v1, a speedup achieved through a complete C++ rewrite, while maintaining the same accuracy and comprehensive characterization capabilities. Mess 2.0 can be run with zero setup: simply compile and execute, and it automatically detects your system configuration and adapts accordingly.

The original Mess Benchmark repository (Mess-Benchmark) has been deprecated in favor of Mess 2.0 (Mess-2.0). While Mess 2.0 is the recommended version for new benchmarking work, the original repository remains available for accessing pre-computed benchmark results and system curves from previous studies. These historical results can be valuable for comparing new systems against previously characterized platforms or for research that references earlier Mess measurements.

Mess 2.0 maintains full compatibility with the original benchmark's output format and analysis tools, ensuring continuity with existing workflows while providing significant performance and usability improvements.

How It Works

The Mess benchmark constructs bandwidth-latency curves through a systematic measurement process:

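  1. Traffic Generation: A memory traffic generator runs on multiple CPU cores (or GPU SMs) to create configurable memory pressure. The generator can produce different ratios of read and write memory traffic, from 100% reads to 50% reads/50% writes. The 50% lower bound on reads comes from write-allocate cache policies: a store that misses in the cache first reads the line from memory and later writes it back, so even a store-only kernel generates one memory read per memory write.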
  2. Latency Measurement: While the traffic generator stresses the memory system, a pointer-chase benchmark runs on a dedicated core to measure memory access latency. The pointer-chase is simple, portable, and provides accurate latency measurements.
  3. Bandwidth Monitoring: Memory bandwidth is measured using hardware performance counters that count read and write operations at the memory controller (e.g., CAS_COUNT_RD and CAS_COUNT_WR on Intel platforms). Because these counters sit at the memory controller, the measured bandwidth includes all memory traffic, not just application-level data.
  4. Curve Construction: By varying the memory traffic intensity and read/write ratios, Mess generates tens of measurement points for each curve. Multiple curves are created, each corresponding to a specific read/write ratio, forming a family of bandwidth-latency curves that comprehensively characterize the memory system.
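
As a rough illustration of steps 1 and 3, the sketch below estimates the read share of memory traffic produced by a load/store mix under a write-allocate policy, and converts memory-controller CAS counts into bandwidth. It is not taken from the Mess source: the function names are hypothetical, and it assumes 64-byte cache lines, one cache line transferred per CAS operation, and a streaming generator in which every access misses in the cache.

```python
CACHE_LINE_BYTES = 64  # assumption: 64-byte lines, one line moved per CAS


def memory_read_fraction(store_fraction: float) -> float:
    """Read share of memory traffic for a generator whose instruction mix
    contains `store_fraction` stores, assuming a write-allocate cache and
    that every access misses (hypothetical helper, not part of Mess)."""
    if not 0.0 <= store_fraction <= 1.0:
        raise ValueError("store_fraction must be in [0, 1]")
    load_fraction = 1.0 - store_fraction
    # Each load miss reads one line; each store miss also reads one line
    # (write-allocate) before modifying it.
    reads = load_fraction + store_fraction
    # Each stored-to line is eventually written back to memory.
    writes = store_fraction
    return reads / (reads + writes)


def bandwidth_gbs(cas_reads: int, cas_writes: int, seconds: float) -> float:
    """Memory bandwidth in GB/s from read/write CAS counts sampled over
    an interval of `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    total_bytes = (cas_reads + cas_writes) * CACHE_LINE_BYTES
    return total_bytes / seconds / 1e9
```

Under these assumptions a load-only kernel yields 100% reads and a store-only kernel yields 50% reads, which matches the range quoted in step 1.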

The benchmark uses huge memory pages to minimize TLB miss overheads, and runtime measurements subtract OS-dependent overheads (such as page walks) from the latency measurements, ensuring close-to-hardware accuracy.
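
The pointer-chase idea from step 2 can be sketched as follows. This is only an illustration of the access pattern, not the Mess implementation: the names are hypothetical, building the chain as a random single-cycle permutation (Sattolo's algorithm) is an assumption about one common approach, and timings taken in Python are dominated by interpreter overhead rather than memory latency.

```python
import random
import time


def build_chain(n: int, seed: int = 0) -> list:
    """Return a list `nxt` where nxt[i] is the next index to visit.
    Sattolo's algorithm guarantees a single cycle covering all n slots,
    so the chase cannot get stuck in a short, cache-resident loop."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt: list, steps: int, start: int = 0) -> int:
    """Follow the chain for `steps` hops. Each hop depends on the result
    of the previous one, so accesses cannot overlap."""
    idx = start
    for _ in range(steps):
        idx = nxt[idx]
    return idx


def ns_per_access(nxt: list, steps: int, overhead_ns: float = 0.0) -> float:
    """Average time per hop, minus a caller-supplied per-access overhead
    (a stand-in for the OS-dependent overheads mentioned above)."""
    start = time.perf_counter_ns()
    chase(nxt, steps)
    elapsed = time.perf_counter_ns() - start
    return elapsed / steps - overhead_ns
```

The serial dependency between hops is what makes the measured time per hop approximate a single memory access latency when the chain is much larger than the caches.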

Key Insights from Mess

Mess benchmarking has revealed several important aspects of memory system behavior:

  • Read/Write Impact: Memory writes significantly reduce performance compared to reads. The best performance (lowest latency, highest bandwidth) is achieved with 100% read traffic. Writes introduce additional DRAM timing constraints, such as write recovery (tWR) and write-to-read delay (tWTR), that reduce efficiency.
  • Bandwidth Saturation: Memory systems typically saturate at 70% to 90% of the theoretical maximum bandwidth. The remaining bandwidth is "lost" due to DRAM refresh cycles, row-buffer misses requiring precharge/activate operations, and timing restrictions at various memory hierarchy levels.
  • Latency Range: Memory access latency can vary dramatically, from unloaded latency (typically 85-130 ns for CPUs) to maximum latency under saturation (often 200-600 ns or more). This range is critical for understanding application performance variability.
  • Bandwidth Decline: In some systems, increasing memory pressure beyond a certain point actually causes measured bandwidth to decline while latency continues to increase. This "wave form" behavior is correlated with increased row-buffer miss rates.

These insights are crucial for understanding how applications interact with memory systems and for making informed decisions about memory system design and optimization.
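
To make the saturation and bandwidth-decline observations concrete, the sketch below computes a theoretical peak bandwidth and scans one curve for the point where bandwidth starts to fall while latency keeps rising. The function names are hypothetical, and the sample points used in the usage note are invented for illustration, not measured Mess results.

```python
def theoretical_peak_gbs(mega_transfers_per_s: float,
                         bus_bytes: int,
                         channels: int) -> float:
    """Theoretical peak bandwidth in GB/s: transfer rate x bus width
    x number of memory channels."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


def first_bandwidth_decline(points):
    """`points` is a list of (bandwidth_gbs, latency_ns) tuples ordered by
    increasing traffic-generator intensity. Return the index of the first
    point whose bandwidth is lower than the previous point's while its
    latency is higher, or None if the curve never bends back."""
    for i in range(1, len(points)):
        prev_bw, prev_lat = points[i - 1]
        bw, lat = points[i]
        if bw < prev_bw and lat > prev_lat:
            return i
    return None
```

For example, six channels of DDR4-3200 with an 8-byte data bus give theoretical_peak_gbs(3200, 8, 6) of about 153.6 GB/s, so saturation at 70% to 90% of peak would correspond to roughly 108 to 138 GB/s. For an invented curve such as [(10, 90), (60, 110), (110, 180), (125, 320), (118, 450)], first_bandwidth_decline returns 4, the point where bandwidth drops from 125 to 118 GB/s while latency continues to climb.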

Platform Characterization

Mess has been deployed to characterize a wide range of platforms, including Intel, AMD, IBM, Fujitsu, and Amazon servers, as well as NVIDIA GPUs. The benchmark has been used to evaluate systems with DDR4, DDR5, HBM2, and HBM2E memory technologies. Results show significant variation in memory system behavior even for platforms with the same memory standard, highlighting the importance of platform-specific characterization.

The benchmark has also been used to evaluate memory system simulation accuracy in hardware simulators like ZSim, gem5, and OpenPiton Metro-MPI. These evaluations revealed that many simulators reproduce actual system performance poorly, with errors of tens of percent for memory-intensive benchmarks. This finding motivated the development of the Mess simulator, which uses Mess bandwidth-latency curves for accurate analytical simulation.

Key Advantages

Holistic Memory Characterization

Describes memory system performance with a family of bandwidth-latency curves covering the full range from unloaded to fully-saturated memory systems.

Multi-Architecture Support

Covers all major CPU and GPU ISAs: x86, ARM, Power, RISC-V, and NVIDIA's Parallel Thread Execution (PTX).

Order-of-Magnitude Faster in v2.0

Complete C++ rewrite provides dramatic performance improvements while maintaining accuracy and ease of use.

Zero-Setup Design

Automatically detects system configuration and adapts to different platforms without manual configuration.

Comprehensive Traffic Analysis

Considers numerous compositions of read and write operations, revealing how memory traffic patterns affect performance.

Close-to-Hardware Measurements

Implemented in assembly to minimize compiler and OS overhead, providing accurate microarchitecture-level insights.

Use Cases

The Mess benchmark is valuable for:

  • Characterizing new memory systems and technologies before deploying applications
  • Comparing memory system performance across different platforms and architectures
  • Validating and improving memory system simulators by comparing simulated behavior with actual hardware measurements
  • Understanding how read/write traffic composition affects memory performance in real systems
  • Providing input data for analytical memory simulators (like the Mess simulator) that use bandwidth-latency curves
  • Supporting research into memory system behavior and optimization opportunities
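
As a minimal sketch of how bandwidth-latency curves could serve as input data for an analytical model, the code below groups measurement points into one curve per read percentage and looks up latency at a given bandwidth by linear interpolation. It does not describe the Mess simulator's actual algorithm or the benchmark's output format: the sample layout and function names are assumptions, and sorting each curve by bandwidth is a simplification that ignores systems whose bandwidth declines under extreme pressure.

```python
from bisect import bisect_left


def build_curves(samples):
    """Group (read_pct, bandwidth_gbs, latency_ns) samples into a dict
    mapping read_pct -> list of (bandwidth_gbs, latency_ns) points,
    sorted by bandwidth."""
    curves = {}
    for read_pct, bandwidth, latency in samples:
        curves.setdefault(read_pct, []).append((bandwidth, latency))
    for points in curves.values():
        points.sort()
    return curves


def latency_at(curve, bandwidth_gbs):
    """Linearly interpolate latency at `bandwidth_gbs` on one curve.
    Requests outside the measured range are clamped to the endpoints."""
    bandwidths = [bw for bw, _ in curve]
    if bandwidth_gbs <= bandwidths[0]:
        return curve[0][1]
    if bandwidth_gbs >= bandwidths[-1]:
        return curve[-1][1]
    i = bisect_left(bandwidths, bandwidth_gbs)
    (bw0, lat0), (bw1, lat1) = curve[i - 1], curve[i]
    return lat0 + (lat1 - lat0) * (bandwidth_gbs - bw0) / (bw1 - bw0)
```

A caller would pick the curve whose read percentage is closest to the application's observed traffic mix, then query latency_at with the application's bandwidth demand.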

Getting Started

Mess 2.0 is designed for ease of use. The benchmark automatically detects your system configuration and can be run with minimal setup. For detailed installation instructions, usage examples, and documentation, visit the Mess framework website or the GitHub repository.

To access pre-computed benchmark results and historical system curves from previous studies, visit the original Mess-Benchmark repository. This repository contains measurement data from earlier Mess deployments and can be useful for comparative analysis.