It's the Memory, Stupid!
Slides, reference papers, code repositories, and interactive visualizer tools for every session of the course.
Roofline Model
Roofline(s)
The Roofline model is a well-established performance model that provides an intuitive visual framework for identifying compute and memory bottlenecks in HPC applications. The Cache-Aware Roofline Model (CARM) extends it with per-cache-level ceilings, enabling finer-grained analysis of memory hierarchy behavior. The authors of CARM have joined us for this course and have kindly provided the resources for this section.
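Both models rest on the same bound. The classical Roofline caps attainable performance at min(peak compute, arithmetic intensity × peak bandwidth). A cache-aware roofline draws one such bandwidth roof per memory level, and it counts bytes as seen from the core, so a single arithmetic-intensity value is checked against every level's roof. The Python sketch below illustrates this. The peak figures (`PEAK_GFLOPS`, `LEVEL_BANDWIDTH_GBS`) are hypothetical placeholders chosen for illustration. Real ceilings come from benchmarking the target machine, for example with the CARM tool.

```python
"""Minimal Roofline / cache-aware Roofline sketch (hypothetical numbers only)."""


def attainable_gflops(ai: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Classical Roofline bound: min(compute roof, AI * bandwidth roof).

    ai            -- arithmetic intensity in FLOP/byte
    peak_gflops   -- peak floating-point throughput in GFLOP/s
    bandwidth_gbs -- sustained bandwidth of one memory level in GB/s
    """
    return min(peak_gflops, ai * bandwidth_gbs)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """AI (FLOP/byte) at which a bandwidth roof meets the compute roof."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine: placeholder figures, not measurements of any real system.
PEAK_GFLOPS = 1000.0
LEVEL_BANDWIDTH_GBS = {"L1": 4000.0, "L2": 2000.0, "L3": 800.0, "DRAM": 100.0}


def carm_ceilings(ai: float) -> dict:
    """One attainable-performance ceiling per memory level, cache-aware style."""
    return {level: attainable_gflops(ai, PEAK_GFLOPS, bw)
            for level, bw in LEVEL_BANDWIDTH_GBS.items()}


if __name__ == "__main__":
    ai = 0.5  # e.g. a kernel doing 1 FLOP per 2 bytes moved
    for level, gflops in carm_ceilings(ai).items():
        ridge = ridge_point(PEAK_GFLOPS, LEVEL_BANDWIDTH_GBS[level])
        print(f"{level:>4}: {gflops:7.1f} GFLOP/s (ridge at {ridge:.2f} FLOP/byte)")
```

With these placeholder numbers, a kernel at 0.5 FLOP/byte can reach the compute roof only if its data stays in L1 or L2. It drops to DRAM-bound performance if every byte comes from main memory.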
Resources provided by INESC-ID.
Lecture Slides
Presentation slides used during the Roofline(s) session.
Cache-aware Roofline model: Upgrading the loft
The original paper introducing the Cache-Aware Roofline Model (CARM), extending the classical Roofline with per-cache-level ceilings for finer-grained memory bottleneck analysis.
CARM Tool: Automatic Benchmarking and Application Analysis
Presents the CARM tool for automatic roofline benchmarking and application analysis, enabling systematic identification of compute and memory bottlenecks.
CARM Roofline Tool
INESC-ID
CARM Website
Memory System Stress Framework
Mess
The Mess Framework provides unified benchmarking, simulation, and application profiling for memory systems across x86, ARM, RISC-V, and GPU architectures.
Lecture Slides
Presentation slides used during the Mess session.
Mess Paper
Read the MICRO 2024 paper (Best Paper Runner-up).
Mess Website
Official home of the Mess Framework: documentation, news, and getting-started material.
Mess Benchmark
Unified benchmarking, simulation and profiling across x86, ARM, RISC-V, and GPU architectures.
Mess Simulator
Advanced simulation framework for modeling memory systems in HPC and AI workloads.
Mess-Paraver
Utility for using Mess together with BSC's Paraver for advanced memory system analysis and visualization.
Mess Benchmark
Mess-Results
Collection of measured curves for multiple systems.
Mess Simulator
Mess-Paraver
Mess-Paraver Setup Guide
Step-by-step guide to installing and configuring Mess-Paraver for trace collection.
Performance & Energy Prediction
PROFET
PROFET provides analytical models that predict application performance and energy changes across current and future memory systems.
Lecture Slides
Presentation slides used during the PROFET session.
PROFET: Modeling System Performance and Energy Without Simulating the CPU
PROFET
Predict how your application's performance and energy consumption will change across current and future memory systems — no simulation required.
PROFET–Mess Tutorial
PROFET
TopDown Microarchitecture Analysis
TopDown
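TopDown analysis classifies every CPU pipeline slot into one of four top-level categories (Frontend Bound, Bad Speculation, Backend Bound, or Retiring). This shows whether lost slots stem from the front-end, from mispredicted speculative work, or from the back-end, and how much of the pipeline does useful work.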
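The four Level-1 TopDown categories (Frontend Bound, Bad Speculation, Retiring, Backend Bound) can be derived from a handful of hardware counters. The Python sketch below follows the Level-1 formulas from Yasin's paper for a 4-wide Intel core. The function, class, and parameter names are hypothetical. The event names in the docstring are Intel-specific and differ between microarchitectures. In practice, tools such as Intel VTune or Linux perf compute these fractions for you.

```python
from dataclasses import dataclass


@dataclass
class TopDownL1:
    """Fractions of all pipeline slots; the four fields sum to 1."""
    frontend_bound: float
    bad_speculation: float
    retiring: float
    backend_bound: float


def topdown_level1(clk_unhalted: int, idq_uops_not_delivered: int,
                   uops_issued: int, uops_retired_slots: int,
                   recovery_cycles: int, width: int = 4) -> TopDownL1:
    """Level-1 TopDown breakdown (hypothetical helper).

    Parameters mirror the Intel events used in Yasin's paper:
    CPU_CLK_UNHALTED.THREAD, IDQ_UOPS_NOT_DELIVERED.CORE, UOPS_ISSUED.ANY,
    UOPS_RETIRED.RETIRE_SLOTS and INT_MISC.RECOVERY_CYCLES.
    width is the pipeline issue width (4 on the cores the paper targets).
    """
    slots = width * clk_unhalted                      # total issue slots
    frontend = idq_uops_not_delivered / slots         # slots the front-end left empty
    bad_spec = (uops_issued - uops_retired_slots      # issued but never retired
                + width * recovery_cycles) / slots    # plus slots lost recovering
    retiring = uops_retired_slots / slots             # slots doing useful work
    backend = 1.0 - (frontend + bad_spec + retiring)  # remainder: stalled back-end
    return TopDownL1(frontend, bad_spec, retiring, backend)


if __name__ == "__main__":
    # Made-up counter values, for illustration only.
    print(topdown_level1(clk_unhalted=1000, idq_uops_not_delivered=400,
                         uops_issued=2200, uops_retired_slots=2000,
                         recovery_cycles=50))
```

With the made-up counts above, the breakdown is roughly 10% Frontend Bound, 10% Bad Speculation, 50% Retiring and 30% Backend Bound. Backend Bound is computed as the remainder, as in the paper, rather than measured directly.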
Lecture Slides
Presentation slides used during the TopDown session.
A Top-Down Method for Performance Analysis and Counters Architecture
The original paper by Ahmad Yasin introducing the TopDown Microarchitecture Analysis Method.
TopDown Reference Guide
BSC internal reference manual (as of February 2026) with a full breakdown of all memory metrics and references. Not official documentation.
Intel VTune – Top-Down Microarchitecture Analysis Method
Official Intel VTune Profiler cookbook guide for applying the Top-Down analysis method.
Intel 64/IA-32 Optimization Reference Manual (Appendix B / TMAM)
Official Intel optimization reference manual — Appendix B covers the Top-Down Microarchitecture Analysis Metrics in depth.
Intel VTune Profiler
Intel's performance analysis tool with built-in TopDown analysis support.
TMA Metrics (Full spreadsheet)
Intel perfmon repository's detailed Excel spreadsheet with all TMA metrics, thresholds, and counter formulas.
Perf Wiki – Top-Down Analysis
Community-maintained wiki covering Top-Down analysis with the Linux perf tool.
Software Optimizations with Top-Down Analysis – Ahmad Yasin @ IDF'15
Video tutorial introducing the Top-Down method as applied to Intel Skylake.
Heterogeneous Memory Systems
Heterogeneous Memory
This section is presented by the BSC Heterogeneous Architectures group, whose research spans accelerators and coprocessors in HPC, programmability of heterogeneous memory systems, and inter-node communications. The group collaborates closely with major HPC vendors including NVIDIA, Intel, and Mellanox, and organizes events such as the PUMPS+AI Summer School and BSC courses on heterogeneous memory systems.
Lecture Slides
Presentation slides used during the Heterogeneous Memory session.
Interactive Tools
Course Visualizers
Custom-built desktop applications designed for this course. Each visualizer lets you interactively explore the concepts covered in the lectures.
Mess Visualizer
Interactive visualizer for Mess benchmark curves and memory system characterization data.
PROFET Visualizer
Interactive visualizer for PROFET performance prediction results.
TopDown Visualizer
Interactive visualizer for TopDown microarchitecture analysis trees and CPI breakdowns.
These visualizers are standalone desktop applications compiled for educational use during this course.
Sample Files
Sample Data
Pre-recorded measurement curves and trace files to load directly into the visualizers during the course.
Ready for the course?
Go back to the main course page for the full agenda, speaker bios, and registration.