Memory Performance and SPEC OpenMP Scalability on Quad-Socket x86_64 Systems

Research output: Contribution to book/conference proceedings/anthology/report › Chapter in book/anthology/report › Contributed › peer-review


Because of the continuous trend towards higher core counts, parallelization is mandatory for many application domains beyond the traditional HPC sector. Current commodity servers comprise up to 48 processor cores in configurations with only four sockets. These shared-memory systems have distinct NUMA characteristics: the exact location of data within the memory system significantly affects both access latency and bandwidth. Therefore, NUMA-aware memory allocation and scheduling are highly performance-relevant issues. In this paper we use low-level microbenchmarks to compare two state-of-the-art quad-socket systems with x86_64 processors from AMD and Intel. We then investigate the performance of the application-based OpenMP benchmark suite SPEC OMPM2001. Our analysis shows how these benchmarks scale on shared-memory systems with up to 48 cores and how scalability correlates with the previously determined characteristics of the memory hierarchy. Furthermore, we demonstrate how the processor interconnects influence the benchmark results.
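The abstract refers to low-level microbenchmarks that measure memory access latency. The paper's own benchmarks are not reproduced here. A common technique for latency measurement is pointer chasing, in which every load depends on the result of the previous one, so the hardware cannot overlap the accesses. The Python sketch below (all function names are hypothetical) builds a random single-cycle chain with Sattolo's algorithm and times its traversal. It only illustrates the structure of such a benchmark. In Python, interpreter overhead dominates and list elements are references to integer objects, so the timings do not reflect hardware latency. A real measurement would be written in C or assembly and pinned to specific NUMA nodes, for example with `numactl --cpunodebind` and `--membind`, to compare local and remote accesses.

```python
import random
import time


def sattolo_cycle(n, seed=0):
    """Return a list `nxt` such that following i -> nxt[i] from any start
    visits all n slots exactly once before returning (one single cycle).

    Sattolo's algorithm: like Fisher-Yates, but j is drawn strictly below i,
    which guarantees a single n-cycle. A random cycle defeats simple
    hardware prefetchers, which is why latency benchmarks use it.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j in [0, i), never equal to i
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps):
    """Perform `steps` dependent lookups starting at slot 0; return the end slot."""
    i = 0
    for _ in range(steps):
        i = nxt[i]  # each lookup needs the previous result
    return i


def ns_per_step(n, steps=1_000_000, seed=0):
    """Average wall-clock nanoseconds per dependent lookup over an n-slot chain.

    Hypothetical helper: in Python this mostly measures interpreter cost.
    """
    chain = sattolo_cycle(n, seed)
    t0 = time.perf_counter_ns()
    chase(chain, steps)
    t1 = time.perf_counter_ns()
    return (t1 - t0) / steps
```

In a native implementation, one would sweep `n` from sizes that fit in the caches to sizes well beyond the last-level cache, and repeat the sweep with memory bound to the local and to remote sockets. The resulting latency steps expose the kind of NUMA characteristics the abstract describes.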


Original language: English
Title of host publication: Algorithms and Architectures for Parallel Processing
Editors: Yang Xiang, Alfredo Cuzzocrea, Michael Hobbs, Wanlei Zhou
Publisher: Springer Verlag
Number of pages: 12
ISBN (print): 978-3-642-24649-4
Publication status: Published - 2011

Publication series

Series: Lecture Notes in Computer Science, Volume 7016

Conference

Title: International Conference on Algorithms and Architectures for Parallel Processing
Abbreviated title: ICA3PP
Conference number: 11
Duration: 24-26 October 2011
Degree of recognition: International event

External IDs

Scopus: 80455162491
ORCID: /0000-0002-8491-770X/work/141543300
ORCID: /0009-0003-0666-4166/work/151475600



  • benchmark, scalability, SPEC OMP2001
