Comparing Cache Architectures and Coherency Protocols on x86-64 Multicore SMP Systems

Research output: Contribution to book/conference proceedings/anthology/report; Conference contribution; Contributed; Peer-reviewed

Abstract

Across a broad range of applications, multicore technology is the most important factor that drives today's microprocessor performance improvements. Closely coupled is a growing complexity of the memory subsystems with several cache levels that need to be exploited efficiently to gain optimal application performance. Many important implementation details of these memory subsystems are undocumented. We therefore present a set of sophisticated benchmarks for latency and bandwidth measurements to arbitrary locations in the memory subsystem. We consider the coherency state of cache lines to analyze the cache coherency protocols and their performance impact. The potential of our approach is demonstrated with an in-depth comparison of ccNUMA multiprocessor systems with AMD (Shanghai) and Intel (Nehalem-EP) quad-core x86-64 processors that both feature integrated memory controllers and coherent point-to-point interconnects. Using our benchmarks we present fundamental memory performance data and architectural properties of both processors. Our comparison reveals in detail how the microarchitectural differences tremendously affect the performance of the memory subsystem.
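To make the idea of a latency measurement concrete, the following is a minimal pointer-chasing sketch in C. It is not the authors' benchmark: it does not pin threads, place data on a particular NUMA node, or put cache lines into a chosen coherency state, all of which the abstract implies the real benchmarks control. The names `node_t`, `build_chain`, `chase`, and `ns_per_load`, as well as the 64-byte line size, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define CACHE_LINE 64  /* assumed cache line size in bytes */

/* One node per cache line, so each hop touches a different line
   (the array itself is not forced to start on a line boundary here). */
typedef struct {
    size_t next;
    char pad[CACHE_LINE - sizeof(size_t)];
} node_t;

/* Link the nodes into one random cycle (Sattolo's algorithm), so that the
   hardware prefetcher cannot predict the next address. */
static void build_chain(node_t *nodes, size_t n, unsigned seed) {
    assert(n > 0);
    for (size_t i = 0; i < n; i++) nodes[i].next = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j in [0, i-1] */
        size_t tmp = nodes[i].next;
        nodes[i].next = nodes[j].next;
        nodes[j].next = tmp;
    }
}

/* Follow the chain; every load depends on the previous one, so the
   elapsed time is dominated by load-to-use latency. */
static size_t chase(const node_t *nodes, size_t start, size_t steps) {
    size_t idx = start;
    for (size_t s = 0; s < steps; s++) idx = nodes[idx].next;
    return idx;
}

/* Average wall-clock time per dependent load, in nanoseconds. */
static double ns_per_load(const node_t *nodes, size_t steps) {
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    volatile size_t sink = chase(nodes, 0, steps);  /* keep the result live */
    timespec_get(&t1, TIME_UTC);
    (void)sink;
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

Under these assumptions, varying the number of nodes so that the working set fits in a given cache level, or exceeds all of them, would give a rough per-level latency estimate. Measuring a specific coherency state would additionally require another core to read or modify the lines before the timed pass.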

Details

Original language: English
Title of host publication: 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
Publisher: ACM Press
Pages: 413-422
Number of pages: 10
Publication status: Published - 2009
Peer-reviewed: Yes

External IDs

Scopus 76749126627
ORCID /0000-0002-8491-770X/work/141543298