Main Memory and Cache Performance of Intel Sandy Bridge and AMD Bulldozer

Research output: Contribution to conferences > Paper > Contributed > peer-review

Abstract

Application performance on multicore processors is seldom constrained by the speed of floating-point or integer units. Much more often, limitations are caused by the memory subsystem, particularly shared resources such as last-level caches or memory controllers. Measuring, predicting and modeling memory performance becomes a steeper challenge with each new processor generation due to the growing complexity and core count. We tackle the important aspect of measuring and understanding undocumented memory performance numbers in order to create valuable insight into microprocessor details. For this, we build upon a set of sophisticated benchmarks that support latency and bandwidth measurements to arbitrary locations in the memory subsystem. These benchmarks are extended to support AVX instructions for bandwidth measurements and to integrate the coherence states (O)wned and (F)orward. We then use these benchmarks to perform an in-depth analysis of current ccNUMA multiprocessor systems with Intel (Sandy Bridge-EP) and AMD (Bulldozer) processors. Using our benchmarks, we present fundamental memory performance data and illustrate performance-relevant architectural properties of both designs.

Details

Original language: English
Pages: 261-270
Number of pages: 10
Publication status: Published - 2014
Peer-reviewed: Yes

External IDs

Scopus 84904540843
ORCID /0000-0002-8491-770X/work/141543271
ORCID /0009-0003-0666-4166/work/151475564

Keywords

  • memory, performance, Intel, AMD