Keywords:
cache misses
peak performance
cache blocking
algorithms for dense
algorithmic prefetching
RISC System
dense-matrix
Abstract:
In this paper, we introduce a concept called algorithmic prefetching for exploiting some of the features of the IBM RISC System/6000(R) computer. Algorithmic prefetching denotes changing algorithm A to algorithm B, which contains additional steps to move data from slower levels of memory to faster levels, with the aim that algorithm B outperforms algorithm A. The objective of algorithmic prefetching is to minimize any penalty due to cache misses in the innermost loop of an algorithm. This concept, along with "cache blocking," can be exploited to improve the performance of linear algebra algorithms for dense matrices. We experimentally demonstrated the impact of prefetching on two dense-matrix operations. For the first operation, algorithmic prefetching improved performance from 74% to 89% of peak; for the second, from 73% to 87% of peak.
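The change from algorithm A to algorithm B can be illustrated with a minimal sketch in C. This is not the paper's kernel: it uses a simple dot product rather than the dense-matrix operations measured above, and the names `dot_plain`, `dot_prefetch`, and `LINE_DOUBLES` are hypothetical. The cache-line length is an assumed value, and the sketch assumes hardware on which a load that misses the cache can overlap with subsequent arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache-line length in doubles (hypothetical value:
   a 128-byte line holding 8-byte doubles). Tune per machine. */
#define LINE_DOUBLES 16

/* Algorithm A: plain dot product; a cache miss stalls the inner loop. */
double dot_plain(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Algorithm B: identical arithmetic in identical order, plus one extra
   load per operand from the NEXT cache line before the current line is
   processed, so that the miss can overlap with useful work. */
double dot_prefetch(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    volatile double touch = 0.0; /* volatile sink keeps the extra loads */
    size_t i = 0;

    /* Blocked part: the touched index i + LINE_DOUBLES stays below n. */
    for (; i + 2 * LINE_DOUBLES <= n; i += LINE_DOUBLES) {
        touch = x[i + LINE_DOUBLES]; /* additional step: touch next line of x */
        touch = y[i + LINE_DOUBLES]; /* additional step: touch next line of y */
        for (size_t j = i; j < i + LINE_DOUBLES; j++)
            sum += x[j] * y[j];
    }

    /* Remainder: no further line to touch. */
    for (; i < n; i++)
        sum += x[i] * y[i];

    (void)touch;
    return sum;
}
```

Both functions accumulate the products in the same order, so they return the same value; any difference is in timing only, and whether the extra loads help depends on the machine, the compiler, and the alignment of the arrays.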