Volume image registration remains one of the best candidates for Graphics Processing Unit (GPU) acceleration because of its enormous computation time and plentiful data-level parallelism. However, an efficient GPU implementation for image registration is still challenging due to the heavy utilization of expensive atomic operations for similarity calculations. In this paper, we first propose five GPU-friendly Correlation Ratio (CR) based methods to accelerate the process of image registration. Compared to widely used Mutual Information (MI) based methods, the CR-based approaches require less resource for shadow histograms, a faster storage, such as the on-chip scratchpad memory, therefore can be fully exploited to achieve better performance. Second, we make design space exploration of the CR-based methods, and study the trade-off of introducing shadow histograms on different storage (shared memory, global memory) by computation units of different granularity (thread, warp, thread block). Third, we exhaustively test the proposed designs on GPUs of different generations (Fermi, Kepler and Maxwell) so that performance variations due to hardware migration are addressed. Finally, we evaluate the performance impact corresponding to the tuning of concurrency, algorithm settings as well as overheads incurred by preprocessing, smoothing and workload unbalancing. We highlight our last CR approach which completely avoids updating conflicts of histogram calculation, leading to substantial performance improvements (up to 55× speedup over naive CPU implementation). It reduces the registration time from 145 s to 2.6 s for two typical 256 × 256 × 160 volume images on a Kepler GPU.
|Seiten (von - bis)
|Microprocessors and microsystems
|Veröffentlicht - Nov. 2015