TY - GEN
T1 - Designing coalescing network-on-chip for efficient memory accesses of GPGPUs
AU - Chen, Chien-Ting
AU - Huang, Yoshi Shih-Chieh
AU - Chang, Yuan-Ying
AU - Tu, Chiao-Yun
AU - King, Chung-Ta
AU - Wang, Tai-Yuan
AU - Sang, Janche
AU - Li, Ming-Hua
PY - 2014/1/1
Y1 - 2014/1/1
N2 - The massive multithreading architecture of general-purpose graphics processing units (GPGPUs) makes them ideal for data-parallel computing. However, designing efficient GPGPU chips poses many challenges. One major hurdle is the interface to the external DRAM, particularly the buffers in the memory controllers (MCs), which is stressed heavily by the many concurrent memory accesses from the GPGPU. Previous approaches considered scheduling the memory requests in the memory buffers to reduce switching of memory rows. The problem is that the window of requests that can be considered for scheduling is too narrow and the memory controller is very complex, affecting the critical path. In view of the massive multithreading architecture of GPGPUs that can hide memory access latencies, we exploit in this paper the novel idea of rearranging the memory requests in the network-on-chip (NoC), called packet coalescing. To study the feasibility of this idea, we have designed an expanded NoC router that supports packet coalescing and evaluated its performance extensively. Evaluation results show that this NoC-assisted design strategy can improve the row buffer hit rate in the memory controllers. A comprehensive investigation of factors affecting the performance of coalescing is also conducted and reported. © 2014 IFIP International Federation for Information Processing.
AB - The massive multithreading architecture of general-purpose graphics processing units (GPGPUs) makes them ideal for data-parallel computing. However, designing efficient GPGPU chips poses many challenges. One major hurdle is the interface to the external DRAM, particularly the buffers in the memory controllers (MCs), which is stressed heavily by the many concurrent memory accesses from the GPGPU. Previous approaches considered scheduling the memory requests in the memory buffers to reduce switching of memory rows. The problem is that the window of requests that can be considered for scheduling is too narrow and the memory controller is very complex, affecting the critical path. In view of the massive multithreading architecture of GPGPUs that can hide memory access latencies, we exploit in this paper the novel idea of rearranging the memory requests in the network-on-chip (NoC), called packet coalescing. To study the feasibility of this idea, we have designed an expanded NoC router that supports packet coalescing and evaluated its performance extensively. Evaluation results show that this NoC-assisted design strategy can improve the row buffer hit rate in the memory controllers. A comprehensive investigation of factors affecting the performance of coalescing is also conducted and reported. © 2014 IFIP International Federation for Information Processing.
KW - general-purpose graphic processors unit
KW - latency hiding
KW - memory controller
KW - Network-on-chip
KW - router design
UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84906776621&origin=inward
UR - https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=84906776621&origin=inward
U2 - 10.1007/978-3-662-44917-2_15
DO - 10.1007/978-3-662-44917-2_15
M3 - Conference contribution
SN - 9783662449165
VL - 8707 LNCS
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 169
EP - 180
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
PB - Springer
CY - deu
T2 - 11th IFIP WG 10.3 International Conference on Network and Parallel Computing, NPC 2014
Y2 - 18 September 2014 through 20 September 2014
ER -