CUDA Hardware Computation Hierarchy
|
June 2013
|
This note assumes prior reading on the CUDA processor hierarchy and memory hierarchy; it links the two aspects together for computational situations. The content is based on various online resources as of June 2013.
Kernel, Grid, Block and Thread
o All kernels have access to global memory, so kernels can share data through it.
o A grid can have 1, 2 or 3 dimensions (3-D grids require compute capability 2.0 or later; earlier devices support only 1 or 2) and a block can have 1, 2 or 3 dimensions. These dimensions or coordinates are abstractions to assist the programmer only. The programmer specifies the grid size (number of rows and columns of blocks) and the block size (rows, columns and layers of threads) to match the problem being addressed; see the launch sketch after this list.
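The following is a minimal sketch of how grid and block dimensions can be chosen to cover a 2-D problem. The kernel name matrixAdd, the 1024 x 768 problem size and the 16 x 16 block shape are illustrative assumptions, not part of the note; error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Each thread handles one element of a width x height array.
// All threads, in any block, can read and write global memory,
// which is how kernels share data.
__global__ void matrixAdd(const float *a, const float *b, float *c,
                          int width, int height)
{
    // Recover this thread's 2-D coordinates from the grid/block abstraction.
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;

    if (col < width && row < height) {
        int idx = row * width + col;   // flatten to a linear global-memory index
        c[idx] = a[idx] + b[idx];
    }
}

int main()
{
    const int width = 1024, height = 768;          // assumed problem size
    const size_t bytes = size_t(width) * height * sizeof(float);

    float *a, *b, *c;                              // pointers into device global memory
    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    cudaMalloc(&c, bytes);

    // Block size (threads per block) and grid size (blocks) chosen so that
    // the launch covers every element of the 2-D problem.
    dim3 block(16, 16);                            // 256 threads arranged 16 x 16
    dim3 grid((width  + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);

    matrixAdd<<<grid, block>>>(a, b, c, width, height);
    cudaDeviceSynchronize();

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```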
Warp and Thread
o Accesses to global or shared memory are issued in half-warps of 16 threads (each SM has 16 load/store units), which lets the hardware overlap one half-warp's memory transaction with further instruction issue, using time more effectively.
o Programmers are encouraged to launch a large number of warps (many threads in total) so that the SMs can switch between resident warps to hide memory latency and keep the parallel execution units busy; a launch sketch follows below.
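A minimal sketch of a launch that keeps many warps resident per SM and accesses global memory in a pattern where each half-warp touches a contiguous segment. The kernel name scale, the element count and the block size of 256 threads (8 warps per block) are illustrative assumptions, not part of the note.

```cuda
#include <cuda_runtime.h>

// Consecutive threads touch consecutive addresses, so the 16 accesses of a
// half-warp fall in one contiguous segment and can be serviced together.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;   // coalesced read and write of global memory
    }
}

int main()
{
    const int n = 1 << 22;                 // a few million elements (assumed size)
    float *data;
    cudaMalloc(&data, n * sizeof(float));

    // 256 threads per block = 8 warps per block; launching thousands of blocks
    // gives each SM many resident warps, so while one warp waits on global
    // memory the scheduler can issue instructions from another.
    int block = 256;
    int grid  = (n + block - 1) / block;
    scale<<<grid, block>>>(data, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(data);
    return 0;
}
```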
