Advances in Kepler K20
February 2013
CUDA hardware offers many dimensions of parallelism through the arrangement of multiprocessors and the cores within each of them.  Some of this parallelism is managed by the hardware, while some is left to software to optimize.
K20 adds a new dimension of parallelism called Hyper-Q.  K20 can execute up to 32 kernels launched from different CPU processes simultaneously, which increases the fraction of time the GPU is kept busy.  Previous-generation CUDA hardware such as Fermi has only one connection to the CPU.  The multiple connections are implemented in hardware.  How much this improves CPU and GPU utilisation depends on the individual scenario; the key point is that it eliminates the CPU-GPU connection as a potential bottleneck.
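
As a rough illustration of the kind of workload Hyper-Q targets, the sketch below submits several independent kernels from a single process, each on its own CUDA stream, so the hardware is free to run them concurrently. This is a minimal sketch under stated assumptions, not a benchmark: the kernel `fill`, the helper `run_concurrent_fills` and the sizes are hypothetical names and values chosen for the example. Sharing one GPU between several CPU processes, as described above, additionally relies on NVIDIA's multi-process support, which is not shown here.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical toy kernel: writes `value` into every element of its slice.
__global__ void fill(int *out, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Launches one kernel per stream, each on a separate slice of one buffer.
// Returns how many elements hold the expected value, or -1 on a CUDA error.
int run_concurrent_fills(int numStreams, int n) {
    const size_t total = static_cast<size_t>(numStreams) * n;
    int *d_buf = nullptr;
    if (cudaMalloc(&d_buf, total * sizeof(int)) != cudaSuccess) return -1;

    std::vector<cudaStream_t> streams(numStreams);
    for (int s = 0; s < numStreams; ++s) cudaStreamCreate(&streams[s]);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    for (int s = 0; s < numStreams; ++s) {
        // Kernels in different streams have no ordering between them,
        // so the GPU may overlap their execution.
        fill<<<blocks, threads, 0, streams[s]>>>(d_buf + static_cast<size_t>(s) * n, n, s);
    }

    // Wait for the work in all streams to finish.
    cudaError_t err = cudaDeviceSynchronize();

    std::vector<int> h_buf(total);
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_buf.data(), d_buf, total * sizeof(int),
                         cudaMemcpyDeviceToHost);
    }

    int correct = 0;
    if (err == cudaSuccess) {
        for (int s = 0; s < numStreams; ++s)
            for (int i = 0; i < n; ++i)
                if (h_buf[static_cast<size_t>(s) * n + i] == s) ++correct;
    }

    for (int s = 0; s < numStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d_buf);
    return err == cudaSuccess ? correct : -1;
}

int main() {
    const int kStreams = 8;    // illustrative; the text above cites up to 32
    const int kN = 1 << 16;    // elements per stream
    int correct = run_concurrent_fills(kStreams, kN);
    printf("%d of %d elements correct\n", correct, kStreams * kN);
    return correct == kStreams * kN ? 0 : 1;
}
```

Whether the kernels actually overlap depends on the device and on how many resources each kernel occupies; the code only removes the ordering constraints that would otherwise force them to run one after another.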

Another new architectural feature of K20 is Dynamic Parallelism: the ability to launch new grids from the GPU itself.  Grids can be launched:
o Dynamically: Based on run-time data.
o Independently: Each thread can launch a different grid.
o Simultaneously: From multiple threads at once.

This reduces coordination with the CPU over the PCI Express bus and shifts it to within the GPU.  Memory transfers inside the GPU are more than 10 times faster than transfers over PCI Express lanes.
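
The three properties listed above can be sketched with a parent kernel whose threads each launch a child grid sized from data read at run time. This is a minimal sketch, not a tuned implementation: the kernels `parent` and `child`, the helper `run_dynamic_launch` and the segment data are hypothetical and exist only for illustration. Device-side launches need a GPU of compute capability 3.5 or higher and relocatable device code, for example `nvcc -arch=sm_35 -rdc=true example.cu -lcudadevrt` with a toolchain that still supports that architecture.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Child grid: fills one segment of `out` with `value`.
__global__ void child(int *out, int offset, int count, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) out[offset + i] = value;
}

// Parent grid: one thread per segment. Each thread decides at run time
// whether to launch a child grid and how large it should be.
__global__ void parent(const int *counts, const int *offsets, int *out,
                       int numSegments) {
    int seg = blockIdx.x * blockDim.x + threadIdx.x;
    if (seg < numSegments && counts[seg] > 0) {
        const int threads = 64;
        const int blocks = (counts[seg] + threads - 1) / threads;
        // Launched from the GPU, with no round trip to the CPU.
        child<<<blocks, threads>>>(out, offsets[seg], counts[seg], seg);
    }
}

// Runs the example and returns the number of correct elements, or -1 on error.
int run_dynamic_launch() {
    const int kSegments = 4;
    const int kTotal = 10;
    const int h_counts[kSegments]  = {3, 0, 5, 2};  // illustrative run-time data
    const int h_offsets[kSegments] = {0, 3, 3, 8};  // start of each segment

    int *d_counts = nullptr, *d_offsets = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_counts, sizeof(h_counts)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_offsets, sizeof(h_offsets)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, kTotal * sizeof(int)) != cudaSuccess) return -1;

    cudaMemcpy(d_counts, h_counts, sizeof(h_counts), cudaMemcpyHostToDevice);
    cudaMemcpy(d_offsets, h_offsets, sizeof(h_offsets), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0xFF, kTotal * sizeof(int));  // every element starts at -1

    parent<<<1, kSegments>>>(d_counts, d_offsets, d_out, kSegments);
    // The parent grid is not complete until all of its child grids finish.
    cudaError_t err = cudaDeviceSynchronize();

    int h_out[kTotal];
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    }

    int correct = 0;
    if (err == cudaSuccess) {
        for (int seg = 0; seg < kSegments; ++seg)
            for (int i = 0; i < h_counts[seg]; ++i)
                if (h_out[h_offsets[seg] + i] == seg) ++correct;
    }

    cudaFree(d_counts);
    cudaFree(d_offsets);
    cudaFree(d_out);
    return err == cudaSuccess ? correct : -1;
}

int main() {
    int correct = run_dynamic_launch();
    printf("%d of 10 elements correct\n", correct);
    return correct == 10 ? 0 : 1;
}
```

On earlier hardware the same pattern would need the CPU to read the counts back over PCI Express and issue each launch itself, which is the coordination cost the paragraph above refers to.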
                                           
Version 5.0 of the CUDA SDK (software development kit) supports both of these new K20 features.

Early experience with K20X and CUDA 5 on the Titan supercomputer at Oak Ridge National Laboratory was presented at SC12 in November 2012 for 5 applications.  Gain is defined as the ratio of the processing time on the Opteron CPU without a GPU to the processing time with the Opteron CPU and a K20X GPU, so a gain above 1 means the GPU configuration is faster.  Gains ranged from 1.8 to 7.8.
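
Reading the gain as a speed-up, so that values above 1 favour the GPU configuration, it can be written as follows. The symbols are notation introduced here for illustration and do not come from the Titan results.

```latex
\[
  \text{gain} \;=\; \frac{T_{\text{CPU}}}{T_{\text{CPU+GPU}}}
\]
% T_CPU     : processing time on the Opteron CPU without a GPU
% T_CPU+GPU : processing time with the Opteron CPU and a K20X GPU
% Hypothetical example (not a measured result):
%   T_CPU = 78 s and T_CPU+GPU = 10 s give a gain of 78 / 10 = 7.8.
```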