N-Body refers to a number of bodies such as stars in a galaxy on a very
large scale, or bio-molecules within a cell on a very small scale.
Scientists have attempted to develop models and make observations of their
behaviour, but have not gone far, because all bodies interact with each
other and the counts of bodies are far beyond our day-to-day visible
scale. We have to resort to simulation as a method of study.
N-Body simulations are time-dependent mathematical models: the properties
of the bodies change with time, and a change in any property of any one body
has a direct or indirect effect on all other bodies. Simulation
by computation reveals patterns of behaviour over time, and the state of
affairs at any one instant, and this kind of artificial information
provides stimulation and ideas for further scientific inquiry.
The tests Compucon ran and reports here are in the domain of astrophysics, with
gravity as the main force between stars in a galaxy. The mathematical
model was taken from an open source community; Compucon did not have
any input into the model. The purpose of the Compucon tests is to find out how
well our hardware tools handle the simulation process.
All bodies exert a force on all other bodies simultaneously. We could attempt
the conventional approach of computer simulation, running software
code on hardware processors in series, but this would not be a joyful process
when an astronomical number of interactions happens at the same instant,
and at every instant. We must resort to parallel processing for this type
of simulation. Tesla from Nvidia was designed and developed squarely for
this purpose.
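The scale of the problem follows from simple counting: with all-pairs interactions, each time step costs N(N-1) force evaluations, so the work grows quadratically while the body count grows linearly. A small sketch (the example sizes are chosen to match the test range in this article):

```c
/* All-pairs N-body work per time step: each of the n bodies
   interacts with the other n-1 bodies. */
long long interactions(long long n) {
    return n * (n - 1);
}
```

For the sizes tested here, `interactions(2000)` is about 4 million per step while `interactions(50000)` is about 2.5 billion, a roughly 625x jump in work for a 25x increase in bodies.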
On this occasion we were interested in the computational abilities of the
hardware, not in the scientific implications of the simulations. We show
3 figures of test results below. In each figure, we show the
computational performance starting from a very small number of bodies (2,000)
and going up to 50K (thousand), which is still a very small number in the
cosmos: scientists have estimated hundreds of billions of stars in
the Milky Way Galaxy.
The test system is a standard Compucon Superhawk Plus with 16GB of main memory
and a standard 7200rpm hard disk running Windows 7 Professional 64bit version.
Parallel computation capabilities tested were Quadro 2000 (with 448 cores and
1GB memory), Tesla C2075 (with 512 cores and 6GB memory), and 2x Tesla C2075.
# Figure 1: Total time per 10 iterations in ms
# Figure 2: Number of interactions in billion per second
# Figure 3: GFLOPS at 20 FLOPS per interaction on single precision
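Figure 3's metric is derived from Figure 2 by costing each pairwise interaction at 20 floating-point operations. A sketch of the conversion (the 25 billion interactions/s input is an assumed example, chosen to line up with the plateau of a single Tesla card in the results):

```c
/* Sustained GFLOPS from the interaction rate, at 20 FLOPs
   per body-body interaction (the costing used in Figure 3). */
double gflops(double interactions_per_sec) {
    return interactions_per_sec * 20.0 / 1e9;
}
```

By this conversion, a rate of 25 billion interactions per second corresponds to 500 GFLOPS.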
The figures show that the performance of one Tesla card flattened out at about
50K bodies on 500 GFLOPS, and that 2 Tesla cards hit 1 TFLOPS at 50K and would
go up further beyond 50K. This implies the need for a very high level of
parallel computation power for scenarios involving a large count of bodies,
such as galaxies. Forget Quadro. Put in more Tesla.
Nvidia specifies the C2075 as having a 1 TFLOPS peak, so 1,000 Tesla C2075
cards would give us 1 PFLOPS by simple arithmetic. It would not be difficult
to go past IBM Sequoia, ranked world #1 in June 2012 with 16 PFLOPS:
the cost would be 16,000 cards x $2K = $32M, to give an idea.
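That estimate can be written out as a back-of-envelope calculation (the $2K per-card price and the 1 TFLOPS peak are the figures assumed in this article, not a quotation):

```c
/* Cards needed to reach a target PFLOPS figure, given the
   per-card peak in TFLOPS (1 PFLOPS = 1000 TFLOPS). */
long cards_needed(double target_pflops, double card_tflops) {
    return (long)(target_pflops * 1000.0 / card_tflops);
}

/* Total cost in millions of dollars at a given per-card price. */
double cost_millions(long cards, double price_usd) {
    return cards * price_usd / 1e6;
}
```

Matching Sequoia's 16 PFLOPS with 1 TFLOPS cards works out to 16,000 cards, and at $2,000 per card that is $32M.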
We repeated the tests with double precision. As expected, we obtained the
same pattern of computational performance as for single precision. The
figures below show SP and DP together. DP carries twice the bits of SP
(64 versus 32), but the performance of Tesla for DP is less than half that of SP.
The performance of heterogeneous hardware, specifically CUDA, for
super-computing purposes will be analysed in a separate article to be
published on this website in September 2012.