Computing Sub-system Performance
This article provides a basis for assessing the fitness of computer platforms for desktops and servers. The discussion refers to the Von Neumann system architecture, as no other architecture has a similar level of market dominance.
Von Neumann Stored Program Architecture
o The architecture was first described in 1945. The term has since evolved slightly to mean a stored-program computer in which an instruction fetch and a data operation cannot occur at the same time because they share a common bus. It keeps both its program instructions and its data in RAM (random access memory). Modern processors include separate instruction and data caches on the CPU die, but they are still considered Von Neumann in this article. The original Harvard architecture has separate buses and address spaces for instructions and data, whereas the modified version applies the separation at the cache level and not in main memory. The long-standing concern is therefore that the single-bus arrangement creates a performance bottleneck.
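The shared-bus constraint can be illustrated with a toy cycle count. This is only a minimal sketch under simplifying assumptions that are not from the article: every instruction fetch and every data access takes exactly one bus cycle, there are no caches, and the function names and workload figures are hypothetical.

```python
def shared_bus_cycles(instructions: int, data_accesses: int) -> int:
    """Von Neumann toy model: fetches and data accesses take turns on one bus."""
    return instructions + data_accesses


def split_bus_cycles(instructions: int, data_accesses: int) -> int:
    """Harvard toy model: fetches and data accesses overlap on separate buses."""
    return max(instructions, data_accesses)


if __name__ == "__main__":
    n_instr = 1000  # hypothetical instruction count
    n_data = 400    # assumed: 40% of instructions read or write memory
    print(shared_bus_cycles(n_instr, n_data))  # 1400
    print(split_bus_cycles(n_instr, n_data))   # 1000
```

In this model the shared bus costs 40% more cycles for the same work, which is the bottleneck that the on-die instruction and data caches are meant to relieve.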
Connectivity of Modern x86 Systems
o Modern desktop processors such as Intel Core and AMD APU incorporate the GPU, memory controller, display controller, and PCIe controller on the CPU die. They are close in design to a System-on-Chip (SoC) such as Intel Atom Centerton, which includes all of the above plus other I/O controllers on die. Intel Core and AMD APU still employ a south bridge to handle low-speed I/O device control. Note: The GPU mentioned above is normally for display purposes and is not in the class that Nvidia promotes for high-performance general-purpose computing.
o In terms of connectivity, the CPU interfaces with memory and display devices through dedicated controllers, and with other devices through the PCIe controller. In some modern CPU designs, display devices go through the PCIe controller as well. The memory controller and PCIe controller therefore play crucial roles in determining the performance of a computer.
o Compucon's system platform design philosophy carries the caution that a system is only as strong as its weakest link. In reality, applications differ in their appetite for the various resources of a computer system, so this “weakest link” concept should be adapted to the application at hand.
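One way to picture the adapted "weakest link" idea is to compare what each application demands against what each resource can supply, and treat the most heavily utilised resource as the limiting one for that application. The sketch below is illustrative only: the function name, the resource names, and all capacity and demand figures are hypothetical and do not come from the article.

```python
from typing import Dict, Tuple


def limiting_resource(capacity: Dict[str, float],
                      demand: Dict[str, float]) -> Tuple[str, float]:
    """Return the resource with the highest utilisation (demand / capacity)."""
    utilisation = {name: demand.get(name, 0.0) / capacity[name]
                   for name in capacity}
    worst = max(utilisation, key=utilisation.get)
    return worst, utilisation[worst]


if __name__ == "__main__":
    # Hypothetical platform capacities.
    platform = {"cpu_gflops": 100.0, "memory_gbs": 50.0, "pcie_gbs": 16.0}
    # Hypothetical demands of two different applications.
    compute_heavy = {"cpu_gflops": 80.0, "memory_gbs": 10.0, "pcie_gbs": 1.0}
    data_heavy = {"cpu_gflops": 10.0, "memory_gbs": 40.0, "pcie_gbs": 2.0}
    print(limiting_resource(platform, compute_heavy))  # ('cpu_gflops', 0.8)
    print(limiting_resource(platform, data_heavy))     # ('memory_gbs', 0.8)
```

The same platform has a different weakest link for each application, which is why the concept has to be applied per workload and not per machine.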
Bottleneck of Computing Sub-system
o Is the CPU continually forced to wait for data to be transferred to or from memory? One answer comes from the Arithmetic Intensity (AI; see separate article) that the application requires, relative to what the CPU design offers. AI is the ratio of CPU processing rate to memory data transfer rate, for example floating-point operations per byte transferred.
o The current memory type is DDR3. It is known to have high latency (the lead time before a data transfer takes place), and current DDR4 development efforts aim to reduce the latency. Intel E5 Xeon and Core i7 (2011-pin) processors have four memory channels; Intel's website quotes a memory transfer rate of 51 GB/s for DDR3. We estimate that the same processors may reach 100 GB/s if DDR4 is used.
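The figures above can be reproduced with simple arithmetic. The sketch below assumes four 64-bit channels of DDR3 at 1600 MT/s, which is consistent with the quoted 51 GB/s, and a hypothetical DDR4 speed of 3200 MT/s, which would land near the 100 GB/s estimate. The second function compares an application's arithmetic intensity with the machine's own ratio of processing rate to transfer rate. The function names, the 200 GFLOPS CPU figure, and the sample intensities are assumptions for illustration, not values from the article.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: float,
                       bus_width_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


def is_memory_bound(app_flops_per_byte: float, peak_gflops: float,
                    bandwidth_gbs: float) -> bool:
    """True when the application does less work per byte than the machine
    can sustain, so the CPU ends up waiting on memory."""
    machine_balance = peak_gflops / bandwidth_gbs  # FLOPs per byte
    return app_flops_per_byte < machine_balance


if __name__ == "__main__":
    ddr3 = peak_bandwidth_gbs(channels=4, mega_transfers_per_s=1600)
    ddr4 = peak_bandwidth_gbs(channels=4, mega_transfers_per_s=3200)
    print(f"{ddr3:.1f} GB/s")  # 51.2 GB/s
    print(f"{ddr4:.1f} GB/s")  # 102.4 GB/s

    cpu_gflops = 200.0  # hypothetical peak processing rate
    print(is_memory_bound(0.25, cpu_gflops, ddr3))  # True: low-intensity workload
    print(is_memory_bound(10.0, cpu_gflops, ddr3))  # False: high-intensity workload
```

With these assumed numbers the machine balance is about 3.9 FLOPs per byte, so any application below that intensity would leave the CPU waiting on memory, and doubling the bandwidth would halve the threshold.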