
Tesla P4
April 2018

In the new era of AI and intelligent machines, deep learning is shaping our world like no other computing model in history. Interactive speech, visual search, and video recommendations are a few of many AI-based services that we use every day.

Accuracy and responsiveness are key to user adoption for these services. As deep learning models increase in accuracy and complexity, CPUs are no longer capable of delivering a responsive user experience.

The NVIDIA Tesla P4 is powered by the revolutionary NVIDIA Pascal™ architecture and purpose-built to boost efficiency for scale-out servers running deep learning workloads, enabling smart, responsive AI-based services. It slashes inference latency by 15X in any hyperscale infrastructure and provides an incredible 60X better energy efficiency than CPUs. This unlocks a new wave of AI services previously impossible due to latency limitations.


TESLA P4 Accelerator Features and Benefits

RESPONSIVE EXPERIENCE WITH REAL-TIME INFERENCE
Responsiveness is key to user engagement for services such as interactive speech, visual search, and video recommendations. As models increase in accuracy and complexity, CPUs are no longer capable of delivering a responsive user experience. The Tesla P4 delivers 22 TOPS of inference performance with INT8 operations to slash latency by 15X. The 4X step from 5.5 TeraFLOPS of FP32 to 22 TOPS of INT8 comes from Pascal's DP4A instruction, which computes a four-element INT8 dot product per core per clock.
UNLOCK NEW AI-BASED VIDEO SERVICES WITH A DEDICATED DECODE ENGINE
Tesla P4 can transcode and infer up to 35 HD video streams in real time, powered by a dedicated hardware-accelerated decode engine that works in parallel with the GPU doing inference. By integrating deep learning into the video pipeline, customers can offer smart, innovative video services to users that were previously impossible.
UNPRECEDENTED EFFICIENCY FOR LOW-POWER SCALE-OUT SERVERS
The Tesla P4's small form factor and 50W/75W power footprint accelerate density-optimized, scale-out servers. It also provides an incredible 60X better energy efficiency than CPUs for deep learning inference workloads, letting hyperscale customers meet the exponential growth in demand for AI applications.
FASTER DEPLOYMENT WITH TensorRT AND DEEPSTREAM SDK
TensorRT is a library created for optimizing deep learning models for production deployment. It takes trained neural networks, usually in 32-bit or 16-bit data, and optimizes them for reduced-precision INT8 operations; a sketch of that mapping follows below. NVIDIA DeepStream SDK taps into the power of Pascal GPUs to simultaneously decode and analyze video streams.
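
TensorRT's calibration machinery is not shown in this overview, but the core idea behind reduced-precision deployment can be illustrated with a minimal sketch of symmetric linear INT8 quantization. The function name quantize_int8 and the per-tensor scale are illustrative assumptions, not the TensorRT API:

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric linear quantization of FP32 weights to INT8.

    The scale maps the largest-magnitude weight to 127, so that
    w ~= scale * q for every quantized value q.
    """
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

# Quantize a small weight matrix and measure the round-trip error.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.abs(w - q.astype(np.float32) * scale).max())
```

In production, TensorRT additionally calibrates activation ranges on sample data so that the INT8 network closely tracks the accuracy of the original FP32 network.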

Specifications:

GPU Architecture: NVIDIA Pascal™
CUDA Cores: 2560
Single-Precision Performance: 5.5 TeraFLOPS
Integer Operations (INT8): 22 TOPS (Tera-Operations per Second)
GPU Memory: 8GB GDDR5
Memory Bus Width: 256-bit
Memory Bandwidth: 192 GB/s
Memory Clock: 2.8 GHz (performance), 324 MHz (idle)
System Interface: Low-Profile PCI Express Gen3
Max Power: 50W-75W
Form Factor: 68.58 mm (2.7 in) H × 167.64 mm (6.6 in) L

 
Tesla P100
April 2018


HPC data centers need to support the ever-growing demands of scientists and researchers while staying within a tight budget. The old approach of deploying lots of commodity compute nodes requires huge interconnect overhead that substantially increases costs without proportionally increasing performance.

NVIDIA Tesla P100 GPU accelerators are the most advanced ever built, powered by the breakthrough NVIDIA Pascal™ architecture and designed to boost throughput and save money for HPC and hyperscale data centers. The newest addition to this family, Tesla P100 for PCIe, enables a single node to replace half a rack of commodity CPU nodes by delivering lightning-fast performance in a broad range of HPC applications.


TESLA P100 Accelerator Features and Benefits

PASCAL ARCHITECTURE
More than 18.7 TeraFLOPS of FP16, 4.7 TeraFLOPS of double-precision, and 9.3 TeraFLOPS of single-precision performance powers new possibilities in deep learning and HPC workloads; the arithmetic behind these figures is sketched after this list.
COWOS HBM2
Compute and data are integrated on the same package using Chip-on-Wafer-on-Substrate with HBM2 technology for 3X memory performance over the previous-generation architecture.
PAGE MIGRATION ENGINE
Simpler programming and computing performance tuning means that applications can now scale beyond the GPU's physical memory size to virtually limitless levels.
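
The three throughput figures are related by fixed ratios on GP100: FP64 units run at half the FP32 rate, and packed FP16 runs at twice it. A quick check, assuming the PCIe variant's boost clock of roughly 1303 MHz (an assumption, not stated above) and counting a fused multiply-add as two operations:

```python
cores = 3584          # CUDA cores, from the table below
boost_hz = 1303e6     # assumed P100 PCIe boost clock

fp32 = cores * boost_hz * 2   # one FMA per core per cycle = 2 FLOPs
fp64 = fp32 / 2               # FP64 units at half the FP32 rate
fp16 = fp32 * 2               # packed FP16 at twice the FP32 rate

for name, flops in [("FP16", fp16), ("FP32", fp32), ("FP64", fp64)]:
    print(f"{name}: {flops / 1e12:.1f} TeraFLOPS")
# -> FP16: 18.7, FP32: 9.3, FP64: 4.7
```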

Specifications:

GPU Architecture: NVIDIA Pascal™
CUDA Cores: 3584
Single-Precision Performance: 9.3 TeraFLOPS
Double-Precision Performance: 4.7 TeraFLOPS
GPU Memory: 16GB CoWoS HBM2
Memory Bus Width: 4096-bit
Memory Bandwidth: 732 GB/s
Memory Clock: 715 MHz
System Interface: Full Height/Length PCI Express Gen3
Max Power: 250W
Form Factor: 111.15 mm (4.38 in) H × 266.70 mm (10.5 in) L
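
The bandwidth figure follows directly from the bus width and memory clock, assuming the listed clock is the physical HBM2 clock and two transfers per clock (double data rate):

```python
bus_bits = 4096
clock_hz = 715e6         # listed memory clock
ddr = 2                  # HBM2 transfers twice per clock

bandwidth = (bus_bits / 8) * clock_hz * ddr
print(f"{bandwidth / 1e9:.0f} GB/s")  # -> 732 GB/s, matching the table
```

The same arithmetic with the V100's 877 MHz clock gives roughly 898 GB/s, consistent with the 900 GB/s quoted for that card further down.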

 
Tesla P40
April 2018


In the new era of AI and intelligent machines, deep learning is shaping our world like no other computing model in history. GPUs powered by the revolutionary NVIDIA Pascal™ architecture provide the computational engine for the new era of artificial intelligence, enabling amazing user experiences by accelerating deep learning applications at scale.

The NVIDIA Tesla P40 is purpose-built to deliver maximum throughput for deep learning deployment. With 47 TOPS (Tera-Operations Per Second) of INT8 inference performance per GPU, a single server with 8 Tesla P40s delivers the performance of over 140 CPU servers.

As models increase in accuracy and complexity, CPUs are no longer capable of delivering an interactive user experience. The Tesla P40 delivers over 30X lower latency than a CPU for real-time responsiveness in even the most complex models.


TESLA P40 Accelerator Features and Benefits

140X HIGHER THROUGHPUT TO KEEP UP WITH EXPLODING DATA
The Tesla P40 is powered by the new Pascal architecture and delivers over 47 TOPS of deep learning inference performance. A single server with 8 Tesla P40s can replace up to 140 CPU-only servers for deep learning workloads, resulting in substantially higher throughput with lower acquisition cost.
SIMPLIFIED OPERATIONS WITH A SINGLE TRAINING AND INFERENCE PLATFORM
Today, deep learning models are trained on GPU servers but deployed in CPU servers for inference. The Tesla P40 offers a drastically simplified workflow, so organizations can use the same servers to iterate and deploy.
REAL-TIME INFERENCE
The Tesla P40 delivers up to 30X faster inference performance with INT8 operations for real-time responsiveness for even the most complex deep learning models.
FASTER DEPLOYMENT WITH NVIDIA DEEP LEARNING SDK
TensorRT, included with the NVIDIA Deep Learning SDK, and the DeepStream SDK help customers seamlessly leverage inference capabilities such as the new INT8 operations and video transcoding; a conceptual decode-and-infer loop is sketched after this list.
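
The DeepStream SDK's own API is not reproduced here. As a minimal conceptual sketch of the decode-then-infer pattern it accelerates, the loop below uses OpenCV for decoding and a placeholder classify() stub; the file name, the stub, and its threshold are illustrative assumptions, not the DeepStream API:

```python
import cv2  # OpenCV; pip install opencv-python

def classify(frame):
    """Placeholder for an INT8-optimized network, not a real model."""
    return "bright scene" if frame.mean() > 127 else "dark scene"

# Decode a stream and run inference frame by frame. On the GPU, the
# dedicated decode engine runs in parallel with inference; this loop
# only illustrates the data flow, not the parallelism.
cap = cv2.VideoCapture("stream.mp4")  # hypothetical input
while True:
    ok, frame = cap.read()
    if not ok:
        break
    label = classify(frame)
cap.release()
```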

Specifications:

GPU Architecture: NVIDIA Pascal™
CUDA Cores: 3840
Single-Precision Performance: 12 TeraFLOPS
Integer Operations (INT8): 47 TOPS (Tera-Operations per Second)
GPU Memory: 24GB GDDR5
Memory Bus Width: 384-bit
Memory Bandwidth: 346 GB/s
Memory Clock: 3615 MHz (performance), 405 MHz (idle)
System Interface: Full Height/Length PCI Express Gen3
Max Power: 250W
Form Factor: 111.15 mm (4.38 in) H × 266.70 mm (10.5 in) L

 
Tesla V100
April 2018


NVIDIA® Tesla® V100 is the world’s most advanced data center GPU ever built to accelerate AI, HPC, and graphics. Powered by NVIDIA Volta, the latest GPU architecture, Tesla V100 offers the performance of up to 100 CPUs in a single GPU—enabling data scientists, researchers, and engineers to tackle challenges that were once thought impossible.

Tesla V100 is the flagship product of the Tesla data center computing platform for deep learning, HPC, and graphics. The Tesla platform accelerates over 550 HPC applications and every major deep learning framework. It is available everywhere from desktops to servers to cloud services, delivering both dramatic performance gains and cost-saving opportunities.


 

TESLA V100 Accelerator Features and Benefits

VOLTA ARCHITECTURE
By pairing CUDA Cores and Tensor Cores within a unified architecture, a single server with Tesla V100 GPUs can replace hundreds of commodity CPU servers for traditional HPC and deep learning.
MAXIMUM EFFICIENCY MODE
The new maximum efficiency mode allows data centers to achieve up to 40% higher compute capacity per rack within the existing power budget. In this mode, Tesla V100 runs at peak processing efficiency, providing up to 80% of the performance at half the power consumption.
TENSOR CORE
Equipped with 640 Tensor Cores, Tesla V100 delivers 125 teraFLOPS of deep learning performance. That's 12X Tensor FLOPS for DL training, and 6X Tensor FLOPS for DL inference, compared to NVIDIA Pascal™ GPUs; the mixed-precision operation behind these numbers is sketched after this list.
HBM2
With a combination of improved raw bandwidth of 900 GB/s and higher DRAM utilization efficiency at 95%, Tesla V100 delivers 1.5X higher memory bandwidth over Pascal GPUs as measured on STREAM. Tesla V100 is now available in a 32GB configuration that doubles the memory of the standard 16GB offering.
NEXT GENERATION NVLINK
NVIDIA NVLink in Tesla V100 delivers 2X higher throughput compared to the previous generation. Up to eight Tesla V100 accelerators can be interconnected at up to 300 GB/s (six links per GPU at 50 GB/s bidirectional each) to unleash the highest application performance possible on a single server.
PROGRAMMABILITY
Tesla V100 is architected from the ground up to simplify programmability. Its new independent thread scheduling enables finer-grain synchronization and improves GPU utilization by sharing resources among small jobs.
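
Each Tensor Core performs a fused multiply-add on small matrix tiles, taking FP16 inputs and accumulating in FP32. A NumPy emulation of that numeric contract, as an illustration of the data types rather than the hardware instruction itself:

```python
import numpy as np

def tensor_core_mma(a, b, c):
    """Emulate D = A @ B + C with FP16 inputs and FP32 accumulation."""
    return a.astype(np.float32) @ b.astype(np.float32) + c

# A 4x4 tile, the granularity a single Tensor Core operates on.
a = np.random.randn(4, 4).astype(np.float16)
b = np.random.randn(4, 4).astype(np.float16)
c = np.zeros((4, 4), dtype=np.float32)
d = tensor_core_mma(a, b, c)
print(d.dtype)  # float32: full-precision accumulator
```

Keeping the accumulator in FP32 is what lets mixed-precision training approach FP32 accuracy while reading half as much data per input operand.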

Specifications:

GPU Architecture: NVIDIA Volta
Tensor Cores: 640
CUDA Cores: 5120
Single-Precision Performance: 14 TeraFLOPS
Double-Precision Performance: 7 TeraFLOPS
Tensor Performance: 112 TeraFLOPS
GPU Memory: 16GB HBM2
Memory Bus Width: 4096-bit
Memory Bandwidth: 900 GB/s
Memory Clock: 877 MHz
System Interface: Full Height/Length PCI Express Gen3
Max Power: 250W
Form Factor: 111.15 mm (4.38 in) H × 266.70 mm (10.5 in) L

 