NVIDIA is also unveiling its brand new Ampere-based A10 and A30 Tensor Core GPUs today, alongside all of its other CPU and GPU announcements. Both GPUs are designed primarily for virtualization platforms and are targeted at data centres.
NVIDIA Ampere A10 24 GB GDDR6 Tensor Core GPUs and A30 24 GB HBM2 Tensor Core GPUs have been announced.
The specs of these brand new Tensor Core GPUs are fascinating. The GA102 GPU is used by the A10, while the GA100 GPU is used by the A30. While both will be Ampere-based, the memory subsystems for both GPUs will be somewhat different, with the A10 providing GDDR6 and the A30 opting for the traditional HBM2 data centre memory interface. But let's look at the specs in more detail.
NVIDIA A10 GPU with Ampere Tensor Cores
The GA102-890 SKU powers the NVIDIA A10 Tensor Core GPU. There are 72 SMs in all, with a total of 9216 CUDA Cores. The GPU has an 885 MHz base clock which can boost up to 1695 MHz. It supports PCIe Gen 4.0 and carries 24 GB of GDDR6 VRAM running at 12.5 Gbps over a 384-bit wide bus interface, for a total memory bandwidth of 600 GB/s.
The card is designed with a champagne gold-colored shroud and is available in a single-slot, full-length form factor. There is no fan on this card since it is passively cooled, and power is supplied by a single 8-pin connector to satisfy its 150W TDP requirement. The NVIDIA A10 Tensor Core GPU delivers up to 31.2 TF FP32, 62.5 TF TF32, 125 TF BFLOAT16, 250 TOPS INT8, and 500 TOPS INT4, with double those rates when sparsity is used.
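As a sanity check, the quoted FP32 and bandwidth figures follow directly from the core count, boost clock, and memory configuration. The sketch below assumes the usual convention of one FMA (two FLOPs) per CUDA core per clock:

```python
# Sanity-check the quoted A10 numbers from the SM count, clocks, and memory bus.
cuda_cores = 72 * 128          # 72 SMs x 128 FP32 cores per Ampere SM = 9216
boost_clock_hz = 1695e6        # quoted boost clock

# One FMA per core per clock counts as 2 FLOPs.
fp32_tflops = cuda_cores * 2 * boost_clock_hz / 1e12
print(f"A10 peak FP32: {fp32_tflops:.1f} TFLOPS")         # ~31.2 TFLOPS

bus_width_bits = 384
data_rate_gbps = 12.5          # GDDR6 data rate per pin
bandwidth_gbs = bus_width_bits * data_rate_gbps / 8
print(f"A10 memory bandwidth: {bandwidth_gbs:.0f} GB/s")  # 600 GB/s
```

Both results line up with NVIDIA's quoted 31.2 TF FP32 and 600 GB/s figures.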
NVIDIA A30 GPU with Ampere Tensor Cores
On the other hand, the NVIDIA A30 Tensor Core GPU uses a GA100 SKU, although the exact variant is unknown. It appears to be a cut-down version with a base clock of 930 MHz and a boost clock of up to 1440 MHz. The GPU is fitted with 24 GB of HBM2 VRAM, which runs at 1215 MHz over a 3072-bit wide bus interface, implying three HBM2 memory stacks. Those stacks provide a memory bandwidth of up to 933 GB/s.
The NVIDIA A30 Tensor Core GPU, unlike the A10, has a dual-slot, full-length design. It, too, is powered by a single 8-pin connector, but with a TDP of 165W. The NVIDIA A30 Tensor Core GPU delivers up to 5.2 TF FP64, 10.3 TF peak FP64 Tensor Core, 10.3 TF FP32, 82 TF TF32, 165 TF BFLOAT16, 330 TOPS INT8, and 661 TOPS INT4, with double those rates when sparsity is used.
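The 933 GB/s figure likewise falls out of the HBM2 configuration. A quick sketch, assuming the standard 1024-bit interface per HBM2 stack and double data rate at the quoted memory clock:

```python
# HBM2 bandwidth check for the A30: 3 stacks x 1024-bit = 3072-bit bus,
# double data rate at the quoted 1215 MHz memory clock.
stacks = 3
bus_width_bits = stacks * 1024           # 3072-bit aggregate bus
mem_clock_hz = 1215e6
effective_rate = 2 * mem_clock_hz        # DDR: two transfers per clock

bandwidth_gbs = bus_width_bits / 8 * effective_rate / 1e9
print(f"A30 memory bandwidth: {bandwidth_gbs:.2f} GB/s")  # ~933 GB/s
```

This matches NVIDIA's quoted 933 GB/s once rounded.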
| NVIDIA Tensor Core Ampere GPUs | A10 | A30 |
|---|---|---|
| GPU | GA102-890 | GA100 |
| FP64 | – | 5.2 teraFLOPS |
| FP64 Tensor Core | – | 10.3 teraFLOPS |
| FP32 | 31.2 teraFLOPS | 10.3 teraFLOPS |
| TF32 Tensor Core | 62.5 teraFLOPS (125 teraFLOPS*) | 82 teraFLOPS (165 teraFLOPS*) |
| BFLOAT16 Tensor Core | 125 teraFLOPS (250 teraFLOPS*) | 165 teraFLOPS (330 teraFLOPS*) |
| FP16 Tensor Core | 125 teraFLOPS (250 teraFLOPS*) | 165 teraFLOPS (330 teraFLOPS*) |
| INT8 Tensor Core | 250 TOPS (500 TOPS*) | 330 TOPS (661 TOPS*) |
| INT4 Tensor Core | 500 TOPS (1,000 TOPS*) | 661 TOPS (1,321 TOPS*) |
| RT Core | 72 RT Cores | – |
| Encode/decode | 1 encoder, 2 decoders (+AV1 decode) | 1 optical flow accelerator (OFA), 1 JPEG decoder (NVJPEG), 4 video decoders (NVDEC) |
| GPU memory | 24 GB GDDR6 | 24 GB HBM2 |
| GPU memory bandwidth | 600 GB/s | 933 GB/s |
| Interconnect | PCIe Gen4: 64 GB/s | PCIe Gen4: 64 GB/s; third-gen NVLink: 200 GB/s** |
| Form factors | Single-slot, full-height, full-length (FHFL) | Dual-slot, full-height, full-length (FHFL) |
| Max thermal design power (TDP) | 150W | 165W |
| Multi-Instance GPU (MIG) | – | 4 GPU instances @ 6 GB each; 2 GPU instances @ 12 GB each; 1 GPU instance @ 24 GB |
| vGPU software support | NVIDIA Virtual PC, NVIDIA Virtual Applications, NVIDIA RTX Virtual Workstation, NVIDIA Virtual Compute Server | NVIDIA AI Enterprise for VMware, NVIDIA Virtual Compute Server |

\* With sparsity.
\*\* NVLink bridge for up to two GPUs.
All-New GPU Servers from Inspur Supporting A30, A10, and A100
NF5468M6: supports 2x 3rd Gen Intel Xeon Scalable processors and 8x NVIDIA A100/A40/A30 GPUs, 16x NVIDIA A10 GPUs, or 20x NVIDIA T4 GPUs in a 4U chassis; supports up to 12x 3.5-inch hard drives for large local storage; flexibly adapts to the latest AI accelerators and smart NICs; and offers the unique feature of switching topologies with one click.
NF5468A5: scalable AI server with 2x AMD Rome/Milan CPUs and 8x NVIDIA A100/A40/A30 GPUs; N+N redundancy architecture enables 8x 350W AI accelerators to run at full speed for superior reliability; CPU-to-GPU non-blocking design allows interconnection without the use of a PCIe switch, resulting in higher computing efficiency.
NF5280M6: a 2U chassis with 2x Intel 3rd Gen Xeon Scalable processors and 4x NVIDIA A100/A40/A30/A10 GPUs or 8x NVIDIA T4 Tensor Core GPUs that can run at 45°C for long periods of time. The NF5280M6 has the latest PFR/SGX technology and a trustworthy security module architecture, making it ideal for high-performance AI applications.
Inspur also announced the Inspur M6 AI servers, which fully support NVIDIA BlueField-2 DPUs. Inspur intends to incorporate NVIDIA BlueField-2 DPUs into its next-generation AI servers, allowing for quicker and more effective management of users and clusters, as well as high-speed data connectivity, in contexts such as AI, big data processing, cloud storage, and virtualization.
Availability
More than 20 NVIDIA-Certified Systems are now available from computer manufacturers all over the world. Manufacturers will begin offering NVIDIA-Certified Systems with NVIDIA A30 and NVIDIA A10 GPUs later this year.
The perpetual licence for NVIDIA AI Enterprise costs $3,595 per CPU socket. Business Standard Support for NVIDIA AI Enterprise costs $899 per licence per year. Customers who want to upgrade to VMware vSphere 7 Update 2 will qualify for early access to NVIDIA AI Enterprise.