
B200 Shatters Records, Instinct Fights Against Hopper

by admin

NVIDIA & AMD have just submitted MLPerf Inference performance benchmarks for their latest GPUs, including the Blackwell B200 & Instinct MI325X.

NVIDIA Blackwell B200, AMD Instinct MI325X & More Added To The Latest MLPerf Inference Benchmarks, Green Team Miles Ahead of The Competition In Raw Performance

MLPerf Inference v5.0 performance benchmarks are out, and the GPU giants have submitted results powered by their latest chips. As we have seen in the past, it's not just about raw GPU horsepower; software optimizations and support for new AI ecosystems and workloads matter a lot too.

NVIDIA Blackwell Sets New Records

The GB200 NVL72 system — connecting 72 NVIDIA Blackwell GPUs to act as a single, massive GPU — delivered up to 30x higher throughput on the Llama 3.1 405B benchmark over the NVIDIA H200 NVL8 submission this round. This feat was achieved through more than triple the performance per GPU and a 9x larger NVIDIA NVLink interconnect domain.

While many companies run MLPerf benchmarks on their hardware to gauge performance, only NVIDIA and its partners submitted and published results on the Llama 3.1 405B benchmark.

Production inference deployments often have latency constraints on two key metrics. The first is time to first token (TTFT), or how long it takes for a user to begin seeing a response to a query given to a large language model. The second is time per output token (TPOT), or how quickly tokens are delivered to the user.

The new Llama 2 70B Interactive benchmark has a 5x shorter TPOT and 4.4x lower TTFT — modeling a more responsive user experience. On this test, NVIDIA’s submission using an NVIDIA DGX B200 system with eight Blackwell GPUs tripled performance over using eight NVIDIA H200 GPUs, setting a high bar for this more challenging version of the Llama 2 70B benchmark.

Combining the Blackwell architecture and its optimized software stack delivers new levels of inference performance, paving the way for AI factories to deliver higher intelligence, increased throughput and faster token rates.

via NVIDIA
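To make those two latency metrics concrete, here is a minimal sketch of how TTFT and TPOT could be measured against any streaming LLM endpoint. The `measure_ttft_tpot` helper and the generic token-iterator interface are illustrative assumptions on our part, not part of the MLPerf harness:

```python
import time

def measure_ttft_tpot(stream):
    """Measure time to first token (TTFT) and average time per output
    token (TPOT) for a streaming LLM response. `stream` is any iterable
    that yields tokens as they arrive (an assumed, generic interface)."""
    start = time.perf_counter()
    first_token_time = None
    n_tokens = 0
    for _ in stream:
        now = time.perf_counter()
        if first_token_time is None:
            first_token_time = now  # the moment the user sees a response
        n_tokens += 1
    if n_tokens == 0:
        raise ValueError("stream produced no tokens")
    end = time.perf_counter()

    ttft = first_token_time - start
    # TPOT: average gap between tokens once the first has arrived
    tpot = (end - first_token_time) / max(n_tokens - 1, 1)
    return ttft, tpot
```

The interactive Llama 2 70B scenario described above essentially tightens the allowed ceilings on both of these numbers.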

With that said, we start with the Green Giant, which has once again taken the lead and set impressive records with its latest Blackwell GPUs such as the B200. The GB200 NVL72 rack, with a total of 72 B200 chips, takes the top spot, offering up to 30x higher throughput in the Llama 3.1 405B benchmark versus the last-generation NVIDIA H200. NVIDIA also saw a tripling of performance in the Llama 2 70B Interactive benchmark when comparing an 8-GPU B200 system against an 8-GPU H200 system.
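As a rough sanity check on that headline figure (our own back-of-the-envelope arithmetic, not an NVIDIA-published breakdown), the two factors NVIDIA cites compound to roughly the quoted number:

```python
# Back-of-the-envelope check of the "up to 30x" Llama 3.1 405B claim,
# assuming the two cited factors compound multiplicatively.
per_gpu_gain = 3         # "more than triple the performance per GPU"
nvlink_domain_gain = 9   # 72-GPU NVL72 domain vs. the 8-GPU H200 NVL8
print(per_gpu_gain * nvlink_domain_gain)  # 27x, close to "up to 30x"
```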

[Chart: MLPerf Inference throughput comparison across 8-GPU systems: Blackwell B200 180 GB (x8 @ 1000W), Hopper H200 141 GB (x8 @ 700W), Instinct MI325X 256 GB (x8 @ 1000W), Hopper H100 80 GB (x8 @ 700W)]

AMD has also submitted its newest Instinct MI325X 256 GB accelerator, which appears in an x8 configuration.

AMD's results put the MI325X roughly on par with the H200 system, and its larger memory capacity surely helps with massive LLMs. However, it still trails the Blackwell B200 by a wide margin, and with the Ultra platform arriving later this year in the form of the B300, AMD will have to keep pace in both hardware and software. It does have the Instinct MI350 series on the way, though.
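To illustrate why capacity matters for massive LLMs (our own rough arithmetic, not figures from either vendor's submission), the weights of a 405B-parameter model at 8-bit precision alone exceed any single GPU's memory, so per-GPU capacity dictates how few accelerators the model must be sharded across:

```python
# Rough weight-only memory footprint for a large LLM; KV cache and
# activations add more on top. Assumes 1 byte per parameter (FP8).
params_billion = 405          # e.g. Llama 3.1 405B
weights_gb = params_billion   # 405 GB of weights at 1 byte/param

print(weights_gb / 256)  # ~1.6 MI325X GPUs (256 GB each) just for weights
print(weights_gb / 180)  # ~2.3 B200 GPUs (180 GB each)
print(8 * 256)           # an x8 MI325X node totals 2,048 GB of HBM
```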

[Chart: MLPerf Inference throughput comparison across 8-GPU systems: Blackwell B200 180 GB (x8 @ 1000W), Hopper H200 141 GB (x8 @ 700W), Hopper H100 80 GB (x8 @ 700W), Instinct MI325X 256 GB (x8 @ 1000W)]

There are also benchmarks for the Hopper H200 series, which has seen continued software optimization. Compared with last year's results, its inference performance has risen by 50 percent, a substantial gain for firms that continue to rely on the platform.
