Huawei outperforms NVIDIA at the “cluster” level, where the products are mostly turnkey systems for datacenters. It also promises a truck-container-sized cluster for the next generation with 30x the zettaFLOPS of NVIDIA’s Rubin cluster. China currently operates at 50% of its electricity production capacity, so energy is extremely abundant and cheap, which makes the per-card performance deficit irrelevant.
To be fair, the raw FLOPs count doesn’t tell the whole story. On a lot of workloads (including token generation during LLM inference), you’re bound by memory bandwidth rather than compute throughput (FLOPs). On H100/H200, keeping the tensor cores fully occupied is surprisingly difficult, and that’s with 3+ TB/s of memory bandwidth. And I believe those cards have much higher compute throughput than Ascend, at least at FP8 (Ascend wins at FP4, since H100/H200 don’t support it).
The Ascend 950PR units have far lower memory bandwidth, reportedly 1.4 TB/s. Compare that to Blackwell, which has something like 8 TB/s. I believe Huawei is manufacturing its own kind of HBM, so that’s still really impressive considering this is a fairly recent push into manufacturing accelerators. But I’m a bit skeptical that it actually outperforms NVIDIA at scale.
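To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch. It assumes batch-size-1 decoding where every generated token has to stream all model weights from memory once, and it ignores KV cache traffic, compute limits, and multi-card sharding. The 70B-parameter model and the 1 byte per parameter (FP8) are hypothetical choices for illustration, and the bandwidth figures are just the approximate ones quoted above, not measured values.

```python
def decode_tokens_per_sec_upper_bound(bandwidth_tb_s, params_billion, bytes_per_param):
    """Upper bound on single-stream decode speed when memory-bandwidth bound.

    Assumes each token requires reading every weight exactly once.
    """
    model_bytes = params_billion * 1e9 * bytes_per_param
    bandwidth_bytes_per_sec = bandwidth_tb_s * 1e12
    return bandwidth_bytes_per_sec / model_bytes


# Hypothetical model: 70B parameters stored at FP8 (1 byte each).
PARAMS_BILLION = 70
BYTES_PER_PARAM = 1

# Approximate bandwidths as quoted in this thread (assumptions, not specs).
bandwidths_tb_s = {
    "Ascend 950PR (reported)": 1.4,
    "H100/H200 (3+ TB/s)": 3.0,
    "Blackwell (~8 TB/s)": 8.0,
}

for name, bw in bandwidths_tb_s.items():
    bound = decode_tokens_per_sec_upper_bound(bw, PARAMS_BILLION, BYTES_PER_PARAM)
    print(f"{name}: at most ~{bound:.0f} tokens/s per stream")
```

Under those assumptions the ceiling scales linearly with bandwidth, so the gap between 1.4 TB/s and 8 TB/s shows up directly as a gap in per-stream token rate, regardless of how many FLOPs the card advertises. Batching changes the picture, since weights are reused across requests, but the per-card bandwidth ratio still matters.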