466 FP8 TFLOPS at 300W
Tenstorrent has unveiled its next-generation Wormhole processor for AI workloads, which promises to deliver strong performance at a low price. The company currently offers two add-on PCIe cards carrying one or two Wormhole processors, as well as the TT-LoudBox and TT-QuietBox workstations aimed at software developers. The whole of today’s release is aimed at developers rather than at those who will deploy the Wormhole boards for their commercial workloads.
“It’s always rewarding to get more of our products into developer hands. Releasing development systems with our Wormhole™ card helps developers scale up and work on multi-chip AI software,” said Jim Keller, CEO of Tenstorrent. “In addition to this launch, we’re excited that the tape-out and power-on for our second generation, Blackhole, is going very well.”
Each Wormhole processor packs 72 Tensix cores (each featuring five RISC-V cores and supporting various data formats) with 108 MB of SRAM to deliver 262 FP8 TFLOPS at 1 GHz within a 160W thermal design power. A single-chip Wormhole n150 card carries 12 GB of GDDR6 memory with 288 GB/s of bandwidth.
Wormhole processors offer flexible scalability to meet the varying needs of workloads. In a standard workstation setup with four Wormhole n300 cards, the processors can merge to function as a single unit, appearing to the software as one unified, extensive network of Tensix cores. This configuration allows the accelerators to work on the same workload, be divided among four developers, or run up to eight distinct AI models concurrently. A crucial feature of this scalability is that it operates natively, without the need for virtualization. In data center environments, Wormhole processors will scale both inside one machine using PCIe and outside of a single machine using Ethernet.
From a performance standpoint, Tenstorrent’s single-chip Wormhole n150 card (72 Tensix cores at 1 GHz, 108 MB SRAM, 12 GB GDDR6 at 288 GB/s) is capable of 262 FP8 TFLOPS at 160W, whereas the dual-chip Wormhole n300 board (128 Tensix cores at 1 GHz, 192 MB SRAM, an aggregate 24 GB GDDR6 at 576 GB/s) can offer up to 466 FP8 TFLOPS at 300W (according to Tom’s Hardware).
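As a rough sanity check on those figures, the short Python sketch below divides each card's quoted peak throughput by its Tensix core count. The dictionary layout and the function name are illustrative only and not part of any Tenstorrent tooling. The numbers are the peak values quoted above, not measurements.

```python
# Peak figures as quoted in the article (not measured results).
CARDS = {
    "n150": {"chips": 1, "tensix_cores": 72, "fp8_tflops": 262, "tdp_w": 160},
    "n300": {"chips": 2, "tensix_cores": 128, "fp8_tflops": 466, "tdp_w": 300},
}


def per_core_tflops(card):
    """Peak FP8 TFLOPS divided by the number of Tensix cores on the card."""
    return card["fp8_tflops"] / card["tensix_cores"]


if __name__ == "__main__":
    for name, card in CARDS.items():
        cores_per_chip = card["tensix_cores"] // card["chips"]
        print(
            f"{name}: {cores_per_chip} cores per chip, "
            f"{per_core_tflops(card):.2f} FP8 TFLOPS per core"
        )
```

Per-core throughput comes out nearly identical for both cards, which suggests the n300 falls short of twice the n150's total simply because it exposes fewer Tensix cores per chip (128 across two chips rather than 72 on one).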
To put that 466 FP8 TFLOPS at 300W number into context, let’s compare it to what AI market leader Nvidia has to offer at this thermal design power. Nvidia’s A100 does not support FP8, but it does support INT8, and its peak performance is 624 TOPS (1,248 TOPS with sparsity). By contrast, Nvidia’s H100 supports FP8, and its peak performance is a massive 1,670 TFLOPS (3,341 TFLOPS with sparsity) at 300W, roughly 3.6 times the peak of Tenstorrent’s Wormhole n300.
There is a big catch, though. Tenstorrent’s Wormhole n150 is offered for $999, whereas the n300 is available for $1,399. By contrast, one Nvidia H100 card can retail for $30,000, depending on quantities. Of course, we do not know whether four or eight Wormhole processors can indeed deliver the performance of a single H100, though they would do so at 600W or 1200W TDP, respectively.
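The trade-off can be made concrete with a little arithmetic on the prices and peak figures quoted above. The Python sketch below is illustrative only: it assumes peak FP8 throughput scales perfectly linearly across cards, which the article explicitly cannot confirm, and the function names are made up for this example.

```python
import math

# Peak dense FP8 throughput, TDP and price as quoted in the article.
# The H100 price is the article's approximate retail figure.
N300 = {"fp8_tflops": 466, "tdp_w": 300, "price_usd": 1399}
H100 = {"fp8_tflops": 1670, "tdp_w": 300, "price_usd": 30000}


def tflops_per_watt(device):
    return device["fp8_tflops"] / device["tdp_w"]


def tflops_per_dollar(device):
    return device["fp8_tflops"] / device["price_usd"]


def cards_to_match(card, target):
    """Cards needed to reach the target's peak, ASSUMING perfect linear scaling."""
    return math.ceil(target["fp8_tflops"] / card["fp8_tflops"])


if __name__ == "__main__":
    n = cards_to_match(N300, H100)
    print(f"n300: {tflops_per_watt(N300):.2f} TFLOPS/W, "
          f"{tflops_per_dollar(N300):.3f} TFLOPS/$")
    print(f"H100: {tflops_per_watt(H100):.2f} TFLOPS/W, "
          f"{tflops_per_dollar(H100):.3f} TFLOPS/$")
    print(f"{n} n300 cards to match H100 peak: "
          f"{n * N300['tdp_w']} W, ${n * N300['price_usd']:,}")
```

On paper, the H100 is far ahead on performance per watt, while the n300 is far ahead on peak performance per dollar. Matching the H100's peak would take four n300 cards (eight processors) at 1200W, and only if real workloads scaled ideally.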
In addition to cards, Tenstorrent offers developers pre-built workstations with four n300 cards inside: the cheaper Xeon-based TT-LoudBox with active cooling and the premium EPYC-powered TT-QuietBox with liquid cooling.
Sources: Tenstorrent, Tom’s Hardware