Amazon's Trainium2 AI Accelerator Offers 96 GB of HBM, Quadruples Training Performance
Amazon Web Services this week introduced Trainium2, its new accelerator for artificial intelligence (AI) workloads that tangibly increases performance compared to its predecessor, enabling AWS to train foundation models (FMs) and large language models (LLMs) with up to trillions of parameters. In addition, AWS has set itself an ambitious goal of giving its clients access to a massive 65 'AI' ExaFLOPS of performance for their workloads.
The AWS Trainium2 is Amazon's 2nd-generation accelerator designed specifically for training FMs and LLMs. Compared to its predecessor, the original Trainium, it offers four times higher training performance, two times higher performance per watt, and three times as much memory – a total of 96 GB of HBM. The chip, designed by Amazon's Annapurna Labs, is a multi-tile system-in-package featuring two compute tiles, four HBM memory stacks, and two chiplets whose purpose is undisclosed for now.
Amazon notably does not disclose specific performance numbers for Trainium2, but it says that its Trn2 instances scale out to up to 100,000 Trainium2 chips to deliver up to 65 ExaFLOPS of low-precision compute performance for AI workloads. Working backwards, that would put a single Trainium2 accelerator at roughly 650 TFLOPS. 65 EFLOPS is a level set to be achievable only by the highest-performing upcoming AI supercomputers, such as Jupiter. Such scaling should dramatically reduce the training time for a 300-billion-parameter large language model from months to weeks, according to AWS.
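The per-chip figure above is a simple back-of-envelope estimate. As a sketch, assuming AWS's 65 EFLOPS aggregate divides evenly across the full 100,000-chip cluster (ignoring interconnect overhead and any sparsity caveats in the quoted number):

```python
# Back-of-envelope estimate of per-chip throughput implied by AWS's figures.
# Assumes the aggregate divides evenly across all chips, which ignores
# real-world scaling losses.

CLUSTER_EXAFLOPS = 65   # AWS's quoted aggregate low-precision compute
CHIPS = 100_000         # maximum Trainium2 chips per Trn2 deployment

# 1 ExaFLOPS = 1,000,000 TeraFLOPS
per_chip_tflops = CLUSTER_EXAFLOPS * 1_000_000 / CHIPS
print(f"~{per_chip_tflops:.0f} TFLOPS per Trainium2 chip")  # ~650 TFLOPS
```

That roughly 650 TFLOPS result is why the article pegs a single Trainium2 at about 3.4x the original Trainium's quoted 190 TFLOPS of FP16/BF16 throughput, in the same ballpark as the "four times higher training performance" claim.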
Amazon has yet to reveal the full specifications of Trainium2, but we would be surprised if it did not add some features on top of what the original Trainium already supports. As a reminder, that co-processor supports FP32, TF32, BF16, FP16, UINT8, and configurable FP8 data formats, and delivers up to 190 TFLOPS of FP16/BF16 compute performance.
What is perhaps more important than the raw performance numbers of a single AWS Trainium2 accelerator is that Amazon has partners, such as Anthropic, that are ready to deploy it.
"We are working closely with AWS to develop our future foundation models using Trainium chips," said Tom Brown, co-founder of Anthropic. "Trainium2 will help us build and train models at a very large scale, and we expect it to be at least 4x faster than first-generation Trainium chips for some of our key workloads. Our collaboration with AWS will help organizations of all sizes unlock new possibilities as they use Anthropic's state-of-the-art AI systems together with AWS's secure, reliable cloud technology."