The AMD Advancing AI & Instinct MI300 Launch Live Blog (Starts at 10am PT/18:00 UTC)
This morning is a big one for AMD – perhaps the most important of the year. After almost a year and a half of build-up, and even longer of actual development, AMD is launching their next generation GPU/APU/AI accelerator family, the Instinct MI300 series. Based on AMD's new CDNA 3 architecture, and combining it with AMD's proven Zen 4 cores, AMD will be making a full-court press for the high-end GPU and accelerator market with their new product, aiming to lead in both big-metal HPC as well as the burgeoning market for generative AI training and inference.
Taking the stage for AMD's launch event will be AMD CEO Dr. Lisa Su, as well as a number of AMD executives and ecosystem partners, to detail, at last, AMD's latest generation GPU architecture and the many forms it will come in. With both the MI300X accelerator and the MI300A APU, AMD is aiming to cover most of the accelerator market, whether customers just need a powerful GPU or a tightly-coupled GPU/CPU pairing.
The stakes for today's announcement are significant. The market for generative AI is all but hardware constrained at the moment, much to the benefit of (and profits for) AMD's rival NVIDIA. So AMD is hoping to capitalize on this moment to carve off a piece – perhaps a very big piece – of the market for generative AI accelerators. AMD has made breaking into the server space their highest priority over the last half-decade, and now, they believe, is their time to take a big piece of the server GPU market.
12:56PM EST – We're here in San Jose for AMD's final and most important launch event of the year: Advancing AI
12:57PM EST – Today AMD is making the eagerly anticipated launch of their next-generation MI300 series of accelerators
12:58PM EST – Including MI300A, their first chiplet-based server APU, and MI300X, their stab at the most powerful GPU/accelerator possible for the AI market
12:59PM EST – I would say the event is being held in AMD's backyard, but since AMD sold their campus here in the bay area a number of years ago, this is more like NVIDIA's backyard. Which is fitting, given that AMD is looking to grab a piece of the highly profitable generative AI market from NVIDIA
12:59PM EST – We're supposed to start at 10am local time here – so in another minute or so
12:59PM EST – And here we go. Right on time
01:00PM EST – Starting with an opening trailer
01:00PM EST – (And joining me on this morning's live blog is the always-awesome Gavin Bonshor)
01:00PM EST – Advancing AI… together
01:01PM EST – And here's AMD's CEO, Dr. Lisa Su
01:01PM EST – Today "is all about AI"
01:01PM EST – And Lisa is diving right in
01:02PM EST – It's only been a bit over a year since ChatGPT was launched. And it's turned the computing industry on its head rather quickly
01:02PM EST – AMD views AI as the single most transformative technology in the last 50 years
01:02PM EST – And with a fairly quick adoption rate, despite being at the very beginning of the AI era
01:02PM EST – Lisa is listing off some of the use cases for AI
01:03PM EST – And the key to it? Generative AI. Which requires significant investments in infrastructure
01:03PM EST – (Which NVIDIA has captured the lion's share of so far)
01:03PM EST – Earlier in 2023, AMD projected the TAM for the AI market would be $150B by 2027
01:04PM EST – Now they think it'll be $400B+ by 2027
01:04PM EST – A greater than 70% compound annual growth rate
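(The slide gives the 2027 target and the growth rate, but not the 2023 base. As a quick sanity check – assuming the >70% CAGR is meant to run over the four years from 2023 to 2027, which the slide doesn't state – the implied 2023 market size works out to roughly $48B:)

```python
# Back-of-the-envelope check on AMD's AI accelerator market projection.
# Assumption (not on the slide): the >70% CAGR spans the four years 2023-2027.
target_2027 = 400e9   # AMD's revised 2027 projection, in dollars
cagr = 0.70           # lower bound of AMD's ">70%" growth rate
years = 4

implied_2023_base = target_2027 / (1 + cagr) ** years
print(f"Implied 2023 market size: ${implied_2023_base / 1e9:.0f}B")
```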
01:04PM EST – AMD's AI strategy is centered around 3 big strategic priorities
01:05PM EST – A broad hardware portfolio, an open and proven software ecosystem, and partnerships to co-innovate with
01:05PM EST – (AMD has historically struggled with software in particular)
01:05PM EST – Now to products, starting with the cloud
01:06PM EST – Generative AI requires tens of thousands of accelerators at the high end
01:06PM EST – The more compute, the better the model, the faster the answers
01:06PM EST – Launching today: the AMD Instinct MI300X accelerator
01:06PM EST – "Highest performance accelerator in the world for generative AI"
01:07PM EST – CDNA 3 comes with a new compute engine, sparsity support, industry-leading memory bandwidth and capacity, etc
01:07PM EST – 3.4x more perf for BF16, 6.8x INT8 perf, 1.6x memory bandwidth
01:07PM EST – 153B transistors for MI300X
01:08PM EST – A dozen 5nm/6nm chiplets
01:08PM EST – 4 I/O dies in the base layer
01:08PM EST – 256MB AMD Infinity Cache, Infinity Fabric support, etc
01:08PM EST – 8 XCD compute dies stacked on top
01:08PM EST – 304 CDNA 3 compute units
01:08PM EST – Wired to the IODs via TSVs
01:09PM EST – And 8 stacks of HBM3 attached to the IODs, for 192GB of memory and 5.3 TB/second of bandwidth
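(Those two memory figures are consistent with eight uniform 24GB HBM3 stacks – the per-stack numbers below are my arithmetic, not something AMD stated on stage:)

```python
# Sanity-checking MI300X's stated memory specs against its 8 HBM3 stacks.
# Assumption (my arithmetic, not from the slide): uniform 24GB stacks.
stacks = 8
gb_per_stack = 24
print(stacks * gb_per_stack)     # total capacity in GB, matching AMD's 192GB

total_bw_tbps = 5.3              # AMD's stated aggregate bandwidth, TB/s
print(f"{total_bw_tbps * 1000 / stacks:.1f}")  # GB/s per stack
```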
01:09PM EST – And immediately jumping to the H100 comparisons
01:10PM EST – AMD has the advantage in memory capacity and bandwidth thanks to having more HBM stacks. And they think this is going to help carry them to victory over H100
01:10PM EST – AMD finds they have the performance advantage in FlashAttention-2 and Llama 2 70B. At the kernel level, in TFLOPS
01:11PM EST – And how does MI300X scale?
01:11PM EST – Comparing a single 8 accelerator server
01:12PM EST – Bloom 176B (throughput) and Llama 2 70B (latency) inference performance.
01:12PM EST – And now AMD's first guest of many, Microsoft
01:13PM EST – MS CTO, Kevin Scott
01:14PM EST – Lisa is asking Kevin for his thoughts on where the industry is in this AI journey
01:15PM EST – Microsoft and AMD have been building the foundation for several years here
01:16PM EST – And MS will be offering MI300X Azure instances
01:16PM EST – MI300X VMs are available in preview today
01:17PM EST – (So MS apparently already has a significant quantity of the accelerators)
01:17PM EST – And that's MS. Back to Lisa
01:17PM EST – Now talking about the Instinct platform
01:18PM EST – Which is based on an OCP (OAM) hardware design
01:18PM EST – (No fancy name for the platform, unlike HGX)
01:18PM EST – So here's a complete 8-way MI300X board
01:18PM EST – It can be dropped into almost any OCP-compliant design
01:19PM EST – Making it easy to install MI300X
01:19PM EST – And making a point that AMD supports all the same I/O and networking capabilities as the competition (but with better GPUs and memory, of course)
01:20PM EST – Customers are trying to maximize not just space, but capital expenditures and operational expenditures as well
01:20PM EST – On the OpEx side, more memory means being able to run either more models or bigger models
01:21PM EST – Which saves on CapEx by buying fewer hardware units overall
01:21PM EST – And now for the next partner, Oracle. Karan Batta, the SVP of Oracle Cloud Infrastructure
01:22PM EST – Oracle is one of AMD's major cloud computing customers
01:23PM EST – Oracle will be supporting MI300X as part of their bare metal compute offerings
01:23PM EST – And MI300X in a generative AI service that's in the works
01:24PM EST – Now on stage: AMD President Victor Peng to talk about software progress
01:25PM EST – AMD's software stack has traditionally been their achilles heel, despite efforts to improve it. Peng's big project has been to finally get things in order
01:25PM EST – Including building a unified AI software stack
01:25PM EST – Today's focus is on ROCm, AMD's GPU software stack
01:26PM EST – AMD has firmly hitched their wagon to open source, which they consider a big benefit
01:26PM EST – Improving ROCm support for Radeon GPUs continues
01:26PM EST – ROCm 6 shipping later this month
01:27PM EST – It has been optimized for generative AI, for MI300 and other hardware
01:27PM EST – "ROCm 6 delivers a quantum leap in performance and capability"
01:28PM EST – Software perf optimization example with LLMs
01:28PM EST – 2.6x from optimized libraries, 1.4x from HIP Graph, etc
01:28PM EST – This, combined with hardware changes, is how AMD is delivering 8x more GenAI perf on MI300X versus MI250 (with ROCm 5)
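(Those speedups compose multiplicatively, which is how the software side alone accounts for a sizable chunk of the 8x claim. The split between software and hardware below is my inference from the quoted factors, not AMD's:)

```python
# How the quoted ROCm 6 software speedups stack up multiplicatively.
# Assumption (my inference): the factors are independent, so they multiply.
libs = 2.6        # optimized libraries
hip_graph = 1.4   # HIP Graph
software = libs * hip_graph
print(f"Software alone: {software:.2f}x")

# Remaining factor attributable to MI300X hardware for the 8x total claim
print(f"Implied hardware factor: {8 / software:.1f}x")
```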
01:29PM EST – Recapping recent acquisitions as well, such as the Nod.ai compiler
01:30PM EST – And at the ecosystem level, AMD has an increasing number of partners
01:30PM EST – Hugging Face arguably being the most important, with 62K+ models up and running on AMD hardware
01:31PM EST – AMD GPUs will be supported in the OpenAI Triton 3.0 release
01:32PM EST – Now for more guests: Databricks, Essential AI, and Lamini
01:33PM EST – The four of them are having a short chat about the AI world and their experience with AMD
01:34PM EST – Talking about the development of major tools such as vLLM
01:34PM EST – Cost is a huge driver
01:36PM EST – It was very easy to include ROCm in Databricks' stack
01:36PM EST – Meanwhile Essential AI is taking a full stack approach
01:37PM EST – The ease of use of AMD's software was "very nice"
01:38PM EST – And finally, Lamini's CEO, who has a PhD in generative AI
01:39PM EST – Customers get to fully own their models
01:39PM EST – Imbuing LLMs with real knowledge
01:39PM EST – Has had an AMD cloud in production on MI210s/MI250s for over the past year
01:40PM EST – Lamini has reached software parity with CUDA
01:41PM EST – Many of the genAI tools available today are open source
01:41PM EST – Lots of them can run on ROCm today
01:43PM EST – AMD's Instinct products are critical to supporting the future of enterprise software
01:46PM EST – And that's the mini-roundtable
01:47PM EST – Summing up the last 6 months of work on software
01:47PM EST – ROCm 6 shipping soon
01:47PM EST – 62K models running today, and more coming soon
01:48PM EST – And that's a wrap for Victor Peng. Back to Lisa Su
01:49PM EST – And now for another guest spot: Meta
01:49PM EST – Ajit Mathews, Sr. Director of Engineering at Meta AI
01:50PM EST – Meta opened access to the Llama 2 model family in July
01:50PM EST – "An open approach leads to better and safer technology in the long-run"
01:51PM EST – Meta has been working with EPYC CPUs since 2019. And recently deployed Genoa at scale
01:51PM EST – But that partnership is much broader than CPUs
01:52PM EST – Been using Instinct since 2020
01:53PM EST – And Meta is quite excited about MI300
01:53PM EST – Expanding their partnership to include Instinct in Facebook's datacenters
01:53PM EST – MI300X is one of their fastest design-to-deploy projects
01:54PM EST – And Meta is happy with the optimizations done for ROCm
01:55PM EST – (All of these guests are here for a reason: AMD wants to demonstrate that their platform is ready. That customers are using it today and are having success with it)
01:55PM EST – Now another guest: Dell
01:56PM EST – Arthur Lewer, President of Core Business Operations for the Global Infrastructure Solutions Group
01:56PM EST – (Buying NVIDIA is the safe bet; AMD wants to demonstrate that buying AMD isn't an unsafe bet)
01:57PM EST – Customers need a better solution than today's ecosystem
01:58PM EST – Dell is announcing an update to the PowerEdge XE9680 servers. Now offering them with MI300X accelerators
01:58PM EST – Up to 8 accelerators in a box
01:58PM EST – Helping customers consolidate LLM training to fewer boxes
01:59PM EST – Ready to quote and taking orders today
02:01PM EST – And that's Dell
02:02PM EST – And here's another guest: Supermicro (we have pivoted from cloud to enterprise)
02:02PM EST – Charles Liang, Founder, President, and CEO of Supermicro
02:03PM EST – Supermicro is a major AMD server partner
02:05PM EST – What does Supermicro have planned for MI300X?
02:05PM EST – An 8U air cooled system, and a 4U system with liquid cooling
02:05PM EST – Up to 100kW racks of the latter
02:05PM EST – And that's Supermicro
02:06PM EST – And another guest: Lenovo
02:06PM EST – Kirk Skaugen, President of Lenovo's Infrastructure Solutions Group
02:07PM EST – Lenovo believes that genAI will be a hybrid approach
02:07PM EST – And AI will be needed at the edge
02:08PM EST – 70 AI-ready server and infrastructure products
02:09PM EST – Lenovo also has an AI innovators program for key verticals, simplifying things for customers
02:10PM EST – Lenovo thinks inference will be the dominant AI workload. Training only needs to happen once; inference happens all the time
02:11PM EST – Lenovo is bringing MI300X to their ThinkSystem platform
02:11PM EST – And making it available as a service
02:12PM EST – And that's Lenovo
02:13PM EST – And this is still just the tip of the iceberg for the number of partners AMD has lined up for MI300X
02:13PM EST – And now back to AMD with Forrest Norrod to talk about networking
02:14PM EST – The compute required to train the most advanced models has increased by leaps and bounds over the last decade
02:14PM EST – Leading AI clusters are tens-of-thousands of GPUs, and that will only increase
02:14PM EST – So AMD has worked to scale things up on multiple fronts
02:14PM EST – Internally with Infinity Fabric
02:15PM EST – Near-linear scaling performance as you increase the number of GPUs
02:15PM EST – AMD is extending access to Infinity Fabric to innovators and strategic partners across the industry
02:15PM EST – We'll hear more about this initiative next year
02:16PM EST – Meanwhile the back-end network connecting the servers together is just as critical
02:16PM EST – And AMD believes that network needs to be open
02:17PM EST – And AMD is backing Ethernet (versus InfiniBand)
02:17PM EST – And Ethernet is open
02:18PM EST – Now coming to the stage are several networking leaders, including Arista, Broadcom, and Cisco
02:19PM EST – Having a panel discussion on Ethernet
02:21PM EST – What are the advantages of Ethernet for AI?
02:22PM EST – The majority of hyperscalers are using Ethernet or have a strong desire to
02:23PM EST – The NIC is critical. People want choices
02:24PM EST – "We need to continue to innovate"
02:24PM EST – AI networks need to be open standards based. Customers need choices
02:25PM EST – Ultra Ethernet is a critical next step
02:26PM EST – https://www.anandtech.com/show/18965/ultra-ethernet-consortium-to-adapt-ethernet-for-ai-and-hpc-needs
02:28PM EST – UEC is solving an important technical problem of modern RDMA at scale
02:28PM EST – And that's the networking panel
02:28PM EST – Now on to high-performance computing (HPC)
02:29PM EST – Recapping AMD's technology so far, including the latest MI250X
02:29PM EST – MI250X + EPYC had a coherent memory space, but the GPU and CPU were still separated by a somewhat slow link
02:29PM EST – But now MI300A is here with a unified memory system
02:29PM EST – Volume production began earlier this quarter
02:30PM EST – The MI300 architecture, but with 3 Zen 4 CCDs layered on top of some of the IODs
02:31PM EST – 128GB of HBM3 memory, 4 IODs, 6 XCDs, 3 CCDs
02:31PM EST – And truly unified memory, as both the GPU and CPU tiles go through the shared IODs
02:32PM EST – Performance comparisons with H100
02:32PM EST – 1.8x the FP64 and FP32 (vector?) performance
02:33PM EST – 4x performance on OpenFOAM with MI300A versus H100
02:33PM EST – Most of the improvement comes from unified memory, avoiding having to copy memory around before it can be used
02:34PM EST – 2x the perf-per-watt of Grace Hopper (unclear by what metric)
02:35PM EST – MI300A will be in the El Capitan supercomputer. Over 2 EFLOPS of FP64 compute
02:35PM EST – Now rolling a video from HPE and Lawrence Livermore National Laboratory
02:35PM EST – "El Capitan will be the most capable AI machine"
02:36PM EST – El Capitan will be 16x faster than LLNL's current supercomputer
02:37PM EST – And now another guest on stage: HPE
02:37PM EST – Trish Damkroger, SVP and Chief Product Officer
02:38PM EST – Frontier was great. El Capitan will be even better
02:39PM EST – AMD and HPE power many of the most energy efficient supercomputers
02:40PM EST – (Poor Forrest is a bit tongue tied)
02:40PM EST – El Cap will have MI300A nodes with Slingshot fabric
02:41PM EST – The most capable AI systems in the world
02:41PM EST – Supercomputing is the foundation needed to run AI
02:42PM EST – And that's HPE
02:43PM EST – MI300A: a new level of high-performance leadership
02:43PM EST – MI300A systems available soon from partners around the world
02:43PM EST – (So it sounds like MI300A is trailing MI300X by a bit)
02:43PM EST – Now back to Lisa
02:44PM EST – To cap off the day: Advancing AI PCs
02:44PM EST – AMD started including NPUs this year with the Ryzen Mobile 7000 series. The first x86 company to do so
02:44PM EST – Using AMD's XDNA architecture
02:45PM EST – A large computing array that's extremely performant and efficient
02:45PM EST – Shipped millions of NPU-enabled PCs this year
02:46PM EST – Showing off some of the software applications out there that offer AI acceleration
02:46PM EST – Adobe, Windows Studio Effects, etc
02:46PM EST – Announcing Ryzen AI 1.0 software for developers
02:46PM EST – So AMD's software SDK is finally available
02:47PM EST – Deploy trained and quantized models using ONNX
02:47PM EST – Announcing the Ryzen Mobile 8040 series processors
02:47PM EST – Hawk Point
02:47PM EST – This is (still) the Phoenix die
02:48PM EST – With one wrinkle: faster AI performance thanks to a higher clocked NPU
02:48PM EST – AMD's own perf benchmarks show 1.4x over the 7040 series
02:48PM EST – Now time for another guest: Microsoft
02:49PM EST – Pavan Davuluri, CVP for Windows and Devices
02:49PM EST – Talking about the work AMD and MS are doing together for client AI
02:50PM EST – Microsoft's marquee project is Copilot
02:52PM EST – MS wants to be able to load-shift between the cloud and the client. Seamless computing between the two
02:52PM EST – Showing AMD's NPU roadmap
02:53PM EST – Next-gen Strix Point processors in the works. Using a new NPU based on XDNA 2
02:53PM EST – Launching in 2024
02:53PM EST – XDNA 2 designed for "leadership" AI performance
02:53PM EST – AMD has silicon. So does MS
02:54PM EST – More than 3x the genAI perf (versus Hawk Point?)
02:55PM EST – And that's AI on the PC
02:55PM EST – Now recapping today's announcements
02:55PM EST – MI300X, shipping today. MI300A, in volume production
02:55PM EST – The Ryzen Mobile 8040 series, shipping now
02:56PM EST – "Today is an incredibly proud moment for AMD"
02:57PM EST – And that's it for Lisa, and for today's presentation
02:58PM EST – Thanks for joining us, and be sure to check out our expanded coverage of AMD's announcements
02:58PM EST – https://www.anandtech.com/show/21177/amd-unveils-ryzen-8040-mobile-series-apus-hawk-point-with-zen-4-and-ryzen-ai
02:58PM EST – https://www.anandtech.com/show/21178/amd-widens-availability-of-ryzen-ai-software-for-developers-xdna-2-coming-with-strix-point-in-2024