The AMD Advancing AI & Instinct MI300 Launch Live Blog (Starts at 10am PT/18:00 UTC)


This morning is a big one for AMD – perhaps the most important of the year. After nearly a year and a half of build-up, and even longer of actual development, AMD is launching their next generation GPU/APU/AI accelerator family, the Instinct MI300 series. Based on AMD's new CDNA 3 architecture, and combining it with AMD's proven Zen 4 cores, AMD will be making a full-court press for the high-end GPU and accelerator market with their new product, aiming to lead in big-metal HPC as well as the burgeoning market for generative AI training and inference.

Taking the stage for AMD's launch event will be AMD CEO Dr. Lisa Su, as well as a number of AMD executives and ecosystem partners, to detail, at last, AMD's latest generation GPU architecture, and the many forms it will come in. With both the MI300X accelerator and the MI300A APU, AMD is aiming to cover most of the accelerator market, whether customers just need a powerful GPU or a tightly-coupled GPU/CPU pairing.

The stakes for today's announcement are significant. The market for generative AI is all but hardware constrained at the moment, much to the benefit of (and profits for) AMD's rival NVIDIA. So AMD is hoping to capitalize on this moment to carve off a piece – perhaps a very big piece – of the market for generative AI accelerators. AMD has made breaking into the server space their highest priority over the last half-decade, and now, they believe, is their time to take a big piece of the server GPU market.


12:56PM EST – We're here in San Jose for AMD's final and most important launch event of the year: Advancing AI

12:57PM EST – Today AMD is making the eagerly anticipated launch of their next-generation MI300 series of accelerators

12:58PM EST – Including the MI300A, their first chiplet-based server APU, and the MI300X, their stab at the most powerful GPU/accelerator possible for the AI market

12:59PM EST – I would say the event is being held in AMD's backyard, but since AMD sold their campus here in the bay area a few years ago, this is more like NVIDIA's backyard. Which is fitting, given that AMD is looking to grab a piece of the highly profitable generative AI market from NVIDIA

12:59PM EST – We're supposed to start at 10am local time here – so in another minute or so

12:59PM EST – And hey, here we go. Right on time

01:00PM EST – Starting with an opening trailer

01:00PM EST – (And joining me on this morning's live blog is the always-awesome Gavin Bonshor)

01:00PM EST – Advancing AI… together

01:01PM EST – And here's AMD's CEO, Dr. Lisa Su

01:01PM EST – Today "is all about AI"

01:01PM EST – And Lisa is diving right in

01:02PM EST – It's only been a bit over a year since ChatGPT was launched. And it's turned the computing industry on its head rather quickly

01:02PM EST – AMD views AI as the single most transformative technology in the last 50 years

01:02PM EST – And with a fairly quick adoption rate, despite being at the very beginning of the AI era

01:02PM EST – Lisa's listing off some of the use cases for AI

01:03PM EST – And the key to it? Generative AI. Which requires significant investments in infrastructure

01:03PM EST – (Which NVIDIA has captured the lion's share of thus far)

01:03PM EST – In 2023 AMD projected that the AI market would reach $350B by 2027

01:04PM EST – Now they think it'll be $400B+ by 2027

01:04PM EST – A greater than 70% compound annual growth rate
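
(Quick sanity check on that growth figure – a back-of-envelope Python sketch; the ~$45B 2023 baseline is our assumption, not a number from today's presentation)

```python
# Back-of-envelope check of the ">70% CAGR" claim.
# Assumes a ~$45B 2023 baseline (our assumption) growing to $400B by 2027.
base_2023 = 45.0      # $B, assumed starting point
target_2027 = 400.0   # $B, AMD's revised 2027 projection
years = 2027 - 2023

cagr = (target_2027 / base_2023) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")  # ~72.7% -- consistent with "greater than 70%"
```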

01:04PM EST – AMD's AI strategy is centered around 3 big strategic priorities

01:05PM EST – A broad hardware portfolio, an open and proven software ecosystem, and partnerships to co-innovate with

01:05PM EST – (AMD has historically struggled with software in particular)

01:05PM EST – Now on to products, starting with the cloud

01:06PM EST – Generative AI requires tens of thousands of accelerators at the high-end

01:06PM EST – The more compute, the better the model, the faster the answers

01:06PM EST – Launching today: the AMD Instinct MI300X accelerator

01:06PM EST – "Highest performance accelerator in the world for generative AI"

01:07PM EST – CDNA 3 comes with a new compute engine, sparsity support, industry-leading memory bandwidth and capacity, etc

01:07PM EST – 3.4x more perf for BF16, 6.8x INT8 perf, 1.6x memory bandwidth

01:07PM EST – 153B transistors for MI300X

01:08PM EST – A dozen 5nm/6nm chiplets

01:08PM EST – 4 I/O dies in the base layer

01:08PM EST – 256MB of AMD Infinity Cache, Infinity Fabric support, etc

01:08PM EST – 8 XCD compute dies stacked on top

01:08PM EST – 304 CDNA 3 compute units

01:08PM EST – Wired to the IODs via TSVs

01:09PM EST – And eight stacks of HBM3 attached to the IODs, for 192GB of memory and 5.3 TB/second of bandwidth
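
(Those totals line up with eight 24GB HBM3 stacks – a quick sketch; the per-stack figures are our inference from the quoted totals, not numbers AMD gave)

```python
# Sanity-checking MI300X's quoted memory totals against an assumed
# configuration of eight 24GB (12-Hi) HBM3 stacks -- our inference.
stacks = 8
gb_per_stack = 24          # assumed 12-Hi HBM3 stacks
total_bw_tb_s = 5.3        # quoted aggregate bandwidth

print(f"Capacity: {stacks * gb_per_stack} GB")  # 192 GB, as quoted
print(f"Per-stack bandwidth: {total_bw_tb_s / stacks * 1000:.1f} GB/s")  # ~662 GB/s
```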

01:09PM EST – And immediately jumping into the H100 comparisons

01:10PM EST – AMD has the advantage in memory capacity and bandwidth thanks to having more HBM stacks. And they think that's going to help carry them to victory over H100

01:10PM EST – AMD finds they have the performance advantage in FlashAttention-2 and Llama 2 70B. At the kernel level, in TFLOPS

01:11PM EST – And how does MI300X scale?

01:11PM EST – Comparing a single 8 accelerator server

01:12PM EST – Bloom 176B (throughput) and Llama 2 70B (latency) inference performance.

01:12PM EST – And now AMD's first guest of many: Microsoft

01:13PM EST – MS CTO, Kevin Scott

01:14PM EST – Lisa is asking Kevin for his thoughts on where the industry is in this AI journey

01:15PM EST – Microsoft and AMD have been building the foundation here for several years

01:16PM EST – And MS will be offering MI300X Azure instances

01:16PM EST – MI300X VMs are available in preview today

01:17PM EST – (So MS apparently already has a significant quantity of the accelerators)

01:17PM EST – And that's MS. Back to Lisa

01:17PM EST – Now talking about the Instinct platform

01:18PM EST – Which is based on an OCP (OAM) hardware design

01:18PM EST – (No fancy name for the platform, unlike HGX)

01:18PM EST – So here's a complete 8-way MI300X board

01:18PM EST – It can be dropped into almost any OCP-compliant design

01:19PM EST – Making it easy to install MI300X

01:19PM EST – And making a point that AMD supports all the same I/O and networking capabilities as the competition (but with better GPUs and memory, of course)

01:20PM EST – Customers are trying to maximize not just space, but capital expenditures and operational expenditures as well

01:20PM EST – On the OpEx side, more memory means being able to run either more models or bigger models

01:21PM EST – Which saves on CapEx by buying fewer pieces of hardware overall
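
(To make that memory math concrete – a minimal sketch under assumed figures: weights only, FP16, ignoring KV cache and activation overhead)

```python
# Rough illustration of why HBM capacity drives accelerator count.
# Weights-only estimate at FP16 (2 bytes/parameter); real deployments
# also need KV cache and activation memory, so treat this as a floor.
import math

params_b = 70              # e.g. a Llama 2 70B-class model
weights_gb = params_b * 2  # ~140 GB of weights at FP16/BF16

for name, capacity_gb in [("192GB-class GPU", 192), ("80GB-class GPU", 80)]:
    needed = math.ceil(weights_gb / capacity_gb)
    print(f"{name}: at least {needed} GPU(s) just to hold the weights")
```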

01:21PM EST – And now for the next partner: Oracle. Karan Batta, the SVP of Oracle Cloud Infrastructure

01:22PM EST – Oracle is one of AMD's major cloud computing customers

01:23PM EST – Oracle will be supporting MI300X as part of their bare metal compute offerings

01:23PM EST – And MI300X will be in a generative AI service that's in the works

01:24PM EST – Now on stage: AMD President Victor Peng, to talk about software progress

01:25PM EST – AMD's software stack has traditionally been their Achilles' heel, despite efforts to improve it. Peng's big project has been to finally get things in order

01:25PM EST – Including building a unified AI software stack

01:25PM EST – Today's focus is on ROCm, AMD's GPU software stack

01:26PM EST – AMD has firmly hitched their horse to open source, which they consider a huge benefit

01:26PM EST – Improving ROCm support for Radeon GPUs continues

01:26PM EST – ROCm 6 is shipping later this month

01:27PM EST – It has been optimized for generative AI, for MI300 and other hardware

01:27PM EST – "ROCm 6 delivers a quantum leap in performance and capability"

01:28PM EST – Software perf optimization example with LLMs

01:28PM EST – 2.6x from optimized libraries, 1.4x from HIP Graph, etc

01:28PM EST – This, combined with the hardware changes, is how AMD is delivering 8x more GenAI perf on MI300X versus MI250 (with ROCm 5)
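
(Multiplying out the quoted software factors gives a rough feel for the hardware/software split – treating the speedups as independent multipliers is our simplification, not AMD's breakdown)

```python
# Decomposing the 8x MI300X-vs-MI250 claim, assuming the quoted software
# speedups compound multiplicatively (our simplification).
libs = 2.6       # optimized libraries (quoted)
hip_graph = 1.4  # HIP Graph (quoted)
total = 8.0      # MI300X + ROCm 6 vs MI250 + ROCm 5 (quoted)

software = libs * hip_graph  # ~3.6x from software alone
hardware = total / software  # ~2.2x implied from the silicon
print(f"Software: ~{software:.1f}x, implied hardware: ~{hardware:.1f}x")
```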

01:29PM EST – Recapping recent acquisitions as well, such as the Nod.ai compiler

01:30PM EST – And at the ecosystem level, AMD has an increasing number of partners

01:30PM EST – Hugging Face arguably being the most important, with 62K+ models up and running on AMD hardware

01:31PM EST – AMD GPUs will be supported in the OpenAI Triton 3.0 release

01:32PM EST – Now for more guests: Databricks, Essential AI, and Lamini

01:33PM EST – The four of them are having a short chat about the AI world and their experience with AMD

01:34PM EST – Talking about the development of major tools such as vLLM

01:34PM EST – Cost is a huge driver

01:36PM EST – It was very easy to include ROCm in Databricks' stack

01:36PM EST – Meanwhile Essential AI is taking a full stack approach

01:37PM EST – The ease of use of AMD's software was "very nice"

01:38PM EST – And finally, Lamini's CEO, who has a PhD in generative AI

01:39PM EST – Customers get to fully own their models

01:39PM EST – Imbuing LLMs with real knowledge

01:39PM EST – Has had an AMD cloud in production on MI210s/MI250s for over the past year

01:40PM EST – Lamini has reached software parity with CUDA

01:41PM EST – Many of the genAI tools available today are open source

01:41PM EST – A lot of them can run on ROCm today

01:43PM EST – AMD's Instinct products are critical to supporting the future of enterprise software

01:46PM EST – And that's the mini-roundtable

01:47PM EST – Summing up the last 6 months of work on software

01:47PM EST – ROCm 6 shipping soon

01:47PM EST – 62K models running today, and more coming soon

01:48PM EST – And that's a wrap for Victor Peng. Back to Lisa Su

01:49PM EST – And now for another guest spot: Meta

01:49PM EST – Ajit Mathews, Sr. Director of Engineering at Meta AI

01:50PM EST – Meta opened access to the Llama 2 model family in July

01:50PM EST – "An open approach leads to better and safer technology in the long-run"

01:51PM EST – Meta has been working with EPYC CPUs since 2019. And recently deployed Genoa at scale

01:51PM EST – But that partnership is much broader than CPUs

01:52PM EST – Been using Instinct since 2020

01:53PM EST – And Meta is quite excited about MI300

01:53PM EST – Expanding their partnership to include Instinct in Facebook's datacenters

01:53PM EST – MI300X is one of their fastest design-to-deploy projects

01:54PM EST – And Meta is happy with the optimizations done for ROCm

01:55PM EST – (All of these guests are here for a reason: AMD wants to demonstrate that their platform is ready. That customers are using it today and are having success with it)

01:55PM EST – Now another guest: Dell

01:56PM EST – Arthur Lewer, President of Core Business Operations for the Global Infrastructure Solutions Group

01:56PM EST – (Buying NVIDIA is the safe bet; AMD wants to demonstrate that buying AMD isn't an unsafe bet)

01:57PM EST – Customers need a better solution than today's ecosystem

01:58PM EST – Dell is announcing an update to the PowerEdge XE9680 servers. Now offering them with MI300X accelerators

01:58PM EST – Up to 8 accelerators in a box

01:58PM EST – Helping customers consolidate LLM training into fewer boxes

01:59PM EST – Ready to quote and taking orders today

02:01PM EST – And that's Dell

02:02PM EST – And here's another guest: Supermicro (we've now pivoted from cloud to enterprise)

02:02PM EST – Charles Liang, Founder, President, and CEO of Supermicro

02:03PM EST – Supermicro is a major AMD server partner

02:05PM EST – What does Supermicro have planned for MI300X?

02:05PM EST – An 8U air cooled system, and a 4U system with liquid cooling

02:05PM EST – Up to 100kW racks of the latter

02:05PM EST – And that's Supermicro

02:06PM EST – And another guest: Lenovo

02:06PM EST – Kirk Skaugen, President of Lenovo's Infrastructure Solutions Group

02:07PM EST – Lenovo believes that genAI will be a hybrid approach

02:07PM EST – And AI will be needed at the edge

02:08PM EST – 70 AI-ready server and infrastructure products

02:09PM EST – Lenovo also has an AI innovators program for key verticals, simplifying things for customers

02:10PM EST – Lenovo thinks inference will be the dominant AI workload. Training only needs to happen once; inference happens all the time

02:11PM EST – Lenovo is bringing MI300X to their ThinkSystem platform

02:11PM EST – And making it available as a service

02:12PM EST – And that's Lenovo

02:13PM EST – And that's still just the tip of the iceberg for the number of partners AMD has lined up for MI300X

02:13PM EST – And now back to AMD with Forrest Norrod to talk about networking

02:14PM EST – The compute required to train the most advanced models has increased by leaps and bounds over the last decade

02:14PM EST – Leading AI clusters are tens of thousands of GPUs, and that will only increase

02:14PM EST – So AMD has worked to scale things up on multiple fronts

02:14PM EST – Internally, with Infinity Fabric

02:15PM EST – Near-linear scaling in performance as you increase the number of GPUs

02:15PM EST – AMD is extending access to Infinity Fabric to innovators and strategic partners across the industry

02:15PM EST – We'll hear more about this initiative next year

02:16PM EST – Meanwhile the back-end network connecting the servers together is just as critical

02:16PM EST – And AMD believes that network needs to be open

02:17PM EST – And AMD is backing Ethernet (versus InfiniBand)

02:17PM EST – And Ethernet is open

02:18PM EST – Now coming to the stage are several networking leaders, including Arista, Broadcom, and Cisco

02:19PM EST – Having a panel discussion on Ethernet

02:21PM EST – What are the advantages of Ethernet for AI?

02:22PM EST – The majority of hyperscalers are using Ethernet or have a strong desire to

02:23PM EST – The NIC is critical. People want choices

02:24PM EST – "We need to continue to innovate"

02:24PM EST – AI networks need to be open standards-based. Customers need choices

02:25PM EST – Ultra Ethernet is a critical next step

02:26PM EST – https://www.anandtech.com/show/18965/ultra-ethernet-consortium-to-adapt-ethernet-for-ai-and-hpc-needs

02:28PM EST – The UEC is solving a major technical problem of modern RDMA at scale

02:28PM EST – And that's the networking panel

02:28PM EST – Now on to high-performance computing (HPC)

02:29PM EST – Recapping AMD's technology so far, including the latest MI250X

02:29PM EST – MI250X + EPYC had a coherent memory space, but still had the GPU and CPU separated by a somewhat slow link

02:29PM EST – But now MI300A is here with a unified memory system

02:29PM EST – Volume production began earlier this quarter

02:30PM EST – It's the MI300 architecture, but with 3 Zen 4 CCDs layered on top of some of the IODs

02:31PM EST – 128GB of HBM3 memory, 4 IODs, 6 XCDs, 3 CCDs

02:31PM EST – And truly unified memory, as both the GPU and CPU tiles go through the shared IODs

02:32PM EST – Performance comparisons with H100

02:32PM EST – 1.8x the FP64 and FP32 (vector?) performance

02:33PM EST – 4x the performance on OpenFOAM with MI300A versus H100

02:33PM EST – Most of the improvement comes from unified memory, avoiding having to copy data around before it can be used
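
(For a sense of what unified memory removes – a generic PyTorch sketch timing the explicit host-to-device copy a discrete GPU needs before compute can start; this is illustrative, not an MI300A measurement)

```python
# Timing the explicit host-to-device copy that discrete GPUs require;
# MI300A's unified memory lets the CPU and GPU share one pool instead.
# Generic sketch: ROCm builds of PyTorch also expose torch.cuda.
import time
import torch

x_host = torch.randn(1 << 26)  # ~256 MB of data on the host

if torch.cuda.is_available():
    t0 = time.perf_counter()
    x_dev = x_host.to("cuda")  # the copy unified memory avoids
    torch.cuda.synchronize()
    t1 = time.perf_counter()
    y = (x_dev * 2).sum()      # the actual compute
    torch.cuda.synchronize()
    t2 = time.perf_counter()
    print(f"copy: {t1 - t0:.4f}s, compute: {t2 - t1:.4f}s")
```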

02:34PM EST – 2x the perf-per-watt of Grace Hopper (unclear by what metric)

02:35PM EST – MI300A will be in the El Capitan supercomputer. Over 2 EFLOPS of FP64 compute

02:35PM EST – Now rolling a video from HPE and the Lawrence Livermore National Laboratory

02:35PM EST – "El Capitan will be the most capable AI machine"

02:36PM EST – El Capitan will be 16x faster than LLNL's current supercomputer

02:37PM EST – And now another guest on stage: HPE

02:37PM EST – Trish Damkroger, SVP and Chief Product Officer

02:38PM EST – Frontier was great. El Capitan will be even better

02:39PM EST – AMD and HPE power many of the most energy efficient supercomputers

02:40PM EST – (Poor Forrest is a bit tongue tied)

02:40PM EST – ElCap will have MI300A nodes with Slingshot fabric

02:41PM EST – One of the most capable AI systems in the world

02:41PM EST – Supercomputing is the foundation needed to run AI

02:42PM EST – And that's HPE

02:43PM EST – MI300A: a new level of high-performance leadership

02:43PM EST – MI300A systems available soon from partners around the world

02:43PM EST – (So it sounds like MI300A is trailing MI300X by a bit)

02:43PM EST – Now back to Lisa

02:44PM EST – To cap off the day: Advancing AI PCs

02:44PM EST – AMD started including NPUs this year with the Ryzen Mobile 7000 series. The first x86 company to do so

02:44PM EST – Using AMD's XDNA architecture

02:45PM EST – A large compute array that is extremely performant and efficient

02:45PM EST – AMD has shipped millions of NPU-enabled PCs this year

02:46PM EST – Showing off some of the software applications out there that offer AI acceleration

02:46PM EST – Adobe, Windows Studio Effects, etc

02:46PM EST – Announcing Ryzen AI 1.0 software for developers

02:46PM EST – So AMD's software SDK is finally available

02:47PM EST – Deploy trained and quantized models using ONNX
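
(For a sense of what that looks like in practice – a minimal, hypothetical sketch; Ryzen AI builds on ONNX Runtime's Vitis AI execution provider, but the model file and input handling here are placeholders, and real deployments also pass provider-specific configuration)

```python
# Hypothetical sketch: running a quantized ONNX model on the NPU via
# ONNX Runtime's Vitis AI execution provider, with CPU fallback.
# "model_int8.onnx" is a placeholder for your own exported model.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model_int8.onnx",
    providers=["VitisAIExecutionProvider", "CPUExecutionProvider"],
)

inp = session.get_inputs()[0]
# Assume any dynamic dimensions are batch-like and set them to 1
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.random.rand(*shape).astype(np.float32)

outputs = session.run(None, {inp.name: dummy})
print("Output shape:", outputs[0].shape)
```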

02:47PM EST – Announcing the Ryzen Mobile 8040 series processors

02:47PM EST – Hawk Point

02:47PM EST – This is (still) the Phoenix die

02:48PM EST – With one wrinkle: faster AI performance thanks to a higher clocked NPU

02:48PM EST – AMD's own perf benchmarks show 1.4x over the 7040 series
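
(For context on where that uplift comes from – a trivial sketch; the 10 and 16 TOPS NPU ratings are from AMD's spec sheets rather than today's presentation, so treat them as our assumption)

```python
# Comparing rated NPU throughput across generations (figures from AMD
# spec sheets, not this presentation -- treat as assumptions).
phoenix_npu_tops = 10     # Ryzen 7040 "Phoenix" XDNA NPU
hawk_point_npu_tops = 16  # Ryzen 8040 "Hawk Point", higher-clocked NPU

print(f"Rated uplift: {hawk_point_npu_tops / phoenix_npu_tops:.1f}x")
# 1.6x peak TOPS; AMD's 1.4x figure is measured application performance
```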

02:48PM EST – Now time for another guest: Microsoft

02:49PM EST – Pavan Davuluri, CVP for Windows and Devices

02:49PM EST – Talking about the work AMD and MS are doing together for client AI

02:50PM EST – Microsoft's marquee project is Copilot

02:52PM EST – MS wants to be able to load-shift between the cloud and the client. Seamless computing between the two

02:52PM EST – Showing AMD's NPU roadmap

02:53PM EST – Next-gen Strix Point processors are in the works. Using a new NPU based on XDNA 2

02:53PM EST – Launching in 2024

02:53PM EST – XDNA 2 is designed for "leadership" AI performance

02:53PM EST – AMD has silicon. So does MS

02:54PM EST – More than 3x the genAI perf (versus Hawk Point?)

02:55PM EST – And that's AI on the PC

02:55PM EST – Now recapping today's announcements

02:55PM EST – MI300X, shipping today. MI300A, in volume production

02:55PM EST – Ryzen Mobile 8040 series, shipping now

02:56PM EST – "Today is an incredibly proud moment for AMD"

02:57PM EST – And that's it for Lisa, and for today's presentation

02:58PM EST – Thanks for joining us, and be sure to check out our expanded coverage of AMD's announcements

02:58PM EST – https://www.anandtech.com/show/21177/amd-unveils-ryzen-8040-mobile-series-apus-hawk-point-with-zen-4-and-ryzen-ai

02:58PM EST – https://www.anandtech.com/show/21178/amd-widens-availability-of-ryzen-ai-software-for-developers-xdna-2-coming-with-strix-point-in-2024
