AV1 Video Encode at 1W Per Stream
AMD this morning is launching a new dedicated media accelerator and video encode card for data centers, the Alveo MA35D, which is also the first such card to be released under the AMD brand. The card is a successor to an earlier line of Xilinx cards that AMD picked up as part of its Xilinx acquisition, vaulting the company into the market for dedicated video encode cards. The latest generation Alveo media accelerator card, in turn, promises significant performance benefits over its predecessor, quadrupling the maximum number of simultaneous video streams while also adding AV1 and 8K resolution encode support.
Like its predecessor, the Alveo U30, the MA35D is a pure video encode card designed for data centers. That’s to say that its ASICs are designed solely for real-time/interactive video encoding, with Xilinx looking to do one thing and do it very well. This design strategy is in notable contrast to competing products from Intel (GPU Flex Series) and NVIDIA (T4 & L4), which are GPU-based products and leverage the flexibility of their GPUs along with their built-in video encoders in order to function as video encode cards, gaming cards, or in other roles assigned to them. The MA35D, by comparison, is a relatively straightforward product that’s designed to do video encoding more optimally and efficiently by focusing on just that.
As this is a product line inherited by AMD as part of their Xilinx acquisition and developed by the resulting Adaptive and Embedded Computing Group, the Alveo MA35D is both new for AMD and familiar at the same time. Earlier data center video encode products released by AMD were based on their GPU lineup, so while this is the latest such video encode card for the ex-Xilinx team, this is the first time AMD proper has released a dedicated video encode card of this kind, making it a prime example of the kind of new market opportunities AMD was looking for in acquiring Xilinx.
The target market for the card is, like its predecessor, the data center market. AMD’s principal clients are live streaming services and other interactive video services (think Twitch, cloud gaming, video conferencing, etc.), all of whom need to encode large numbers of video streams in real time in a server environment. So like AMD’s EPYC processors, this is very much a server part aimed at a select group of businesses.
Diving into the Alveo MA35D hardware itself, AMD is touting a significant generational upgrade over its predecessor. Whereas the Alveo U30 was an H.264 and H.265 encode card that could encode up to 8 1080p streams, the Alveo MA35D expands this considerably to 32 1080p streams. Meanwhile, support for the latest-generation AV1 codec has been added, joining the existing H.264 and H.265 options, and the maximum stream resolution has been increased from 4K to 8K, itself another quadrupling.
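To make the two "quadrupling" claims concrete, the short sketch below works through the arithmetic. It assumes that "4K" and "8K" refer to the common UHD frame sizes of 3840x2160 and 7680x4320; AMD's exact supported resolutions are not stated here, so treat those dimensions as an assumption.

```python
# Frame sizes assumed for the marketing labels used in the article
# (UHD definitions; AMD's exact supported modes are not specified).
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
    "8K": (7680, 4320),
}

# Maximum simultaneous 1080p streams quoted for each card.
U30_MAX_STREAMS = 8
MA35D_MAX_STREAMS = 32


def pixels(label: str) -> int:
    """Number of pixels in one frame at the given resolution label."""
    width, height = RESOLUTIONS[label]
    return width * height


stream_ratio = MA35D_MAX_STREAMS / U30_MAX_STREAMS
pixel_ratio = pixels("8K") / pixels("4K")

print(f"Stream count increase: {stream_ratio:.0f}x")
print(f"Pixels per frame, 8K vs 4K: {pixel_ratio:.0f}x")
```

Doubling both the width and the height of the frame is what turns the step from 4K to 8K into a fourfold increase in pixels per frame.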
At the heart of the card is AMD’s unnamed video encode ASIC, which they’re calling their Video Processing Unit (VPU). The MA35D contains two VPUs, each with its own 8GB pool of LPDDR5 memory and a PCIe 5.0 x4 connection back to the host processor. The VPU is being built on a 5nm process, but surprisingly AMD isn’t disclosing the fab being used, which makes us think it’s a Samsung 5nm process (ed: at this point, if someone is using TSMC, they’re usually bragging about it).
Under the hood, each VPU contains 4 video encode blocks, augmented with the various accessory blocks needed to make it a fully functional chip. Two of the encode blocks are full-featured, supporting H.264, H.265, and AV1, while the other two blocks are solely for AV1, underscoring the additional computational complexity of the new codec. Other blocks on the VPU include video decoder blocks for transcoding, memory controllers, management controllers, a bitrate scaler, composition engines, and a 22 TOPS throughput AI processor to further improve the card’s video encode quality.
As for the video encode blocks themselves, AMD’s engineers were quick to note that, despite the overlapping similarities between this part and AMD’s GPU efforts, the VPU’s video encode blocks are a unique design, and not pulled from AMD’s GPU video encode blocks. While I wouldn’t be surprised to see AMD eventually merge encoder IP across the product lines, for the current generation product the Alveo MA35D’s VPUs were in development before the Xilinx acquisition ever closed, so the former Xilinx team finished what they started. This means that the VPUs are bound to come with their own set of quirks, but there’s also a certain degree of pride from the Alveo team that they’ve built the better video encoder.
The VPU also marks the transition of the Alveo video encoder family to a fully ASIC-based product. Xilinx, of course, is best known for its programmable FPGAs, and while the previous Alveo U30’s processors used hard logic for their video encode blocks, that was combined with an FPGA fabric network. So that product was still a mix of ASIC and FPGA design. The MA35D’s VPUs, on the other hand, are true ASICs with no FPGA elements, allowing the company to fully exploit the power efficiency benefits of using fixed function logic for a dedicated product.
And energy efficiency is the other major gain over the older U30 card, and what AMD considers a significant edge over their competition as well. The formal TDP of the card is 50 Watts, but in practice AMD is finding that the typical power consumption of the card is closer to about 35 Watts, or a hair over 1W per stream for 1080p60. This is a 66% reduction in per-stream energy consumption versus the U30, which was at a bit over 3W for a single 1080p stream.
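The per-stream figures can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check: it assumes the 35W typical draw is spread across the card's maximum of 32 1080p streams, and it rounds the U30's "a bit over 3W" down to 3.0W, so the reduction it prints is a lower bound on AMD's quoted figure.

```python
# Figures quoted in the article; the U30 value is approximate.
MA35D_TYPICAL_WATTS = 35.0
MA35D_TDP_WATTS = 50.0
MA35D_MAX_1080P_STREAMS = 32
U30_WATTS_PER_STREAM = 3.0  # assumption: "a bit over 3W", rounded down


def watts_per_stream(card_watts: float, streams: int) -> float:
    """Average card power attributed to each concurrent stream."""
    if streams <= 0:
        raise ValueError("streams must be positive")
    return card_watts / streams


def reduction_percent(old: float, new: float) -> float:
    """Percentage drop when going from `old` to `new`."""
    return (old - new) / old * 100.0


typical = watts_per_stream(MA35D_TYPICAL_WATTS, MA35D_MAX_1080P_STREAMS)
at_tdp = watts_per_stream(MA35D_TDP_WATTS, MA35D_MAX_1080P_STREAMS)
saving = reduction_percent(U30_WATTS_PER_STREAM, typical)

print(f"Typical: {typical:.2f} W per stream")
print(f"At TDP:  {at_tdp:.2f} W per stream")
print(f"Reduction vs U30 at 3.0 W: {saving:.1f}%")
```

With the U30 at exactly 3.0W the reduction comes out a little under 64%; a U30 figure closer to 3.2W per stream would line up with the 66% that AMD quotes.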
Meanwhile, new to the Alveo MA35D and its VPU is an AI acceleration block. Unlike GPU-based products, this isn’t for quasi-related AI tasks like image recognition; rather, AMD is using the AI accelerator to feed additional data into their video encoder to further improve their encoding quality. Rated for 22 TOPS of performance, the AI processor exists to evaluate streams on a frame-by-frame basis, and then use that analysis to adjust the encode parameters used by the rest of the chip.
Using both region-of-interest encoding and artifact detection, the AI processor essentially allows the MA35D to get away with lower bitrates than a more naïve video encode strategy. Region-of-interest encoding allows for portions of a video to receive higher quality encoding (text, faces, etc.), while artifact detection can catch when the encoder is being fed blocky or otherwise degraded images, which are actually harder to encode, and remove or correct those artifacts before a frame is sent off for encoding.
All told, AMD is making some fairly aggressive image quality claims with the Alveo MA35D; H.264 and H.265 image quality should be similar to the x264 Medium and x265 Medium presets respectively, while the card’s AV1 encoding quality should be similar to x265 Slow. These comparisons are based on VMAF scores, and what settings it takes to achieve similar scores. Or to frame things on a bitrate basis, using AV1 AMD says the MA35D can deliver the same image quality as the Alveo U30 in H.264 mode at 55% of the bitrate (a 1.8x efficiency improvement).
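The 55% and 1.8x figures are two views of the same ratio, as the small sketch below shows. The 6000 kbps starting point is purely a hypothetical example of a 1080p H.264 stream, not a number from AMD.

```python
# Ratio quoted by AMD: MA35D in AV1 mode vs. U30 in H.264 mode,
# at the same image quality.
BITRATE_RATIO = 0.55


def efficiency_gain(ratio: float) -> float:
    """Compression efficiency multiple implied by a bitrate ratio."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    return 1.0 / ratio


def equivalent_bitrate_kbps(h264_kbps: float, ratio: float = BITRATE_RATIO) -> float:
    """Bitrate needed after applying the quoted ratio."""
    return h264_kbps * ratio


gain = efficiency_gain(BITRATE_RATIO)

# Hypothetical example stream; 6000 kbps is an illustrative value only.
example_h264_kbps = 6000.0
example_av1_kbps = equivalent_bitrate_kbps(example_h264_kbps)

print(f"Efficiency gain: {gain:.2f}x")
print(f"{example_h264_kbps:.0f} kbps H.264 -> about {example_av1_kbps:.0f} kbps AV1")
```

For a service pushing thousands of concurrent streams, that ratio translates more or less directly into outbound bandwidth, which is why AMD frames the comparison this way.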
Finally, although secondary to the video encode capabilities of the MA35D, it’s interesting to note that the management processors in the VPU have shifted from Arm to RISC-V. Whereas the U30’s processors used quad core Cortex-A53 cores, the MA35D VPU uses a pair of quad core RISC-V cores, though AMD doesn’t specify whose. The RISC-V architecture has been quietly pushing out Arm for management controllers such as these, and this is another example of that transition in action.
With two VPUs, the entire Alveo MA35D card is still small enough that it comes in a single slot half-height half-length form factor. Meanwhile a 50W TDP means that the card is entirely powered via the PCIe slot, attached via a PCIe x8 connector (which gets bifurcated down to x4 for each VPU). And, as is typical for data center accelerator cards, the MA35D is passively cooled.
According to AMD, the Alveo is sampling to partners now. The company expects to begin production shipments in the third quarter of the year, with a suggested retail price of $1595.