From innovation to impact: How AWS and NVIDIA enable real-world generative AI success

As we gather for NVIDIA GTC, organizations of all sizes are at a pivotal moment in their AI journey. The question is no longer whether to adopt generative AI, but how to move from promising pilots to production-ready systems that deliver real business value. The organizations that figure this out first will have a significant competitive advantage, and we’re already seeing compelling examples of what’s possible.
Consider Hippocratic AI’s work to develop AI-powered clinical assistants that support healthcare teams as doctors, nurses, and other clinicians face unprecedented levels of burnout. During a recent hurricane in Florida, their system called 100,000 patients in a day to check on medications and provide preventative healthcare guidance, the kind of coordinated outreach that would be nearly impossible to achieve manually. They aren’t just building another chatbot; they are reimagining healthcare delivery at scale.
Production-ready AI like this requires more than cutting-edge models or powerful GPUs. In my decade working with customers’ data journeys, I’ve seen that an organization’s most valuable asset is its domain-specific data and expertise. Now, leading our data and AI go-to-market, I hear customers consistently emphasize what they need to transform their domain advantage into AI success: infrastructure and services they can trust, with performance, cost-efficiency, security, and flexibility, all delivered at scale. When the stakes are high, success requires not just cutting-edge technology but the ability to operationalize it at scale, a challenge AWS has consistently solved for customers. As the world’s most comprehensive and broadly adopted cloud, AWS paired with NVIDIA’s pioneering accelerated computing platform for generative AI amplifies this capability. It’s inspiring to see how, together, we’re enabling customers across industries to confidently move AI into production.
In this post, I’ll share some of these customers’ remarkable journeys, offering practical insights for any organization looking to harness the power of generative AI.
Transforming content creation with generative AI
Content creation represents one of the most visible and immediate applications of generative AI today. Adobe, a pioneer that has shaped creative workflows for over four decades, has moved with remarkable speed to integrate generative AI across its flagship products, helping millions of creators work in entirely new ways.
Adobe’s approach to generative AI infrastructure exemplifies what their VP of Generative AI, Alexandru Costin, calls an “AI superhighway”: a sophisticated technical foundation that enables rapid iteration of AI models and their seamless integration into Adobe’s creative applications. The success of the Firefly family of generative AI models, integrated across flagship products like Photoshop, demonstrates the power of this approach. For their AI training and inference workloads, Adobe uses NVIDIA GPU-accelerated Amazon Elastic Compute Cloud (Amazon EC2) P5en (NVIDIA H200 GPUs), P5 (NVIDIA H100 GPUs), P4de (NVIDIA A100 GPUs), and G5 (NVIDIA A10G GPUs) instances. They also use NVIDIA software such as NVIDIA TensorRT and NVIDIA Triton Inference Server for faster, scalable inference. Adobe needed maximum flexibility to build their AI infrastructure, and AWS provided the complete stack of services required, from Amazon FSx for Lustre for high-performance storage, to Amazon Elastic Kubernetes Service (Amazon EKS) for container orchestration, to Elastic Fabric Adapter (EFA) for high-throughput networking, to create a production environment that could reliably serve millions of creative professionals.
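To make the serving layer concrete, here is a minimal sketch of how an application might query a model hosted on NVIDIA Triton Inference Server using Triton’s official Python HTTP client. The server address, tensor names, and model name are illustrative assumptions, not details of Adobe’s deployment:

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server; the URL assumes a server listening locally
# on Triton's default HTTP port.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request tensor. Shape, dtype, and tensor names must match the
# model's configuration; "INPUT__0" and "OUTPUT__0" are placeholder names.
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("INPUT__0", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

response = client.infer(
    model_name="example_image_model",  # hypothetical model name
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("OUTPUT__0")],
)
print(response.as_numpy("OUTPUT__0").shape)
```

Because Triton exposes one standard inference protocol across backends, the same client code works whether the model behind the endpoint is a TensorRT engine or a PyTorch model.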
Key takeaway
If you’re building and managing your own AI pipelines, Adobe’s success highlights a critical insight: although GPU-accelerated compute often gets the spotlight in AI infrastructure discussions, the NVIDIA software stack, together with the foundation of orchestration, storage, and networking services, is equally important for production-ready AI. The results speak for themselves: Adobe achieved a 20-fold scale-up in model training while maintaining the enterprise-grade performance and reliability their customers expect.
Pioneering new AI applications from the ground up
Throughout my career, I’ve been particularly energized by startups that take on audacious challenges, the ones that aren’t just building incremental improvements but are fundamentally reimagining how things work. Perplexity exemplifies this spirit. They’ve taken on a technology most of us now take for granted: search. It’s the kind of ambitious mission that excites me, not just because of its bold vision, but because of the incredible technical challenges it presents. When you’re processing 340 million queries monthly and serving over 1,500 organizations, transforming search isn’t just about having great ideas; it’s about building robust, scalable systems that can deliver consistent performance in production.
Perplexity’s innovative approach earned them membership in both AWS Activate and NVIDIA Inception, flagship programs designed to accelerate startup innovation and success. These programs provided the resources, technical guidance, and support needed to build at scale. Perplexity was one of the early adopters of Amazon SageMaker HyperPod, and continues to use its distributed training capabilities to accelerate model training time by up to 40%. They use a highly optimized inference stack built with NVIDIA TensorRT-LLM and NVIDIA Triton Inference Server to serve both their search application and pplx-api, their public API service that gives developers access to their proprietary models. The results speak for themselves: their inference stack achieves up to 3.1 times lower latency compared to other platforms. Both their training and inference workloads run on NVIDIA GPU-accelerated EC2 P5 instances, delivering the performance and reliability needed to operate at scale. To give users even more flexibility, Perplexity complements its own models with services such as Amazon Bedrock, and provides access to additional state-of-the-art models in their API. Amazon Bedrock offers the ease of use and reliability that are crucial for their team; as they note, it allows them to effectively maintain the reliability and latency their product demands.
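For teams weighing a similar hybrid approach, the managed side of the equation can be as simple as a single API call. Below is a minimal sketch, using boto3’s Amazon Bedrock Converse API, of the kind of managed-model invocation that complements a self-hosted TensorRT-LLM stack; the region, model ID, and parameters are illustrative choices, not Perplexity’s configuration:

```python
import boto3

# Bedrock runtime client; the region and model ID are illustrative.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# The Converse API offers one request shape across Bedrock-hosted models,
# which keeps it easy to swap models behind the same application code.
response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{
        "role": "user",
        "content": [{"text": "In two sentences, why combine self-hosted "
                             "and managed model serving?"}],
    }],
    inferenceConfig={"maxTokens": 256, "temperature": 0.2},
)
print(response["output"]["message"]["content"][0]["text"])
```

Because the request shape stays the same across Bedrock models, swapping the `modelId` is often the only change needed to try a different model.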
What I find particularly compelling about Perplexity’s journey is their commitment to technical excellence, exemplified by their work optimizing GPU memory transfer over EFA networking. The team achieved 97.1% of the theoretical maximum bandwidth of 3,200 Gbps and open sourced their innovations, enabling other organizations to benefit from their learnings.
For those interested in the technical details, I encourage you to read their fascinating post Journey to 3200 Gbps: High-Performance GPU Memory Transfer on AWS Sagemaker Hyperpod.
Key takeaway
For organizations with complex AI workloads and specific performance requirements, Perplexity’s approach offers a valuable lesson. Sometimes the path to production-ready AI isn’t about choosing between self-hosted infrastructure and managed services; it’s about strategically combining both. This hybrid strategy can deliver both exceptional performance (evidenced by Perplexity’s 3.1 times lower latency) and the flexibility to evolve.
Transforming enterprise workflows with AI
Enterprise workflows are the backbone of business operations, and they’re a crucial proving ground for AI’s ability to deliver immediate business value. ServiceNow, which describes itself as the AI platform for business transformation, is rapidly integrating AI to reimagine core business processes at scale.
ServiceNow’s innovative AI solutions showcase their vision for enterprise-specific AI optimization. As Srinivas Sunkara, ServiceNow’s Vice President, explains, their approach focuses on deep AI integration with technology workflows, core business processes, and CRM systems, areas where traditional large language models (LLMs) often lack domain-specific knowledge. To train generative AI models at enterprise scale, ServiceNow uses NVIDIA DGX Cloud on AWS. Their architecture combines high-performance FSx for Lustre storage with NVIDIA GPU clusters for training, while NVIDIA Triton Inference Server handles production deployment. This robust technology platform allows ServiceNow to focus on domain-specific AI development and customer value rather than infrastructure management.
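On the training side, jobs at this scale are typically expressed with standard distributed-training primitives rather than anything exotic. As an illustrative sketch only, not ServiceNow’s code, here is a minimal NCCL-backed PyTorch data-parallel skeleton of the kind that runs on multi-GPU clusters, with the model, data, and storage path as placeholders:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK/WORLD_SIZE/LOCAL_RANK; NCCL handles GPU collectives.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # stand-in model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # A real job would stream training data from a shared high-throughput
    # file system, such as an FSx for Lustre mount; random tensors stand in.
    for _ in range(10):
        batch = torch.randn(32, 1024, device=local_rank)
        loss = model(batch).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()  # gradients are averaged across all GPUs here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with `torchrun --nproc_per_node=<gpus>` on each node, the same script scales from a single GPU to a full cluster.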
Key takeaway
ServiceNow offers an important lesson about enterprise AI adoption: while foundation models (FMs) provide powerful general capabilities, the greatest business value often comes from optimizing models for specific enterprise use cases and workflows. In many cases, it’s precisely this deliberate specialization that transforms AI from an interesting technology into a true business accelerator.
Scaling AI across enterprise applications
Cisco’s Webex team’s journey with generative AI exemplifies how large organizations can methodically transform their applications while maintaining enterprise standards for reliability and efficiency. With a comprehensive suite of telecommunications applications serving customers globally, they needed an approach that would let them incorporate LLMs across their portfolio, from AI assistants to speech recognition, without compromising performance or increasing operational complexity.
The Webex team’s key insight was to separate their models from their applications. Previously, they had embedded AI models into the container images for applications running on Amazon EKS, but as their models grew in sophistication and size, this approach became increasingly inefficient. By migrating their LLMs to Amazon SageMaker AI and using NVIDIA Triton Inference Server, they created a clean architectural break between their relatively lean applications and the underlying models, which require more substantial compute resources. This separation lets applications and models scale independently, significantly reducing development cycle time and increasing resource utilization. The team has deployed dozens of models on SageMaker AI endpoints, using Triton Inference Server’s model concurrency capabilities to scale globally across AWS data centers.
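Concretely, the separation means an application container holds no model weights at all; it simply invokes a SageMaker AI endpoint over the network. Here is a minimal sketch of that calling pattern with boto3, where the endpoint name and payload schema are hypothetical stand-ins rather than Cisco’s actual values:

```python
import json
import boto3

# With models split out of the application containers, the app calls a
# SageMaker AI endpoint over the network instead of loading weights itself.
runtime = boto3.client("sagemaker-runtime")

payload = {"inputs": "Summarize the action items from this meeting."}
response = runtime.invoke_endpoint(
    EndpointName="webex-llm-endpoint",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
result = json.loads(response["Body"].read())
print(result)
```

Because the endpoint is an independent resource, it can be scaled or updated without touching the application deployment, which is what makes the independent scaling described above possible.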
The results validate Cisco’s methodical approach to AI transformation. By separating applications from models, their development teams can now fix bugs, run tests, and add features to applications much faster, without having to hold large models in workstation memory. The architecture also enables significant cost optimization: applications remain available during off-peak hours for reliability, while model endpoints can scale down when not needed, all without impacting application performance. Looking ahead, the team is evaluating Amazon Bedrock to further improve their price-performance, demonstrating how thoughtful architecture decisions create a foundation for continuous optimization.
Key takeaway
For enterprises with large application portfolios looking to integrate AI at scale, Cisco’s methodical approach offers an important lesson: separating LLMs from applications creates a cleaner architectural boundary that improves both development velocity and cost optimization. By treating models and applications as independent components, Cisco significantly improved development cycle time while reducing costs through more efficient resource utilization.
Building mission-critical AI for healthcare
Earlier, we highlighted how Hippocratic AI reached 100,000 patients during a crisis. Behind this achievement lies a story of rigorous engineering for safety and reliability, which is essential in healthcare, where the stakes are extraordinarily high.
Hippocratic AI’s approach to this challenge is both innovative and rigorous. They’ve developed what they call a “constellation architecture”: a sophisticated system of over 20 specialized models working in concert, each focused on specific safety aspects like prescription adherence, lab analysis, and over-the-counter medication guidance. This distributed approach to safety means they have to train multiple models, which requires managing significant computational resources. That’s why they use SageMaker HyperPod for their training infrastructure, with Amazon FSx and Amazon Simple Storage Service (Amazon S3) providing high-speed storage access to NVIDIA GPUs, while Grafana and Prometheus supply the comprehensive monitoring needed to maintain optimal GPU utilization. They build on NVIDIA’s low-latency inference stack, are enhancing conversational AI capabilities with NVIDIA Riva models for speech recognition and text-to-speech, and are using NVIDIA NIM microservices to deploy these models. Given the sensitive nature of healthcare data and HIPAA compliance requirements, they’ve implemented a sophisticated multi-account, multi-cluster strategy on AWS, running production inference workloads that touch patient data on completely separate accounts and clusters from their development and training environments. This careful attention to both security and performance lets them handle thousands of patient interactions while maintaining precise control over clinical safety and accuracy.
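The orchestration pattern behind a constellation can be pictured with a small, entirely hypothetical sketch: a draft reply is released only if every specialized safety checker approves it. The specialist names and check logic below are invented for illustration and bear no relation to Hippocratic AI’s actual implementation:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class SafetyResult:
    ok: bool
    reason: str = ""

# Each specialist would wrap its own fine-tuned model in a real system;
# here they are stubbed out as simple callables (purely hypothetical).
SPECIALISTS: Dict[str, Callable[[str], SafetyResult]] = {
    "prescription_adherence": lambda reply: SafetyResult(
        ok="double the dose" not in reply.lower(),
        reason="possible unsafe dosing advice",
    ),
    "otc_medication": lambda reply: SafetyResult(ok=True),
    "lab_analysis": lambda reply: SafetyResult(ok=True),
}

def respond(draft_reply: str) -> str:
    # Run every specialist; any single failure blocks the draft and
    # escalates the conversation instead of answering.
    for name, check in SPECIALISTS.items():
        verdict = check(draft_reply)
        if not verdict.ok:
            return f"[escalated to a human clinician: flagged by {name}]"
    return draft_reply

print(respond("Please take your medication exactly as prescribed."))
```

The design choice worth noting is that safety is enforced by narrow, independently trained checkers with veto power, rather than by a single general-purpose model policing itself.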
The impact of Hippocratic AI’s work extends far beyond technical achievements. Their AI-powered clinical assistants address critical healthcare workforce burnout by handling burdensome administrative tasks, from pre-operative preparation to post-discharge follow-ups. For example, during weather emergencies, their system can rapidly assess heat risks and coordinate transport for vulnerable patients, the kind of comprehensive care that would be too burdensome and resource-intensive to coordinate manually at scale.
Key takeaway
For organizations building AI solutions for complex, regulated, and high-stakes environments, Hippocratic AI’s constellation architecture reinforces what we’ve consistently emphasized: there’s rarely a one-size-fits-all model for every use case. Just as Amazon Bedrock offers a choice of models to meet diverse needs, Hippocratic AI’s approach of combining over 20 specialized models, each focused on a specific safety aspect, demonstrates how a thoughtfully designed ensemble can achieve both precision and scale.
Conclusion
As the technology partners enabling these and countless other customer innovations, AWS and NVIDIA have a long-standing collaboration that continues to evolve to meet the demands of the generative AI era. Our partnership, which began over 14 years ago with the world’s first GPU cloud instance, has grown to offer the industry’s widest range of NVIDIA accelerated computing solutions and software services for optimizing AI deployments. Through initiatives like Project Ceiba, one of the world’s fastest AI supercomputers, hosted exclusively on AWS using NVIDIA DGX Cloud for NVIDIA’s own research and development, we continue to push the boundaries of what’s possible.
As all the examples we’ve covered illustrate, it isn’t just about the technology we build together; it’s about how organizations of all sizes are using these capabilities to transform their industries and create new possibilities. These stories ultimately reveal something more fundamental: when we make powerful AI capabilities accessible and reliable, people find remarkable ways to use them to solve meaningful problems. That’s the true promise of our partnership with NVIDIA: enabling innovators to create positive change at scale. I’m excited to keep inventing and partnering with NVIDIA, and I can’t wait to see what our mutual customers do next.
Resources
Check out the following resources to learn more about our partnership with NVIDIA and generative AI on AWS:
About the Author
Rahul Pathak is Vice President, Data and AI Go-to-Market at AWS, where he leads the worldwide go-to-market and specialist teams helping customers create differentiated value with AWS AI capabilities such as Amazon Bedrock, Amazon Q, Amazon SageMaker, and Amazon EC2, and data services such as Amazon S3, AWS Glue, and Amazon Redshift. Rahul believes that generative AI will transform virtually every customer experience, and that data is a key differentiator for customers as they build AI applications. Prior to his current role, he was Vice President, Relational Database Engines, where he led Amazon Aurora, Redshift, and DSQL. During his 13+ years at AWS, Rahul has focused on launching, building, and growing managed database and analytics services, all aimed at making it easy for customers to get value from their data. Rahul has over twenty years of experience in technology and has co-founded two companies, one focused on analytics and the other on IP geolocation. He holds a degree in Computer Science from MIT and an Executive MBA from the University of Washington.