University of California, Los Angeles delivers an immersive theater experience with AWS generative AI services


This post was co-written with Andrew Browning, Anthony Doolan, Jerome Ronquillo, Jeff Burke, Chiheb Boussema, and Naisha Agarwal from UCLA.

The University of California, Los Angeles (UCLA) is home to 16 Nobel Laureates and has been ranked the #1 public university in the United States for 8 consecutive years. The Office of Advanced Research Computing (OARC) at UCLA is the technology expansion partner to the research enterprise, providing both intellectual and technical expertise to turn research into reality. The UCLA Center for Research in Engineering, Media and Performance (REMAP) approached OARC to build a set of AI microservices to support an immersive production of the musical, Xanadu.

REMAP’s production of Xanadu, in collaboration with the UCLA Department of Theater’s Ray Bolger Musical Theater program, was designed to be an immersive, participatory performance during which the audience collaboratively created media by using mobile phone gestures to draw images on 13 x 9 foot LED screens, called shrines, provided by 4Wall Entertainment and positionally tracked using Mo-Sys StarTrackers. Their drawings were then run through the AWS microservices for inference, with the resulting media re-projected back to the shrines as AI-generated 2D images and 3D meshes in the show’s virtual environment (in Unreal Engine on hardware by Boxx). OARC successfully designed and implemented a solution for 7 performances, as well as the many playtests and rehearsals leading up to them. The performances ran between May 15 and May 23, 2025 with about 500 total audience members, up to 65 at a time co-creating media during the performance.

In this post, we will walk through the performance constraints and design decisions made by OARC and REMAP, and how AWS serverless infrastructure, AWS managed services, and generative AI services supported the rapid design and deployment of our solution. We will also describe our use of Amazon SageMaker AI and how it can be used reliably in immersive live experiences. We will outline the models used and describe how they contributed to the audience co-created media. We will also review the mechanisms we used to control cost over the duration of both rehearsals and performances. Finally, we will present lessons learned and improvements we plan to make for phase 2 of this project.

OARC’s solution was designed to enable near real-time (NRT) inferencing during a live performance and included the following high-level requirements:

  • The microservices had a strict minimum concurrency requirement of 80 mobile phone users for each performance (accommodating 65 audience members plus 12 performers)
  • The mean round-trip time (MRTT) from mobile phone sketches to media presentation had to be under 2 minutes so media would be ready as the performance unfolded and the audience experience stayed optimal
  • The AWS GPU resources had to be fault tolerant and highly available during rehearsals and performances; graceful degradation was not an option
  • A human-in-the-loop dashboard was required to provide manual control over the infrastructure resources if intervention became necessary
  • The architecture had to be flexible enough to handle show-to-show changes as developers found new ways to solve issues

With the above constraints in mind, we designed the system with a serverless-first architecture approach for most of the workload. We deployed Hugging Face models on Amazon SageMaker AI and used available models in Amazon Bedrock, creating a comprehensive inference pipeline that drew on the flexibility and strengths of both services. Amazon Bedrock provided simplified, managed access to foundation models such as Anthropic Claude, Amazon Nova, and Stable Diffusion, while Amazon SageMaker AI provided full machine learning lifecycle control for open source models from Hugging Face.

The following architecture diagram shows a high-level view of the interactions between the mobile phone sketch creation and OARC’s AWS microservice.

End-to-end AWS architecture diagram showing integration between on-premises workstations, Lambda functions, queues, and AI services

The OARC microservice application design used a serverless-first approach, providing the foundation for an event-driven architecture. User sketches were passed to the microservice through a low-latency Firebase orchestration layer, and the work was coordinated through a series of processing steps that transformed user sketches into 2D images and 3D meshes. Several on-premises macOS workstations, on the left of the diagram, were responsible for initiating workflows, watching for job completions, human-in-the-loop review, and sending finished assets back to the performance media servers.

Inbound audience sketches and metadata messages were sent to Amazon SQS from the on-premises macOS workstations, where they were sorted into sub-queues by an AWS Lambda helper function. Each queue was responsible for starting a pipeline based on the type of inference processing that the user sketch required (for example, 2D image or 3D mesh). The sorting mechanism let the application precisely control its processing rate, so busy pipelines didn’t block new messages in other pipelines that had resources open.
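The helper function described above can be sketched as a small routing Lambda. The queue URLs and the `pipeline` message field below are illustrative assumptions; the production’s actual message schema and queue names were not published.

```python
import json

# Hypothetical sub-queue URLs -- the real queue names were not published.
PIPELINE_QUEUES = {
    "2D-image": "https://sqs.us-west-2.amazonaws.com/123456789012/xanadu-2d-image",
    "3D-mesh": "https://sqs.us-west-2.amazonaws.com/123456789012/xanadu-3d-mesh",
}

def route_for(message: dict) -> str:
    """Pick the sub-queue URL for a sketch message based on its pipeline type."""
    pipeline = message.get("pipeline")
    if pipeline not in PIPELINE_QUEUES:
        raise ValueError(f"unknown pipeline type: {pipeline}")
    return PIPELINE_QUEUES[pipeline]

def handler(event, context):
    """Lambda entry point: fan inbound SQS records out to per-pipeline sub-queues."""
    import boto3  # provided by the Lambda runtime
    sqs = boto3.client("sqs")
    for record in event["Records"]:
        body = json.loads(record["body"])
        sqs.send_message(QueueUrl=route_for(body), MessageBody=record["body"])
```

Keeping the routing decision in a single pure function (`route_for`) makes it trivial to add a new pipeline type without touching the delivery logic.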

End-to-end AWS Generative AI workflow diagram demonstrating Lambda orchestration with storage, messaging, and AI model integration

A second, more complex Lambda function listened for messages from the sorted sub-queues and provided the logic to prepare user sketches for inferencing. This function handled validation, error/success messaging, concurrency management, and orchestration of the pre-processing, inference, and post-processing steps. The design took a modular approach, allowing developers to rapidly integrate new features while keeping merge conflicts to a minimum. Since there was a human in the loop, we didn’t perform automated post-processing on the images; we could safely trust that issues would be caught before they were sent to the shrines. In the future, we plan to validate assets returned by models in SageMaker AI endpoints using guardrails in Amazon Bedrock and other object detection techniques, in addition to human-in-the-loop review.

Our processing steps required large Python dependencies, including PyTorch. Growing to 5 GB in size, these dependencies were too large to fit in Lambda layers. We used Amazon EFS to host the dependencies in a separate volume mounted to the Lambda function at run time. The size of the dependencies increased the time it took the service to start, but after initial instantiation, subsequent message processing was performant. The increased startup latency was a good candidate for the Lambda cold start and latency improvement recommendations; however, we didn’t implement them because doing so would have required adjustments to our development process late in the project.
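The EFS pattern above hinges on one small piece of bootstrap code: before any heavy import, the function must put the EFS-hosted package directory on `sys.path`. A minimal sketch follows; the mount path and `EFS_PACKAGES` environment variable are assumptions, not the production’s actual configuration.

```python
import os
import sys

# Assumed mount path -- Lambda exposes EFS access points under /mnt.
EFS_PACKAGES = os.environ.get("EFS_PACKAGES", "/mnt/efs/python-packages")

def ensure_efs_packages(path: str = EFS_PACKAGES) -> None:
    """Prepend the EFS-hosted site-packages directory to sys.path so that
    multi-gigabyte dependencies such as PyTorch resolve at import time
    without having to fit inside a Lambda layer."""
    if path not in sys.path:
        sys.path.insert(0, path)

# Module scope runs once per cold start, before any heavy imports below it,
# which is why only the first invocation pays the startup cost.
ensure_efs_packages()
```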

Inference requests were handled by 24 SageMaker AI endpoints, with 8 endpoints responsible for each of the three pipelines. We used the Amazon EC2 G6 instance family to host the models, using 8 g6.12xlarge and 16 g6.4xlarge instances. Each pipeline contained a customized workflow specific to the type of request needed for the production. Each SageMaker AI endpoint used both internally loaded models and large LLMs hosted on Amazon Bedrock to complete each request (the full workflow is detailed in the AI workflow section that follows). Average processing times, measured from Amazon SageMaker AI job initiation to the return of generated assets to AWS Lambda, ranged from 40-60 seconds on the g6.4xlarge instances and 20-30 seconds on the g6.12xlarge instances.
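From the orchestrating Lambda’s point of view, each of these requests is a single `invoke_endpoint` call against the SageMaker runtime. The endpoint name and JSON payload shape below are illustrative assumptions, since the production’s request schema was not published.

```python
import json

# Illustrative endpoint name and payload schema, not the production's actual values.
ENDPOINT_NAME = "xanadu-2d-image-endpoint"

def build_request(sketch_s3_uri: str, prompt: str) -> dict:
    """Assemble a minimal JSON inference payload for one audience sketch."""
    return {"sketch_uri": sketch_s3_uri, "prompt": prompt}

def invoke(sketch_s3_uri: str, prompt: str) -> dict:
    """Send one sketch to a SageMaker AI real-time endpoint and parse the reply."""
    import boto3
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(build_request(sketch_s3_uri, prompt)),
    )
    return json.loads(response["Body"].read())
```

Because `invoke_endpoint` is synchronous, the Lambda’s timeout has to cover the 20-60 second processing window measured above.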

After inferencing, the Lambda function sent the message to an Amazon SNS topic responsible for sending success emails, publishing to Amazon SQS, and updating an Amazon DynamoDB table for future analytics. The on-premises macOS workstations polled the final queue to retrieve new assets as they finished.

The following image illustrates the models used by both Amazon SageMaker AI and Amazon Bedrock in our solution. Models on Amazon SageMaker AI include: DeepSeek VLM, SDXL, Stable Diffusion 3.5, SPAR3D, ControlNet for OpenPose, Yamix-8, ControlNet Tile, ControlNet for Canny edges, CSGO, IP-Adapter, InstantID, and the antelopev2 model from InsightFace. Models used through Amazon Bedrock include: Nova Canvas, Stable Diffusion 3.5, and Claude 3.5 Sonnet.

Comprehensive listing of available AI models in SageMaker and Bedrock, featuring specialized vision, diffusion, and generation capabilities

The solution used AWS for 3 distinct inference cycles, referred to as modules. Each module comprises a tailored AI workflow, using a subset of small and large AI models, to generate 2D images and 3D mesh objects for presentation. Each module begins with an audience prompt, in which participants are asked to draw a sketch for a specific task, such as creating a background, rendering a 2D representation of a 3D object, or placing muses in custom poses and clothing. The AI workflow processes these images according to the requirements of each module.

Each module began by generating textual representations of the user’s sketch and any accompanying predesigned reference images. To accomplish this, we used either a DeepSeek VLM loaded onto an Amazon SageMaker AI endpoint or Anthropic’s Claude 3.5 Sonnet model through Amazon Bedrock. The predesigned images included various theatrical poses, designer clothing, and useful assets intended to guide model outputs. Next, these descriptions, user sketches, and supplemental assets were provided as inputs to a local diffusion model paired with a ControlNet or similar framework to generate the desired image. In two of the modules, lower-resolution images were generated to reduce inference time. These lower-quality images were then passed into either Nova Canvas in Amazon Bedrock or Stable Diffusion 3.5, depending on the module, to rapidly generate higher-quality images. For example, with Nova Canvas, we used the IMAGE_VARIATION task type to generate a 2048 x 512-pixel image from the low-resolution background sketches created by the DeepSeek VLM. This approach offloaded part of the inference workload, enabling us to run smaller Amazon SageMaker AI instance types without sacrificing quality or speed.
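The Nova Canvas IMAGE_VARIATION step described above can be sketched as a Bedrock `invoke_model` call. The request body structure follows the published Nova Canvas API; the `similarityStrength` value and prompt are assumptions for illustration, not the production’s tuned settings.

```python
import base64
import json

def nova_canvas_variation_body(image_bytes: bytes, prompt: str) -> str:
    """Build a Nova Canvas IMAGE_VARIATION request body that upscales a
    low-resolution background into a 2048 x 512 image."""
    return json.dumps({
        "taskType": "IMAGE_VARIATION",
        "imageVariationParams": {
            "images": [base64.b64encode(image_bytes).decode("utf-8")],
            "text": prompt,
            "similarityStrength": 0.7,  # assumed value; tune per module
        },
        "imageGenerationConfig": {
            "numberOfImages": 1,
            "width": 2048,
            "height": 512,
        },
    })

def generate_variation(image_bytes: bytes, prompt: str) -> bytes:
    """Invoke Nova Canvas through Amazon Bedrock and decode the first image."""
    import boto3
    bedrock = boto3.client("bedrock-runtime")
    resp = bedrock.invoke_model(
        modelId="amazon.nova-canvas-v1:0",
        body=nova_canvas_variation_body(image_bytes, prompt),
    )
    payload = json.loads(resp["body"].read())
    return base64.b64decode(payload["images"][0])
```

Note that 2048 x 512 respects Nova Canvas’s 4:1 maximum aspect ratio, which is what makes the wide shrine format workable.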

The workflow then proceeded with the final processing routines specific to each output type. For background images, a cast member was overlaid at a varying location near the bottom edge of the image. Custom poses were converted into texture objects, and object sketches were transformed into 3D mesh objects via the image-to-3D model. Finally, Amazon SageMaker AI stored the image assets in an Amazon S3 bucket, where the main AWS Lambda function could retrieve them.

The following image is an example of assets used and produced by one of the modules. The user sketch is on the left, the actor photo on top, the reference background image on the bottom, and the AI-generated image on the right.

Creative demonstration of photographic composition combining studio portraits with Monument Valley sunset scene

Deployment of code to the Lambda function was handled by AWS CodeBuild. The job was responsible for listening for pull request merges on GitHub, updating the Python dependencies in EFS, and deploying the updates to the main Lambda function. This code deployment strategy supported consistent, reliable updates across our development, staging, and production environments and obviated the need for manual code deployments, reducing the risk they entail.

SageMaker AI endpoints were managed by a custom web interface that allowed administrators to deploy “known-good” endpoint configurations, enabling quick infrastructure deployments, rapid redeploys, and simple shutdowns. The dashboard also surfaced metrics on jobs running in Amazon SQS and Amazon CloudWatch Logs so that the team could purge messages from the pipeline.
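The “known-good” redeploy described above maps onto SageMaker’s split between endpoint configurations and endpoints: a vetted configuration can be looked up and re-created on demand. The registry below is a hypothetical stand-in (the dashboard might equally read from DynamoDB), and the names are illustrative.

```python
# Hypothetical registry of vetted configurations the dashboard reads from;
# names and versions are illustrative, not the production's actual values.
KNOWN_GOOD = {
    "xanadu-2d-image": "xanadu-2d-image-config-v7",
    "xanadu-3d-mesh": "xanadu-3d-mesh-config-v4",
}

def config_for(endpoint_name: str) -> str:
    """Look up the vetted endpoint configuration for an endpoint."""
    try:
        return KNOWN_GOOD[endpoint_name]
    except KeyError:
        raise ValueError(f"no known-good config recorded for {endpoint_name}")

def redeploy(endpoint_name: str) -> None:
    """Re-create the endpoint from its known-good configuration, mirroring
    the dashboard's one-click redeploy."""
    import boto3
    sm = boto3.client("sagemaker")
    sm.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_for(endpoint_name),
    )
```

Because the instance type and model live in the configuration rather than the endpoint, redeploys stay reproducible across rehearsals.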

After working through the performances, and with the benefit of hindsight, we have some recommendations and considerations that would be helpful for future iterations. We recommend using AWS CloudFormation or a similar tool to reduce manual deployments and updates of the services used in the application. Many developers follow a development, staging, production pipeline to make changes and improvements, so automating service configuration will reduce errors compared to a manual deployment.

By using a modular, serverless, event-driven approach, we created a reliable and easy-to-maintain cloud architecture. With AWS managed services, developers and administrators can focus on system design rather than system maintenance. Overall, we found that AWS managed services performed exceptionally well and provided a way to develop complex technological architectures supporting real-time image inferencing in a high-stakes setting.

The nature of this project created a unique use case. We needed a way to handle a sudden influx of inference requests arriving all at once. This surge of requests only lasted 15 minutes, so we needed a solution that was both reliable and ephemeral. We reviewed both Amazon EC2 and Amazon SageMaker AI as our main options for deploying 20+ instances on demand. To decide on the best system, we evaluated the following: on-demand request reliability, maintenance burden, complexity, deployment, and load balancing. Amazon EC2 is more than capable of handling these requirements; however, obtaining the required on-demand instances was challenging, and maintaining that many hosts created an excessive maintenance burden. Amazon SageMaker AI met all our criteria, with easy configuration, simple and reliable deployment, and an integrated load balancing service. Ultimately, we opted to host most of our models on SageMaker AI, with Amazon Bedrock providing managed serverless access to models such as Nova Canvas, Stable Diffusion 3.5, and Claude 3.5 Sonnet. Amazon EKS is another option that may have met our requirements; it is great at quick deployments and seamlessly scalable, but we felt that Amazon SageMaker AI was the right choice for this project because it was fast to configure.

While SageMaker AI proved reliable for real-time inference during live performances, it also represented the largest share of our project costs, roughly 40% of total cloud spend. During rehearsals and development, we observed that idle or unused SageMaker AI endpoints could be a major source of cost escalation. To mitigate this, we implemented a nightly automated shutdown process using Amazon EventBridge Scheduler and AWS Lambda. This simple automation step stopped resources from being left running unintentionally, helping us maintain cost predictability without sacrificing performance. We are also looking at other cost reduction strategies for phase 2.
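The nightly shutdown described above is a small scheduled Lambda that lists in-service endpoints and deletes the project’s own. The `xanadu-` naming prefix is an assumed convention used here so the sweep cannot touch unrelated endpoints.

```python
PROJECT_PREFIX = "xanadu-"  # assumed naming convention for the show's endpoints

def should_delete(endpoint_name: str) -> bool:
    """Only tear down endpoints that belong to this project."""
    return endpoint_name.startswith(PROJECT_PREFIX)

def handler(event, context):
    """Nightly EventBridge Scheduler target: delete every in-service SageMaker
    endpoint carrying the project prefix so nothing idles overnight."""
    import boto3
    sm = boto3.client("sagemaker")
    deleted = []
    paginator = sm.get_paginator("list_endpoints")
    for page in paginator.paginate(StatusEquals="InService"):
        for ep in page["Endpoints"]:
            if should_delete(ep["EndpointName"]):
                sm.delete_endpoint(EndpointName=ep["EndpointName"])
                deleted.append(ep["EndpointName"])
    return {"deleted": deleted}
```

Deleting an endpoint stops billing immediately, while the endpoint configuration survives, which is what makes the next morning’s “known-good” redeploy a one-step operation.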

By making a conscious design choice to use AWS generative AI services and AWS managed services for REMAP’s immersive production of the musical Xanadu, we were able to demonstrate that it is possible to support new and dynamic forms of entertainment with AWS.

We showed that serverless event-driven architecture was a fast and low-cost method for building out such services, and we showed how Amazon Bedrock and Amazon SageMaker AI can work together to draw on the entire array of available generative AI models. We described our message pipeline and the processing that went on within it. We discussed the generative AI models used, along with their function and implementation. Finally, we have shown the potential for continued development of immersive musical theatre with this system.

Xanadu. Book by Douglas Carter Beane. Music & Lyrics by Jeff Lynne & John Farrar. Directed by Mira Winick & Corey Wright.


About the authors

Andrew Browning is the Research Data and Web Platforms Manager for the Office of Advanced Research Computing at the University of California, Los Angeles (UCLA). He is passionate about the use of AI in the fields of Advanced Manufacturing, Medical and Dental Self-Care, and Immersive Performance. He is also passionate about creating re-usable PaaS applications to address common problems in these fields.

Anthony Doolan is an Application Programmer and AV Specialist at Research Data and Web Platforms | Infrastructure Support Services at OARC, UCLA. He is a Full Stack Web Developer and AV Specialist for UCLA’s Office of Advanced Research Computing. He develops and maintains full stack web applications, both on premises and cloud-based, and provides audiovisual systems integration and programming expertise.

Jerome Ronquillo is a Web Developer & Cloud Architect at Research Data and Web Platforms at OARC, UCLA. He specializes in designing and implementing scalable, cloud-native solutions that combine innovation with real-world application.

Lakshmi Dasari is a Sr. Solutions Architect supporting Public Sector Higher Education customers in Los Angeles. With extensive experience in Enterprise IT architecture, engineering, and management, she now helps AWS customers realize the value of cloud through migration and modernization pathways. In her prior role as an AWS Partner Solutions Architect, she accelerated customers’ AWS adoption with AWS SI and ISV partners. She is passionate about inclusion in tech and is actively involved in hiring and mentoring to promote a diverse talent pool in the workplace.

Aditya Singh is an AI/ML Specialist Solutions Architect at AWS who focuses on helping higher education institutions and state/local government organizations accelerate their AI adoption journey using cutting-edge generative AI and machine learning techniques. He specializes in generative AI applications, natural language processing, and MLOps that address unique challenges in the education and public sectors.

Jeff Burke is Professor and Chair of the Department of Theater and Associate Dean, Research and Creative Technology in the UCLA School of Theater, Film and Television, where he co-directs the Center for Research in Engineering, Media, and Performance (REMAP). Burke’s research and creative work explores the intersections of emerging technology and creative expression. He is currently the principal investigator of the Innovation, Culture, and Creativity project funded by the National Science Foundation to explore opportunities nationwide for innovation at the intersection of the creative and technology sectors. He developed and produced Xanadu in collaboration with students from across campus.

Chiheb Boussema is an Applied AI Scientist at REMAP, UCLA, where he develops AI solutions for creative applications. His current interests include scalability and edge deployment of diffusion models, motion control and synthesis for animation, and memory and human-AI interaction modeling and control.

Naisha Agarwal is a rising senior at UCLA majoring in computer science. She was the generative AI co-lead for Xanadu, where she worked on designing the generative AI workflows that powered various audience interactions in the show, combining her passion for technology and the arts. She interned at Microsoft Research, working on designing user-authored immersive experiences that augment physical spaces with digital worlds. She has also interned at Kumo, where she developed a custom AI chatbot that was later deployed on Snowflake. Additionally, she has published a paper on recommender systems at the KDD conference. She is passionate about using computer science to solve real-world problems.
