Meeting summarization and action item extraction with Amazon Nova

Meetings play an important role in decision-making, project coordination, and collaboration, and remote meetings are common across many organizations. However, capturing and structuring key takeaways from these conversations is often inefficient and inconsistent. Manually summarizing meetings or extracting action items requires significant effort and is prone to omissions or misinterpretations.
Large language models (LLMs) offer a more robust solution by transforming unstructured meeting transcripts into structured summaries and action items. This capability is especially useful for project management, customer support and sales calls, legal and compliance, and enterprise knowledge management.
In this post, we present a benchmark of different understanding models from the Amazon Nova family available on Amazon Bedrock, to provide insights on how to choose the best model for a meeting summarization task.
LLMs to generate meeting insights
Modern LLMs are highly effective for summarization and action item extraction because of their ability to understand context, infer topic relationships, and generate structured outputs. In these use cases, prompt engineering provides a more efficient and scalable approach compared to traditional model fine-tuning or customization. Rather than modifying the underlying model architecture or training on large labeled datasets, prompt engineering uses carefully crafted input queries to guide the model's behavior, directly influencing the output format and content. This method allows for rapid, domain-specific customization without the need for resource-intensive retraining processes. For tasks such as meeting summarization and action item extraction, prompt engineering enables precise control over the generated outputs, making sure they meet specific business requirements. It allows for the flexible adjustment of prompts to suit evolving use cases, making it an ideal solution for dynamic environments where model behaviors need to be quickly reoriented without the overhead of model fine-tuning.
Amazon Nova models and Amazon Bedrock
Amazon Nova models, unveiled at AWS re:Invent in December 2024, are built to deliver frontier intelligence at industry-leading price performance. They are among the fastest and most cost-effective models in their respective intelligence tiers, and are optimized to power enterprise generative AI applications in a reliable, secure, and cost-effective manner.
The understanding model family has four tiers of models: Nova Micro (text-only, ultra-efficient for edge use), Nova Lite (multimodal, balanced for versatility), Nova Pro (multimodal, a balance of speed and intelligence, ideal for most enterprise needs), and Nova Premier (multimodal, the most capable Nova model for complex tasks and a teacher for model distillation). Amazon Nova models can be used for a variety of tasks, from summarization to structured text generation. With Amazon Bedrock Model Distillation, customers can also bring the intelligence of Nova Premier to a faster and cheaper model such as Nova Pro or Nova Lite for their use case or domain. This can be achieved through the Amazon Bedrock console and APIs such as the Converse API and Invoke API.
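As a minimal sketch of what such a call looks like with boto3, the following invokes a Nova model through the Converse API. The helper names, prompt text, and inference parameters are our own illustrative choices, and model availability can vary by Region:

```python
def build_converse_request(model_id: str, system_prompt: str, user_text: str) -> dict:
    """Assemble keyword arguments for the Amazon Bedrock Converse API."""
    return {
        "modelId": model_id,
        "system": [{"text": system_prompt}],
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": {"maxTokens": 1024, "temperature": 0.2},
    }


def converse(request: dict) -> str:
    """Send the request to Amazon Bedrock and return the model's text reply.
    Requires AWS credentials with Amazon Bedrock model access."""
    import boto3  # imported here so the pure helper above stays testable offline

    client = boto3.client("bedrock-runtime")
    response = client.converse(**request)
    return response["output"]["message"]["content"][0]["text"]


# Example: prepare a summarization request for Nova Lite.
request = build_converse_request(
    "amazon.nova-lite-v1:0",
    "You are a meeting assistant that writes faithful, concise summaries.",
    "Summarize this meeting transcript:\n...",
)
# summary = converse(request)  # runs only with valid AWS credentials
```

Swapping the `modelId` (for example, to `amazon.nova-pro-v1:0`) is all that is needed to benchmark the same prompt across tiers.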
Solution overview
This post demonstrates how to use Amazon Nova understanding models, available through Amazon Bedrock, for automated insight extraction using prompt engineering. We focus on two key outputs:
- Meeting summarization – A high-level abstractive summary that distills key discussion points, decisions made, and critical updates from the meeting transcript
- Action items – A structured list of actionable tasks derived from the meeting conversation that apply to the entire team or project
The following diagram illustrates the solution workflow.
Prerequisites
To follow along with this post, familiarity with calling LLMs using Amazon Bedrock is expected. For detailed steps on using Amazon Bedrock for text summarization tasks, refer to Build an AI text summarizer app with Amazon Bedrock. For more information about calling LLMs, refer to the Invoke API and Using the Converse API reference documentation.
Solution components
We developed the two core features of the solution, meeting summarization and action item extraction, by using popular models available through Amazon Bedrock. In the following sections, we look at the prompts that were used for these key tasks.
For the meeting summarization task, we used a persona assignment, prompting the LLM to generate a summary in <summary> tags to reduce redundant opening and closing sentences, and a one-shot approach by giving the LLM one example to make sure it consistently follows the exact format for summary generation. As part of the system prompt, we give clear and concise rules emphasizing the correct tone, style, length, and faithfulness toward the provided transcript.
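A sketch of that prompt structure (a persona in the system prompt, formatting rules, a one-shot example, and <summary> tags) might look like the following. The exact prompts used in the benchmark are not reproduced in this post, so the wording below is illustrative:

```python
SYSTEM_PROMPT = """You are an experienced meeting assistant.
Write an abstractive summary of the meeting transcript the user provides.
Rules:
- Cover the key discussion points, decisions made, and important updates.
- Stay faithful to the transcript; do not invent details.
- Use a neutral, professional tone and keep the summary under 200 words.
- Return only the summary, wrapped in <summary></summary> tags."""

# One-shot example pair showing the model the expected output format.
ONE_SHOT_USER = (
    "Transcript:\nAlice: The launch slips one week.\nBob: I'll tell the client."
)
ONE_SHOT_ASSISTANT = (
    "<summary>The team agreed to delay the launch by one week; "
    "Bob will inform the client.</summary>"
)


def build_summary_messages(transcript: str) -> list:
    """Build a Converse-style message list: one-shot example, then the real task."""
    return [
        {"role": "user", "content": [{"text": ONE_SHOT_USER}]},
        {"role": "assistant", "content": [{"text": ONE_SHOT_ASSISTANT}]},
        {"role": "user", "content": [{"text": f"Transcript:\n{transcript}"}]},
    ]
```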
For the action item extraction task, we gave specific instructions on generating action items in the prompts and used chain-of-thought to improve the quality of the generated action items. In the assistant message, the prefix <action_items> tag is provided as a prefill to nudge the model generation in the right direction and to avoid redundant opening and closing sentences.
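That prefill technique can be sketched with Converse-style messages as follows. The system prompt wording and helper name are our own, and the sketch relies on the Converse API continuing generation from a trailing assistant turn:

```python
ACTION_SYSTEM_PROMPT = """You extract action items from meeting transcripts.
First reason step by step about who committed to what and by when, then list
each action item on its own line as "- [owner] task", all inside
<action_items></action_items> tags."""


def build_action_item_messages(transcript: str) -> list:
    """Build messages whose final turn prefills the assistant reply with the
    opening <action_items> tag, so generation starts inside the tag and
    skips boilerplate opening sentences."""
    return [
        {"role": "user", "content": [{"text": f"Transcript:\n{transcript}"}]},
        # Trailing assistant message acts as the prefill.
        {"role": "assistant", "content": [{"text": "<action_items>"}]},
    ]
```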
Different model families respond to the same prompts differently, and it's important to follow the prompting guide defined for the particular model. For more information on best practices for Amazon Nova prompting, refer to Prompting best practices for Amazon Nova understanding models.
Dataset
To evaluate the solution, we used samples from the public QMSum dataset. The QMSum dataset is a benchmark for meeting summarization, featuring English-language transcripts from academic, business, and governance discussions with manually annotated summaries. It evaluates LLMs on producing structured, coherent summaries from complex, multi-speaker conversations, making it a valuable resource for abstractive summarization and discourse understanding. For testing, we used 30 randomly sampled meetings from the QMSum dataset. Each meeting contained 2–5 topic-wise transcripts, with approximately 8,600 tokens per transcript on average.
Evaluation framework
Achieving high-quality outputs from LLMs in meeting summarization and action item extraction can be a challenging task. Traditional evaluation metrics such as ROUGE, BLEU, and METEOR focus on surface-level similarity between generated text and reference summaries, but they often fail to capture nuances such as factual correctness, coherence, and actionability. Human evaluation is the gold standard but is expensive, time-consuming, and not scalable. To address these challenges, you can use LLM-as-a-judge, where another LLM is used to systematically assess the quality of generated outputs based on well-defined criteria. This approach offers a scalable and cost-effective way to automate evaluation while maintaining high accuracy. In this example, we used Anthropic's Claude 3.5 Sonnet v1 as the judge model because we found it to be most aligned with human judgment. We used the LLM judge to score the generated responses on three primary metrics: faithfulness, summarization, and question answering (QA).
The faithfulness score measures the faithfulness of a generated summary by measuring the portion of the parsed statements in a summary that are supported by the given context (for example, a meeting transcript) with respect to the total number of statements.
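One way to compute this score, once a judge model has labeled each parsed statement, can be sketched as follows. The JSON verdict schema is an assumption made for illustration, not the exact format used in the benchmark:

```python
import json


def faithfulness_score(judge_output: str) -> float:
    """Faithfulness = supported statements / total statements, given a judge
    verdict like {"statements": [{"text": "...", "supported": true}, ...]}."""
    statements = json.loads(judge_output)["statements"]
    if not statements:
        return 0.0
    return sum(1 for s in statements if s["supported"]) / len(statements)


# Hypothetical judge output: 2 of 3 summary statements are supported.
verdicts = json.dumps({
    "statements": [
        {"text": "The launch was delayed a week.", "supported": True},
        {"text": "Bob will inform the client.", "supported": True},
        {"text": "The budget doubled.", "supported": False},
    ]
})
score = faithfulness_score(verdicts)
```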
The summarization score is the combination of the QA score and the conciseness score with equal weight (0.5 each). The QA score measures the coverage of a generated summary of a meeting transcript. It first generates a list of question and answer pairs from a meeting transcript and measures the portion of the questions that are answered correctly when the summary is used as context instead of the meeting transcript. The QA score is complementary to the faithfulness score because the faithfulness score doesn't measure the coverage of a generated summary. We only used the QA score to measure the quality of a generated summary because the action items aren't supposed to cover all aspects of a meeting transcript. The conciseness score measures the ratio of the length of a generated summary divided by the length of the total meeting transcript.
We used a modified version of the faithfulness score and the summarization score that had much lower latency than the original implementation.
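Under the definitions above, the score combination can be sketched as follows. We assume the length ratio is inverted so that a shorter summary yields a higher conciseness score, in the style of the RAGAS summarization score; the text above does not state this inversion explicitly:

```python
def conciseness_score(summary_len: int, transcript_len: int) -> float:
    """Assumed form: 1 minus the summary/transcript length ratio, clamped to
    [0, 1], so shorter summaries score higher."""
    return 1.0 - min(summary_len / transcript_len, 1.0)


def qa_score(correct_answers: int, total_questions: int) -> float:
    """Portion of transcript-derived questions answered correctly when only
    the summary is available as context."""
    return correct_answers / total_questions


def summarization_score(qa: float, conciseness: float) -> float:
    """Equal-weight (0.5/0.5) combination of the QA and conciseness scores."""
    return 0.5 * qa + 0.5 * conciseness


# Example: 8 of 10 questions answered from a 400-token summary of an
# 8,600-token transcript (token counts are illustrative).
score = summarization_score(qa_score(8, 10), conciseness_score(400, 8600))
```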
Results
Our evaluation of Amazon Nova models across meeting summarization and action item extraction tasks revealed clear performance-latency patterns. For summarization, Nova Premier achieved the highest faithfulness score (1.0) with a processing time of 5.34s, while Nova Pro delivered 0.94 faithfulness in 2.9s. The smaller Nova Lite and Nova Micro models provided faithfulness scores of 0.86 and 0.83 respectively, with faster processing times of 2.13s and 1.52s. In action item extraction, Nova Premier again led in faithfulness (0.83) with 4.94s processing time, followed by Nova Pro (0.8 faithfulness, 2.03s). Interestingly, Nova Micro (0.7 faithfulness, 1.43s) outperformed Nova Lite (0.63 faithfulness, 1.53s) on this particular task despite its smaller size. These measurements provide valuable insights into the performance-speed characteristics across the Amazon Nova model family for text-processing applications. The following graphs show these results. The following screenshot shows a sample output for our summarization task, including the LLM-generated meeting summary and a list of action items.
Conclusion
In this post, we showed how you can use prompting to generate meeting insights such as meeting summaries and action items using Amazon Nova models available through Amazon Bedrock. For large-scale AI-driven meeting summarization, optimizing latency, cost, and accuracy is essential. The Amazon Nova family of understanding models (Nova Micro, Nova Lite, Nova Pro, and Nova Premier) offers a practical alternative to high-end models, significantly improving inference speed while reducing operational costs. These factors make Amazon Nova an attractive choice for enterprises handling large volumes of meeting data at scale.
For more information on Amazon Bedrock and the latest Amazon Nova models, refer to the Amazon Bedrock User Guide and Amazon Nova User Guide, respectively. The AWS Generative AI Innovation Center has a group of AWS science and strategy experts with comprehensive expertise spanning the generative AI journey, helping customers prioritize use cases, build a roadmap, and move solutions into production. Check out the Generative AI Innovation Center for our latest work and customer success stories.
About the Authors
Baishali Chaudhury is an Applied Scientist at the Generative AI Innovation Center at AWS, where she focuses on advancing generative AI solutions for real-world applications. She has a strong background in computer vision, machine learning, and AI for healthcare. Baishali holds a PhD in Computer Science from the University of South Florida and completed a postdoc at Moffitt Cancer Centre.
Sungmin Hong is a Senior Applied Scientist at the Amazon Generative AI Innovation Center, where he helps expedite the variety of use cases of AWS customers. Before joining Amazon, Sungmin was a postdoctoral research fellow at Harvard Medical School. He holds a PhD in Computer Science from New York University. Outside of work, he prides himself on keeping his indoor plants alive for 3+ years.
Mengdie (Flora) Wang is a Data Scientist at the AWS Generative AI Innovation Center, where she works with customers to architect and implement scalable generative AI solutions that address their unique business challenges. She specializes in model customization techniques and agent-based AI systems, helping organizations harness the full potential of generative AI technology. Prior to AWS, Flora earned her Master's degree in Computer Science from the University of Minnesota, where she developed her expertise in machine learning and artificial intelligence.
Anila Joshi has more than a decade of experience building AI solutions. As an AWSI Geo Leader at the AWS Generative AI Innovation Center, Anila pioneers innovative applications of AI that push the boundaries of possibility and accelerates the adoption of AWS services by helping customers ideate, identify, and implement secure generative AI solutions.