Improve Amazon Nova migration performance with data-aware prompt optimization

In the era of generative AI, new large language models (LLMs) are continually emerging, each with unique capabilities, architectures, and optimizations. Among these, Amazon Nova foundation models (FMs) deliver frontier intelligence and industry-leading price-performance, available exclusively on Amazon Bedrock. Since its launch in 2024, generative AI practitioners, including teams at Amazon, have started transitioning their workloads from existing FMs and adopting Amazon Nova models.
However, when transitioning between different foundation models, the prompts created for your original model might not be as performant on Amazon Nova models without prompt engineering and optimization. Amazon Bedrock prompt optimization offers a tool to automatically optimize prompts for your specified target models (in this case, Amazon Nova models), converting your original prompts into Amazon Nova-style prompts. Additionally, a key challenge during the migration to Amazon Nova is making sure that performance after migration is at least as good as, or better than, performance before it. To meet this challenge, thorough model evaluation, benchmarking, and data-aware optimization are essential: you compare the Amazon Nova model's performance against that of the model used before the migration, then optimize the prompts on Amazon Nova to match or improve on the previous workload.
In this post, we present an LLM migration paradigm and architecture, including a continuous process of model evaluation, prompt generation using Amazon Bedrock, and data-aware optimization. The solution evaluates the model performance before migration and iteratively optimizes the Amazon Nova model prompts using a user-provided dataset and objective metrics. We demonstrate successful migration to Amazon Nova for three LLM tasks: text summarization, multi-class text classification, and question answering implemented with Retrieval Augmented Generation (RAG). We also discuss lessons learned and best practices for implementing the solution in your real-world use cases.
Migrating your generative AI workloads to Amazon Nova
Migrating the model in your generative AI workload to Amazon Nova requires a structured approach to achieve performance consistency and improvement. This includes evaluating and benchmarking the old and new models, optimizing prompts on the new model, and testing and deploying the new model in production. In this section, we present a four-step workflow and a solution architecture, as shown in the following architecture diagram.
The workflow consists of the following steps:
1. Evaluate the source model and collect key performance metrics based on your business use case, such as response accuracy, response format correctness, latency, and cost, to set a performance baseline as the model migration target.
2. Automatically update the structure, instructions, and language of your prompts to adapt them to the Amazon Nova model for accurate, relevant, and faithful outputs. We discuss this more in the next section.
3. Evaluate the optimized prompts on the migrated Amazon Nova model against the performance target defined in Step 1. You can run the optimization in Step 2 as an iterative process until the optimized prompts meet your business criteria.
4. Conduct A/B testing to validate the Amazon Nova model's performance in your testing and production environments. When you're satisfied, deploy the Amazon Nova model, settings, and prompts to production.
This four-step workflow needs to run continuously to adapt to variations in both the model and the data, driven by changes in business use cases. Continuous adaptation provides ongoing optimization and helps maximize overall model performance.
Data-aware prompt optimization on Amazon Nova
In this section, we present a comprehensive optimization methodology in two steps. The first step is to use Amazon Bedrock prompt optimization to refine your prompt structure; the second is an innovative data-aware prompt optimization approach that further tunes the prompt to improve Amazon Nova model performance.
Amazon Bedrock prompt optimization
Amazon Bedrock provides a prompt optimization feature that rewrites prompts to improve performance for your use cases. Prompt optimization streamlines the way AWS developers interact with FMs on Amazon Bedrock, automatically adapting prompts to the selected models and rewriting them for better performance.
As the first step, you can use prompt optimization to adapt your prompt to Amazon Nova. By analyzing the prompt you provide, the feature interprets the task, system prompt, and instructions within the prompt, and automatically rewrites the prompt in the Amazon Nova-specific format with suitable words, phrases, and sentences. The following example shows how prompt optimization converts a typical prompt for a summarization task on Anthropic's Claude 3 Haiku into a well-structured prompt for an Amazon Nova model, with sections that begin with specific markdown tags such as `## Task`, `### Summarization Instructions`, and `### Document to Summarize`.
| Model | Prompt |
| --- | --- |
| Anthropic's Claude 3 Haiku | Human: Act like you are an intelligent AI assistant. You are required to provide a summarization based on the given document. Please use the below instructions when generating the response. The document is provided in <doc></doc> XML tags. Please be brief and concise in your answer. Do not add any information that is not mentioned in the document. Do not provide any preamble and directly start with the summarization. Do not make up the answer; if you don't know the answer, just say that I don't know. <doc>{doc}</doc> Assistant: |
| Amazon Nova Lite with Amazon Bedrock prompt optimization | ### Task Your task is to summarize the given document enclosed in <doc></doc> tags in a brief and concise manner. Follow these guidelines: ### Summarization Instructions - Read the document carefully to understand its main points and key information. - Identify the core ideas, arguments, and supporting details presented in the document. - Synthesize the essential information into a clear and succinct summary. - Use your own words to paraphrase the key points; do not copy verbatim from the original text. - Omit any extraneous or redundant information not central to the main ideas. - Do not introduce new information or make up content not present in the original document. - If you cannot summarize the document due to lack of understanding, simply respond "I don't know." ### Document to Summarize <doc>{doc}</doc> Please provide your concise summary immediately without any preamble. |
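Beyond the Amazon Bedrock console, you can invoke prompt optimization programmatically through the `optimize_prompt` API of the Amazon Bedrock agent runtime. The following is a minimal sketch; the source prompt, Region, and target model ID are illustrative, and the event-stream parsing may vary by SDK version:

```python
import boto3

# Prompt optimization is exposed through the Bedrock agent runtime endpoint.
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

source_prompt = (
    "Act like you are an intelligent AI assistant. Provide a brief, concise "
    "summarization of the document in <doc></doc> tags. <doc>{doc}</doc>"
)

# Rewrite the prompt for the chosen target model (here, Amazon Nova Lite).
response = client.optimize_prompt(
    input={"textPrompt": {"text": source_prompt}},
    targetModelId="amazon.nova-lite-v1:0",
)

# The result arrives as an event stream; the optimizedPromptEvent carries
# the rewritten prompt text.
for event in response["optimizedPrompt"]:
    if "optimizedPromptEvent" in event:
        print(event["optimizedPromptEvent"])
```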
We applied the preceding prompts to the Anthropic Claude 3 Haiku and Amazon Nova Lite models, respectively, using the public xsum dataset. Because the summarization task doesn't have a predefined ground truth, we designed an LLM judge to validate the summarization quality. The following sketch shows how such a judge can be built; the rubric, 0–100 scale, and judge model are illustrative rather than our exact prompt:
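```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Illustrative judge prompt; the rubric and scale are assumptions.
JUDGE_PROMPT = """You are an impartial judge. Given a source document and a
candidate summary, rate the summary from 0 to 100 for faithfulness (no
invented facts), coverage of the main points, and conciseness.
Respond with only the numeric score.

Document:
{document}

Summary:
{summary}"""

def judge_summary(document: str, summary: str) -> float:
    """Score a candidate summary against its source document with an LLM judge."""
    response = bedrock.converse(
        modelId="us.amazon.nova-pro-v1:0",  # illustrative judge model
        messages=[{
            "role": "user",
            "content": [{"text": JUDGE_PROMPT.format(document=document, summary=summary)}],
        }],
    )
    return float(response["output"]["message"]["content"][0]["text"].strip())
```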
The experiment, using 80 data samples, shows that accuracy on the Amazon Nova Lite model improved from 77.75% to 83.25% with prompt optimization.
Data-aware optimization
Although Amazon Bedrock prompt optimization covers the basic needs of prompt engineering, other prompt optimization techniques are available to maximize LLM performance, such as Multi-Aspect Critique, Self-Reflection, Gradient Descent and Beam Search, and Meta Prompting. In particular, we heard from users who need to fine-tune their prompts against optimization target metrics they define, such as ROUGE, BERT-F1, or an LLM judge score, using a dataset they provide. To meet these needs, we designed a data-aware optimization architecture as shown in the following diagram.
The data-aware optimization takes two inputs. The first input is the user-defined optimization target metric; for the summarization task discussed in the previous section, you can use the BERT-F1 score or create your own LLM judge. The second input is a training dataset (DevSet) provided by the user to validate the response quality, for example, a summarization data sample with the following format.
| Source Document | Summarization |
| --- | --- |
| Officers searched properties in the Waterfront Park and Colonsay View areas of the city on Wednesday. Detectives said three firearms, ammunition and a five-figure sum of money were recovered. A 26-year-old man who was arrested and charged appeared at Edinburgh Sheriff Court on Thursday. | A man has appeared in court after firearms, ammunition and cash were seized by police in Edinburgh. |
| <another document ...> | <another summarization ...> |
The data-aware optimization uses these two inputs to improve the prompt for better Amazon Nova response quality. In this work, we use the DSPy (Declarative Self-improving Python) optimizer for the data-aware optimization. DSPy is a widely used framework for programming language models. It offers algorithms for optimizing prompts across a variety of LLM tasks, from simple classifiers and summarizers to sophisticated RAG pipelines. The `dspy.MIPROv2` optimizer intelligently explores better natural language instructions for every prompt using the DevSet, to maximize the metrics you define.
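To feed the DevSet to DSPy, each row can be wrapped in a `dspy.Example` that declares which field is the model input. A minimal sketch, with illustrative field names:

```python
import dspy

# Each DevSet row pairs a source document with a reference summarization.
raw_devset = [
    {"document": "Officers searched properties in the Waterfront Park ...",
     "summary": "A man has appeared in court after firearms ..."},
    # ... more samples
]

# Wrap each row in a dspy.Example and mark the document as the input field.
trainset = [
    dspy.Example(document=row["document"], summary=row["summary"]).with_inputs("document")
    for row in raw_devset
]
```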
We applied the MIPROv2 optimizer on top of the results optimized by Amazon Bedrock in the previous section for better Amazon Nova performance. In the optimizer, we specify the number of instruction candidates in the generation space, use Bayesian optimization to effectively search over the space, and run it iteratively to generate instructions and few-shot examples for the prompt in each step. The following sketch shows this setup; parameter names follow recent DSPy versions, and the metric reuses the LLM judge shown earlier:
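```python
from dspy.teleprompt import MIPROv2

def summarization_metric(example, pred, trace=None):
    """Score a predicted summary with the LLM judge, normalized to 0-1."""
    return judge_summary(example.document, pred.summary) / 100.0

# num_candidates controls how many instruction candidates MIPROv2 proposes;
# Bayesian optimization then searches over that space.
optimizer = MIPROv2(
    metric=summarization_metric,
    num_candidates=5,
    init_temperature=1.0,
    auto=None,  # disable the preset budget so num_candidates takes effect
)
```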
With the setting of `num_candidates=5`, the optimizer generates five candidate instructions.
We set other parameters for the optimization iteration, including the number of trials, the number of few-shot examples, and the batch size for the optimization process. A sketch of the compile call follows, assuming `program` is a DSPy module that wraps the summarization prompt:
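```python
import dspy

# Point DSPy at the target model on Amazon Bedrock (via LiteLLM model IDs).
dspy.configure(lm=dspy.LM("bedrock/us.amazon.nova-lite-v1:0"))

# A simple DSPy program wrapping the summarization prompt.
program = dspy.ChainOfThought("document -> summary")

optimized_program = optimizer.compile(
    program,
    trainset=trainset,
    num_trials=10,             # number of optimization trials
    max_labeled_demos=2,       # few-shot examples drawn from labeled samples
    max_bootstrapped_demos=2,  # few-shot examples bootstrapped by the model
    minibatch_size=25,         # DevSet samples scored per trial
    requires_permission_to_run=False,
)
```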
When the optimization starts, MIPROv2 uses each instruction candidate, together with a mini-batch of the testing dataset we provided, to invoke the LLM and calculate the metrics we defined. After the loop is complete, the optimizer evaluates the best instruction using the full testing dataset and calculates the full evaluation score. Based on these iterations, the optimizer produces the improved instruction for the prompt.
Applying the optimized prompt, the summarization accuracy generated by the LLM judge on the Amazon Nova Lite model further improved from 83.25% to 87.75%.
We also applied the optimization process to other LLM tasks, including a multi-class text classification task and a question-answering task using RAG. In all the tasks, our approach optimized the migrated Amazon Nova model to outperform the Anthropic Claude Haiku and Meta Llama models used before migration. The following table and chart illustrate the optimization results.
| Task | DevSet | Evaluation | Before Migration | After Migration (Amazon Bedrock Prompt Optimization) | After Migration (DSPy with Amazon Bedrock Prompt Optimization) |
| --- | --- | --- | --- | --- | --- |
| Summarization (Anthropic Claude 3 Haiku to Amazon Nova Lite) | 80 samples | LLM Judge | 77.75 | 83.25 | 87.75 |
| Classification (Meta Llama 3.2 3B to Amazon Nova Micro) | 80 samples | Accuracy | 81.25 | 81.25 | 87.5 |
| QA-RAG (Anthropic Claude 3 Haiku to Amazon Nova Lite) | 50 samples | Semantic Similarity | 52.71 | 51.6 | 57.15 |
For the text classification use case, we optimized the Amazon Nova Micro model using 80 samples, with accuracy as the metric to evaluate the optimization performance at each step. After seven iterations, the optimized prompt reached 87.5% accuracy, up from the 81.25% accuracy achieved on the Meta Llama 3.2 3B model.
For the question-answering use case, we used 50 samples to optimize the prompt for an Amazon Nova Lite model in the RAG pipeline, and evaluated the performance using a semantic similarity score based on the cosine similarity between the model's answer and the ground truth answer. Compared with the testing data running on Anthropic's Claude 3 Haiku, the optimizer improved the score from 52.71 to 57.15 after migration to the Amazon Nova Lite model and prompt optimization.
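As an illustration, such a semantic similarity score can be computed from embedding vectors. The following sketch uses Amazon Titan Text Embeddings, although any embedding model works; the model ID and response parsing are our assumptions, not a prescribed choice:

```python
import json

import boto3
import numpy as np

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text: str) -> np.ndarray:
    """Embed the text with Amazon Titan Text Embeddings V2."""
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return np.array(json.loads(response["body"].read())["embedding"])

def semantic_similarity(answer: str, ground_truth: str) -> float:
    """Cosine similarity between the model answer and the ground truth answer."""
    a, b = embed(answer), embed(ground_truth)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```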
You can find more details of these examples in the GitHub repository.
Lessons learned and best practices
Through the solution design, we identified best practices that can help you properly configure your prompt optimization to maximize the metrics you specify for your use case:
- Your dataset for the optimizer should be high quality, relevant, and well balanced, covering the data patterns, edge cases, and nuances of your use case to minimize biases.
- The metrics you define as the optimization target should be use case specific. For example, if your dataset has ground truth, you can use statistical and programmatic machine learning (ML) metrics such as accuracy and semantic similarity. If your dataset doesn't include ground truth, a well-designed and human-aligned LLM judge can provide a reliable evaluation score for the optimizer.
- The optimizer runs with a number of prompt candidates (parameter `dspy.num_candidates`) and uses the evaluation metric you defined to select the optimal prompt as the output. Avoid setting too few candidates, which can miss opportunities for improvement. In the earlier summarization example, we set five prompt candidates for optimizing over 80 training samples and obtained good optimization performance.
- The prompt candidates include a combination of prompt instructions and few-shot examples. You can specify the number of examples (parameter `dspy.max_labeled_demos` for examples from labeled samples, and parameter `dspy.max_bootstrapped_demos` for examples from unlabeled samples); we recommend setting the example number to at least 2.
- The optimization runs in iterations (parameter `dspy.num_trials`); you should set enough iterations to allow you to refine prompts based on different scenarios and performance metrics, and gradually enhance clarity, relevance, and adaptability. If you optimize both the instructions and the few-shot examples in the prompt, we recommend setting the iteration number to at least 2, ideally between 5–10.
For your use case, if your prompt structure is complex, with chain-of-thought or tree-of-thoughts reasoning, long instructions in the system prompt, and multiple inputs in the user prompt, you can use a task-specific class to abstract the DSPy optimizer. The class helps encapsulate the optimization logic, standardize the prompt structure and optimization parameters, and allow easy implementation of different optimization strategies. The following is a sketch of such a class for a text classification task; the signature fields, labels, and parameter defaults are illustrative:
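```python
import dspy
from dspy.teleprompt import MIPROv2

class ClassificationSignature(dspy.Signature):
    """Classify the input text into one of the predefined categories."""
    text = dspy.InputField(desc="the text to classify")
    category = dspy.OutputField(desc="one of: billing, technical, account, general")

class TextClassifier(dspy.Module):
    """Task-specific wrapper that standardizes the prompt structure."""
    def __init__(self):
        super().__init__()
        self.classify = dspy.ChainOfThought(ClassificationSignature)

    def forward(self, text):
        return self.classify(text=text)

def accuracy_metric(example, pred, trace=None):
    """Exact-match accuracy against the labeled category."""
    return example.category == pred.category

def optimize_classifier(trainset, num_candidates=5, num_trials=10):
    """Encapsulate the optimization logic and parameters for this task."""
    optimizer = MIPROv2(metric=accuracy_metric, num_candidates=num_candidates, auto=None)
    return optimizer.compile(
        TextClassifier(),
        trainset=trainset,
        num_trials=num_trials,
        max_labeled_demos=2,
        max_bootstrapped_demos=2,
        requires_permission_to_run=False,
    )
```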
Conclusion
In this post, we introduced a workflow and architecture for migrating your existing generative AI workload to Amazon Nova models, and presented a comprehensive prompt optimization approach using Amazon Bedrock prompt optimization and a data-aware prompt optimization methodology with DSPy. The results on three LLM tasks demonstrated the optimized performance of Amazon Nova within its intelligence classes: model performance improved with Amazon Bedrock prompt optimization after model migration, and improved further with the data-aware prompt optimization methodology presented in this post.
The Python library and code examples are publicly available on GitHub. You can use this LLM migration strategy and prompt optimization solution to migrate your workloads to Amazon Nova, or in other model migration processes.
About the Authors
Yunfei Bai is a Principal Solutions Architect at AWS. With a background in AI/ML, data science, and analytics, Yunfei helps customers adopt AWS services to deliver business outcomes. He designs AI/ML and data analytics solutions that overcome complex technical challenges and drive strategic objectives. Yunfei has a PhD in Electronic and Electrical Engineering. Outside of work, Yunfei enjoys reading and music.
Anupam Dewan is a Senior Solutions Architect with a passion for generative AI and its real-life applications. He and his team enable Amazon builders who build customer-facing applications using generative AI. He lives in the Seattle area, and outside of work, he loves to hike and enjoy nature.
Shuai Wang is a Senior Applied Scientist and Manager at Amazon Bedrock, specializing in natural language processing, machine learning, large language modeling, and other related AI areas. Outside of work, he enjoys sports, particularly basketball, and family activities.
Kashif Imran is a seasoned engineering and product leader with deep expertise in AI/ML, cloud architecture, and large-scale distributed systems. Currently a Senior Manager at AWS, Kashif leads teams driving innovation in generative AI and cloud, partnering with strategic cloud customers to transform their businesses. Kashif holds dual master's degrees in Computer Science and Telecommunications, and specializes in translating complex technical capabilities into measurable business value for enterprises.