Import a question answering fine-tuned model into Amazon Bedrock as a custom model


Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Common generative AI use cases, including but not limited to chatbots, virtual assistants, conversational search, and agent assistants, use FMs to provide responses. Retrieval Augmented Generation (RAG) is a technique to optimize the output of FMs by providing context around the questions for these use cases. Fine-tuning the FM is recommended to further optimize the output to follow the brand and industry voice or vocabulary.

Custom Model Import for Amazon Bedrock, in preview now, allows you to import customized FMs created in other environments, such as Amazon SageMaker, Amazon Elastic Compute Cloud (Amazon EC2) instances, and on premises, into Amazon Bedrock. This post is part of a series that demonstrates various architecture patterns for importing fine-tuned FMs into Amazon Bedrock.

In this post, we provide a step-by-step approach to fine-tuning a Mistral model using SageMaker and importing it into Amazon Bedrock using the Custom Model Import feature. We use the OpenOrca dataset to fine-tune the Mistral model and use the SageMaker FMEval library to evaluate the fine-tuned model imported into Amazon Bedrock.

Key Features

Some of the key features of Custom Model Import for Amazon Bedrock are:

  1. This feature allows you to bring your fine-tuned models and use the fully managed serverless capabilities of Amazon Bedrock.
  2. Currently, the feature supports the Llama 2, Llama 3, Flan, and Mistral model architectures at FP32, FP16, and BF16 precision, with further quantizations coming soon.
  3. To use this feature, you run the import process (covered later in this post) with your model weights stored in Amazon Simple Storage Service (Amazon S3).
  4. You can also use models created with Amazon SageMaker by referencing the SageMaker model Amazon Resource Name (ARN), which provides seamless integration with SageMaker.
  5. Amazon Bedrock automatically scales your model as your traffic increases and, when the model is not in use, scales it down to 0, reducing your costs.

Let us dive into a use case and see how easy it is to use this feature.

Solution overview

At the time of writing, the Custom Model Import feature in Amazon Bedrock supports models following the architectures and patterns in the following figure.

In this post, we walk through the following high-level steps:

  1. Fine-tune the model using SageMaker.
  2. Import the fine-tuned model into Amazon Bedrock.
  3. Test the imported model.
  4. Evaluate the imported model using the FMEval library.

The following diagram illustrates the solution architecture.

The process includes the following steps:

  1. We use a SageMaker training job to fine-tune the model using a SageMaker JupyterLab notebook. This training job reads the dataset from Amazon Simple Storage Service (Amazon S3) and writes the model back into Amazon S3. This model will then be imported into Amazon Bedrock.
  2. To import the fine-tuned model, you can use the Amazon Bedrock console, the Boto3 library, or APIs.
  3. An import job orchestrates the process to import the model and make the model available from the customer account.
    1. The import job copies all the model artifacts from the user’s account into an AWS managed S3 bucket.
  4. When the import job is complete, the fine-tuned model is made available for invocation from your AWS account.
  5. We use the SageMaker FMEval library in a SageMaker notebook to evaluate the imported model.

The copied model artifacts will remain in the Amazon Bedrock account until the custom imported model is deleted from Amazon Bedrock. Deleting model artifacts in your AWS account S3 bucket doesn’t delete the model or the related artifacts in the Amazon Bedrock managed account. You can delete an imported model from Amazon Bedrock along with all the copied artifacts using either the Amazon Bedrock console, Boto3 library, or APIs.

Additionally, all data (including the model) remains within the selected AWS Region. The model artifacts are imported into the AWS operated deployment account using a virtual private cloud (VPC) endpoint, and you can encrypt your model data using an AWS Key Management Service (AWS KMS) customer managed key.
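As a rough illustration of where a customer managed key could fit, the sketch below assembles the arguments for an import job and adds a KMS key only when one is supplied. The helper name build_import_job_request and its argument names are hypothetical, and the importedModelKmsKeyId parameter name is our reading of the Boto3 create_model_import_job API, so verify it against the current Boto3 documentation before relying on it.

```python
def build_import_job_request(job_name, model_name, role_arn, s3_uri, kms_key_id=None):
    """Assemble keyword arguments for an import job (hypothetical helper).

    The KMS parameter name is an assumption taken from the Boto3 API
    reference; it is only added when a key is supplied.
    """
    request = {
        "jobName": job_name,
        "importedModelName": model_name,
        "roleArn": role_arn,
        "modelDataSource": {"s3DataSource": {"s3Uri": s3_uri}},
    }
    if kms_key_id:
        # Customer managed key used to encrypt the imported model data.
        request["importedModelKmsKeyId"] = kms_key_id
    return request
```

The returned dictionary could then be passed as keyword arguments to the Bedrock client, for example `br_client.create_model_import_job(**request)`.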

In the following sections, we dive deep into each of these steps to deploy, test, and evaluate the model.

Prerequisites

We use Mistral-7B-v0.3 in this post because it uses an extended vocabulary compared to its prior version produced by Mistral AI. This model is easy to fine-tune, and Mistral AI has provided example fine-tuned models. We use Mistral for this use case because this model supports a 32,000-token context capacity and is fluent in English, French, Italian, German, Spanish, and coding languages. With the Mixture of Experts (MoE) feature, it can achieve higher accuracy for customer support use cases.

Mistral-7B-v0.3 is a gated model on the Hugging Face model repository. You need to review the terms and conditions and request access to the model by submitting your details.

We use Amazon SageMaker Studio to preprocess the data and fine-tune the Mistral model using a SageMaker training job. To set up SageMaker Studio, refer to Launch Amazon SageMaker Studio. Refer to the SageMaker JupyterLab documentation to set up and launch a JupyterLab notebook. You will submit a SageMaker training job to fine-tune the Mistral model from the SageMaker JupyterLab notebook, which can be found on the GitHub repo.

Fine-tune the model using QLoRA

To fine-tune the Mistral model, we apply QLoRA and Parameter-Efficient Fine-Tuning (PEFT) optimization techniques. In the provided notebook, you use the Fully Sharded Data Parallel (FSDP) PyTorch API to perform distributed model tuning. You use supervised fine-tuning (SFT) to fine-tune the Mistral model.

Prepare the dataset

The first step in the fine-tuning process is to prepare and format the dataset. After you transform the dataset into the Mistral Default Instruct format, you upload it as a JSONL file into the S3 bucket used by the SageMaker session, as shown in the following code:

from datasets import load_dataset

# Load dataset from the hub
dataset = load_dataset("Open-Orca/OpenOrca")
# Keep only the rows whose id marks them as coming from the FLAN collection
flan_dataset = dataset.filter(lambda example, indice: "flan" in example["id"], with_indices=True)
flan_dataset = flan_dataset["train"].train_test_split(test_size=0.01, train_size=0.035)

columns_to_remove = list(dataset["train"].features)
flan_dataset = flan_dataset.map(create_conversation, remove_columns=columns_to_remove, batched=False)

# save datasets to s3
flan_dataset["train"].to_json(f"{training_input_path}/train_dataset.json", orient="records", force_ascii=False)
flan_dataset["test"].to_json(f"{training_input_path}/test_dataset.json", orient="records", force_ascii=False)
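The create_conversation function is referenced in the preceding code but not shown in this post. A minimal sketch of what such a mapper might look like follows. It assumes the public OpenOrca column names (system_prompt, question, response) and assumes that the system prompt is folded into the user turn; the notebook's actual implementation may differ.

```python
def create_conversation(example):
    """Convert one OpenOrca row into a chat-style "messages" record.

    Hypothetical sketch: column names and the handling of the system
    prompt are assumptions, not the notebook's confirmed implementation.
    """
    system_prompt = (example["system_prompt"] or "").strip()
    question = example["question"].strip()
    # Assumption: the chat template has no system role, so prepend the
    # system prompt to the user message when one is present.
    user_content = f"{system_prompt}\n\n{question}" if system_prompt else question
    return {
        "messages": [
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": example["response"].strip()},
        ]
    }


row = {
    "id": "flan.1",
    "system_prompt": "You are a helpful assistant.",
    "question": "What is the capital of France?",
    "response": "Paris",
}
print(create_conversation(row))
```

Returning a dictionary with a single messages key matches how the training script later reads examples["messages"] when it applies the chat template.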

You transform the dataset into Mistral Default Instruct format within the SageMaker training job as instructed in the training script (run_fsdp_qlora.py):

    ################
    # Dataset
    ################
    
    train_dataset = load_dataset(
        "json",
        data_files=os.path.join(script_args.dataset_path, "train_dataset.json"),
        split="train",
    )
    test_dataset = load_dataset(
        "json",
        data_files=os.path.join(script_args.dataset_path, "test_dataset.json"),
        split="train",
    )

    ################
    # Model & Tokenizer
    ################

    # Tokenizer        
    tokenizer = AutoTokenizer.from_pretrained(script_args.model_id, use_fast=True)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.chat_template = MISTRAL_CHAT_TEMPLATE
    
    # template dataset
    def template_dataset(examples):
        return {"text": tokenizer.apply_chat_template(examples["messages"], tokenize=False)}
    
    train_dataset = train_dataset.map(template_dataset, remove_columns=["messages"])
    test_dataset = test_dataset.map(template_dataset, remove_columns=["messages"])
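The value of MISTRAL_CHAT_TEMPLATE is not shown in this post. To give a feel for what the templated text field might contain, the following sketch renders a conversation by hand using the commonly documented Mistral instruct layout. The function name render_mistral_instruct is hypothetical, and the exact whitespace and special tokens produced by the notebook's template may differ.

```python
def render_mistral_instruct(messages, bos="<s>", eos="</s>"):
    """Render alternating user/assistant turns in a Mistral-style layout.

    Illustrative only: this approximates what a chat template could
    produce and is not the template used by the training script.
    """
    text = bos
    for index, message in enumerate(messages):
        expected_role = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected_role:
            raise ValueError("roles must alternate user/assistant, starting with user")
        if message["role"] == "user":
            text += f"[INST] {message['content']} [/INST]"
        else:
            # The assistant answer is closed with the end-of-sequence token.
            text += f" {message['content']}{eos}"
    return text


conversation = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris"},
]
print(render_mistral_instruct(conversation))
```

Because the template already inserts the special tokens, the trainer is later told not to add them a second time.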

Optimize fine-tuning using QLoRA

You optimize your fine-tuning using QLoRA, with the precision provided as input to the training script through SageMaker training job parameters. QLoRA is an efficient fine-tuning approach that reduces memory usage enough to fine-tune a 65-billion-parameter model on a single 48 GB GPU, while preserving full 16-bit fine-tuning task performance. In this notebook, you use the bitsandbytes library to set up quantization configurations, as shown in the following code:

    # Model
    torch_dtype = torch.bfloat16 if training_args.bf16 else torch.float32
    quant_storage_dtype = torch.bfloat16

    if script_args.use_qlora:
        print(f"Using QLoRA - {torch_dtype}")
        # 4-bit NF4 quantization with double quantization of the constants
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch_dtype,
            bnb_4bit_quant_storage=quant_storage_dtype,
        )
    else:
        quantization_config = None
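To see why 4-bit loading matters, a back-of-the-envelope calculation of the memory needed for the weights alone can help. The sketch below assumes a round figure of 7 billion parameters and ignores quantization constants, activations, gradients, and optimizer state, so treat the numbers as rough lower bounds rather than measured values.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS_7B = 7_000_000_000  # assumed round figure for a 7B-class model

for label, bits in [("FP32", 32), ("BF16/FP16", 16), ("NF4 (4-bit)", 4)]:
    print(f"{label:12s} ~{weight_memory_gb(PARAMS_7B, bits):.1f} GB")
```

Under these assumptions, storing the frozen base weights in 4 bits needs about a quarter of the memory of 16-bit storage, which leaves room for the adapters and training state.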

You use the LoRA config based on the QLoRA paper and Sebastian Raschka’s experiment, as shown in the following code. Two key points to consider from the Raschka experiment are that QLoRA offers 33% memory savings at the cost of a 39% increase in runtime, and that LoRA should be applied to all layers to maximize model performance.

################
# PEFT
################
# LoRA config based on QLoRA paper & Sebastian Raschka experiment
peft_config = LoraConfig(
    lora_alpha=8,
    lora_dropout=0.05,
    r=16,
    bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM",
    )
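To put the r and lora_alpha values in context, the following sketch counts the parameters that a rank-r adapter adds to a single linear layer and computes the standard LoRA scaling factor lora_alpha / r. The 4096 x 4096 layer shape is an assumption chosen for illustration; real layers in the model have several different shapes.

```python
def lora_added_params(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


def lora_scaling(lora_alpha, r):
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return lora_alpha / r


d_in = d_out = 4096  # assumed layer shape, for illustration only
added = lora_added_params(d_in, d_out, r=16)
frozen = d_in * d_out
print(f"LoRA adds {added:,} trainable parameters next to {frozen:,} frozen ones")
print(f"Scaling factor: {lora_scaling(lora_alpha=8, r=16)}")
```

With lora_alpha=8 and r=16 the low-rank update is scaled by 0.5, and the adapter trains well under one percent of the parameters of the layer it wraps.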

You use SFTTrainer to fine-tune the Mistral model:

    ################
    # Training
    ################
    trainer = SFTTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        dataset_text_field="text",
        eval_dataset=test_dataset,
        peft_config=peft_config,
        max_seq_length=script_args.max_seq_length,
        tokenizer=tokenizer,
        packing=True,
        dataset_kwargs={
            "add_special_tokens": False,  # We template with special tokens
            "append_concat_token": False,  # No need to add additional separator token
        },
    )

At the time of writing, only merged adapters are supported using the Custom Model Import feature for Amazon Bedrock. Let’s look at how to merge the adapter with the base model next.

Merge the adapters

Adapters are new modules added between layers of a pre-trained network. Creation of these new modules is possible by back-propagating gradients through a frozen, 4-bit quantized pre-trained language model into low-rank adapters in the fine-tuning process. To import the Mistral model into Amazon Bedrock, the adapters need to be merged with the base model and saved in Safetensors format. Use the following code to merge the model adapters and save them in Safetensors format:

        # load PEFT model in fp16
        model = AutoPeftModelForCausalLM.from_pretrained(
            training_args.output_dir,
            low_cpu_mem_usage=True,
            torch_dtype=torch.float16
        )
        # Merge LoRA and base model and save
        model = model.merge_and_unload()
        model.save_pretrained(
            sagemaker_save_dir, safe_serialization=True, max_shard_size="2GB"
        )
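Conceptually, merging folds each low-rank update back into the weight matrix it adapts, so that the saved model has the same shape as the base model and needs no adapter code at inference time. The toy example below shows the arithmetic for standard LoRA on a tiny matrix using plain Python lists. It is a sketch of the idea only, with hypothetical helper names, and is not how the PEFT library implements merge_and_unload.

```python
def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(l * r for l, r in zip(row, col)) for col in zip(*right)]
        for row in left
    ]


def merge_lora(weight, lora_a, lora_b, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A).

    weight is d_out x d_in, lora_b is d_out x r, lora_a is r x d_in.
    """
    scaling = lora_alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scaling * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


base_weight = [[1.0, 0.0], [0.0, 1.0]]  # toy 2 x 2 frozen weight
lora_a = [[1.0, 2.0]]                   # r x d_in with r = 1
lora_b = [[3.0], [4.0]]                 # d_out x r
print(merge_lora(base_weight, lora_a, lora_b, lora_alpha=2, r=1))
```

After the merge, the adapter matrices can be discarded because their contribution is already contained in the merged weights.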

To import the Mistral model into Amazon Bedrock, the model needs to be in an uncompressed directory within an S3 bucket accessible by the Amazon Bedrock service role used in the import job.
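Before uploading, it can be useful to sanity-check the local output directory. The sketch below looks for Safetensors weight files and flags compressed archives. The helper name check_model_dir and the list of required file names are assumptions made for illustration; consult the Custom Model Import documentation for the authoritative list of files.

```python
from pathlib import Path

# Assumed minimum set of files; the service documentation is authoritative.
REQUIRED_FILES = ("config.json", "tokenizer_config.json")


def check_model_dir(model_dir):
    """Return a list of problems found in a merged-model directory."""
    names = {p.name for p in Path(model_dir).iterdir() if p.is_file()}
    problems = [f"missing {name}" for name in REQUIRED_FILES if name not in names]
    if not any(name.endswith(".safetensors") for name in names):
        problems.append("no .safetensors weight files found")
    if any(name.endswith((".tar.gz", ".zip")) for name in names):
        problems.append("compressed archive present; upload uncompressed files")
    return problems
```

An empty list means the directory passed these basic checks, for example `check_model_dir(sagemaker_save_dir)` after the merge step.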

Import the fine-tuned model into Amazon Bedrock

Now that you have fine-tuned the model, you can import the model into Amazon Bedrock. In this section, we demonstrate how to import the model using the Amazon Bedrock console or the SDK.

Import the model using the Amazon Bedrock console

To import the model using the Amazon Bedrock console, see Import a model with Custom Model Import. You use the Import model page as shown in the following screenshot to import the model from the S3 bucket.

After you successfully import the fine-tuned model, you can see the model listed on the Amazon Bedrock console.

Import the model using the SDK

The AWS Boto3 library supports importing custom models into Amazon Bedrock. You can use the following code to import a fine-tuned model from within the notebook into Amazon Bedrock. This is an asynchronous method.

import boto3
import datetime

br_client = boto3.client('bedrock', region_name="<aws-region-name>")
pt_model_nm = "<bedrock-custom-model-name>"
# A timestamp suffix keeps job names unique across runs
pt_imp_jb_nm = f"{pt_model_nm}-{datetime.datetime.now().strftime('%Y%m%d%H%M%S')}"
role_arn = "<<bedrock_role_with_custom_model_import_policy>>"
# pt_pubmed_model_s3_path holds the S3 URI of the merged model directory
pt_model_src = {"s3DataSource": {"s3Uri": f"{pt_pubmed_model_s3_path}"}}
resp = br_client.create_model_import_job(jobName=pt_imp_jb_nm,
                                  importedModelName=pt_model_nm,
                                  roleArn=role_arn,
                                  modelDataSource=pt_model_src)
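Because the import call returns before the job finishes, you may want to wait for completion before invoking the model. The following sketch polls the job status with the Boto3 get_model_import_job call. The helper name wait_for_import_job, the polling interval, and the retry limit are assumptions chosen for illustration, and the status strings reflect our reading of the API reference.

```python
import time


def wait_for_import_job(bedrock_client, job_identifier, poll_seconds=60,
                        max_polls=120, sleep=time.sleep):
    """Poll an import job until it completes; return the imported model ARN.

    Hypothetical helper: interval and limit are arbitrary defaults.
    """
    for _ in range(max_polls):
        job = bedrock_client.get_model_import_job(jobIdentifier=job_identifier)
        status = job["status"]
        if status == "Completed":
            return job["importedModelArn"]
        if status == "Failed":
            reason = job.get("failureMessage", "no failure message returned")
            raise RuntimeError(f"Import job failed: {reason}")
        # Any other status is treated as still in progress.
        sleep(poll_seconds)
    raise TimeoutError(f"Import job {job_identifier} did not finish in time")
```

With the variables from the preceding code, a call could look like `model_id = wait_for_import_job(br_client, resp["jobArn"])`.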

Test the imported model

Now that you have imported the fine-tuned model into Amazon Bedrock, you can test the model. In this section, we demonstrate how to test the model using the Amazon Bedrock console or the SDK.

Test the model on the Amazon Bedrock console

You can test the imported model using an Amazon Bedrock playground, as illustrated in the following screenshot.

Test the model using the SDK

You can also use the Amazon Bedrock Invoke Model API to run the fine-tuned imported model, as shown in the following code:

import json

import boto3
from botocore.exceptions import ClientError

client = boto3.client("bedrock-runtime", region_name="us-west-2")
model_id = "<<replace with the imported bedrock model arn>>"


def call_invoke_model_and_print(native_request):
    request = json.dumps(native_request)

    try:
        # Invoke the model with the request.
        response = client.invoke_model(modelId=model_id, body=request)
        model_response = json.loads(response["body"].read())

        response_text = model_response["outputs"][0]["text"]
        print(response_text)
    except (ClientError, Exception) as e:
        print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
        exit(1)

prompt = "will there be a season 5 of shadowhunters"
formatted_prompt = f"[INST] {prompt} [/INST]</s>"
native_request = {
    "prompt": formatted_prompt,
    "max_tokens": 64,
    "top_p": 0.9,
    "temperature": 0.91
}
call_invoke_model_and_print(native_request)

The custom Mistral model that you imported using Amazon Bedrock supports temperature, top_p, and max_gen_len parameters when invoking the model for inferencing. The inference parameters top_k, max_seq_len, max_batch_size, and max_new_tokens are not supported for a custom Mistral fine-tuned model.

Evaluate the imported model

Now that you have imported and tested the model, let’s evaluate the imported model using the SageMaker FMEval library. For more details, refer to Evaluate Bedrock Imported Models. To evaluate the question answering task, we use the metrics F1 Score, Exact Match Score, Quasi Exact Match Score, Precision Over Words, and Recall Over Words. The key metrics for question answering tasks are Exact Match, Quasi-Exact Match, and F1 over words, evaluated by comparing the model’s predicted answers against the ground truth answers. The FMEval library supports out-of-the-box evaluation algorithms for metrics such as accuracy, QA Accuracy, and others detailed in the FMEval documentation, and its QA Accuracy algorithm covers the metrics listed here. Because you fine-tuned the Mistral model for question answering, you can use the QA Accuracy algorithm, as shown in the following code.

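from fmeval.constants import MIME_TYPE_JSONLINES
from fmeval.data_loaders.data_config import DataConfig
from fmeval.eval_algorithms.qa_accuracy import QAAccuracy
from fmeval.model_runners.bedrock_model_runner import BedrockModelRunner

config = DataConfig(
    dataset_name="trex_sample",
    dataset_uri="data/test_dataset.json",
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="question",
    target_output_location="answer"
)
bedrock_model_runner = BedrockModelRunner(
    model_id=model_id,
    output="outputs[0].text",
    # Single quotes outside so the JSON template can use double quotes
    content_template='{"prompt": $prompt, "max_tokens": 500}',
)

eval_algo = QAAccuracy()
eval_output = eval_algo.evaluate(model=bedrock_model_runner, dataset_config=config,
                                 prompt_template="[INST]$model_input[/INST]", save=True)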

You can get the consolidated metrics for the imported model as follows:

for op in eval_output:
    print(f"Eval Name: {op.eval_name}")
    for score in op.dataset_scores:
        print(f"{score.name} : {score.value}")
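To make the reported scores easier to interpret, the following sketch computes exact match, quasi-exact match, and word-level precision, recall, and F1 for a single prediction. It is a simplified re-implementation of the general ideas and assumes a particular normalization (lowercasing, removing punctuation and English articles); FMEval’s own implementation may normalize and tokenize differently.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, target):
    """1.0 only when the raw strings are identical."""
    return float(prediction == target)


def quasi_exact_match(prediction, target):
    """1.0 when the strings are identical after normalization."""
    return float(normalize(prediction) == normalize(target))


def word_scores(prediction, target):
    """Precision, recall, and F1 over the normalized words."""
    pred_words = normalize(prediction).split()
    target_words = normalize(target).split()
    overlap = sum((Counter(pred_words) & Counter(target_words)).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / len(pred_words)
    recall = overlap / len(target_words)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


print(exact_match("The city of Paris.", "Paris"))
print(quasi_exact_match("The Paris.", "paris"))
print(word_scores("The city of Paris.", "Paris"))
```

A verbose but correct answer scores zero on exact match while keeping full recall, which is why it helps to read the word-level metrics alongside the match scores.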

Clean up

To delete the imported model from Amazon Bedrock, navigate to the model on the Amazon Bedrock console. On the options menu (three dots), choose Delete.
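If you prefer to clean up programmatically, a sketch along the following lines could look up an imported model by name and delete it with Boto3. The helper name delete_imported_model_by_name is hypothetical, the response field names reflect our reading of the list_imported_models API reference, and pagination is ignored for brevity.

```python
def delete_imported_model_by_name(bedrock_client, model_name):
    """Delete the imported model with the given name.

    Returns the ARN of the deleted model, or None when no model with
    that name is found on the first page of results.
    """
    response = bedrock_client.list_imported_models()
    for summary in response.get("modelSummaries", []):
        if summary["modelName"] == model_name:
            bedrock_client.delete_imported_model(modelIdentifier=summary["modelArn"])
            return summary["modelArn"]
    return None
```

For example, `delete_imported_model_by_name(boto3.client("bedrock"), "<bedrock-custom-model-name>")` would remove the model created earlier in this post.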

To delete the SageMaker domain along with the SageMaker JupyterLab space, refer to Delete an Amazon SageMaker domain. You may also want to delete the S3 buckets where the data and model are stored. For instructions, see Deleting a bucket.

Conclusion

In this post, we explained the different aspects of fine-tuning a Mistral model using SageMaker, importing the model into Amazon Bedrock, invoking the model using both an Amazon Bedrock playground and Boto3, and then evaluating the imported model using the FMEval library. You can use this feature to import base FMs or FMs fine-tuned either on premises, on SageMaker, or on Amazon EC2 into Amazon Bedrock and use the models without any heavy lifting in your generative AI applications. Explore the Custom Model Import feature for Amazon Bedrock to deploy FMs fine-tuned for code generation tasks in a secure and scalable manner. Visit our GitHub repository to explore samples prepared for fine-tuning and importing models from various families.


About the Authors

Jay Pillai is a Principal Solutions Architect at Amazon Web Services. In this role, he functions as the Lead Architect, helping partners ideate, build, and launch Partner Solutions. As an Information Technology Leader, Jay specializes in artificial intelligence, generative AI, data integration, business intelligence, and user interface domains. He holds 23 years of extensive experience working with several clients across supply chain, legal technologies, real estate, financial services, insurance, payments, and market research business domains.

Rupinder Grewal is a Senior AI/ML Specialist Solutions Architect with AWS. He currently focuses on serving of models and MLOps on Amazon SageMaker. Prior to this role, he worked as a Machine Learning Engineer building and hosting models. Outside of work, he enjoys playing tennis and biking on mountain trails.

Evandro Franco is a Sr. AI/ML Specialist Solutions Architect at Amazon Web Services. He helps AWS customers overcome business challenges related to AI/ML on top of AWS. He has more than 18 years of experience working with technology, from software development, infrastructure, serverless, to machine learning.

Felipe Lopez is a Senior AI/ML Specialist Solutions Architect at AWS. Prior to joining AWS, Felipe worked with GE Digital and SLB, where he focused on modeling and optimization products for industrial applications.

Sandeep Singh is a Senior Generative AI Data Scientist at Amazon Web Services, helping businesses innovate with generative AI. He specializes in generative AI, artificial intelligence, machine learning, and system design. He is passionate about developing state-of-the-art AI/ML-powered solutions to solve complex business problems for diverse industries, optimizing efficiency and scalability.

Ragha Prasad is a Principal Engineer and a founding member of Amazon Bedrock, where he has had the privilege to listen to customer needs first-hand and understands what it takes to build and launch scalable and secure Gen AI products. Prior to Bedrock, he worked on numerous products in Amazon, ranging from devices to Ads to Robotics.

Paras Mehra is a Senior Product Manager at AWS. He is focused on helping build Amazon SageMaker Training and Processing. In his spare time, Paras enjoys spending time with his family and road biking around the Bay Area.
