Accelerate your ML lifecycle using the new and improved Amazon SageMaker Python SDK – Part 2: ModelBuilder


In Part 1 of this series, we introduced the newly launched ModelTrainer class in the Amazon SageMaker Python SDK and its benefits, and showed you how to fine-tune a Meta Llama 3.1 8B model on a custom dataset. In this post, we look at the enhancements to the ModelBuilder class, which lets you seamlessly deploy a model from ModelTrainer to a SageMaker endpoint, and provides a single interface for multiple deployment configurations.

In November 2023, we launched the ModelBuilder class (see Package and deploy models faster with new tools and guided workflows in Amazon SageMaker and Package and deploy classical ML and LLMs easily with Amazon SageMaker, part 1: PySDK Improvements), which reduced the complexity of the initial setup for creating a SageMaker endpoint, such as creating an endpoint configuration, choosing the container, and handling serialization and deserialization, and helps you create a deployable model in a single step. The latest update enhances the usability of the ModelBuilder class for a wide range of use cases, particularly in the rapidly evolving field of generative AI. In this post, we deep dive into the enhancements made to the ModelBuilder class, and show you how to seamlessly deploy the fine-tuned model from Part 1 to a SageMaker endpoint.

Enhancements to the ModelBuilder class

We’ve made the following usability improvements to the ModelBuilder class:

  • Seamless transition from training to inference – ModelBuilder now integrates directly with SageMaker training interfaces to make sure that the correct file path to the latest trained model artifact is automatically computed, simplifying the workflow from model training to deployment.
  • Unified inference interface – Previously, the SageMaker SDK offered separate interfaces and workflows for different types of inference, such as real-time, batch, serverless, and asynchronous inference. To simplify the model deployment process and provide a consistent experience, we have enhanced ModelBuilder to serve as a unified interface that supports multiple inference types.
  • Ease of development, testing, and production handoff – We are adding support for local mode testing with ModelBuilder so that users can effortlessly debug and test their processing and inference scripts with faster local testing without including a container, and a new function that outputs the latest container image for a given framework so you don’t have to update the code every time a new LMI release comes out (see the sketch after this list).
  • Customizable inference preprocessing and postprocessing – ModelBuilder now allows you to customize preprocessing and postprocessing steps for inference. By enabling scripts to filter content and remove personally identifiable information (PII), this integration streamlines the deployment process, encapsulating the required steps within the model configuration for better management and deployment of models with specific inference requirements.
  • Benchmarking support – The new benchmarking support in ModelBuilder empowers you to evaluate deployment options, like endpoints and containers, based on key performance metrics such as latency and cost. With the introduction of a Benchmarking API, you can test scenarios and make informed decisions, optimizing your models for peak performance before production. This enhances efficiency and provides cost-effective deployments.
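
As an illustration of resolving a current container image programmatically, the SDK already exposes helpers for common cases. The following minimal sketch uses get_huggingface_llm_image_uri to look up the latest Hugging Face TGI container for a Region; this is an example of the pattern, and the exact function the ModelBuilder enhancement provides may differ:

from sagemaker.huggingface import get_huggingface_llm_image_uri

# Omitting the version lets the SDK resolve the most recent supported
# release of the Hugging Face TGI container for the given Region
image_uri = get_huggingface_llm_image_uri("huggingface", region="us-west-2")
print(image_uri)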

In the following sections, we discuss these enhancements in more detail and demonstrate how to customize, test, and deploy your model.

Seamless deployment from ModelTrainer class

ModelBuilder integrates seamlessly with the ModelTrainer class; you can simply pass the ModelTrainer object that was used for training the model directly to ModelBuilder in the model parameter. In addition to the ModelTrainer, ModelBuilder also supports the Estimator class and the result of the SageMaker Core TrainingJob.create() function, and automatically parses the model artifacts to create a SageMaker Model object. With resource chaining, you can build and deploy the model as shown in the following example. If you followed Part 1 of this series to fine-tune a Meta Llama 3.1 8B model, you can pass the model_trainer object as follows:

# set container URI
image_uri = "763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0"

model_builder = ModelBuilder(
    model=model_trainer,  # ModelTrainer object passed directly to ModelBuilder
    role_arn=role,
    image_uri=image_uri,
    inference_spec=inf_spec,
    instance_type="ml.g5.2xlarge"
)
# deploy the model
model_builder.build().deploy()

Customize the model using InferenceSpec

The InferenceSpec class allows you to customize the model by providing custom logic to load and invoke the model, and specify any preprocessing or postprocessing logic as needed. For SageMaker endpoints, preprocessing and postprocessing scripts are often used as part of the inference pipeline to handle tasks that are required before and after the data is sent to the model for predictions, especially in the case of complex workflows or non-standard models. The following example shows how you can specify the custom logic using InferenceSpec:

import json

from sagemaker.serve.spec.inference_spec import InferenceSpec

class CustomerInferenceSpec(InferenceSpec):
    def load(self, model_dir):
        from transformers import AutoModel
        return AutoModel.from_pretrained(HF_TEI_MODEL, trust_remote_code=True)

    def invoke(self, x, model):
        return model.encode(x)

    def preprocess(self, input_data):
        return json.loads(input_data)["inputs"]

    def postprocess(self, predictions):
        assert predictions is not None
        return predictions
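
Because preprocess runs before the payload reaches the model, it is a natural place for the content filtering and PII removal mentioned earlier. The following is a minimal sketch of that idea; the RedactingInferenceSpec class name, the regular expression, and the payload field are illustrative assumptions, not part of the SDK:

import json
import re

class RedactingInferenceSpec(CustomerInferenceSpec):
    # Illustrative only: redact email addresses (a simple form of PII)
    # from the request before it is passed to the model
    EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

    def preprocess(self, input_data):
        inputs = json.loads(input_data)["inputs"]
        return self.EMAIL_PATTERN.sub("[REDACTED]", inputs)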

Test using local and in process modes

Deploying a trained model to a SageMaker endpoint involves creating a SageMaker model and configuring the endpoint. This includes the inference script, any serialization or deserialization required, the model artifact location in Amazon Simple Storage Service (Amazon S3), the container image URI, the right instance type and count, and more. Machine learning (ML) practitioners need to iterate over these settings before finally deploying the endpoint to SageMaker for inference. ModelBuilder offers two modes for quick prototyping:

  • In process mode – In this case, the inferences are made directly within the same inference process. This is highly useful for quickly testing the inference logic provided through InferenceSpec, and provides immediate feedback during experimentation.
  • Local mode – The model is deployed and run as a local container. This is achieved by setting the mode to LOCAL_CONTAINER when you build the model. This is helpful to mimic the same environment as the SageMaker endpoint. Refer to the following notebook for an example.

The following code is an example of running inference in process mode, with a custom InferenceSpec:

from sagemaker.serve.spec.inference_spec import InferenceSpec
from transformers import pipeline
from sagemaker.serve import Mode
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.serve.builder.model_builder import ModelBuilder

value: str = "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:"
schema = SchemaBuilder(value,
            {"generated_text": "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron: Hi, Daniel. I was just thinking about how magnificent giraffes are and how they should be worshiped by all.\nDaniel: You and I think alike, Girafatron. I think all animals should be worshipped! But I guess that would be a bit impractical...\nGirafatron: That's true. But the giraffe is just such an amazing creature and should always be revered!\nDaniel: Yes! And the way you go on about giraffes, I can tell you really love them.\nGirafatron: I'm obsessed with them, and I'm glad to hear you noticed!\nDaniel: I'"})

# custom inference spec with Hugging Face pipeline
class MyInferenceSpec(InferenceSpec):
    def load(self, model_dir: str):
        ...
    def invoke(self, input, model):
        ...
    def preprocess(self, input_data):
        ...
    def postprocess(self, predictions):
        ...

inf_spec = MyInferenceSpec()

# Build ModelBuilder object in IN_PROCESS mode
builder = ModelBuilder(inference_spec=inf_spec,
                       mode=Mode.IN_PROCESS,
                       schema_builder=schema
                      )

# Build and deploy the model
model = builder.build()
predictor = model.deploy()

# make predictions
predictor.predict("How are you today?")

As the next step, you can test the model in local container mode, as shown in the following code, by adding the image_uri. You will need to include the model_server argument when you include the image_uri.

image_uri = '763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04'

builder = ModelBuilder(inference_spec=inf_spec,
                       mode=Mode.LOCAL_CONTAINER,  # you can change it to Mode.SAGEMAKER_ENDPOINT for endpoint deployment
                       schema_builder=schema,
                       image_uri=image_uri,
                       model_server=ModelServer.TORCHSERVE
                      )

model = builder.build()
predictor = model.deploy()

predictor.predict("How are you today?")

Deploy the model

When testing is complete, you can deploy the model to a real-time endpoint for predictions by updating the mode to Mode.SAGEMAKER_ENDPOINT and providing an instance type and count:

sm_predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    mode=Mode.SAGEMAKER_ENDPOINT,
    role=execution_role,
)

sm_predictor.predict("How is the weather?")

In addition to real-time inference, SageMaker supports serverless inference, asynchronous inference, and batch inference modes for deployment. You can also use InferenceComponents to abstract your models and assign CPU, GPU, accelerators, and scaling policies per model. To learn more, see Reduce model deployment costs by 50% on average using the latest features of Amazon SageMaker.

After you have the ModelBuilder object, you can deploy to any of these options simply by adding the corresponding inference configurations when deploying the model. By default, if the mode is not provided, the model is deployed to a real-time endpoint. The following are examples of the other configurations:

  • Deploy a serverless endpoint:

from sagemaker.serverless.serverless_inference_config import ServerlessInferenceConfig

predictor = model_builder.deploy(
    endpoint_name="serverless-endpoint",
    inference_config=ServerlessInferenceConfig(memory_size_in_mb=2048))

  • Deploy an asynchronous endpoint:

from sagemaker.async_inference.async_inference_config import AsyncInferenceConfig
from sagemaker.s3_utils import s3_path_join

predictor = model_builder.deploy(
    endpoint_name="async-endpoint",
    inference_config=AsyncInferenceConfig(
        output_path=s3_path_join("s3://", bucket, "async_inference/output")))

  • Run a batch transform job:

from sagemaker.batch_inference.batch_transform_inference_config import BatchTransformInferenceConfig

transformer = model_builder.deploy(
    endpoint_name="batch-transform-job",
    inference_config=BatchTransformInferenceConfig(
        instance_count=1,
        instance_type="ml.m5.large",
        output_path=s3_path_join("s3://", bucket, "batch_inference/output"),
        test_data_s3_path=s3_test_path
    ))
print(transformer)

  • Deploy a multi-model endpoint using InferenceComponent:

from sagemaker.compute_resource_requirements.resource_requirements import ResourceRequirements

predictor = model_builder.deploy(
    endpoint_name="multi-model-endpoint",
    inference_config=ResourceRequirements(
        requests={
            "num_cpus": 0.5,
            "memory": 512,
            "copies": 2,
        },
        limits={},
))

Clean up

If you created any endpoints while following this post, you will incur charges while they are up and running. As a best practice, delete any endpoints that are no longer required, either using the AWS Management Console or using the following code:

predictor.delete_model() 
predictor.delete_endpoint()

Conclusion

In this two-part series, we introduced the ModelTrainer and ModelBuilder enhancements in the SageMaker Python SDK. Both classes aim to reduce the complexity and cognitive overhead for data scientists, providing you with a straightforward and intuitive interface to train and deploy models, both locally on your SageMaker notebooks and on remote SageMaker endpoints.

We encourage you to try out the SageMaker SDK enhancements (SageMaker Core, ModelTrainer, and ModelBuilder) by referring to the SDK documentation and sample notebooks on the GitHub repo, and let us know your feedback in the comments!


About the Authors

Durga Sury is a Senior Solutions Architect on the Amazon SageMaker team. Over the past 5 years, she has worked with multiple enterprise customers to set up a secure, scalable AI/ML platform built on SageMaker.

Shweta Singh is a Senior Product Manager in the Amazon SageMaker Machine Learning (ML) platform team at AWS, leading the SageMaker Python SDK. She has worked in several product roles at Amazon for over 5 years. She has a Bachelor of Science degree in Computer Engineering and a Master of Science in Financial Engineering, both from New York University.
