Deploying Multiple Models with SageMaker Pipelines | by Ram Vegiraju | Mar, 2023

Applying MLOps best practices to advanced serving options

Image from Unsplash by Growtika

MLOps is an essential practice for productionizing your Machine Learning workflows. With MLOps you can establish workflows that are tailored to the ML lifecycle. These workflows make it easier to centrally maintain resources, update and track models, and generally simplify the process as your ML experimentation scales up.

A key MLOps tool within the Amazon SageMaker ecosystem is SageMaker Pipelines. With SageMaker Pipelines you can define workflows composed of different ML steps. You can also structure these workflows by defining parameters that you inject as variables into your Pipeline. For a more general introduction to SageMaker Pipelines, please refer to the linked article.

Defining a Pipeline in itself is not overly complicated, but a few advanced use-cases need some extra configuration. Specifically, say that you are training multiple models that are all needed for inference in your ML use-case. SageMaker has a hosting option known as Multi-Model Endpoints (MME), where you can host multiple models on a single endpoint and invoke a target model. However, SageMaker Pipelines currently has no native support for defining or deploying an MME. In this blog post we'll look at how we can use a Pipelines Lambda Step to deploy a Multi-Model Endpoint in a custom manner, while adhering to MLOps best practices.

NOTE: For those of you new to AWS, make sure to create an account at the following link if you want to follow along. The article also assumes an intermediate understanding of SageMaker Deployment; I would suggest following this article to understand Deployment/Inference in more depth. In particular, for SageMaker Multi-Model Endpoints I would refer to the following blog.


For this example, we will be working in SageMaker Studio, where we have access to the visual interfaces for SageMaker Pipelines and other SageMaker components. For development we will use a Studio Notebook Instance with a Data Science Kernel on an ml.t3.medium instance. To get started, we first need to import the necessary libraries for the different steps we will use within SageMaker Pipelines.

import os
import boto3
import re
import time
import json
from sagemaker import get_execution_role, session
import pandas as pd

from time import gmtime, strftime
import sagemaker
from sagemaker.model import Model
from sagemaker.image_uris import retrieve
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.model_step import ModelStep
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep
from sagemaker.workflow.parameters import ParameterString
from sagemaker.estimator import Estimator

# Custom Lambda Step
from sagemaker.workflow.lambda_step import (
    LambdaStep,
    LambdaOutput,
    LambdaOutputTypeEnum,
)
from sagemaker.lambda_helper import Lambda
from sagemaker.workflow.pipeline import Pipeline

# Session details used throughout the notebook
role = get_execution_role()
region = boto3.Session().region_name
default_bucket = sagemaker.Session().default_bucket()
s3_prefix = "mme-pipelines"  # hypothetical S3 prefix; use your own

Next we create a Pipeline Session. The Pipeline Session ensures that none of the training jobs actually execute within the notebook until the Pipeline itself is executed.

pipeline_session = PipelineSession()

For this example we'll use the Abalone dataset (CC BY 4.0) and run the SageMaker XGBoost algorithm on it to build a regression model. You can download the dataset from the publicly available Amazon datasets.

!aws s3 cp s3://sagemaker-sample-files/datasets/tabular/uci_abalone/train_csv/abalone_dataset1_train.csv .
!aws s3 cp abalone_dataset1_train.csv s3://{default_bucket}/xgboost-regression/train.csv
training_path = 's3://{}/xgboost-regression/train.csv'.format(default_bucket)
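The built-in SageMaker XGBoost algorithm expects CSV training data with no header row and the target in the first column. This is why the sample inference payload at the end of this post carries feature values only. The sketch below splits rows that way using only the standard library. It assumes the downloaded file follows that label-first layout. `split_label_features` and the two sample rows are hypothetical.

```python
import csv
import io


def split_label_features(csv_text):
    """Split headerless CSV rows into (label, features) pairs.

    Assumes the built-in XGBoost CSV layout: target in column 0, features after it.
    """
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if not record:
            continue  # skip blank lines
        label, *features = (float(value) for value in record)
        rows.append((label, features))
    return rows


# Hypothetical two-row sample in the same shape as the training file
sample = "9,0.3,0.2,0.1\n7,0.4,0.25,0.12\n"
for label, features in split_label_features(sample):
    print(label, features)
```

When you build an inference request, drop column 0 and send only the feature columns.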

We can then parameterize our Pipeline by defining defaults for both the training dataset and the instance type.

training_input_param = ParameterString(
    name = "training_input",
    default_value=training_path)

training_instance_param = ParameterString(
    name = "training_instance",
    default_value = "ml.c5.xlarge")

We then retrieve the AWS-provided XGBoost container that we will use for both training and inference.

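model_path = f's3://{default_bucket}/{s3_prefix}/xgb_model'

image_uri = sagemaker.image_uris.retrieve(
    framework="xgboost", region=region,
    version="1.0-1", py_version="py3",  # assumed XGBoost container version
    instance_type="ml.c5.xlarge",
)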

Training Setup

For the training portion of our Pipeline, we will configure the SageMaker XGBoost algorithm for our Abalone regression dataset.

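xgb_train_one = Estimator(image_uri=image_uri, instance_type=training_instance_param, instance_count=1,
                          output_path=model_path, sagemaker_session=pipeline_session, role=role)
xgb_train_one.set_hyperparameters(objective="reg:squarederror", num_round=50, max_depth=5, eta=0.2,
                                  subsample=0.7)  # illustrative hyperparameter values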

For our second estimator, we change the hyperparameters to adjust model training, so that we have two separate models behind our Multi-Model Endpoint.

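xgb_train_two = Estimator(image_uri=image_uri, instance_type=training_instance_param, instance_count=1,
                          output_path=model_path, sagemaker_session=pipeline_session, role=role)

#adjusting hyperparams
xgb_train_two.set_hyperparameters(objective="reg:squarederror", num_round=100, max_depth=3, eta=0.1,
                                  subsample=0.8)  # illustrative values, different from model one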

We then configure the training inputs for both estimators to point to the parameter we defined for our S3 training dataset.

train_args_one = xgb_train_one.fit(
    inputs={"train": TrainingInput(s3_data=training_input_param, content_type="text/csv")}
)

train_args_two = xgb_train_two.fit(
    inputs={"train": TrainingInput(s3_data=training_input_param, content_type="text/csv")}
)

We then define two separate Training Steps that will execute in parallel within our Pipeline.

step_train_one = TrainingStep(
    name = "TrainOne",
    step_args= train_args_one
)

step_train_two = TrainingStep(
    name = "TrainTwo",
    step_args= train_args_two
)

Lambda Step

A Lambda Step essentially lets you plug a Lambda function into your Pipeline. Every SageMaker Training Job emits a model.tar.gz containing the trained model artifacts. Here we will use the Lambda Step to retrieve the trained model artifacts and deploy them to a SageMaker Multi-Model Endpoint.

Before we can do that, we need to give our Lambda function the proper permissions to work with SageMaker. We can use the following existing script to create an IAM Role for our Lambda function.

# iam_helper.py
import boto3
import json

iam = boto3.client("iam")

def create_lambda_role(role_name):
    try:
        response = iam.create_role(
            RoleName=role_name,
            AssumeRolePolicyDocument=json.dumps(
                {
                    "Version": "2012-10-17",
                    "Statement": [
                        {
                            "Effect": "Allow",
                            "Principal": {"Service": "lambda.amazonaws.com"},
                            "Action": "sts:AssumeRole",
                        }
                    ],
                }
            ),
            Description="Role for Lambda to call SageMaker functions",
        )

        role_arn = response["Role"]["Arn"]

        # Basic execution policy (lets the function write CloudWatch logs)
        response = iam.attach_role_policy(
            RoleName=role_name,
            PolicyArn="arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
        )

        response = iam.attach_role_policy(
            PolicyArn="arn:aws:iam::aws:policy/AmazonSageMakerFullAccess", RoleName=role_name
        )

        return role_arn

    except iam.exceptions.EntityAlreadyExistsException:
        print(f"Using ARN from existing role: {role_name}")
        response = iam.get_role(RoleName=role_name)
        return response["Role"]["Arn"]

from iam_helper import create_lambda_role

lambda_role = create_lambda_role("lambda-deployment-role")
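The `Principal` in the trust policy is what allows the Lambda service to assume this role. If it is wrong, Lambda cannot use the role. The stdlib sketch below rebuilds the same trust document and checks it, assuming `lambda.amazonaws.com` as the service principal. `lambda_trust_policy` and `allows_service` are hypothetical helpers. Real IAM evaluation also weighs conditions and wildcards, which this check ignores.

```python
import json


def lambda_trust_policy():
    """Trust policy letting the AWS Lambda service assume the role (same shape as create_lambda_role)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def allows_service(policy, service):
    """True if an Allow statement lets `service` call sts:AssumeRole (conditions ignored)."""
    for statement in policy.get("Statement", []):
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. "*" principals are out of scope for this sketch
        services = principal.get("Service", [])
        services = [services] if isinstance(services, str) else services
        actions = statement.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        if statement.get("Effect") == "Allow" and service in services and "sts:AssumeRole" in actions:
            return True
    return False


policy = json.loads(json.dumps(lambda_trust_policy()))  # same JSON round trip create_role receives
print(allows_service(policy, "lambda.amazonaws.com"))     # True
print(allows_service(policy, "sagemaker.amazonaws.com"))  # False
```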

Once we've defined our Lambda role, we can create a Lambda function that does a few things for us:

  • Takes the model.tar.gz from each training job and places it in a central S3 location containing both tarballs. MME expects all model tarballs to live under a single S3 prefix.
  • Uses the boto3 SageMaker client to create a SageMaker Model, Endpoint Configuration, and Endpoint.
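For an MME, the SageMaker Model's `ModelDataUrl` points at an S3 prefix rather than a single tarball. The `TargetModel` given at invocation time is a path relative to that prefix. The sketch below models that layout as plain strings. It assumes the `xgboost-mme-pipelines/` prefix and `model-{i}.tar.gz` names used by the helper functions in this post. `mme_layout`, `resolve_target`, and the bucket name are hypothetical.

```python
def mme_layout(bucket, prefix, num_models):
    """Return the shared ModelDataUrl prefix and the TargetModel name of each model."""
    prefix = prefix.strip("/") + "/"
    model_data_url = f"s3://{bucket}/{prefix}"
    target_models = [f"model-{i}.tar.gz" for i in range(num_models)]
    return model_data_url, target_models


def resolve_target(model_data_url, target_model):
    """Full S3 URI of the artifact a TargetModel refers to (prefix + relative path)."""
    return model_data_url + target_model


url, targets = mme_layout("my-bucket", "xgboost-mme-pipelines", 2)  # hypothetical bucket
print(url)                              # s3://my-bucket/xgboost-mme-pipelines/
print(targets)                          # ['model-0.tar.gz', 'model-1.tar.gz']
print(resolve_target(url, targets[1]))  # s3://my-bucket/xgboost-mme-pipelines/model-1.tar.gz
```

Keeping the prefix fixed and varying only the tarball name means new models can be added by copying another tarball under the prefix, without recreating the SageMaker Model.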

We can use the following helper functions to accomplish the first task, copying the training job artifacts into a central S3 location that holds both model tarballs.

import json
import boto3
from time import gmtime, strftime

sm_client = boto3.client("sagemaker")
s3 = boto3.resource('s3')

# Hypothetical placeholder: the XGBoost image URI retrieved earlier in the notebook
image_uri = "<xgboost-image-uri>"

def extract_bucket_key(model_data):
    """
    Extracts the bucket and key from the model data tarballs that we are passing in
    """
    bucket = model_data.split('/', 3)[2]
    key = model_data.split('/', 3)[-1]
    return [bucket, key]

def create_mme_dir(model_data_dir):
    """
    Takes in a list of lists with the different trained models and
    creates a central S3 bucket/key location with all model artifacts for MME.
    """
    bucket_name = model_data_dir[0][0]
    bucket = s3.Bucket(bucket_name)
    for i, model_data in enumerate(model_data_dir):
        copy_source = {
            'Bucket': model_data[0],
            'Key': model_data[1]
        }
        destination_key = 'xgboost-mme-pipelines/model-{}.tar.gz'.format(i)
        bucket.copy(copy_source, destination_key)
    mme_s3_path = 's3://{}/xgboost-mme-pipelines/'.format(bucket_name)
    return mme_s3_path
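The copy step can also be written against a low-level S3 client passed in as an argument, which makes it testable without AWS. The sketch below assumes an object with boto3's `copy_object(Bucket=..., Key=..., CopySource=...)` method. It parses URIs with `urllib.parse` instead of manual splitting. `copy_to_mme_prefix`, `parse_s3_uri`, `FakeS3`, and the sample URIs are hypothetical.

```python
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split 's3://bucket/key/path' into ('bucket', 'key/path')."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")


def copy_to_mme_prefix(s3_client, artifact_uris, dest_bucket, prefix="xgboost-mme-pipelines/"):
    """Copy each training job's model.tar.gz to <prefix>model-<i>.tar.gz; return the MME prefix URI."""
    for i, uri in enumerate(artifact_uris):
        src_bucket, src_key = parse_s3_uri(uri)
        s3_client.copy_object(
            Bucket=dest_bucket,
            Key=f"{prefix}model-{i}.tar.gz",
            CopySource={"Bucket": src_bucket, "Key": src_key},
        )
    return f"s3://{dest_bucket}/{prefix}"


class FakeS3:
    """Test double that records copy_object calls instead of talking to S3."""

    def __init__(self):
        self.calls = []

    def copy_object(self, **kwargs):
        self.calls.append(kwargs)
        return {}


fake = FakeS3()
uris = ["s3://my-bucket/xgb_model/job-1/output/model.tar.gz",  # hypothetical artifact URIs
        "s3://my-bucket/xgb_model/job-2/output/model.tar.gz"]
print(copy_to_mme_prefix(fake, uris, "my-bucket"))
print([call["Key"] for call in fake.calls])
```

Inside the Lambda, `boto3.client("s3")` would take the place of `FakeS3`. Unlike the managed `Bucket.copy` used above, a single `copy_object` call is limited to objects of up to 5 GB. That is ample for these XGBoost tarballs but worth knowing for larger models.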

The next steps for our Lambda function create the SageMaker entities needed for a real-time endpoint:

  • SageMaker Model: Contains the model data and container image, and also defines whether the endpoint is Multi-Model or Single Model.
  • SageMaker Endpoint Configuration: Defines the hardware behind an endpoint, namely the instance type and count.
  • SageMaker Endpoint: The REST endpoint that you invoke for inference. For MME, you also specify the target model you want to perform inference against.
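
def lambda_handler(event, context):
    # S3 URIs of the two trained model.tar.gz files, passed in through the LambdaStep inputs
    model_data_dir = [
        extract_bucket_key(event["model_artifacts_one"]),
        extract_bucket_key(event["model_artifacts_two"]),
    ]
    model_url = create_mme_dir(model_data_dir)

    #Step 1: Model Creation
    model_name = 'mme-source' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
    create_model_response = sm_client.create_model(
        ModelName=model_name,
        Containers=[{
            "Image": image_uri,
            "Mode": "MultiModel",
            "ModelDataUrl": model_url
        }],
        ExecutionRoleArn="<sagemaker-execution-role-arn>",  #to-do parameterize this (hypothetical placeholder)
    )
    print("Model Arn: " + create_model_response["ModelArn"])

    #Step 2: EPC Creation
    xgboost_epc_name = "mme-source" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
    endpoint_config_response = sm_client.create_endpoint_config(
        EndpointConfigName=xgboost_epc_name,
        ProductionVariants=[{
            "VariantName": "xgbvariant",
            "ModelName": model_name,
            "InstanceType": "ml.c5.large",
            "InitialInstanceCount": 1
        }],
    )
    print("Endpoint Configuration Arn: " + endpoint_config_response["EndpointConfigArn"])

    #Step 3: EP Creation
    endpoint_name = "mme-source" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
    create_endpoint_response = sm_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=xgboost_epc_name,
    )
    print("Endpoint Arn: " + create_endpoint_response["EndpointArn"])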

Once endpoint creation has started, our Lambda function returns a success message.

    return {
        "statusCode": 200,
        "body": json.dumps("Created Endpoint!"),
        "endpoint_name": endpoint_name
    }

We then define this Lambda function in the format the Lambda Step requires, so that our Pipeline can pick it up.

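# Lambda helper class can be used to create the Lambda function
# (function name and script file below are hypothetical; the script holds the Lambda code above)
func = Lambda(function_name="pipelines-mme-deploy-lambda", execution_role_arn=lambda_role,
              script="lambda_deployer.py", handler="lambda_deployer.lambda_handler", timeout=600)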

We also define what the Lambda returns, in the form of output parameters.

output_param_1 = LambdaOutput(output_name="statusCode", output_type=LambdaOutputTypeEnum.String)
output_param_2 = LambdaOutput(output_name="body", output_type=LambdaOutputTypeEnum.String)
output_param_3 = LambdaOutput(output_name="endpoint_name", output_type=LambdaOutputTypeEnum.String)
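As we understand the LambdaStep contract, each `LambdaOutput` is matched by name against the top-level keys of the dictionary the handler returns. The three names above should therefore line up with the keys in the Lambda's return statement. The quick local check below assumes the return shape used in this post. `missing_outputs` is a hypothetical helper, and the endpoint name is only an example value.

```python
import json


def missing_outputs(response, output_names):
    """Return the declared output names that are absent from the handler's return dict."""
    return [name for name in output_names if name not in response]


# Same shape as the handler's return value; the endpoint name is a hypothetical example
response = {
    "statusCode": 200,
    "body": json.dumps("Created Endpoint!"),
    "endpoint_name": "mme-source2023-03-01-12-00-00",
}
declared = ["statusCode", "body", "endpoint_name"]
print(missing_outputs(response, declared))            # []
print(missing_outputs(response, declared + ["arn"]))  # ['arn']
```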

We then define our inputs as the two trained model artifacts from the training steps we defined earlier in our notebook.

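step_deploy_lambda = LambdaStep(
    name="LambdaStepMME", lambda_func=func,  # hypothetical step name
    inputs={"model_artifacts_one": step_train_one.properties.ModelArtifacts.S3ModelArtifacts,
            "model_artifacts_two": step_train_two.properties.ModelArtifacts.S3ModelArtifacts},
    outputs=[output_param_1, output_param_2, output_param_3])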

Pipeline Execution & Sample Inference

Now that we have configured our different steps, we can stitch all of this together into a single Pipeline. We point to our three steps and the parameters we defined. Note that you can also define more parameters than we did here, depending on your use case.

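pipeline = Pipeline(
    name="mme-deployment-pipeline",  # hypothetical pipeline name
    steps=[step_train_one, step_train_two, step_deploy_lambda],
    parameters= [training_input_param, training_instance_param],
    sagemaker_session=pipeline_session,
)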

We can now register and execute the Pipeline with the following commands.

pipeline.upsert(role_arn=role)
execution = pipeline.start()

After execution, you will notice that a Directed Acyclic Graph (DAG) has been created for your Pipeline under the Pipelines tab in the Studio UI, displaying your workflow.

MME DAG (Screenshot by Author)
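The DAG reflects data dependencies. The two training steps share no inputs, so they can run side by side. The Lambda step consumes both models' `S3ModelArtifacts`, so it has to wait for both. The sketch below shows that ordering with Python's `graphlib` (3.9+), assuming the step names `TrainOne`, `TrainTwo`, and a hypothetical `LambdaStepMME`. It illustrates the scheduling idea, not how the SageMaker service orchestrates steps internally.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies):
    """Group steps into waves; every step in a wave depends only on steps in earlier waves."""
    sorter = TopologicalSorter(dependencies)  # maps step -> set of steps it depends on
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


pipeline_deps = {
    "TrainOne": set(),
    "TrainTwo": set(),
    "LambdaStepMME": {"TrainOne", "TrainTwo"},  # consumes both model artifacts
}
print(execution_waves(pipeline_deps))  # [['TrainOne', 'TrainTwo'], ['LambdaStepMME']]
```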

After a few minutes, you should also see that an endpoint has been created in the SageMaker Console.

Endpoint Created (Screenshot by Author)
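The Lambda returns as soon as `create_endpoint` is accepted, so the endpoint is usually still `Creating` when the Pipeline step finishes. The polling sketch below could run in the notebook before you send traffic. It assumes a client that exposes `describe_endpoint(EndpointName=...)`, such as `boto3.client("sagemaker")`. `wait_for_endpoint` and `FakeSageMaker` are hypothetical. boto3's built-in `endpoint_in_service` waiter is a ready-made alternative.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30, max_polls=60, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint is InService; raise if it fails or times out."""
    for _ in range(max_polls):
        status = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} not InService after {max_polls} polls")


class FakeSageMaker:
    """Test double that reports Creating twice, then InService."""

    def __init__(self):
        self.statuses = ["Creating", "Creating", "InService"]

    def describe_endpoint(self, EndpointName):
        return {"EndpointName": EndpointName, "EndpointStatus": self.statuses.pop(0)}


print(wait_for_endpoint(FakeSageMaker(), "mme-source-demo", sleep=lambda s: None))  # InService
```

Injecting `sleep` keeps the function fast to test. In the notebook you would call it with the real client and the default `time.sleep`.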

We can then test this endpoint with a sample inference to make sure it is working properly.

import boto3
smr = boto3.client('sagemaker-runtime') #client for inference

# Hypothetical placeholder: the endpoint name returned by the Lambda step (also shown in the console)
endpoint_name = "<your-mme-endpoint-name>"

#specify the tarball you are invoking in the TargetModel param
resp = smr.invoke_endpoint(EndpointName=endpoint_name, Body=b'.345,0.224414,.131102,0.042329,.279923,-0.110329,-0.099358,0.0',
                           ContentType='text/csv', TargetModel = 'model-0.tar.gz')
print(resp['Body'].read().decode('utf-8'))
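The `Body` sent to the endpoint is one CSV row of feature values, and the built-in XGBoost container replies with the prediction as text. The sketch below handles both conversions, assuming a plain numeric response body. It accepts comma- or newline-separated values, since the separator may differ across container versions. `to_csv_payload` and `parse_prediction` are hypothetical helpers. In practice the bytes would come from `resp['Body'].read()`.

```python
def to_csv_payload(features):
    """Encode one row of numeric features as a text/csv request body (no label, no header)."""
    return ",".join(str(float(x)) for x in features).encode("utf-8")


def parse_prediction(body_bytes):
    """Decode a text/csv response into floats; accepts comma- or newline-separated values."""
    text = body_bytes.decode("utf-8").replace("\n", ",")
    return [float(value) for value in text.split(",") if value.strip()]


payload = to_csv_payload([0.345, 0.224414, 0.131102])
print(payload)                       # b'0.345,0.224414,0.131102'
print(parse_prediction(b"9.42"))     # [9.42]
print(parse_prediction(b"1.5\n2.5")) # [1.5, 2.5]
```

Sending the same payload with `TargetModel='model-1.tar.gz'` should hit the second model. The first request to each model is typically slower, because the MME loads that model from S3 onto the instance on demand.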


Additional Resources & Conclusion

The code for the entire example can be found at the link above (stay tuned for more Pipelines examples). This example combines an advanced hosting option with MLOps best practices. Using MLOps tools becomes crucial as you scale up your ML experimentation, because they simplify and parameterize your efforts, making it easier for teams to collaborate and track their work. I hope this article was a good overview of using Pipelines for a specific hosting use-case in MME. As always, all feedback is appreciated. Thanks for reading!
