Deploy Amazon SageMaker pipelines using AWS Controllers for Kubernetes


Kubernetes is a popular orchestration platform for managing containers. Its scalability and load-balancing capabilities make it well suited for handling the variable workloads typical of machine learning (ML) applications. DevOps engineers often use Kubernetes to manage and scale ML applications, but before an ML model is available, it must be trained and evaluated and, if the quality of the obtained model is satisfactory, uploaded to a model registry.

Amazon SageMaker provides capabilities to remove the undifferentiated heavy lifting of building and deploying ML models. SageMaker simplifies the process of managing dependencies, container images, auto scaling, and monitoring. Specifically for the model building stage, Amazon SageMaker Pipelines automates the process by managing the infrastructure and resources needed to process data, train models, and run evaluation tests.

A challenge for DevOps engineers is the additional complexity that comes from using Kubernetes to manage the deployment stage while resorting to other tools (such as the AWS SDK or AWS CloudFormation) to manage the model building pipeline. One alternative to simplify this process is to use AWS Controllers for Kubernetes (ACK) to manage and deploy a SageMaker training pipeline. ACK allows you to take advantage of managed model building pipelines without needing to define resources outside of the Kubernetes cluster.

In this post, we introduce an example to help DevOps engineers manage the complete ML lifecycle, including training and inference, with the same toolkit.

Solution overview

We consider a use case in which an ML engineer configures a SageMaker model building pipeline using a Jupyter notebook. This configuration takes the form of a Directed Acyclic Graph (DAG) represented as a JSON pipeline definition. The JSON document can be stored and versioned in an Amazon Simple Storage Service (Amazon S3) bucket. If encryption is required, it can be implemented using an AWS Key Management Service (AWS KMS) managed key for Amazon S3. A DevOps engineer with access to fetch this definition file from Amazon S3 can load the pipeline definition into an ACK service controller for SageMaker, which is running as part of an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. The DevOps engineer can then use the Kubernetes APIs provided by ACK to submit the pipeline definition and initiate one or more pipeline runs in SageMaker. This entire workflow is shown in the following solution diagram.

architecture

Prerequisites

To follow along, you should have the following prerequisites:

  • An EKS cluster where the ML pipeline will be created.
  • A user with access to an AWS Identity and Access Management (IAM) role that has IAM permissions (iam:CreateRole, iam:AttachRolePolicy, and iam:PutRolePolicy) to allow creating roles and attaching policies to roles.
  • The command line tools used in this post (kubectl, Helm, and jq) installed on the local machine or cloud-based development environment used to access the Kubernetes cluster.

Install the SageMaker ACK service controller

The SageMaker ACK service controller makes it straightforward for DevOps engineers to use Kubernetes as their control plane to create and manage ML pipelines. To install the controller in your EKS cluster, complete the following steps:

  1. Configure IAM permissions to make sure the controller has access to the appropriate AWS resources.
  2. Install the controller using a SageMaker Helm Chart to make it available on the client machine.

The following tutorial provides step-by-step instructions with the required commands to install the ACK service controller for SageMaker.
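
The following is a minimal sketch of those two steps. The chart location, chart values, and release version shown here are assumptions, and the IAM setup (an IAM role associated with the controller's service account through IAM Roles for Service Accounts) is only summarized in comments, so follow the tutorial for the authoritative commands.

# Minimal sketch; chart location, values, and version are assumptions -- follow the tutorial for exact commands.
export SERVICE=sagemaker
export AWS_REGION=us-east-1            # Region where the controller will manage SageMaker resources
export ACK_K8S_NAMESPACE=ack-system
export RELEASE_VERSION=1.2.3           # replace with the latest released controller version

# Step 1 (IAM): create an IAM role with SageMaker permissions for the controller and associate it
# with the controller's Kubernetes service account (IRSA), as described in the tutorial.

# Step 2 (Helm): install the controller chart into the cluster.
helm install -n "$ACK_K8S_NAMESPACE" --create-namespace ack-$SERVICE-controller \
  oci://public.ecr.aws/aws-controllers-k8s/$SERVICE-chart \
  --version "$RELEASE_VERSION" \
  --set aws.region="$AWS_REGION"

# Confirm the controller pod is running
kubectl get pods -n "$ACK_K8S_NAMESPACE"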

Generate a pipeline JSON definition

In most companies, ML engineers are responsible for creating the ML pipelines in their organization. They often work with DevOps engineers to operate those pipelines. In SageMaker, ML engineers can use the SageMaker Python SDK to generate a pipeline definition in JSON format. A SageMaker pipeline definition must follow the provided schema, which includes base images, dependencies, steps, and instance types and sizes that are needed to fully define the pipeline. This definition is then retrieved by the DevOps engineer for deploying and maintaining the infrastructure needed for the pipeline.
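
As an illustration, the following sketch shows how an ML engineer could produce such a definition with the SageMaker Python SDK from a Jupyter notebook. The role ARN and bucket name are placeholders, the step mirrors the training step in the sample definition that follows, and depending on your SDK version you may prefer passing step_args instead of an estimator to TrainingStep.

# Illustrative sketch (role ARN and bucket are placeholders); pipeline.definition() returns the JSON definition.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

session = sagemaker.Session()
role = "<<YOUR_SAGEMAKER_ROLE_ARN>>"
bucket = "<<YOUR_BUCKET_NAME>>"

# XGBoost estimator matching the training step in the sample definition
xgb = Estimator(
    image_uri=image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1"),
    role=role,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    volume_size=5,
    max_run=86400,
    output_path=f"s3://{bucket}/sagemaker/",
    sagemaker_session=session,
)
xgb.set_hyperparameters(max_depth=5, gamma=4, eta=0.2, min_child_weight=6,
                        objective="multi:softmax", num_class=10, num_round=10)

step_train = TrainingStep(
    name="AbaloneTrain",
    estimator=xgb,
    inputs={
        "train": TrainingInput(
            s3_data=f"s3://{bucket}/sagemaker/xgboost/train/", content_type="text/libsvm"),
        "validation": TrainingInput(
            s3_data=f"s3://{bucket}/sagemaker/xgboost/validation/", content_type="text/libsvm"),
    },
)

pipeline = Pipeline(name="my-kubernetes-pipeline", steps=[step_train], sagemaker_session=session)

# Save the JSON pipeline definition so it can be versioned in Amazon S3
with open("pipeline-definition.json", "w") as f:
    f.write(pipeline.definition())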

The following is a sample pipeline definition with one training step:

{
  "Version": "2020-12-01",
  "Steps": [
    {
      "Name": "AbaloneTrain",
      "Type": "Training",
      "Arguments": {
        "RoleArn": "<<YOUR_SAGEMAKER_ROLE_ARN>>",
        "HyperParameters": {
          "max_depth": "5",
          "gamma": "4",
          "eta": "0.2",
          "min_child_weight": "6",
          "objective": "multi:softmax",
          "num_class": "10",
          "num_round": "10"
        },
        "AlgorithmSpecification": {
          "TrainingImage": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.7-1",
          "TrainingInputMode": "File"
        },
        "OutputDataConfig": {
          "S3OutputPath": "s3://<<YOUR_BUCKET_NAME>>/sagemaker/"
        },
        "ResourceConfig": {
          "InstanceCount": 1,
          "InstanceType": "ml.m4.xlarge",
          "VolumeSizeInGB": 5
        },
        "StoppingCondition": {
          "MaxRuntimeInSeconds": 86400
        },
        "InputDataConfig": [
          {
            "ChannelName": "train",
            "DataSource": {
              "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://<<YOUR_BUCKET_NAME>>/sagemaker/xgboost/train/",
                "S3DataDistributionType": "FullyReplicated"
              }
            },
            "ContentType": "text/libsvm"
          },
          {
            "ChannelName": "validation",
            "DataSource": {
              "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://<<YOUR_BUCKET_NAME>>/sagemaker/xgboost/validation/",
                "S3DataDistributionType": "FullyReplicated"
              }
            },
            "ContentType": "text/libsvm"
          }
        ]
      }
    }
  ]
}
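
The resulting file can then be stored and versioned in Amazon S3, where the DevOps engineer can fetch it, for example with the AWS CLI (the bucket name and object key below are placeholders):

# ML engineer uploads the definition for versioning
aws s3 cp pipeline-definition.json s3://<<YOUR_BUCKET_NAME>>/pipelines/pipeline-definition.json

# DevOps engineer fetches it before preparing the Kubernetes manifests
aws s3 cp s3://<<YOUR_BUCKET_NAME>>/pipelines/pipeline-definition.json .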

With SageMaker, ML model artifacts and other system artifacts are encrypted in transit and at rest. SageMaker encrypts these by default using the AWS managed key for Amazon S3. You can optionally specify a custom key using the KmsKeyId property of the OutputDataConfig argument. For more information on how SageMaker protects data, see Data Protection in Amazon SageMaker.
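
For example, the training step's OutputDataConfig in the pipeline definition can reference a customer managed key (the key ARN below is a placeholder):

"OutputDataConfig": {
  "S3OutputPath": "s3://<<YOUR_BUCKET_NAME>>/sagemaker/",
  "KmsKeyId": "arn:aws:kms:us-east-1:<<YOUR_ACCOUNT_ID>>:key/<<YOUR_KMS_KEY_ID>>"
}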

Additionally, we recommend restricting access to the pipeline artifacts, such as model outputs and training data, to a specific set of IAM roles created for data scientists and ML engineers. This can be achieved by attaching an appropriate bucket policy. For more information on best practices for securing data in Amazon S3, see Top 10 security best practices for securing data in Amazon S3.
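
As a sketch, a bucket policy like the following denies access to the pipeline prefix for every principal that is not on an allow list of roles. The account ID, role names, and prefix are placeholders, and the allow list should include the SageMaker execution role and any administrative roles so they are not locked out:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyAllExceptAllowListedRoles",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::<<YOUR_BUCKET_NAME>>",
        "arn:aws:s3:::<<YOUR_BUCKET_NAME>>/sagemaker/*"
      ],
      "Condition": {
        "ArnNotLike": {
          "aws:PrincipalArn": [
            "arn:aws:iam::<<YOUR_ACCOUNT_ID>>:role/<<YOUR_SAGEMAKER_ROLE_NAME>>",
            "arn:aws:iam::<<YOUR_ACCOUNT_ID>>:role/<<YOUR_ML_ENGINEER_ROLE_NAME>>",
            "arn:aws:iam::<<YOUR_ACCOUNT_ID>>:role/<<YOUR_ADMIN_ROLE_NAME>>"
          ]
        }
      }
    }
  ]
}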

Create and submit a pipeline YAML specification

In the Kubernetes world, objects are the persistent entities in the Kubernetes cluster used to represent the state of your cluster. When you create an object in Kubernetes, you must provide the object specification that describes its desired state, as well as some basic information about the object (such as a name). Then, using tools such as kubectl, you provide the information in a manifest file in YAML (or JSON) format to communicate with the Kubernetes API.

Refer to the following Kubernetes YAML specification for a SageMaker pipeline. DevOps engineers need to modify the .spec.pipelineDefinition key in the file and add the ML engineer-provided pipeline JSON definition. They then prepare and submit a separate pipeline execution YAML specification to run the pipeline in SageMaker. There are two ways to submit a pipeline YAML specification:

  • Pass the pipeline definition inline as a JSON object to the pipeline YAML specification.
  • Convert the JSON pipeline definition into String format using the command line utility jq. For example, you can use the following command to convert the pipeline definition to a JSON-encoded string:
jq -r tojson <pipeline-definition.json>

In this post, we use the first option and prepare the YAML specification (my-pipeline.yaml) as follows:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: Pipeline
metadata:
  name: my-kubernetes-pipeline
spec:
  parallelismConfiguration:
    maxParallelExecutionSteps: 2
  pipelineName: my-kubernetes-pipeline
  pipelineDefinition: |
    {
      "Version": "2020-12-01",
      "Steps": [
        {
          "Name": "AbaloneTrain",
          "Type": "Training",
          "Arguments": {
            "RoleArn": "<<YOUR_SAGEMAKER_ROLE_ARN>>",
            "HyperParameters": {
              "max_depth": "5",
              "gamma": "4",
              "eta": "0.2",
              "min_child_weight": "6",
              "objective": "multi:softmax",
              "num_class": "10",
              "num_round": "30"
            },
            "AlgorithmSpecification": {
              "TrainingImage": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.7-1",
              "TrainingInputMode": "File"
            },
            "OutputDataConfig": {
              "S3OutputPath": "s3://<<YOUR_S3_BUCKET>>/sagemaker/"
            },
            "ResourceConfig": {
              "InstanceCount": 1,
              "InstanceType": "ml.m4.xlarge",
              "VolumeSizeInGB": 5
            },
            "StoppingCondition": {
              "MaxRuntimeInSeconds": 86400
            },
            "InputDataConfig": [
              {
                "ChannelName": "train",
                "DataSource": {
                  "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://<<YOUR_S3_BUCKET>>/sagemaker/xgboost/train/",
                    "S3DataDistributionType": "FullyReplicated"
                  }
                },
                "ContentType": "text/libsvm"
              },
              {
                "ChannelName": "validation",
                "DataSource": {
                  "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://<<YOUR_S3_BUCKET>>/sagemaker/xgboost/validation/",
                    "S3DataDistributionType": "FullyReplicated"
                  }
                },
                "ContentType": "text/libsvm"
              }
            ]
          }
        }
      ]
    }
  pipelineDisplayName: my-kubernetes-pipeline
  roleARN: <<YOUR_SAGEMAKER_ROLE_ARN>>

Submit the pipeline to SageMaker

To submit your prepared pipeline specification, apply the specification to your Kubernetes cluster as follows:

kubectl apply -f my-pipeline.yaml

Create and submit a pipeline execution YAML specification

Refer to the following Kubernetes YAML specification for a SageMaker pipeline execution. Prepare the pipeline execution YAML specification (pipeline-execution.yaml) as follows:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: PipelineExecution
metadata:
  name: my-kubernetes-pipeline-execution
spec:
  parallelismConfiguration:
    maxParallelExecutionSteps: 2
  pipelineExecutionDescription: "My first pipeline execution via Amazon EKS cluster."
  pipelineName: my-kubernetes-pipeline

To start a run of the pipeline, use the following code:

kubectl apply -f pipeline-execution.yaml

Review and troubleshoot the pipeline run

To list all pipelines created using the ACK controller, use the following command:

kubectl get pipeline

To list all pipeline runs, use the following command:

kubectl get pipelineexecution

To get more details about the pipeline after it is submitted, such as checking its status, errors, or parameters, use the following command:

kubectl describe pipeline my-kubernetes-pipeline

To troubleshoot a pipeline run by reviewing more details about the run, use the following command:

kubectl describe pipelineexecution my-kubernetes-pipeline-execution
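
You can also cross-check the run directly in SageMaker, for example with the AWS CLI (assuming it is configured for the same account and Region as the controller):

aws sagemaker list-pipeline-executions --pipeline-name my-kubernetes-pipeline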

Clean up

Use the following command to delete any pipelines you created:

kubectl delete pipeline my-kubernetes-pipeline

Use the following command to cancel any pipeline runs you started:

kubectl delete pipelineexecution my-kubernetes-pipeline-execution

Conclusion

In this post, we presented an example of how ML engineers familiar with Jupyter notebooks and SageMaker environments can efficiently work with DevOps engineers familiar with Kubernetes and related tools to design and maintain an ML pipeline with the right infrastructure for their organization. This enables DevOps engineers to manage all the steps of the ML lifecycle with the same set of tools and environment they are used to, which helps organizations innovate faster and more efficiently.

Explore the GitHub repository for ACK and the SageMaker controller to start managing your ML operations with Kubernetes.


About the Authors

Pratik Yeole is a Senior Solutions Architect working with global customers, helping customers build value-driven solutions on AWS. He has expertise in the MLOps and containers domains. Outside of work, he enjoys time with friends, family, music, and cricket.

Felipe Lopez is a Senior AI/ML Specialist Solutions Architect at AWS. Prior to joining AWS, Felipe worked with GE Digital and SLB, where he focused on modeling and optimization products for industrial applications.
