Configure and use defaults for Amazon SageMaker resources with the SageMaker Python SDK


The Amazon SageMaker Python SDK is an open-source library for training and deploying machine learning (ML) models on Amazon SageMaker. Enterprise customers in tightly regulated industries such as healthcare and finance set up security guardrails to ensure their data is encrypted and traffic doesn't traverse the internet. To make sure that SageMaker training and deployment of ML models follow these guardrails, it's a common practice to set restrictions at the account or AWS Organizations level through service control policies and AWS Identity and Access Management (IAM) policies to enforce the usage of specific IAM roles, Amazon Virtual Private Cloud (Amazon VPC) configurations, and AWS Key Management Service (AWS KMS) keys. In such cases, data scientists have to provide these parameters to their ML model training and deployment code manually, by noting down subnets, security groups, and KMS keys. This puts the onus on the data scientists to remember to specify these configurations in order to successfully run their jobs and avoid getting Access Denied errors.

Starting with SageMaker Python SDK version 2.148.0, you can now configure default values for parameters such as IAM roles, VPCs, and KMS keys. Administrators and end users can initialize AWS infrastructure primitives with defaults specified in a configuration file in YAML format. Once configured, the Python SDK automatically inherits these values and propagates them to the underlying SageMaker API calls such as CreateProcessingJob(), CreateTrainingJob(), and CreateEndpointConfig(), with no additional actions needed. The SDK also supports multiple configuration files, allowing admins to set a configuration file for all users, which users can override via a user-level configuration that can be stored in Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS) for Amazon SageMaker Studio, or the user's local file system.
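When both an admin-level and a user-level configuration file are present, the SDK merges them with user-level values taking precedence. The following sketch illustrates that precedence with a plain recursive dictionary merge; it is a conceptual illustration only, not the SDK's internal merge code, and the keys and ARNs are placeholder values:

```python
# Conceptual sketch: a user-level config overrides admin-level defaults
# key by key, while admin values absent from the user config survive.
def merge_configs(admin: dict, user: dict) -> dict:
    merged = dict(admin)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # recurse into nested sections such as SageMaker.TrainingJob
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = value
    return merged

admin_config = {"SageMaker": {"TrainingJob": {
    "EnableNetworkIsolation": False,
    "RoleArn": "arn:aws:iam::111122223333:role/AdminDefaultRole"}}}
user_config = {"SageMaker": {"TrainingJob": {"EnableNetworkIsolation": True}}}

effective = merge_configs(admin_config, user_config)
# the user's EnableNetworkIsolation wins; the admin's RoleArn is preserved
print(effective["SageMaker"]["TrainingJob"])
```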

In this post, we show you how to create and store the default configuration file in Studio and use the SDK defaults feature to create your SageMaker resources.

Solution overview

We demonstrate this new feature with an end-to-end AWS CloudFormation template that creates the required infrastructure and creates a Studio domain in the deployed VPC. In addition, we create KMS keys for encrypting the volumes used in training and processing jobs. The steps are as follows:

  1. Launch the CloudFormation stack in your account. Alternatively, if you want to explore this feature on an existing SageMaker domain or notebook, skip this step.
  2. Populate the config.yaml file and save the file in the default location.
  3. Run a sample notebook with an end-to-end ML use case, including data processing, model training, and inference.
  4. Override the default configuration values.

Prerequisites

Before you get started, make sure you have an AWS account and an IAM user or role with administrator privileges. If you are a data scientist currently passing infrastructure parameters to resources in your notebook, you can skip the next step of setting up your environment and start creating the configuration file.

To use this feature, make sure to upgrade your SageMaker SDK version by running pip install --upgrade sagemaker.

Set up the environment

To deploy a complete infrastructure including networking and a Studio domain, complete the following steps:

  1. Clone the GitHub repository.
  2. Log in to your AWS account and open the AWS CloudFormation console.
  3. To deploy the networking resources, choose Create stack.
  4. Upload the template under setup/vpc_mode/01_networking.yaml.
  5. Provide a name for the stack (for example, networking-stack), and complete the remaining steps to create the stack.
  6. To deploy the Studio domain, choose Create stack again.
  7. Upload the template under setup/vpc_mode/02_sagemaker_studio.yaml.
  8. Provide a name for the stack (for example, sagemaker-stack), and provide the name of the networking stack when prompted for the CoreNetworkingStackName parameter.
  9. Continue with the remaining steps, select the acknowledgements for IAM resources, and create the stack.

When the status of both stacks updates to CREATE_COMPLETE, proceed to the next step.

Create the configuration file

To use the default configuration for the SageMaker Python SDK, you create a config.yaml file in the format that the SDK expects. For the format of the config.yaml file, refer to Configuration file structure. Depending on your work environment, such as Studio notebooks, SageMaker notebook instances, or your local IDE, you can either save the configuration file at the default location or override the defaults by passing a config file location. For the default locations for other environments, refer to Configuration file locations. The following steps showcase the setup for a Studio notebook environment.

To easily create the config.yaml file, run the following commands in your Studio system terminal, replacing the placeholders with the CloudFormation stack names from the previous step:

git clone https://github.com/aws-samples/amazon-sagemaker-build-train-deploy.git
cd amazon-sagemaker-build-train-deploy
pip install boto3
python generate-defaults.py --networking-stack <network-stack-name> \
--sagemaker-stack <sagemaker-stack-name>

# save the file to the default location
mkdir -p ~/.config/sagemaker
cp user-configs.yaml ~/.config/sagemaker/config.yaml

This script automatically populates the YAML file, replacing the placeholders with the infrastructure defaults, and saves the file in the home folder. Then it copies the file into the default location for Studio notebooks. The resulting config file should look similar to the following format:

SageMaker:
  Model:
    EnableNetworkIsolation: false
    VpcConfig:
      SecurityGroupIds:
      - sg-xxxx
      Subnets:
      - subnet-xxxx
      - subnet-xxxx
  ProcessingJob:
    NetworkConfig:
      EnableNetworkIsolation: false
      VpcConfig:
        SecurityGroupIds:
        - sg-xxxx
        Subnets:
        - subnet-xxxx
        - subnet-xxxx
    ProcessingOutputConfig:
      KmsKeyId: arn:aws:kms:us-east-2:0123456789:alias/kms-defaults
    RoleArn: arn:aws:iam::0123456789:role/service-role/AmazonSageMakerExecutionRole-xxx
  TrainingJob:
    EnableNetworkIsolation: false
    VpcConfig:
      SecurityGroupIds:
      - sg-xxxx
      Subnets:
      - subnet-xxxx
      - subnet-xxxx
SchemaVersion: '1.0'

If you have an existing domain and networking configuration set up, create the config.yaml file in the required format and save it in the default location for Studio notebooks.

Note that these defaults simply auto-populate the configuration values for the appropriate SageMaker SDK calls, and don't restrict the user to any specific VPC, subnet, or role. As an administrator, if you want your users to use a specific configuration or role, use IAM condition keys to enforce the default values.
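As one illustration of such enforcement, the following hypothetical policy sketch uses the Null condition operator with the sagemaker:VpcSubnets condition key to deny creating training jobs that don't specify a VPC at all; this is a sketch of one common pattern, and administrators should adapt the actions and conditions to their own guardrails:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyTrainingJobsWithoutVpc",
      "Effect": "Deny",
      "Action": ["sagemaker:CreateTrainingJob"],
      "Resource": "*",
      "Condition": {
        "Null": {"sagemaker:VpcSubnets": "true"}
      }
    }
  ]
}
```

With a policy like this attached, a job submitted without the VpcConfig values (for example, because the defaults file was missing) fails fast with an explicit Access Denied rather than running outside the VPC.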

Additionally, each API call can have its own configuration. For example, in the preceding config file sample, you can specify vpc-a and subnet-a for training jobs, and specify vpc-b and subnet-c, subnet-d for processing jobs.
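A per-job-type configuration along these lines (all IDs are placeholders) would route training and processing jobs through different subnets and security groups:

```yaml
SchemaVersion: '1.0'
SageMaker:
  TrainingJob:
    VpcConfig:
      SecurityGroupIds:
      - sg-aaaa
      Subnets:
      - subnet-a
  ProcessingJob:
    NetworkConfig:
      VpcConfig:
        SecurityGroupIds:
        - sg-bbbb
        Subnets:
        - subnet-c
        - subnet-d
```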

Run a sample notebook

Now that you have set the configuration file, you can start running your model building and training notebooks as usual, without the need to explicitly set networking and encryption parameters, for most SDK functions. See Supported APIs and parameters for a complete list of supported API calls and parameters.

In Studio, choose the File Explorer icon in the navigation pane and open 03_feature_engineering/03_feature_engineering.ipynb, as shown in the following screenshot.

studio-file-explorer

Run the notebook cells one by one, and notice that you are not specifying any additional configuration. When you create the processor object, you will see cell outputs like the following example.

configs-applied

As you can see in the output, the default configuration is automatically applied to the processing job, without needing any additional input from the user.

When you run the next cell to run the processor, you can also verify that the defaults are set by viewing the job on the SageMaker console. Choose Processing jobs under Processing in the navigation pane, as shown in the following screenshot.

console-processing-jobs

Choose the processing job with the prefix end-to-end-ml-sm-proc, and you should be able to view the networking and encryption already configured.

console-job-configs

You can continue running the remaining notebooks to train and deploy the model, and you will find that the infrastructure defaults are automatically applied for both training jobs and models.

Override the default configuration file

There could be cases where a user needs to override the default configuration, for example, to experiment with public internet access, or to update the networking configuration if the subnet runs out of IP addresses. In such cases, the Python SDK also allows you to provide a custom location for the configuration file, either on local storage, or you can point to a location in Amazon S3. In this section, we explore an example.

Open the user-configs.yaml file in your home directory and update the EnableNetworkIsolation value to True, under the TrainingJob section.

Now, open the same notebook, and add the following cell to the beginning of the notebook:

import os
os.environ["SAGEMAKER_USER_CONFIG_OVERRIDE"] = "~/config.yaml"

With this cell, you point the SDK to the location of the config file. Now, when you create the processor object, you'll notice that the default config has been overridden to enable network isolation, and the processing job will fail in network isolation mode.

You can use the same override environment variable to set the location of the configuration file if you're using a local environment such as VS Code.
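When setting the override from a script rather than a notebook cell, it can be safer not to rely on a literal ~ being expanded downstream. The following defensive sketch (the path itself is just an example) expands the home directory explicitly before exporting the variable:

```python
import os

# expand "~" ourselves rather than relying on downstream code to do it
config_path = os.path.expanduser("~/config.yaml")
os.environ["SAGEMAKER_USER_CONFIG_OVERRIDE"] = config_path
print(os.environ["SAGEMAKER_USER_CONFIG_OVERRIDE"])
```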

Debug and retrieve defaults

For quick troubleshooting if you run into any errors when running API calls from your notebook, the cell output displays the applied default configurations as shown in the previous section. To view the exact Boto3 call created and the attribute values passed from the default config file, you can debug by turning on Boto3 logging. To turn on logging, run the following cell at the top of the notebook:

import boto3
import logging
boto3.set_stream_logger(name="botocore.endpoint", level=logging.DEBUG)

Any subsequent Boto3 calls will be logged with the complete request, visible under the body section in the log.

You can also view the collection of default configurations using the session.sagemaker_config value, as shown in the following example.

session-config-values
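Indexing deeply nested keys directly raises a KeyError when a section is absent from the config file. A small helper like the following convenience sketch makes such lookups fault-tolerant; the sample dictionary only mimics the shape of session.sagemaker_config with placeholder values:

```python
from functools import reduce

# sample with the same shape as session.sagemaker_config (placeholder values)
config = {
    "SageMaker": {
        "ProcessingJob": {
            "NetworkConfig": {
                "VpcConfig": {"Subnets": ["subnet-aaaa", "subnet-bbbb"]}
            },
            "RoleArn": "arn:aws:iam::111122223333:role/ExampleExecutionRole",
        }
    }
}

def get_default(cfg, dotted_path, default=None):
    """Walk a dot-separated path into the nested config; return default if any key is missing."""
    try:
        return reduce(lambda d, k: d[k], dotted_path.split("."), cfg)
    except (KeyError, TypeError):
        return default

subnets = get_default(config, "SageMaker.ProcessingJob.NetworkConfig.VpcConfig.Subnets", [])
kms_key = get_default(config, "SageMaker.ProcessingJob.ProcessingOutputConfig.KmsKeyId", "not-set")
print(subnets)   # the configured subnet list
print(kms_key)   # falls back to "not-set" because ProcessingOutputConfig is absent here
```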

Finally, if you're using Boto3 to create your SageMaker resources, you can retrieve the default configuration values using the sagemaker_config variable. For example, to run the processing job in 03_feature_engineering.ipynb using Boto3, you can enter the contents of the following cell in the same notebook and run the cell:

import boto3
import sagemaker
session = sagemaker.Session()
client = boto3.client('sagemaker')

# get the default values
subnet_ids = session.sagemaker_config["SageMaker"]["ProcessingJob"]['NetworkConfig']["VpcConfig"]["Subnets"]
security_groups = session.sagemaker_config["SageMaker"]["ProcessingJob"]['NetworkConfig']["VpcConfig"]["SecurityGroupIds"]
kms_key = session.sagemaker_config["SageMaker"]["ProcessingJob"]["ProcessingOutputConfig"]["KmsKeyId"]
role_arn = session.sagemaker_config["SageMaker"]["ProcessingJob"]["RoleArn"]

# upload the processing code to Amazon S3
code_location = session.upload_data('./source_dir/preprocessor.py', 
                              bucket=s3_bucket_name, 
                              key_prefix=f'{s3_key_prefix}/code')
code_location = '/'.join(code_location.split('/')[:-1])

# create a processing job
response = client.create_processing_job(
    ProcessingJobName="end-to-end-ml-sm-proc-boto3",
    ProcessingInputs=[
        {
            'InputName': 'raw_data',
            "S3Input": {
                "S3Uri": raw_data_path,
                "LocalPath": "/opt/ml/processing/input",
                "S3DataType": "S3Prefix",
                "S3InputMode": "File",
            }
        },
        {
            "InputName": "code",
            "S3Input": {
                "S3Uri": code_location,
                "LocalPath": "/opt/ml/processing/input/code",
                "S3DataType": "S3Prefix",
                "S3InputMode": "File",
            }
        }
    ],
    ProcessingOutputConfig={
        'Outputs': [
            {
                'OutputName': 'train_data',
                'S3Output': {
                    'S3Uri': train_data_path,
                    'LocalPath': "/opt/ml/processing/train",
                    'S3UploadMode': 'EndOfJob'
                },
            },
            {
                'OutputName': 'val_data',
                'S3Output': {
                    'S3Uri': val_data_path,
                    'LocalPath': "/opt/ml/processing/val",
                    'S3UploadMode': 'EndOfJob'
                },
            },
            {
                'OutputName': 'test_data',
                'S3Output': {
                    'S3Uri': test_data_path,
                    'LocalPath': "/opt/ml/processing/test",
                    'S3UploadMode': 'EndOfJob'
                },
            },
            {
                'OutputName': 'model',
                'S3Output': {
                    'S3Uri': model_path,
                    'LocalPath': "/opt/ml/processing/model",
                    'S3UploadMode': 'EndOfJob'
                },
            },
        ],
        'KmsKeyId': kms_key
    },
    ProcessingResources={
        'ClusterConfig': {
            'InstanceCount': 1,
            'InstanceType': 'ml.m5.large',
            'VolumeSizeInGB': 30,
        }
    },
    AppSpecification={
        "ImageUri": "257758044811.dkr.ecr.us-east-2.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3",
        "ContainerArguments": [
            "--train-test-split-ratio",
            "0.2"
        ],
        "ContainerEntrypoint": [
            "python3",
            "/opt/ml/processing/input/code/preprocessor.py"
        ]
    },
    NetworkConfig={
        'EnableNetworkIsolation': False,
        'VpcConfig': {
            'SecurityGroupIds': security_groups,
            'Subnets': subnet_ids
        }
    },
    RoleArn=role_arn,
)

Automate config file creation

For administrators, having to create the config file and save it to each SageMaker notebook instance or Studio user profile can be a daunting task. Although you can recommend that users use a common file stored in a default S3 location, it puts the additional overhead of specifying the override on the data scientists.

To automate this, administrators can use SageMaker Lifecycle Configurations (LCC). For Studio user profiles or notebook instances, you can attach the following sample LCC script as a default LCC for the user's default Jupyter Server app:

#!/bin/bash
# sample LCC script to save the default config file in the user's folder

set -eux

# make sure the target directory exists
mkdir -p ~/.config/sagemaker

# add --endpoint-url [S3 Interface Endpoint] if using an S3 interface endpoint
aws s3 cp <s3-location-of-config-file> ~/.config/sagemaker/config.yaml
echo "config file saved in the default location"

See Use Lifecycle Configurations for Amazon SageMaker Studio or Customize a Notebook Instance for instructions on creating and setting a default lifecycle script.

Clean up

When you're done experimenting with this feature, clean up your resources to avoid paying additional costs. If you have provisioned new resources as specified in this post, complete the following steps to clean up your resources:

  1. Shut down your Studio apps for the user profile. See Shut Down and Update SageMaker Studio and Studio Apps for instructions. Make sure that all apps are deleted before deleting the stack.
  2. Delete the EFS volume created for the Studio domain. You can view the EFS volume attached to the domain by using a DescribeDomain API call.
  3. Delete the Studio domain stack.
  4. Delete the security groups created for the Studio domain. You can find them on the Amazon Elastic Compute Cloud (Amazon EC2) console, with the names security-group-for-inbound-nfs-d-xxx and security-group-for-outbound-nfs-d-xxx.
  5. Delete the networking stack.

Conclusion

In this post, we discussed configuring and using default values for key infrastructure parameters using the SageMaker Python SDK. This allows administrators to set default configurations for data scientists, thereby saving time for users and admins, eliminating the burden of repetitively specifying parameters, and resulting in leaner and more manageable code. For the full list of supported parameters and APIs, see Configuring and using defaults with the SageMaker Python SDK. For any questions and discussions, join the Machine Learning & AI community.


About the Authors

Giuseppe Angelo Porcelli is a Principal Machine Learning Specialist Solutions Architect for Amazon Web Services. With several years of software engineering and an ML background, he works with customers of any size to deeply understand their business and technical needs and design AI and Machine Learning solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. He has worked on projects in different domains, including MLOps, Computer Vision, and NLP, involving a broad set of AWS services. In his free time, Giuseppe enjoys playing football.

Bruno Pistone is an AI/ML Specialist Solutions Architect for AWS based in Milan. He works with customers of any size, helping them to deeply understand their technical needs and design AI and Machine Learning solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. His fields of expertise are Machine Learning end to end, Machine Learning Industrialization, and MLOps. He enjoys spending time with his friends and exploring new places, as well as traveling to new destinations.

Durga Sury is an ML Solutions Architect on the Amazon SageMaker Service SA team. She is passionate about making machine learning accessible to everyone. In her 4 years at AWS, she has helped set up AI/ML platforms for enterprise customers. When she isn't working, she loves bike rides, mystery novels, and long walks with her 5-year-old husky.
