Configure cross-account access of Amazon Redshift clusters in Amazon SageMaker Studio using VPC peering
With cloud computing, as compute power and data became more available, machine learning (ML) is now making an impact across every industry and is a core part of every business.
Amazon SageMaker Studio is the first fully integrated ML development environment (IDE) with a web-based visual interface. You can perform all ML development steps and have full access, control, and visibility into each step required to build, train, and deploy models.
Amazon Redshift is a fully managed, fast, secure, and scalable cloud data warehouse. Organizations often want to use SageMaker Studio to get predictions from data stored in a data warehouse such as Amazon Redshift.
As described in the AWS Well-Architected Framework, separating workloads across accounts enables your organization to set common guardrails while isolating environments. This can be particularly useful for certain security requirements, as well as to simplify cost controls and monitoring between projects and teams. Organizations with a multi-account architecture typically have Amazon Redshift and SageMaker Studio in two separate AWS accounts. Also, as a best practice, Amazon Redshift and SageMaker Studio are typically configured in VPCs with private subnets to improve security and reduce the risk of unauthorized access.
Amazon Redshift natively supports cross-account data sharing when RA3 node types are used. If you're using any other Amazon Redshift node types, such as DS2 or DC2, you can use VPC peering to establish a cross-account connection between Amazon Redshift and SageMaker Studio.
In this post, we walk through step-by-step instructions to establish a cross-account connection to any Amazon Redshift node type (RA3, DC2, DS2) by connecting the Amazon Redshift cluster located in one AWS account to SageMaker Studio in another AWS account in the same Region using VPC peering.
Solution overview
We start with two AWS accounts: a producer account with the Amazon Redshift data warehouse, and a consumer account for Amazon SageMaker ML use cases that has SageMaker Studio set up. The following is a high-level overview of the workflow:
- Set up SageMaker Studio with VPCOnly mode in the consumer account. This prevents SageMaker from providing internet access to your Studio notebooks. All SageMaker Studio traffic goes through the specified VPC and subnets.
- Update your SageMaker Studio domain to turn on SourceIdentity to propagate the user profile name.
- Create an AWS Identity and Access Management (IAM) role in the Amazon Redshift producer account that the SageMaker Studio IAM role will assume to access Amazon Redshift.
- Update the SageMaker IAM execution role in the SageMaker Studio consumer account that SageMaker Studio will use to assume the role in the producer Amazon Redshift account.
- Set up a peering connection between VPCs in the Amazon Redshift producer account and SageMaker Studio consumer account.
- Query Amazon Redshift in SageMaker Studio in the consumer account.
The following diagram illustrates our solution architecture.
Prerequisites
The steps in this post assume that Amazon Redshift is launched in a private subnet in the Amazon Redshift producer account. Launching Amazon Redshift in a private subnet provides an additional layer of security and isolation compared to launching it in a public subnet, because the private subnet is not directly accessible from the internet and is more secure from external attacks.
To download public libraries, you must create a VPC and a private and public subnet in the SageMaker consumer account. Then launch a NAT gateway in the public subnet and add an internet gateway for SageMaker Studio in the private subnet to access the internet. For instructions on how to establish a connection to a private subnet, refer to How do I set up a NAT gateway for a private subnet in Amazon VPC?
Set up SageMaker Studio with VPCOnly mode in the consumer account
To create SageMaker Studio with VPCOnly mode, complete the following steps:
- On the SageMaker console, choose Studio in the navigation pane.
- Launch SageMaker Studio, choose Standard setup, and choose Configure.
If you're already using AWS IAM Identity Center (successor to AWS Single Sign-On) for accessing your AWS accounts, you can use it for authentication. Otherwise, you can use IAM for authentication and use your existing federated roles.
- In the General settings section, select Create a new role.
- In the Create an IAM role section, optionally specify your Amazon Simple Storage Service (Amazon S3) buckets by selecting Any, Specific, or None, then choose Create role.
This creates a SageMaker execution role, such as AmazonSageMaker-ExecutionRole-00000000.
- Under Network and Storage Section, choose your VPC, subnet (private subnet), and security group that you created as a prerequisite.
- Select VPC Only, then choose Next.
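The console steps above can also be scripted. The following is a minimal sketch, assuming you prefer to create the domain with the AWS SDK for Python (boto3) rather than the console; the function names and all argument values are hypothetical placeholders, and IAM authentication is assumed.

```python
def build_create_domain_request(domain_name, vpc_id, private_subnet_ids,
                                security_group_ids, execution_role_arn):
    """Assemble keyword arguments for the SageMaker CreateDomain API.

    AppNetworkAccessType="VpcOnly" corresponds to the "VPC Only" console
    option: Studio traffic stays inside the given VPC and subnets.
    """
    return {
        "DomainName": domain_name,
        "AuthMode": "IAM",  # assumption: use "SSO" for IAM Identity Center
        "VpcId": vpc_id,
        "SubnetIds": list(private_subnet_ids),
        "AppNetworkAccessType": "VpcOnly",
        "DefaultUserSettings": {
            "ExecutionRole": execution_role_arn,
            "SecurityGroups": list(security_group_ids),
        },
    }


def create_domain(request, region_name):
    """Send the request; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the builder above has no SDK dependency

    sagemaker = boto3.client("sagemaker", region_name=region_name)
    return sagemaker.create_domain(**request)["DomainArn"]
```

Keeping the request builder separate from the API call makes it easy to review the exact settings before anything is created in the account.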
Update your SageMaker Studio domain to turn on SourceIdentity to propagate the user profile name
SageMaker Studio is integrated with AWS CloudTrail to enable administrators to monitor and audit user activity and API calls from SageMaker Studio notebooks. You can configure SageMaker Studio to record the user identity (specifically, the user profile name) in CloudTrail events.
To log specific user activity among several user profiles, we recommend that you turn on SourceIdentity so that the SageMaker Studio domain propagates the user profile name. This allows you to persist the user information into the session so you can attribute actions to a specific user. This attribute is also persisted when you chain roles, so you can get fine-grained visibility into users' actions in the producer account. As of the time this post was written, you can only configure this using the AWS Command Line Interface (AWS CLI) or any command line tool.
To update this configuration, all apps in the domain must be in the Stopped or Deleted state.
Use the following code to enable the propagation of the user profile name as the SourceIdentity:
This requires that you add sts:SetSourceIdentity in the trust relationship for your execution role.
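The domain update and the trust relationship change described above could be scripted as follows. This is a sketch using boto3 rather than the AWS CLI command the post refers to, which did not survive in this copy; the function names, domain ID, and role name are hypothetical placeholders.

```python
import json


def build_source_identity_update(domain_id):
    """Arguments for SageMaker UpdateDomain that turn on SourceIdentity."""
    return {
        "DomainId": domain_id,
        "DomainSettingsForUpdate": {
            "ExecutionRoleIdentityConfig": "USER_PROFILE_NAME",
        },
    }


def build_execution_role_trust_policy():
    """Trust policy letting SageMaker assume the role and set SourceIdentity."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": ["sts:AssumeRole", "sts:SetSourceIdentity"],
            }
        ],
    }


def enable_source_identity(domain_id, execution_role_name, region_name):
    """Apply both changes; all apps in the domain must be stopped or deleted."""
    import boto3  # imported lazily; only needed when the changes are applied

    # Note: this call replaces the role's entire existing trust policy.
    boto3.client("iam").update_assume_role_policy(
        RoleName=execution_role_name,
        PolicyDocument=json.dumps(build_execution_role_trust_policy()),
    )
    boto3.client("sagemaker", region_name=region_name).update_domain(
        **build_source_identity_update(domain_id)
    )
```

The trust policy is updated first so that the execution role already permits sts:SetSourceIdentity when the domain begins propagating the user profile name.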
Create an IAM role in the Amazon Redshift producer account that SageMaker Studio must assume to access Amazon Redshift
To create a role that SageMaker will assume to access Amazon Redshift, complete the following steps:
- Open the IAM console in the Amazon Redshift producer account.
- Choose Roles in the navigation pane, then choose Create role.
- On the Select trusted entity page, select Custom trust policy.
- Enter the following custom trust policy into the editor and provide your SageMaker consumer account ID and the SageMaker execution role that you created:
- Choose Next.
- On the Add required permissions page, choose Create policy.
- Add the following sample policy and make necessary edits based on your configuration.
- Save the policy by adding a name, such as RedshiftROAPIUserAccess.
The SourceIdentity attribute is used to tie the identity of the original SageMaker Studio user to the Amazon Redshift database user. The actions by the user in the producer account can then be monitored using CloudTrail and Amazon Redshift database audit logs.
- On the Name, review, and create page, enter a role name, review the settings, and choose Create role.
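The two policy documents used in the steps above might look like the following. This is an illustrative sketch, not the post's original policies, which did not survive in this copy: the function names are hypothetical, the service-role/ path in the principal ARN is an assumption about how the console names the execution role, and the permissions are one plausible way to tie the database user to SourceIdentity.

```python
import json


def build_producer_trust_policy(consumer_account_id, execution_role_name):
    """Trust policy for the producer role: only the Studio execution role
    in the consumer account may assume it and pass SourceIdentity along."""
    principal_arn = (
        f"arn:aws:iam::{consumer_account_id}:role/service-role/{execution_role_name}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["sts:AssumeRole", "sts:SetSourceIdentity"],
            }
        ],
    }


def build_redshift_access_policy(region, producer_account_id, cluster_id, database):
    """Permissions policy: temporary credentials may be requested only for a
    database user whose name equals the caller's SourceIdentity."""
    prefix = f"arn:aws:redshift:{region}:{producer_account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "GetRedshiftCredentials",
                "Effect": "Allow",
                "Action": [
                    "redshift:GetClusterCredentials",
                    "redshift:CreateClusterUser",
                ],
                "Resource": [
                    f"{prefix}:dbname:{cluster_id}/{database}",
                    f"{prefix}:dbuser:{cluster_id}/*",
                ],
                "Condition": {
                    "StringEquals": {"redshift:DbUser": "${aws:SourceIdentity}"}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; paste the printed JSON into the IAM console editor.
    print(json.dumps(build_producer_trust_policy(
        "111122223333", "AmazonSageMaker-ExecutionRole-00000000"), indent=2))
    print(json.dumps(build_redshift_access_policy(
        "us-east-1", "444455556666", "redshift-cluster-1", "dev"), indent=2))
```

The condition on redshift:DbUser is what links the Studio user profile name to the Amazon Redshift database user, so each user's activity can be traced in the producer account.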
Update the IAM role in the SageMaker consumer account that SageMaker Studio assumes in the Amazon Redshift producer account
To update the SageMaker execution role so that it can assume the role that we just created, complete the following steps:
- Open the IAM console in the SageMaker consumer account.
- Choose Roles in the navigation pane, then choose the SageMaker execution role that we created (AmazonSageMaker-ExecutionRole-*).
- In the Permissions policy section, on the Add permissions menu, choose Create inline policy.
- In the editor, on the JSON tab, enter the following policy, where <StudioRedshiftRoleARN> is the ARN of the role you created in the Amazon Redshift producer account:
You can get the ARN of the role created in the Amazon Redshift producer account on the IAM console, as shown in the following screenshot.
- Choose Review policy.
- For Name, enter a name for your policy.
- Choose Create policy.
Your permission policies should look similar to the following screenshot.
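The inline policy from the steps above could be generated and attached as shown here. This is a minimal sketch under the assumption that the execution role needs only sts:AssumeRole and sts:SetSourceIdentity on the producer role; the function names and the policy name are hypothetical.

```python
import json


def build_assume_role_inline_policy(studio_redshift_role_arn):
    """Inline policy that lets the Studio execution role assume the producer
    role and carry the SourceIdentity into the chained session."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["sts:AssumeRole", "sts:SetSourceIdentity"],
                "Resource": studio_redshift_role_arn,
            }
        ],
    }


def attach_inline_policy(execution_role_name, studio_redshift_role_arn,
                         policy_name="AssumeStudioRedshiftRole"):
    """Attach the policy to the execution role; needs boto3 and credentials."""
    import boto3  # imported lazily; only needed when the policy is attached

    boto3.client("iam").put_role_policy(
        RoleName=execution_role_name,
        PolicyName=policy_name,  # hypothetical name, choose your own
        PolicyDocument=json.dumps(
            build_assume_role_inline_policy(studio_redshift_role_arn)
        ),
    )
```

Scoping the Resource to a single role ARN, rather than a wildcard, keeps the execution role from assuming any other role in the producer account.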
Set up a peering connection between the VPCs in the Amazon Redshift producer account and SageMaker Studio consumer account
To establish communication between the SageMaker Studio VPC and Amazon Redshift VPC, the two VPCs must be peered using VPC peering. Complete the following steps to establish a connection:
- In either the Amazon Redshift or SageMaker account, open the Amazon VPC console.
- In the navigation pane, choose Peering connections, then choose Create peering connection.
- For Name, enter a name for your connection.
- Under Select a local VPC to peer with, choose a local VPC.
- Under Select another VPC to peer with, specify another VPC in the same Region and another account.
- Choose Create peering connection.
- Review the VPC peering connection and choose Accept request to activate.
After the VPC peering connection is successfully established, you create routes on both the SageMaker and Amazon Redshift VPCs to complete connectivity between them.
- In the SageMaker account, open the Amazon VPC console.
- Choose Route tables in the navigation pane, then choose the VPC that is associated with SageMaker and edit the routes.
- Add the CIDR of the Amazon Redshift VPC as the destination and the peering connection as the target.
- Additionally, add a NAT gateway.
- Choose Save changes.
- In the Amazon Redshift account, open the Amazon VPC console.
- Choose Route tables in the navigation pane, then choose the VPC that is associated with Amazon Redshift and edit the routes.
- Add the CIDR of the SageMaker VPC as the destination and the peering connection as the target.
- Additionally, add an internet gateway.
- Choose Save changes.
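The peering and routing steps above can be sketched with the EC2 API. The example assumes two boto3 EC2 clients, one per account, are passed in; the function and parameter names are hypothetical. It also checks up front that the two VPC CIDR blocks do not overlap, because VPC peering does not support overlapping ranges.

```python
import ipaddress


def cidrs_overlap(cidr_a, cidr_b):
    """Return True if the two CIDR blocks share any address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


def peer_and_route(requester_ec2, accepter_ec2, *, requester_vpc_id,
                   requester_cidr, requester_route_table_id, accepter_vpc_id,
                   accepter_cidr, accepter_route_table_id, accepter_account_id):
    """Create and accept a same-Region, cross-account peering connection,
    then add a route to the peer's CIDR in each VPC's route table.

    requester_ec2 and accepter_ec2 are EC2 clients for the two accounts,
    for example boto3.client("ec2") created from each account's session.
    """
    if cidrs_overlap(requester_cidr, accepter_cidr):
        raise ValueError("VPC CIDR blocks overlap; peering is not possible")

    response = requester_ec2.create_vpc_peering_connection(
        VpcId=requester_vpc_id,
        PeerVpcId=accepter_vpc_id,
        PeerOwnerId=accepter_account_id,
    )
    peering_id = response["VpcPeeringConnection"]["VpcPeeringConnectionId"]

    # The owner of the peer VPC has to accept the request to activate it.
    accepter_ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=peering_id)

    # Each side routes traffic for the other VPC's CIDR to the peering link.
    requester_ec2.create_route(
        RouteTableId=requester_route_table_id,
        DestinationCidrBlock=accepter_cidr,
        VpcPeeringConnectionId=peering_id,
    )
    accepter_ec2.create_route(
        RouteTableId=accepter_route_table_id,
        DestinationCidrBlock=requester_cidr,
        VpcPeeringConnectionId=peering_id,
    )
    return peering_id
```

In practice you may need to wait briefly between creating and accepting the request, and the security group on the Amazon Redshift cluster must also allow inbound traffic from the SageMaker VPC CIDR on the cluster port.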
You can connect to SageMaker Studio from your VPC through an interface endpoint in your VPC instead of connecting over the internet. When you use a VPC interface endpoint, communication between your VPC and the SageMaker API or runtime is conducted entirely and securely within the AWS network.
- To create a VPC endpoint, in the SageMaker account, open the VPC console.
- Choose Endpoints in the navigation pane, then choose Create endpoint.
- Specify the SageMaker VPC, the respective subnets, and appropriate security groups to allow inbound and outbound NFS traffic for your SageMaker notebooks domain, and choose Create VPC endpoint.
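The endpoint creation above could be expressed with the EC2 API as well. This sketch assumes you want interface endpoints for the SageMaker API and runtime services; the function names and the default service list are assumptions, and you may need endpoints for other services depending on your setup.

```python
def build_interface_endpoint_requests(region, vpc_id, subnet_ids,
                                      security_group_ids,
                                      services=("sagemaker.api",
                                                "sagemaker.runtime")):
    """Build one EC2 CreateVpcEndpoint request per service name suffix."""
    return [
        {
            "VpcEndpointType": "Interface",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{service}",
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
            # Resolve the public service hostname to the endpoint's private IPs.
            "PrivateDnsEnabled": True,
        }
        for service in services
    ]


def create_interface_endpoints(requests, region_name):
    """Create the endpoints and return their IDs; needs boto3 and credentials."""
    import boto3  # imported lazily; only needed when endpoints are created

    ec2 = boto3.client("ec2", region_name=region_name)
    return [
        ec2.create_vpc_endpoint(**request)["VpcEndpoint"]["VpcEndpointId"]
        for request in requests
    ]
```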
Query Amazon Redshift in SageMaker Studio in the consumer account
After all the networking has been successfully established, follow the steps in this section to connect to the Amazon Redshift cluster in the SageMaker Studio consumer account using the AWS SDK for pandas library:
- In SageMaker Studio, create a new notebook.
- If the AWS SDK for pandas package is not installed, you can install it using the following:
This installation is not persistent and will be lost if the KernelGateway App is deleted. Custom packages can be added as part of a Lifecycle Configuration.
- Enter the following code in the first cell and run the code. Replace RoleArn and region_name values based on your account settings:
- Enter the following code in a new cell and run the code to get the current SageMaker user profile name:
- Enter the following code in a new cell and run the code:
To successfully query Amazon Redshift, your database administrator must grant the newly created user the required read permissions within the Amazon Redshift cluster in the producer account.
- Enter the following code in a new cell, update the query to match your Amazon Redshift table, and run the cell. This should return the records successfully for further data processing and analysis.
Now you can start building your data transformations and analysis based on your business requirements.
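The notebook cells described in the steps above might look like the following. This is a sketch, not the post's original code, which did not survive in this copy: it assumes the AWS SDK for pandas is installed as the awswrangler package (for example with pip install awswrangler), and the function names, role ARN, cluster identifier, database, and table name are hypothetical placeholders.

```python
import json

# Studio writes app metadata, including the user profile name, to this file.
METADATA_PATH = "/opt/ml/metadata/resource-metadata.json"


def get_userprofile_name(metadata_path=METADATA_PATH):
    """Read the current SageMaker Studio user profile name."""
    with open(metadata_path, "r") as handle:
        return json.load(handle).get("UserProfileName")


def assume_producer_role(role_arn, region_name, session_name="RedshiftSession"):
    """Assume the producer-account role and return a boto3 session that
    uses the temporary credentials returned by AWS STS."""
    import boto3  # imported lazily so the helper above runs without the SDK

    credentials = boto3.client("sts").assume_role(
        RoleArn=role_arn, RoleSessionName=session_name
    )["Credentials"]
    return boto3.Session(
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretAccessKey"],
        aws_session_token=credentials["SessionToken"],
        region_name=region_name,
    )


def query_redshift(sql, session, cluster_identifier, database, db_user):
    """Run a query with temporary database credentials for db_user.

    auto_create=True creates the database user if it does not exist yet.
    """
    import awswrangler as wr  # AWS SDK for pandas

    connection = wr.redshift.connect_temp(
        cluster_identifier=cluster_identifier,
        user=db_user,
        database=database,
        auto_create=True,
        boto3_session=session,
    )
    try:
        return wr.redshift.read_sql_query(sql, con=connection)
    finally:
        connection.close()


# Example notebook usage (placeholder values):
# session = assume_producer_role(
#     "arn:aws:iam::<REDSHIFT-ACCOUNT-ID>:role/<StudioRedshiftRole>", "<region>")
# df = query_redshift("SELECT * FROM users LIMIT 10", session,
#                     "<cluster-identifier>", "<database>", get_userprofile_name())
```

Using the user profile name as the database user keeps the Amazon Redshift user aligned with the SourceIdentity recorded in CloudTrail.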
Clean up
To avoid incurring recurring costs, clean up your resources: delete the SageMaker VPC endpoints, Amazon Redshift cluster, and SageMaker Studio apps, users, and domain. Also delete any S3 buckets and objects you created.
Conclusion
In this post, we showed how to establish a cross-account connection between private Amazon Redshift and SageMaker Studio VPCs in different accounts using VPC peering, and how to access Amazon Redshift data in SageMaker Studio using IAM role chaining while also logging the user identity when the user accesses Amazon Redshift from SageMaker Studio. With this solution, you eliminate the need to manually move data between accounts to access it. We also walked through how to access the Amazon Redshift cluster using the AWS SDK for pandas library in SageMaker Studio and prepare the data for your ML use cases.
To learn more about Amazon Redshift and SageMaker, refer to the Amazon Redshift Database Developer Guide and Amazon SageMaker Documentation.
About the Authors
Supriya Puragundla is a Senior Solutions Architect at AWS. She helps key customer accounts on their AI and ML journey. She is passionate about data-driven AI and the area of depth in machine learning.
Marc Karp is a Machine Learning Architect with the Amazon SageMaker team. He focuses on helping customers design, deploy, and manage ML workloads at scale. In his spare time, he enjoys traveling and exploring new places.