Integrate HyperPod clusters with Active Directory for seamless multi-user login
Amazon SageMaker HyperPod is purpose-built to accelerate foundation model (FM) training, removing the undifferentiated heavy lifting involved in managing and optimizing a large training compute cluster. With SageMaker HyperPod, you can train FMs for weeks and months without disruption.
Typically, HyperPod clusters are used by multiple users: machine learning (ML) researchers, software engineers, data scientists, and cluster administrators. They edit their own files, run their own jobs, and want to avoid impacting each other's work. To achieve this multi-user environment, you can take advantage of the Linux user and group mechanism and statically create multiple users on each instance through lifecycle scripts. The drawback to this approach, however, is that user and group settings are duplicated across multiple instances in the cluster. This makes it difficult to configure them consistently on all instances, for example when a new team member joins.
To solve this pain point, we can use Lightweight Directory Access Protocol (LDAP) and LDAP over TLS/SSL (LDAPS) to integrate with a directory service such as AWS Directory Service for Microsoft Active Directory. With the directory service, you can centrally maintain users, groups, and their permissions.
In this post, we introduce a solution to integrate HyperPod clusters with AWS Managed Microsoft AD. We also explain how to achieve a seamless multi-user login environment with a centrally maintained directory.
Solution overview
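The solution uses the following AWS services and resources:
- Amazon SageMaker HyperPod to run the training cluster
- AWS Managed Microsoft AD (AWS Directory Service) to centrally maintain users and groups
- Elastic Load Balancing to create a Network Load Balancer (NLB) in front of the directory
- AWS Certificate Manager (ACM) to hold the certificate used by the NLB
- Amazon EC2 to run a Windows instance for administering users and groups
- AWS Systems Manager to connect to the cluster instances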
We also use AWS CloudFormation to deploy a stack that creates the prerequisites for the HyperPod cluster: VPC, subnets, security group, and Amazon FSx for Lustre volume.
The following diagram illustrates the high-level solution architecture.
In this solution, HyperPod cluster instances use the LDAPS protocol to connect to AWS Managed Microsoft AD through an NLB. The NLB performs TLS termination with a certificate installed on it. It accepts LDAPS connections on port 636 and forwards the traffic as LDAP to the directory on port 389. To configure LDAPS on HyperPod cluster instances, the lifecycle script installs and configures System Security Services Daemon (SSSD), an open source client software for LDAP/LDAPS.
Prerequisites
This post assumes you already know how to create a basic HyperPod cluster without SSSD. For more details on how to create HyperPod clusters, refer to Getting started with SageMaker HyperPod and the HyperPod workshop.
In the setup steps, you will also use a Linux machine to generate a self-signed certificate and obtain an obfuscated password for the AD reader user. If you don't have a Linux machine, you can create an EC2 Linux instance or use AWS CloudShell.
Create a VPC, subnets, and a security group
Follow the instructions in the Own Account section of the HyperPod workshop. You will deploy a CloudFormation stack and create prerequisite resources such as the VPC, subnets, security group, and FSx for Lustre volume. You must create both a primary subnet and a backup subnet when deploying the CloudFormation stack, because AWS Managed Microsoft AD requires at least two subnets in different Availability Zones.
For simplicity, this post uses the same VPC, subnets, and security group for both the HyperPod cluster and the directory service. If you need separate networks for the cluster and the directory service, make sure the security groups and route tables are configured so that the two can communicate with each other.
Create AWS Managed Microsoft AD on Directory Service
Complete the following steps to set up your directory:
- On the Directory Service console, choose Directories in the navigation pane.
- Choose Set up directory.
- For Directory type, select AWS Managed Microsoft AD.
- Choose Next.
- For Edition, select Standard Edition.
- For Directory DNS name, enter your preferred directory DNS name (for example, hyperpod.abc123.com).
- For Admin password, set a password and save it for later use.
- Choose Next.
- In the Networking section, specify the VPC and the two private subnets you created.
- Choose Next.
- Review the configuration and pricing, then choose Create directory.
The directory creation starts. Wait until the status changes from Creating to Active, which can take 20–30 minutes.
- When the status changes to Active, open the detail page of the directory and take note of the DNS addresses for later use.
Create an NLB in front of Directory Service
To create the NLB, complete the following steps:
- On the Amazon EC2 console, choose Target groups in the navigation pane.
- Choose Create target group.
- Create a target group with the following parameters:
- For Choose a target type, select IP addresses.
- For Target group name, enter LDAP.
- For Protocol : Port, choose TCP and enter 389.
- For IP address type, select IPv4.
- For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
- For Health check protocol, choose TCP.
- Choose Next.
- In the Register targets section, register the directory service's DNS addresses as the targets.
- For Ports, choose Include as pending below. The addresses are added in the Review targets section with Pending status.
- Choose Create target group.
- On the Load Balancers console, choose Create load balancer.
- Under Network Load Balancer, choose Create.
- Configure an NLB with the following parameters:
- For Load balancer name, enter a name (for example, nlb-ds).
- For Scheme, select Internal.
- For IP address type, select IPv4.
- For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
- Under Mappings, select the two private subnets and their CIDR ranges (which you created with the CloudFormation template).
- For Security groups, choose CfStackName-SecurityGroup-XYZXYZ (which you created with the CloudFormation template).
- In the Listeners and routing section, specify the following parameters:
- For Protocol, choose TCP.
- For Port, enter 389.
- For Default action, choose the target group named LDAP.
This step adds a listener for LDAP. We add the LDAPS listener later.
- Choose Create load balancer. Wait until the status changes from Provisioning to Active, which can take 3–5 minutes.
- When the status changes to Active, open the detail page of the provisioned NLB and take note of the DNS name (xyzxyz.elb.region-name.amazonaws.com) for later use.
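The LDAP target group uses the directory's DNS addresses as IP targets, so a copy-paste slip in those addresses only surfaces later as failing health checks. The following minimal Python sketch checks the addresses before you register them. It assumes the domain controllers sit inside the HyperPod VPC. VPC_CIDR, DIRECTORY_DNS_ADDRESSES, and validate_ldap_targets are hypothetical names with placeholder values, not part of any AWS API.

```python
import ipaddress

# Hypothetical placeholder values: use your VPC CIDR from the CloudFormation stack
# and the DNS addresses shown on the directory's detail page.
VPC_CIDR = "10.1.0.0/16"
DIRECTORY_DNS_ADDRESSES = ["10.1.12.34", "10.1.56.78"]


def validate_ldap_targets(addresses, vpc_cidr, port=389):
    """Return (ip, port) pairs for the LDAP target group, or raise ValueError.

    Assumes the domain controllers live inside the HyperPod VPC, so each target
    must be an IPv4 address within the VPC CIDR (the target group uses IPv4).
    """
    network = ipaddress.ip_network(vpc_cidr)
    targets = []
    for raw in addresses:
        ip = ipaddress.ip_address(raw.strip())
        if ip.version != 4:
            raise ValueError(f"{raw} is not an IPv4 address")
        if ip not in network:
            raise ValueError(f"{raw} is outside the VPC CIDR {vpc_cidr}")
        targets.append((str(ip), port))
    # AWS Managed Microsoft AD starts with two domain controllers, one per subnet.
    if len(set(targets)) < 2:
        raise ValueError("expected at least two distinct domain controller addresses")
    return targets
```

Calling validate_ldap_targets(DIRECTORY_DNS_ADDRESSES, VPC_CIDR) returns the (address, 389) pairs to enter in the Register targets section. The function raises ValueError as soon as an address falls outside the VPC.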
Create a self-signed certificate and import it to Certificate Manager
To create a self-signed certificate and import it, complete the following steps:
- In your Linux-based environment (local laptop, EC2 Linux instance, or CloudShell), use OpenSSL to create a self-signed certificate (ldaps.crt) and a private key (ldaps.key).
- On the Certificate Manager console, choose Import.
- For the certificate body and the certificate private key, enter the contents of ldaps.crt and ldaps.key, respectively.
- Choose Next.
- Add any optional tags, then choose Next.
- Review the configuration and choose Import.
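The OpenSSL step can be scripted. The following Python sketch is one way to do it, not necessarily the exact commands the original walkthrough used. It shells out to the openssl CLI. It assumes you use the NLB DNS name you noted earlier as both the Common Name and the subjectAltName, because SSSD validates the server name in its LDAPS URI against the certificate. The -nodes flag leaves the private key unencrypted, which ACM requires for imported keys. The -addext flag needs OpenSSL 1.1.1 or later. The function names are hypothetical.

```python
import os
import shlex
import subprocess


def openssl_self_signed_cmd(common_name, key_path="ldaps.key", cert_path="ldaps.crt", days=365):
    """Build an OpenSSL command for a self-signed certificate plus an unencrypted RSA key."""
    return [
        "openssl", "req", "-x509",
        "-newkey", "rsa:2048",
        "-nodes",  # no passphrase on the key (ACM cannot import encrypted keys)
        "-sha256",
        "-days", str(days),
        "-subj", f"/CN={common_name}",
        "-addext", f"subjectAltName=DNS:{common_name}",
        "-keyout", key_path,
        "-out", cert_path,
    ]


def create_self_signed(common_name, key_path="ldaps.key", cert_path="ldaps.crt"):
    """Run OpenSSL, then make the private key readable by its owner only."""
    cmd = openssl_self_signed_cmd(common_name, key_path, cert_path)
    print("+", shlex.join(cmd))
    subprocess.run(cmd, check=True)
    os.chmod(key_path, 0o600)
```

For example, create_self_signed("xyzxyz.elb.region-name.amazonaws.com") writes ldaps.crt and ldaps.key to the current directory, ready to paste into the Import form.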
Add an LDAPS listener
We already added an LDAP listener to the NLB. Now we add an LDAPS listener that uses the imported certificate. Complete the following steps:
- On the Load Balancers console, navigate to the NLB details page.
- On the Listeners tab, choose Add listener.
- Configure the listener with the following parameters:
- For Protocol, choose TLS.
- For Port, enter 636.
- For Default action, choose the LDAP target group.
- For Certificate source, select From ACM.
- For Certificate, choose the certificate you imported into ACM.
- Choose Add.
The NLB now listens for both LDAP and LDAPS. We recommend deleting the LDAP listener, because unlike LDAPS, it transmits data without encryption.
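You can confirm that the new listener is working by completing a TLS handshake against port 636 while trusting only ldaps.crt. The following standard-library Python sketch does that. Because the NLB scheme is Internal, it assumes you run it from a machine inside the VPC. It checks only the TLS layer that the NLB terminates and sends no LDAP request. make_ldaps_context and probe_ldaps are hypothetical helper names.

```python
import socket
import ssl


def make_ldaps_context(cafile=None):
    """TLS client context that trusts the given CA file and verifies the hostname."""
    ctx = ssl.create_default_context(cafile=cafile)  # CERT_REQUIRED + check_hostname
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def probe_ldaps(host, cafile="ldaps.crt", port=636, timeout=5.0):
    """Handshake with the NLB's LDAPS listener and return the negotiated TLS version.

    Raises ssl.SSLCertVerificationError if the certificate does not match `host`.
    """
    ctx = make_ldaps_context(cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

For example, probe_ldaps("xyzxyz.elb.region-name.amazonaws.com") should return a string such as "TLSv1.2". A certificate verification error usually means the certificate's name does not match the NLB DNS name.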
Create an EC2 Windows instance to administer users and groups in AD
To create and maintain users and groups in AD, complete the following steps:
- On the Amazon EC2 console, choose Instances in the navigation pane.
- Choose Launch instances.
- For Name, enter a name for your instance.
- For Amazon Machine Image, choose Microsoft Windows Server 2022 Base.
- For Instance type, choose t2.micro.
- In the Network settings section, provide the following parameters:
- For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
- For Subnet, choose either of the two subnets you created with the CloudFormation template.
- For Common security groups, choose CfStackName-SecurityGroup-XYZXYZ (which you created with the CloudFormation template).
- For Configure storage, set storage to 30 GB gp2.
- In the Advanced details section, for Domain join directory, choose the AD you created.
- For IAM instance profile, choose an AWS Identity and Access Management (IAM) role with at least the AmazonSSMManagedEC2InstanceDefaultPolicy policy.
- Review the summary and choose Launch instance.
Create users and groups in AD using the EC2 Windows instance
With Remote Desktop, connect to the EC2 Windows instance you created in the previous step. We recommend an RDP client over browser-based Remote Desktop, because it lets you exchange clipboard contents with your local machine through copy and paste. For more details about connecting to EC2 Windows instances, refer to Connect to your Windows instance.
When you are prompted for login credentials, enter hyperpod\Admin as the user name (where hyperpod is the first part of your directory DNS name). Use the admin password you set for the directory service.
- When the Windows desktop screen opens, choose Server Manager from the Start menu.
- Choose Local Server in the navigation pane, and confirm that the domain is the one you specified for the directory service.
- On the Manage menu, choose Add Roles and Features.
- Choose Next until you are on the Features page.
- Expand the feature Remote Server Administration Tools, expand Role Administration Tools, and select AD DS and AD LDS Tools and Active Directory Rights Management Service.
- Choose Next and Install. Feature installation starts.
- When the installation is complete, choose Close.
- Open Active Directory Users and Computers from the Start menu.
- Under hyperpod.abc123.com, expand hyperpod.
- Choose (right-click) hyperpod, choose New, and choose Organizational Unit.
- Create an organizational unit called Groups.
- Choose (right-click) Groups, choose New, and choose Group.
- Create a group called ClusterAdmin.
- Create a second group called ClusterDev.
- Choose (right-click) Users, choose New, and choose User.
- Create a new user.
- Choose (right-click) the user and choose Add to a group.
- Add your users to the ClusterAdmin or ClusterDev group. Users added to the ClusterAdmin group get sudo privileges on the cluster.
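The mechanism that grants sudo to ClusterAdmin is handled by the lifecycle script and is not shown in this post. A common pattern is a sudoers rule that names the group with a leading %. The following Python sketch renders such a rule. The drop-in file name, the helper name, and the assumption that SSSD exposes the AD group under its plain name (check with getent group ClusterAdmin on a node) are all hypothetical.

```python
import re

# Conservative character set for group names written into sudoers (assumption:
# AD group names without spaces or special characters, such as ClusterAdmin).
_SAFE_GROUP = re.compile(r"^[A-Za-z0-9_.-]+$")


def sudoers_dropin(admin_groups, nopasswd=False):
    """Render sudoers rules that grant full sudo to members of the given groups.

    In sudoers, '%name' refers to a group; with SSSD, AD groups resolve through NSS.
    """
    tag = "NOPASSWD: " if nopasswd else ""
    lines = ["# Hypothetical drop-in, for example /etc/sudoers.d/hyperpod-ad"]
    for group in admin_groups:
        if not _SAFE_GROUP.match(group):
            raise ValueError(f"refusing to write unusual group name: {group!r}")
        lines.append(f"%{group} ALL=(ALL:ALL) {tag}ALL")
    return "\n".join(lines) + "\n"
```

Whatever you generate, validate it with visudo -c -f <file> before installing it, because a malformed sudoers file can lock out sudo entirely.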
Create a ReadOnly user in AD
Create a user called ReadOnly under Users. The cluster uses the ReadOnly user to programmatically look up users and groups in AD.
Take note of the password for later use.
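The SSSD configuration refers to the ReadOnly user and the directory by distinguished name (DN) rather than by display name. The following Python sketch derives those DNs from the directory DNS name. It assumes the layout used in this walkthrough: an OU named after the first label of the domain (hyperpod), containing the Users OU and the Groups OU you created. It also assumes the NetBIOS name was left at its default and the user's full name (its CN) is ReadOnly. The function names are hypothetical, and the sketch does not escape special characters in names.

```python
def domain_to_base_dn(domain):
    """hyperpod.abc123.com -> DC=hyperpod,DC=abc123,DC=com"""
    return ",".join(f"DC={label}" for label in domain.strip(".").split("."))


def directory_dns(domain, reader_cn="ReadOnly", groups_ou="Groups"):
    """Derive the DNs used in this walkthrough from the directory DNS name."""
    netbios = domain.split(".")[0]  # assumption: OU named after the first label
    base = domain_to_base_dn(domain)
    root_ou = f"OU={netbios},{base}"
    return {
        "search_base": base,
        "reader_bind_dn": f"CN={reader_cn},OU=Users,{root_ou}",
        "cluster_admin_group": f"CN=ClusterAdmin,OU={groups_ou},{root_ou}",
        "cluster_dev_group": f"CN=ClusterDev,OU={groups_ou},{root_ou}",
    }
```

For hyperpod.abc123.com, reader_bind_dn comes out as CN=ReadOnly,OU=Users,OU=hyperpod,DC=hyperpod,DC=abc123,DC=com. You can cross-check it against the distinguishedName attribute on the user's Attribute Editor tab.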
(For SSH public key authentication) Add SSH public keys to users
When a user's SSH public key is stored in AD, the user can log in without entering a password. You can use an existing key pair, or you can create a new key pair with OpenSSH's ssh-keygen command. For more information about generating a key pair, refer to Create a key pair for your Amazon EC2 instance.
- In Active Directory Users and Computers, on the View menu, enable Advanced Features.
- Open the Properties dialog of the user.
- On the Attribute Editor tab, choose altSecurityIdentities, then choose Edit.
- For Value to add, enter the SSH public key, then choose Add. The key appears under Values.
- Choose OK. Confirm that the SSH public key appears as the attribute value.
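Pasting a key into the Attribute Editor can silently introduce line breaks or cut off the end of the key. The following Python sketch checks a public key line before you paste it. It confirms the line has the OpenSSH "<type> <base64> [comment]" shape, that the blob is valid base64, and that the key type encoded inside the blob matches the declared type. The set of accepted key types and the function name are assumptions for illustration.

```python
import base64
import binascii
import struct

KNOWN_TYPES = {
    "ssh-ed25519",
    "ssh-rsa",
    "ecdsa-sha2-nistp256",
    "ecdsa-sha2-nistp384",
    "ecdsa-sha2-nistp521",
}


def check_openssh_public_key(line):
    """Return the key type if `line` looks like a single-line OpenSSH public key."""
    parts = line.strip().split()
    if len(parts) < 2:
        raise ValueError("expected '<type> <base64-blob> [comment]'")
    key_type, blob_b64 = parts[0], parts[1]
    if key_type not in KNOWN_TYPES:
        raise ValueError(f"unexpected key type: {key_type}")
    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("key blob is not valid base64") from exc
    if len(blob) < 4:
        raise ValueError("key blob is truncated")
    # The blob starts with a length-prefixed copy of the key type (RFC 4253 string).
    (name_len,) = struct.unpack(">I", blob[:4])
    embedded = blob[4:4 + name_len].decode("ascii", errors="replace")
    if embedded != key_type:
        raise ValueError(f"blob encodes {embedded!r}, not {key_type!r}")
    return key_type
```

Run it on the contents of your .pub file, for example check_openssh_public_key(open("id_ed25519.pub").read()). Only the public key goes into AD; the private key stays on the user's machine.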
Get an obfuscated password for the ReadOnly user
To avoid including a plain text password in the SSSD configuration file, you obfuscate the password. Obfuscation is not encryption: it only keeps the password from being readable at a glance, so still restrict access to the configuration file. For this step, you need a Linux environment (local laptop, EC2 Linux instance, or CloudShell).
Install the sssd-tools package on the Linux machine to get the Python module pysss, which performs the obfuscation.
Run a one-line Python script that uses pysss, and enter the password of the ReadOnly user when prompted. The script prints the obfuscated password.
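The following Python sketch is an expanded version of such a script. It assumes pysss exposes password().encrypt(password, method) with an AES_256 method constant, as used by SSSD's own sss_obfuscate tool. The module is imported inside the function so that the file can be loaded on machines without sssd-tools. The function names are hypothetical.

```python
import getpass


def obfuscate_password(plain):
    """Obfuscate a password for ldap_default_authtok_type = obfuscated_password."""
    import pysss  # installed with sssd-tools on the Linux machine; imported lazily

    obfuscator = pysss.password()
    # strip() drops surrounding whitespace, such as a newline picked up while pasting.
    return obfuscator.encrypt(plain.strip(), obfuscator.AES_256)


def main():
    # getpass reads the password without echoing it to the terminal.
    plain = getpass.getpass("AD reader user password: ")
    print(obfuscate_password(plain))
```

Calling main() prompts for the ReadOnly user's password and prints the value to use for ldap_default_authtok.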
Create a HyperPod cluster with an SSSD-enabled lifecycle script
Next, you create a HyperPod cluster with LDAPS/Active Directory integration.
- Find the configuration file config.py in your lifecycle script directory, open it with your text editor, and edit the properties in the Config class and the SssdConfig class:
- Set enable_sssd to True to enable setting up SSSD.
- The SssdConfig class contains the configuration parameters for SSSD.
- Make sure to use the obfuscated password for the ldap_default_authtok property, not a plain text password.
- Copy the certificate file ldaps.crt to the same directory as config.py.
- Upload the modified lifecycle script files to your Amazon Simple Storage Service (Amazon S3) bucket, and create a HyperPod cluster with them.
- Wait until the status changes to InService.
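It can help to see what the SssdConfig values turn into on each node. The following Python sketch renders a minimal sssd.conf from a few settings. It is an illustration under stated assumptions, not the lifecycle script's actual output. SssdSettings is a hypothetical stand-in for SssdConfig whose field names mirror sssd-ldap(5) option names, and the certificate path and placeholder values are assumptions. The option names in the generated file are real SSSD options.

```python
import configparser
import io
from dataclasses import dataclass


@dataclass
class SssdSettings:
    """Hypothetical stand-in for SssdConfig; field names mirror sssd-ldap(5) options."""

    domain: str = "default"
    # Placeholder: the NLB DNS name you noted earlier, with the ldaps:// scheme.
    ldap_uri: str = "ldaps://xyzxyz.elb.region-name.amazonaws.com"
    ldap_search_base: str = "DC=hyperpod,DC=abc123,DC=com"
    ldap_default_bind_dn: str = "CN=ReadOnly,OU=Users,OU=hyperpod,DC=hyperpod,DC=abc123,DC=com"
    # Paste the obfuscated password here, never the plain text one.
    ldap_default_authtok: str = "OBFUSCATED-PASSWORD-PLACEHOLDER"
    # Hypothetical path where the lifecycle script would place ldaps.crt.
    ldap_tls_cacert: str = "/etc/sssd/certs/ldaps.crt"


def render_sssd_conf(s: SssdSettings) -> str:
    """Render a minimal sssd.conf with one LDAP domain that talks LDAPS to the NLB."""
    conf = configparser.ConfigParser(interpolation=None)
    conf["sssd"] = {"services": "nss, pam, ssh", "domains": s.domain}
    conf[f"domain/{s.domain}"] = {
        "id_provider": "ldap",
        "auth_provider": "ldap",
        "ldap_schema": "ad",  # use AD attribute names such as sAMAccountName
        "ldap_id_mapping": "True",  # derive UIDs/GIDs from AD SIDs
        "ldap_uri": s.ldap_uri,
        "ldap_search_base": s.ldap_search_base,
        "ldap_default_bind_dn": s.ldap_default_bind_dn,
        "ldap_default_authtok_type": "obfuscated_password",
        "ldap_default_authtok": s.ldap_default_authtok,
        "ldap_tls_reqcert": "hard",  # refuse servers whose certificate does not validate
        "ldap_tls_cacert": s.ldap_tls_cacert,
        "ldap_user_ssh_public_key": "altSecurityIdentities",
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()
```

If you write such a file by hand, place it at /etc/sssd/sssd.conf, owned by root with mode 0600. SSSD refuses to start if the file is readable by other users.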
Verification
Let's verify the solution by logging in to the cluster with SSH. Because the cluster was created in a private subnet, you can't SSH into the cluster directly from your local environment. You can choose from two options to connect to the cluster.
Option 1: SSH login through AWS Systems Manager
You can use AWS Systems Manager as a proxy for the SSH connection. Add a host entry to the SSH configuration file ~/.ssh/config. For the HostName field, specify the Systems Manager target name in the format sagemaker-cluster:[cluster-id]_[instance-group-name]-[instance-id]. For the IdentityFile field, specify the file path to the user's SSH private key. This field is not required if you chose password authentication.
Run the ssh command with the host name you specified, and confirm that you can log in to the instance as the specified user.
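The following Python sketch renders such a host entry. It uses the documented Session Manager pattern of running aws ssm start-session with the AWS-StartSSHSession document as the ProxyCommand. It assumes the AWS CLI and the Session Manager plugin are installed locally. The function names, alias, cluster ID, instance group name, and instance ID in the usage note are hypothetical placeholders.

```python
def ssm_target(cluster_id, instance_group_name, instance_id):
    """Systems Manager target name for a HyperPod node, in the format given above."""
    return f"sagemaker-cluster:{cluster_id}_{instance_group_name}-{instance_id}"


def ssh_config_via_ssm(alias, target, user, identity_file=None, region=None):
    """Render a ~/.ssh/config host entry that tunnels SSH through Session Manager."""
    # ssh expands %h to HostName and %p to the port before running ProxyCommand.
    proxy = ("aws ssm start-session --target %h "
             "--document-name AWS-StartSSHSession --parameters portNumber=%p")
    if region:
        proxy += f" --region {region}"
    lines = [
        f"Host {alias}",
        f"    HostName {target}",
        f"    User {user}",
        f"    ProxyCommand {proxy}",
    ]
    if identity_file:  # omit for password authentication
        lines.append(f"    IdentityFile {identity_file}")
    return "\n".join(lines) + "\n"
```

For example, ssh_config_via_ssm("hp-login", ssm_target("abcd1234", "login-group", "i-0123456789abcdef0"), "alice", "~/.ssh/id_ed25519") produces an entry you can append to ~/.ssh/config and then use with ssh hp-login.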
At this point, users can still use the default Systems Manager shell session to log in to the cluster as ssm-user, which has administrative privileges. To block default Systems Manager shell access and enforce SSH access, configure your IAM policy so that users can start sessions only with the SSH session document.
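One way to express that restriction is sketched below as a Python function that builds the policy document. It allows ssm:StartSession only on the cluster and the AWS-StartSSHSession document. The ssm:SessionDocumentAccessCheck condition makes Session Manager enforce the listed document instead of falling back to the default shell. The cluster ARN format, the wildcard account field for the AWS-owned document, and the placeholder values are assumptions to verify against your account.

```python
import json


def ssh_only_session_policy(region, account_id, cluster_id):
    """IAM policy document allowing only SSH sessions (AWS-StartSSHSession) to the cluster."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ssm:StartSession",
                "Resource": [
                    f"arn:aws:sagemaker:{region}:{account_id}:cluster/{cluster_id}",
                    # AWS-owned document; '*' also matches an empty account field.
                    f"arn:aws:ssm:{region}:*:document/AWS-StartSSHSession",
                ],
                # Without this condition, users could still open the default shell session.
                "Condition": {"BoolIfExists": {"ssm:SessionDocumentAccessCheck": "true"}},
            }
        ],
    }
```

json.dumps(ssh_only_session_policy("us-west-2", "123456789012", "abcd1234"), indent=2) prints a document you can paste into the IAM console. Users may also need permissions such as ssm:TerminateSession for their own sessions, depending on your setup.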
For more details on how to enforce SSH access, refer to Start a session with a document by specifying the session documents in IAM policies.
Option 2: SSH login through a bastion host
Another option is to use a bastion host as a proxy to access the cluster. You can use this option when the user doesn't have permission to use Systems Manager sessions, or to troubleshoot when Systems Manager is not working.
- Create a bastion security group that allows inbound SSH access (TCP port 22) from your local environment.
- Update the security group for the cluster to allow inbound SSH access from the bastion security group.
- Create an EC2 Linux instance.
- For Amazon Machine Image, choose Ubuntu Server 20.04 LTS.
- For Instance type, choose t3.small.
- In the Network settings section, provide the following parameters:
- For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
- For Subnet, choose the public subnet you created with the CloudFormation template.
- For Common security groups, choose the bastion security group you created.
- For Configure storage, set storage to 8 GB.
- Identify the public IP address of the bastion host and the private IP address of the target instance (for example, the login node of the cluster). Add two host entries to the SSH config: one for the bastion host, and one for the target instance that connects through the bastion host.
- Run the ssh command with the target host name you specified earlier, and confirm that you can log in to the instance as the specified user.
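The two host entries can be generated with the following Python sketch. It uses OpenSSH's ProxyJump directive (OpenSSH 7.3 or later) so that the connection to the target hops through the bastion. It assumes the bastion uses ubuntu, the default user on Ubuntu AMIs, with your EC2 key pair. The target user is the AD user. The aliases, addresses, key paths, and function name are hypothetical placeholders.

```python
import ipaddress


def ssh_config_via_bastion(bastion_alias, bastion_ip, bastion_key,
                           target_alias, target_ip, target_user,
                           bastion_user="ubuntu", target_key=None):
    """Render two ~/.ssh/config host entries; the target is reached with ProxyJump."""
    # The cluster lives in a private subnet, so the target should be a private address.
    if not ipaddress.ip_address(target_ip).is_private:
        raise ValueError(f"{target_ip} should be the target's private IP address")
    lines = [
        f"Host {bastion_alias}",
        f"    HostName {bastion_ip}",
        f"    User {bastion_user}",
        f"    IdentityFile {bastion_key}",
        "",
        f"Host {target_alias}",
        f"    HostName {target_ip}",
        f"    User {target_user}",
        f"    ProxyJump {bastion_alias}",
    ]
    if target_key:  # omit for password authentication
        lines.append(f"    IdentityFile {target_key}")
    return "\n".join(lines) + "\n"
```

After appending the output to ~/.ssh/config, ssh <target_alias> connects to the bastion first and then to the cluster node as the AD user.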
Clean up
Clean up the resources in the following order:
- Delete the HyperPod cluster.
- Delete the Network Load Balancer.
- Delete the load balancing target group.
- Delete the certificate imported into Certificate Manager.
- Delete the EC2 Windows instance.
- Delete the EC2 Linux instance for the bastion host.
- Delete the AWS Managed Microsoft AD.
- Delete the CloudFormation stack for the VPC, subnets, security group, and FSx for Lustre volume.
Conclusion
This post provided steps to create a HyperPod cluster integrated with Active Directory. This solution removes the effort of maintaining users on large-scale clusters and lets you manage users and groups centrally in one place.
For more information about HyperPod, check out the HyperPod workshop and the SageMaker HyperPod Developer Guide. Leave your feedback on this solution in the comments section.
About the Authors
Tomonori Shimomura is a Senior Solutions Architect on the Amazon SageMaker team, where he provides in-depth technical consultation to SageMaker customers and suggests product improvements to the product team. Before joining Amazon, he worked on the design and development of embedded software for video game consoles, and now he applies his in-depth skills to cloud-side technology. In his free time, he enjoys playing video games, reading books, and writing software.
Giuseppe Angelo Porcelli is a Principal Machine Learning Specialist Solutions Architect for Amazon Web Services. With several years of software engineering experience and an ML background, he works with customers of any size to understand their business and technical needs and to design AI and ML solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. He has worked on projects in different domains, including MLOps, computer vision, and NLP, involving a broad set of AWS services. In his free time, Giuseppe enjoys playing soccer.
Monidipa Chakraborty currently serves as a Senior Software Development Engineer at Amazon Web Services (AWS), specifically within the SageMaker HyperPod team. She is committed to helping customers by designing and implementing robust and scalable systems that demonstrate operational excellence. Bringing nearly a decade of software development experience, Monidipa has contributed to various areas within Amazon, including Video, Retail, Amazon Go, and AWS SageMaker.
Satish Pasumarthi is a Software Developer at Amazon Web Services. With several years of software engineering experience and an ML background, he loves to bridge the gap between ML and systems and is passionate about building systems that make large-scale model training possible. He has worked on projects in a variety of domains, including machine learning frameworks, model benchmarking, and building the HyperPod beta, involving a broad set of AWS services. In his free time, Satish enjoys playing badminton.