Use the Amazon SageMaker and Salesforce Data Cloud integration to power your Salesforce apps with AI/ML

This post is co-authored by Daryl Martis, Director of Product, Salesforce Einstein AI.

This is the second post in a series discussing the integration of Salesforce Data Cloud and Amazon SageMaker. In Part 1, we show how the Salesforce Data Cloud and Einstein Studio integration with SageMaker allows businesses to access their Salesforce data securely using SageMaker and use its tools to build, train, and deploy models to endpoints hosted on SageMaker. The endpoints are then registered to the Salesforce Data Cloud to activate predictions in Salesforce.

In this post, we expand on this topic to demonstrate how to use Einstein Studio for product recommendations. You can use this integration for traditional models as well as large language models (LLMs).

Solution overview

In this post, we demonstrate how to create a predictive model in SageMaker to recommend the next best product to your customers by using historical data such as customer demographics, marketing engagements, and purchase history from Salesforce Data Cloud.

We use the following sample dataset. To use this dataset in your Data Cloud, refer to Create Amazon S3 Data Stream in Data Cloud.

The following attributes are needed to create the model:

  • Club Member – If the customer is a club member
  • Campaign – The campaign the customer is a part of
  • State – The state or province the customer resides in
  • Month – The month of purchase
  • Case Count – The number of cases raised by the customer
  • Case Type Return – Whether the customer returned any product within the last year
  • Case Type Shipment Damaged – Whether the customer had any shipments damaged in the last year
  • Engagement Score – The level of engagement the customer has (responses to mailing campaigns, logins to the online store, and so on)
  • Tenure – The tenure of the customer relationship with the company
  • Clicks – The average number of clicks the customer made within the week prior to purchase
  • Pages Visited – The average number of pages the customer visited within the week prior to purchase
  • Product Purchased – The actual product purchased
  • Id – The ID of the record
  • DateTime – The timestamp of the dataset
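
To make the schema concrete, here is what a single record might look like in Python. The values and the friendly field labels are invented for illustration; in Data Cloud the fields carry API names such as club_member__c (see the query later in this post).

```python
# One illustrative record matching the attributes above (values are made up).
sample_record = {
    "Id": "0012345",
    "Club Member": True,
    "Campaign": "Spring Promo",
    "State": "California",
    "Month": "March",
    "Case Count": 1,
    "Case Type Return": False,
    "Case Type Shipment Damaged": False,
    "Engagement Score": 72.5,
    "Tenure": 3,
    "Clicks": 14.2,
    "Pages Visited": 9.8,
    "Product Purchased": "Hiking Boots",
    "DateTime": "2023-05-01T12:00:00Z",
}

# The model's target is Product Purchased; Id and DateTime are metadata,
# and everything else is a feature.
features = {k: v for k, v in sample_record.items()
            if k not in ("Id", "DateTime", "Product Purchased")}
```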

The product recommendation model is built and deployed on SageMaker and is trained using data in the Salesforce Data Cloud. The following steps give an overview of how to use the new capabilities launched in SageMaker for Salesforce to enable the overall integration:

  1. Set up the Amazon SageMaker Studio domain and OAuth between Salesforce and the AWS accounts.
  2. Use the newly launched capability of the Amazon SageMaker Data Wrangler connector for Salesforce Data Cloud to prepare the data in SageMaker without copying it from Salesforce Data Cloud.
  3. Train a recommendation model in SageMaker Studio using training data that was prepared with SageMaker Data Wrangler.
  4. Package the SageMaker Data Wrangler container and the trained recommendation model container in an inference pipeline so the inference request can use the same data preparation steps you created to preprocess the training data. The real-time inference call data is first passed to the SageMaker Data Wrangler container in the inference pipeline, where it is preprocessed and passed to the trained model for product recommendation. For more information about this process, refer to New — Introducing Support for Real-Time and Batch Inference in Amazon SageMaker Data Wrangler. Although we use a specific algorithm to train the model in our example, you can use any algorithm that you find appropriate for your use case.
  5. Use the newly launched SageMaker provided project template for Salesforce Data Cloud integration to streamline implementing the preceding steps by providing the following templates:
    1. An example notebook showcasing data preparation, building, training, and registering the model.
    2. The SageMaker provided project template for Salesforce Data Cloud integration, which automates creating a SageMaker endpoint hosting the inference pipeline model. When a version of the model in the Amazon SageMaker Model Registry is approved, the endpoint is exposed as an API with Amazon API Gateway using a custom Salesforce JSON Web Token (JWT) authorizer. API Gateway is required to allow Salesforce Data Cloud to make predictions against the SageMaker endpoint using a JWT token that Salesforce creates and passes with the request when making predictions from Salesforce. JWT can be used as part of OpenID Connect (OIDC) and OAuth 2.0 frameworks to restrict client access to your APIs.
  6. After you create the API, we recommend registering the model endpoint in Salesforce Einstein Studio. For instructions, refer to Bring Your Own AI Models to Salesforce with Einstein Studio.

The following diagram illustrates the solution architecture.

Create a SageMaker Studio domain

First, create a SageMaker Studio domain. For instructions, refer to Onboard to Amazon SageMaker Domain. Make sure to note down the domain ID and execution role that is created and will be used by your user profile. You add permissions to this role in subsequent steps.

The following screenshot shows the domain we created for this post.

The following screenshot shows the example user profile for this post.

Set up the Salesforce connected app

Next, we create a Salesforce connected app to enable the OAuth flow from SageMaker Studio to Salesforce Data Cloud. Complete the following steps:

  1. Log in to Salesforce and navigate to Setup.
  2. Search for App Manager and create a new connected app.
  3. Provide the following inputs:
    1. For Connected App Name, enter a name.
    2. For API Name, leave as default (it's automatically populated).
    3. For Contact Email, enter your contact email address.
    4. Select Enable OAuth Settings.
    5. For Callback URL, enter https://<domain-id>.studio.<region>, providing the domain ID that you captured while creating the SageMaker domain and the Region of your SageMaker domain.
  4. Under Selected OAuth Scopes, move the following from Available OAuth Scopes to Selected OAuth Scopes and choose Save:
    1. Manage user data via APIs (api)
    2. Perform requests at any time (refresh_token, offline_access)
    3. Perform ANSI SQL queries on Salesforce Data Cloud data (Data Cloud_query_api)
    4. Manage Salesforce Customer Data Platform profile data (Data Cloud_profile_api)
    5. Access the identity URL service (id, profile, email, address, phone)
    6. Access unique user identifiers (openid)

For more information about creating a connected app, refer to Create a Connected App.
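
Under the hood, the connected app's key and secret drive a standard OAuth 2.0 token exchange against your Salesforce org's token endpoint. As a rough sketch of what that exchange looks like (the endpoint path is the standard Salesforce token endpoint; all argument values are placeholders):

```python
import urllib.parse
import urllib.request

def refresh_token_request(org_url: str, client_id: str,
                          client_secret: str, refresh_token: str):
    """Build the standard Salesforce OAuth 2.0 refresh-token exchange.

    client_id and client_secret are the connected app's Consumer Key and
    Consumer Secret; org_url is your Salesforce org URL.
    """
    body = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{org_url}/services/oauth2/token", data=body, method="POST"
    )

# Sending the request requires a real org and a previously issued token:
#   with urllib.request.urlopen(refresh_token_request(...)) as resp:
#       print(resp.read())
```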

  1. Return to the connected app and navigate to Consumer Key and Secret.
  2. Choose Manage Consumer Details.
  3. Copy the key and secret.

You may be asked to log in to your Salesforce org as part of the two-factor authentication here.

  1. Navigate back to the Manage Connected Apps page.
  2. Open the connected app you created and choose Manage.
  3. Choose Edit Policies and change IP Relaxation to Relax IP restrictions, then save your settings.

Configure SageMaker permissions and lifecycle rules

In this section, we walk through the steps to configure SageMaker permissions and lifecycle management rules.

Create a secret in AWS Secrets Manager

Enable OAuth integration with Salesforce Data Cloud by storing credentials from your Salesforce connected app in AWS Secrets Manager:

  1. On the Secrets Manager console, choose Store a new secret.
  2. Select Other type of secret.
  3. Create your secret with the following key-value pairs:
    {
      "identity_provider": "SALESFORCE",
      "authorization_url": "",
      "token_url": "",
      "client_id": "<YOUR_CONSUMER_KEY>",
      "client_secret": "<YOUR_CONSUMER_SECRET>",
      "issue_url": "<YOUR_SALESFORCE_ORG_URL>"
    }

  4. Add a tag with the key sagemaker:partner and your choice of value.
  5. Save the secret and note the ARN of the secret.
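
The console steps above can also be scripted. A minimal sketch of building the same secret payload and storing it with boto3 is shown below; the secret name and tag value are arbitrary choices, and the angle-bracket placeholders must be replaced with your connected app values.

```python
import json

# Payload mirroring the key-value pairs above (placeholders to be replaced).
secret_payload = json.dumps({
    "identity_provider": "SALESFORCE",
    "authorization_url": "",
    "token_url": "",
    "client_id": "<YOUR_CONSUMER_KEY>",
    "client_secret": "<YOUR_CONSUMER_SECRET>",
    "issue_url": "<YOUR_SALESFORCE_ORG_URL>",
})

# Storing it with boto3 (requires AWS credentials; the name is arbitrary,
# but the sagemaker:partner tag is what the IAM policy below matches on):
#   import boto3
#   boto3.client("secretsmanager").create_secret(
#       Name="sagemaker-salesforce-oauth",
#       SecretString=secret_payload,
#       Tags=[{"Key": "sagemaker:partner", "Value": "salesforce"}],
#   )
```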

Configure a SageMaker lifecycle rule

The SageMaker Studio domain execution role requires AWS Identity and Access Management (IAM) permissions to access the secret created in the previous step. For more information, refer to Creating roles and attaching policies (console).

  1. On the IAM console, attach the following policies to their respective roles (these roles will be used by the SageMaker project for deployment):
    1. Add the policy AmazonSageMakerPartnerServiceCatalogProductsCloudFormationServiceRolePolicy to the service role AmazonSageMakerServiceCatalogProductsCloudformationRole.
    2. Add the policy AmazonSageMakerPartnerServiceCatalogProductsApiGatewayServiceRolePolicy to the service role AmazonSageMakerServiceCatalogProductsApiGatewayRole.
    3. Add the policy AmazonSageMakerPartnerServiceCatalogProductsLambdaServiceRolePolicy to the service role AmazonSageMakerServiceCatalogProductsLambdaRole.
  2. On the IAM console, navigate to the SageMaker domain execution role.
  3. Choose Add permissions and select Create an inline policy.
  4. Enter the following policy in the JSON policy editor (the Secrets Manager actions shown reconstruct the truncated original; at minimum, the role needs to read secrets tagged sagemaker:partner and create AmazonSageMaker-prefixed secrets):
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:PutSecretValue"
                ],
                "Resource": "arn:aws:secretsmanager:*:*:secret:*",
                "Condition": {
                    "ForAnyValue:StringLike": {
                        "aws:ResourceTag/sagemaker:partner": "*"
                    }
                }
            },
            {
                "Effect": "Allow",
                "Action": [
                    "secretsmanager:CreateSecret"
                ],
                "Resource": "arn:aws:secretsmanager:*:*:secret:AmazonSageMaker-*"
            }
        ]
    }

SageMaker Studio lifecycle configurations provide shell scripts that run when a notebook is created or started. The lifecycle configuration will be used to retrieve the secret and import it to the SageMaker runtime.

  1. On the SageMaker console, choose Lifecycle configurations in the navigation pane.
  2. Choose Create configuration.
  3. Leave the default selection Jupyter Server App and choose Next.
  4. Give the configuration a name.
  5. Enter the following script in the editor, providing the ARN for the secret you created earlier:
    #!/bin/bash
    set -eux
    cat > ~/.sfgenie_identity_provider_oauth_config <<EOL
    {
        "secret_arn": "<YOUR_SECRETS_ARN>"
    }
    EOL

  1. Choose Submit to save the lifecycle configuration.
  2. Choose Domains in the navigation pane and open your domain.
  3. On the Environment tab, choose Attach to attach your lifecycle configuration.
  4. Choose the lifecycle configuration you created and choose Attach to domain.
  5. Choose Set as default.

If you are a returning SageMaker Studio user, to ensure Salesforce Data Cloud is enabled, upgrade to the latest Jupyter and SageMaker Data Wrangler kernels.

This completes the setup to enable data access from Salesforce Data Cloud to SageMaker Studio to build AI and machine learning (ML) models.

Create a SageMaker project

To start using the solution, first create a project using Amazon SageMaker Projects. Complete the following steps:

  1. In SageMaker Studio, under Deployments in the navigation pane, choose Projects.
  2. Choose Create project.
  3. Choose the project template called Model deployment for Salesforce.
  4. Choose Select project template.
  5. Enter a name and optional description for your project.
  6. Enter a model group name.
  7. Enter the name of the Secrets Manager secret that you created earlier.
  8. Choose Create project.

The project may take 1–2 minutes to initiate.

You can see two new repositories. The first one is for sample notebooks that you can use as is or customize to prepare, train, create, and register models in the SageMaker Model Registry. The second repository is for automating the model deployment, which includes exposing the SageMaker endpoint as an API.

  1. Choose clone repo for both repositories.

For this post, we use the product recommendation example, which can be found in the sagemaker-<YOUR-PROJECT-NAME>-p-<YOUR-PROJECT-ID>-example-nb/product-recommendation directory that you just cloned. Before we run the product-recommendation.ipynb notebook, let's do some data preparation to create the training data using SageMaker Data Wrangler.

Prepare data with SageMaker Data Wrangler

Complete the following steps:

  1. In SageMaker Studio, on the File menu, choose New and Data Wrangler flow.
  2. After you create the data flow, choose (right-click) the tab and choose Rename to rename the file.
  3. Choose Import data.
  4. Choose Create connection.
  5. Choose Salesforce Data Cloud.
  6. For Name, enter salesforce-data-cloud-sagemaker-connection.
  7. For Salesforce org URL, enter your Salesforce org URL.
  8. Choose Save + Connect.
  9. In the Data Explorer view, select and preview the tables from the Salesforce Data Cloud to create and run the query to extract the required dataset.
  10. Your query will look like the following; use the table name that you used while ingesting data in Salesforce Data Cloud.
    SELECT product_purchased__c, club_member__c, campaign__c, state__c, month__c,
          case_count__c, case_type_return__c, case_type_shipment_damaged__c,
          pages_visited__c, engagement_score__c, tenure__c, clicks__c, id__c
    FROM Training_Dataset_for_Sagemaker__dll

  11. Choose Create dataset.

Creating the dataset may take some time.

In the data flow view, you can now see a new node added to the visual graph.

For more information on how you can use SageMaker Data Wrangler to create Data Quality and Insights Reports, refer to Get Insights On Data and Data Quality.

SageMaker Data Wrangler offers over 300 built-in transformations. In this step, we use some of these transformations to prepare the dataset for an ML model. For detailed instructions on how to implement these transformations, refer to Transform Data.

  1. Use the Manage columns step with the Drop column transform to drop the column id__c.
  2. Use the Handle missing step with the Drop missing transform to drop rows with missing values for various features. We apply this transformation on all columns.
  3. Use a custom transform step to create categorical values for the state__c, case_count__c, and tenure__c features. Use the following code for this transformation:
    from pyspark.sql.functions import when
    States_List = ['Washington', 'Massachusetts', 'California', 'Minnesota', 'Vermont', 'Colorado', 'Arizona']
    df = df.withColumn('state__c', when(df.state__c.isin(States_List), df.state__c).otherwise("Other"))
    df = df.withColumn('case_count__c', when(df.case_count__c == 0, "No Cases").otherwise(when(df.case_count__c <= 2, "1 to 2 Cases").otherwise("Greater than 2 Cases")))
    df = df.withColumn('tenure__c', when(df.tenure__c < 1, "Less than 1 Year").otherwise(when(df.tenure__c == 1, "1 to 2 Years").otherwise(when(df.tenure__c == 2, "2 to 3 Years").otherwise(when(df.tenure__c == 3, "3 to 4 Years").otherwise("Greater Than 4 Years")))))

  4. Use the Process numeric step with the Scale values transform and choose Standard scaler to scale the clicks__c, engagement_score__c, and pages_visited__c features.
  5. Use the Encode categorical step with the One-hot encode transform to convert categorical variables to numeric for the case_type_return__c, case_type_shipment_damaged__c, month__c, club_member__c, and campaign__c features (all features except clicks__c, engagement_score__c, pages_visited__c, and product_purchased__c).
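
Outside of Data Wrangler, the standard scaler and one-hot encoding in steps 4 and 5 reduce to simple arithmetic. A dependency-free sketch of both, using invented column values as stand-ins for clicks__c and month__c:

```python
import statistics

def standard_scale(values):
    # Standard scaler: subtract the mean, divide by the standard deviation,
    # so the result is centered on 0 with unit variance.
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    return [(v - mean) / stdev for v in values]

def one_hot(values):
    # One-hot encode: one 0/1 column per distinct category, sorted by name.
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

clicks = [10.0, 20.0, 30.0]      # stand-in for clicks__c
months = ["Jan", "Feb", "Jan"]   # stand-in for month__c
scaled = standard_scale(clicks)
encoded = one_hot(months)        # columns in sorted order: Feb, Jan
```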

Model building, training, and deployment

To build, train, and deploy the model, complete the following steps:

  1. Return to the SageMaker project, open the product-recommendation.ipynb notebook, and run a processing job to preprocess the data using the SageMaker Data Wrangler configuration you created.
  2. Follow the steps in the notebook to train a model and register it to the SageMaker Model Registry.
  3. Make sure to update the model group name to match the model group name that you used while creating the SageMaker project.

To find the model group name, open the SageMaker project that you created earlier and navigate to the Settings tab.

Similarly, the flow file referenced in the notebook must match the name of the flow file that you created earlier.

  1. For this post, we used product-recommendation as the model group name, so we update the notebook with product-recommendation as the model group name.

After the notebook is run, the trained model is registered in the Model Registry. To learn more about the Model Registry, refer to Register and Deploy Models with Model Registry.

  1. Select the model version you created and update its status to Approved.
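
Approving a version can also be scripted. The console step above corresponds to a single UpdateModelPackage call in boto3; a minimal sketch, where the model package ARN is a placeholder:

```python
def approval_request(model_package_arn: str) -> dict:
    # Arguments for the SageMaker UpdateModelPackage API; flipping the status
    # to Approved is what triggers the project's deployment pipeline.
    return {
        "ModelPackageArn": model_package_arn,
        "ModelApprovalStatus": "Approved",
    }

# With AWS credentials configured (the ARN below is a placeholder):
#   import boto3
#   boto3.client("sagemaker").update_model_package(
#       **approval_request("arn:aws:sagemaker:...:model-package/...")
#   )
```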

Now that you’ve got accepted the registered mannequin, the SageMaker Salesforce mission deploy step will provision and set off AWS CodePipeline.

CodePipeline has steps to build and deploy a SageMaker endpoint for inference containing the SageMaker Data Wrangler preprocessing steps and the trained model. The endpoint will be exposed to Salesforce Data Cloud as an API through API Gateway. The following screenshot shows the pipeline prefixed with Sagemaker-salesforce-product-recommendation-xxxxx. We also show you the endpoints and API that get created by the SageMaker project for Salesforce.

If you want, you can take a look at the CodePipeline deploy step, which uses AWS CloudFormation scripts to create the SageMaker endpoint and API Gateway with a custom JWT authorizer.

When pipeline deployment is complete, you can find the SageMaker endpoint on the SageMaker console.

You can explore the API Gateway created by the project template on the API Gateway console.

Choose the link to find the API Gateway URL.

You can find the details of the JWT authorizer by choosing Authorizers on the API Gateway console. You can also go to the AWS Lambda console to review the code of the Lambda function created by the project template.

To find the schema to use when invoking the API from Einstein Studio, choose Info in the navigation pane of the Model Registry. You will see an Amazon Simple Storage Service (Amazon S3) link to a metadata file. Copy and paste the link into a new browser tab URL.

Let’s have a look at the file with out downloading it. On the file particulars web page, select the Object actions menu and select Question with S3 Choose.

Choose Run SQL query and note the API Gateway URL and schema, because you will need this information when registering with Einstein Studio. If you don't see an APIGWURL key, either the model wasn't approved, deployment is still in progress, or deployment failed.
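
With the APIGWURL and a Salesforce-issued JWT in hand, the deployed API can also be exercised manually. The following is only a sketch of building such a request — the URL, token, and payload shape are placeholders; use the schema from the metadata file for the actual request body.

```python
import json
import urllib.request

def build_prediction_request(api_url: str, jwt_token: str, rows: list):
    # POST feature rows to the API Gateway endpoint fronting the SageMaker
    # inference pipeline; the JWT goes in the Authorization header, where
    # the custom authorizer validates it.
    body = json.dumps({"instances": rows}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {jwt_token}",
        },
        method="POST",
    )

# Sending it requires a deployed endpoint and a valid Salesforce JWT:
#   with urllib.request.urlopen(build_prediction_request(url, token, rows)) as r:
#       print(json.loads(r.read()))
```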

Use the Salesforce Einstein Studio API for predictions

Salesforce Einstein Studio is a new and centralized experience in Salesforce Data Cloud that data science and engineering teams can use to easily access their traditional models and the LLMs used in generative AI. Next, we set up the API URL and the client_id that you set in Secrets Manager earlier in Salesforce Einstein Studio to register and use the model inferences in Salesforce Einstein Studio. For instructions, refer to Bring Your Own AI Models to Salesforce with Einstein Studio.

Clean up

To delete all the resources created by the SageMaker project, on the project page, choose the Actions menu and choose Delete.

To delete the resources (API Gateway and SageMaker endpoint) created by CodePipeline, navigate to the AWS CloudFormation console and delete the stack that was created.


Conclusion

In this post, we explained how you can build and train ML models in SageMaker Studio using SageMaker Data Wrangler to import and prepare data that is hosted on the Salesforce Data Cloud, and how to use the newly launched Salesforce Data Cloud JDBC connector in SageMaker Data Wrangler and the first-party Salesforce template in the SageMaker provided project template for Salesforce Data Cloud integration. The SageMaker project template for Salesforce enables you to deploy the model, create the endpoint, and secure an API for a registered model. You then use the API to make predictions in Salesforce Einstein Studio for your business use cases.

Although we used the example of product recommendation to showcase the steps for implementing the end-to-end integration, you can use the SageMaker project template for Salesforce to create an endpoint and API for any SageMaker traditional model or LLM that is registered in the SageMaker Model Registry. We look forward to seeing what you build in SageMaker using data from Salesforce Data Cloud, and to you empowering your Salesforce applications using SageMaker hosted ML models!

This post is a continuation of the series regarding Salesforce Data Cloud and SageMaker integration. For a high-level overview and to learn more about the business impact you can make with this integration approach, refer to Part 1.

Additional resources

About the authors

Daryl Martis is the Director of Product for Einstein Studio at Salesforce Data Cloud. He has over 10 years of experience in planning, building, launching, and managing world-class solutions for enterprise customers, including AI/ML and cloud solutions. He previously worked in the financial services industry in New York City.

Rachna Chadha is a Principal Solutions Architect AI/ML in Strategic Accounts at AWS. Rachna is an optimist who believes that ethical and responsible use of AI can improve society in the future and bring economic and social prosperity. In her spare time, Rachna likes spending time with her family, hiking, and listening to music.

Ife Stewart is a Principal Solutions Architect in the Strategic ISV segment at AWS. She has been engaged with Salesforce Data Cloud over the last 2 years to help build integrated customer experiences across Salesforce and AWS. Ife has over 10 years of experience in technology. She is an advocate for diversity and inclusion in the technology field.

Dharmendra Kumar Rai (DK Rai) is a Sr. Data Architect, Data Lake & AI/ML, serving strategic customers. He works closely with customers to understand how AWS can help them solve problems, especially in the AI/ML and analytics space. DK has many years of experience in building data-intensive solutions across a range of industry verticals, including high-tech, FinTech, insurance, and consumer-facing applications.

Marc Karp is an ML Architect with the Amazon SageMaker Service team. He focuses on helping customers design, deploy, and manage ML workloads at scale. In his spare time, he enjoys traveling and exploring new places.
