DeepSeek-R1 mannequin now accessible in Amazon Bedrock Market and Amazon SageMaker JumpStart

Right now, we’re saying that DeepSeek AI’s first-generation frontier mannequin, DeepSeek-R1, is out there by way of Amazon SageMaker JumpStart and Amazon Bedrock Marketplace to deploy for inference. Now you can use DeepSeek-R1 to construct, experiment, and responsibly scale your generative AI concepts on AWS.

On this submit, we exhibit methods to get began with DeepSeek-R1 on Amazon Bedrock and SageMaker JumpStart.

Overview of DeepSeek-R1

DeepSeek-R1 is a big language mannequin (LLM) developed by DeepSeek-AI that makes use of reinforcement studying to reinforce reasoning capabilities by way of a multi-stage coaching course of from a DeepSeek-V3-Base basis. A key distinguishing function is its reinforcement learning (RL) step, which was used to refine the mannequin’s responses past the usual pre-training and fine-tuning course of. By incorporating RL, DeepSeek-R1 can adapt extra successfully to person suggestions and targets, in the end enhancing each relevance and readability. As well as, DeepSeek-R1 employs a chain-of-thought (CoT) method, which means it’s outfitted to interrupt down advanced queries and motive by way of them in a step-by-step method. This guided reasoning course of permits the mannequin to supply extra correct, clear, and detailed solutions. This mannequin combines RL-based fine-tuning with CoT capabilities, aiming to generate structured responses whereas specializing in interpretability and person interplay. With its wide-ranging capabilities DeepSeek-R1 has captured the business’s consideration as a flexible text-generation mannequin that may be built-in into numerous workflows comparable to brokers, logical reasoning and information interpretation duties

DeepSeek-R1 makes use of a Combination of Consultants (MoE) structure and is 671 billion parameters in measurement. The MoE structure permits activation of 37 billion parameters, enabling environment friendly inference by routing queries to essentially the most related knowledgeable “clusters.” This method permits the mannequin to focus on totally different downside domains whereas sustaining general effectivity. DeepSeek-R1 requires not less than 800 GB of HBM reminiscence in FP8 format for inference. On this submit, we’ll use an ml.p5e.48xlarge occasion to deploy the mannequin. ml.p5e.48xlarge comes with 8 Nvidia H200 GPUs offering 1128 GB of GPU reminiscence.

You possibly can deploy DeepSeek-R1 mannequin both by way of SageMaker JumpStart or Bedrock Marketplace. As a result of DeepSeek-R1 is an rising mannequin, we advocate deploying this mannequin with guardrails in place. On this weblog, we’ll use Amazon Bedrock Guardrails to introduce safeguards, forestall dangerous content material, and consider fashions towards key security standards. On the time of scripting this weblog, for DeepSeek-R1 deployments on SageMaker JumpStart and Bedrock Market, Bedrock Guardrails helps solely the ApplyGuardrail API. You possibly can create a number of guardrails tailor-made to totally different use instances and apply them to the DeepSeek-R1 mannequin, bettering person experiences and standardizing security controls throughout your generative AI purposes.

Stipulations

To deploy the DeepSeek-R1 mannequin, you want entry to an ml.p5e occasion. To test in case you have quotas for P5e, open the Service Quotas console and underneath AWS Providers, select Amazon SageMaker, and make sure you’re utilizing ml.p5e.48xlarge for endpoint utilization. Just be sure you have not less than one ml.P5e.48xlarge occasion within the AWS Area you’re deploying. To request a restrict enhance, create a limit increase request and attain out to your account staff.

As a result of you may be deploying this mannequin with Amazon Bedrock Guardrails, ensure you have the proper AWS Identity and Access Management (IAM) permissions to make use of Amazon Bedrock Guardrails. For directions, see Set up permissions to use guardrails for content filtering.

Implementing guardrails with the ApplyGuardrail API

Amazon Bedrock Guardrails means that you can introduce safeguards, forestall dangerous content material, and consider fashions towards key security standards. You possibly can implement security measures for the DeepSeek-R1 mannequin utilizing the Amazon Bedrock ApplyGuardrail API. This lets you apply guardrails to guage person inputs and mannequin responses deployed on Amazon Bedrock Market and SageMaker JumpStart. You possibly can create a guardrail utilizing the Amazon Bedrock console or the API. For the instance code to create the guardrail, see the GitHub repo.

The final circulation entails the next steps: First, the system receives an enter for the mannequin. This enter is then processed by way of the ApplyGuardrail API. If the enter passes the guardrail test, it’s despatched to the mannequin for inference. After receiving the mannequin’s output, one other guardrail test is utilized. If the output passes this ultimate test, it’s returned as the ultimate end result. Nevertheless, if both the enter or output is intervened by the guardrail, a message is returned indicating the character of the intervention and whether or not it occurred on the enter or output stage. The examples showcased within the following sections exhibit inference utilizing this API.

Deploy DeepSeek-R1 in Amazon Bedrock Market

Amazon Bedrock Market provides you entry to over 100 standard, rising, and specialised foundation models (FMs) by way of Amazon Bedrock. To entry DeepSeek-R1 in Amazon Bedrock, full the next steps:

On the Amazon Bedrock console, select Mannequin catalog underneath Basis fashions within the navigation pane.
On the time of scripting this submit, you need to use the InvokeModel API to invoke the mannequin. It doesn’t assist Converse APIs and different Amazon Bedrock tooling.
Filter for DeepSeek as a supplier and select the DeepSeek-R1 mannequin.

The mannequin element web page offers important details about the mannequin’s capabilities, pricing construction, and implementation tips. You will discover detailed utilization directions, together with pattern API calls and code snippets for integration. The mannequin helps numerous textual content technology duties, together with content material creation, code technology, and query answering, utilizing its reinforcement studying optimization and CoT reasoning capabilities.
The web page additionally contains deployment choices and licensing data that will help you get began with DeepSeek-R1 in your purposes.
To start utilizing DeepSeek-R1, select Deploy.

You can be prompted to configure the deployment particulars for DeepSeek-R1. The mannequin ID shall be pre-populated.
For Endpoint identify, enter an endpoint identify (between 1–50 alphanumeric characters).
For Variety of cases, enter a lot of cases (between 1–100).
For Occasion sort, select your occasion sort. For optimum efficiency with DeepSeek-R1, a GPU-based occasion sort like ml.p5e.48xlarge is advisable.
Optionally, you possibly can configure superior safety and infrastructure settings, together with digital personal cloud (VPC) networking, service function permissions, and encryption settings. For many use instances, the default settings will work nicely. Nevertheless, for manufacturing deployments, you may need to evaluate these settings to align along with your group’s safety and compliance necessities.
Select Deploy to start utilizing the mannequin.

When the deployment is full, you possibly can check DeepSeek-R1’s capabilities instantly within the Amazon Bedrock playground.
Select Open in playground to entry an interactive interface the place you possibly can experiment with totally different prompts and modify mannequin parameters like temperature and most size.
When utilizing R1 with Bedrock’s InvokeModel and Playground Console, use DeepSeek’s chat template for optimum outcomes. For instance, <｜start▁of▁sentence｜><｜Consumer｜>content material for inference<｜Assistant｜> .

This is a wonderful technique to discover the mannequin’s reasoning and textual content technology talents earlier than integrating it into your purposes. The playground offers quick suggestions, serving to you perceive how the mannequin responds to numerous inputs and letting you fine-tune your prompts for optimum outcomes.

You possibly can shortly check the mannequin within the playground by way of the UI. Nevertheless, to invoke the deployed mannequin programmatically with any Amazon Bedrock APIs, it’s essential to get the endpoint ARN.

Run inference utilizing guardrails with the deployed DeepSeek-R1 endpoint

The next code instance demonstrates methods to carry out inference utilizing a deployed DeepSeek-R1 mannequin by way of Amazon Bedrock utilizing the invoke_model and ApplyGuardrail API. You possibly can create a guardrail utilizing the Amazon Bedrock console or the API. For the instance code to create the guardrail, see the GitHub repo. After you might have created the guardrail, use the next code to implement guardrails. The script initializes the bedrock_runtime consumer, configures inference parameters, and sends a request to generate textual content primarily based on a person immediate.

import boto3
import json

# Initialize Bedrock consumer
bedrock_runtime = boto3.consumer("bedrock-runtime")

# Configuration
MODEL_ID = "your-model-id"  # Bedrock mannequin ID
GUARDRAIL_ID = "your-guardrail-id"
GUARDRAIL_VERSION = "your-guardrail-version"

def invoke_with_guardrails(immediate, max_tokens=1000, temperature=0.6, top_p=0.9):
    """
    Invoke Bedrock mannequin with enter and output guardrails
    """
    # Apply enter guardrails
    input_guardrail = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=GUARDRAIL_ID,
        guardrailVersion=GUARDRAIL_VERSION,
        supply="INPUT",
        content material=[{"text": {"text": prompt}}]
    )
    
    if input_guardrail['action'] == 'GUARDRAIL_INTERVENED':
        return f"Enter blocked: {input_guardrail['outputs'][0]['text']}"

    # Put together mannequin enter
    request_body = {
        "inputs": f"""You might be an AI assistant. Do because the person asks.
### Instruction: {immediate}
### Response: <assume>""",
        "parameters": {
            "max_new_tokens": max_tokens,
            "top_p": top_p,
            "temperature": temperature
        }
    }

    # Invoke mannequin
    response = bedrock_runtime.invoke_model(
        modelId=MODEL_ID,
        physique=json.dumps(request_body)
    )
    
    # Parse mannequin response
    model_output = json.hundreds(response['body'].learn())['generated_text']

    # Apply output guardrails
    output_guardrail = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=GUARDRAIL_ID,
        guardrailVersion=GUARDRAIL_VERSION,
        supply="OUTPUT",
        content material=[{"text": {"text": model_output}}]
    )

    if output_guardrail['action'] == 'GUARDRAIL_INTERVENED':
        return f"Output blocked: {output_guardrail['outputs'][0]['text']}"
    
    return model_output

# Instance utilization
if __name__ == "__main__":
    immediate = "What's 1+1?"
    end result = invoke_with_guardrails(immediate)
    print(end result)

Deploy DeepSeek-R1 with SageMaker JumpStart

SageMaker JumpStart is a machine studying (ML) hub with FMs, built-in algorithms, and prebuilt ML options which you could deploy with only a few clicks. With SageMaker JumpStart, you possibly can customise pre-trained fashions to your use case, along with your information, and deploy them into manufacturing utilizing both the UI or SDK.

Deploying DeepSeek-R1 mannequin by way of SageMaker JumpStart presents two handy approaches: utilizing the intuitive SageMaker JumpStart UI or implementing programmatically by way of the SageMaker Python SDK. Let’s discover each strategies that will help you select the method that most closely fits your wants.

Deploy DeepSeek-R1 by way of SageMaker JumpStart UI

Full the next steps to deploy DeepSeek-R1 utilizing SageMaker JumpStart:

On the SageMaker console, select Studio within the navigation pane.
First-time customers shall be prompted to create a website.
On the SageMaker Studio console, select JumpStart within the navigation pane.

The mannequin browser shows accessible fashions, with particulars just like the supplier identify and mannequin capabilities.
Seek for DeepSeek-R1 to view the DeepSeek-R1 mannequin card.
Every mannequin card exhibits key data, together with:
- Mannequin identify
- Supplier identify
- Process class (for instance, Textual content Technology)
- Bedrock Prepared badge (if relevant), indicating that this mannequin might be registered with Amazon Bedrock, permitting you to make use of Amazon Bedrock APIs to invoke the mannequin
Select the mannequin card to view the mannequin particulars web page.

The mannequin particulars web page contains the next data:
- The mannequin identify and supplier data
- Deploy button to deploy the mannequin
- About and Notebooks tabs with detailed data
The About tab contains essential particulars, comparable to:
- Mannequin description
- License data
- Technical specs
- Utilization tips
Earlier than you deploy the mannequin, it’s advisable to evaluate the mannequin particulars and license phrases to verify compatibility along with your use case.
Select Deploy to proceed with deployment.
For Endpoint identify, use the robotically generated identify or create a customized one.
For Occasion sort¸ select an occasion sort (default: ml.p5e.48xlarge).
For Preliminary occasion rely, enter the variety of cases (default: 1).
Deciding on acceptable occasion varieties and counts is essential for value and efficiency optimization. Monitor your deployment to regulate these settings as wanted.Below Inference sort, Actual-time inference is chosen by default. That is optimized for sustained visitors and low latency.
Evaluate all configurations for accuracy. For this mannequin, we strongly advocate adhering to SageMaker JumpStart default settings and ensuring that community isolation stays in place.
Select Deploy to deploy the mannequin.

The deployment course of can take a number of minutes to finish.

When deployment is full, your endpoint standing will change to InService. At this level, the mannequin is able to settle for inference requests by way of the endpoint. You possibly can monitor the deployment progress on the SageMaker console Endpoints web page, which is able to show related metrics and standing data. When the deployment is full, you possibly can invoke the mannequin utilizing a SageMaker runtime client and combine it along with your purposes.

Deploy DeepSeek-R1 utilizing the SageMaker Python SDK

To get began with DeepSeek-R1 utilizing the SageMaker Python SDK, you have to to put in the SageMaker Python SDK and ensure you have the mandatory AWS permissions and setting setup. The next is a step-by-step code instance that demonstrates methods to deploy and use DeepSeek-R1 for inference programmatically. The code for deploying the mannequin is offered within the Github here . You possibly can clone the pocket book and run from SageMaker Studio.

!pip set up --force-reinstall --no-cache-dir sagemaker==2.235.2

from sagemaker.serve.builder.model_builder import ModelBuilder 
from sagemaker.serve.builder.schema_builder import SchemaBuilder 
from sagemaker.jumpstart.mannequin import ModelAccessConfig 
from sagemaker.session import Session 
import logging 

sagemaker_session = Session()
 
artifacts_bucket_name = sagemaker_session.default_bucket() 
execution_role_arn = sagemaker_session.get_caller_identity_arn()
 
js_model_id = "deepseek-llm-r1"

gpu_instance_type = "ml.p5e.48xlarge"
 
response = "Howdy, I am a language mannequin, and I am right here that will help you along with your English."

 sample_input = {
 "inputs": "Howdy, I am a language mannequin,",
 "parameters": {"max_new_tokens": 128, "top_p": 0.9, "temperature": 0.6},
 }
  
 sample_output = [{"generated_text": response}]
  
 schema_builder = SchemaBuilder(sample_input, sample_output)
  
 model_builder = ModelBuilder( 
 mannequin=js_model_id, 
 schema_builder=schema_builder, 
 sagemaker_session=sagemaker_session, 
 role_arn=execution_role_arn, 
 log_level=logging.ERROR ) 
 
 mannequin= model_builder.construct() 
 predictor = mannequin.deploy(model_access_configs={js_model_id:ModelAccessConfig(accept_eula=True)}, accept_eula=True) 
 
 
 predictor.predict(sample_input)

You possibly can run further requests towards the predictor:

new_input = {
    "inputs": "What's Amazon doing in Generative AI?",
    "parameters": {"max_new_tokens": 64, "top_p": 0.8, "temperature": 0.7},
}

prediction = predictor.predict(new_input)
print(prediction)

Implement guardrails and run inference along with your SageMaker JumpStart predictor

Just like Amazon Bedrock, you may as well use the ApplyGuardrail API along with your SageMaker JumpStart predictor. You possibly can create a guardrail utilizing the Amazon Bedrock console or the API, and implement it as proven within the following code:

import boto3
import json
bedrock_runtime = boto3.consumer('bedrock-runtime')
sagemaker_runtime = boto3.consumer('sagemaker-runtime')

# Add your guardrail identifier and model created from Bedrock Console or AWSCLI
guardrail_id = "" # Your Guardrail ID
guardrail_version = "" # Your Guardrail Model
endpoint_name = "" # Endpoint Identify

immediate = "What's 1+1 equal?"

# Apply guardrail to enter earlier than sending to mannequin
input_guardrail_response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier=guardrail_id,
    guardrailVersion=guardrail_version,
    supply="INPUT",
    content material=[{ "text": { "text": prompt }}]
)

# If enter guardrail passes, proceed with mannequin inference
if input_guardrail_response['action'] != 'GUARDRAIL_INTERVENED':
    # Put together the enter for the SageMaker endpoint
    template = f"""You might be an AI assistant. Do because the person asks.
### Instruction: {immediate}
### Response: <assume>"""
    
    input_payload = {
        "inputs": template,
        "parameters": {
            "max_new_tokens": 1000,
            "top_p": 0.9,
            "temperature": 0.6
        }
    }
    
    # Convert the payload to JSON string
    input_payload_json = json.dumps(input_payload)
    
    # Invoke the SageMaker endpoint
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="software/json",
        Physique=input_payload_json
    )
    
    # Get the response from the mannequin
    model_response = json.hundreds(response['Body'].learn().decode())
    
    # Apply guardrail to output
    output_guardrail_response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        supply="OUTPUT",
        content material=[{ "text": { "text": model_response['generated_text'] }}]
    )
    
    # Test if output passes guardrails
    if output_guardrail_response['action'] != 'GUARDRAIL_INTERVENED':
        print(model_response['generated_text'])
    else:
        print("Output blocked: ", output_guardrail_response['outputs'][0]['text'])
else:
    print("Enter blocked: ", input_guardrail_response['outputs'][0]['text'])

Clear up

To keep away from undesirable expenses, full the steps on this part to wash up your sources.

Delete the Amazon Bedrock Market deployment

In the event you deployed the mannequin utilizing Amazon Bedrock Market, full the next steps:

On the Amazon Bedrock console, underneath Basis fashions within the navigation pane, select Market deployments.
Within the Managed deployments part, find the endpoint you need to delete.
Choose the endpoint, and on the Actions menu, select Delete.
Confirm the endpoint particulars to ensure you’re deleting the proper deployment:
1. Endpoint identify
2. Mannequin identify
3. Endpoint standing
Select Delete to delete the endpoint.
Within the deletion affirmation dialog, evaluate the warning message, enter verify, and select Delete to completely take away the endpoint.

Delete the SageMaker JumpStart predictor

The SageMaker JumpStart mannequin you deployed will incur prices when you depart it operating. Use the next code to delete the endpoint if you wish to cease incurring expenses. For extra particulars, see Delete Endpoints and Resources.

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

On this submit, we explored how one can entry and deploy the DeepSeek-R1 mannequin utilizing Bedrock Market and SageMaker JumpStart. Go to SageMaker JumpStart in SageMaker Studio or Amazon Bedrock Market now to get began. For extra data, confer with Use Amazon Bedrock tooling with Amazon SageMaker JumpStart models, SageMaker JumpStart pretrained models, Amazon SageMaker JumpStart Foundation Models, Amazon Bedrock Marketplace, and Getting started with Amazon SageMaker JumpStart.

In regards to the Authors

Vivek Gangasani is a Lead Specialist Options Architect for Inference at AWS. He helps rising generative AI firms construct modern options utilizing AWS companies and accelerated compute. Presently, he’s centered on growing methods for fine-tuning and optimizing the inference efficiency of huge language fashions. In his free time, Vivek enjoys climbing, watching films, and attempting totally different cuisines.

Niithiyn Vijeaswaran is a Generative AI Specialist Options Architect with the Third-Social gathering Mannequin Science staff at AWS. His space of focus is AWS AI accelerators (AWS Neuron). He holds a Bachelor’s diploma in Laptop Science and Bioinformatics.

Jonathan Evans is a Specialist Options Architect engaged on generative AI with the Third-Social gathering Mannequin Science staff at AWS.

Banu Nagasundaram leads product, engineering, and strategic partnerships for Amazon SageMaker JumpStart, SageMaker’s machine studying and generative AI hub. She is captivated with constructing options that assist clients speed up their AI journey and unlock enterprise worth.

DeepSeek-R1 mannequin now accessible in Amazon Bedrock Market and Amazon SageMaker JumpStart

Overview of DeepSeek-R1

Stipulations

Implementing guardrails with the ApplyGuardrail API

Deploy DeepSeek-R1 in Amazon Bedrock Market

Run inference utilizing guardrails with the deployed DeepSeek-R1 endpoint

Deploy DeepSeek-R1 with SageMaker JumpStart

Deploy DeepSeek-R1 by way of SageMaker JumpStart UI

Deploy DeepSeek-R1 utilizing the SageMaker Python SDK

Implement guardrails and run inference along with your SageMaker JumpStart predictor

Clear up

Delete the Amazon Bedrock Market deployment

Delete the SageMaker JumpStart predictor

Conclusion

In regards to the Authors

Degree up your problem-solving and strategic considering expertise with Amazon Bedrock

Understanding Likelihood Distributions for Machine Studying with Python

Ascending Ranges of Nerd – O’Reilly

Leave a Reply Cancel reply

Moral Concerns and Greatest Practices in LLM Improvement

Degree up your problem-solving and strategic considering expertise with Amazon Bedrock

Google AI bulletins from January

EON Actuality Unveils “Human 2.0”: A Revolutionary AI-Powered Academic Framework to Unlock Creativity, Speed up Drawback-Fixing, and Maximize Human Potential – EON Actuality

Understanding Likelihood Distributions for Machine Studying with Python

Overview of DeepSeek-R1

Stipulations

Implementing guardrails with the ApplyGuardrail API

Deploy DeepSeek-R1 in Amazon Bedrock Market

Run inference utilizing guardrails with the deployed DeepSeek-R1 endpoint

Deploy DeepSeek-R1 with SageMaker JumpStart

Deploy DeepSeek-R1 by way of SageMaker JumpStart UI

Deploy DeepSeek-R1 utilizing the SageMaker Python SDK

Implement guardrails and run inference along with your SageMaker JumpStart predictor

Clear up

Delete the Amazon Bedrock Market deployment

Delete the SageMaker JumpStart predictor

Conclusion

In regards to the Authors

More Stories

Leave a Reply Cancel reply

You may have missed