Customize DeepSeek-R1 671b model using Amazon SageMaker HyperPod recipes – Part 2


This post is the second part of the DeepSeek series focusing on model customization with Amazon SageMaker HyperPod recipes (or recipes for brevity). In Part 1, we demonstrated the performance and ease of fine-tuning DeepSeek-R1 distilled models using these recipes. In this post, we use the recipes to fine-tune the original DeepSeek-R1 671b parameter model. We demonstrate this through the step-by-step implementation of these recipes using both SageMaker training jobs and SageMaker HyperPod.

Business use case

After its public release, the DeepSeek-R1 model, developed by DeepSeek AI, showed impressive results across multiple evaluation benchmarks. The model follows the Mixture of Experts (MoE) architecture and has 671 billion parameters. Traditionally, large models are well adapted to a wide spectrum of generalized tasks by virtue of being trained on massive amounts of data. The DeepSeek-R1 model was trained on 14.8 trillion tokens. The original R1 model demonstrates strong few-shot and zero-shot learning capabilities, allowing it to generalize to new tasks and scenarios that weren't part of its original training.

However, many customers prefer to either fine-tune or run continuous pre-training of these models to adapt them to their specific business applications or to optimize them for specific tasks. A financial organization might want to customize the model with its proprietary data to assist with its data processing tasks. Or a hospital network can fine-tune it with its patient records to act as a medical assistant for its doctors. Fine-tuning can also extend the model's generalization ability. Customers can fine-tune it with a corpus of text in specific languages that aren't fully represented in the original training data. For example, a model fine-tuned with an additional trillion tokens of Hindi language will be able to extend the same generalization capabilities to Hindi.

The decision on which model to fine-tune depends on the end application as well as the available dataset. Based on the volume of proprietary data, customers can decide to fine-tune the larger DeepSeek-R1 model instead of one of the distilled versions. In addition, the R1 models have their own set of guardrails. Customers might want to fine-tune to update those guardrails or expand on them.

Fine-tuning larger models like DeepSeek-R1 requires careful optimization to balance cost, deployment requirements, and performance effectiveness. To achieve optimal results, organizations must meticulously select an appropriate environment, determine the best hyperparameters, and implement efficient model sharding strategies.

Solution architecture

SageMaker HyperPod recipes effectively address these requirements by providing a carefully curated mix of distributed training techniques, optimizations, and configurations for state-of-the-art (SOTA) open source models. These recipes have undergone extensive benchmarking, testing, and validation to provide seamless integration with the SageMaker training and fine-tuning processes.

In this post, we explore solutions that demonstrate how to fine-tune the DeepSeek-R1 model using these recipes on either SageMaker HyperPod or SageMaker training jobs. Your choice between these services will depend on your specific requirements and preferences. If you require granular control over training infrastructure and extensive customization options, SageMaker HyperPod is the ideal choice. SageMaker training jobs, on the other hand, is tailored for organizations that want a fully managed experience for their training workflows. To learn more about these service features, refer to Generative AI foundation model training on Amazon SageMaker.

The following diagram illustrates the solution architecture for training using SageMaker HyperPod. With HyperPod, users begin the process by connecting to the login/head node of the Slurm cluster. Each step is run as a Slurm job and uses Amazon FSx for Lustre for storing model checkpoints. For DeepSeek-R1, the process consists of the following steps:

  1. Download the DeepSeek-R1 model and convert weights from FP8 to BF16 format
  2. Load the model into memory and perform fine-tuning using Quantized Low-Rank Adaptation (QLoRA)
  3. Merge QLoRA adapters with the base model
  4. Convert and load the model for batch evaluation

The following diagram illustrates the solution architecture for SageMaker training jobs. You can execute each step in the training pipeline by initiating the process through the SageMaker control plane using APIs, the AWS Command Line Interface (AWS CLI), or the SageMaker ModelTrainer SDK. In response, SageMaker launches training jobs with the requested number and type of compute instances to run specific tasks. For DeepSeek-R1, the process consists of three main steps:

  1. Download and convert R1 to BF16 datatype format
  2. Load the model into memory and perform fine-tuning
  3. Consolidate and load the checkpoints into memory, then run inference and metrics to evaluate performance improvements

Prerequisites

Complete the following prerequisites before running the DeepSeek-R1 671B model fine-tuning notebook:

  1. Make the following quota increase requests for SageMaker. You need to request a minimum of two ml.p5.48xlarge instances (with 8 x NVIDIA H100 GPUs each) up to a maximum of four ml.p5.48xlarge instances (depending on time-to-train and cost-to-train trade-offs for your use case). On the Service Quotas console, request the following SageMaker quotas (a hedged CLI alternative is sketched after this list). It can take up to 24 hours for the quota increase to be approved:
    • P5 instances (ml.p5.48xlarge) for training job usage: 2–4
    • P5 instances (ml.p5.48xlarge) for HyperPod clusters (ml.p5.48xlarge for cluster usage): 2–4
  2. If you choose to use HyperPod clusters to run your training, set up a HyperPod Slurm cluster, referring to the Amazon SageMaker HyperPod Developer Guide. Alternatively, you can also use the AWS CloudFormation template provided in the Own Account workshop and follow the instructions to set up a cluster and a development environment to access and submit jobs to the cluster.
  3. (Optional) If you choose to use SageMaker training jobs, you can create an Amazon SageMaker Studio domain (refer to Use quick setup for Amazon SageMaker AI) to access Jupyter notebooks with the preceding role (you can use JupyterLab in your local setup, too).
    1. Create an AWS Identity and Access Management (IAM) role with the managed policies AmazonSageMakerFullAccess, AmazonFSxFullAccess, and AmazonS3FullAccess to give SageMaker the necessary access to run the examples.
  4. Clone the GitHub repository with the assets for this deployment. This repository consists of a notebook that references training assets:
git clone https://github.com/aws-samples/sagemaker-distributed-training-workshop.git
cd 18_sagemaker_training_recipes/ft_deepseek_r1_qlora
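If you prefer the AWS CLI over the Service Quotas console for the quota requests in step 1, the following is a minimal sketch; the quota code shown is a placeholder that you look up with the first command (if it returns nothing, try list-aws-default-service-quotas instead):

# Find the quota codes for the ml.p5.48xlarge quotas, then request an increase.
# L-XXXXXXXX below is a placeholder; substitute the code returned above.
aws service-quotas list-service-quotas --service-code sagemaker \
    --query "Quotas[?contains(QuotaName, 'p5.48xlarge')].[QuotaName,QuotaCode]" --output table
aws service-quotas request-service-quota-increase --service-code sagemaker \
    --quota-code L-XXXXXXXX --desired-value 2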

Solution walkthrough

To implement the solution, follow the steps in the next sections.

Technical considerations

The default weights provided by the DeepSeek team on their official R1 repository are of type FP8. However, we chose to disable FP8 in our recipes because we empirically found that training with BF16 improves generalization across diverse datasets with minimal changes to the recipe hyperparameters. Therefore, to achieve stable fine-tuning for a model of 671b parameter size, we recommend first converting the model from FP8 to BF16 using the fp8_cast_bf16.py command-line script provided by DeepSeek. Executing this script copies the converted BF16 weights in Safetensors format to the specified output directory. Remember to copy the model's config.yaml to the output directory so the weights load correctly. These steps are encapsulated in a prologue script and are documented step-by-step under the Fine-tuning section.

Customers can use a sequence length of 8K for training, as tested on p5.48xlarge instances, each equipped with eight NVIDIA H100 GPUs. You can also choose a smaller sequence length if needed. Training with a sequence length greater than 8K might lead to out-of-memory issues on the GPUs. Also, converting the model weights from FP8 to BF16 requires a p5.48xlarge instance, which is also recommended for training because of the model's high host memory requirements during initialization.

Customers must upgrade their transformers version to transformers==4.48.2 to run the training.
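For example, in the Python environment where you run the training:

pip install transformers==4.48.2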

Fine-tuning

Run the finetune_deepseek_r1_671_qlora.ipynb notebook to fine-tune the DeepSeek-R1 model using QLoRA on SageMaker.

Prepare the dataset

This section covers loading the FreedomIntelligence/medical-o1-reasoning-SFT dataset, tokenizing and chunking the dataset, and configuring the data channels for SageMaker training on Amazon Simple Storage Service (Amazon S3). Complete the following steps:

  1. Format the dataset by applying the prompt format for DeepSeek-R1:
def generate_prompt(data_point):
    full_prompt = f"""
Below is an instruction that describes a task, paired with an input
that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in medical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{data_point["Question"]}

### Response:
{data_point["Complex_CoT"]}

"""
    return {"prompt": full_prompt.strip()}

  2. Load the FreedomIntelligence/medical-o1-reasoning-SFT dataset and split it into training and validation datasets:
# Load dataset from the hub
train_set = load_dataset(dataset_name, 'en', split="train[5%:]")
test_set = load_dataset(dataset_name, 'en', split="train[:5%]")

...

train_dataset = train_set.map(
    generate_and_tokenize_prompt,
    remove_columns=columns_to_remove,
    batched=False
)

test_dataset = test_set.map(
    generate_and_tokenize_prompt,
    remove_columns=columns_to_remove,
    batched=False
)

  3. Load the DeepSeek-R1 tokenizer from the Hugging Face Transformers library and generate tokens for the training and validation datasets. We use the original sequence length of 8K:
model_id = "deepseek-ai/DeepSeek-R1"
max_seq_length=8096

# Initialize a tokenizer by loading a pre-trained tokenizer configuration, using the fast tokenizer implementation if available.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

...

train_dataset = train_dataset.map(tokenize, remove_columns=["prompt"])
test_dataset = test_dataset.map(tokenize, remove_columns=["prompt"])
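The tokenize helper used in the map calls above isn't shown in this excerpt; a minimal sketch, assuming simple truncation to the configured sequence length (the notebook's actual padding and truncation settings may differ):

# Illustrative only; the notebook's actual tokenize function may differ.
def tokenize(data_point):
    return tokenizer(
        data_point["prompt"],
        truncation=True,
        max_length=max_seq_length,
    )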

  4. Prepare the training and validation datasets for SageMaker training by saving them as Arrow files, as required by SageMaker HyperPod recipes, and set up the S3 paths where these files will be uploaded. This dataset will be used in both the SageMaker training jobs and SageMaker HyperPod examples:
train_dataset_s3_path = f"s3://{bucket_name}/{input_path}/train"
val_dataset_s3_path = f"s3://{bucket_name}/{input_path}/test"

train_dataset.save_to_disk(train_dataset_s3_path)
val_dataset.save_to_disk(val_dataset_s3_path)

The next section describes how to run a fine-tuning example with SageMaker training jobs.

Option A: Fine-tune using SageMaker training jobs

Follow these high-level steps:

  1. Download DeepSeek-R1 to the FSx for Lustre mounted directory
  2. Convert DeepSeek-R1 from FP8 to BF16
  3. Fine-tune the DeepSeek-R1 model
  4. Merge the trained adapter with the base model

Define a utility function to create the ModelTrainer class for every step of the SageMaker training jobs pipeline:

# Creates and executes a model training job using SageMaker
def create_model_trainer(
    use_recipes: bool,
    compute: dict,
    network: dict,
    data_channel: dict,
    action: str,
    hyperparameters: dict ={},
    source_code: str=None,
    training_recipe: str=None,
    recipe_overrides: str=None,
    image_uri: str=None
) -> ModelTrainer:

...

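The body of this helper is elided in the notebook excerpt above. A minimal sketch of what it could do, assuming the ModelTrainer interface from the SageMaker Python SDK (sagemaker.modules); exact keyword arguments, including from_recipe and networking, may differ by SDK version and are assumptions here:

# Illustrative sketch only; not the notebook's actual implementation.
from sagemaker.modules.train import ModelTrainer

def create_model_trainer(use_recipes, compute, network, data_channel, action,
                         hyperparameters={}, source_code=None, training_recipe=None,
                         recipe_overrides=None, image_uri=None) -> ModelTrainer:
    if use_recipes:
        # Recipe-driven steps (for example, the QLoRA fine-tuning job)
        return ModelTrainer.from_recipe(
            training_recipe=training_recipe,
            recipe_overrides=recipe_overrides,
            compute=compute,
            base_job_name=f"deepseek-r1-{action}",
        )
    # Script-driven steps (download, convert, merge adapter)
    return ModelTrainer(
        training_image=image_uri,
        source_code=source_code,
        compute=compute,
        networking=network,
        hyperparameters=hyperparameters,
        base_job_name=f"deepseek-r1-{action}",
    )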
Download DeepSeek-R1 to the FSx for Lustre mounted directory

Follow these steps:

  1. Select the instance type, Amazon FSx data channel, network configuration for the training job, and source code, then define the ModelTrainer class to run the training job on an ml.c5.18xlarge instance to download DeepSeek-R1 from the Hugging Face DeepSeek-R1 hub:
# Create compute instance
compute = ComputeCreator.create(
    instance_type="ml.c5.18xlarge",
    instance_count=1
)

# Create FSx data channel
data_channel = FSxDataChannelCreator.create_channel(
    directory_path=fsx_mount_point
)

# Create network configuration
network = NetworkConfigCreator.create_network_config(network_config)

# Set up source code configuration
source_code = SourceCode(
    source_dir="scripts",
    entry_script="download.py"
)
...

# Create model trainer
model_trainer = create_model_trainer(
    compute=compute,
    network=network,
    data_channel=data_channel,
    action="download",
    source_code=source_code
    ...
)

  2. Initiate the training by calling the train function of the ModelTrainer class:
model_trainer.train(input_data_config=[data_channel], wait=True)
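The entry script scripts/download.py isn't shown in the post; a minimal sketch of what such a script could do, assuming it uses huggingface_hub (the local_dir is an assumption about where the FSx data channel is mounted inside the training container):

# Illustrative sketch of a download entry script; the repository's actual
# scripts/download.py may differ.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1",
    local_dir="/opt/ml/input/data/fsx/deepseek-r1",  # hypothetical mount path
)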

Convert DeepSeek-R1 from FP8 to BF16

Use ModelTrainer to convert the downloaded DeepSeek-R1 model weights from FP8 to BF16 format for optimal PEFT training. We use the script convert.sh to run the execution on an ml.p5.48xlarge instance.

Use the SageMaker training warm pool configuration to retain and reuse the provisioned infrastructure after the completion of the model download training job in the previous step:

# Define constants
FSX_MODELDIR_BF16 = "deepseek-r1-bf16"
FSX_DIR_PATH = f"{fsx_mount_point}/{fsx_dir_basemodel}"

# Create compute instance
compute = ComputeCreator.create(
    instance_type="ml.p5.48xlarge",
    instance_count=1
)

...

# Set up source code configuration
source_code = SourceCode(
    source_dir="scripts",
    entry_script="convert.sh"
)

...
# Create model trainer for conversion
model_trainer = create_model_trainer(
    ..
    action="convert",
    ...
)

Fine-tune the DeepSeek-R1 model

The next phase involves fine-tuning the DeepSeek-R1 model using two ml.p5.48xlarge instances with distributed training. You implement this through the SageMaker recipe hf_deepseek_r1_671b_seq8k_gpu_qlora, which incorporates the QLoRA methodology. QLoRA makes the large language model (LLM) trainable on limited compute by quantizing the base model to 4-bit precision while using small, trainable low-rank adapters for fine-tuning, dramatically reducing memory requirements without sacrificing model quality:

# Create compute configuration with P5 instances
compute = ComputeCreator.create(
    instance_type="ml.p5.48xlarge",
    instance_count=2
)

...

# Create model trainer for fine-tuning
model_trainer = create_model_trainer(
    use_recipes=True,
    ...
    action="finetune",
    training_recipe="fine-tuning/deepseek/hf_deepseek_r1_671b_seq8k_gpu_qlora",
    recipe_overrides=recipe_overrides
)
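The recipe_overrides dictionary referenced above isn't shown in this excerpt. An illustrative example is sketched below; the keys mirror the recipe YAML structure used later in the launcher script, but the specific values (paths, node count) are assumptions for this sketch:

# Illustrative recipe_overrides; adjust paths and node count to your setup.
recipe_overrides = {
    "run": {"name": "deepseek-r1-671b-qlora"},
    "trainer": {"num_nodes": 2},
    "exp_manager": {"exp_dir": "/opt/ml/input/data/fsx/checkpoints"},
    "model": {
        "hf_model_name_or_path": "/opt/ml/input/data/fsx/deepseek-r1-bf16",
        "data": {
            "train_dir": "/opt/ml/input/data/train",
            "val_dir": "/opt/ml/input/data/test",
        },
    },
}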

Initiate the training job to fine-tune the model. SageMaker training jobs will provision two P5 instances, orchestrate the SageMaker model parallel container smdistributed-modelparallel:2.4.1-gpu-py311-cu121, and execute the recipe to fine-tune DeepSeek-R1 with the QLoRA strategy on an ephemeral cluster:

model_trainer.train(input_data_config=[data_channel], wait=True)

Merge the trained adapter with the base model

Merge the trained adapters with the base model so it can be used for inference:

# Create compute configuration with a P5 instance
compute = ComputeCreator.create(
    instance_type="ml.p5.48xlarge",
    instance_count=1
)

# Configure source code location and entry point
source_code = SourceCode(
    source_dir="scripts",
    entry_script="cli-inference.sh"
)
...

# Create model trainer for adapter merging
model_trainer = create_model_trainer(
    use_recipes=False,
    ...
    action="mergeadapter",
    source_code=source_code,
)

The next section shows how you can run similar steps on HyperPod to run your generative AI workloads.

Option B: Fine-tune using SageMaker HyperPod with Slurm

To fine-tune the model using HyperPod, make sure that your cluster is up and ready by following the prerequisites mentioned earlier. To access the login/head node of the HyperPod Slurm cluster from your development environment, follow the login instructions at SSH into Cluster in the workshop.

Alternatively, you can also use AWS Systems Manager and run a command like the following to start the session. You can find the cluster ID, instance group name, and instance ID on the Amazon SageMaker console.

aws ssm start-session --target sagemaker-cluster:[cluster-id]_[instance-group-name]-[instance-id] --region region_name

  1. When you're in the cluster's login/head node, run the following commands to set up the environment. Run sudo su - ubuntu to run the remaining commands as the root user, unless you have a specific user ID to access the cluster and your POSIX user is created through a lifecycle script on the cluster. Refer to the multi-user setup for more details.
# create a virtual environment
python3 -m venv ${PWD}/venv
source venv/bin/activate

# clone the recipes repository and set up the environment
git clone --recursive https://github.com/aws/sagemaker-hyperpod-recipes.git
cd sagemaker-hyperpod-recipes
pip3 install -r requirements.txt

  2. Create a squash file using Enroot to run the job on the cluster. The Enroot runtime offers GPU acceleration, rootless container support, and seamless integration with HPC environments, making it ideal for running workflows securely.
# create a squash file using Enroot
REGION=<region>
IMAGE="658645717510.dkr.ecr.${REGION}.amazonaws.com/smdistributed-modelparallel:2.4.1-gpu-py311-cu121"
aws ecr get-login-password --region "${REGION}" | docker login --username AWS --password-stdin 658645717510.dkr.ecr.${REGION}.amazonaws.com
enroot import -o $PWD/smdistributed-modelparallel.sqsh dockerd://${IMAGE}

  3. After you've created the squash file, update the recipes_collection/config.yaml file with the absolute path to the squash file (created in the preceding step), and update the instance_type if needed. The final config file should have the following parameters:
...

cluster_type: slurm
...

instance_type: p5.48xlarge
...

container: /fsx/<path-to-smdistributed-modelparallel>.sqsh
...

Also update the file recipes_collection/cluster/slurm.yaml to add container_mounts pointing to the FSx for Lustre file system used in your cluster.
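For example, if your FSx for Lustre file system is mounted at /fsx on the cluster nodes, the entry could look like the following illustrative excerpt (the rest of the file stays unchanged; adjust the path to your mount point):

# recipes_collection/cluster/slurm.yaml (excerpt)
container_mounts:
  - /fsx:/fsx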

Follow these high-level steps to set up, fine-tune, and evaluate the model using HyperPod recipes:

  1. Download the model and convert weights to BF16
  2. Fine-tune the model using QLoRA
  3. Merge the trained model adapter
  4. Evaluate the fine-tuned model

Download the model and convert weights to BF16

Download the DeepSeek-R1 model from the Hugging Face hub and convert the model weights from FP8 to BF16. This conversion is needed to use QLoRA for fine-tuning. Copy and execute the following bash script:

#!/bin/bash
start=$(date +%s)
# install git lfs and download the model from Hugging Face
sudo apt-get install git-lfs
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/deepseek-ai/DeepSeek-R1 \
&& cd DeepSeek-R1 && git config lfs.concurrenttransfers $(nproc) && git lfs pull
end=$(date +%s)
echo "Time taken to download model: $((end - start)) seconds"
start=$(date +%s)
# convert the model weights from fp8 to bf16
source venv/bin/activate
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
cd DeepSeek-V3/inference && pip install -r requirements.txt && \
wget https://raw.githubusercontent.com/aws/sagemaker-hyperpod-training-adapter-for-nemo/main/src/hyperpod_nemo_adapter/scripts/fp8_cast_bf16.py && \
python fp8_cast_bf16.py --input-fp8-hf-path ./DeepSeek-R1 --output-bf16-hf-path ./DeepSeek-R1-bf16

end=$(date +%s)
echo "Time taken to convert model to BF16: $((end - start)) seconds"

Fine-tune the model using QLoRA

Download the prepared dataset that you uploaded to Amazon S3 into your FSx for Lustre volume attached to the cluster.

  1. Enter the following commands to download the files from Amazon S3:
aws s3 cp s3://{bucket_name}/{input_path}/train /fsx/ubuntu/deepseek/data/train --recursive
aws s3 cp s3://{bucket_name}/{input_path}/test /fsx/ubuntu/deepseek/data/test --recursive

  2. Update the launcher script to fine-tune the DeepSeek-R1 671B model. The launcher scripts serve as convenient wrappers for executing the training script, main.py, simplifying the process of fine-tuning and parameter adjustment. For fine-tuning the DeepSeek-R1 671B model, you can find the specific script at:
launcher_scripts/deepseek/run_hf_deepseek_r1_671b_seq8k_gpu_qlora.sh

Before running the script, you need to modify the location of the training and validation files, update the Hugging Face model ID, and optionally the access token for private models and datasets. The script should look like the following (update recipes.trainer.num_nodes if you're using a multi-node cluster):

#!/bin/bash

# Original Copyright (c), NVIDIA CORPORATION. Modifications © Amazon.com

# Users should set up their cluster type in /recipes_collection/config.yaml

SAGEMAKER_TRAINING_LAUNCHER_DIR=${SAGEMAKER_TRAINING_LAUNCHER_DIR:-"$(pwd)"}

HF_MODEL_NAME_OR_PATH="/fsx/ubuntu/deepseek/DeepSeek-R1-bf16" # Path to the BF16 converted model

TRAIN_DIR="/fsx/ubuntu/deepseek/data/train" # Location of training dataset
VAL_DIR="/fsx/ubuntu/deepseek/data/train/" # Location of validation dataset

EXP_DIR="/fsx/ubuntu/deepseek/checkpoints" # Location to save experiment info including logging, checkpoints, etc.

HYDRA_FULL_ERROR=1 python3 "${SAGEMAKER_TRAINING_LAUNCHER_DIR}/main.py" \
    recipes=fine-tuning/deepseek/hf_deepseek_r1_671b_seq8k_gpu_qlora \
    base_results_dir="${SAGEMAKER_TRAINING_LAUNCHER_DIR}/results" \
    recipes.run.name="hf-deepseek-r1-671b-seq8k-gpu-qlora" \
    recipes.exp_manager.exp_dir="$EXP_DIR" \
    recipes.trainer.num_nodes=2 \
    recipes.model.train_batch_size=1 \
    recipes.model.data.train_dir="$TRAIN_DIR" \
    recipes.model.data.val_dir="$VAL_DIR" \
    recipes.model.hf_model_name_or_path="$HF_MODEL_NAME_OR_PATH"

You can view the recipe for this fine-tuning job under recipes_collection/recipes/fine-tuning/deepseek/hf_deepseek_r1_671b_seq8k_gpu_qlora.yaml and override additional parameters as needed.

  3. Submit the job by running the launcher script:
bash launcher_scripts/deepseek/run_hf_deepseek_r1_671b_seq8k_gpu_qlora.sh
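After submission, you can check on the run with standard Slurm commands, for example (the log path below is illustrative):

squeue -u $USER               # list your queued and running jobs
scontrol show job <job-id>    # detailed status for a specific job
tail -f results/*/log-*.out   # follow the training logs under the results folder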

Monitor the job using Slurm commands such as squeue and scontrol show to view the status of the job and the corresponding logs. The logs can be found in the results folder in the launch directory. When the job is complete, the model adapters are stored in the EXP_DIR that you defined in the launcher script. The structure of the directory should look like this:

ls -R
.:
checkpoints  experiment  result.json

./checkpoints:
peft_sharded

./checkpoints/peft_sharded:
step_50

./checkpoints/peft_sharded/step_50:
README.md  adapter_config.json  adapter_model.safetensors  tp0_ep0
You can see that the trained adapter weights are stored as part of the checkpointing under ./checkpoints/peft_sharded/step_N. We will use this later to merge with the base model.

Merge the trained model adapter

Follow these steps:

  1. Run a job using the smdistributed-modelparallel Enroot image to merge the adapter with the base model.
  2. Download the merge_peft_checkpoint.py code from the sagemaker-hyperpod-training-adapter-for-nemo repository and store it on Amazon FSx. Modify the export variables in the following script accordingly to reflect the paths for SOURCE_DIR, ADAPTER_PATH, BASE_MODEL_BF16, and MERGE_MODEL_PATH.
#!/bin/bash
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0
#SBATCH --nodes=1 # number of nodes to use
#SBATCH --job-name=deepseek_merge_adapter # name of your job
#SBATCH --exclusive # job has exclusive use of the resource, no sharing
#SBATCH --wait-all-nodes=1

set -ex;
export SOURCE_DIR=/fsx/path_to_merge_code # (folder containing merge_peft_checkpoint.py)
export ADAPTER_PATH=/fsx/path_to_adapter # (from the previous step)
export BASE_MODEL_BF16=/fsx/path_to_base # (BF16 model from step 1)
export MERGE_MODEL_PATH=/fsx/path_to_merged_model

# default variables for mounting local paths to the container
: "${IMAGE:=$(pwd)/smdistributed-modelparallel.sqsh}"
: "${HYPERPOD_PATH:="/var/log/aws/clusters":"/var/log/aws/clusters"}" # this is needed for validating that it's a HyperPod cluster
: "${ADAPTER_PATH_1:=$ADAPTER_PATH:$ADAPTER_PATH}"
: "${BASE_MODEL_BF16_1:=$BASE_MODEL_BF16:$BASE_MODEL_BF16}"
: "${MERGE_MODEL_PATH_1:=$MERGE_MODEL_PATH:$MERGE_MODEL_PATH}"
: "${SOURCE_DIR_1:=$SOURCE_DIR:$SOURCE_DIR}"
############

declare -a ARGS=(
    --container-image $IMAGE
    --container-mounts $HYPERPOD_PATH,$ADAPTER_PATH_1,$BASE_MODEL_BF16_1,$MERGE_MODEL_PATH_1,$SOURCE_DIR_1
)

# Merge the adapter with the base model.
srun -l "${ARGS[@]}" python $SOURCE_DIR/merge_peft_checkpoint.py \
    --hf_model_name_or_path $BASE_MODEL_BF16 \
    --peft_adapter_checkpoint_path $ADAPTER_PATH \
    --output_model_path $MERGE_MODEL_PATH \
    --deepseek_v3 true
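Save the script (for example as merge_adapter.sbatch, a hypothetical file name) and submit it to the cluster:

sbatch merge_adapter.sbatch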

Evaluate the fine-tuned model

Use the basic testing scripts provided by DeepSeek to deploy the merged model.

  1. Start by cloning their repo:
git clone https://github.com/deepseek-ai/DeepSeek-V3.git

cd DeepSeek-V3/inference
pip install -r requirements.txt

  2. You need to convert the merged model to a specific format for running inference. In this case, you need four P5 instances to deploy the model because the merged model is in BF16. Enter the following command to convert the model:
python convert.py --hf-ckpt-path /fsx/ubuntu/deepseek/DeepSeek-V3-Base/ \
    --save-path /fsx/ubuntu/deepseek/DeepSeek-V3-Demo --n-experts 256 \
    --model-parallel 32

  3. When the conversion is complete, use the following sbatch script to run the batch inference, making the following adjustments:
    1. Update the ckpt-path to the converted model path from the previous step.
    2. Create a new prompts.txt file with each line containing a prompt. The job will use the prompts from this file and generate output.
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --job-name=deepseek_671b_inference
#SBATCH --output=deepseek_671b_%j.out

# Set environment variables
export MASTER_ADDR=$(scontrol show hostnames $SLURM_JOB_NODELIST | head -n 1)
export MASTER_PORT=29500
source /fsx/ubuntu/alokana/deepseek/venv/bin/activate
# Run the job using torchrun
srun /fsx/ubuntu/alokana/deepseek/venv/bin/torchrun \
    --nnodes=4 \
    --nproc-per-node=8 \
    --rdzv_id=$SLURM_JOB_ID \
    --rdzv_backend=c10d \
    --rdzv_endpoint=$MASTER_ADDR:$MASTER_PORT \
    ./generate.py \
    --ckpt-path /fsx/ubuntu/alokana/deepseek/DeepSeek-R1-Demo \
    --config ./configs/config_671B.json \
    --input-file ./prompts.txt
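For reference, prompts.txt contains one prompt per line; the contents below and the sbatch file name are only illustrative:

# Create a sample prompts file (contents are illustrative) and submit the job
cat > prompts.txt <<'EOF'
A 45-year-old patient presents with chest pain radiating to the left arm. What is the most likely diagnosis?
What are the first-line treatment options for newly diagnosed type 2 diabetes?
EOF
sbatch batch_inference.sbatch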

Cleanup

To clean up your resources and avoid incurring additional charges, follow these steps:

  1. Delete any unused SageMaker Studio resources.
  2. (Optional) Delete the SageMaker Studio domain.
  3. Verify that your training job isn't running anymore. To do so, on the SageMaker console, choose Training and check Training jobs.
  4. If you created a HyperPod cluster, delete the cluster to stop incurring costs. If you created the networking stack from the HyperPod workshop, delete the stack as well to clean up the virtual private cloud (VPC) resources and the FSx for Lustre volume.

Conclusion

In this post, we demonstrated how to fine-tune large models such as DeepSeek-R1 671B using either SageMaker training jobs or SageMaker HyperPod with HyperPod recipes in a few steps. This approach minimizes the complexity of identifying optimal distributed training configurations and provides a simple way to properly size your workloads with the best price-performance architecture on AWS.

To start using SageMaker HyperPod recipes, visit our sagemaker-hyperpod-recipes GitHub repository for comprehensive documentation and example implementations. Our team continually expands the recipes based on customer feedback and emerging machine learning (ML) trends, making sure you have the necessary tools for successful AI model training.


About the Authors

Kanwaljit Khurmi is a Principal Worldwide Generative AI Solutions Architect at AWS. He collaborates with AWS product teams, engineering departments, and customers to provide guidance and technical assistance, helping them improve the value of their hybrid machine learning solutions on AWS. Kanwaljit specializes in helping customers with containerized applications and high-performance computing solutions.

Arun Kumar Lokanatha is a Senior ML Solutions Architect with the Amazon SageMaker team. He specializes in large language model training workloads, helping customers build LLM workloads using SageMaker HyperPod, SageMaker training jobs, and SageMaker distributed training. Outside of work, he enjoys running, hiking, and cooking.

Anoop Saha is a Sr. GTM Specialist at Amazon Web Services (AWS) focusing on generative AI model training and inference. He partners with top frontier model builders, strategic customers, and AWS service teams to enable distributed training and inference at scale on AWS and lead joint GTM motions. Before AWS, Anoop held several leadership roles at startups and large corporations, primarily focusing on silicon and system architecture of AI infrastructure.

Rohith Nadimpally is a Software Development Engineer working on AWS SageMaker, where he accelerates large-scale AI/ML workflows. Before joining Amazon, he graduated with Honors from Purdue University with a degree in Computer Science. Outside of work, he enjoys playing tennis and watching movies.
