Meta SAM 2.1 is now available in Amazon SageMaker JumpStart

This blog post is co-written with George Orlin from Meta.
Today, we're excited to announce that Meta's Segment Anything Model (SAM) 2.1 vision segmentation model is publicly available through Amazon SageMaker JumpStart to deploy and run inference. Meta SAM 2.1 provides state-of-the-art video and image segmentation capabilities in a single model. This cutting-edge model supports long-context processing, complex segmentation scenarios, and fine-grained analysis, making it ideal for automating processes across industries such as medical imaging in healthcare, satellite imagery for environmental monitoring, and object segmentation for autonomous systems. Meta SAM 2.1 is well suited for zero-shot object segmentation and accurate object detection based on simple prompts such as point coordinates and bounding boxes in a frame, for video tracking and image masking.
This model was predominantly trained on AWS, and AWS will also be the first cloud provider to make it available to customers. In this post, we walk through how to discover and deploy the Meta SAM 2.1 model using SageMaker JumpStart.
Meta SAM 2.1 overview
Meta SAM 2.1 is a state-of-the-art vision segmentation model designed for high-performance computer vision tasks, enabling advanced object detection and segmentation workflows. Building upon its predecessor, version 2.1 introduces enhanced segmentation accuracy, robust generalization across diverse datasets, and scalability for production-grade applications. These features enable AI researchers and developers in computer vision, image processing, and data-driven research to improve tasks that require detailed segmentation analysis across multiple fields.
Meta SAM 2.1 has a streamlined architecture that is optimized for integration with popular model-serving frameworks like TorchServe and can be deployed on Amazon SageMaker AI to power real-time or batch inference pipelines. Meta SAM 2.1 empowers organizations to achieve precise segmentation results in vision-centric workflows with minimal configuration and maximum efficiency.
Meta SAM 2.1 offers several variants (Tiny, Small, Base Plus, and Large), available now on SageMaker JumpStart, balancing model size, speed, and segmentation performance to cater to diverse application needs.
SageMaker JumpStart overview
SageMaker JumpStart provides access to a broad selection of publicly available foundation models (FMs). These pre-trained models serve as powerful starting points that can be deeply customized to address specific use cases. You can now use state-of-the-art model architectures, such as language models, computer vision models, and more, without having to build them from scratch.
With SageMaker JumpStart, you can deploy models in a secure environment. Models hosted on JumpStart can be provisioned on dedicated SageMaker Inference instances, including AWS Trainium and AWS Inferentia based instances, and are isolated within your virtual private cloud (VPC). This enforces data security and compliance, because the models operate under your own VPC controls rather than in a shared public environment. After deploying an FM, you can further customize and fine-tune it using the extensive capabilities of SageMaker AI, including SageMaker Inference for deploying models and container logs for improved observability. With SageMaker AI, you can streamline the entire model deployment process.
Prerequisites
Make sure you have the following prerequisites to deploy Meta SAM 2.1 and run inference:
- An AWS account that will contain all your AWS resources.
- An AWS Identity and Access Management (IAM) role to access SageMaker AI. To learn more about how IAM works with SageMaker AI, refer to Identity and Access Management for Amazon SageMaker AI.
- Access to Amazon SageMaker Studio, a SageMaker notebook instance, or an interactive development environment (IDE) such as PyCharm or Visual Studio Code. We recommend using SageMaker Studio for straightforward deployment and inference.
- Access to accelerated instances (GPUs) for hosting the model.
Discover Meta SAM 2.1 in SageMaker JumpStart
SageMaker JumpStart provides FMs through two primary interfaces: SageMaker Studio and the SageMaker Python SDK. This gives you multiple options to discover and use hundreds of models for your specific use case.
SageMaker Studio is a comprehensive IDE that offers a unified, web-based interface for performing all aspects of the machine learning (ML) development lifecycle. From preparing data to building, training, and deploying models, SageMaker Studio provides purpose-built tools to streamline the entire process. In SageMaker Studio, you can access SageMaker JumpStart to discover and explore the extensive catalog of FMs available for deployment to inference capabilities on SageMaker Inference.
You can access the SageMaker JumpStart UI through either Amazon SageMaker Unified Studio or SageMaker Studio. To deploy Meta SAM 2.1 using the SageMaker JumpStart UI, complete the following steps:
- In SageMaker Unified Studio, on the Build menu, choose JumpStart models.
- If you're already on the SageMaker Studio console, choose JumpStart in the navigation pane.
- You will be prompted to create a project, after which you can begin deployment.
Alternatively, you can use the SageMaker Python SDK to programmatically access and use SageMaker JumpStart models. This approach allows for greater flexibility and integration with existing AI/ML workflows and pipelines. By providing multiple access points, SageMaker JumpStart helps you seamlessly incorporate pre-trained models into your AI/ML development efforts, regardless of your preferred interface or workflow.
Deploy Meta SAM 2.1 for inference using SageMaker JumpStart
On the SageMaker JumpStart landing page, you can discover the public pre-trained models offered by SageMaker AI. You can choose the Meta model provider tab to discover the available Meta models.
If you're using SageMaker Studio and don't see the SAM 2.1 models, update your SageMaker Studio version by shutting down and restarting. For more information about version updates, refer to Shut down and Update Studio Classic Apps.
You can choose the model card to view details about the model such as its license, the data used to train it, and how to use it. You can also find two buttons, Deploy and Open Notebook, which help you use the model.
When you choose Deploy, you are taken to a screen where you can choose an endpoint name and instance type to initiate deployment.
After defining your endpoint settings, you can proceed to the next step to use the model.
Deploy Meta SAM 2.1 vision segmentation model for inference using the Python SDK
When you choose Deploy, model deployment starts. Alternatively, you can deploy through the example notebook by choosing Open Notebook. The notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.
To deploy using a notebook, you start by selecting an appropriate model, specified by the model_id. You can deploy any of the selected models on SageMaker AI.
You can deploy a Meta SAM 2.1 vision segmentation model using SageMaker JumpStart with the following SageMaker Python SDK code:
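The following is a minimal sketch of that deployment, using the Meta SAM 2.1 Tiny model ID from the table later in this post; whether deploy() requires accept_eula depends on the model's license terms.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Pick a Meta SAM 2.1 variant; the table later in this post lists all model IDs
model = JumpStartModel(model_id="meta-vs-sam-2-1-hiera-tiny")

# Deploys to the model's default instance type; pass instance_type=... to override.
# accept_eula=True may be required depending on the model's license terms.
predictor = model.deploy(accept_eula=True)
```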
This deploys the model on SageMaker AI with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. After the model is deployed, you can run inference against the endpoint through the SageMaker predictor. Three tasks are available with this endpoint: automatic mask generator, image predictor, and video predictor. We provide a code snippet for each later in this post. To use the predictor, a specific payload schema needs to be followed. The endpoint has sticky sessions enabled, so to start inference, you need to send a start_session payload:
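As a sketch of what a start_session call can look like through the SageMaker Runtime API with sticky session routing (the payload keys below are assumptions; the example notebook documents the exact schema):

```python
import base64
import json

import boto3

runtime = boto3.client("sagemaker-runtime")
endpoint_name = predictor.endpoint_name

# Base64 encode the media to segment (an image here; use "video" for MP4 input)
with open("truck.jpg", "rb") as f:
    encoded_media = base64.b64encode(f.read()).decode("utf-8")

# Assumed payload keys -- verify against the example notebook
payload = {
    "type": "start_session",
    "media_type": "image",  # "image" or "video"
    "media": encoded_media,
}

# SessionId="NEW_SESSION" asks SageMaker to open a new sticky session
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(payload),
    SessionId="NEW_SESSION",
)
session_id = response["NewSessionId"]
```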
The start_session invocation needs an input media type of either image or video along with the base64 encoded data of the media. This launches a session with an instance of the model and loads the media to be segmented.
To close a session, send a close_session invocation:
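A matching sketch for closing the session (again with assumed payload keys):

```python
# Assumed payload shape -- verify against the example notebook
close_payload = {"type": "close_session"}

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(close_payload),
    SessionId=session_id,
)

# Surfaced from the x-amzn-sagemaker-closed-session-id response header
print("closed:", response.get("ClosedSessionId"))
```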
If x-amzn-sagemaker-closed-session-id exists as a header, the session has been successfully closed.
To continue a session and retrieve the session ID of the existing session, the response header will contain the x-amzn-sagemaker-session-id key with the current session ID for any operation that is not start_session or close_session. Operations other than start_session and close_session need to be invoked with a response stream, because the resulting payload is larger than what SageMaker real-time endpoints can return.
This is a basic example of interacting with the SAM 2.1 SageMaker JumpStart endpoint with sticky sessions. The following examples for each of the tasks reference these operations without repeating them. The returned data is of MIME type JSONL. For more complete examples, refer to the example notebooks for Meta SAM 2.1 on SageMaker JumpStart.
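To keep the later task examples short, assume a small helper along these lines that invokes the endpoint with a response stream and assembles the streamed JSONL into parsed records (a sketch built on the standard SageMaker response-stream API):

```python
def invoke_with_stream(payload, session_id):
    """Invoke a non-session operation and parse the streamed JSONL response."""
    response = runtime.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
        SessionId=session_id,
    )
    # The body arrives as a stream of PayloadPart chunks; join, then parse JSON Lines
    raw = b"".join(event["PayloadPart"]["Bytes"] for event in response["Body"])
    return [json.loads(line) for line in raw.decode("utf-8").splitlines() if line]
```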
Recommended instances and benchmarks
The following table lists the Meta SAM 2.1 models available in SageMaker JumpStart along with the model_id, default instance type, and supported instance types for each model. The default instance type supports a total image or video payload of up to 5.5 MB; for larger inputs, you can modify the default instance type in the SageMaker JumpStart UI.
| Model Name | Model ID | Default Instance Type | Supported Instance Types |
| --- | --- | --- | --- |
| Meta SAM 2.1 Tiny | meta-vs-sam-2-1-hiera-tiny | ml.g6.24xlarge (5.5 MB total image or video size) | ml.g5.24xlarge, ml.g5.48xlarge, ml.g6.24xlarge, ml.g6.48xlarge, ml.p4d.24xlarge, ml.p4de.24xlarge |
| Meta SAM 2.1 Small | meta-vs-sam-2-1-hiera-small | ml.g6.24xlarge (5.5 MB total image or video size) | ml.g5.24xlarge, ml.g5.48xlarge, ml.g6.24xlarge, ml.g6.48xlarge, ml.p4d.24xlarge, ml.p4de.24xlarge |
| Meta SAM 2.1 Base Plus | meta-vs-sam-2-1-hiera-base-plus | ml.g6.24xlarge (5.5 MB total image or video size) | ml.g5.24xlarge, ml.g5.48xlarge, ml.g6.24xlarge, ml.g6.48xlarge, ml.p4d.24xlarge, ml.p4de.24xlarge |
| Meta SAM 2.1 Large | meta-vs-sam-2-1-hiera-large | ml.g6.24xlarge (5.5 MB total image or video size) | ml.g5.24xlarge, ml.g5.48xlarge, ml.g6.24xlarge, ml.g6.48xlarge, ml.p4d.24xlarge, ml.p4de.24xlarge |
Meta SAM 2.1 use cases: Inference and prompt examples
After you deploy the model using SageMaker JumpStart, you can access a reference Jupyter notebook that contains the parser and helper functions needed to begin using Meta SAM 2.1. After you follow those cells in the notebook, you're ready to use the model's vision segmentation capabilities.
Meta SAM 2.1 offers support for three different tasks (automatic mask generator, image predictor, and video predictor) to generate masks for various objects in images, as well as object tracking in videos. In the following examples, we demonstrate how to use the automatic mask generator and image predictor on a JPG of a truck. This truck.jpg file is stored in the jumpstart-cache-prod bucket; you can access it with the following code:
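For example, the download might look like the following; the object key here is an assumption, so check the example notebook for the exact path:

```python
import boto3

region = boto3.Session().region_name
s3 = boto3.client("s3")

# Assumed key within the public JumpStart cache bucket for your Region
s3.download_file(
    f"jumpstart-cache-prod-{region}",
    "inference-notebook-assets/truck.jpg",
    "truck.jpg",
)
```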
After you have your image and it is encoded, you can create masks for objects in the image. For use cases where you want to generate masks for every object in the image, you can use the automatic mask generator task.
Automatic mask generator
The automatic mask generator is well suited for computer vision tasks and applications such as medical imaging and diagnostics, where it can automatically segment regions of interest like tumors or specific organs to provide more accurate diagnostic support. It can also be particularly helpful in the autonomous vehicle space, where it can segment elements in a camera feed such as pedestrians, vehicles, and other objects. Let's use the automatic mask generator to generate masks for all the objects in truck.jpg.
The following code is the prompt to generate masks for your base64 encoded image:
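A sketch of that prompt follows, reusing the invoke_with_stream helper from earlier; the operation name and parameters are assumptions modeled on SAM's automatic mask generator options:

```python
# Assumed operation name and keys -- verify against the example notebook
payload = {
    "type": "generate_automatic_masks",
    "points_per_side": 32,  # density of the point grid sampled over the image
}
results = invoke_with_stream(payload, session_id)
```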
We receive the following output (parsed and visualized).
Image predictor
Alternatively, you can choose which objects in the provided image to mask by adding points inside those objects for Meta SAM 2.1 to use. The image predictor can be valuable for tasks related to design and modeling, automating processes that usually require manual effort. For example, it can help turn 2D images into 3D models by analyzing 2D images of blueprints, sketches, or floor plans and generating preliminary 3D models. This is one of many examples of how the image predictor can act as a bridge between 2D and 3D construction across many different tasks. We use the following image with the points that we used to prompt Meta SAM 2.1 for masking the object.
The following code is used to prompt Meta SAM 2.1 and plot the coordinates:
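A sketch of the image predictor prompt, with assumed keys; point_coords are pixel coordinates inside the target object, and point_labels marks each point as foreground (1) or background (0):

```python
# Assumed operation name and keys -- verify against the example notebook
payload = {
    "type": "predict_masks",
    "point_coords": [[500, 375], [1125, 625]],  # pixel coordinates on the object
    "point_labels": [1, 1],  # 1 = foreground point, 0 = background point
}
results = invoke_with_stream(payload, session_id)
```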
We receive the following output (parsed and visualized).
Video predictor
We now demonstrate how to prompt Meta SAM 2.1 for object tracking on video. One use case is ergonomic data collection and training: you can use the video predictor to analyze the movement and posture of individuals in real time, helping reduce injury and improve performance by setting alarms for poor posture or movements. Let's start by accessing the basketball-layup.mp4 file [1] from the jumpstart-cache-prod S3 bucket, as in the following code:
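As with the truck image, the download might look like this (the object key is an assumption):

```python
# Assumed key within the public JumpStart cache bucket for your Region
s3.download_file(
    f"jumpstart-cache-prod-{region}",
    "inference-notebook-assets/basketball-layup.mp4",
    "basketball-layup.mp4",
)
```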
Video:
The following code shows how to set up the prompt format to track objects in the video. The first object uses one coordinate to track and one to exclude from tracking, and the second object tracks a single coordinate.
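A sketch of that prompt, with assumed keys; each tracked object gets its own ID, and point_labels distinguishes points to track (1) from points to exclude (0):

```python
# Assumed operation name and keys -- verify against the example notebook
payload = {
    "type": "predict_masks_video",
    "frame_idx": 0,  # frame on which the prompts are placed
    "objects": [
        {   # first object: one point to track, one point to exclude
            "obj_id": 1,
            "point_coords": [[520, 300], [400, 350]],
            "point_labels": [1, 0],
        },
        {   # second object: a single point to track
            "obj_id": 2,
            "point_coords": [[300, 200]],
            "point_labels": [1],
        },
    ],
}
results = invoke_with_stream(payload, session_id)
```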
We receive the following output (parsed and visualized).
Video:
Here we can see that Meta SAM 2.1 Tiny was able to successfully track the objects based on the coordinates provided in the prompt.
Clean up
To avoid incurring unnecessary costs, when you're done, delete the SageMaker AI endpoints using the following code:
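With the SageMaker Python SDK predictor from earlier, the standard cleanup looks like this:

```python
# Delete the model and endpoint created by model.deploy()
predictor.delete_model()
predictor.delete_endpoint()
```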
Alternatively, to use the SageMaker AI console, complete the following steps:
- On the SageMaker AI console, under Inference in the navigation pane, choose Endpoints.
- Search for your Meta SAM 2.1 endpoints.
- On the endpoint details page, choose Delete.
- Choose Delete again to confirm.
Conclusion
In this post, we explored how SageMaker JumpStart empowers data scientists and ML engineers to discover, access, and deploy a wide range of pre-trained FMs for inference, including Meta's most advanced and capable models to date. Get started with SageMaker JumpStart and the Meta SAM 2.1 models today. For more information about SageMaker JumpStart, see SageMaker JumpStart pretrained models and Getting started with Amazon SageMaker JumpStart.
Sources:
[1] Erčulj F, Štrumbelj E (2015) Basketball Shot Types and Shot Success in Different Levels of Competitive Basketball. PLOS ONE 10(6): e0128885. https://doi.org/10.1371/journal.pone.0128885
About the Authors
Marco Punio is a Sr. Specialist Solutions Architect focused on generative AI strategy, applied AI solutions, and conducting research to help customers hyper-scale on AWS. As a member of the 3rd Party Model Provider Applied Sciences Solutions Architecture team at AWS, he is a Global Lead for the Meta – AWS Partnership and technical strategy. Based in Seattle, WA, Marco enjoys writing, reading, exercising, and building applications in his free time.
Deepak Rupakula is a Principal GTM lead in the specialists group at AWS. He focuses on developing GTM strategy for large language models like Meta across AWS services like Amazon Bedrock and Amazon SageMaker AI. With over 15 years of experience in the tech industry, his experience includes leadership roles in product management, customer success, and analytics.
Harish Rao is a Senior Solutions Architect at AWS, specializing in large-scale distributed AI training and inference. He empowers customers to harness the power of AI to drive innovation and solve complex challenges. Outside of work, Harish embraces an active lifestyle, enjoying the tranquility of hiking, the intensity of racquetball, and the mental clarity of mindfulness practices.
Baladithya Balamurugan is a Solutions Architect at AWS focused on ML deployments for inference and using AWS Neuron to accelerate training and inference. He works with customers to enable and accelerate their ML deployments on services such as Amazon SageMaker AI and Amazon EC2. Based in San Francisco, Baladithya enjoys tinkering, developing applications, and building his homelab in his free time.
Banu Nagasundaram leads product, engineering, and strategic partnerships for Amazon SageMaker JumpStart, SageMaker AI's machine learning and generative AI hub. She is passionate about building solutions that help customers accelerate their AI journey and unlock business value.
Naman Nandan is a software development engineer at AWS, specializing in enabling large-scale AI/ML inference workloads on Amazon SageMaker AI using TorchServe, a project jointly developed by AWS and Meta. In his free time, he enjoys playing tennis and going on hikes.