Migrate Amazon SageMaker Data Wrangler flows to Amazon SageMaker Canvas for faster data preparation


Amazon SageMaker Data Wrangler provides a visual interface to streamline and accelerate data preparation for machine learning (ML), which is often the most time-consuming and tedious task in ML projects. Amazon SageMaker Canvas is a low-code no-code visual interface to build and deploy ML models without the need to write code. Based on customers’ feedback, we have combined the advanced ML-specific data preparation capabilities of SageMaker Data Wrangler inside SageMaker Canvas, providing users with an end-to-end, no-code workspace for preparing data, and building and deploying ML models.

By abstracting away much of the complexity of the ML workflow, SageMaker Canvas enables you to prepare data, then build or use a model to generate highly accurate business insights without writing code. Additionally, preparing data in SageMaker Canvas offers many enhancements, such as page loads up to 10 times faster, a natural language interface for data preparation, the ability to view the data size and shape at every step, and improved replace and reorder transforms to iterate on a data flow. Finally, you can one-click create a model in the same interface, or create a SageMaker Canvas dataset to fine-tune foundation models (FMs).

This post demonstrates how you can bring your existing SageMaker Data Wrangler flows (the instructions created when building data transformations) from SageMaker Studio Classic to SageMaker Canvas. We provide an example of moving files from SageMaker Studio Classic to Amazon Simple Storage Service (Amazon S3) as an intermediate step before importing them into SageMaker Canvas.

Solution overview

The high-level steps are as follows:

  1. Open a terminal in SageMaker Studio and copy the flow files to Amazon S3.
  2. Import the flow files into SageMaker Canvas from Amazon S3.

Prerequisites

In this example, we use a folder called data-wrangler-classic-flows as a staging folder for migrating flow files to Amazon S3. It’s not necessary to create a migration folder, but in this example, the folder was created using the file system browser portion of SageMaker Studio Classic. After you create the folder, take care to move and consolidate related SageMaker Data Wrangler flow files together. In the following screenshot, three flow files necessary for migration have been moved into the folder data-wrangler-classic-flows, as seen in the left pane. One of these files, titanic.flow, is opened and visible in the right pane.
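Before consolidating, it can help to see where flow files currently live. The following is a minimal sketch, assuming a POSIX shell with find available in the SageMaker Studio Classic terminal. The function name list_flow_files is hypothetical and not part of any AWS tooling.

```shell
# Hypothetical helper: list every *.flow file under a starting directory
# (defaults to the current directory), sorted for easy review.
list_flow_files() {
  find "${1:-.}" -type f -name '*.flow' | sort
}
```

Calling list_flow_files with a directory prints the path of each flow file beneath it, which you can then move into the staging folder with the file browser or mv.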

Copy flow files to Amazon S3

To copy the flow files to Amazon S3, complete the following steps:

  1. To open a new terminal in SageMaker Studio Classic, on the File menu, choose Terminal.
  2. With a new terminal open, you can supply the following commands to copy your flow files to the Amazon S3 location of your choosing (replacing NNNNNNNNNNNN with your AWS account number):
    cd data-wrangler-classic-flows
    target="s3://sagemaker-us-west-2-NNNNNNNNNNNN/data-wrangler-classic-flows/"
    aws s3 sync . "$target" --exclude "*.*" --include "*.flow"

The following screenshot shows an example of what the Amazon S3 sync process should look like. You’ll get a confirmation after all files are uploaded. You can modify the preceding code to meet your unique input folder and Amazon S3 location needs. If you don’t want to create a folder, when you enter the terminal, simply skip the change directory (cd) command, and all flow files in your entire SageMaker Studio Classic file system will be copied to Amazon S3, regardless of origin folder.
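If you want to check what a sync would upload before running it for real, the AWS CLI offers a --dryrun flag. The following is a minimal sketch, not the command used in this post: the function name preview_flow_sync and its two arguments are hypothetical, and it assumes the AWS CLI is installed with credentials for the destination bucket. It uses --exclude "*" rather than "*.*", a variation intended to also skip files whose path contains no dot.

```shell
# Hypothetical helper: preview which flow files a sync would upload.
# --dryrun makes the AWS CLI print the planned operations without performing them.
preview_flow_sync() {
  src_dir="$1"    # local folder that holds the flow files
  s3_target="$2"  # destination, for example s3://<your-bucket>/<prefix>/
  aws s3 sync "$src_dir" "$s3_target" --exclude "*" --include "*.flow" --dryrun
}
```

Running the function from the staging folder with "." as the first argument and your S3 location as the second lists the planned uploads. Removing --dryrun from the command performs the copy.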

After you upload the files to Amazon S3, you can validate that they’ve been copied using the Amazon S3 console. In the following screenshot, we see the original three flow files, now in an S3 bucket.
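As an alternative to the console, the same check can be sketched from the terminal. This assumes the AWS CLI is configured with read access to the bucket. The function name list_uploaded_flows is hypothetical.

```shell
# Hypothetical helper: list objects under an S3 location and keep only
# the entries whose key ends in .flow. The exit status is non-zero when
# no flow files are found, because grep matched nothing.
list_uploaded_flows() {
  aws s3 ls "$1" --recursive | grep '\.flow$'
}
```

Pass the same S3 location you used as the sync destination. Each matching line shows the object's timestamp, size, and key.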

Import Data Wrangler flow files into SageMaker Canvas

To import the flow files into SageMaker Canvas, complete the following steps:

  1. On the SageMaker Studio console, choose Data Wrangler in the navigation pane.
  2. Choose Import data flows.
  3. For Select a data source, choose Amazon S3.
  4. For Enter S3 endpoint, enter the Amazon S3 location you used earlier to copy files from SageMaker Studio to Amazon S3, then choose Go. You can also navigate to the Amazon S3 location using the browser below.
  5. Select the flow files to import, then choose Import.

After you import the files, the SageMaker Data Wrangler page will refresh to show the newly imported files, as shown in the following screenshot.

Use SageMaker Canvas for data transformation with SageMaker Data Wrangler

Choose one of the flows (for this example, we choose titanic.flow) to launch the SageMaker Data Wrangler transformation.

Now you can add analyses and transformations to the data flow using a visual interface (Accelerate data preparation for ML in Amazon SageMaker Canvas) or natural language interface (Use natural language to explore and prepare data with a new capability of Amazon SageMaker Canvas).

When you’re happy with the data, choose the plus sign and choose Create model, or choose Export to export the dataset to build and use ML models.

Alternate migration method

This post has provided guidance on using Amazon S3 to migrate SageMaker Data Wrangler flow files from a SageMaker Studio Classic environment. Phase 3: (Optional) Migrate data from Studio Classic to Studio provides a second method that uses your local machine to transfer the flow files. Additionally, you can download single flow files from the SageMaker Studio tree control to your local machine, then import them manually in SageMaker Canvas. Choose the method that suits your needs and use case.

Clean up

When you’re done, shut down any running SageMaker Data Wrangler applications in SageMaker Studio Classic. To save costs, you can also remove any flow files from the SageMaker Studio Classic file browser, which is an Amazon Elastic File System (Amazon EFS) volume. You can also delete any of the intermediate files in Amazon S3. After the flow files are imported into SageMaker Canvas, the files copied to Amazon S3 are no longer needed.
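Deleting the intermediate S3 copies can also be sketched from the terminal. The following is one possible approach, assuming the AWS CLI is configured with delete permissions on the bucket. The function name remove_staged_flows and its "preview"/"delete" mode argument are hypothetical. It defaults to a dry run so nothing is removed unless you ask for it explicitly.

```shell
# Hypothetical helper: remove only *.flow objects under an S3 location.
# Defaults to a dry run; pass "delete" as the second argument to actually remove.
remove_staged_flows() {
  s3_target="$1"
  mode="${2:-preview}"
  if [ "$mode" = "delete" ]; then
    aws s3 rm "$s3_target" --recursive --exclude "*" --include "*.flow"
  else
    # --dryrun prints what would be deleted without deleting anything
    aws s3 rm "$s3_target" --recursive --exclude "*" --include "*.flow" --dryrun
  fi
}
```

Run it once without the second argument to review the list, and only then with "delete", after confirming the flows were imported into SageMaker Canvas.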

You can log out of SageMaker Canvas when you’re done, then relaunch it when you’re ready to use it again.

Conclusion

Migrating your existing SageMaker Data Wrangler flows to SageMaker Canvas is a straightforward process that allows you to use the advanced data preparations you’ve already developed while taking advantage of the end-to-end, low-code no-code ML workflow of SageMaker Canvas. By following the steps outlined in this post, you can seamlessly transition your data wrangling artifacts to the SageMaker Canvas environment, streamlining your ML projects and enabling business analysts and non-technical users to build and deploy models more efficiently.

Start exploring SageMaker Canvas today and experience the power of a unified platform for data preparation, model building, and deployment!


About the Authors

Charles Laughlin is a Principal AI Specialist at Amazon Web Services (AWS). Charles holds an MS in Supply Chain Management and a PhD in Data Science. Charles works in the Amazon SageMaker service team where he brings research and voice of the customer to inform the service roadmap. In his work, he collaborates daily with diverse AWS customers to help transform their businesses with cutting-edge AWS technologies and thought leadership.

Dan Sinnreich is a Sr. Product Manager for Amazon SageMaker, focused on expanding no-code / low-code services. He’s dedicated to making ML and generative AI more accessible and applying them to solve challenging problems. Outside of work, he can be found playing hockey, scuba diving, and reading science fiction.

Huong Nguyen is a Sr. Product Manager at AWS. She is leading the ML data preparation for SageMaker Canvas and SageMaker Data Wrangler, with 15 years of experience building customer-centric and data-driven products.

Davide Gallitelli is a Specialist Solutions Architect for AI/ML in the EMEA region. He’s based in Brussels and works closely with customers throughout Benelux. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML in his later years of university, and has fallen in love with it since then.
