SK Telecom improves telco-specific Q&A by fine-tuning Anthropic’s Claude fashions in Amazon Bedrock


This publish has been co-written with Seunghyun Jeong, Sunwoo Lee and Eric Davis from SK Telecom.

SK Telecom (SKT), South Korea’s main telecommunications firm serving 30 million prospects, is on the forefront of AI innovation. Consistent with its AI Pyramid Technique, which goals to unlock AI’s potential for anybody, wherever, anytime, SKT has collaborated with the AWS Generative AI Innovation Center (GenAIIC) Customized Mannequin Program to discover domain-trained fashions utilizing Amazon Bedrock for telco-specific use circumstances.

This collaboration aligns with SKT’s imaginative and prescient of utilizing AI experience and strategic partnerships to develop modern AI-based services. One such initiative centered on growing a customized resolution for grounded query answering (Q&A) based mostly on reference paperwork.

Retrieval Augmented Era (RAG) is a well-liked approach for Q&A duties, providing improved factual accuracy and data grounding. Nonetheless, RAG faces challenges with producing a response not matching most well-liked tone, model, and manners for telco use circumstances, in addition to retrieving irrelevant paperwork, doubtlessly resulting in inaccurate responses. To handle this, SKT and AWS GenAIIC aimed to make use of mannequin customization to enhance Anthropic Claude models on Amazon Bedrock in three key areas:

  • Offering concise and informative solutions
  • Accurately referencing hyperlinks from retrieved paperwork
  • Answering in a tone and elegance according to SKT and just like floor reality solutions

Moreover, the group explored boosting smaller mannequin efficiency utilizing artificial knowledge generated by larger giant language fashions (LLMs) for data distillation and situations with restricted labeled coaching knowledge.

Amazon Bedrock is a completely managed service that provides a wide range of LLMs and basis fashions (FMs) together with capabilities akin to Amazon Bedrock Data Bases, Amazon Bedrock Brokers, and Amazon Bedrock Guardrails that may expedite many generative AI use circumstances. Amazon Bedrock is the one absolutely managed service that gives you with the power to fine-tune Claude fashions. Amazon Bedrock presents an intuitive and secure way of fine-tuning Anthropic’s Claude models and more. The fine-tuned Claude mannequin might be deployed utilizing Amazon Bedrock and may use the capabilities of Amazon Bedrock seamlessly, for instance, Amazon Bedrock Data Bases for the telco domain-specific RAG or Amazon Bedrock Brokers for the agentic utilization.

On this publish, we share how SKT customizes Anthropic Claude fashions for telco-specific Q&A relating to technical telecommunication paperwork of SKT utilizing Amazon Bedrock.

Resolution overview

The group explored mixtures of immediate optimization, customization (fine-tuning), and knowledge augmentation with artificial knowledge. This multifaceted strategy aimed to maximise the advantages of every approach for the grounded Q&A era process.

Within the following sections, we discover these strategies in additional element.

Anthropic’s Claude customization with immediate optimization

Fantastic-tuning, which is out there by way of Amazon Bedrock for numerous FMs, together with Anthropic’s Claude, permits adaptation of pre-trained language fashions for particular use circumstances. It’s significantly efficient for tailoring response model and format adherence.

The group first optimized the system immediate, implementing standardized pointers for reply formatting and doc quotation based mostly on Anthropic model prompting best practices. Key focus areas included:

  • Clear presentation of system instructions
  • Constant use of code block formatting
  • Context-based tailor-made responses

This immediate engineering, mixed with fine-tuning, yielded substantial enhancements:

  • Over 50% improve in ROUGE-3 rating
  • Over 25% enchancment in ROUGE-L rating
  • Over 4% improve in embedding similarity rating
  • Important progress in correct reference quotation

The iterative enhancement course of demonstrated cumulative advantages, with immediate updates alone displaying 35–40 % enhancements in key metrics, and the ultimate personalized mannequin attaining 50–60 % positive aspects in some metrics.

This development clearly illustrates the cumulative advantages of mannequin customization by way of RAG, immediate engineering, and fine-tuning, leading to a mannequin that considerably outperformed each the baseline and the prompt-updated variations by way of ROUGE scores and quotation accuracy. ROUGE score measures the similarity between floor truths and generated outcomes by computing N-gram phrase overlaps. The next desk summarizes these enhancements.

LLM Immediate replace Fantastic-tuning Relative enchancment over baseline
ROUGE-3 ROUGE-L Quotation accuracy
Anthropic’s Claude 3 Sonnet baseline baseline baseline
Anthropic’s Claude 3 Sonnet +38.30% +13.4% +52.94%
Anthropic’s Claude 3 Sonnet +58.1% +26.8% +70.59%

Artificial knowledge for fine-tuning

To handle the problem of restricted high-quality labeled coaching knowledge, the group explored artificial knowledge era methods. This strategy additionally facilitates data distillation from bigger LLMs to smaller, extra focused fashions, providing advantages akin to decrease latency and price.

The group performed managed experiments utilizing:

  • A baseline set of 500 floor reality samples
  • An augmented set with 500 authentic over 1,500 artificial samples
  • A bigger authentic set of two,000 samples

Artificial knowledge was generated utilizing Anthropic’s Claude Sonnet 3, creating new question-answer pairs over the identical retrieved paperwork utilized in floor reality examples.

The outcomes have been evaluated utilizing each LLM-based comparability and human desire analysis. Human evaluators blindly ranked mannequin outputs, with scores assigned based mostly on desire (Greatest: 4, Second: 3, Third: 2, Worst: 1). The next desk exhibits the outcomes of the human desire analysis scores.

Rank Mannequin Cumulative rating
(very best: 160)
1 Fantastic-tuned with 2,000 authentic samples 114
2 Fantastic-tuned with 500 authentic and 1,500 artificial samples 112
3 Fantastic-tuned with 500 authentic samples 85
4 No fine-tuning (baseline) 84

Some key findings embody:

  • Small coaching units (500 samples) confirmed minimal enchancment over baseline
  • Bigger coaching units (2,000 samples) scored significantly greater
  • Synthetically augmented knowledge carried out equally to equivalent-sized authentic knowledge

Though having a big quantity of domain-specific coaching knowledge is at all times very best, many companies have restricted accessible datasets. In such situations, artificial knowledge can play an important position rather than authentic knowledge. This demonstrates the potential of artificial knowledge for mannequin customization.

Conclusion

SK Telecom’s collaboration with AWS GenAIIC showcases the corporate’s dedication to growing modern AI options for telco challenges. Through the use of Amazon Bedrock to customise Anthropic’s Claude fashions, SKT has achieved vital efficiency enhancements for telco-specific, Korean language use circumstances with out the necessity to construct fashions from scratch. The proof of idea demonstrated vital enhancements:

  • ~58% improve in ROUGE-3 rating
  • ~27% improve in ROUGE-L rating
  • Substantial enchancment in returning appropriate reference hyperlinks

This strategy, mixed with artificial knowledge era methods, aligns with SKT’s AI Pyramid Technique, enabling quicker testing and growth of latest approaches. As SKT continues to concentrate on key areas akin to private AI assistants, AI healthcare, and AI knowledge facilities, this collaboration with AWS represents a big step of their AI evolution and long-term competitiveness within the international AI panorama.

For these keen on working with AWS on related initiatives, go to Generative AI Innovation Center.


Concerning the Authors

Sungmin Hong is a Senior Utilized Scientist at AWS Generative AI Innovation Heart the place he helps expedite the number of use circumstances of AWS prospects. Earlier than becoming a member of Amazon, Sungmin was a postdoctoral analysis fellow at Harvard Medical Faculty. He holds Ph.D. in Pc Science from New York College. Outdoors of labor, Sungmin enjoys mountaineering, studying and cooking.

Sujeong Cha is a Deep Studying Architect on the AWS Generative AI Innovation Heart, the place she focuses on mannequin customization and optimization. She has intensive hands-on expertise in fixing prospects’ enterprise use circumstances by using generative AI in addition to conventional AI/ML options. Sujeong holds a M.S. diploma in Information Science from New York College.

Arijit Ghosh Chowdhury is a Scientist with the AWS Generative AI Innovation Heart, the place he works on mannequin customization and optimization. In his position, he works on utilized analysis in fine-tuning and mannequin evaluations to allow GenAI for numerous industries. He has a Grasp’s diploma in Pc Science from the College of Illinois at Urbana Champaign, the place his analysis centered on query answering, search and area adaptation.

Yiyue Qian is an Utilized Scientist II on the AWS Generative AI Innovation Heart, the place she helps offering generative AI options to AWS prospects. On this position, she collaborates with a group of consultants to develop modern AI-driven fashions for AWS prospects throughout numerous industries. Yiyue holds a Ph.D. in Pc Science from the College of Notre Dame, the place her analysis centered on superior machine studying and deep studying methods.

Wei-Chih Chen is a Machine Studying Engineer on the AWS Generative AI Innovation Heart, the place he works on mannequin customization and optimization for LLMs. He additionally builds instruments to assist his group deal with numerous points of the LLM growth life cycle—together with fine-tuning, benchmarking, and load-testing—that accelerating the adoption of various use circumstances for AWS prospects. He holds an M.S. diploma in Pc Science from UC Davis.

Hannah Marlowe is a Senior Supervisor of Mannequin Customization on the AWS Generative AI Innovation Heart. Her group focuses on serving to prospects develop differentiating Generative AI options utilizing their distinctive and proprietary knowledge to realize key enterprise outcomes. She holds a Ph.D in Physics from the College of Iowa, with a concentrate on astronomical X-ray evaluation and instrumentation growth. Outdoors of labor, she might be discovered mountaineering, mountain biking, and snowboarding across the mountains in Colorado.

Seunghyun Jeong (Steve) is a group chief of the Platform Software group at SKT. He’s chargeable for commercializing the World Intelligence Platform (GIP), which supplies AI fashions and instruments. For many of his profession, he has been a PM growing numerous cellular providers akin to cellular pockets, style streaming, and unified login providers for SK. His group is increasing the supply of fashions and options to make it simpler for inner groups to use AI, contributing to SKT’s AI Transformation. Earlier than coming into the AI house, he was a Product Supervisor, growing and working numerous cellular providers akin to cellular pockets, style streaming, and unified login providers for the US and Korea.

Sunwoo Lee (Lois) is the group chief of the Information Development and Analysis Staff inside SK Telecom’s World AI Tech division. She oversees the design and development of coaching knowledge for language fashions, the mannequin efficiency analysis course of, and its software to providers. Her profession has centered on NLP inside IT, which is a good match along with her background in Linguistics and Korean language training. Alongside her world-class group, she continues to discover and resolve fascinating issues akin to find out how to optimize the design of information for language mannequin coaching, which duties and strategies to implement for validating language mannequin efficiency, and the very best design of AI-human conversations.

Eric Davis is the vice chairman of the AI Tech Collaboration Group at SKT. Eric oversees tech collaborations with worldwide tech companions to customise giant language fashions (LLMs) for the telecommunications area. His groups are chargeable for designing and constructing the datasets to tune LLMs, in addition to benchmarking LLMs on the whole and for the telecommunications area. Eric holds a Grasp of Science diploma in Pc Science from Carnegie Mellon from the Language Applied sciences Institute and a Bachelor of Arts in Linguistics and Psychology from the College of California, Los Angeles.

Leave a Reply

Your email address will not be published. Required fields are marked *