Responsible AI: How PowerSchool safeguards millions of students with AI-powered content filtering using Amazon SageMaker AI
This post is cowritten with Gayathri Rengarajan and Harshit Kumar Nyati from PowerSchool.
PowerSchool is a leading provider of cloud-based software for K-12 education, serving over 60 million students in more than 90 countries and over 18,000 customers, including more than 90 of the top 100 districts by student enrollment in the United States. When we launched PowerBuddy™, our AI assistant integrated across our educational platforms, we faced a critical challenge: implementing content filtering sophisticated enough to distinguish between legitimate academic discussions and harmful content in educational contexts.
In this post, we demonstrate how we built and deployed a custom content filtering solution using Amazon SageMaker AI that achieved greater accuracy while maintaining low false positive rates. We walk through our technical approach to fine-tuning Llama 3.1 8B, our deployment architecture, and the performance results from internal validations.
PowerSchool’s PowerBuddy
PowerBuddy is an AI assistant that provides personalized insights, fosters engagement, and offers support throughout the educational journey. Educational leaders benefit from PowerBuddy being brought to their data and their users' most common workflows within the PowerSchool ecosystem – such as Schoology Learning, Naviance CCLR, PowerSchool SIS, Performance Matters, and more – to ensure a consistent experience for students and their network of support providers at school and at home.
The PowerBuddy suite includes multiple AI solutions: PowerBuddy for Learning functions as a virtual tutor; PowerBuddy for College and Career provides insights for career exploration; PowerBuddy for Community simplifies access to district and school information; and others. The solution includes built-in accessibility features such as speech-to-text and text-to-speech functionality.
Content filtering for PowerBuddy
As an education technology provider serving millions of students, many of whom are minors, we consider student safety our highest priority. National data shows that roughly 20% of students ages 12–17 experience bullying, and 16% of high school students have reported seriously considering suicide. With PowerBuddy's widespread adoption across K-12 schools, we needed robust guardrails specifically calibrated for educational environments.
The out-of-the-box content filtering and safety guardrail options available on the market didn't fully meet PowerBuddy's requirements, primarily because of the need for domain-specific awareness and fine-tuning within the education context. For example, when a high school student is learning about sensitive historical topics such as World War II or the Holocaust, it's critical that educational discussions aren't mistakenly flagged for violent content. At the same time, the system must be able to detect and immediately alert school administrators to indications of potential harm or threats. Achieving this nuanced balance requires deep contextual understanding, which can only be enabled by targeted fine-tuning.
We needed to implement a sophisticated content filtering system that could intelligently differentiate between legitimate academic inquiries and genuinely harmful content, detecting and blocking prompts indicating bullying, self-harm, hate speech, inappropriate sexual content, violence, or harmful material not suitable for educational settings. Our challenge was finding a cloud solution to train and host a custom model that could reliably protect students while maintaining the educational functionality of PowerBuddy.
After evaluating several AI providers and cloud services that allow model customization and fine-tuning, we selected Amazon SageMaker AI as the most suitable platform based on these critical requirements:
- Platform stability: As a mission-critical service supporting millions of students daily, we require enterprise-grade infrastructure with high availability and reliability.
- Autoscaling capabilities: Student usage patterns in education are highly cyclical, with significant traffic spikes during school hours. Our solution needed to handle these fluctuations without degrading performance.
- Control of model weights after fine-tuning: We needed control over our fine-tuned models to enable continuous refinement of our safety guardrails, so we could quickly respond to new types of harmful content that might emerge in educational settings.
- Incremental training capability: The ability to continuously improve our content filtering model with new examples of problematic content was essential.
- Cost-effectiveness: We needed a solution that would let us protect students without creating prohibitive costs that could limit schools' access to our educational tools.
- Granular control and transparency: Student safety demands visibility into how filtering decisions are made, requiring a solution that isn't a black box but provides transparency into model behavior and performance.
- Mature managed service: Our team needed to focus on educational applications rather than infrastructure management, making a comprehensive managed service with production-ready capabilities essential.
Solution overview

Our content filtering system architecture, shown in the preceding figure, consists of several key components:
- Data preparation pipeline:
  - Curated datasets of safe and unsafe content examples specific to educational contexts
  - Data preprocessing and augmentation to ensure robust model training
  - Secure storage in Amazon S3 buckets with appropriate encryption and access controls
  Note: All training data was fully anonymized and didn't include personally identifiable student information
- Model training infrastructure:
  - SageMaker training jobs for fine-tuning Llama 3.1 8B
- Inference architecture:
  - Deployment on SageMaker managed endpoints with auto scaling configured
  - Integration with PowerBuddy through Amazon API Gateway for real-time content filtering
  - Monitoring and logging through Amazon CloudWatch for continuous quality assessment
- Continuous improvement loop:
  - Feedback collection mechanism for false positives and negatives
  - Scheduled retraining cycles to incorporate new data and improve performance
  - A/B testing framework to evaluate model improvements before full deployment
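To make the real-time filtering path concrete, the following sketch shows one way a backend could call the model endpoint for each student message. The endpoint name, prompt template, and response parsing are illustrative assumptions, not PowerSchool's actual implementation.

```python
import json

# Hypothetical endpoint name -- an assumption for illustration.
ENDPOINT_NAME = "powerbuddy-content-filter"

def build_filter_prompt(student_message: str) -> str:
    """Wrap a student message in a classification instruction of the kind
    a fine-tuned safety model might expect (template is assumed)."""
    return (
        "Classify the following student message as SAFE or UNSAFE "
        "for a K-12 educational setting.\n"
        f"Message: {student_message}\n"
        "Label:"
    )

def classify(student_message: str, runtime_client) -> str:
    """Invoke the SageMaker real-time endpoint and parse the label."""
    payload = {
        "inputs": build_filter_prompt(student_message),
        "parameters": {"max_new_tokens": 4, "temperature": 0.0},
    }
    response = runtime_client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    body = json.loads(response["Body"].read())
    # The response schema varies by serving container; handle both a dict
    # and a list-of-dicts shape here.
    generated = body["generated_text"] if isinstance(body, dict) else body[0]["generated_text"]
    return "UNSAFE" if "UNSAFE" in generated.upper() else "SAFE"

# Usage (requires AWS credentials and a deployed endpoint):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# label = classify("Can you explain photosynthesis?", runtime)
```

In a setup like this, an UNSAFE label would both block the response and trigger the administrator alerting path described above.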
Development process
After exploring several approaches to content filtering, we decided to fine-tune Llama 3.1 8B using Amazon SageMaker JumpStart. This decision followed our initial attempts to develop a content filtering model from scratch, which proved difficult to optimize for consistency across diverse types of harmful content.
SageMaker JumpStart significantly accelerated our development process by providing preconfigured environments and optimized hyperparameters for fine-tuning foundation models. The platform's streamlined workflow allowed our team to focus on curating high-quality training data specific to educational safety concerns rather than spending time on infrastructure setup and hyperparameter tuning.
We fine-tuned the Llama 3.1 8B model using the Low-Rank Adaptation (LoRA) technique on Amazon SageMaker AI training jobs, which allowed us to maintain full control over the training process.
After fine-tuning was complete, we deployed the model on a SageMaker AI managed endpoint and integrated it as a critical safety component within our PowerBuddy architecture.
For our production deployment, we selected NVIDIA A10G GPUs available through ml.g5.12xlarge instances, which offered the best balance of performance and cost-effectiveness for our model size. The AWS team provided critical guidance on selecting the optimal model serving configuration for our use case, helping us optimize both performance and cost by making sure we weren't over-provisioning resources.
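A minimal deployment sketch for this hosting choice is shown below; the endpoint name and instance count are placeholders, and only the instance type reflects the configuration described here.

```python
# Hedged deployment sketch: hosting the fine-tuned model on a managed
# real-time endpoint backed by ml.g5.12xlarge (4x NVIDIA A10G GPUs).
# The endpoint name is a placeholder, not PowerSchool's actual name.
DEPLOY_CONFIG = {
    "instance_type": "ml.g5.12xlarge",
    "initial_instance_count": 1,   # autoscaling raises this under load
    "endpoint_name": "powerbuddy-content-filter",
}

def deploy_finetuned(estimator):
    """Deploy a fitted SageMaker estimator to a real-time endpoint."""
    return estimator.deploy(
        instance_type=DEPLOY_CONFIG["instance_type"],
        initial_instance_count=DEPLOY_CONFIG["initial_instance_count"],
        endpoint_name=DEPLOY_CONFIG["endpoint_name"],
    )
```

Starting from a single instance and letting autoscaling add capacity is one way to avoid the over-provisioning mentioned above.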
Technical implementation
The following code snippet fine-tunes the model on the preprocessed dataset. The instruction tuning dataset is first converted into the domain adaptation dataset format, and the scripts use Fully Sharded Data Parallel (FSDP) along with Low-Rank Adaptation (LoRA) to fine-tune the model.
We define an estimator object first. By default, these models train through domain adaptation, so you must indicate instruction tuning by setting the instruction_tuned hyperparameter to True.
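A sketch of such an estimator definition follows; the JumpStart model ID and hyperparameter values are assumptions based on public JumpStart fine-tuning examples, not PowerSchool's exact settings.

```python
# Assumed hyperparameters -- instruction_tuned is the key switch away
# from the default domain adaptation training mode.
HYPERPARAMETERS = {
    "instruction_tuned": "True",
    "epoch": "3",               # illustrative value
    "max_input_length": "1024", # illustrative value
}

def build_estimator():
    # Imported lazily so the sketch can be read without the SageMaker SDK.
    from sagemaker.jumpstart.estimator import JumpStartEstimator

    return JumpStartEstimator(
        model_id="meta-textgeneration-llama-3-1-8b",  # assumed JumpStart ID
        environment={"accept_eula": "true"},
        hyperparameters=HYPERPARAMETERS,
    )

# estimator = build_estimator()
```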
After we define the estimator, we're ready to start training:
estimator.fit({"training": train_data_location})
After training, we created a model using the artifacts saved in S3 and deployed it to a real-time endpoint for evaluation. We tested the model using a test dataset covering key scenarios to validate performance and behavior, calculated recall, F1, and the confusion matrix, and inspected misclassifications. If needed, we adjusted hyperparameters or the prompt template and retrained; otherwise we proceeded with production deployment.
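The evaluation step can be sketched with scikit-learn; the labels below are tiny illustrative placeholders, not the real test data.

```python
# Minimal evaluation sketch: compute recall, F1, and the confusion matrix
# for a safe/unsafe classifier. Labels here are illustrative only.
from sklearn.metrics import confusion_matrix, f1_score, recall_score

y_true = ["safe", "unsafe", "safe", "unsafe", "safe"]
y_pred = ["safe", "unsafe", "unsafe", "unsafe", "safe"]

recall = recall_score(y_true, y_pred, pos_label="unsafe")
f1 = f1_score(y_true, y_pred, pos_label="unsafe")
cm = confusion_matrix(y_true, y_pred, labels=["safe", "unsafe"])

print(f"recall={recall:.2f} f1={f1:.2f}")  # → recall=1.00 f1=0.80
print(cm)  # rows: true safe/unsafe; cols: predicted safe/unsafe
```

Inspecting the off-diagonal cells of the confusion matrix is what surfaces the misclassifications worth reviewing by hand.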
You can also check out the sample notebook for fine-tuning Llama 3 models on SageMaker JumpStart in the SageMaker examples repository.
We used the Faster autoscaling on Amazon SageMaker real-time endpoints notebook to set up autoscaling on SageMaker AI endpoints.
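The autoscaling setup from that notebook boils down to registering the endpoint variant with Application Auto Scaling and attaching a target-tracking policy. The sketch below follows that pattern; the endpoint name, capacities, and target value are assumptions.

```python
# Hedged autoscaling sketch via Application Auto Scaling. The endpoint
# name, capacity bounds, and target value are illustrative assumptions.
ENDPOINT_NAME = "powerbuddy-content-filter"
RESOURCE_ID = f"endpoint/{ENDPOINT_NAME}/variant/AllTraffic"

SCALING_POLICY = {
    "TargetValue": 50.0,  # invocations per instance per minute (assumed)
    "PredefinedMetricSpecification": {
        "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
    },
    "ScaleInCooldown": 300,  # scale in slowly after school-hours peaks
    "ScaleOutCooldown": 60,  # scale out quickly when classes start
}

def configure_autoscaling(min_capacity: int = 1, max_capacity: int = 4):
    """Register the variant as a scalable target and attach the policy."""
    import boto3

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=RESOURCE_ID,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=min_capacity,
        MaxCapacity=max_capacity,
    )
    client.put_scaling_policy(
        PolicyName="powerbuddy-invocations-scaling",
        ServiceNamespace="sagemaker",
        ResourceId=RESOURCE_ID,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=SCALING_POLICY,
    )
```

The asymmetric cooldowns reflect the cyclical school-hours traffic described earlier: add capacity fast, release it slowly.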
Validation of the solution
To validate our content filtering solution, we conducted extensive testing across several dimensions:
- Accuracy testing: In our internal validation testing, the model achieved ~93% accuracy in identifying harmful content across a diverse test set representing various forms of inappropriate material.
- False positive analysis: We worked to minimize instances where legitimate educational content was incorrectly flagged as harmful, achieving a false positive rate of less than 3.75% in test environments; results may vary by school context.
- Performance testing: Our solution maintained response times averaging 1.5 seconds. Even during peak usage periods simulating real classroom environments, the system consistently delivered a seamless user experience with no failed transactions.
- Scalability and reliability validation:
  - Comprehensive load testing achieved a 100% transaction success rate with consistent performance distribution, validating system reliability under sustained educational workload conditions.
  - Transactions completed successfully without degradation in performance or accuracy, demonstrating the system's ability to scale effectively for classroom-sized concurrent usage scenarios.
- Production deployment: Initial rollout to a select group of schools showed consistent performance in real-world educational environments.
- Student safety outcomes: Schools reported a significant reduction in reported incidents of AI-enabled bullying or inappropriate content generation compared to other AI systems without specialized content filtering.
Fine-tuned model metrics compared to out-of-the-box content filtering solutions
The fine-tuned content filtering model demonstrated stronger performance than generic, out-of-the-box filtering solutions on key safety metrics. It achieved higher accuracy (0.93 compared to 0.89) and better F1-scores for both the safe (0.95 compared to 0.91) and unsafe (0.90 compared to 0.87) classes. The fine-tuned model also demonstrated a more balanced trade-off between precision and recall, indicating more consistent performance across classes. Importantly, it makes fewer false positive errors, misclassifying only 6 safe cases as unsafe compared to 19 legitimate responses for the generic solution on a test set of 160, a significant advantage in safety-sensitive applications. Overall, our fine-tuned content filtering model proved to be more reliable and effective.
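These figures tie together: 6 misclassified safe cases out of a 160-example test set corresponds to the ~3.75% false positive rate reported earlier (treating the full test set as the denominator). A quick arithmetic check:

```python
# Quick check on the reported false positive numbers, using the full
# 160-example test set as the denominator.
test_set_size = 160
fine_tuned_false_positives = 6
generic_false_positives = 19

fine_tuned_fp_rate = fine_tuned_false_positives / test_set_size
generic_fp_rate = generic_false_positives / test_set_size

print(f"fine-tuned FP rate: {fine_tuned_fp_rate:.2%}")  # → 3.75%
print(f"generic FP rate:    {generic_fp_rate:.2%}")     # → 11.88%
```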
Future plans
As the PowerBuddy suite evolves and is integrated into other PowerSchool products and agent flows, the content filter model will be continuously adapted and improved through fine-tuning for other products with specific needs.
We plan to implement additional specialized adapters using the SageMaker AI multi-adapter inference feature alongside our content filtering model, subject to feasibility and compliance considerations. The idea is to deploy fine-tuned small language models (SLMs) for specific problem solving in cases where large language models (LLMs) are too large and generic to meet the needs of narrower problem domains. For example:
- Decision-making agents specific to the education domain
- Data domain identification for text-to-SQL queries
This approach will deliver significant cost savings by eliminating the need for separate model deployments while maintaining the specialized performance of each adapter.
The goal is to create an AI learning environment that is not only safe but also inclusive and responsive to diverse student needs across our global implementations, ultimately empowering students to learn effectively while being protected from harmful content.
Conclusion
The implementation of our specialized content filtering system on Amazon SageMaker AI has been transformative for PowerSchool's ability to deliver safe AI experiences in educational settings. By building strong guardrails, we've addressed one of the main concerns educators and parents have about introducing AI into classrooms, helping to ensure student safety.
As Shivani Stumpf, our Chief Product Officer, explains: "We're now tracking around 500 school districts that have either purchased PowerBuddy or activated included features, reaching approximately 4.2 million students. Our content filtering technology ensures students can benefit from AI-powered learning support without exposure to harmful content, creating a safe space for academic growth and exploration."
The impact extends beyond just blocking harmful content. By establishing trust in our AI systems, we've enabled schools to embrace PowerBuddy as a valuable educational tool. Teachers report spending less time monitoring student interactions with technology and more time on personalized instruction. Students benefit from 24/7 learning support without the risks that might otherwise come with AI access.
For organizations requiring domain-specific safety guardrails, consider how the fine-tuning capabilities and managed endpoints of SageMaker AI can be adapted to your use case.
As we continue to expand PowerBuddy's capabilities with SageMaker multi-adapter inference, we remain committed to maintaining the right balance between educational innovation and student safety, helping to ensure that AI becomes a positive force in education that parents, teachers, and students can trust.
About the authors
Gayathri Rengarajan is the Associate Director of Data Science at PowerSchool, leading the PowerBuddy initiative. Known for bridging deep technical expertise with strategic business needs, Gayathri has a proven track record of delivering enterprise-grade generative AI solutions from concept to production.
Harshit Kumar Nyati is a Lead Software Engineer at PowerSchool with 10+ years of experience in software engineering and analytics. He specializes in building enterprise-grade generative AI applications using Amazon SageMaker AI, Amazon Bedrock, and other cloud services. His expertise includes fine-tuning LLMs, training ML models, hosting them in production, and designing MLOps pipelines to support the full lifecycle of AI applications.
Anjali Vijayakumar is a Senior Solutions Architect at AWS with over 9 years of experience helping customers build reliable and scalable cloud solutions. Based in Seattle, she specializes in architectural guidance for EdTech solutions, working closely with education technology companies to transform learning experiences through cloud innovation. Outside of work, Anjali enjoys exploring the Pacific Northwest through hiking.
Dmitry Soldatkin is a Senior AI/ML Solutions Architect at Amazon Web Services (AWS), helping customers design and build AI/ML solutions. Dmitry's work covers a wide range of ML use cases, with a primary interest in generative AI, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, utilities, and telecommunications. You can connect with Dmitry on LinkedIn.
Karan Jain is a Senior Machine Learning Specialist at AWS, where he leads the worldwide go-to-market strategy for Amazon SageMaker Inference. He helps customers accelerate their generative AI and ML journey on AWS by providing guidance on deployment, cost optimization, and GTM strategy. He has led product, marketing, and business development efforts across industries for over 10 years, and is passionate about mapping complex service features to customer solutions.