Deploy Tiny-Llama on AWS EC2
Introduction
I’ve always thought that even the best project in the world doesn’t have much value if people can’t use it. That’s why it is very important to learn how to deploy Machine Learning models. In this article we focus on deploying a small large language model, Tiny-Llama, on an AWS instance called EC2.
List of tools I used for this project:
- Deepnote: a cloud-based notebook that’s great for collaborative data science projects, good for prototyping
- FastAPI: a web framework for building APIs with Python (a minimal sketch follows this list)
- AWS EC2: a web service that provides resizable compute capacity in the cloud
- Nginx: an HTTP and reverse proxy server. I use it to connect the FastAPI server to AWS
- GitHub: a hosting service for software projects
- HuggingFace: a platform to host and collaborate on unlimited models, datasets, and applications.
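To make the FastAPI and HuggingFace pieces concrete, here is a minimal sketch of the kind of inference endpoint we will build. The /generate route, the Query schema, and the checkpoint name are my own illustrative choices, not something fixed by the tools above:

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load the model once at startup. The checkpoint name is an assumption:
# TinyLlama publishes its chat checkpoints on HuggingFace.
pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

class Query(BaseModel):
    prompt: str
    max_new_tokens: int = 100

@app.post("/generate")
def generate(query: Query):
    # Run inference and return the generated text.
    output = pipe(query.prompt, max_new_tokens=query.max_new_tokens)
    return {"generated_text": output[0]["generated_text"]}
```

Locally, something like this would run with `uvicorn main:app --host 0.0.0.0 --port 8000`; on the EC2 instance, Nginx sits in front as a reverse proxy and forwards incoming HTTP requests to that port.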
About Tiny Llama
TinyLlama-1.1B is a project aiming to pretrain a 1.1B-parameter Llama model on 3 trillion tokens. It uses the same architecture as Llama 2.
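Because it shares Llama 2’s architecture, TinyLlama loads through the standard transformers auto-classes. A minimal sketch, assuming the chat checkpoint published under the TinyLlama organization on HuggingFace:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name; TinyLlama hosts its models on HuggingFace.
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize a prompt and generate a short completion.
prompt = "Explain what an EC2 instance is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

At roughly 1.1B parameters, the model is small enough to run on a modest CPU-only instance, which is exactly what makes it a good candidate for an EC2 deployment.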
Today’s large language models have impressive capabilities but are extremely expensive in terms of hardware. In many settings we have limited hardware: think of smartphones or satellites. So there is a lot of research on creating smaller models so they can be deployed on edge devices.
Here is a list of “small” models that are catching on:
- MobileVLM (Multimodal)
- Phi-2
- Obsidian (Multimodal)