Deploy Tiny-Llama on AWS EC2
Introduction
I’ve always thought that even the best project in the world doesn’t have much value if people can’t use it. That’s why it is very important to learn how to deploy Machine Learning models. In this article we focus on deploying a small large language model, Tiny-Llama, on an AWS instance called EC2.
List of tools I used for this project:
- Deepnote: a cloud-based notebook that’s great for collaborative data science projects, good for prototyping
- FastAPI: a web framework for building APIs with Python (a minimal sketch follows this list)
- AWS EC2: a web service that provides resizable compute capacity in the cloud
- Nginx: an HTTP and reverse proxy server. I use it to connect the FastAPI server to AWS
- GitHub: a hosting service for software projects
- HuggingFace: a platform to host and collaborate on unlimited models, datasets, and applications.
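To make the FastAPI and HuggingFace pieces concrete, here is a minimal sketch of the kind of inference endpoint we will build. The /generate route, the Query schema, and the checkpoint name are my own illustrative choices, not something fixed by the tools above:

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load the model once at startup. The checkpoint name is an assumption:
# TinyLlama publishes its chat checkpoints on HuggingFace.
pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

class Query(BaseModel):
    prompt: str
    max_new_tokens: int = 100

@app.post("/generate")
def generate(query: Query):
    # Run inference and return the generated text.
    output = pipe(query.prompt, max_new_tokens=query.max_new_tokens)
    return {"generated_text": output[0]["generated_text"]}
```

Locally, something like this would run with `uvicorn main:app --host 0.0.0.0 --port 8000`; on the EC2 instance, Nginx sits in front as a reverse proxy and forwards incoming HTTP requests to that port.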
About Tiny Llama
TinyLlama-1.1B is a project aiming to pretrain a 1.1B-parameter Llama model on 3 trillion tokens. It uses the same architecture as Llama 2.
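Because it shares Llama 2’s architecture, TinyLlama loads through the standard transformers auto-classes. A minimal sketch, assuming the chat checkpoint published under the TinyLlama organization on HuggingFace:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name; TinyLlama hosts its models on HuggingFace.
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize a prompt and generate a short completion.
prompt = "Explain what an EC2 instance is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

At roughly 1.1B parameters, the model is small enough to run on a modest CPU-only instance, which is exactly what makes it a good candidate for an EC2 deployment.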
Today’s large language models have impressive capabilities but are extremely expensive in terms of hardware. In many settings we have limited hardware: think of smartphones or satellites. So there is a lot of research on creating smaller models so they can be deployed on edge devices.
Here is a list of “small” models that are catching on:
- MobileVLM (Multimodal)
- Phi-2
- Obsidian (Multimodal)