The Ultimate Guide to Vision Transformers | by François Porcher | Aug, 2024


A comprehensive guide to the Vision Transformer (ViT) that revolutionized computer vision

Hi everyone! For those who do not know me yet, my name is Francois, and I’m a Research Scientist at Meta. I have a passion for explaining advanced AI concepts and making them more accessible.

Today, let’s dive into one of the most significant contributions in the field of Computer Vision: the Vision Transformer (ViT).

Converting an image into patches, image by author

The Vision Transformer was introduced by Alexey Dosovitskiy et al. (Google Brain) in 2021 in the paper An Image is Worth 16×16 Words. At the time, Transformers, introduced in 2017 in the must-read paper Attention is All You Need, had proven to be the key to unlocking great performance on NLP tasks.

Between 2017 and 2021, there were several attempts to integrate the attention mechanism into Convolutional Neural Networks (CNNs). However, these were mostly hybrid models (combining CNN layers with attention layers) and lacked scalability. Google addressed this by completely eliminating convolutions and leveraging its computational power to scale the model.
