Optimisation Algorithms: Neural Networks 101 | by Egor Howell | Nov, 2023


How to improve training beyond the “vanilla” gradient descent algorithm

Neural network icons created by andinur — Flaticon (https://www.flaticon.com/free-icons/neural-network).

In my last post, we discussed how to improve the performance of neural networks through hyperparameter tuning:

This is a process whereby the best hyperparameters, such as the learning rate and the number of hidden layers, are “tuned” to find the optimal values for our network and boost its performance.

Unfortunately, this tuning process is painstakingly slow for large deep neural networks (deep learning). One way to improve upon this is to use faster optimisers than the traditional “vanilla” gradient descent method. In this post, we’ll dive into the most popular optimisers and variants of gradient descent that can speed up training and convergence, and compare them in PyTorch!
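To make that comparison concrete, here is a minimal sketch of how the optimisers covered later can be swapped into the same PyTorch training loop. The toy model, random data and learning rates are assumed purely for illustration and are not the benchmark used in this post.

```python
import torch
from torch import nn

# Assumed toy model and random data, purely for illustration
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
X, y = torch.randn(256, 10), torch.randn(256, 1)
loss_fn = nn.MSELoss()

# Each optimiser is a drop-in replacement via torch.optim
optimisers = {
    "sgd": torch.optim.SGD(model.parameters(), lr=0.01),
    "momentum": torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9),
    "nesterov": torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True),
    "rmsprop": torch.optim.RMSprop(model.parameters(), lr=0.01),
    "adam": torch.optim.Adam(model.parameters(), lr=0.01),
}

optimiser = optimisers["adam"]  # pick one; the loop below is identical for all
for epoch in range(5):
    optimiser.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(X), y)  # forward pass
    loss.backward()              # compute gradients
    optimiser.step()             # update the parameters
```

The point of the sketch is that only the optimiser object changes; the training loop itself stays the same, which is what makes a like-for-like comparison straightforward.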

Before diving in, let’s quickly brush up on our knowledge of gradient descent and the theory behind it.

The aim of gradient descent is to update the parameters of the model by subtracting the gradient (partial derivative) of the loss function with respect to each parameter. A learning rate, α, regulates this process to ensure the parameters are updated on a reasonable scale and neither overshoot nor undershoot the optimal values. One update step can be written as θ ← θ − α∇J(θ), where:

  • θ are the parameters of the model.
  • J(θ) is the loss function.
  • ∇J(θ) is the gradient of the loss function; ∇ is the gradient operator, also known as nabla.
  • α is the learning rate.
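As a quick illustration of this update rule, here is a minimal sketch of vanilla gradient descent on an assumed one-parameter quadratic loss (chosen only so the optimum is easy to check, not taken from this post):

```python
import torch

# Assumed toy loss: J(theta) = (theta - 3)^2, whose minimum is at theta = 3
theta = torch.tensor(0.0, requires_grad=True)  # the model parameter θ
alpha = 0.1                                    # the learning rate α

for step in range(50):
    loss = (theta - 3) ** 2            # J(θ)
    loss.backward()                    # compute the gradient ∇J(θ)
    with torch.no_grad():
        theta -= alpha * theta.grad    # θ ← θ − α∇J(θ)
    theta.grad.zero_()                 # reset the gradient for the next step

print(theta.item())  # converges towards 3.0
```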

I wrote a previous article on gradient descent and how it works if you want to familiarise yourself with it a bit more:
