Optimisation Algorithms: Neural Networks 101 | by Egor Howell | Nov, 2023


How to improve training beyond the “vanilla” gradient descent algorithm

Neural network icons created by andinur — Flaticon (https://www.flaticon.com/free-icons/neural-network).

In my last post, we discussed how to improve the performance of neural networks through hyperparameter tuning:

This is a process whereby the best hyperparameters, such as the learning rate and the number of hidden layers, are “tuned” to find the optimal values for our network and boost its performance.

Unfortunately, this tuning process is painstakingly slow for large deep neural networks (deep learning). One way to improve upon this is to use faster optimisers than the traditional “vanilla” gradient descent method. In this post, we’ll dive into the most popular optimisers and variants of gradient descent that can speed up training and convergence, and compare them in PyTorch!
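To make that comparison concrete, here is a minimal sketch of how the optimisers covered later can be swapped into the same PyTorch training loop. The toy model, random data and learning rates are assumed purely for illustration and are not the benchmark used in this post.

```python
import torch
from torch import nn

# Assumed toy model and random data, purely for illustration
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
X, y = torch.randn(256, 10), torch.randn(256, 1)
loss_fn = nn.MSELoss()

# Each optimiser is a drop-in replacement via torch.optim
optimisers = {
    "sgd": torch.optim.SGD(model.parameters(), lr=0.01),
    "momentum": torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9),
    "nesterov": torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True),
    "rmsprop": torch.optim.RMSprop(model.parameters(), lr=0.01),
    "adam": torch.optim.Adam(model.parameters(), lr=0.01),
}

optimiser = optimisers["adam"]  # pick one; the loop below is identical for all
for epoch in range(5):
    optimiser.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(X), y)  # forward pass
    loss.backward()              # compute gradients
    optimiser.step()             # update the parameters
```

The point of the sketch is that only the optimiser object changes; the training loop itself stays the same, which is what makes a like-for-like comparison straightforward.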

Before diving in, let’s quickly brush up on our knowledge of gradient descent and the theory behind it.

The aim of gradient descent is to update the parameters of the model by subtracting the gradient (partial derivative) of the loss function with respect to each parameter. A learning rate, α, regulates this process to ensure the parameters are updated on a reasonable scale and neither overshoot nor undershoot the optimal values. One update step can be written as θ ← θ − α∇J(θ), where:

  • θ are the parameters of the model.
  • J(θ) is the loss function.
  • ∇J(θ) is the gradient of the loss function; ∇ is the gradient operator, also known as nabla.
  • α is the learning rate.
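As a quick illustration of this update rule, here is a minimal sketch of vanilla gradient descent on an assumed one-parameter quadratic loss (chosen only so the optimum is easy to check, not taken from this post):

```python
import torch

# Assumed toy loss: J(theta) = (theta - 3)^2, whose minimum is at theta = 3
theta = torch.tensor(0.0, requires_grad=True)  # the model parameter θ
alpha = 0.1                                    # the learning rate α

for step in range(50):
    loss = (theta - 3) ** 2            # J(θ)
    loss.backward()                    # compute the gradient ∇J(θ)
    with torch.no_grad():
        theta -= alpha * theta.grad    # θ ← θ − α∇J(θ)
    theta.grad.zero_()                 # reset the gradient for the next step

print(theta.item())  # converges towards 3.0
```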

I wrote a previous article on gradient descent and how it works if you want to familiarise yourself with it a bit more:
