Convergence Analysis of Stochastic Gradient Descent with Adaptive Learning Rates: A Mathematical Framework
DOI: https://doi.org/10.63163/jpehss.v4i1.1072

Keywords: Stochastic Gradient Descent, Adaptive Learning Rates, Neural Networks, Machine Learning, Deep Learning, Non-Convex Optimization

Abstract
Neural networks are rapidly growing and shaping the technology industry, and deep neural networks in particular have been employed in a wide variety of AI applications. Stochastic Gradient Descent (SGD) is one of the core algorithms for training deep neural networks, and SGD variants with adaptive learning rates are the optimizers of choice for this task. Despite their widespread popularity, the convergence properties of these adaptive methods remain imperfectly understood. This study provides a mathematical framework for analyzing the convergence of adaptive variants of SGD, including AdaGrad, RMSprop, and Adam. The work focuses on establishing convergence rates under various assumptions on the objective function, including the non-convex settings typical of deep learning. Our analysis highlights the role of second-moment accumulation in variance reduction and derives explicit error bounds. We show that, under suitable conditions, adaptive methods attain an O(1/√T) convergence rate for non-convex objectives and O(1/T) for strongly convex functions. Alongside the proofs, we conduct numerical experiments that validate the theoretical findings. The results provide mathematical justification for design choices in adaptive optimizers and inform better approaches to hyperparameter tuning.
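
For orientation, the minimal sketch below illustrates the kind of adaptive update with second-moment accumulation that the abstract refers to, using an Adam-style step on a toy strongly convex quadratic. The function name `adam_step`, the hyperparameter defaults, and the toy objective are illustrative assumptions for exposition, not the paper's implementation or experimental setup.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update: first- and second-moment accumulation with bias correction."""
    m = beta1 * m + (1 - beta1) * grad             # first moment (running mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2        # second moment (running uncentered variance)
    m_hat = m / (1 - beta1 ** t)                   # bias correction for the warm-up phase
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-coordinate adaptive step
    return theta, m, v

# Toy usage on the strongly convex quadratic f(theta) = ||theta||^2 / 2,
# whose exact gradient is theta (a stochastic minibatch gradient in practice).
theta = np.array([5.0, -3.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 501):
    grad = theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
print(theta)  # approaches the minimizer at the origin
```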