Data Science
Permanent URI for this collection: https://uwspace.uwaterloo.ca/handle/10012/18081
This is the collection for the University of Waterloo's Data Science program.
Browsing Data Science by Subject "adaptive gradient methods"
Now showing 1 - 1 of 1
Item: Algorithmic Behaviours of Adagrad in Underdetermined Linear Regression (University of Waterloo, 2023-08-24) Rambidis, Andrew

With the widespread use of over-parameterized models in deep learning, the choice of optimizer plays a significant role in a model's ability to generalize well, owing to the existence of solution selection bias. We consider the popular adaptive gradient method Adagrad and study its convergence and algorithmic biases in the underdetermined linear regression regime. First, we prove that Adagrad converges in this problem regime. We then find empirically that, with sufficiently small step sizes, Adagrad promotes diffuse solutions, in the sense of uniformity among the coordinates of the solution. Moreover, we show both empirically and theoretically that, under the same conditions, Adagrad's solution is more diffuse than the one obtained by gradient descent. This behaviour is unexpected, as conventional data science favours optimizers that attain sparser solutions, a preference rooted in inherent advantages such as preventing overfitting and reducing the dimensionality of the data. However, we show that in interpolation tasks diffuse solutions yield beneficial results compared to localized ones; namely, we experimentally observe the success of diffuse solutions when interpolating a line via a weighted sum of spike-like functions. The thesis concludes with suggestions for possible extensions of this work.
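As a minimal illustration of the diffusion phenomenon the abstract describes (a sketch under stated assumptions, not code from the thesis), the Python snippet below runs standard Adagrad and plain gradient descent on a random underdetermined least-squares problem and compares the resulting solutions. The problem sizes, step sizes, iteration counts, and the l1/l2 norm ratio used as a diffuseness proxy are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Underdetermined linear regression: more unknowns than equations (n > m),
# so infinitely many interpolating solutions exist and the optimizer
# selects among them (solution selection bias).
m, n = 10, 50
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def grad(x):
    # Gradient of the least-squares loss 0.5 * ||Ax - b||^2.
    return A.T @ (A @ x - b)

def run_gd(steps=100_000, lr=1e-3):
    x = np.zeros(n)
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def run_adagrad(steps=100_000, lr=1e-3, eps=1e-10):
    x = np.zeros(n)
    G = np.zeros(n)  # per-coordinate running sum of squared gradients
    for _ in range(steps):
        g = grad(x)
        G += g * g
        x -= lr * g / (np.sqrt(G) + eps)
    return x

def diffuseness(x):
    # l1/l2 ratio: equals 1 for a one-hot vector and sqrt(n) for uniform
    # magnitudes, so larger values indicate a more diffuse solution.
    return np.linalg.norm(x, 1) / np.linalg.norm(x, 2)

x_gd, x_ada = run_gd(), run_adagrad()
print("residual    GD:", np.linalg.norm(A @ x_gd - b))
print("residual   Ada:", np.linalg.norm(A @ x_ada - b))
print("diffuseness GD :", diffuseness(x_gd))
print("diffuseness Ada:", diffuseness(x_ada))

Under the thesis's small-step-size regime, one would expect both runs to drive the residual near zero while Adagrad's solution scores higher on the diffuseness measure than gradient descent's.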