A Hamiltonian Systems Approach to Neural Network Optimization
Date
Authors
Advisor
Morris, Kirsten
C. Del Rey Fernández, David
Publisher
University of Waterloo
Abstract
We propose and analyze structure-preserving methods for first-order optimization of Lipschitz-smooth
objectives by interpreting the optimization dynamics as a dissipative Hamiltonian system, in which
the model parameters evolve jointly with an auxiliary momentum variable. This formulation induces
a natural energy dissipation mechanism, which motivates optimization algorithms that inherit a
discrete energy decay property. We develop discrete gradient (DG) methods that preserve an exact
discrete-time energy decay property, ensuring monotone dissipation independent of the stepsize.
Building on this framework, we introduce variants that empirically reduce oscillations, improve
runtime, and improve robustness to ill-conditioned problems.
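
As a minimal illustration of the energy decay property described above, the sketch below applies a midpoint discrete gradient step to a quadratic objective. Everything in it is an assumption made for illustration and is not taken from the thesis: the energy H(x, p) = f(x) + 0.5 ||p||^2, the linear damping coefficient `gamma`, the quadratic f(x) = 0.5 x^T A x - b^T x, and the helper names `dg_step` and `energy`. For a quadratic, the midpoint gradient A (x + x_new) / 2 - b is an exact discrete gradient, so the implicit update reduces to one linear solve and the energy changes by exactly -gamma * h * ||p_bar||^2 per step, whatever the stepsize h.

```python
import numpy as np

def dg_step(x, p, A, b, h, gamma):
    """One implicit midpoint discrete-gradient step for f(x) = 0.5 x^T A x - b^T x.

    Hypothetical scheme (an assumption, not the thesis's method):
        (x_new - x) / h = p_bar
        (p_new - p) / h = -(A (x + x_new) / 2 - b) - gamma * p_bar
    with p_bar = (p + p_new) / 2.
    """
    n = x.size
    # Eliminating x_new and p_new leaves a linear system for p_bar.
    M = (1.0 + 0.5 * gamma * h) * np.eye(n) + 0.25 * h**2 * A
    rhs = p - 0.5 * h * (A @ x - b)
    p_bar = np.linalg.solve(M, rhs)
    x_new = x + h * p_bar
    p_new = 2.0 * p_bar - p
    return x_new, p_new

def energy(x, p, A, b):
    """Assumed energy: objective value plus kinetic term 0.5 ||p||^2."""
    return 0.5 * x @ A @ x - b @ x + 0.5 * p @ p

# Ill-conditioned quadratic (condition number 1e4) and a deliberately large stepsize.
A = np.diag([1.0, 1.0e4])
b = np.zeros(2)
x, p = np.array([1.0, 1.0]), np.zeros(2)
h, gamma = 1.0, 1.0

energies = [energy(x, p, A, b)]
for _ in range(200):
    x, p = dg_step(x, p, A, b, h, gamma)
    energies.append(energy(x, p, A, b))
```

The list `energies` is non-increasing up to rounding error even though h is far larger than an explicit method would tolerate on this problem. For non-quadratic objectives the update becomes a nonlinear implicit equation, which is the cost the abstract refers to.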
To address the computational cost of the implicit DG methods, we propose semi-implicit
discrete gradient (SIDG) schemes obtained by linearizing the DG updates and incorporating
curvature through L-BFGS Hessian approximations, which are used to efficiently solve the
resulting linear systems. These schemes retain key structure-preserving properties while
significantly reducing computational cost, yielding a practical balance between stability and
efficiency. We establish monotone energy decay, boundedness of iterates, and sublinear
convergence to first-order stationary points.
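
The idea of linearizing an implicit update can be sketched as follows. This is not the SIDG scheme of the thesis: in place of an L-BFGS Hessian approximation it uses the crude scalar surrogate B = L * I, where L is a Lipschitz constant of the gradient, so that the linear solve collapses to a scalar division. The test problem (synthetic L2-regularized logistic regression), the damping `gamma`, and the helper names `loss_and_grad` and `semi_implicit_step` are likewise assumptions. With this choice, the descent lemma gives a decrease of at least gamma * h * ||p_bar||^2 in the assumed energy f(w) + 0.5 ||p||^2 at every step, for any stepsize h.

```python
import numpy as np

def loss_and_grad(w, X, y, lam):
    """L2-regularized logistic loss (labels in {-1, +1}) and its gradient."""
    z = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -z)) + 0.5 * lam * (w @ w)
    s = 0.5 * (1.0 - np.tanh(0.5 * z))  # numerically stable sigmoid(-z)
    grad = -(X.T @ (y * s)) / y.size + lam * w
    return loss, grad

def semi_implicit_step(w, p, grad, h, gamma, L):
    """Linearized step: the discrete gradient is replaced by grad + 0.5 * L * (w_new - w).

    Hypothetical scheme (an assumption): with B = L * I the linear system
    for p_bar = (p + p_new) / 2 is solved by a single scalar division.
    """
    p_bar = (p - 0.5 * h * grad) / (1.0 + 0.5 * gamma * h + 0.25 * h**2 * L)
    return w + h * p_bar, 2.0 * p_bar - p

# Synthetic, badly scaled data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([1.0, 1.0, 1.0, 10.0, 100.0])
y = np.where(X @ rng.normal(size=5) > 0, 1.0, -1.0)
lam = 1e-3
# Gradient Lipschitz constant: ||X||_2^2 / (4 n) + lam.
L = np.linalg.norm(X, 2) ** 2 / (4 * y.size) + lam

w, p = np.zeros(5), np.zeros(5)
h, gamma = 10.0, 1.0
energies = []
for _ in range(300):
    loss, grad = loss_and_grad(w, X, y, lam)
    energies.append(loss + 0.5 * p @ p)
    w, p = semi_implicit_step(w, p, grad, h, gamma, L)
final_loss, _ = loss_and_grad(w, X, y, lam)
energies.append(final_loss + 0.5 * p @ p)
```

A scalar bound ignores the anisotropy of the curvature, so progress along low-curvature directions is slow; a richer approximation such as L-BFGS is a natural way to reduce that penalty while keeping each step a linear solve.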
Numerical experiments on ill-conditioned least-squares problems, regularized logistic
regression, physics-informed neural networks, and CIFAR-10 image classification demonstrate
good performance despite ill-conditioning, and performance competitive with widely used
optimizers such as Adam, stochastic gradient descent, and L-BFGS.