Manifold-Aware Regularization for Self-Supervised Representation Learning

Advisor

Fieguth, Paul

Publisher

University of Waterloo

Abstract

Self-supervised learning (SSL) has emerged as a dominant paradigm for representation learning, yet much of its recent progress has been guided by empirical heuristics rather than unifying theoretical principles. This thesis advances the understanding of SSL by framing representation learning as a problem of geometry preservation on the data manifold, where the objective is to shape embedding spaces that respect intrinsic structure while remaining discriminative for downstream tasks. We develop a suite of methods—ranging from optimal transport–regularized contrastive learning (SinSim) to kernelized variance–invariance–covariance regularization (Kernel VICReg)—that systematically move beyond the Euclidean metric paradigm toward geometry-adaptive distances and statistical dependency measures, such as the maximum mean discrepancy (MMD) and the Hilbert–Schmidt independence criterion (HSIC). Our contributions span both theory and practice. Theoretically, we unify contrastive and non-contrastive SSL objectives under a manifold-aware regularization framework, revealing deep connections between dependency reduction, spectral geometry, and invariance principles. We also challenge the pervasive assumption that Euclidean distance is the canonical measure for alignment, showing that embedding metrics are themselves learnable design choices whose compatibility with the manifold geometry critically affects representation quality. Practically, we validate our framework across diverse domains—including natural images and structured scientific data—demonstrating improvements in downstream generalization, robustness to distribution shift, and stability under limited augmentations. By integrating geometric priors, kernel methods, and distributional alignment into SSL, this work reframes representation learning as a principled interaction between statistical dependence control and manifold geometry. The thesis concludes by identifying open theoretical questions at the intersection of Riemannian geometry, kernel theory, and self-supervised objectives, outlining a research agenda for the next generation of geometry-aware foundation models.
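For concreteness, the sketch below illustrates two of the standard building blocks the abstract names: the VICReg objective (variance, invariance, and covariance terms, following Bardes et al., 2022) and a kernel-based dependence measure (the biased empirical HSIC estimator with RBF kernels, following Gretton et al., 2005). This is a minimal illustration written for this page, not the thesis's implementation; the hyperparameter values, the RBF bandwidth, and all function names are assumptions.

```python
# Illustrative sketch only, not the thesis's code: the standard VICReg loss
# and a biased empirical HSIC estimator with RBF kernels. Hyperparameters
# (lambda_, mu, nu, sigma) are assumed values, not the thesis's settings.
import torch
import torch.nn.functional as F


def vicreg_loss(z_a, z_b, lambda_=25.0, mu=25.0, nu=1.0, eps=1e-4):
    """Variance-invariance-covariance regularization over two embedding views."""
    n, d = z_a.shape
    # Invariance term: mean-squared distance between the two views.
    inv = F.mse_loss(z_a, z_b)
    # Variance term: hinge that keeps each embedding dimension's std above 1.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var = torch.mean(torch.relu(1.0 - std_a)) + torch.mean(torch.relu(1.0 - std_b))
    # Covariance term: penalize off-diagonal entries of the covariance matrix.
    za = z_a - z_a.mean(dim=0)
    zb = z_b - z_b.mean(dim=0)
    cov_a = (za.T @ za) / (n - 1)
    cov_b = (zb.T @ zb) / (n - 1)
    off_diag = lambda c: (c - torch.diag(torch.diag(c))).pow(2).sum() / d
    cov = off_diag(cov_a) + off_diag(cov_b)
    return lambda_ * inv + mu * var + nu * cov


def hsic_rbf(x, y, sigma=1.0):
    """Biased empirical HSIC with RBF kernels: trace(KHLH) / (n-1)^2."""
    n = x.shape[0]

    def rbf(a):
        sq = torch.cdist(a, a).pow(2)  # pairwise squared Euclidean distances
        return torch.exp(-sq / (2 * sigma ** 2))

    k, l = rbf(x), rbf(y)
    h = torch.eye(n) - torch.ones(n, n) / n  # centering matrix
    return torch.trace(k @ h @ l @ h) / (n - 1) ** 2


if __name__ == "__main__":
    z_a, z_b = torch.randn(256, 128), torch.randn(256, 128)
    print(vicreg_loss(z_a, z_b).item(), hsic_rbf(z_a, z_b).item())
```

A kernelized variant in the spirit of Kernel VICReg would replace the Euclidean covariance penalty with kernel dependence terms of this kind; the exact formulation is the thesis's contribution and is not reproduced here.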
