Perceptual Relationship and Representation Learning for 3D Understanding and Quality Enhancement

Date

2025-01-06

Advisor

Wang, Zhou

Publisher

University of Waterloo

Abstract

3D modeling plays a crucial role in a wide variety of real-world applications, such as autonomous driving, smart cities, entertainment, education, and video game development. Over the past decades, research has focused on enhancing 3D understanding and improving the quality of 3D representation and rendering. Despite these efforts, existing approaches often rely on either low-level features, such as object sizes and edges, or high-level semantics, such as object categories, while neglecting the critical relationships among multiple objects. These relationships are essential for a comprehensive understanding of 3D scenes in natural environments. Moreover, contemporary deep-learning-based methods for 3D modeling often overfit due to their reliance on large-scale neural networks, which limits their efficiency and generalizability. This thesis addresses these limitations through four key contributions centered on extracting efficient perceptual representations.

In the first part, we propose a method to extract perceptual relationships that extend 3D understanding from local cues, such as shapes and semantics, to global cues, such as depth perception. To achieve this, we design a VRS framework with two main objectives: (1) identifying the perceptual relationships that contribute significantly to 3D understanding, and (2) quantifying their contributions. We evaluate the effectiveness of the resulting spatialized relationship representations by integrating them into monocular depth estimation tasks. Experiments on the KITTI, NYU v2, and ICL-NUIM datasets validate the efficacy of this approach. Furthermore, incorporating the relationship spatialization framework into state-of-the-art depth estimation models yields marginal improvements across most evaluation metrics.

In the second part, we extend perceptual relationship representations from single to multiple viewpoints. This extension enables the integration of richer 3D information into novel-view-related tasks, such as novel view synthesis (NVS), which requires generating multiple novel views. To this end, we introduce a VRT framework that predicts perceptual relationships from unseen viewpoints, thereby overcoming the constraints of view dependency. By capturing transformed relationship representations, the framework enhances 3D understanding in NVS tasks.

In the third part, we aim to improve the perceptual quality of rendered novel views by using HPP, which are sensitive to distortions in 2D images. To this end, we design a HuPPO framework that improves the quality of 3D renderings. The framework imposes "human perception" as guidance to learn perceptually satisfactory representations; at the same time, human perception is formulated as a meta-learning objective function that regularizes the training process. Evaluation on the novel view synthesis task demonstrates the effectiveness of the proposed framework. A simplified sketch of such perception-guided training is given below.
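To make the third contribution more concrete, here is a minimal, hypothetical PyTorch sketch of perception-guided training. The thesis formulates human perception as a meta-learning objective inside HuPPO, whose exact form is not given in this abstract; the sketch only illustrates the simpler underlying idea of regularizing a pixel-wise rendering loss with a differentiable perceptual quality term (SSIM). All names are illustrative, not the thesis implementation.

```python
# Hypothetical sketch: perception-guided rendering loss (NOT the actual HuPPO
# framework, which uses a meta-learning formulation of human perception).
import torch
import torch.nn.functional as F

def ssim(x: torch.Tensor, y: torch.Tensor,
         c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> torch.Tensor:
    """Mean SSIM over 11x11 windows for images in [0, 1], shape (B, C, H, W)."""
    mu_x = F.avg_pool2d(x, 11, 1, 5)
    mu_y = F.avg_pool2d(y, 11, 1, 5)
    var_x = F.avg_pool2d(x * x, 11, 1, 5) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, 11, 1, 5) - mu_y ** 2
    cov_xy = F.avg_pool2d(x * y, 11, 1, 5) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return (num / den).mean()

def perceptual_training_loss(rendered: torch.Tensor, target: torch.Tensor,
                             lam: float = 0.2) -> torch.Tensor:
    """Pixel-wise loss plus a perceptual regularizer on the rendered view."""
    return F.mse_loss(rendered, target) + lam * (1.0 - ssim(rendered, target))
```

In a training loop, `perceptual_training_loss` would simply replace a plain MSE loss on rendered novel views; the weight `lam` trades off pixel fidelity against perceptual quality.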
In the final part, we extend the aforementioned methods, which primarily enhance the quality of 3D modeling through perceptual representations, with a focus on improving efficiency and generalizability. To this end, we propose an HCDM for learning 3D models, represented as neural radiance fields or point clouds, using a latent diffusion model. The HCDM improves the quality of 3D modeling by representing a 3D model as a set of functional parameters that is far smaller than the 3D model itself, allowing for an efficient representation. The HCDM is designed to learn the distribution of these parameters, improving generalizability across data from multiple modalities. The significant performance improvements of the proposed methods are further validated on 2D images and 3D motion data. A toy sketch of diffusion over such functional parameters follows.
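The fourth contribution can likewise be sketched in toy form, under the assumption (not stated in the abstract) that the HCDM follows a standard DDPM-style noise-prediction objective over flattened network weights; the actual HCDM architecture and conditioning are not specified here, and every name below is hypothetical.

```python
# Hypothetical sketch: diffusion over the functional parameters of fitted 3D
# models (e.g., small NeRF MLPs), in the spirit of, but not identical to, HCDM.
import torch
import torch.nn as nn
import torch.nn.functional as F

def flatten_params(model: nn.Module) -> torch.Tensor:
    """Concatenate all parameters of a fitted 3D model into one flat vector."""
    return torch.cat([p.detach().flatten() for p in model.parameters()])

class ParamDenoiser(nn.Module):
    """Predicts the noise added to a flattened parameter vector at step t."""
    def __init__(self, dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Condition on the (normalized) timestep by simple concatenation.
        return self.net(torch.cat([x_t, t], dim=-1))

def diffusion_step_loss(denoiser: ParamDenoiser, x0: torch.Tensor,
                        alphas_cumprod: torch.Tensor) -> torch.Tensor:
    """One DDPM noise-prediction step on a batch of parameter vectors x0."""
    steps = len(alphas_cumprod)
    t = torch.randint(0, steps, (x0.shape[0],))
    a_bar = alphas_cumprod[t].unsqueeze(-1)               # (B, 1)
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    pred = denoiser(x_t, t.float().unsqueeze(-1) / steps)
    return F.mse_loss(pred, noise)

# Toy usage with random stand-ins for a population of fitted model weights.
dim = 256
denoiser = ParamDenoiser(dim)
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
x0 = torch.randn(8, dim)  # each row: flatten_params(...) of one fitted model
loss = diffusion_step_loss(denoiser, x0, alphas_cumprod)
```

Because the flat parameter vector is much smaller than a dense radiance field or point cloud, sampling new vectors from the learned distribution yields new 3D models at low cost, which is the efficiency argument made in the abstract.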

Keywords

3D Modeling, Representation Learning, Perceptual Relationship, Quality Enhancement
