Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection

Chow, Adrian

Files

Chow_Adrian.pdf (9.98 MB)

Date

2025-08-22

Authors

Chow, Adrian

Advisor

Czarnecki, Krzysztof

Publisher

University of Waterloo

Abstract

3D object detection is a fundamental task in the autonomous driving perception pipeline, where identifying and localizing objects in the surrounding environment is critical for safe and robust decision-making. However, traditional 3D object detectors rely on a closed set of training categories, which leaves them unable to recognize novel or out-of-distribution objects encountered in open-world driving scenarios. To address this limitation, the field of open-vocabulary 3D (OV-3D) object detection has emerged. It aims to generalize beyond predefined label sets by using vision-language models (VLMs) to align 3D object proposals with semantically rich, language-informed 2D features. Despite promising results, a major challenge in OV-3D object detection is achieving robust cross-modal alignment between 3D and 2D features, because this alignment is often compromised by noisy annotations, occlusions, and resolution inconsistencies that disrupt semantic coherence. In this thesis, we present OV-SCAN, a novel framework for Open-Vocabulary 3D object detection that enforces Semantically Consistent Alignment for Novel object discovery. OV-SCAN introduces a two-stage strategy: (1) discovering precise 3D annotations for novel objects using vision-language supervision, and (2) filtering out semantically inconsistent or low-quality 3D–2D training pairs that arise from annotation errors and sensor limitations. We validate the effectiveness of OV-SCAN through comprehensive experiments on autonomous driving benchmarks, where our framework consistently outperforms existing methods on the OV-3D object detection task. Overall, OV-SCAN underscores the critical role of semantic consistency in cross-modal alignment and demonstrates its potential as a scalable solution for discovering and localizing novel objects in real-world autonomous driving scenarios.

Keywords

Open-Vocabulary, Computer Vision, 3D Object Detection, Vision Language Model, Autonomous Driving

URI

https://hdl.handle.net/10012/22240

Collections

Theses
Electrical and Computer Engineering