Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection
| dc.contributor.author | Chow, Adrian | |
| dc.date.accessioned | 2025-08-22T18:24:20Z | |
| dc.date.available | 2025-08-22T18:24:20Z | |
| dc.date.issued | 2025-08-22 | |
| dc.date.submitted | 2025-08-15 | |
| dc.description.abstract | 3D object detection is a fundamental task in the autonomous driving perception pipeline, where identifying and localizing objects within the surrounding environment is critical for safe and robust decision-making. However, traditional 3D object detectors rely on a closed set of training categories, which leaves them unable to recognize novel or out-of-distribution objects encountered in open-world driving scenarios. To address this limitation, the field of open-vocabulary 3D (OV-3D) object detection has emerged, aiming to generalize beyond predefined label sets by leveraging vision-language models (VLMs) to align 3D object proposals with semantically rich, language-informed 2D features. Despite promising results, a major challenge in OV-3D object detection lies in achieving robust cross-modal alignment between 3D and 2D features, which is often compromised by noisy annotations, occlusions, and resolution inconsistencies that disrupt semantic coherence. In this thesis, we present OV-SCAN, a novel framework for Open-Vocabulary 3D object detection that enforces Semantically Consistent Alignment for Novel object discovery. OV-SCAN introduces a two-stage strategy: (1) discovering precise 3D annotations for novel objects using vision-language supervision, and (2) filtering out semantically inconsistent or low-quality 3D–2D training pairs that arise from annotation errors and sensor limitations. We validate the effectiveness of OV-SCAN through comprehensive experiments on autonomous driving benchmarks, where our framework consistently outperforms existing methods in the OV-3D object detection task. Overall, OV-SCAN underscores the critical role of semantic consistency in cross-modal alignment and demonstrates its potential as a scalable solution for discovering and localizing novel objects in real-world autonomous driving scenarios. | |
| dc.identifier.uri | https://hdl.handle.net/10012/22240 | |
| dc.language.iso | en | |
| dc.pending | false | |
| dc.publisher | University of Waterloo | en |
| dc.relation.uri | KITTI 3D Object Detection Dataset | |
| dc.relation.uri | nuScenes Dataset | |
| dc.subject | Open-Vocabulary | |
| dc.subject | Computer Vision | |
| dc.subject | 3D Object Detection | |
| dc.subject | Vision Language Model | |
| dc.subject | Autonomous Driving | |
| dc.title | Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection | |
| dc.type | Master Thesis | |
| uws-etd.degree | Master of Applied Science | |
| uws-etd.degree.department | Electrical and Computer Engineering | |
| uws-etd.degree.discipline | Electrical and Computer Engineering | |
| uws-etd.degree.grantor | University of Waterloo | en |
| uws-etd.embargo.terms | 0 | |
| uws.contributor.advisor | Czarnecki, Krzysztof | |
| uws.contributor.affiliation1 | Faculty of Engineering | |
| uws.peerReviewStatus | Unreviewed | en |
| uws.published.city | Waterloo | en |
| uws.published.country | Canada | en |
| uws.published.province | Ontario | en |
| uws.scholarLevel | Graduate | en |
| uws.typeOfResource | Text | en |