Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection

dc.contributor.author: Chow, Adrian
dc.date.accessioned: 2025-08-22T18:24:20Z
dc.date.available: 2025-08-22T18:24:20Z
dc.date.issued: 2025-08-22
dc.date.submitted: 2025-08-15
dc.description.abstract: 3D object detection is a fundamental task in the autonomous driving perception pipeline, where identifying and localizing objects within the surrounding environment is critical for safe and robust decision-making. However, traditional 3D object detectors are limited by their reliance on a closed set of training categories, rendering them incapable of recognizing novel or out-of-distribution objects encountered in open-world driving scenarios. To address this limitation, the field of open-vocabulary (OV)-3D object detection has emerged, aiming to generalize beyond predefined label sets by leveraging vision-language models (VLMs) to align 3D object proposals with semantically rich 2D language-informed features. Despite promising results, a major challenge in OV-3D object detection lies in achieving robust cross-modal alignment between 3D and 2D features, which is often compromised by noisy annotations, occlusions, and resolution inconsistencies that disrupt semantic coherence. In this thesis, we present OV-SCAN, a novel framework for Open-Vocabulary 3D object detection that enforces Semantically Consistent Alignment for Novel object discovery. OV-SCAN introduces a two-stage strategy: (1) discovering precise 3D annotations for novel objects using vision-language supervision, and (2) filtering out semantically inconsistent or low-quality 3D–2D training pairs that arise from annotation errors and sensor limitations. We validate the effectiveness of OV-SCAN through comprehensive experiments on autonomous driving benchmarks, where our framework consistently outperforms existing methods in the OV-3D object detection task. Overall, OV-SCAN underscores the critical role of semantic consistency in cross-modal alignment and demonstrates its potential as a scalable solution for discovering and localizing novel objects in real-world autonomous driving scenarios.
dc.identifier.uri: https://hdl.handle.net/10012/22240
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo (en)
dc.relation.uri: KITTI 3D Object Detection Dataset
dc.relation.uri: NuScenes Dataset
dc.subject: Open-Vocabulary
dc.subject: Computer Vision
dc.subject: 3D Object Detection
dc.subject: Vision Language Model
dc.subject: Autonomous Driving
dc.title: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection
dc.type: Master Thesis
uws-etd.degree: Master of Applied Science
uws-etd.degree.department: Electrical and Computer Engineering
uws-etd.degree.discipline: Electrical and Computer Engineering
uws-etd.degree.grantor: University of Waterloo (en)
uws-etd.embargo.terms: 0
uws.contributor.advisor: Czarnecki, Krzysztof
uws.contributor.affiliation1: Faculty of Engineering
uws.peerReviewStatus: Unreviewed (en)
uws.published.city: Waterloo (en)
uws.published.country: Canada (en)
uws.published.province: Ontario (en)
uws.scholarLevel: Graduate (en)
uws.typeOfResource: Text (en)

Files

Original bundle

Name: Chow_Adrian.pdf
Size: 9.98 MB
Format: Adobe Portable Document Format
