Computer Science
Permanent URI for this collection: https://uwspace.uwaterloo.ca/handle/10012/9930
This is the collection for the University of Waterloo's Cheriton School of Computer Science.
Research outputs are organized by type (e.g., Master's Thesis, Article, Conference Paper).
Waterloo faculty, students, and staff can contact us or visit the UWSpace guide to learn more about depositing their research.
Browsing Computer Science by Subject "3D object detection"
Now showing 1 - 2 of 2
Item: LiDAR-based 3D Perception from Multi-frame Point Clouds for Autonomous Driving (University of Waterloo, 2025-05-13) Huang, Chengjie

3D perception is a critical component of autonomous driving systems, where accurately detecting objects and understanding the surrounding environment are essential for safety. Recent advances in Light Detection and Ranging (LiDAR) technology and deep neural network architectures have enabled state-of-the-art (SOTA) methods to achieve high performance in 3D object detection and segmentation tasks. Many approaches exploit the sequential nature of LiDAR data by aggregating multiple consecutive scans into dense multi-frame point clouds. However, the challenges and applications of multi-frame point clouds have not been fully explored. This thesis makes three key contributions that advance the understanding and application of multi-frame point clouds in 3D perception tasks.

First, we address the limitations of multi-frame point clouds in 3D object detection. Specifically, we observe that increasing the number of aggregated frames yields diminishing returns and can even degrade performance, because different objects respond differently to the number of aggregated frames. To overcome this trade-off, we propose an efficient adaptive method termed Variable Aggregation Detection (VADet). Instead of aggregating the entire scene over a fixed number of frames, VADet performs aggregation per object, with the number of frames determined by the object's observed properties, such as speed and point density (a rough sketch follows this abstract). This adaptive approach reduces the inherent trade-offs of fixed aggregation and improves detection accuracy.

Next, we tackle the challenge of applying multi-frame point clouds to 3D semantic segmentation. Point-wise prediction on dense multi-frame point clouds can be computationally expensive, especially for SOTA transformer-based architectures. To address this issue, we propose MFSeg, an efficient multi-frame 3D semantic segmentation framework. MFSeg aggregates point cloud sequences at the feature level and regularizes the feature extraction and aggregation process to reduce computational overhead without compromising accuracy. In addition, by employing a lightweight MLP-based point decoder, MFSeg eliminates the need to upsample redundant points from past frames, further improving efficiency (see the second sketch below).

Finally, we explore the use of multi-frame point clouds for cross-sensor domain adaptation. Based on the observation that multi-frame aggregation can weaken the distinct LiDAR scan patterns of stationary objects, we propose Stationary Object Aggregation Pseudo-labelling (SOAP) to generate high-quality pseudo-labels for 3D object detection in a target domain (see the third sketch below). In contrast to the current SOTA in-domain practice of aggregating a few input frames, SOAP utilizes entire sequences of point clouds to effectively reduce the sensor domain gap.
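The abstract describes VADet only at a high level. As a rough illustration, a minimal Python sketch of per-object variable aggregation might look like the following; the function names, thresholds, and frame counts are hypothetical, not taken from the thesis.

```python
import numpy as np

def frames_for_object(speed_mps: float, point_density: float, max_frames: int = 16) -> int:
    """Choose how many past frames to aggregate for one object.

    Heuristic sketch only: fast-moving objects smear when aggregated over
    long horizons, so they get fewer frames; sparse, slow objects benefit
    from more. Thresholds here are illustrative, not VADet's actual values.
    """
    if speed_mps > 10.0:        # fast object: long aggregation causes smearing
        return 2
    if point_density < 5.0:     # sparse object: aggregate more frames to densify
        return max_frames
    return max_frames // 2      # moderate default

def aggregate_object_points(frames: list[np.ndarray], masks: list[np.ndarray],
                            speed_mps: float) -> np.ndarray:
    """Aggregate one tracked object's points across frames.

    `frames[i]` is an (N_i, 3) point cloud already transformed into a common
    ego-compensated coordinate frame; `masks[i]` is a boolean mask selecting
    that object's points in frame i (object-motion compensation is omitted).
    """
    density = masks[-1].sum()                 # points on the object in the latest frame
    k = min(frames_for_object(speed_mps, density), len(frames))
    # Stack only this object's points from the k most recent frames.
    selected = [f[m] for f, m in zip(frames[-k:], masks[-k:])]
    return np.concatenate(selected, axis=0)
```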
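Similarly, a minimal sketch of MFSeg's two key ideas as described above, feature-level aggregation and a lightweight MLP point decoder, could look like this; the averaging aggregator, layer sizes, and feature-lookup step are stand-ins, not the thesis architecture.

```python
import torch
import torch.nn as nn

def aggregate_frame_features(per_frame_feats: list[torch.Tensor]) -> torch.Tensor:
    """Feature-level aggregation stand-in: average aligned per-frame feature
    maps instead of concatenating raw points, so the backbone cost does not
    grow with the number of frames. Each tensor is (N, feat_dim)."""
    return torch.stack(per_frame_feats, dim=0).mean(dim=0)

class MLPPointDecoder(nn.Module):
    """Lightweight per-point decoder: classifies each current-frame point
    from its coordinates plus a feature gathered (e.g., by voxel lookup)
    from the aggregated map, so points from past frames never need to be
    upsampled or re-labelled."""
    def __init__(self, feat_dim: int = 64, num_classes: int = 20):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, point_xyz: torch.Tensor, point_feats: torch.Tensor) -> torch.Tensor:
        # point_xyz: (N, 3) current-frame points; point_feats: (N, feat_dim).
        return self.mlp(torch.cat([point_xyz, point_feats], dim=-1))
```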
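For SOAP, a hedged sketch of pseudo-labelling from full-sequence aggregation follows; the `detector` callable, the detection format, and the score threshold are assumptions for illustration, not the thesis implementation.

```python
import numpy as np
from typing import Callable

def soap_pseudo_labels(sequence: list[np.ndarray],
                       detector: Callable[[np.ndarray], list[dict]],
                       score_thresh: float = 0.7) -> list[dict]:
    """Generate target-domain pseudo-labels from an aggregated sequence.

    Aggregating an entire ego-motion-compensated sequence densifies
    stationary objects and washes out the sensor-specific scan pattern,
    so a source-trained detector transfers better on them.
    """
    dense = np.concatenate(sequence, axis=0)   # whole sequence, common frame
    detections = detector(dense)               # e.g., [{'box': ..., 'score': ...}]
    # Keep only confident detections; these are assumed to be stationary
    # objects, since moving objects smear under full-sequence aggregation
    # and tend to score low.
    return [d for d in detections if d['score'] >= score_thresh]
```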
Item: Local and Cooperative Autonomous Vehicle Perception from Synthetic Datasets (University of Waterloo, 2019-09-23) Hurl, Braden; Czarnecki, Krzysztof; Waslander, Steven

The purpose of this work is to improve the performance of autonomous vehicle 3D object detection using synthetic data. This work introduces the Precise Synthetic Image and LiDAR (PreSIL) dataset for autonomous vehicle perception. Grand Theft Auto V (GTA V), a commercial video game, has a large, detailed world with realistic graphics, which provides a diverse data collection environment.

Existing works creating synthetic Light Detection and Ranging (LiDAR) data for autonomous driving with GTA V have not released their datasets, rely on an in-game raycasting function that represents people as cylinders, and can fail to capture vehicles beyond 30 metres. This work describes a novel LiDAR simulator within GTA V that collides with detailed models for all entities, regardless of type or position (a sketch of such a ray-casting loop follows this abstract). The PreSIL dataset consists of over 50,000 frames and includes high-definition images with full-resolution depth information, semantic segmentation (images), point-wise segmentation (point clouds), and detailed annotations for all vehicles and people. Collecting additional data with the PreSIL framework is entirely automatic and requires no human intervention of any kind. The effectiveness of the PreSIL dataset is demonstrated by an improvement of up to 5% average precision on the KITTI 3D Object Detection benchmark when state-of-the-art 3D object detection networks are pre-trained with the PreSIL dataset. The PreSIL dataset and generation code are available at https://tinyurl.com/y3tb9sxy

Synthetic data also enables the generation of scenarios that are genuinely hard to create in the real world. In the next major chapter of this thesis, a new synthetic dataset, the TruPercept dataset, is created with perceptual information from multiple viewpoints. A novel system, the TruPercept model, is proposed for cooperative perception, that is, perception incorporating information from multiple viewpoints. TruPercept integrates trust modelling for vehicular ad hoc networks (VANETs) with perception information, with a focus on 3D object detection (sketched below). A discussion is presented on how this might create a safer driving experience for fully autonomous vehicles. The TruPercept dataset is used to experimentally evaluate the TruPercept model against traditional local (single-viewpoint) perception models. The TruPercept model is also contrasted with existing methods for trust modelling in ad hoc network environments. This thesis also offers insights into how V2V communication for perception can be managed through trust modelling to improve object detection accuracy across contexts with varying ease of observability. The TruPercept model and data are available at https://tinyurl.com/y2nwy52o
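As a rough illustration of the simulator described above, the following sketch builds a Velodyne-like ray pattern and casts each ray against the full scene geometry. `cast_ray` is a stand-in for the game-engine ray query against detailed entity models, and the beam count and field of view are assumed values, not PreSIL's actual parameters.

```python
import numpy as np

def lidar_ray_directions(num_beams: int = 64, horiz_res: int = 2048,
                         vfov_deg: tuple = (-24.8, 2.0)) -> np.ndarray:
    """Build unit ray directions for one simulated LiDAR sweep.

    The beam count and vertical field of view mimic an HDL-64E-style
    sensor for illustration only.
    """
    elev = np.deg2rad(np.linspace(vfov_deg[0], vfov_deg[1], num_beams))
    azim = np.linspace(0.0, 2.0 * np.pi, horiz_res, endpoint=False)
    az, el = np.meshgrid(azim, elev)                 # (num_beams, horiz_res)
    dirs = np.stack([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)], axis=-1)
    return dirs.reshape(-1, 3)

def simulate_sweep(origin: np.ndarray, dirs: np.ndarray,
                   cast_ray, max_range: float = 120.0) -> np.ndarray:
    """Cast every ray against the full scene geometry and keep the hits.

    `cast_ray(origin, direction)` returns a hit distance or None; casting
    against detailed models (rather than cylinder proxies) is the point
    the abstract emphasizes.
    """
    points = []
    for d in dirs:
        dist = cast_ray(origin, d)
        if dist is not None and dist <= max_range:
            points.append(origin + dist * d)
    return np.asarray(points)
```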
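The trust-based cooperative fusion idea might be sketched as follows; the matching function, trust-update increments, and acceptance threshold are illustrative assumptions, not the TruPercept rules.

```python
def fuse_reports(local_dets: list, remote_reports: dict, trust: dict,
                 match_fn, accept_thresh: float = 0.5) -> list:
    """Trust-weighted cooperative perception sketch.

    local_dets:     [{'box': ..., 'score': float}] from the ego vehicle.
    remote_reports: {sender_id: [det, ...]} received over V2V.
    trust:          {sender_id: float in [0, 1]}, updated in place.
    match_fn(a, b): True if two detections refer to the same object (e.g., by IoU).
    """
    fused = list(local_dets)
    for sender, dets in remote_reports.items():
        t = trust.get(sender, 0.5)
        for det in dets:
            matched = next((d for d in fused if match_fn(d, det)), None)
            if matched:
                # Agreement with local perception: boost the fused score
                # and reward the sender with a small trust increase.
                matched['score'] = max(matched['score'], t * det['score'])
                trust[sender] = min(1.0, t + 0.01)
            elif t * det['score'] >= accept_thresh:
                # A trusted sender reports an object the ego cannot see
                # (e.g., occluded): accept it, discounted by trust.
                fused.append({**det, 'score': t * det['score']})
    return fused
```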