Deep Learning for Peptide Feature Detection from Liquid Chromatography - Mass Spectrometry Data
Date
2022-04-26
Authors
Zohora, Fatema
Advisor
Li, Ming
Publisher
University of Waterloo
Abstract
Proteins are the main workhorses of biological functions and activities, such as catalyzing metabolic reactions, replicating DNA, and providing structure to cells and organisms. Comparative analysis of protein samples from a healthy person and a disease-afflicted person can reveal disease biomarkers, which can be diagnostic or prognostic of the respective disease. Liquid chromatography with tandem mass spectrometry (LC-MS/MS) is the cutting-edge technology for protein identification and quantification. In this thesis, we target the first step in LC-MS/MS analysis: peptide feature detection from the LC-MS map, which is promising for disease biomarker discovery and protein quantification. An LC-MS map is usually a three-dimensional plot in which peptide features form multi-isotopic patterns. Each map may contain hundreds of thousands of peptide features, which frequently overlap, are tiny with respect to the background, and are often blended with feature-like noisy signals. All of these characteristics make peptide feature detection very challenging. Deep learning, however, has produced groundbreaking results in various pattern recognition contexts. Therefore, in this thesis, we investigate deep learning models to address the peptide feature detection problem.
Existing tools for peptide feature detection are designed with domain-specific parameters whose settings can produce very different outcomes, which makes the tools prone to human error. Moreover, they are rarely updated despite the vast amount of newly generated proteomics data. As a solution, we develop, for the first time, a foundation for applying deep learning to automate peptide feature detection. The main strength of our approach is that it provides higher sensitivity than existing tools by learning the necessary parameters through training on an appropriate dataset, and newly available information can be easily integrated by fine-tuning the model. We first propose DeepIso, which combines a convolutional neural network (CNN) and a recurrent neural network (RNN) and provides higher sensitivity for peptide feature detection than other existing models. Then we offer PointIso, a deep learning model based on point clouds (sets of data points in space) with attention-based segmentation, which is three times faster than DeepIso and also improves feature detection. PointIso's sensitivity for detecting identified spiked peptides on a benchmark dataset is about 98%, which is 5% higher than other existing models. Then we perform a quality assessment of the peptide features generated by PointIso, showing its potential for biomarker discovery. We also apply PointIso to relative peptide abundance calculation among multiple samples, demonstrating its utility in label-free quantification. Finally, we adapt our 3D PointIso model to handle 4D data, achieving 4-6% higher sensitivity than other algorithms on the human proteome dataset. This shows that our model is transferable to various contexts. We believe our research makes a notable contribution to accelerating the progress of deep learning in proteomics, as well as in general pattern recognition research.
Keywords
proteomics, peptide feature, peptide feature detection, PointNet, convolutional neural network, attention in deep learning, segmentation, recurrent neural network, label-free quantification, 4D peptide feature, 3D peptide feature, TimsTOF data, Orbitrap data, Liquid Chromatography Mass Spectrometry, LC-MS peptide feature, LC-MS data, MS1 data