Deep Learning for Peptide Feature Detection from Liquid Chromatography - Mass Spectrometry Data

dc.contributor.author: Zohora, Fatema
dc.date.accessioned: 2022-04-26T22:08:05Z
dc.date.available: 2022-04-26T22:08:05Z
dc.date.issued: 2022-04-26
dc.date.submitted: 2022-04-22
dc.description.abstract: Proteins are the main workhorses of biological functions and activities, such as catalyzing metabolic reactions, replicating DNA, and providing structure to cells and organisms. Comparative analysis of protein samples from a healthy person and a disease-afflicted person can reveal disease biomarkers, which can be diagnostic or prognostic of the respective disease. Liquid chromatography with tandem mass spectrometry (LC-MS/MS) is the cutting-edge technology for protein identification and quantification. In this thesis, we target the first step in LC-MS/MS analysis: peptide feature detection from an LC-MS map, which is promising for disease biomarker discovery and protein quantification. An LC-MS map is usually a three-dimensional plot in which peptide features form multi-isotopic patterns. Each map may contain hundreds of thousands of peptide features, which frequently overlap, are tiny relative to the background, and are often blended with feature-like noisy signals. All of these characteristics make peptide feature detection very challenging. Deep learning, however, is producing groundbreaking results in various pattern recognition contexts. Therefore, in this thesis, we investigate deep learning models to address the peptide feature detection problem. Existing tools for peptide feature detection are designed with domain-specific parameters whose settings lead to very different outcomes and are thus prone to human error. Moreover, they are rarely updated despite the vast amount of newly arriving proteomics data. As a solution, we develop, for the first time, a foundation for applying deep learning to automate peptide feature detection. The main strength of our approach is that it provides higher sensitivity than existing tools by learning the necessary parameters through training on an appropriate dataset, and newly available information can be easily integrated by fine-tuning the model.
We first propose DeepIso, which combines a convolutional neural network (CNN) and a recurrent neural network (RNN) and provides higher sensitivity for peptide feature detection than other existing models. We then offer PointIso, a deep learning model based on point clouds (sets of data points in space) with attention-based segmentation, which is three times faster than DeepIso and also improves feature detection. PointIso's sensitivity for detecting identified spiked peptides on a benchmark dataset is about 98%, which is 5% higher than other existing models. We then perform a quality assessment of the peptide features generated by PointIso, showing its potential for biomarker discovery. We also apply PointIso to relative peptide abundance calculation among multiple samples, demonstrating its utility in label-free quantification. Finally, we adapt our 3D PointIso model to handle 4D data, achieving 4-6% higher sensitivity than other algorithms on the human proteome dataset. Our model is therefore transferable to various contexts. We believe our research makes a notable contribution to accelerating the progress of deep learning in proteomics, as well as in general pattern recognition research. [en]
dc.identifier.uri: http://hdl.handle.net/10012/18181
dc.language.iso: en [en]
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.relation.uri: https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=5a6883e2807b473aa45d5e7e2910a10d [en]
dc.relation.uri: http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD001091 [en]
dc.relation.uri: https://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD010012 [en]
dc.relation.uri: https://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD012431 [en]
dc.subject: proteomics [en]
dc.subject: peptide feature [en]
dc.subject: peptide feature detection [en]
dc.subject: PointNet [en]
dc.subject: convolutional neural network [en]
dc.subject: attention in deep learning [en]
dc.subject: segmentation [en]
dc.subject: recurrent neural network [en]
dc.subject: label-free quantification [en]
dc.subject: 4D peptide feature [en]
dc.subject: 3D peptide feature [en]
dc.subject: TimsTOF data [en]
dc.subject: Orbitrap data [en]
dc.subject: Liquid Chromatography Mass Spectrometry [en]
dc.subject: LC-MS peptide feature [en]
dc.subject: LC-MS data [en]
dc.subject: MS1 data [en]
dc.title: Deep Learning for Peptide Feature Detection from Liquid Chromatography - Mass Spectrometry Data [en]
dc.type: Doctoral Thesis [en]
uws-etd.degree: Doctor of Philosophy [en]
uws-etd.degree.department: David R. Cheriton School of Computer Science [en]
uws-etd.degree.discipline: Computer Science (Quantum Information) [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.embargo.terms: 0 [en]
uws.contributor.advisor: Li, Ming
uws.contributor.affiliation1: Faculty of Mathematics [en]
uws.peerReviewStatus: Unreviewed [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]

Files

Original bundle

Name: Zohora_Fatema.pdf
Size: 9.11 MB
Format: Adobe Portable Document Format
Description: Main Article

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission