Evaluating Synthetic Data as a Proxy for Real Clinical Data in Machine Learning Models: A Comparative Study on Postpartum Hemorrhage Prediction

dc.contributor.author: Sharma, Kam
dc.date.accessioned: 2024-08-30T16:10:04Z
dc.date.available: 2024-08-30T16:10:04Z
dc.date.issued: 2024-08-30
dc.date.submitted: 2024-08-14
dc.description.abstract: Introduction: This thesis investigates the use of synthetic data as a proxy for real clinical data in predictive modeling, focusing on postpartum hemorrhage (PPH). Synthetic data addresses privacy concerns by mimicking real patient data without exposing patient information. The goal is to develop and validate predictive models for PPH on synthetic data and compare them with the same models run on real data, thereby assessing the feasibility and effectiveness of synthetic data in clinical settings. Methods: Synthetic data was generated using Generative Adversarial Networks (GANs) from MDClone to replicate the statistical properties of real clinical data from Ottawa Hospital. The data was cleaned and prepared, and features were then selected. Machine learning and statistical models, including logistic regression, decision trees, random forests, and support vector machines, were developed and trained on the synthetic data, and the same pipeline was then run on the real data at Ottawa Hospital. Model performance was evaluated using precision, recall, F1-score, accuracy, and the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve. Results: The synthetic data closely mirrored the real data in statistical properties, with low Hellinger distances for most variables. Machine learning models trained on synthetic data demonstrated high performance, comparable to that of models trained on real data. Key predictors of PPH were identified, including the administration of certain medications and clinical parameters. The comparative analysis showed minimal discrepancies between model outputs from synthetic and real data, supporting the use of synthetic data for predictive modeling. Discussion: The findings indicate that synthetic data can be used effectively to develop predictive models for PPH while addressing data accessibility. The study highlights the potential of synthetic data to enhance predictive modeling in healthcare, providing a viable alternative to real data without compromising accuracy. Integrating synthetic data into clinical research can broaden data availability, fostering innovation while adhering to privacy regulations. Conclusion: This research demonstrates the viability of synthetic data in predictive modeling for PPH, with models trained on synthetic data showing high performance comparable to those trained on real data. The study contributes to the theoretical understanding of synthetic data utility and offers practical implications for improving patient outcomes and optimizing healthcare resources. Future research should focus on expanding the use of synthetic data to other clinical areas and further validating its effectiveness in diverse healthcare settings.
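The abstract reports fidelity in terms of Hellinger distance between real and synthetic variables. As a rough illustration only, the sketch below computes that distance for a single categorical variable. It is not the thesis's code: the function name `hellinger_distance` and the sample values are hypothetical, and the thesis may have handled continuous variables, binning, or missing values differently.

```python
import math
from collections import Counter


def hellinger_distance(real_values, synthetic_values):
    """Hellinger distance between the empirical distributions of two
    categorical samples. Returns a value in [0, 1]: 0 means the two
    distributions are identical, 1 means they share no categories."""
    if not real_values or not synthetic_values:
        raise ValueError("both samples must be non-empty")

    real_counts = Counter(real_values)
    synth_counts = Counter(synthetic_values)
    n_real = len(real_values)
    n_synth = len(synthetic_values)

    # Sum (sqrt(p_i) - sqrt(q_i))^2 over every category seen in either sample.
    squared_sum = 0.0
    for category in set(real_counts) | set(synth_counts):
        p = real_counts[category] / n_real
        q = synth_counts[category] / n_synth
        squared_sum += (math.sqrt(p) - math.sqrt(q)) ** 2

    return math.sqrt(squared_sum / 2.0)


# Made-up values for illustration; these are not data from the thesis.
real_sample = ["vaginal", "vaginal", "cesarean", "vaginal"]
synthetic_sample = ["vaginal", "cesarean", "cesarean", "vaginal"]
distance = hellinger_distance(real_sample, synthetic_sample)
```

A distance near 0 for a variable suggests the synthetic generator preserved that variable's marginal distribution. It says nothing about relationships between variables, which is why model-level comparisons such as those described in the abstract are a separate check.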
dc.identifier.uri: https://hdl.handle.net/10012/20927
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.subject: Synthetic Data
dc.subject: GANS
dc.subject: Artificial Intelligence
dc.subject: AI
dc.subject: Machine Learning
dc.subject: Postpartum Haemorrhage
dc.subject: PPH
dc.subject: Synthetic Health Data
dc.subject: Synthetic Data utility Assessment
dc.subject: Synthetic Data Fidelity Assessment
dc.subject: Utility
dc.subject: Fidelity
dc.title: Evaluating Synthetic Data as a Proxy for Real Clinical Data in Machine Learning Models: A Comparative Study on Postpartum Hemorrhage Prediction
dc.type: Master Thesis
uws-etd.degree: Master of Public Health and Health Systems
uws-etd.degree.department: School of Public Health Sciences
uws-etd.degree.discipline: Public Health and Health Systems
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.embargo.terms: 0
uws.contributor.advisor: Chen, Helen
uws.contributor.affiliation1: Faculty of Health
uws.peerReviewStatus: Unreviewed [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]

Files

Original bundle

Name: Sharma_Kam.pdf
Size: 1.98 MB
Format: Adobe Portable Document Format
