Scaling Laws for Compute Optimal Biosignal Transformers

Fortin, Thomas

Scaling Laws for Compute Optimal Biosignal Transformers

dc.contributor.advisor	Tripp, Bryan
dc.contributor.author	Fortin, Thomas
dc.date.accessioned	2024-08-20T16:01:16Z
dc.date.available	2024-08-20T16:01:16Z
dc.date.issued	2024-08-20
dc.date.submitted	2024-08-13
dc.description.abstract	Scaling laws which predict the optimal balance between number of model parameters and number of training tokens given a fixed compute budget have recently been developed for language transformers. These allow model developers to allocate their compute budgets such that they can achieve optimal performance. This thesis develops such scaling laws for the Biosignal Transformer trained separately on both accelerometer data and EEG data. This is done by applying methods used by other researchers to develop similar scaling laws for language transformer models. These are referred to as the iso-FLOP curve method and the parametric loss function method. The Biosignal Transformer model is a transformer model which is designed specifically to be trained on tasks that use biosignals such as EEG, ECG, and accelerometer data as input. For example, the Biosignal Transformer can be trained to detect or classify seizures from EEG signals. The Biosignal Transformer is also of particular interest because it is designed to use unsupervised pre-training on large unlabelled biosignal datasets to improve performance on downstream tasks with smaller labelled fine-tuning datasets. This work develops scaling laws which optimize for the best unsupervised pre-training loss given a fixed compute budget. Results show that the developed scaling laws are successful at predicting a balance between number of parameters and number of training tokens for compute budgets five times larger than those used to develop them such that pre-training loss is minimized. Researchers who intend to scale up the Biosignal Transformer should use these scaling laws to attain optimal pre-training loss from their given compute budgets when applying unsupervised pre-training with the Biosignal Transformer.
dc.identifier.uri	https://hdl.handle.net/10012/20825
dc.language.iso	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	biosignal
dc.subject	compute optimal
dc.subject	unsupervised pre-training
dc.subject	scaling law
dc.subject	transformer
dc.title	Scaling Laws for Compute Optimal Biosignal Transformers
dc.type	Master Thesis
uws-etd.degree	Master of Applied Science
uws-etd.degree.department	Systems Design Engineering
uws-etd.degree.discipline	System Design Engineering
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0
uws.contributor.advisor	Tripp, Bryan
uws.contributor.affiliation1	Faculty of Engineering
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Now showing 1 - 1 of 1

Name:: Fortin_Thomas.pdf
Size:: 1.77 MB
Format:: Adobe Portable Document Format

Download

License bundle

Now showing 1 - 1 of 1

Name:: license.txt
Size:: 6.4 KB
Format:: Item-specific license agreed upon to submission
Description:

Download

Collections

Theses
Systems Design Engineering