Scaling Laws for Compute Optimal Biosignal Transformers
Date
2024-08-20
Advisor
Tripp, Bryan
Publisher
University of Waterloo
Abstract
Scaling laws that predict the optimal balance between the number of model parameters
and the number of training tokens under a fixed compute budget have recently been developed
for language transformers. These laws allow model developers to allocate their compute budgets
so as to achieve optimal performance. This thesis develops such scaling laws for
the Biosignal Transformer trained separately on accelerometer data and on EEG data,
applying two methods that other researchers used to develop similar scaling laws
for language transformer models: the iso-FLOP curve method and the parametric loss
function method.
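As a rough illustration of the parametric loss function method, the sketch below fits a Chinchilla-style surface L(N, D) = E + A/N^alpha + B/D^beta to a handful of (parameters, tokens, loss) points and then minimizes the fitted loss along the constraint C ≈ 6ND. The data points, initial guesses, and the 6ND FLOP approximation are illustrative assumptions, not the thesis's actual measurements or fitting procedure.

```python
import numpy as np
from scipy.optimize import curve_fit, minimize_scalar

def parametric_loss(ND, E, A, B, alpha, beta):
    """Chinchilla-style form: L(N, D) = E + A / N**alpha + B / D**beta."""
    N, D = ND
    return E + A / N**alpha + B / D**beta

# Hypothetical pre-training runs: model size N, tokens D, final loss.
N_obs = np.array([1e6, 2e6, 5e6, 1e7, 3e7, 1e8, 3e8])
D_obs = np.array([2e8, 4e8, 1e9, 2e9, 6e9, 2e10, 6e10])
L_obs = np.array([3.12, 2.98, 2.80, 2.66, 2.48, 2.32, 2.20])

# Fit the five free parameters of the loss surface to the observations.
(E, A, B, alpha, beta), _ = curve_fit(
    parametric_loss, (N_obs, D_obs), L_obs,
    p0=(1.5, 400.0, 400.0, 0.3, 0.3), maxfev=20000)

# Under the common approximation C ~= 6*N*D, a budget C ties D to N, so the
# compute-optimal model size minimizes L(N, C / (6*N)) in one variable.
C = 1e20  # target compute budget in FLOPs (illustrative)
res = minimize_scalar(
    lambda logN: parametric_loss(
        (np.exp(logN), C / (6 * np.exp(logN))), E, A, B, alpha, beta),
    bounds=(np.log(1e6), np.log(1e11)), method="bounded")
N_opt = np.exp(res.x)
print(f"compute-optimal N ~ {N_opt:.2e}, D ~ {C / (6 * N_opt):.2e}")
```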
The Biosignal Transformer is a transformer model designed specifically
to be trained on tasks that take biosignals such as EEG, ECG, and accelerometer data as
input. For example, it can be trained to detect or classify seizures
from EEG signals. The Biosignal Transformer is of particular interest because it is
designed to use unsupervised pre-training on large unlabelled biosignal datasets to improve
performance on downstream tasks with smaller labelled fine-tuning datasets. This work
develops scaling laws that minimize unsupervised pre-training loss for a
fixed compute budget.
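The iso-FLOP curve method can be sketched in the same spirit: for each fixed FLOP budget, train a range of model sizes, fit a parabola to loss versus log N, take its vertex as the compute-optimal size at that budget, and fit a power law to the vertices across budgets. The numbers below are made up for illustration; the thesis's budgets, model sizes, and losses will differ.

```python
import numpy as np

# Hypothetical iso-FLOP sweeps: for each budget C (keys, in FLOPs), the
# model sizes trained and the pre-training losses they reached, with the
# token count for each size chosen so that 6*N*D stays close to C.
sweeps = {
    1e17: ([1e6, 3e6, 1e7, 3e7], [3.05, 2.90, 2.95, 3.15]),
    1e18: ([3e6, 1e7, 3e7, 1e8], [2.80, 2.66, 2.70, 2.88]),
    1e19: ([1e7, 3e7, 1e8, 3e8], [2.58, 2.45, 2.49, 2.65]),
}

budgets, N_opts = [], []
for C, (Ns, losses) in sweeps.items():
    # Fit a parabola to loss vs. log(N); its vertex estimates the
    # compute-optimal model size at this budget.
    a, b, _ = np.polyfit(np.log(Ns), losses, deg=2)
    budgets.append(C)
    N_opts.append(np.exp(-b / (2 * a)))

# A power law N_opt = k * C^a is a straight line in log-log space, and
# extrapolating that line predicts optimal sizes at larger budgets.
slope, intercept = np.polyfit(np.log(budgets), np.log(N_opts), deg=1)
print(f"N_opt scales roughly as C^{slope:.2f}")
```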
Results show that the developed scaling laws successfully predict a balance
between the number of parameters and the number of training tokens that minimizes
pre-training loss, even at compute budgets five times larger than those used to fit
the laws. Researchers who intend to scale up the Biosignal Transformer should use
these scaling laws to attain the best pre-training loss from their compute budgets
when applying unsupervised pre-training.
Keywords
biosignal, compute optimal, unsupervised pre-training, scaling law, transformer