Self-supervised representation learning (SSL) powers modern AI systems and enables the learning of information-rich data representations from sparsely labelled datasets. Medical and consumer health applications benefit greatly from SSL: for example, wearable health datasets are typically collected at high sample rates (e.g., >4Hz), while annotations (e.g., self-reported surveys or clinician assessments) are much less frequent. SSL lets us make use of all of the high-frequency unlabelled wearable data when training a machine learning model, before fine-tuning it with the small set of labels available for the task of interest.
However, training a good model with a self-supervised learning objective is not trivial. Objective functions of SSL models are often biased toward learning a subset of features. If these features are irrelevant to (or, worse, spuriously correlated with) the task of interest, the model we train will lack accuracy and robustness when deployed.
When working with longitudinal health data (e.g., wearable sensor or speech recordings), this is a significant challenge. These data contain a lot of heterogeneity that is irrelevant to the prediction task at hand. This heterogeneity stems from large shifts in the distribution of the signal between individuals (e.g., due to differences in skin properties or in the anatomy of the vocal organs). Without adjusting SSL training recipes, these large but irrelevant differences may bias the model toward features that are not helpful for the task of interest, thus hurting performance.
Motivated by this challenge, we are studying how to adapt SSL methods such as contrastive learning to be more robust to irrelevant distribution shifts in the training data, and instead learn features that are helpful for the prediction tasks of interest.
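As a concrete reference point for what "contrastive learning" optimises, the sketch below implements an InfoNCE-style objective in NumPy. This is an illustrative, minimal version (the function name and shapes are our own choices, not from any specific library): each row of the two input matrices holds the embeddings of two views of the same example, which form a positive pair, and all other rows in the batch act as negatives.

```python
# Minimal sketch of an InfoNCE-style contrastive objective (illustrative, not
# any particular library's implementation).
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """Contrastive loss over a batch of paired embeddings.

    z_a, z_b: (batch, dim) arrays; row i of z_a and row i of z_b are two
    views of the same underlying example (the positive pair). Every other
    row in the batch serves as a negative.
    """
    # L2-normalise so the dot product is cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (batch, batch) similarity matrix
    # Cross-entropy against the diagonal: pull matched pairs together,
    # push mismatched pairs apart.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(z_a))
    return -log_probs[idx, idx].mean()
```

Note that the objective itself is agnostic to *how* pairs are sampled, and that sampling strategy is one natural place where robustness to irrelevant distribution shift can enter: for instance, one possible adjustment is to draw negatives from the same individual, so that between-subject differences no longer help the model solve the contrastive task.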