Corinna Leppin

Biography

Corinna is a PhD student at University College London (UCL), working in the UCL Tobacco and Alcohol Research Group. Her research interests include behavioural epidemiology, harm reduction, digital health, and intervention development. For her PhD project, she uses mixed methods to develop and evaluate a Just-In-Time Adaptive Intervention (JITAI) that aims to prevent lapses and support cessation among smokers trying to quit. Prior to pursuing a PhD, she completed a BA in Psychological and Behavioural Sciences and an MSc in Health Psychology.

Abstract

Impact of sampling frequency, predictor count, and participant-specific data on predicting cigarette cravings and lapses in ecological momentary assessment data using random forests

This study aims to optimise ecological momentary assessment (EMA) sampling frequency, predictor selection, and training data requirements to balance user burden against the performance of supervised machine learning algorithms that predict high-risk moments for a smoking cessation just-in-time adaptive intervention (JITAI). It uses data from 37 smokers who completed 16 EMAs/day during the first 10 days of their quit attempt, reporting their mood, context, and behaviour. Random forest algorithms were used to predict lapses and cravings. Performance was evaluated using the median F1 score, examining the effects of varying the number of EMAs per day (16, 6, 5, 4, 3), the number of predictors (via recursive feature elimination with cross-validation [RFE-CV]), and the proportion of the test participant’s own data in the training set (none, 10%, 20%, 30%). Median F1 scores across out-of-sample individuals varied widely, though several trends emerged. Performance declined modestly with fewer prompts (e.g. 16 EMAs: 0.843, 5 EMAs: 0.791, 3 EMAs: 0.729). Feature reduction had a small effect (all predictors: 0.776, RFE-CV: 0.761). Surprisingly, algorithms using none of the test participant’s own data in the training set performed best (none: 0.795, 10%: 0.765, 20%: 0.736, 30%: 0.763). Although reducing the sampling frequency and the number of predictors lowers the performance of algorithms predicting high-risk moments, the reduction is modest and may therefore be a reasonable trade-off against user burden when implementing a smoking cessation JITAI. Omitting user-specific training data does not decrease performance, so it could ease implementation and reduce user burden. Variable performance across individuals may limit the scalability of the algorithms.
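
As an illustration of the evaluation metric, the sketch below computes an F1 score for each held-out participant and then takes the median across participants. It is a minimal example under stated assumptions, not the study's code: the function names, the participant IDs, and the toy labels are all hypothetical, and the predictions are hard-coded rather than produced by a random forest.

```python
from statistics import median


def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = high-risk moment, e.g. a lapse or craving)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive predictions: F1 is defined here as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def median_f1(per_participant):
    """per_participant maps a participant ID to a (y_true, y_pred) pair."""
    scores = {pid: f1_score(t, p) for pid, (t, p) in per_participant.items()}
    return median(scores.values()), scores


# Hypothetical toy data: observed vs. predicted labels for three participants.
toy = {
    "p01": ([1, 0, 1, 1, 0], [1, 0, 1, 0, 0]),
    "p02": ([0, 0, 1, 0, 1], [0, 1, 1, 0, 1]),
    "p03": ([1, 1, 0, 0, 0], [0, 0, 0, 1, 0]),
}

overall, by_person = median_f1(toy)
print(f"median F1: {overall:.3f}")
for pid, score in by_person.items():
    print(f"{pid}: {score:.3f}")
```

Summarising with the median rather than the mean keeps a few participants with very poor (or very good) predictions from dominating the headline figure, which matters when scores vary as widely across individuals as the abstract reports.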