Bioinformatics, Computational and Systems Biology
Jennifer Ferina (she/her/hers)
Graduate Student
Rensselaer Polytechnic Institute
Troy, New York, United States
Melanie Kruger
Undergraduate Research Assistant
Rensselaer Polytechnic Institute, United States
Uwe Kruger
Professor of Practice
Rensselaer Polytechnic Institute, United States
Daniel Ryan
Internal Strategic Goal Manager
The Center for Discovery, United States
Conor Anderson
Director of Bioinformatics
The Center for Discovery, United States
Jenny Foster
Director of Adult Psychology, Behavior Specialist
The Center for Discovery, United States
Theresa Hamlin
President
The Center for Discovery, United States
Juergen Hahn
Professor and Department Head
Rensselaer Polytechnic Institute, United States
Data were obtained from The Center for Discovery (TCFD), a nonprofit provider of educational, health, clinical, and residential services for children and adults with autism spectrum disorder (ASD) and other complex disabilities. The data comprised daily records of sleep, gastrointestinal (GI), and behavior information collected over extended periods. Weather, moon, and allergen data were obtained from the National Oceanic and Atmospheric Administration [6,7], MoonCalc.org [8], and the Armonk, NY station of the American Academy of Allergy, Asthma & Immunology [9,10], respectively. To be included, an individual had to live at TCFD, be under 19 years of age at the start of the study, have an ASD diagnosis, exhibit challenging behaviors on between 10% and 90% of days with sufficient variation, and have a minimum of 20 observations. Up to 18 months [5] of data were selected per individual, starting either in July 2015 or on the first date, at least one month after admission, on which all data types were being collected, whichever was later. Adaptive linear neuron (ADALINE) [11] models based on direct kernel transformations and logistic activation functions were used to predict each individual's behavior. Three behavior cohorts were included: aggression (AGG), self-injurious behavior (SIB), and either AGG or SIB (BOTH). Each model was trained on 85% of the data and tested on the remaining 15%, and both the train/test split and the model weights were randomized across 30 random seeds. To determine feature contributions, a sensitivity analysis was performed for each individual, measuring how much the model's output changed when each variable was randomly perturbed.
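The modeling pipeline above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the kernel, its width, the training rule, or how perturbations were drawn, so the sketch assumes a Gaussian (RBF) kernel with sigma = 1, batch gradient descent on the cross-entropy loss through the logistic activation, z-scored features, and additive Gaussian noise for the sensitivity analysis. The names DirectKernelAdaline, run_seeds, and sensitivity are hypothetical, and the usage lines run on synthetic data, not TCFD data.

```python
import numpy as np

# Sketch only: kernel type, sigma, learning rate, epochs, and the noise-based
# perturbation are assumptions, not the settings used in the study.


def rbf_kernel(A, B, sigma):
    # Gaussian kernel: K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2)).
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class DirectKernelAdaline:
    """Single linear unit whose inputs are kernel similarities to the training days."""

    def __init__(self, sigma=1.0, lr=0.2, epochs=500, seed=0):
        self.sigma, self.lr, self.epochs = sigma, lr, epochs
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.X_train = X
        K = rbf_kernel(X, X, self.sigma)  # direct kernel transformation of the inputs
        self.w = self.rng.normal(scale=0.01, size=K.shape[1])  # random initial weights
        self.b = 0.0
        for _ in range(self.epochs):
            p = sigmoid(K @ self.w + self.b)  # logistic activation
            err = p - y  # per-day cross-entropy gradient w.r.t. the pre-activation
            self.w -= self.lr * K.T @ err / len(y)
            self.b -= self.lr * err.mean()
        return self

    def predict_proba(self, X):
        # A new day is represented by its kernel similarities to the training days.
        return sigmoid(rbf_kernel(X, self.X_train, self.sigma) @ self.w + self.b)


def sensitivity(model, X, rng, noise=1.0):
    # Mean absolute change in predicted probability when one (z-scored)
    # feature is perturbed with Gaussian noise while the others are held fixed.
    base = model.predict_proba(X)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] += rng.normal(scale=noise, size=len(X))
        scores[j] = np.abs(model.predict_proba(Xp) - base).mean()
    return scores


def run_seeds(X, y, n_seeds=30, test_frac=0.15):
    # Per seed: random 85/15 train/test split and random weight initialization.
    # Returns (test labels, test probabilities, feature sensitivities) per seed.
    results = []
    for seed in range(n_seeds):
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(y))
        n_test = int(round(test_frac * len(y)))
        te, tr = idx[:n_test], idx[n_test:]
        mu, sd = X[tr].mean(0), X[tr].std(0) + 1e-12  # z-score with training statistics
        Xtr, Xte = (X[tr] - mu) / sd, (X[te] - mu) / sd
        model = DirectKernelAdaline(seed=seed).fit(Xtr, y[tr])
        results.append((y[te], model.predict_proba(Xte), sensitivity(model, Xte, rng)))
    return results


# Usage on synthetic data (not TCFD data): the label depends only on feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(float)
results = run_seeds(X, y, n_seeds=3)
mean_sens = np.mean([s for _, _, s in results], axis=0)  # per-feature sensitivity
```

In this direct kernel formulation the kernel matrix itself acts as the input layer, so the ADALINE carries one weight per training day; averaging the sensitivity scores over seeds gives one contribution estimate per variable for each individual.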
For individuals whose behaviors occurred on 40% to 70% of days, the direct kernel classification models predicted behavior with an average balanced accuracy of 71.3%, with a maximum average accuracy of 85.0% in the BOTH cohort. When the more extremely imbalanced cases were included, the overall average accuracy was 67.7%. All accuracies were computed on the held-out (15%) test sets. All individuals in this cohort had at least one important feature in each of the GI, moon, and weather feature categories, and most individuals also had at least one important feature in the allergen and sleep categories. In summary, many variables affect prediction accuracy, which is consistent with the view that the heterogeneous presentation of ASD may have a variety of potential causes. Notably, however, sleep variables contributed less than some other variables. This dataset contained fewer sleep features than those of other studies, so sleep variables not captured here may still be important for behavior prediction. Future directions arising from this work include, but are not limited to, investigating the effects of diet and medication and collecting a broader range of sleep-related variables.
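Balanced accuracy averages sensitivity (recall on behavior days) and specificity (recall on behavior-free days), which is why it is the fairer metric for the more imbalanced cases. The sketch below uses hypothetical function names and a made-up example rather than study data: for an individual with challenging behavior on 10% of days, a trivial classifier that always predicts "no behavior" reaches 90% plain accuracy but only 50% balanced accuracy.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    # Mean of sensitivity (recall on behavior days) and specificity
    # (recall on behavior-free days). Undefined (nan) if a test set
    # contains only one class.
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tpr = (y_pred & y_true).sum() / y_true.sum()
    tnr = (~y_pred & ~y_true).sum() / (~y_true).sum()
    return 0.5 * (tpr + tnr)


def mean_balanced_accuracy(per_seed):
    # Average over (y_true, y_pred) pairs, one pair per random seed.
    return float(np.mean([balanced_accuracy(t, p) for t, p in per_seed]))


# Hypothetical individual with challenging behavior on 10 of 100 days.
y_true = np.array([1] * 10 + [0] * 90)
always_no = np.zeros_like(y_true)  # trivial classifier: "no behavior" every day
print((always_no == y_true).mean())          # plain accuracy: 0.9
print(balanced_accuracy(y_true, always_no))  # balanced accuracy: 0.5
```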