Longitudinal trajectories of brain age in young individuals at familial risk of mood disorder from the Scottish Bipolar Family Study

Background: Within young individuals, mood disorder onset may be related to changes in the trajectory of brain structure development. To date, however, longitudinal prospective studies remain scarce and show partly contradictory findings, with a lack of emphasis on changes at the level of global brain patterns. Cross-sectional adult studies have applied such methods and show that mood disorders are associated with accelerated brain aging. Currently, it remains unclear whether young individuals show differential brain structure aging trajectories associated with onset of mood disorder and/or presence of familial risk.
Methods: Participants included young individuals (15-30 years, 53%F) from the prospective longitudinal Scottish Bipolar Family Study with and without close family history of mood disorder. All were well at time of recruitment. Implementing a structural MRI-based brain age prediction model, we globally assessed individual trajectories of age-related structural change using the difference between predicted brain age and chronological age (brain-predicted age difference (brain-PAD)) at baseline and at 2-year follow-up. Based on follow-up clinical assessment, individuals were categorised into three groups: (i) controls who remained well (C-well, n = 93), (ii) high familial risk who remained well (HR-well, n = 74) and (iii) high familial risk who developed a mood disorder (HR-MD, n = 35).
Results: At baseline, brain-PAD was comparable between groups. Results showed statistically significant negative trajectories of brain-PAD between baseline and follow-up for HR-MD versus C-well (β = -0.60, p corrected < 0.001) and versus HR-well (β = -0.36, p corrected = 0.02), with a potential intermediate trajectory for HR-well (β = -0.24, p corrected = 0.06).
Conclusions: These preliminary findings suggest that within young individuals, onset of mood disorder and familial risk may be associated with a deceleration in brain structure aging trajectories. Extended longitudinal research will need to corroborate findings of emerging maturational lags in relation to mood disorder risk and onset.


Introduction
Mood disorders are amongst the most common psychiatric disorders, with a life-time prevalence of around 15% (Kessler & Bromet, 2013). Globally, they are the greatest contributor to non-fatal ill-health (World Health Organization, 2017). However, underlying biological mechanisms remain unclear. It is known that mood disorders are highly heritable and share complex genetic architecture; individuals with a family history of Bipolar Disorder (BD) have >10-fold increased risk of developing BD or Major Depressive Disorder (MDD) (Smoller & Finn, 2003). Mood disorders often manifest during adolescence and young adulthood (de Girolamo et al., 2012). During these life stages, age-related changes in brain structure contribute to cognitive development but also increase vulnerability to mental illness, including mood disorders (Andersen, 2003;Dahl, 2004).
From adolescence onward, decreases in brain grey matter and fine-tuning/stabilisation of synapses parallel changes in cognition and affect regulation (Giorgio et al., 2010;Spear, 2000). For higher-order cortical areas these structural trajectories extend into young adulthood (Gogtay et al., 2004;Wierenga et al., 2014;Wierenga et al., 2016). Previous prospective longitudinal studies including young individuals have shown inconsistent findings with regard to brain structure changes and mood disorder onset (Bos et al., 2018;Ducharme et al., 2014;Papmeyer et al., 2015a;Papmeyer et al., 2016;Whittle et al., 2014). The most consistent results suggest that the frontal cortex (Ducharme et al., 2014;Papmeyer et al., 2015a) and subcortical volumes (Whittle et al., 2014) show decelerated brain structure aging trajectories. Theoretically, decelerated trajectories may contribute to vulnerability to mood disorders, particularly when frontal-limbic brain systems of cognitive control and emotional stability are affected. Previous studies in this field mostly focused on specific regions of interest, so that spatially unbiased comprehensive approaches investigating global patterns are currently lacking. We were therefore interested in determining from longitudinal prospective data of young individuals, whether a comprehensive and spatially unconstrained measure of brain structure aging trajectory across the brain was related to concurrent mood disorder onset and/or to familial risk.
A new framework that allows for global assessment of age-related patterns of structural change in the brain involves the estimation of an individual's "biological brain age" from an MRI scan. Subsequent comparison with chronological age provides the brain-predicted age difference (brain-PAD) as a cross-sectional measure of brain aging. Conceptually, when brain aging trajectories that shape cognition and behaviour show individually different temporal dynamics, brain-PAD is expected to relate to relevant outcomes. Indeed, previous cross-sectional research with adults in their mid-later life showed that older appearing brains were associated with age-related diseases and mental illness (for overview see Cole et al., 2019), including mood disorders (Han et al., 2019;Koutsouleris et al., 2014;but no effect in Nenadić et al., 2017), and were furthermore predictive of mortality (Cole et al., 2018). Interestingly, accelerated brain aging in mood disorders is in accordance with accelerated biological aging (Rizzo et al., 2014;Sibille, 2013;Wolkowitz et al., 2011) as well as increased risk of age-related disease and mortality (e.g., Mezuk et al., 2008;Osby et al., 2001;Pan et al., 2011).
Within longitudinal designs including younger individuals, brain-PAD has potential to show the temporal origin of accelerated brain aging trajectories that have been observed in adult samples. Importantly, the brain-PAD approach has previously been applied and validated within samples of children and adolescents (Franke et al., 2012). Furthermore, a previous cross-sectional study did not find differences in brain-PAD between young adults at high familial risk with mood disorder diagnosis, those at high familial risk who were well, and control subjects (Hajek et al., 2019). To our knowledge, however, the current study is the first longitudinal study to apply brain-PAD methods within a sample of young individuals to investigate associations between mood disorder risk and onset, and age-related changes in brain structure.
Specifically, the current study investigated divergence from normative brain structure aging trajectories in young individuals by applying the brain-PAD framework within a prospective longitudinal design, starting before mood disorder onset. We used data from the Scottish Bipolar Family Study (SBFS), which included young individuals who were all initially well and some of whom had a close family history of BD. Within this cohort, our group previously identified differences in cortical thickness trajectories associated with high risk and mood disorder onset, including increased thickness of the left inferior frontal gyrus and left precentral gyrus in those at high risk who subsequently developed mood disorder versus cortical thickness reductions in those who remained well (Papmeyer et al., 2015a). By contrast, no subcortical volume markers of risk and illness were found (Papmeyer et al., 2016). Investigation of white matter structure at baseline furthermore revealed reduced white matter integrity associated with familial risk (Sprooten et al., 2011), and follow-up data suggested that this finding was related to sub-clinical symptoms rather than predictive of clinical outcome (Ganzola et al., 2018). The current study builds on previous research within the SBFS cohort, which identified differences in specific grey matter regions and white matter abnormalities, by investigating global trajectories of grey matter structure associated with familial risk and onset of mood disorder.
Recognising similarities between BD and MDD in symptomatology and genetic architecture, as well as the difficulty of defining a definitive stable diagnosis at young age, early-onset mood disorder was defined as having an onset of MDD or BD during adolescence or young adulthood. The longitudinal character of the study enabled the investigation of brain-PAD over two years, to assess differential brain structure aging trajectories for those who were at high risk for mood disorder and/or subsequently developed illness.
Based on previous research, we predicted that mood disorder onset in youth would be associated with differential trajectories of brain structural change, without a specific hypothesis relating to the direction of this effect. Previous results from longitudinal developmental studies are inconclusive, but show weak evidence of decelerated trajectories, which also corresponds to theoretical developmental perspectives. Conversely, early adulthood may represent the temporal origin of the accelerated brain aging observed in older adults, in which case we would expect an effect of mood disorder in this direction instead. We also hypothesised that the presence of familial risk would be associated with differences in brain-PAD trajectory.

Participants
Participants were adolescents and young adults (N = 283, age 15-30 years) recruited as part of the Scottish Bipolar Family Study (SBFS) (Chan et al., 2016;Ganzola et al., 2018;Sprooten et al., 2011;Whalley et al., 2015). Participants at high familial risk of mood disorder (HR-participants) had at least one first-degree relative or two second-degree relatives with BD type-I, and were thus at increased risk of developing a mood disorder (i.e., over 10-fold increased risk for both BD and MDD) (Smoller & Finn, 2003). Unrelated control participants without family history of BD or other mood disorder were recruited from the social networks of HR-participants, and were matched to the HR-group by age and sex. Details of familial structure within the groups are described in the Extended data, which are available online (de Nooij, 2020). Exclusion criteria ensured that, at the time of recruitment, all participants had no personal history of MDD, mania or hypomania, psychosis, or any other major neurological or psychiatric disorder, substance dependence, learning disability, or head injury that included loss of consciousness, and that they were without contraindications to MRI. Therefore, all individuals (HR and control) were considered well at the baseline imaging assessment.
The following additional exclusion criteria were applied in the context of the current study: (i) missing MRI or age data (n = 40), (ii) scans of insufficient image or segmentation quality (n = 15), (iii) unclear or other psychiatric diagnosis without mood disorder (n = 5) (see Extended data (de Nooij, 2020)), and (iv) high familial risk for mood disorder without follow-up measurement (n = 9). These criteria excluded 69 participants, reducing the sample size to a total of 214 participants at timepoint 1 (108 HR-participants), with follow-up timepoint 2 data available for 133 of these participants (78 HR-participants).

Procedure
Participants of the SBFS were invited every two years for a total of four assessments over six years (Whalley et al., 2015). Participants were interviewed and screened with the Structured Clinical Interview for DSM-IV Axis-I Disorders (SCID) (First et al., 2002) by two trained psychiatrists at timepoint 1 to ensure that they were all initially well, and at timepoint 2 to determine the presence of any mood disorder meeting diagnostic criteria since the previous assessment. Timepoint 2 clinical information was available for 93% of the included control participants, and for all included HR-participants. Of note, control participants with missing clinical information at timepoint 2 (7%) also had missing timepoint 2 MRI data, but their baseline data was retained to increase the training sample size and contributed to statistical modelling of the mean brain-PAD at baseline. Using the same procedure as previous studies on this cohort (Chan et al., 2016;Whalley et al., 2015), participants were categorised as well or diagnosed with mood disorder according to available clinical information. Individuals with well outcomes at the earlier two assessments were assumed to have remained well in the absence of further clinical information to the contrary at timepoint 3 (see Appendix A.3, Table S1 in Extended data (de Nooij, 2020)). Additionally, however, if individuals were subsequently found to have been diagnosed with mood disorder at further assessments (n = 13), they were then categorised in the mood disorder group. Including these participants in the mood disorder group enables the investigation of early disease mechanisms, while keeping the well-groups as pure as possible. Group categorisation resulted in the following groups: control participants who remained well (C-well, n = 93), HR-participants who remained well (HR-well, n = 74), and HR-participants who developed a mood disorder (HR-MD, n = 35, including 6 BD). 
Thus, for the control group, individuals were included if they remained free of any psychiatric diagnosis throughout the period of the study (via assessments and/or GP records). Those who did become unwell formed a small group (n = 12, including 2 BD) and are not included in the current analysis. This approach was used to reduce heterogeneity in the control group. For the high-risk group, none met clinical criteria for any mood disorder at baseline. If at any assessment they met criteria for a mood disorder (at the time of assessment or over the intervening period since the earlier assessment), they were assigned to the HR-MD group. Thus, once a participant had received a diagnosis, they remained in the mood disorder group even if they were not actively symptomatic at the time of a later assessment. This approach was based on the premise that once an individual had met diagnostic criteria at any time over the course of the study, they were no longer 'high risk' for mood disorder, but actually a 'case'. The sample for our main analysis therefore consisted of 202 participants at baseline and 124 participants at follow-up.
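The sample flow described above can be checked with simple arithmetic. The following is a minimal sketch that only uses the counts reported in the Participants and Procedure sections; the variable names are illustrative and not taken from the study code.

```python
# Counts as reported in the Participants and Procedure sections.
recruited = 283
exclusions = {
    "missing MRI or age data": 40,
    "insufficient image or segmentation quality": 15,
    "unclear or other psychiatric diagnosis without mood disorder": 5,
    "high familial risk without follow-up measurement": 9,
}
excluded = sum(exclusions.values())
included_timepoint1 = recruited - excluded

# Controls who became unwell are not part of the main analysis.
controls_who_became_unwell = 12
main_analysis_baseline = included_timepoint1 - controls_who_became_unwell

# Group sizes of the main analysis.
groups = {"C-well": 93, "HR-well": 74, "HR-MD": 35}

print(excluded, included_timepoint1, main_analysis_baseline, sum(groups.values()))
```

The three group sizes sum to the reported baseline sample of the main analysis.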
The National Adult Reading Test (NART) (Nelson & Willison, 1991) and Hamilton Rating Scale for Depression (HRSD) (Hamilton, 1960) were administered at the time of scanning. Mania was also assessed at every assessment using the Young Mania Rating Scale (YMRS, ref). However, previous studies indicated that there were no significant differences between the three groups in terms of the YMRS cross-sectionally or over time, and the median and interquartile ranges were low; hence we did not specifically examine mania ratings in the current analysis (Papmeyer et al., 2015b;Papmeyer et al., 2016;Young et al., 2000). The participant's age at the time of each assessment was registered in years with a precision of two decimals. Assessments at timepoint 1, timepoint 2 and timepoint 4 included an MRI session, although only MRI measurements at timepoint 1 and timepoint 2 were considered within this study to restrict the analysis to a single scanner. The SBFS was approved by the Research Ethics Committee for Scotland (reference number 06/MRE00/9), and written informed consent including consent for data linkage via medical health records was acquired from all participants. The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.

MRI acquisition and pre-processing
Timepoint 1 and timepoint 2 MRI sessions, approximately two years apart, were carried out on a 1.5 T Signa scanner (GE Medical, Milwaukee, USA) at the Brain Research Imaging Centre in Edinburgh and included a structural T1-weighted sequence (180 contiguous 1.2 mm coronal slices; matrix = 192 × 192; fov = 24 cm; flip angle 8°).
Pre-processing of T1-weighted scans was done in Statistical Parametric Mapping (SPM) version 12. The Computational Anatomy Toolbox (CAT; version CAT12.3 (r1318)), which runs on SPM12 software, was used to segment T1-weighted MRI scans into different tissue types (for details see Appendix A.4 in Extended data (de Nooij, 2020)). A cross-sectional segmentation approach was utilised in order to maximise the size of our training sample, and the longitudinal aspect of our data was handled with repeated measures linear mixed modelling (see Comparison of brain maturation trajectories). This approach avoided the exclusion of participants with incomplete MRI data. CAT12 Quality Assurance metrics were used in combination with manual checks to achieve an objective and comprehensive procedure to exclude scans with artefacts or of otherwise insufficient quality (see Appendix A.4 in Extended data (de Nooij, 2020)). Subsequently, modulated grey matter maps (GMM) were smoothed with a Gaussian kernel (FWHM = 8 mm). After loading the smoothed GMM (sGMM) into Python version 3.5.4, voxels were resampled into voxels of double the original voxel size, i.e. 3 × 3 × 3 mm³. This reduced the number of voxels without further loss of spatial information. The sGMM were then masked with a threshold of 0.01 to ensure that voxels outside the brain were represented by value zero. The resulting sGMM were used as input for the brain-PAD model.

Brain-PAD model
To initially train the brain age prediction model, the training sample included all control and HR-participants that remained well (n = 167) in order to maximise the healthy sample size (a model including control participants only was considered underpowered, see Appendix A.5 in Extended data (de Nooij, 2020)). The current model was equally balanced across timepoint 1 and timepoint 2 measurements in order to maximise the age range. Specifically, each well-group participant provided one scan for the training sample: 48 timepoint 1 scans and 46 timepoint 2 scans (i.e. all available scans) for C-well, together with 37 scans per timepoint for HR-well. HR-well timepoint 2 scans were selected based on the highest chronological ages at follow-up so that the age range covered by the training sample was maximal (M age = 22.37, SD age = 2.94, age range = 15.2-28.1 years; for age distributions see Figure S1 in Appendix A.6 in Extended data (de Nooij, 2020)).
Similar to previous studies (see Cole et al., 2019), the sGMM and corresponding chronological ages of the training sample were used to train a brain age prediction model. This model was implemented in Python (version 2.6.6). Corresponding to recent recommendations (Smith et al., 2019), this model initially consisted of dimension reduction of all sGMM voxels to 73 brain components (based on eigenvalue > 1) using principal component analysis (PCA) based on singular value decomposition (SVD) from scikit-learn (Pedregosa et al., 2011). We subsequently used these brain components (X) and chronological age (y) as input for estimating a Relevance Vector Regression (RVR) model with linear kernel (Tipping, 2001); this was implemented using the publicly available scikit-rvm package (version 14 May 2017). The RVR algorithm was chosen because kernel-based methods have been most commonly implemented in brain age models (Cole et al., 2019), because linear RVR was found to be the favourable algorithm in a previous brain-PAD study (Franke et al., 2010) and because RVR does not require estimation of hyperparameters using cross-validation (a procedure that would limit our sample size).
The trained model was then applied to each participant's sGMM to predict their brain age, ensuring that the participant for whom the brain age was being predicted was left out of the training sample to prevent bias (leave-one-out cross-validation). A residuals approach was used to regress out chronological age and gender, and subsequently calculate brain-PAD (for details see Appendix A.7 in Extended data (de Nooij, 2020)), i.e. the gap between brain age prediction and chronological age. This residuals based approach is typically used to derive measures of accelerated aging (e.g. epigenetic aging; Chen et al., 2016;Horvath, 2013) and is recommended for the brain-PAD approach (Smith et al., 2019).
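To illustrate the residuals idea, the following minimal sketch computes brain-PAD as the part of the predicted brain age that is not explained by chronological age. It is an illustration under stated assumptions rather than the procedure of Appendix A.7: only chronological age is regressed out (the gender covariate is omitted for brevity), the helper names `fit_line` and `residual_brain_pad` are hypothetical, and the data are invented.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residual_brain_pad(predicted_age, chronological_age):
    """Residual of predicted brain age after regressing out chronological age.

    Assumption: gender, which the study also regressed out, is left out here.
    """
    intercept, slope = fit_line(chronological_age, predicted_age)
    return [
        p - (intercept + slope * a)
        for p, a in zip(predicted_age, chronological_age)
    ]


# Hypothetical values, for illustration only.
chronological = [16.0, 19.0, 22.0, 25.0, 28.0]
predicted = [19.5, 20.0, 22.5, 23.0, 25.5]
brain_pad = residual_brain_pad(predicted, chronological)
print([round(v, 2) for v in brain_pad])
```

By construction, residuals of this kind average to zero in the sample they were fitted on and are uncorrelated with chronological age, which is the property that makes them suitable as an age-independent measure.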
Regarding brain development, a positive brain-PAD reflected a brain-predicted age older than the chronological age of the participant, while a negative brain-PAD indicated a brain-predicted age younger than the participant's chronological age. Changes in brain-PAD over time indicated a relative acceleration in brain maturation if brain-PAD became more positive (or less negative), or a relative deceleration in brain maturation if brain-PAD became more negative (or less positive).
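The sign convention can be made explicit in a few lines. This is a small illustrative helper (the function name `describe_change` is hypothetical) that labels a change in brain-PAD between two timepoints according to the interpretation given above.

```python
def describe_change(pad_baseline, pad_followup):
    """Label the change in brain-PAD between baseline and follow-up."""
    change = pad_followup - pad_baseline
    if change > 0:
        # Brain-PAD became more positive (or less negative).
        return "relative acceleration"
    if change < 0:
        # Brain-PAD became more negative (or less positive).
        return "relative deceleration"
    return "no change"


# Hypothetical brain-PAD values in years.
print(describe_change(0.5, -0.2))
print(describe_change(-0.8, -0.3))
```

Note that the label depends only on the direction of change, so a participant with a negative brain-PAD at both timepoints can still show a relative acceleration.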
Given the aim of the current study to specifically investigate brain structure aging trajectories within the SBFS cohort, as well as the demographics of our cohort (particularly the narrow age range, also including late adolescence), we evaluated the model within-sample, based on the brain age predictions for the training sample obtained with leave-one-out cross-validation.
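The leave-one-out logic can be sketched as follows. This is not the study pipeline: a one-feature least squares line stands in for the PCA plus RVR model, the function names are hypothetical, and the data are invented. What the sketch does show is the essential property that each prediction comes from a model fitted without the participant being predicted; in the study, the dimension reduction was likewise repeated within each iteration.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def leave_one_out_predictions(features, ages):
    """Predict each participant's age from a model fitted on all others."""
    predictions = []
    for i in range(len(ages)):
        # Hold out participant i and refit on the remaining participants.
        train_x = features[:i] + features[i + 1:]
        train_y = ages[:i] + ages[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        predictions.append(intercept + slope * features[i])
    return predictions


# Hypothetical single brain feature per participant and chronological ages.
features = [0.90, 0.80, 0.75, 0.60, 0.55, 0.40]
ages = [15.5, 18.0, 20.0, 23.0, 25.0, 28.0]
loo_predictions = leave_one_out_predictions(features, ages)
print([round(p, 1) for p in loo_predictions])
```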

Comparison of brain maturation trajectories
Since the objective of this study was to investigate deviation of brain maturation trajectories in young individuals at high risk for mood disorder and the association with illness onset, participants were divided into three groups as described above, with clinical information from all available assessments considered in group categorisation.
In order to compare brain structure aging trajectories between groups, we applied a linear mixed model (LMM) to the brain-PAD measures, taking into account loss to follow-up as well as individual and family-related effects (Gueorguieva & Krystal, 2004). This was modelled using R (version 3.2.3) package nlme (version 3.1-122) with the formula: 'Brain-PAD ~ Timepoint × Group, random = ~1 | FamilyID / SubjectID'. Within this single pre-defined LMM, we were interested in the following contrasts: group differences in brain-PAD at baseline (group effect), differential trajectories of brain-PAD between groups (group by timepoint interaction effect), and group differences at follow-up. For each of these contrasts we tested all three pairwise comparisons, and we corrected results for multiple comparisons (n = 3 pairwise comparisons) with the Holm-Bonferroni method (Holm, 1979) using R package emmeans (version 1.3.5.1).
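For readers unfamiliar with the Holm-Bonferroni step-down procedure, the following minimal sketch shows how adjusted p-values are obtained for a family of three pairwise comparisons. It is a plain Python illustration, not the emmeans implementation used in the study; the function name `holm_bonferroni` and the input p-values are hypothetical.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the original order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on.
    Adjusted values are capped at 1 and forced to be non-decreasing along
    the sorted order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical uncorrected p-values for three pairwise comparisons.
raw_p = [0.03, 0.70, 0.12]
print([round(p, 2) for p in holm_bonferroni(raw_p)])
```

With three comparisons, the smallest uncorrected p-value is tripled, which is why a nominally significant result can lose significance after correction.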
Exploratory analyses were conducted to further explore group differences in Brain-PAD trajectories. Firstly, we tested a longitudinal model that considered the interaction effect between age (at baseline) and group on the difference in brain-PAD between baseline and follow-up; this was modelled in R using the formula: 'Brain-PAD_difference ~ Age_baseline × Group, random = ~1 | FamilyID/SubjectID'. A second exploratory analysis also modelled the brain-PAD trajectory for the group of control participants who developed a mood disorder (C-MD) within the LMM of the main analysis, considering the pairwise comparisons with control group C-well. In all of the analyses described above, continuous variables (brain-PAD, age) were transformed to Z-scores to retrieve standardised β-coefficients.
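The Z-score transformation that yields standardised coefficients can be sketched as below. This is an illustration only: the function name `z_score` is hypothetical, the ages are invented, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the source does not state which variant was used.

```python
from statistics import mean, stdev


def z_score(values):
    """Centre values on their mean and scale by the sample standard deviation."""
    centre = mean(values)
    spread = stdev(values)  # assumption: sample SD with n - 1 denominator
    return [(v - centre) / spread for v in values]


# Hypothetical ages in years.
ages = [18.0, 21.0, 24.0, 27.0]
print([round(z, 2) for z in z_score(ages)])
```

After this transformation a coefficient expresses the effect in standard deviations of the outcome rather than in years, which is why both units are reported side by side in the Results.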

Demographic and clinical variables
Sample sizes, demographic information and clinical measures are presented in Table 1. There were no significant differences between groups with regard to age at either timepoint, and no differences in gender, handedness and NART intelligence quotient score. Aggregate information is available for each participant as Underlying data (de Nooij, 2020).
However, HR-MD participants reported greater depression symptomatology on the HRSD as compared to the groups of participants who remained well (C-well and HR-well) at both timepoints (Table 1). At baseline, seven HR-MD participants (20%; M HRSD = 11.4) reported subclinical symptoms of depression (defined as HRSD score > 7). At timepoint 2, ten HR-MD participants reported symptoms of depression (defined as HRSD score > 7). For two of these participants depression symptoms were at subclinical level, as they were not yet diagnosed with a mood disorder. In contrast, there was a very low prevalence of subclinical depression symptomatology within the well-groups: two participants at timepoint 1 (1.2%; M HRSD = 9.0) and two other participants (with included follow-up scans) at timepoint 2 (2.6%; M HRSD = 9.0).
Participants did not report any use of psychotropic medication at baseline. At follow-up, six included HR-MD participants reported the use of psychotropic medication, whereas none of the well-group participants received medication for the treatment of psychiatric symptoms.

Model evaluation
Our model showed a significant positive Pearson correlation between predicted brain age and chronological age (r(165) = .40, p < 0.001), and a mean absolute error (MAE) of 2.21 years (scaled MAE = MAE / age range = 0.17; see Appendix B.2 in Extended data (de Nooij, 2020)) within the training sample. For a discussion on model evaluation within the context of the current study, see Discussion and Appendix B.2 in Extended data (de Nooij, 2020).
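The evaluation metrics named above can be written out explicitly. The sketch below defines the Pearson correlation, the mean absolute error and the scaled MAE in plain Python; the function names are illustrative. The last line reproduces the reported scaled MAE under the assumption that the age range used for scaling is that of the training sample (15.2-28.1 years), which appears consistent with the reported value.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    var_x = sum((a - x_bar) ** 2 for a in x)
    var_y = sum((b - y_bar) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predicted and observed values."""
    return mean(abs(p - o) for p, o in zip(predicted, observed))


def scaled_mae(mae, age_min, age_max):
    """MAE divided by the age range, so models on different ranges compare."""
    return mae / (age_max - age_min)


# Reported MAE and training sample age range.
print(round(scaled_mae(2.21, 15.2, 28.1), 2))  # 0.17
```

Scaling by the age range matters here because an MAE of about two years is a larger relative error in a sample spanning 13 years than in a sample spanning the adult lifespan.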
The 73 brain components that were used as input for the brain age prediction algorithm indicated a mean total explained variance of 84.0% (SD = 0.0004) for all (leave-one-out) training sample dimension reduction iterations. These brain components showed loadings distributed across the brain, because dimension reduction was spatially unconstrained. This complicated unbiased interpretation (Smith et al., 2019), and therefore, also given our aim to comprehensively assess global patterns of brain structure aging trajectories, these components were not further explored. However, we do present visualisation of these brain components in order to illustrate the method (Figure S3 in Appendix B.3 in Extended data (de Nooij, 2020)).

Comparison of brain maturation trajectories

Comparison at baseline
Group allocation based on diagnostic information resulted in mean brain-PADs of +0.04 (SD = 1.14, n = 93) for C-well, -0.36 (SD = 1.22, n = 74) for HR-well, and -0.01 (SD = 1.39, n = 35) for HR-MD. Results of the LMM (Table 2) suggested lower baseline brain-PAD for HR-well compared to C-well (-0.42 years; β = -0.37, p = 0.03, p corrected = 0.08), but statistical significance did not survive multiple comparison correction. There were no baseline differences in brain-PADs for HR-MD versus C-well (-0.05 years; β = -0.07, p corrected = 0.73) or HR-MD versus HR-well (0.35 years; β = 0.30, p corrected = 0.24).

Table 2. Fixed effects of linear mixed model applied to investigate group differences in the brain-predicted age difference (brain-PAD). C-well, group of participants without family history who remained well; HR-MD, group of participants at high familial risk who developed a mood disorder; HR-well, group of participants at high familial risk who remained well.

Brain structure aging trajectories
Results showed a statistically significant timepoint by group interaction effect for HR-MD compared to C-well (-0.70 years; β = -0.60, p corrected < 0.001) and HR-well (-0.43 years; β = -0.36, p corrected = 0.02), indicating decelerating brain structure aging trajectories. In addition, HR-well showed an intermediate trajectory (-0.28 years; β = -0.24, p corrected = 0.06), which was not statistically significant. Figure 1 displays brain maturation trajectories per group as modelled by unstandardised LMM fixed effects; for clarity, these trajectories are displayed relative to the control group following correction for the effects observed in C-well (i.e., intercept and the significant timepoint coefficient, see Table 2). Figure 2 shows the heterogeneity in observed brain maturation trajectories by displaying the participants' individual changes in brain-PAD over time.

Exploratory findings
Our exploratory longitudinal model showed similar group trajectories as our main model. Furthermore, this model suggested

Figure 2. Display of brain structure aging trajectories per participant, reflecting a changing brain-predicted age difference (brain-PAD) between timepoint 1 and timepoint 2 (two years apart). Each panel contains the trajectories of one group in thin line graphs, whereas the thicker line graph represents the average trajectory of that group (of complete cases). The star dots display the mean brain-PAD at each timepoint. Left panel, C-well; Middle panel, HR-well; Right panel, HR-MD. Brain-PAD, brain-predicted age difference; C-well, group of participants without family history who remained well; HR-MD, group of participants at high familial risk who developed a mood disorder; HR-well, group of participants at high familial risk who remained well.

Figure 1. Modelled fixed effects of the brain-predicted age difference (brain-PAD) per group, for clarity corrected for effects in C-well (i.e., the intercept and timepoint coefficients) as this group functions as control group. Shaded areas display standard errors of the timepoint by group interaction effects. Brain-PAD, brain-predicted age difference; C-well, group of participants without family history who remained well; HR-MD, group of participants at high familial risk who developed a mood disorder; HR-well, group of participants at high familial risk who remained well.

Discussion
The results of the current study showed that in young individuals at familial risk the onset of mood disorder was associated with differences in brain structure changes over time. Statistically significant reductions in brain-PAD indicated decelerated brain structure aging trajectories in young HR individuals who developed a mood disorder as compared to control and HR individuals who remained well. Intermediate effect sizes indicated that young individuals who were at risk but remained well showed intermediate trajectories. These preliminary findings suggest that genetic predisposition to mood disorder is accompanied by changes in adolescent brain structural development trajectories that become more pronounced with the onset of mood disorder.
Further research will be necessary to disentangle the role of genetic predisposition and additional environmental risk factors (e.g. adverse life events) on global age-related brain structure changes. As development of mood disorder was associated with a more decelerating trajectory, differences observed for the mood disorder group may also partly reflect prodromal symptoms or early-disease mechanisms of psychological stress. Further, in the familial risk group who became ill, we cannot disentangle separate effects of risk and depressive symptoms. Notably, all groups showed considerable heterogeneity in the direction, size and emergence of the individual brain-PADs. Additional research is therefore required to substantiate the hypothesis that the emergence of a lag in brain structure aging in youth indicates mood disorder onset and familial risk.
The current findings correspond with previous neuroimaging studies using different methods that also indicated deceleration, in the same as well as independent prospective longitudinal cohorts (Ducharme et al., 2014;Papmeyer et al., 2015a;Whittle et al., 2014). According to empirically based neural models, dysfunctions in medial prefrontal networks and limbic areas underlie disturbances in emotion regulation and cognitive control (e.g. Drevets et al., 2008) which are proposed to play a causal role in the development of mood disorder (Nolen-Hoeksema et al., 2008;Phillips et al., 2008). Correspondingly, previous findings within the same cohort have revealed that illness risk and onset were associated with differential cortical thickness trajectories in prefrontal areas (Papmeyer et al., 2015a) as well as differential patterns of brain activation during emotional tasks in cortico-thalamic-limbic regions (Chan et al., 2016;Whalley et al., 2015), and neurocognitive performance was found to be a trait-marker of familial risk (Papmeyer et al., 2015b). Although the current study adopted a global approach (thus refraining from investigation of regional brain structural or functional development), we speculate about a potential neural mechanism by which decelerated trajectories of brain structural change in young individuals disrupt frontal and limbic brain networks that underlie emotion regulation and cognitive control, consequently increasing vulnerability to mood disorder. Inferences of causality, however, should be drawn with caution. It is important to consider that although prospective longitudinal studies are one approach to examining causal processes, interpretation is complex. In the current study, for example, individuals who subsequently developed a mood disorder also showed higher mean subclinical depression symptomatology at baseline. This could be interpreted as a predictor of subsequent illness, or indeed as prodromal or early stages of the illness itself.
Importantly, our findings suggest disease-related brain aging deceleration may emerge in young individuals, in contrast to previous findings of accelerated aging in association with mood disorder (Koutsouleris et al., 2014;Sibille, 2013;Wolkowitz et al., 2011). However, we that note the majority of these studies are conducted in older adult samples, rather than younger adolescent cohorts. Given the non-linear trajectory of regional brain maturation / aging over the life-course, particularly over periods where there is significant developmental change (e.g. adolescence), it may not be the case that there is a simple retrospective linear trajectory of accelerated biological aging from these adult studies to periods earlier in life (Giedd et al., 1999;Scahill et al., 2003;Shaw et al., 2008;Tamnes et al., 2010;Wierenga et al., 2014). Further, this finding is consistent with wider theories of adolescent psychopathology, for example the 'dual system' model where delayed maturation in higher order cortical regions in relation to limbic regions is proposed to underlie difficulties in emotional regulation, cognitive function and social behaviour, increasing vulnerability to mood disorder (Casey et al., 2011).
The current study applied a novel pattern recognition method that, to our knowledge, has not been previously applied to a longitudinal cohort of young individuals at risk of mood disorder. This approach derives a global measure of brain structure, which captures the complexity of spatial and temporal dynamics of brain aging. Challenges in collecting clinical data from young individuals mean that large cohorts are scarce; the SBFS provided a unique opportunity to investigate the dynamics of brain structure aging trajectories in relation to mood disorder. Development of mood disorder was found to be associated with decelerated age-related changes in brain grey matter, which could not have been identified within a cross-sectional design. Clinical information was also available for up to six years, which produced some heterogeneity in the HR-MD group, due to a range in times before onset, but also provided confidence that those classified as HR-well were not in the early stages of mood disorder at the time of imaging assessments. Overall, the current study of the SBFS shows unique strengths for youth mental health research. Ongoing work on the sample seeks to implement data linkage at 10+ years to obtain more definitive, stable diagnoses.
One limitation of our study was the low correlation between brain age prediction and chronological age, reflecting suboptimal performance of the brain age prediction model, although this is probably also related to the tight age range of our sample as well as individual differences within this life stage. Performance of the model was constrained by the limited cohort size because, for the purpose of the current study, we adopted a within-sample approach. In order to maximise the use of available data in building our brain age prediction model, we adhered to a cross-sectional segmentation approach and included all individuals who remained well, including those at high familial risk for mood disorder. Brain-PAD slightly increased over time within the control group, indicating that a brain-PAD of zero did not indicate normative brain maturation in the current study (for details see Appendix B.2 in Extended data (de Nooij, 2020)). This suggests that the brain age prediction model was biased by familial risk related structural differences following inclusion of HR-well participants within the training sample. Although this explanation would not invalidate our results, as it would suggest valid comparison of relative group differences in brain-PAD, it is considered a limitation that we cannot reliably tease apart the familial risk effect from the normative trajectory. Importantly, we directly addressed potential threats to the validity of the brain age prediction model, to the extent possible within the current sample, with a deliberate pre-defined approach consisting of dimension reduction, sparse RVR modelling and a residuals approach, in order to prevent overfitting and thereby optimise the overall model validity. However, our prediction model was not validated for generalisability to other samples because of challenges related to scanner heterogeneity, so that transferability of the model remains uncertain.
Further limitations of the current study are that we were unable to investigate differences between MDD and BD, and that we cannot exclude the possibility of medication effects, although use of psychotropic medications was limited within the current sample (see Demographic and clinical variables). Additionally, out-of-sample predictions often show more prediction error than within-sample predictions achieved using cross-validation (Varoquaux et al., 2017). Although participants from all three groups belonged to the same cohort and were recruited and assessed according to the same procedures, brain age predictions for individuals with mood disorder onset (i.e., HR-MD) were out-of-training-sample predictions, whereas predictions for individuals who remained well (i.e., HR-well and C-well) required leave-one-out cross-validation. Although these differential prediction procedures may have led to increased prediction error for HR-MD, our finding of intermediate trajectories in the HR-well group suggests that results in HR-MD are unlikely to be driven entirely by increased random error. Further, additional testing of out-of-training-sample scans of well-group participants indicated that prediction error was not significantly increased compared to within-training-sample estimates, conferring further confidence in our findings. However, taken together, the findings of the current study are considered preliminary, as they should be interpreted in the context of these limitations. We also note that in the HR-MD group, the depression severity scores are relatively low. However, this group categorisation was based on the presence of a previous or current diagnosis, rather than on the severity of current symptoms at the time of the scan. One final limitation is the use of a 1.5T scanner. 
Though this is not ideal given current technology, other studies had successfully applied similar methods to 1.5T brain MRI scans before the current study (Franke & Gaser, 2012; Gaser et al., 2013). The choice of scanner strength was determined at the beginning of this 10 year prospective study. We considered the value of having prospective longitudinal data, and together with detailed QC this provided sufficient confidence in the quality of data used in the current study.
In order to resolve the above limitations, future research should aim to replicate our results within a larger sample (Button et al., 2013; Jollans et al., 2019). Large-scale and extended MRI follow-up assessments would furthermore allow the application of a longitudinal brain age prediction model, which would provide a more nuanced understanding of individual developmental trajectories. A sufficient sample size would also allow for investigation of MDD and BD separately, and could account for potential medication effects. Spatial interpretability of the current model's brain age prediction was limited, but with a larger sample, methods such as orthonormal projective non-negative matrix factorisation (OPNMF) could provide information about specific regions or networks involved in associations between brain age and mood disorder (Sotiras et al., 2015; Sotiras et al., 2017; Varikuti et al., 2018). Additionally, our exploratory findings suggest it may be useful to investigate associations between brain structure aging trajectories and mood disorder in younger samples. This, along with leveraging opportunities from other statistical approaches to determine causal directionality, could be implemented to attempt to understand causal relationships between imaging findings and mood symptoms. For now, the present study lays a theoretical and empirical foundation for the field to build upon, and will hopefully encourage further longitudinal studies of clinical youth cohorts. In the future, replication and further investigation of the association between mood disorder and decelerated brain structure aging trajectories may provide important insights into the prediction of mood disorder onset in young individuals.

Data availability
Underlying data
Open Science Framework: Brain age trajectories and mood disorders (SBFS). https://doi.org/10.17605/OSF.IO/QKCYD (de Nooij, 2020).
This project contains the following underlying data:
• SBFS_Data (containing raw demographic information and brain ages for all participants)
For reasons of confidentiality, we are unable to openly share underlying MRI data. Specifically, Research Ethics Committee approval for this project requires raw MRI data to remain stored within the computer system of University of Edinburgh. MR images and other types of data from the SBFS cohort can only be shared in anonymized form, and only with other organisations within the European Union, using a secure data transfer. For this reason, directly accessible underlying data provided via OSF only consists of the unidentifiable underlying data that is needed to reproduce the results of this manuscript, including the intermediate brain age prediction imaging phenotype. Access to MR images and other data requires the arrangement of a data access contract; please contact the data holder, Dr Heather Whalley (Heather.Whalley@ed.ac.uk), for more information and to arrange access.
This project contains the following extended data:
• SBFS_Code (containing annotated Python code for derivation of brain ages and R Markdown integrated code for statistical analysis).
Accessible data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

I was slightly disappointed by the application of cross-sectional image processing for this longitudinally acquired sample. Some of the "considerable heterogeneity in the direction, size and emergence of the individual brain-PADs" (raised by the authors in discussion) across the two timepoints could potentially be ameliorated by longitudinal processing, which I hope will be applied to this dataset in future. I note that the linear mixed models employed here did not take into account the time interval between scans. Although scans were acquired ~2 years apart on average, there is variability in this timing (the range was 1.0-3.8 years, mean±SD=2.12±0.35 according to the supplied supplementary datafiles), which, together with the lack of inclusion of baseline age in the main exploratory model and non-linear maturation trajectories not being assessed, may impact the interpretation of the findings. Notwithstanding, the analyses reported here using these simpler statistical models are valid and provide interesting new data worthy of publication, as long as the limitations are clear. Incorporation of more complex statistical models which allow the simultaneous modelling of age, age², time (between scans), and their interactions with diagnostic group, plus incorporation of time of diagnosis (in relation to date of scan acquisition), will allow for greater resolution of the biological processes underpinning cortical trajectory differences in future analyses. This may be particularly important given the relationship between age at baseline and brain-PAD observed here, which suggests that the differences in cortical trajectories are more apparent at earlier time points.
The discussion describes and appropriately acknowledges the limitations of the study, particularly in regards to the low correlation with chronological age, the potential of medication effects (not examined but small exposed group), the potential for increased error in the HR-MD group due to exclusion of those individuals from the training set, model transferability, and the low-resolution 1.5T scanner.
In regards to the aging model, there are a few limitations of which to be particularly cognisant. The aging model developed employed training on scans which were also used as targets for prediction, though I note that leave-one-out prediction was employed (for HR-well and C-well scans which were employed as training data) to ameliorate circularity of the model prediction. Of course, in a perfect world, it would be much better to train the model using an independent dataset to ensure the training and test sets are unique and to reduce potential for bias. However, employing a training dataset acquired on a different scanner(s) may also introduce bias and reduce model fit, so there are advantages to the approach taken by the authors in this regard, particularly if no alternative dataset with equivalent acquisition parameters was available for training. The authors' discussion of possible contributors to the low correlation reported here between chronological age and brain-age raised important and relevant points, and I agree that the tight age window (15-30 years) of the training dataset may result in a poorer correlation coefficient than models trained on samples with greater age variability, due to subtler age-related brain differences over a smaller developmental window. It would be interesting to see whether the findings reported here are robust with the application of different (independent) brain-age estimators, such as that developed by the ENIGMA MDD working group (Han et al., Mol Psychiatry, 2020), although I do not suggest that as necessary for the publication of the current work. Furthermore, I note that the training set for the current study did not stratify based on gender, which may also contribute to poorer model fit due to sex-differences in brain maturation (nor did it appear to exclude relatives, which may introduce artifact due to heritability of brain structures), and may warrant a further note in the discussion.
Can the authors please clarify: 1) whether relatives were included in the training dataset for model predictor; 2) at what stage the brain-PAD values were transformed to z-scores (i.e. was this done for all values generated for scan 1 and scan 2 simultaneously, or separately for scan 1 and then scan 2 data?); 3) the relationship between CAT12 quality assurance value and accuracy of predicted brain-PAD, particularly given that the lower bounds for 'satisfactory', 'good' and 'excellent' are 70%, 80%, and 90% respectively.
Regarding the statistical analysis we acknowledge the additional limitations raised by the reviewer, including cross-sectional image processing, linear age predictors and variation in scanning interval. Future studies should apply more complex and more detailed statistical models, preferably on larger datasets in order to prevent overfitting and decrease the impact of data loss when applying longitudinal processing.
We appreciate that the reviewer acknowledges not only the limitations but also the advantages of our leave-one-out approach. We would like to add that we had also tested a different, externally validated brain age model (brainageR version 1, https://github.com/james-cole/brainageR) for this sample. This model was developed using a large adult sample (n = 2001, mean age = 36.95 ± 18.12 years, age range 18-90 years). However, this model appeared more biased than our previous model and showed systematic overestimation for our specific sample, and was therefore less suitable for our purposes. To be specific, performance metrics with brainageR (as assessed within our model's training sample) were r = 0.39 for the correlation between brain age prediction and chronological age; MAE = 8.2 years; M brain age prediction = 30.1 years (with sample M age = 22.4 years), SD brain age prediction = 6.3 years (with sample SD age = 3.0 years). This supports our notion that the demographics of our dataset complicated the use of externally validated models, hence our current approach.
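As a rough illustration of how performance metrics of the kind quoted above can be computed, the following Python sketch derives brain-PAD, the mean absolute error (MAE) and the Pearson correlation from paired chronological and predicted ages. The function names and the toy values are invented for illustration; they are not taken from the SBFS data or from the brainageR code.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ssx * ssy)


def brain_age_metrics(chronological, predicted):
    """Summary metrics for a brain age model (hypothetical helper)."""
    # brain-PAD = predicted brain age minus chronological age
    pad = [p - c for p, c in zip(predicted, chronological)]
    return {
        "r": pearson_r(chronological, predicted),
        "mae": mean([abs(d) for d in pad]),
        "mean_pad": mean(pad),
        "mean_predicted": mean(predicted),
        "sd_predicted": stdev(predicted),
    }


# Toy values (invented): every prediction overshoots the true age.
ages = [16.0, 19.0, 22.0, 25.0, 28.0]
predictions = [24.0, 27.5, 29.0, 33.5, 36.0]

metrics = brain_age_metrics(ages, predictions)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this toy example the predictions track age closely, so the correlation is high, yet every prediction is too old, so the MAE and the mean brain-PAD are both large. This is why a model should be judged on the correlation together with the MAE and the mean prediction, as in the metrics reported above.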
The reviewer furthermore requested clarification on three matters, which we provide below: 1) Were relatives included in the training dataset for model predictor? In the model training phase we only left out scans of the subject for whom the brain age was being predicted. So, in some cases the scans of a sibling remained in the training set. We acknowledge this limitation but would also like to note that relatedness within the sample is limited and that we did take it into account within subsequent statistical models (i.e. as random effect).
2) At what stage the brain-PAD values were transformed to z-scores? This was done when running the statistical models (i.e. R formula: lme(scale(BrainPAD) ~ Wave * Group, random = ~ 1|PedID/ID, data=…)). Given the long data format, scaling was done over all observations (i.e. across timepoints/waves). This is consistent with the approach of a single brain age prediction model (balanced with regard to timepoints). Furthermore, if one were to scale separately per timepoint, the different scaling factor per timepoint would complicate interpretation.
3) What is the relationship between CAT12 quality assurance value and accuracy of predicted brain-PAD, particularly given that the lower bounds for 'satisfactory', 'good' and 'excellent' are 70%, 80%, and 90% respectively? This is an interesting question to raise. We have now explored the relationship between Brain-PAD and image quality. We have created plots, which can be viewed in the OSF repository (https://osf.io/pz2x5/) under 'Supplementary information' > 'Plots_peer_review'. The plots and corresponding statistics show no correlation between image quality and Brain-PAD (T1: r = 0.056, p = 0.42; T2: r = -0.086, p = 0.32). To elaborate further, visual inspection suggests a minor, non-statistically significant trend in which higher image quality is associated with slightly more accurate brain age predictions / less bias, because (i) at the right end the regression lines approach brain-PAD = 0, and (ii) the SD of brain-PAD appears smaller when image quality is better.
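To illustrate the point made in response 2 about scaling over all observations, the following Python sketch contrasts pooled z-scoring with z-scoring separately per wave. The rows, identifiers and brain-PAD values are invented for illustration, and the helper functions are hypothetical; the sample standard deviation is used because that is also what R's scale() uses.

```python
from statistics import mean, stdev

# Long format: one row per scan as (participant, wave, brain_pad).
# Toy values (invented): brain-PAD drops by 1 year from wave 1 to wave 2.
ROWS = [
    ("p1", 1, 0.5), ("p1", 2, -0.5),
    ("p2", 1, 1.0), ("p2", 2, 0.0),
    ("p3", 1, -0.5), ("p3", 2, -1.5),
]


def zscore(values):
    """Centre and scale using the sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def zscore_pooled(rows):
    """Scale over all observations, i.e. across waves."""
    return zscore([pad for _, _, pad in rows])


def zscore_within_wave(rows):
    """Scale each wave separately (the alternative the reviewer asked about)."""
    scaled = [0.0] * len(rows)
    for wave in sorted({w for _, w, _ in rows}):
        positions = [i for i, (_, w, _) in enumerate(rows) if w == wave]
        wave_scaled = zscore([rows[j][2] for j in positions])
        for i, z in zip(positions, wave_scaled):
            scaled[i] = z
    return scaled


def wave_mean(scaled, rows, wave):
    return mean([z for z, (_, w, _) in zip(scaled, rows) if w == wave])


pooled = zscore_pooled(ROWS)
within = zscore_within_wave(ROWS)

pooled_change = wave_mean(pooled, ROWS, 2) - wave_mean(pooled, ROWS, 1)
within_change = wave_mean(within, ROWS, 2) - wave_mean(within, ROWS, 1)
print(f"pooled scaling, change in mean z: {pooled_change:.3f}")
print(f"per-wave scaling, change in mean z: {within_change:.3f}")
```

With pooled scaling the change in mean brain-PAD between waves survives as a change in mean z-score, so a Wave effect remains estimable. With per-wave scaling each wave is re-centred to zero, which removes that change and puts each wave on its own scale.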
Methods: Please specify more clearly in the methods that the time to follow up is 2 years. The design of the study is confusing and circuitous that it doesn't become apparent until the results that the time elapsed between baseline and follow up scans is 2 years. The difference between a 15 year old two years later is significantly different than a 30 year old 2 years later in terms of brain maturation. By which standard is 15-30 years of age considered to be a "tight" age range?
Methods: Please clarify the temporal reference for group classifications - is it at 2 year follow up or at 6 year follow up? Please provide justification for approach.
Methods: Was mania severity assessed in this sample on clinical follow up?
Results/Discussion: The HAMD scores upon follow up in the individuals categorized as HR-MD are quite low and in the upper limit of the mild range, with a mean that doesn't even reach the lower bound of the mild range. How do authors conceptualize these individuals relative to the healthy individuals in terms of clinical staging in the context of brain age? Is deceleration of brain age likely to be associated with an increased risk for clinical progression or precede early clinical progression? What kind of study design would be able to tease this apart? It would be helpful to include rational future directions beyond increasing sample size.
Scanning at 1.5 tesla significantly limits spatial resolution and should be noted as a limitation.
Minor comment: unless specified by the journal style, the term "ageing" spelled in this way is distracting.

Is the work clearly and accurately presented and does it cite the current literature? No
Is the study design appropriate and is the work technically sound? Yes

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? No
Are the conclusions drawn adequately supported by the results? Yes

brain age prediction software derived in independent samples (BrainAge). We therefore consider that though the study has limitations, and we acknowledge the need for replication, it represents an important foundation for further research in the field.
"Given that the sample involves young people with a family history of bipolar disorder, consider that as a more accurate representation of the title"
We agree with the reviewer and have added the cohort name to the title of the manuscript for clarity. This now reads "Longitudinal trajectories of brain age in young individuals at familial risk of mood disorder from the Scottish Bipolar Family Study".
"Methods: Please specify more clearly in the methods that the time to follow up is 2 years. The design of the study is confusing and circuitous that it doesn't become apparent until the results that the time elapsed between baseline and follow up scans is 2 years. The difference between a 15 year old two years later is significantly different than a 30 year old 2 years later in terms of brain maturation. By which standard is 15-30 years of age considered to be a "tight" age range?"
We have now clarified in the methods that the time between scans was ~ 2 years. We also agree with the reviewer that the change over time for a 15 year old will be different than that for someone aged 30 years. For this reason, we performed an age interaction model, investigating the potential effect of age on these relationships. These results were initially included as exploratory findings in supplemental information and indeed demonstrated larger decelerations for younger participants in the HR-MD group. These results are now detailed in the main body of the manuscript.
"Methods: Please clarify the temporal reference for group classifications - is it at 2 year follow up or at 6 year follow up? Please provide justification for approach."
This information has now been added to the manuscript. In most cases individuals were seen every 2 years for a total of 6 years (3 assessments, but only 2 scans). For the control group, individuals were included if they had remained without being diagnosed with any psychiatric disorder throughout the period of the study (via assessments and/or GP records). Those that did become unwell were a small group (n=12) and are not included in the current analysis. This approach was used to reduce heterogeneity in the control group.
For the high-risk group, none met clinical criteria for any mood disorder at baseline. If at any assessment they did meet criteria for a mood disorder (at the time of assessment or over the intervening period since the earlier assessment) they were considered as being in the HR-MD group. So, once someone had a diagnosis, even though they may not have been actively symptomatic at the time of the assessment, they were still considered as being in the mood disorder group. This approach was based on the premise that once an individual had met diagnostic criteria at any time over the course of the study, they were no longer 'high risk' for mood disorder, but actually a 'case'.

"Methods: Was mania severity assessed in this sample on clinical follow up?"
Yes, mania was also assessed at every assessment using the Young Mania Rating Scale (YMRS). However, previous studies indicated that there were no significant differences between the three groups in terms of the YMRS, cross-sectionally or over time, and the median and interquartile ranges were low; hence we did not specifically examine mania ratings in the current analysis (Papmeyer et al., Psychiatry Res. 2016 Feb 28;248:119-125; Papmeyer et al., Psychol Med. 2015 Nov;45(15):3317-3327). This information has now been added to the manuscript.
"Results/Discussion: The HAMD scores upon follow up in the individuals categorized as HR-MD are quite low and in the upper limit of the mild range, with a mean that doesn't even reach the lower bound of the mild range. How do authors conceptualize these individuals relative to the healthy individuals in terms of clinical staging in the context of brain age? Is deceleration of brain age likely to be associated with an increased risk for clinical progression or precede early clinical progression? What kind of study design would be able to tease this apart? It would be helpful to include rational future directions beyond increasing sample size."
The reviewer is correct that the HAMD scores are relatively low. However, we note that depression status was not defined according to the severity of current symptoms at the time of the scan, but based on lifetime depression; a note to this effect has been added to the discussion. Regarding the second point, it is not possible to disentangle whether the differences in brain age are a cause or consequence of clinical progression. In future studies, earlier scans prior to prodromal phases of disease, along with leveraging opportunities from other statistical approaches to determine causal directionality, could be implemented to attempt to understand these relationships. This has also now been added to the discussion.
"Scanning at 1.5 tesla significantly limits spatial resolution and should be noted as a limitation."
This is now explicitly mentioned as a limitation in the manuscript. As above, we also now acknowledge that though this is not ideal given current technology, the choice of scanner strength was determined at the beginning of this 10 year prospective study. We considered that the value of having prospective longitudinal data, together with detailed QC (using software quality assurance metrics in combination with manual checks excluding scans with artefacts or of otherwise insufficient quality), provided sufficient confidence in the quality of the data used in the current study.
"Minor comment: unless specified by the journal style, the term "ageing" spelled in this way is distracting."
All spellings of the term "ageing" have now been changed to "aging".

Competing Interests: No competing interests
Partly represents an important foundation for further research in the field.
"There needs to be more frank acknowledgement that the brain aging difference in the risk group that developed mood episodes may have been a result of the depression in addition to the risk state per se."
We have now added a section to the discussion addressing this point, explaining that the brain aging differences may have resulted from depression and risk state in combination, and that we cannot disentangle effects of risk and depressive symptoms in the current study.
"The study used a 1.5T MRI scanner. Is it possible that this did not provide sufficient resolution for such a study?"
We also now acknowledge this point in the limitations section. Though this is not ideal given current technology, the choice of scanner strength was determined at the beginning of this 10 year prospective study. We considered that the value of having prospective longitudinal data, together with detailed QC (using software quality assurance metrics in combination with manual checks excluding scans with artefacts or of otherwise insufficient quality), provided sufficient confidence in the quality of the data used in the current study. To elaborate from a technical point of view, 1.5T MRI scans indeed provide noisier data and lower resolution than 3T MRI data. Smoothing is a recommended pre-processing step that reduces the noise on voxel level and was also applied in the pipeline of the current study. Subsequently, in order to appropriately deal with this high-dimensional data, voxels were resampled into greater voxel sizes in order to reduce the number of variables (features) for the machine learning approach (note however that due to previous smoothing, this step did not further reduce the resolution). We find that these recommended pre-processing steps reduce noise and resolution, but that the resolution of our pattern recognition approach is still higher than with standard sMRI analysis approaches in which voxels are often averaged across areas. The last step of the pipeline was the extraction of features with PCA. We thus applied multiple pre-processing steps, and although derived and pre-processed variables will also be more precise when input variables are more precise (e.g. due to higher scanner strength), we would argue that the low scanner strength in itself does not compromise the methods of this study. In addition, we reference several other studies that applied similar approaches also including 1.5T sMRI data (e.g. Franke et al., NeuroImage, 2012, 63, 1305-1312; Gaser et al., PLOS ONE, 2013, 8(6), e67346).