Cohort profile for the STratifying Resilience and Depression Longitudinally (STRADL) study: A depression-focused investigation of Generation Scotland, using detailed clinical, cognitive, and neuroimaging assessments

STratifying Resilience and Depression Longitudinally (STRADL) is a population-based study built on the Generation Scotland: Scottish Family Health Study (GS:SFHS) resource. The aim of STRADL is to subtype major depressive disorder (MDD) on the basis of its aetiology, using detailed clinical, cognitive, and brain imaging assessments. The GS:SFHS provides an important opportunity to study complex gene-environment interactions, incorporating linkage to existing datasets and inclusion of early-life variables for two longitudinal birth cohorts. Specifically, data collection in STRADL included: socio-economic and lifestyle variables; physical measures; questionnaire data that assess resilience, early-life adversity, personality, psychological health, and lifetime history of mood disorder; laboratory samples; cognitive tests; and brain magnetic resonance imaging. Some of the questionnaire and cognitive data were first assessed at the GS:SFHS baseline assessment between 2006 and 2011, thus providing longitudinal measures relevant to the study of depression, psychological resilience, and cognition. In addition, routinely collected historic NHS data and early-life variables are linked to STRADL data, further providing opportunities for longitudinal analysis. Recruitment has been completed and we consented and tested 1,188 participants.


Introduction
Why was the study set up?
Major depressive disorder (MDD) affects approximately 13% of the population at least once in their lifetime 1 , and remains a leading cause of economic burden and non-lethal global disability 2,3 due to its recurrent or chronic nature. At present, MDD diagnosis is based on arbitrary and clinically heterogeneous criteria 4 . Consequently, and even with optimal management, much of the disability caused by MDD persists 5 because of the absence of targeted disease-modifying treatments. The underlying pathophysiology of MDD is believed to be heterogeneous 6 , with genetic and environmental factors acting to influence disease expression. Thus, it is important for treatment to shift from the current "trial and error" approach, towards precision prevention and stratified medicine based on markedly different disease mechanisms. However, progress in this area has been severely restricted because the aetiology of MDD is complex, and remains poorly understood.

STratifying Resilience and Depression Longitudinally (STRADL) aims to subtype MDD on the basis of its aetiology using detailed clinical, cognitive, and brain imaging assessments. STRADL will examine the interaction between genetic and environmental factors that increase risk and occurrence of different MDD subtypes, and assess common and distinct mechanisms and clinical trajectories of MDD phenotypes. Additionally, STRADL aims to assess individual resilience, or the ability to adapt positively and 'avoid' psychopathology despite exposure to known risk factors such as stress, early-life adversity, and family history 7 . Stratification of MDD will be based on several variables to address its underlying causal and clinical heterogeneity, including: age of onset of MDD; single episode or recurrent depression; obstetric trauma; and developmental factors such as childhood maltreatment, early socioeconomic adversity, and stressful life events. Our key initial predictions are that depression can be stratified on the basis of age of onset into early-onset forms that show a stronger phenotypic and genetic relationship with schizophrenia and other severe mental disorders, and later onsets that show stronger associations with cardiovascular disease and dementia.
STRADL was built on the Generation Scotland: Scottish Family Health Study resource (GS:SFHS) 8 , which undertook its first major baseline assessments between 2006 and 2011. GS:SFHS is a population-based study of genetic health and complex disease in a cohort of 24,096 individuals, who have been extensively phenotyped for MDD and related traits. This cohort provides an important opportunity to study gene-environment interactions, and remains one of the richest sources of data available, incorporating linkage of existing phenotypic and genomic data, detailed lifestyle and socioeconomic characterisation, extensive eHealth Record linkage 9 , and the inclusion of two longitudinal birth cohorts: the Walker birth cohort 10 , and Aberdeen Children of the 1950s (ACONF) 11 . Table 1 shows data linkages between the current study and existing datasets.
The first wave of STRADL included depression-focused follow-up assessment of GS:SFHS, which involved remote questionnaires that specifically assessed aspects of psychological resilience, coping style, and response to psychological distress; study protocol and cohort characteristics are described elsewhere 12 . Here, we describe the second wave of STRADL, a depression-focused deep phenotyping face-to-face assessment, using detailed clinical and cognitive tests, and neuroimaging. The results describe the cohort profile and baseline questionnaire and cognitive data, and we provide a summary of key demographic data from the current wave of STRADL, compared to STRADL remote follow-up and wider GS:SFHS baseline assessment. A summary of all data available and the proportion of valid and useable data is also provided.

Who is in the cohort?
We aim to study people both with and without depression, and therefore our recruitment targeted the whole GS:SFHS population, not merely people with a depression history. GS:SFHS included participants aged 35-65 years who were identified at random from collaborating medical practices across Scotland, with some family members further afield. Initially, only Glasgow and Tayside areas were involved, but the study was extended in 2010 to include Ayrshire, Arran and Northeast Scotland, with the age range also broadened (to 18-65 years). Participants were included if: they met the age criteria; had capacity to give informed consent; and could identify at least one first-degree relative who would also participate. Follow-up of participants was done through the NHS Scotland Community Health Index (CHI): 7% of the original cohort could not be matched; no participants withdrew; and ~1,200 had died. Those who

Amendments from Version 1
The revised manuscript includes amendments to the body of the paper in several areas, along with some changes to Table 1 and Figure 1. More specifically, the Introduction now describes in greater detail the focus of stratification of MDD and resilience, and we describe some specific initial hypotheses. The Introduction also now more clearly describes the overlap and distinction between the current wave of STRADL and the studies on which the current project was built, specifically the wider Generation Scotland population and the first wave of STRADL. Related to this point, we include an additional table (Table 1), which shows a list of data linkages between the current study and existing datasets. We also amended Figure 1 so that it now shows the recruitment and attrition for the first wave of STRADL, as well as the current wave of STRADL. The Methods now includes a clearer and more detailed overview of the wider Generation Scotland population, specifically the selection process and recruitment criteria. The Results includes further analysis of demographic (i.e., age) differences between the current study and existing datasets. In the Discussion we present a more detailed and clearer overview of the study limitations, particularly focusing on how the dataset for the current study may be influenced by cultural and societal norms and/or possible selection bias.
Any further responses from the reviewers can be found at the end of the article.

Participants in the Tayside and Grampian areas who had already taken part in GS:SFHS between 2006 and 2011, and who were eligible for re-contact, were sent a postal invitation by the University of Dundee Health Informatics Centre (HIC). Included in the invitation was a reply slip to indicate whether the participant would be willing to undergo face-to-face assessment and brain magnetic resonance imaging (MRI), described here. Those who replied positively were contacted by telephone by a researcher at the most local recruitment centre.
In Dundee (Tayside) recruitment targeted members of the Walker cohort, and in Aberdeen (Grampian) recruitment initially targeted members of ACONF, due to the rich early-life data already available for these cohorts.

In total, 5,649 potential participants were invited to take part in the study; 576 (10.2%) were members of ACONF; 1,103 (19.5%) were members of the Walker cohort; and 3,970 (70.3%) were members of the wider GS:SFHS population. Out of these potential participants, 646 (11.4%) people declined participation at first point of contact with HIC, and we received no reply from 3,358 (59.4%) people, even after sending up to three reminders. Initially, 1,645 (29.1%) people responded positively; however, a further 170 (3.0%) declined once they were contacted by our research team or withdrew before consenting. Recruitment ended in May 2019 and we consented and tested 1,188 (72.2%) of positive respondents across Aberdeen (n = 582) and Dundee (n = 606) sites. This meant that we tested 21% of the n = 5,649 who were initially invited to participate. Figure 1 shows the recruitment process and attrition.
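The recruitment percentages above can be recomputed from the reported counts. The following is a minimal sketch for checking the arithmetic only; the variable and function names are illustrative and are not part of the study's own tooling.

```python
# Counts as reported in the recruitment paragraph above.
INVITED = 5649
BY_COHORT = {"ACONF": 576, "Walker": 1103, "wider GS:SFHS": 3970}
FIRST_CONTACT = {"declined": 646, "no reply": 3358, "positive": 1645}
DECLINED_OR_WITHDREW_LATER = 170
TESTED_BY_SITE = {"Aberdeen": 582, "Dundee": 606}


def pct(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


def recruitment_summary():
    """Recompute the percentages quoted in the text from the raw counts."""
    # Both breakdowns should partition the invited pool exactly.
    assert sum(BY_COHORT.values()) == INVITED
    assert sum(FIRST_CONTACT.values()) == INVITED

    tested = sum(TESTED_BY_SITE.values())
    positive = FIRST_CONTACT["positive"]
    summary = {name: pct(n, INVITED) for name, n in BY_COHORT.items()}
    summary.update({name: pct(n, INVITED) for name, n in FIRST_CONTACT.items()})
    summary["declined or withdrew later"] = pct(DECLINED_OR_WITHDREW_LATER, INVITED)
    summary["tested"] = tested
    summary["tested / positive"] = pct(tested, positive)
    summary["tested / invited"] = pct(tested, INVITED)
    return summary


if __name__ == "__main__":
    for label, value in recruitment_summary().items():
        print(f"{label}: {value}")
```

Note that two denominators are in play: the 72.2% figure is relative to the 1,645 positive respondents, whereas the 21% figure is relative to all 5,649 people invited.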
What has been measured and when?
Table 2 shows all variables collected in STRADL face-to-face assessments, and those that were repetitions of the GS:SFHS baseline assessment and STRADL remote questionnaire follow-up. Before any new data were collected, participants signed a consent form permitting data and samples to be shared with other researchers through a secure data management system, and provided permission to be re-contacted in the future for additional research. Consent for linkage of participant data and samples to routine NHS records was previously obtained as part of the original GS:SFHS (05/S1401/89). All subsequent procedures were conducted following an independent, but linked, ethics application (14/SS/0039).
At each site participants attended three testing 'stations', which involved i) collection of clinical and questionnaire data, and biological samples; ii) cognitive assessment; and iii) neuroimaging, the order of which varied at random between participants. Data from the clinical station were collected without a set order; however, cognitive tests were administered in the same order for each participant, and the MRI sequences also remained the same, except for one fMRI task, which was counterbalanced (details described in section Brain magnetic resonance imaging). All measures were administered in accordance with rigorous standard operating procedures based on best practice.

Clinical assessment
Medical history was updated from previous GS:SFHS baseline assessment and any new diagnoses or medical episodes were recorded. General health and lifestyle data were also collected, as were the physical measurements of height, weight, two automated measures of blood pressure, and left- and right-hand grip strength (using a Patterson Medical Jamar hand dynamometer). We collected laboratory blood samples for genetic and additional genomic analyses, including the study of DNA methylation, transcription (RNA) and protein expression. Additionally, detailed questionnaire data were collected that will be used to test the structure of depressive symptoms and their association with each measure.

Laboratory samples.
A small sample of hair was collected from the posterior vertex region for longitudinal cumulative cortisol. Cortisol assays from hair samples provide a more stable marker of chronic cortisol exposure compared to cross-sectional blood or urine samples, which show considerable diurnal variation 13 . Other available assays include cortisone, testosterone, progesterone, and dehydroepiandrosterone. Venepuncture was carried out using a butterfly needle kit. Blood was extracted into the following vacutainer types (analyses in parentheses): 1) EDTA (full blood count; FBC); 2) clot activator gel for serum separation (C-reactive protein; CRP); 3) EDTA (DNA extraction); 4) 2 x Tempus RNA (RNA extraction); 5) EDTA (plasma biomarkers); 6) Lithium Heparin (plasma biomarkers). FBC and CRP samples were taken and sent to NHS laboratories for screening of clinically significant markers of anaemia and inflammation. When blood samples could not be collected, a saliva DNA collection kit (Oragene or GeneFiX) was used instead. These laboratory samples were temporarily stored at each collection site. RNA and blood DNA samples were stored at -80°C, and all others at -20°C, before being sent to the Edinburgh Clinical Research Facility at the University of Edinburgh for analysis and long-term storage. A summary of the completeness of these clinical data is shown in Table 3 and Table 4. FBC and CRP analyses are complete, and other blood and hair samples are in the process of being analysed.
Clinical interview and questionnaire data. All participants were assessed for a lifetime history of MDD. We used a research version of the Structured Clinical Interview for DSM-IV disorders (SCID) 14 to assess symptoms of mood disorder (including MDD and episodes of mania and hypomania), repeating the GS:SFHS baseline assessment. Diagnostic criteria were based on the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR). For participants who met full criteria for MDD, we assessed if any episode had a post-partum onset, and if criteria for melancholic or atypical MDD subtypes were met. The research version of the SCID was designed to allow assessors to systematically evaluate individuals against the key DSM-IV-TR criteria for unipolar depression and bipolar disorder. The SCID has good reliability, and it is considered the "gold standard" in determining clinical diagnoses and their accuracy 15 .

Table 3. Summary of demographic and background data, and proportion valid data (n = 1,188).

Variable: valid data (%)

Health and lifestyle
Alcohol history: 99.7
Smoking history: 99.7
Fatigue (preceding 3 months): 86.7
Loneliness (preceding 1 week): 99.9
Medical or mental health history: 99.9
List of medications: 100
Current infection(s): 96.6

Cardiovascular health
Stroke or transient ischaemic attack: 99.9
Myocardial infarction or angina: 99.9
Other heart disease: 99.9
Peripheral arterial disease: 99.9
Jaw claudication: 96.5
[label missing]: 99.9
Hypertension: 100
Hypercholesterolaemia: 99.9
Participants also completed a series of short questionnaires that assessed resilience, psychological well-being and mild psychiatric problems, and personality, some of which were repeated after first being completed for GS:SFHS (see Table 2). The Brief Resilience Scale (BRS) 16 is a six-item questionnaire used as a measure of psychological resilience, or the ability to 'bounce back' from stress. Participants were assessed for a lifetime history of cannabis use using the Drug Use questionnaire developed for UK Biobank 17 . Those who used cannabis more than once were asked follow-up questions about the frequency and functional impact of their use. Three mood questionnaires were administered: the Mood Disorder Questionnaire (MDQ) 18 , which is a sensitive screen for bipolar spectrum disorders; the Quick Inventory of Depressive Symptomatology (QIDS) 19 , which is a 16-item inventory designed to assess the severity of depressive symptoms; and the Hospital Anxiety and Depression Scale (HADS) 20 anxiety subscale (seven items), which was used to screen for symptoms of anxiety. In addition, the General Health Questionnaire (GHQ) 21 , a 28-item test, was used to assess general psychological well-being on four scales: somatic symptoms; anxiety; social dysfunction; and depression. We used a Likert scoring system for the GHQ to calculate scores for each scale separately, as well as a total score.
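As an illustration of the GHQ scoring described above, the sketch below computes the four scale scores and a total under two assumptions that the text does not state: that each item is coded 0 to 3 under Likert scoring, and that the 28 items are laid out as four consecutive blocks of seven items in the order listed. The function name `score_ghq28` is hypothetical; the administration manual remains the authority for actual scoring.

```python
from typing import Dict, List

# Assumption: four scales of seven consecutive items each, in this order.
SCALES = ["somatic symptoms", "anxiety", "social dysfunction", "depression"]
ITEMS_PER_SCALE = 7
N_ITEMS = len(SCALES) * ITEMS_PER_SCALE  # 28
VALID_RESPONSES = (0, 1, 2, 3)  # assumed Likert coding


def score_ghq28(responses: List[int]) -> Dict[str, int]:
    """Sum Likert-coded item responses per scale and overall."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(r not in VALID_RESPONSES for r in responses):
        raise ValueError("responses must be coded 0, 1, 2 or 3")

    scores = {}
    for index, scale in enumerate(SCALES):
        start = index * ITEMS_PER_SCALE
        scores[scale] = sum(responses[start:start + ITEMS_PER_SCALE])
    scores["total"] = sum(responses)
    return scores


# Hypothetical participant: constant response within each block of seven items.
example = [1] * 7 + [2] * 7 + [0] * 7 + [3] * 7
example_scores = score_ghq28(example)
```

Under these assumptions each scale score ranges from 0 to 21 and the total from 0 to 84.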
We administered two measures that assess core personality traits: we used the neuroticism and extraversion scales from the Eysenck Personality Questionnaire-Revised Short Form (EPQ-R) 22 , each of which has 12 items; and the International Personality Item Pool (IPIP), Five-Factor Personality Inventory 23 , which is a 50-item questionnaire that assesses the following core personality traits: extraversion; agreeableness; conscientiousness; emotional stability (the reverse of neuroticism); and imagination/intellect (similar to openness). Additionally, the General Causality Orientations Scale 24 , which consists of 12 vignettes describing scenarios to determine each person's orientation of causality 25 , was used to assess one's inclination towards being motivated autonomously, externally, or passively.
Finally, we assessed early-life adversity (childhood or adolescent abuse or neglect) using the Childhood Trauma Questionnaire (CTQ) 26 . This is a 28-item retrospective inventory that assesses three areas of abuse (emotional, physical, and sexual) and two areas of neglect (emotional and physical). The CTQ also includes a minimisation/denial scale that identifies potential underreporting of maltreatment. A mean score was calculated for each measure by totalling the item responses, with appropriate reverse scoring (e.g., GHQ, BRS). Higher scores represent higher levels of psychological distress, personality trait, or childhood trauma, except for the BRS where higher scores indicate greater resilience. Scoring and interpretation of data were based on the administration manual of each test.
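Reverse scoring, as mentioned above, can be illustrated with the six-item BRS. This is a minimal sketch that assumes a 1 to 5 response scale, that items 2, 4, and 6 are the negatively worded (reverse-scored) items, and that the scale score is the mean of the six adjusted items; these details come from common usage of the scale rather than from this paper, and `score_brs` is a hypothetical name.

```python
from typing import Sequence

# Assumptions (not stated in the text): 1-5 response scale, and
# items 2, 4 and 6 (1-based) are negatively worded and reverse-scored.
SCALE_MIN, SCALE_MAX = 1, 5
REVERSED_ITEMS = {2, 4, 6}
N_ITEMS = 6


def score_brs(responses: Sequence[int]) -> float:
    """Return the mean of six item responses after reverse scoring."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(not SCALE_MIN <= r <= SCALE_MAX for r in responses):
        raise ValueError("responses must lie between 1 and 5")

    adjusted = [
        # Reverse scoring maps 1<->5 and 2<->4, and leaves 3 unchanged.
        (SCALE_MIN + SCALE_MAX - r) if item in REVERSED_ITEMS else r
        for item, r in enumerate(responses, start=1)
    ]
    return sum(adjusted) / N_ITEMS


# Hypothetical participant who endorses every resilient response.
most_resilient = score_brs([5, 1, 5, 1, 5, 1])
```

After reverse scoring, a higher value consistently indicates greater resilience, which matches the interpretation given in the text.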

Cognitive testing
The cognitive tests that were applied will be used to assess the cognitive phenotype of depression, and whether genetic risk variants are related to impairment in specific cognitive domains. As with the questionnaire data, some cognitive tests were also repetitions of the GS:SFHS baseline assessment (Table 2). We included "cold" (emotion-independent) and "hot" (emotion-laden) cognitive tests, given growing evidence for distinct and interacting relationships between depression and measures of hot and cold cognition 27 .
The cold cognitive test battery included validated and widely used cognitive tests that measure crystallised- and fluid-type cognitive tasks. The Mill Hill Vocabulary test 28 was used as a measure of acquired verbal intelligence, and is an estimate of 'crystallised intelligence' and peak cognitive ability. The Controlled Oral Word Association task 29 was used as a measure of phonemic verbal fluency using three letters (C, F, and L). The Digit Symbol Coding subtest from the Wechsler Adult Intelligence Scale-III 30 was used to measure information processing speed. A United Kingdom version of the Logical Memory subtest from the Wechsler Memory Scale-III 31 was used to assess verbal memory and provided a measure of immediate and delayed verbal declarative recall. Total scores were created for each cognitive test by adding the number of correct responses; higher scores indicate better performance. The Matrix Reasoning test, a paper adaptation of the computerised version from the COGNITO psychometric examination 32 , was used to measure perceptual organisation and visuospatial logic. A summary of all mood, personality, and cognitive data and their completeness is shown in Table 4.
Three 'hot' cognitive measures were administered on a touchscreen laptop computer. The first task, the Bristol Emotion Recognition Test, consisted of 96 trials (16 of each emotion) that assessed recognition of six basic human facial emotions (happiness, anger, sadness, disgust, surprise, and fear), and biases in the attribution of emotion. The Affective Go/No-Go task comprised 120 trials that assessed behavioural inhibition using facial emotional stimuli (happy, sad, and neutral expressions). Finally, given evidence for impairments in reward processing in depression 33 , we also included a modified version of the Cambridge Gambling Task, which assesses decision-making, risk-taking behaviour, and reward processing (30 trials). These three tests are described in detail elsewhere 34,35 .

Brain magnetic resonance imaging
The neuroimaging protocol will allow analysis of potential risk factor relationships with brain structure and function, and test neurobiological mechanisms that are associated with depressive symptoms and resilience. In Aberdeen, participants were imaged on a 3T Philips Achieva TX-series MRI system (Philips Healthcare, Best, Netherlands) with a 32-channel phased-array head coil and a back-facing mirror (software version 5.1.7; gradients with maximum amplitude 80 mT/m and maximum slew rate 100 T/m/s). A projector and "Presentation" (Neurobehavioural Systems Inc, Berkeley, CA, USA) version 18.1 were used for the presentation of task-based fMRI. In Dundee, participants were scanned using a Siemens 3T Prisma-FIT (Siemens, Erlangen, Germany) with a 20-channel head and neck phased-array coil and a back-facing mirror (Syngo E11; gradients with maximum amplitude 80 mT/m and maximum slew rate 200 T/m/s). A magnetic resonance compatible LCD screen (NordicNeuroLab, Bergen, Norway) was used to display fMRI task stimuli using "Presentation" version 20.0.
Both centres used the same protocol including structural and functional sequences. The structural sequences collected were as follows: 3D T1-weighted fast gradient echo with magnetisation preparation; 3D T2-weighted fast spin echo; 3D Fluid Attenuation Inversion Recovery (FLAIR); Diffusion Tensor Imaging (DTI); and Susceptibility Weighted Imaging (SWI) or T2*-weighted gradient echo. The functional sequences comprised two task-based fMRI sequences and a resting state fMRI sequence. The sequence parameters, as well as the order of acquisitions, are presented in Table 5.
T1-weighted images of the brain were used to assess brain regional volumes, cortical thickness, gyrification index, voxel-based morphometry analysis, and certain lesions such as lacunes and cortical and larger subcortical infarcts, and will also serve as the basis for co-registration with other sequences. A 3D T2-weighted sequence was used to detect lacunes, perivascular spaces, and cortical and subcortical infarcts, and for other morphological measurements, such as hippocampal subfield extraction. A 3D FLAIR was used to detect white matter hyperintensities. SWI data, for the determination of brain microbleeds, basal ganglia mineralisation, and cortical superficial siderosis, were acquired using a 3D multi-echo gradient-echo sequence in Aberdeen and a single-echo protocol in Dundee. Phase and magnitude data were saved for the calculation of T2* relaxation. All vascular lesions listed above are defined in the Standards for Reporting Vascular changes on Neuroimaging 36 . All structural images were reviewed by a neuroradiologist for visual analysis of vascular changes and incidental findings. Whole-brain DTI data were acquired to allow assessment of microstructural integrity of white matter including fibre direction and structural connectivity. This protocol reflects established neuroimaging approaches as used in several large cohort studies of ageing and of cerebrovascular diseases 37,38 .
There were two task-based fMRI sequences: an implicit emotional processing task (fearful versus neutral faces), and a modified version of an instrumental reward task with an additional choice value component. Both of these, as well as resting state fMRI, were acquired at 30 degrees away from the anterior commissure-posterior commissure (AC-PC) line, towards the coronal plane. The fearful faces task, which used the NimStim 39 facial stimuli set, assessed emotional-limbic circuitry through a block fMRI design, and measured the brain's neural responses to the viewing of fearful faces in the absence of learning. In order to avoid a gender bias of the images, two versions of the task were used, counterbalanced across participants. The Reward task measured reward-related brain activity using an event-related fMRI design in a reinforcement learning context. The resting state fMRI was used to investigate functional connectivity and brain networks.

Results
What are the key findings?
Here, we report findings for the complete data set including 1,188 participants. Table 6 shows some demographic similarities and differences between the current STRADL cohort and existing samples. More specifically, the median age of the STRADL sample was 62 years, which is older compared to both STRADL remote follow-up and wider GS:SFHS populations. Analysis of group differences for age showed that participants who are part of ACONF were generally older (M = 62.32; SD = 1.55) compared to the Walker cohort and wider GS:SFHS participants (ts ≥ 6.28; ps < .001). However, age did not differ (t = 1.83; p = .068) between participants in the Walker cohort (M = 59.48; SD = 6.76) and those in GS:SFHS (M = 58.28; SD = 12.63). Gender distributions were comparable to existing data, with 59% being female, and our sample had higher levels of education (university-level education = 40%), compared to existing data. Furthermore, based on SCID interviews, a higher proportion of STRADL participants were diagnosed with a lifetime history of mood disorder (30.7%), compared to GS:SFHS (13.2%). Out of the total sample, 28.8% received a diagnosis of MDD, and a further 1.9% of bipolar disorder. Recurrent mood disorder was present in 72.7%, and melancholic features (56%) were dominant in the group. Of those with a diagnosis, 71% were female. Overall, however, the cohort was of good psychological health at the time of assessment, as indicated by mean scores on the GHQ, HADS, and QIDS (Table 7), which fell below the thresholds for the presence of  Table 7.

Discussion
Strengths and limitations
STRADL data have been robustly collected on a wide range of key phenotypes that allow epidemiological study of depression and resilience in a population-based cohort. The MRI and detailed depression phenotyping protocol described here was cross-sectional; however, STRADL can provide longitudinal measures of cognition, personality, and psychological health. This is because many of the cognitive tests applied in STRADL are deliberately the same as those used at the GS:SFHS baseline assessment, as well as some personality and mood measures, as shown in Table 2. The availability of repeated cognitive and questionnaire testing allows us to assess potential determinants of change in cognition and psychological health. Similarly, routine NHS data, and ACONF and Walker cohorts' early-life variables, are linked to STRADL data, providing further opportunities to study longitudinal predictors of depression and resilience across the full life-course. However, a limitation of these data is that, because the present study draws on cohorts of different ages, including people born from the 1950s onwards, early-life variables are likely to have been influenced by the cultural and societal norms of that time. For example, people born in the 1950s and 1960s may not have considered housing circumstances such as crowdedness as an 'adverse experience', contrary to more contemporary perspectives. These possible confounding variables will be considered in downstream analyses, but it may be difficult to completely mitigate their effects.
A further limitation of this study is possible selection bias, as we tested only 21% of the total pool of eligible and invited participants (n = 5,649). As with many longitudinal population studies, participants in this cohort were more likely to be of good health, and to come from more advantaged backgrounds, such as higher education and better socioeconomic circumstances, than the population in general. These findings are similar to those of the GS:SFHS and STRADL remote follow-up cohort profiles 8,12 , and UK Biobank 17 . Further, because our cohort was of good psychological health at the time of testing, it seems possible that we did not capture as many people with poorer psychological health as might be available in the wider cohort. Notably, however, participants from a range of health and demographic backgrounds were represented in this group.

Ethical approval and consent
Ethical approval for the study was obtained from the Scotland A Research Ethics Committee (REC reference number 14/SS/0039) and the local Research and Development offices. All participants provided written informed consent prior to the collection of any data or samples.

Data availability
All data underlying the results are available as part of the article and no additional source data are required.
A phenotype data dictionary is available and open access genome-wide association study summary statistics can be downloaded. Non-identifiable information from the GS:SFHS cohort is available to researchers in the UK and to international collaborators through application to the Generation Scotland Access Committee (access@generationscotland.org) and through the Edinburgh Data Vault (https://doi.org/10.7488/8f68f1ae-0329-4b73-b189-c7288ea844d7). Generation Scotland operates a managed data access process including an online application form, and proposals are reviewed by the Generation Scotland Access Committee. The data and samples collected by the STRADL study have been incorporated in the main Generation Scotland dataset and governance process. Summary information to help researchers assess the feasibility and statistical power of a proposed project is available on request by contacting resources@generationscotland.org.
using detailed clinical, cognitive, and brain imaging assessments. Recruitment has been completed and we consented and tested 1,188 participants.
The manuscript describes the methods of data collection in STRADL, which included: socio-economic and lifestyle variables; physical measures; questionnaire data that assess resilience, early-life adversity, personality, psychological health, and lifetime history of mood disorder; laboratory samples; cognitive tests; and brain magnetic resonance imaging. Some of the questionnaire and cognitive data were first assessed at the GS:SFHS baseline assessment between 2006 and 2011, thus providing longitudinal measures of depression and resilience. The presented results are purely a description of the sample. Overall, the study has a lot of potential in providing links to the biomarkers and predictors of depression and resilience, including personality traits, early trauma exposure, and genetic vulnerability. However, it would help to provide clarification about the study's initial hypotheses and the analytical plan.
The title of the study implies that stratification can be made between depression and resilience longitudinally. Nothing in the description explains what will be done about subtyping MDD (only unipolar and bipolar depression diagnoses are mentioned), or what it would take to stratify by resilience and depression.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? No
My main comments are related to trying to improve the value to the wider scientific community and are in the context of someone who does not work predominantly in mental health, but does work on cohort studies and recognises that, for the studies we work on, we often forget how much is 'inside our heads' and may assume too much of readers who are nothing like as familiar as 'we are' (in this case the authors of this paper, who know GS-SFHS and STRADL in considerable detail). There were places where I felt a bit lost and could have done with more detail that publication on WOR supports.
Please note I felt obliged to tick 'not approved' as that was the only option that states I am suggesting revisions that I feel have to be undertaken. I think this is an important paper that should be indexed and widely read. I do think that some of my key concerns require that changes are made to the paper.
Specifically:
1. Some brief description of the selection criteria for GS-SFHS and its initial aims would be really useful. Did it cover the whole of Scotland? Were the inclusion criteria family or household based? What were the initial inclusion and exclusion criteria? How many in that cohort as a whole have been lost to follow-up? A paragraph providing this background context (rather than relying on the reader knowing GS-SFHS or going and reading that cohort profile) would be really useful.
2. Similarly, some detail on STRADL, i.e. the first (remote) assessment, is I think essential. What were the eligibility criteria at the start of STRADL? What was the response to that first remote STRADL assessment? What data were collected in that remote STRADL assessment? And were the aims of that remote STRADL assessment the same as those for this more detailed face-to-face follow-up? This to me is essential both to understand the value of this recent face-to-face assessment and to understand some of the demographic differences presented and the potential for selection bias in STRADL as a whole and also in this face-to-face assessment.
3. Related to point 2 above, I think that Figure 1 needs to be amended so that STRADL remote is also included; this way the relation between that and the current face-to-face assessment is clear.
4. The overall response of the 5649 invited to take part in the face-to-face assessment presented here is 21% (N = 1188). That fact is never mentioned and it really has to be: it should be stated in the abstract, clearly shown in Figure 1, stated in the text related to that figure, and discussed in more detail in the limitations part of the discussion (again, it is not mentioned as a limitation in that section).
5. It is stated that STRADL recruitment and data collection ended in May 2019 but I could not find any statement showing when it started. As this is a study about longitudinal analyses, that to me seems essential to report. As noted below, I found it would also be useful to include in that, for the Walker and ACONF cohorts, the baseline collection of data in those (e.g. there could be an arrow back to 1956 for ACONF) and also the date ranges of the different linkages (I think for some hospital records these may only go back to the 90s and may differ between 'psychiatric' diagnoses and 'physical' diagnoses).
6. I found the content, including the column headings, of Table 1 confusing, in particular as the content often seemed to be contradicted by the text that referred to that table. Specifically:
1. The chronology and repeats seemed erroneous. How can the GS-SFHS be a repeat when that is the baseline measure, yet the heading calls it 'repeat'? Similarly, the STRADL remote is 'repeat' but that was also before the current face-to-face. As above, no specific date is given for STRADL face-to-face; it simply says 'current', which isn't quite true if recruitment and data collection finished in May 2019. What is needed is a start and end date for each of these column headings, and I would order them chronologically and take out 'repeat' from any of the column headings, as it is clear if something has been measured more than once from the entries (or at least it should be).
2. In the text it states that cognitive function measures assessed at STRADL face-to-face are the same as those done at GS-SFHS in several places (including in the discussion, as a strength, that these were "deliberately the same"), but in Table 1 under cognitive measurements there is no overlap in the ticks for measures from GS-SFHS and STRADL face-to-face. If some were deliberately repeated there should be a tick for both GS-SFHS and STRADL face-to-face.
3. The text also implies that DNA was taken again at the face-to-face assessment but there is only a tick for that at GS-SFHS, and the same applies for some of the plasma biomarkers.
7. Could the authors clarify when the demographic data presented in Table 4 were collected? I assume that they were all taken from the GS-SFHS baseline assessment, which would be the appropriate comparison (i.e. suggesting that those who agreed to participate in the initial STRADL study (the remote one) were older at that baseline than those who did not respond or take part in that and those who took part in STRADL face-to-face). Wherever they are taken from, it is important that this is stated in the text describing the table and in the title or footnote to the table; at the moment I could not see it anywhere.
8. It would be useful in Table 4 and/or in the text to have some indication of demographic differences in the two birth cohorts (Walker and ACONF) at each of the assessments. This seems important to me as the older age in STRADL face-to-face (and possibly STRADL remote, but it is not possible to tell without further information on that) could be due to oversampling from those birth cohorts. The Walker cohort were born 1952 to 1966 and the ACONF in 1956. If that makes them generally older than the rest of GS-SFHS, to me that has different implications for selection than if they are similarly aged to the rest of GS-SFHS.
9. For a cohort profile, which in my book is about telling the international academic community what is available, what it might be used for, and where there are limitations and caution is required, I was very surprised at the lack of any discussion of limitations. To me the following all need detailed discussion:
1. Sample size. The aim of STRADL, and in particular of the face-to-face follow-up described here, is said to be to "examine the interaction between genetic and environmental factors that increase risk and occurrence of different MDD subtypes, and assess common and distinct mechanisms and clinical trajectories of MDD phenotypes. Additionally, STRADL aims to assess individual resilience, or the ability to adapt positively and 'avoid' psychopathology despite exposure to known risk factors such as stress, early-life adversity, and family history". 'Interactions' and 'subtypes' (i.e. with the two together, subgroups within subgroups) require very large sample sizes to give meaningful results and it is not clear to me how this will be possible with just over 1000 participants. Some examples of clinically meaningful results for stratified medicine / precision prevention (the focus of the introduction and justification for the STRADL sub study) would be really valuable.
2. Selection bias. As noted previously, the fact that only 21% of those eligible and invited took part in the STRADL face-to-face study that is the focus of this profile seems to be ignored. There have been a very large number of publications recently that have described issues related to this, including when exploring genetic associations (and particularly in relation to gene-environment interactions), and a clear discussion of that in relation to this study is necessary.
3. Availability of data for the aims. Related to both points above, it is unclear from what is written how much missing data will impact on the research that this study particularly aims to address. For example, of the 1188 recruited, 544 (~47%) will have prospectively collected birth / early life data (i.e. those in the Walker and ACONF birth cohorts) but the remaining 53% will not. And whilst Tables 2 & 3 provide some reassurance, they do not cover the early life data, and even with small amounts of missing data (as low as 2-3%) for single variables, once one tries to combine these the numbers with complete data for all models/analyses can be considerably smaller than 100%. Related to this, could the authors clarify where the data in Tables 2 & 3 come from? Are they from one particular assessment (of the three that the 1188 are likely to have participated in) or from any, e.g. could weight and height be available on 99.9% because they are coded 'available' if participants had at least one measure from any of the three assessments? Again, this information is essential to understanding the potential for repeat measures and prospective analyses that the study aims to address.
4. Relevance of data for contemporary populations. I appreciate that all of our studies are somewhat restricted in terms of their target populations, but as the participants in this study (based on the ages of the different groups) were predominantly born in the 1950s and 60s, some discussion is needed of how the data collected for early life exposures will have been influenced by norms and cultures at the time, including how people now in their 50s and 60s might remember or be willing to retrospectively report what would nowadays be considered 'adversity' but may not have been at the time. How will possible residual confounding be dealt with in analyses of the early life exposures when potential confounders may not have been collected (for example, smoking during pregnancy was not recorded in ACONF)?
As well as discussing these limitations, I think one actual analysis for this paper relevant to the aims of STRADL would be valuable to give the authors and readers a sense of what is truly possible. For example, one could try to explore whether there is an interaction between childhood adversity and adult alcohol consumption in relation to one of the continuously measured face-to-face MRI or cognitive function outcomes. That would provide some evidence of power in relation to the precision of any effect estimates, give an idea of missing data when trying to ensure all potential confounders are accounted for, and would mean having to discuss any results in the context of possible selection bias.
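
The response-rate and missing-data points above rest on simple arithmetic, which can be sketched as follows. This is only an illustration: the function names are hypothetical, the 3% per-variable missingness is taken from the upper end of the "2-3%" range mentioned above, the numbers of variables are arbitrary, and the calculation assumes missingness is independent across variables, which is unlikely to hold exactly in real data.

```python
def response_rate(participants: int, invited: int) -> float:
    """Proportion of invited people who took part."""
    if invited <= 0:
        raise ValueError("invited must be a positive integer")
    return participants / invited


def complete_case_fraction(missing_per_variable: float, n_variables: int) -> float:
    """Expected fraction of participants with complete data on every variable.

    Assumption (hypothetical): each variable is missing independently with
    the same probability, so the fraction is (1 - p) ** k.
    """
    if not 0.0 <= missing_per_variable <= 1.0:
        raise ValueError("missing_per_variable must be between 0 and 1")
    if n_variables < 0:
        raise ValueError("n_variables must be non-negative")
    return (1.0 - missing_per_variable) ** n_variables


if __name__ == "__main__":
    # Figures quoted in the report: 1188 participants out of 5649 invited.
    rate = response_rate(1188, 5649)
    print(f"Face-to-face response: {rate:.1%}")

    # Illustrative model sizes only; not taken from the cohort profile.
    for k in (1, 5, 10, 20):
        fraction = complete_case_fraction(0.03, k)
        print(f"{k:>2} variables at 3% missing each: "
              f"{fraction:.1%} complete, about {round(fraction * 1188)} of 1188")
```

Under these assumptions, a model combining ten such variables would retain only roughly three quarters of the sample, which is why the source of the availability figures in Tables 2 & 3 matters.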

Minor comments
○ The following sentences in the abstract, "Some of the questionnaire and cognitive data were first assessed at the GS:SFHS baseline assessment between 2006-2011, thus providing longitudinal measures of depression and resilience. Similarly, routine NHS data and early-life variables are linked to STRADL data, further providing opportunities for longitudinal analysis" (my highlight), read a little oddly to me. As the date for the STRADL study described here is not presented, it is not clear how or why these provide longitudinal measures. Also, it feels like depression and resilience are being used as interchangeable with cognitive function to some extent in the first of these two sentences.
○ In the introduction: "Thus, it is important for treatment to shift from the current "trial and error" approach, towards personalised and preventative forms of treatment for individuals with markedly different disease mechanisms." As we cannot make inferences about individuals or personalise treatments (i.e. to an individual) from cohort or RCT studies, I would change this to the terms that are more widely accepted: 'stratified medicine' and 'precision prevention'.
○ It is worth clarifying (for a wide international audience) that Aberdeen is in Grampian (as you do for Dundee and Tayside).
○ Under the summary of biological samples, you mention repeat genetic analyses. It might be worth changing this to repeat DNA methylation, as genetic variants will not change over time.
Also, I would clarify in that section that when you refer to 'epigenomic status' you mean DNA methylation and transcription (you refer to RNA later). It is also worth clarifying that DNA methylation will be in white blood cells only (I think that is correct; if not, then state which other tissues). From the text I understood that DNA would be extracted from samples again at the STRADL face-to-face assessment, but there is no tick in Table 1 to confirm that.

○ In Table 1 it appears as if DNA from blood and saliva is available on all participants, whereas that is not the case. I would suggest that where you first mention extracting DNA from saliva when blood cells are not available, you give the N and % in parentheses where that had to be done, and in Table 1 just have one line, 'Extracted DNA', with a footnote stating that for 97.5% that was from white blood cells with the remaining 2.5% being from saliva.

Is the work clearly and accurately presented and does it cite the current literature? Partly

Are sufficient details of methods and analysis provided to allow replication by others? Partly
If applicable, is the statistical analysis and its interpretation appropriate? Not applicable
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Partly