Extended Cohort for E-health, Environment and DNA (EXCEED) COVID-19 focus [version 1; peer review: awaiting peer review]

Background: New data collection in established longitudinal population studies provides an opportunity for studying the risk factors and sequelae of the novel coronavirus disease 2019 (COVID-19), plus the indirect impacts of the COVID-19 pandemic on wellbeing. The Extended Cohort for E-health, Environment and DNA (EXCEED) cohort is a population-based cohort (N>11,000), recruited from 2013 in Leicester, Leicestershire and Rutland. EXCEED includes consent for electronic healthcare record (EHR) linkage, spirometry, genomic data, and questionnaire data.
Methods: Between May 2020 and July 2021, a new questionnaire was deployed in EXCEED, which captured COVID-19 symptoms, general physical and mental health, plus socioeconomic and environmental factors during the pandemic. An online system was developed to invite new participants to join EXCEED, with informed consent being provided online. New and existing participants then completed the COVID-19 questionnaire online. A subset of the new questionnaire respondents were invited to participate in COVID-19 serology substudies, using home antibody testing kits.
Wellcome Open Research 2021, 6:349. Last updated: 11 JAN 2022


Introduction
In the United Kingdom (UK), the coronavirus disease 2019 (COVID-19) pandemic has highlighted stark pre-existing inequities: infection rates and mortality from COVID-19 are highest in those living in overcrowded housing and areas of deprivation, those working in high-risk occupations (such as key-worker roles), and in those who have comorbidities 1 . Individuals from minority ethnic groups are overrepresented amongst these risk groups, which reflects structural discrimination 1 . There have been far-reaching impacts on people's livelihoods, with nearly 1 million people projected to be unemployed by the end of 2021 1 .
A subset of those infected with COVID-19 experience persistent symptoms (or develop new clinical sequelae) lasting for at least 12 weeks after acute infection, not explained by an alternative diagnosis ('post-COVID syndrome/long-COVID') 2 . Follow-up of people who have had COVID-19, identified from population samples, may help to understand: i) who is most at risk of post-COVID syndrome/long-COVID; ii) post-COVID syndrome/long-COVID symptomatology; and iii) what might be the most effective management strategies.
Over 2.2 million people in the UK are members of a cohort study. Enriching pre-existing longitudinal study infrastructure in the UK by combining surveys on the impact of COVID-19 with electronic record linkage and new recruitment initiatives may therefore be a timely and effective way of collecting relevant data, whilst making use of existing administrative, logistic and governance structures.
We undertook a new wave of recruitment using electronic consent in the Extended Cohort for E-health, Environment and DNA (EXCEED) 3 , a longitudinal population-based cohort in Leicester, Leicestershire, and Rutland. The availability of linked healthcare record data, baseline spirometry, detailed information on occupation, smoking and lifestyle, and DNA alongside broad informed consent make EXCEED an ideal cohort to study the risk factors for COVID-19 infection and severity, as well as the broader impact of the pandemic on physical, mental and economic wellbeing. This Data Note describes: i) data collected in the EXCEED COVID-19 questionnaire, deployed between 28th May 2020 and 6th July 2021, and ii) data collected in a subset of COVID-19 questionnaire respondents who agreed to be re-contacted to determine COVID-19 antibody status, collected between 24th March 2021 and 22nd June 2021. A descriptive summary of respondents is provided, along with monthly incidence proportions of reported or suspected COVID-19, association of COVID-19 with multimorbidity derived from linked electronic health records (EHRs), and summary data of the antibody substudies.

Recruitment
Details of the EXCEED study have been published in the cohort profile paper 3 . The first wave of EXCEED recruitment (November 2013-December 2018) recruited 10,156 participants aged 30-69 years living in Leicester, Leicestershire, and Rutland. Approximately 95% were recruited via general practice, with other participants recruited from smoking cessation clinics (4.4%), or community-based recruitment, focused on Leicester's South Asian communities (1.2%). All participants undertook a baseline questionnaire on their lifestyle and health. Around 50% have anthropometry measures and spirometry recorded by trained research professionals. Participants were invited to provide a DNA sample, and 5,214 were genotyped on the Affymetrix UK Biobank array by the Wellcome Trust Centre for Human Genetics (WTCHG), Oxford.
Informed consent for the first wave of recruitment into the EXCEED study was obtained as described in the cohort profile paper 3 .
In 2020, eligibility for EXCEED was extended to any adult aged 18 or over, living in the East or West Midlands of the UK. After assessing eligibility, participants were directed to the participant information sheet (also publicly available on the EXCEED website), and encouraged to contact the study team with any questions before agreeing to participate in the study.
To improve access to the study, the participant information sheet was also translated, and then back-translated for verification, by the Centre for Ethnic Health Research at the University of Leicester, and is currently available in Bengali, Gujarati, Punjabi, and Urdu.
Since 2020, recruitment into the study has been via an online, electronic consent process. This was developed in Python, using the Django framework, and provides seamless access to REDCap surveys through a participant profile. The system provides an easy and secure way of recording participants' consent and tailoring customised questionnaires for different substudies. Eligible participants who wished to participate were asked if they would provide their informed consent via this system. All participants were invited to complete the EXCEED baseline questionnaire after recruitment (this questionnaire was described in the cohort profile paper 3 ).
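As a rough illustration of how a recorded consent can determine which substudy questionnaires a participant is offered, the sketch below uses plain Python rather than Django. The class, field, substudy, and questionnaire names are all hypothetical, and nothing here reflects the actual EXCEED system or the REDCap interface.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentRecord:
    """Hypothetical record of one participant's electronic consent."""
    participant_id: str
    # Substudy name -> UTC timestamp at which consent was recorded.
    consents: dict = field(default_factory=dict)

    def give_consent(self, substudy: str) -> None:
        self.consents[substudy] = datetime.now(timezone.utc)

    def has_consented(self, substudy: str) -> bool:
        return substudy in self.consents


# Hypothetical mapping: questionnaire name -> consent it requires.
QUESTIONNAIRE_REQUIRES = {
    "baseline": "main_study",
    "covid19": "main_study",
    "fortress_result": "antibody_substudy",
}


def available_questionnaires(record: ConsentRecord) -> list:
    """Return, sorted by name, the questionnaires whose required consent exists."""
    return sorted(
        name
        for name, needed in QUESTIONNAIRE_REQUIRES.items()
        if record.has_consented(needed)
    )
```

In this sketch a participant with no recorded consent is offered nothing, and each additional consent unlocks only the questionnaires mapped to it.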

EXCEED COVID-19 questionnaire (May 2020-July 2021)
Between 28th May 2020 and 6th July 2021, existing and newly-recruited EXCEED participants were invited to complete a questionnaire on the physical, psychological, environmental, social, financial and economic impacts of the COVID-19 pandemic. This questionnaire was developed by the Wellcome COVID-19 Questionnaire Steering Group, and included EXCEED Study investigators. The main body of the questionnaire consisted of six 'core' sections, all of which were included in EXCEED, alongside selected items from the bank of 'recommended' questions (Box 1). A REDCap implementation of the questionnaire was developed, based on that provided by the Wellcome COVID-19 Questionnaire secretariat (see Data availability). Additional data collected in EXCEED included: i) expressions of interest for participation in the antibody substudies, described below, and ii) occupation, automatically mapped via an auto-suggest field to Office for National Statistics (ONS) Standard Occupational Classification (SOC) 2020 codes. The REDCap implementation of the EXCEED questionnaire was tested for acceptability and ease of use by the EXCEED Patient and Public Involvement Group.

EXCEED COVID-19 antibody substudies
There were two distinct COVID-19 home antibody testing substudies in EXCEED, which utilised test kits from Fortress and Roche (see below for details of test kits). Initially, 2000 Fortress kits were acquired, and participants who indicated in the COVID-19 questionnaire (completed between May 2020 and July 2021) that they would be interested in participating in antibody substudies were asked to review a Fortress Participant Information Sheet and instructional video. After reviewing these materials, participants were required to complete a home antibody testing consent questionnaire, in which they provided their consent to participate in the Fortress substudy, and in testing via other kits, should they become available in the future. Subsequently, a further test kit (Roche) was made available to EXCEED, and as a result all consenting participants were provided with an additional Roche-focussed Participant Information Sheet and corresponding instructional video.
A total of 2,849 participants agreed to participate in EXCEED's COVID-19 antibody substudies, and the first 2,000 consenting participants were sent a Fortress home-testing antibody kit (COVID-19 IgG/IgM Rapid Test Cassette - COVID010) in March 2021. The Fortress test kit is a rapid method for detection of COVID-19 antibodies in 15 minutes, and detects IgM (which will decrease over time and tend to be undetectable by the assay after 6-7 weeks of infection) and IgG antibodies (which persist for a longer period of time) 6 . Seropositivity is indicated by a red line in the corresponding zone of the test kit. Participants were asked to self-report their test results with the following options: (1) negative (red line next to C, no line next to G or M); (2) IgM positive, IgG negative (red line next to C and M, no line next to G); (3) IgG positive, IgM negative (red line next to C and G, no line next to M); (4) IgG and IgM positive (red line next to C, G, and M); (5) invalid (line next to C is in blue); or (6) can't tell or not sure. Participants were also asked to upload a photo of their result.
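The six self-report options can be read as a simple decision rule on the lines visible on the cassette. The following is a minimal sketch of that rule; the function and argument names are hypothetical, and the handling of a missing or unrecognised control line (mapped here to option 6) is an assumption, since the options listed above only describe a blue control line as invalid.

```python
def classify_fortress(control_colour: str, g_line: bool, m_line: bool) -> int:
    """Return the self-report option (1-6) matching the observed lines.

    control_colour: colour of the line next to C ("red" or "blue").
    g_line / m_line: True if a red line is visible next to G / M.
    """
    if control_colour == "blue":
        return 5  # invalid: line next to C is in blue
    if control_colour != "red":
        return 6  # assumption: any other control-line state is "can't tell"
    if g_line and m_line:
        return 4  # IgG and IgM positive
    if g_line:
        return 3  # IgG positive, IgM negative
    if m_line:
        return 2  # IgM positive, IgG negative
    return 1      # negative: control line only
```

A rule of this kind could also be used to cross-check a self-reported option against a reading of the uploaded photo, although no such check is described in this Data Note.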
As part of the National Core Studies Serology substudy, EXCEED was given access to additional Roche test kits, which were sent to all 2,849 consented EXCEED participants from April to May 2021. The Roche kit tested for total antibodies against the nucleocapsid protein, which measures response to natural COVID-19 infection, and the spike protein, which measures response to both natural infection and vaccination 7 . The participants completed the Roche test at home, then posted the sample to Thriva, and the test results were returned to the research team prior to being shared with participants via their personal EXCEED study profile. Participant queries about their antibody test results were addressed by the research team, and responses to common queries were made available on the EXCEED website.
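Because nucleocapsid antibodies reflect natural infection while spike antibodies reflect either infection or vaccination, the two results can be combined into a coarse interpretation. The sketch below is a simplified reading under that assumption only; the function name and category labels are hypothetical, and it is not a diagnostic rule.

```python
def interpret_roche(nucleocapsid_positive: bool, spike_positive: bool) -> str:
    """Coarse interpretation of combined nucleocapsid and spike results."""
    if nucleocapsid_positive:
        # Nucleocapsid antibodies are not induced by vaccination in this
        # reading, so they point to natural infection; vaccination status
        # cannot be inferred from the result alone.
        return "evidence of natural infection"
    if spike_positive:
        # Spike antibodies without nucleocapsid antibodies.
        return "response likely induced by vaccination"
    return "no antibodies detected"
```

Usage: `interpret_roche(False, True)` returns the vaccination-related label, which corresponds to the spike-only pattern.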
In addition to the testing kits, participants who were sent a Fortress kit were sent an additional brief questionnaire that collected data on COVID-19 symptoms, vaccination status, and previous COVID-19 testing results. COVID-19 infection was defined as either: i) self-reported only (a response of "Yes, own suspicions", "Yes, doctor's suspicion", or "Yes, diagnosed by positive test" to the question, "Do you think that you have or have had COVID-19?"), or ii) self-reported or symptom-predicted using the symptom prediction equation, which is a superset of i).
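The two case definitions can be expressed as a pair of small predicates. This is a sketch only: the function names are hypothetical, and the symptom prediction itself is not reproduced here, so it is passed in as an already-computed boolean.

```python
# Responses counted as self-reported COVID-19, as listed in definition i).
SELF_REPORT_POSITIVE = {
    "Yes, own suspicions",
    "Yes, doctor's suspicion",
    "Yes, diagnosed by positive test",
}


def is_case_self_reported(answer: str) -> bool:
    """Definition i): self-reported COVID-19 only."""
    return answer in SELF_REPORT_POSITIVE


def is_case_self_reported_or_predicted(answer: str, symptom_predicted: bool) -> bool:
    """Definition ii): self-reported OR symptom-predicted.

    symptom_predicted is assumed to be the output of the symptom
    prediction equation, computed elsewhere.
    """
    return is_case_self_reported(answer) or symptom_predicted
```

Since definition ii) is a logical OR that includes definition i), every participant who is a case under i) is also a case under ii), which is what makes ii) a superset.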

Statistical analysis
We examined the association between COVID-19 and the number of chronic diseases identified from a participant's primary care EHR (last date of linkage, 10/11/2018). In total, 16 chronic diseases were analysed, including asthma, cancer, multiple cardiovascular outcomes, chronic kidney disease, chronic obstructive pulmonary disease, diabetes, epilepsy, mental health conditions, osteoporosis, and rheumatoid arthritis (see 3 for details). The number of comorbidities was categorised as 0, 1, 2, or 3 or more. Associations with COVID-19 for each category were calculated using logistic regression and adjusted for age, sex, and area deprivation index. The adjusted association between COVID-19 and number of diseases was also examined using logistic regression, with the number of diseases entered as a continuous variable. R 4.1 was used for the statistical analysis.
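To make the categorisation concrete, the sketch below groups a disease count into the four categories and computes a crude (unadjusted) odds ratio for one category against a reference category. It is not the authors' analysis, which was run in R with adjustment for age, sex and deprivation; the function names are hypothetical, the choice of 0 diseases as the reference category is an assumption, and the counts in the usage note are invented purely for illustration.

```python
def comorbidity_category(n_diseases: int) -> str:
    """Group a count of chronic diseases into 0, 1, 2, or 3+."""
    if n_diseases < 0:
        raise ValueError("number of diseases cannot be negative")
    return str(n_diseases) if n_diseases < 3 else "3+"


def crude_odds_ratio(cases_cat: int, noncases_cat: int,
                     cases_ref: int, noncases_ref: int) -> float:
    """Unadjusted odds ratio of COVID-19 for one category vs the reference.

    For a single categorical predictor this equals exp(coefficient) from an
    unadjusted logistic regression; it does not include any adjustment.
    """
    odds_cat = cases_cat / noncases_cat
    odds_ref = cases_ref / noncases_ref
    return odds_cat / odds_ref
```

With invented counts of 20 cases and 80 non-cases in one category, and 10 cases and 90 non-cases in the reference category, `crude_odds_ratio(20, 80, 10, 90)` gives 2.25.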

Results
Out of the 10,102 EXCEED participants recruited before 28 May 2020, 9,227 consented to share their data through the UK Longitudinal Linkage Collaboration (UK LLC), and the current data analysis is based on these individuals. Of these, 2,943 (31.9%) completed the EXCEED COVID-19 questionnaire between May 2020 and July 2021, and linked primary care electronic health records (EHRs) were available for 2,786 out of 2,943 participants (94.7%). Demographic differences between the respondents and non-respondents are shown in Table 1. Figure 1 shows the incidence of the predicted COVID-19 positive cases across time.
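The response and linkage percentages follow directly from the counts reported in this section. A short check, using a hypothetical helper name, is sketched here:

```python
def proportion_pct(numerator: int, denominator: int, ndigits: int = 1) -> float:
    """Return numerator/denominator as a percentage rounded to ndigits."""
    return round(100 * numerator / denominator, ndigits)


# Questionnaire respondents among participants who consented to data sharing.
response_pct = proportion_pct(2943, 9227)   # 31.9

# Respondents with linked primary care EHRs available.
linkage_pct = proportion_pct(2786, 2943)    # 94.7
```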
Table 1 note: * Due to small sample size, these groups were combined for confidentiality. SD = standard deviation.
During the period between March 2020 and May 2021, the UK and area (Leicester, Leicestershire, and Rutland) incidence proportions were 6.7% and 7.4%, respectively. Table 2 shows the cross-tabulation of COVID-19 status by self-report and symptom prediction. A strong association was observed between the two measurements, with participants reporting COVID-19 from a doctor's suspicions being the most likely to be predicted as having COVID-19 by the symptom prediction algorithm. Table 3 shows the association between the number of EHR-linked chronic diseases (until 10/11/2018) and COVID-19 (self-reported between May 2020 and July 2021). There was no clear evidence of a linear association between COVID-19 outcomes and increasing numbers of comorbidities, in either the crude model, or the model adjusted for age, sex and area-level deprivation.
The descriptive results of the antibody testing with the Fortress (N=1,875) and Roche (N=2,144) kits between March and July 2021 are summarised in Table 4. A total of 1,482 individuals completed both the Fortress and Roche tests. Results from the Fortress kit showed that more than half of the participants had no COVID-19 IgG or IgM antibodies. Around 86% of those completing the Roche test had evidence of spike antibodies without nucleocapsid antibodies, suggesting that the COVID-19 antibody response of these individuals was likely induced by vaccination rather than natural infection.

Strengths and limitations of the data
This data note describes new recruitment and data collection in the EXCEED study between May 2020 and July 2021.
The updated resource provides a valuable collection of data on COVID-19 infection status, by self-report of infection and/ or symptoms, plus data on a subset of participants from two serology substudies, to assess evidence of past infection and/or vaccination. The deployed questionnaire also assessed the impact of the pandemic on physical and mental wellbeing, and socioeconomic factors.
A major strength of the new questionnaire is that the instrument was designed with other cohorts as part of the Wellcome Longitudinal Population Studies Steering Committee, meaning that content is harmonised across cohorts. Similarly, one of the serology substudies was undertaken as part of a cross-cohort initiative. All new data are in addition to the rich data already collected in EXCEED, including baseline questionnaire on health and lifestyle, spirometry, anthropometry, genotype data (EXCEED contributes to the COVID-19 host genetics initiative), and linkage to electronic healthcare records (EHRs), with consent to follow-up for 25 years. EXCEED is also part of the UK Longitudinal Linkage Collaboration: data from collaborating studies may be securely analysed by approved researchers with information from other cohort studies, and linked to whole population health and social records, within a Trusted Research Environment. Details on managed access to EXCEED data are given in the Data availability section.
New participants recruited since 2020 were younger on average, and had better representation from minority ethnic groups, the latter being an explicit aim of the study, with targeted communications and translations to facilitate recruitment from minority ethnic groups. New recruitment was online and volunteer-driven, and as different methods of recruitment become possible with different stages of the pandemic, strategies for improved recruitment of specific minority ethnic groups will need to be evaluated, including for males in minority ethnic groups, who were under-represented in our study. The majority of the cohort is still of White ethnicity, and we plan to improve the representation of individuals from other ethnic groups; this is crucial given the disproportionate effects of the pandemic on minority groups 1 .
As noted in other longitudinal studies 10 , we observed incomplete response and differential loss-to-follow-up, which may limit generalisability due to selection bias. Around 32% of existing EXCEED participants responded to the COVID-19 questionnaire between May 2020 and July 2021, along with 751 new participants. Those completing the questionnaire (existing and new participants) were also more likely to be from a less deprived background, which is a pattern of attrition observed in other studies 11 . Respondents may also be different to non-respondents in other unmeasured domains, e.g. technology literacy, given that completion was online.
The data were collected over an extended period during which pandemic restrictions were changing, so it may be useful to account for the time of questionnaire completion in analyses. Participants who completed the questionnaire most recently were asked to recall symptoms from the previous 18 months, which will introduce some recall bias; however, linkage to more objective outcomes (e.g. serology) should help to classify participants correctly by COVID-19 status. In addition, the COVID-19 symptom prediction was developed early in the pandemic, and may not remain appropriate as knowledge of symptom clusters produced by different variants emerges.

Data availability
Data are available via a system of managed access. To apply for access to the data included in this Data Note and other EXCEED data, applicants need to request the EXCEED Data Access Proposal Form via email (exceed@leicester.ac.uk). We make available data and samples from the study, labelled only with unique codes (no names, addresses, NHS numbers or identifiable data), to researchers approved by the EXCEED Data Access Committee, which is overseen by the EXCEED Independent Scientific Advisory Board. Data access proposals will be reviewed by the EXCEED Data Access Committee and applicants will receive a response within 30 days to advise whether the application has been approved. The EXCEED study encourages requests for collaboration from academic researchers (researchers or employees of an academic institution or the NHS). We are also open to requests for collaboration from commercial organisations, such as companies developing new drug treatments. Potential collaborators must share our scientific goals and be proposing work that fits within the existing study consent.
Please note that costs may apply for use of the resource, for instance if bespoke datasets are required. The Data Access Committee will advise of estimated costs after reviewing proposals.
Please note that text data and any other data deemed potentially disclosive will not be released until they have been coded appropriately.
Several studies, including EXCEED, have deposited de-identified COVID-19 data in a national secure research database, known as the UK Longitudinal Linkage Collaboration (UK LLC).
The UK LLC database will enable approved researchers to investigate high-priority COVID-19 research questions and investigate health and wellbeing throughout and beyond the COVID-19 pandemic. Researchers wishing to access this resource can enquire via: www.ukllc.ac.uk.