Healthcare Workers Bioresource: Study outline and baseline characteristics of a prospective healthcare worker cohort to study immune protection and pathogenesis in COVID-19

Background: Most biomedical research has focused on sampling COVID-19 patients presenting to hospital with advanced disease, with less focus on the asymptomatic or paucisymptomatic. We established a bioresource with serial sampling of health care workers (HCWs) designed to obtain samples before and during mainly mild disease, with follow-up sampling to evaluate the quality and duration of immune memory. Methods: We conducted a prospective study on HCWs from three hospital sites in London, initially at a single centre (recruited just prior to first peak community transmission in London), but then extended to multiple sites 3 weeks later (recruitment still ongoing, target n=1,000). Asymptomatic participants attending work complete a health questionnaire, and provide a nasal swab (for SARS-CoV-2 RNA by RT-PCR tests) and blood samples (mononuclear cells, serum, plasma, RNA and DNA are biobanked) at 16 weekly study visits, and at 6 and 12 months. Results: Preliminary baseline results for the first 731 HCWs (400 single-centre, 331 multicentre extension) are presented. Mean age was 38±11 years; 67% are female, 31% nurses, 20% doctors, and 19% work in intensive care units. COVID-19-associated risk factors were: 37% black, Asian or minority ethnicities; 18% smokers; 13% obesity; 11% asthma; 7% hypertension and 2% diabetes mellitus. At baseline, 41% reported symptoms in the preceding 2 weeks. Preliminary test results from the initial cohort (n=400) are available: PCR at baseline for SARS-CoV-2 was positive in 28 of 396 (7.1%, 95% CI 4.9-10.0%) and 15 of 385 (3.9%, 2.4-6.3%) had circulating IgG antibodies. Conclusions: This COVID-19 bioresource established just before the peak of infections in the UK will provide longitudinal assessments of incident infection and immune responses in HCWs through the natural time course of disease and convalescence. 
The samples and data from this bioresource are available to academic collaborators by application at https://covid-consortium.com/application-for-samples/.


Introduction
The global pandemic of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to more than 6 million infections and 300,000 deaths worldwide at the time of writing 1. Healthcare workers (HCWs) may be at greater infection risk than the general population 2-5. Many infections are asymptomatic 6, so surveillance of symptomatic coronavirus disease 2019 (COVID-19) underestimates the infection burden. This has led to calls for regular surveillance of asymptomatic HCWs 7-10 to ensure that health care facilities do not become transmission hot-spots, to protect the workforce and vulnerable patients, and to prevent community reseeding.
Most SARS-CoV-2 studies have focused on severe hospitalized COVID-19 cases 11-14. Data are lacking on the host response and biology of asymptomatic or pauci-symptomatic infection as well as the early (pre-hospitalisation) stages of disease. This undermines efforts to understand the determinants of disease severity.
We sought to provide a resource to address these gaps by establishing a cohort of HCWs who are well and attending work across selected central London hospitals. We aimed to characterize and quantify the rates of HCW infection (particularly mild or asymptomatic) over the first London COVID-19 pandemic wave, with moderate frequency longitudinal comprehensive sampling before infection and in the weeks to months afterwards. Accordingly, we established the COVID-consortium (https://covid-consortium.com) and the "COVID-19 Immune Protection and Pathogenesis in Healthcare Worker Bioresource" (NCT04318314). In this manuscript we: (1) provide a description of the study design, (2) present preliminary results of the baseline visit in the first 400 HCWs (single-centre, between 23rd and 31st March 2020) and subsequent 331 (multicentre, from mid-April 2020), focusing on two different time-points in the epidemiologic curve (just before and after the peak of new daily cases in London), and (3) call for research collaborators wishing to access biological samples in participants across the spectrum of COVID-19 to contribute to a range of prespecified objectives planned by the consortium (https://covid-consortium.com/application-for-samples/).

Amendments from Version 1
- A more detailed description of the healthcare workers (HCWs) roles is now provided in the results section. The characteristics here described are likely generalisable to HCWs in most institutions, as participants were recruited across all staff groups with broad baseline demographics. They are not, however, generalisable to the wider population given that all participants are of working age and in good general health. More granular data on exposure will be part of the study investigation and presented in subsequent publications.
- We acknowledge a selection bias could be present as HCWs who felt at higher risk of exposure could have been more motivated to participate in the study than the rest of the hospital staff. However, it is worth highlighting that participants had to sign consent forms recognising that they would not receive results in real-time (particularly when access to tests was limited early in the pandemic), and therefore it should have minimised bias in recruitment. This has now been added to the limitations section.
- This is a cohort study. Thus we have removed the word "observational" as this was deemed redundant.
- We clarified the definition of ethnicity used in this study (see legend of Table 1).
- Typos in the last paragraph have been corrected.
- We avoided repetition of the link to the Covid Consortium application throughout the manuscript.
- A direct link to the UK COVID-19 surveillance reports has now been added (reference 25).

Study approvals
The study was approved by a UK Research Ethics Committee (South Central - Oxford A Research Ethics Committee, reference 20/SC/0149). All participants provided written informed consent.

Study participants
Adult (>18 years) hospital HCWs who were fit and well to attend work, in any role and across a range of clinical areas, were invited to participate via hospital email, posters, staff meetings, training sessions and participant information leaflets (see https://covid-consortium.com). No other inclusion or exclusion criteria were applied.

Study design
The "COVID-19 Immune Protection and Pathogenesis in Healthcare Worker Bioresource" (NCT04318314) uses a prospective cohort design (Figure 1). The study consists of questionnaires and biological samples (blood samples, nasal swabs ± saliva) collected at all visits: baseline, weekly follow-ups for 15 weeks, and visits at 6 and 12 months.
Recruitment was initially at St Bartholomew's Hospital, London, UK, where 400 HCWs were recruited between 23rd and 31st March 2020, just before the peak of new daily cases in London (2nd April, with 1,022 new cases confirmed 15). St Bartholomew's is a secondary care hospital, part of Barts and the Royal London NHS Trust, serving a local population of 3 million, with specialist cancer and cardiovascular services for a supra-regional population of 6 million. In response to the pandemic, the hospital expanded ventilated intensive care provision for COVID-19 patients to 122 beds across five units.
To improve statistical power for downstream analyses, we expanded the target sample size to n=1,000 and extended recruitment on 17th April 2020 (after peak transmission in London, recruitment still ongoing) to other local sites: Nightingale Hospital London (a temporary hospital providing intensive care, set up in response to the COVID-19 pandemic) and Royal Free NHS Hospital Trust (a large teaching hospital with specialist expertise in infectious diseases). Collaborations with Cape Town (South Africa) and Sydney (Australia) are also in place to explore the impact of different surge rates, ethnicity, vitamin D levels and the 6-month seasonal difference; unlike UK sites, follow-ups there are performed every fortnight. Our team comprised researchers and volunteers from outside of the clinical supply chains.

Baseline visit
Participants complete a baseline questionnaire (Table 1) including standard variables related to demographics and exposures. These included occupation, household details, smoking status, physical activity, anthropometry, medical history (including vaccination history, current medication and dietary supplements), occupational exposure (including specific clinical areas and access to/use of personal protective equipment [PPE]), travel history, previous COVID-19 symptoms, proven contact with SARS-CoV-2 infected individuals, and any prior testing for SARS-CoV-2 infection.

Follow-up
Following recruitment (baseline visit), if fit and well to attend work, participants complete weekly in-person questionnaires, administered using Research Electronic Data Capture (REDCap v8.5.22) 16, to capture occupational metadata, new SARS-CoV-2 exposure, symptoms and test results, and undergo biosample collection (blood sampling and nasopharyngeal swabs ± saliva). Following multi-site expansion, information on exercise, pregnancies/contraception, vitamin supplements, working hours and psychological wellbeing (General Health Questionnaire-12 and fatigue questions from the Burnout Assessment Tool) 17,18 was added. The questionnaires used are summarized in Table 2.
Subjects who miss an attendance due to shift pattern, redeployment or self-isolation for any reason resume follow-up on return to work. Illness with suspected COVID-19 is self-reported to the study investigators. Following multisite expansion, participants were also allowed to opt in to a home nasopharyngeal swab and saliva test if self-isolating.

Sample collection
The schedule and quantity of biosample collection are summarized in Figure 2. All study personnel in contact with HCW participants wore appropriate PPE in accordance with Public Health England guidance. Nasopharyngeal RNA stabilising swabs are performed at baseline and weekly for 16 weeks. After appropriate training, participants were asked to self-swab both nostrils to minimise the risk to study staff. This strategy was later shown to be reliable when compared to swab collection by health care workers 19. Blood samples were collected in Tempus™ tubes for whole blood RNA, clot activator tubes for serum, and EDTA tubes for plasma, peripheral blood mononuclear cells and DNA (Figure 2). Following multisite expansion in mid-April, a pool (2-3 mL) of saliva was collected into a dedicated saliva collection tube.

Initial sample processing
All samples were registered into a Laboratory Inventory Management system onsite and either frozen at -80°C or transferred to a containment level 3 facility. Key samples collected and planned laboratory procedures are described in Table 3.

Core analyses
The following experimental approaches will be implemented:
- Blood RNA extraction focusing on host transcriptomics;
- Peripheral blood mononuclear cells (PBMCs) are a scarce resource and discussions are ongoing about maximising yield;
- Saliva will be diluted and aliquots are available. Further aliquoting will be dependent on demand;
- Other antibody, antigen tests may also be made available should they emerge;
- Serum and plasma will be aliquoted into 100 µL samples and divided into packs for individual research teams. Excess RNA (swab and blood) and host DNA will potentially also be available.

[Table fragments: Psychological factors: General Health Questionnaire-12; emotional and physical fatigue. Plasma: centrifuged and stored at -80°C. PBMCs: separated by density gradient centrifugation and cryopreserved; Immunology.]
* In the first 400 healthcare workers cohort, saliva samples were taken at the first opportunity after week 5; the participants that followed had a saliva sample taken at baseline.

Access procedure
The COVID-19 consortium has developed access systems to facilitate the use of this bioresource by scientists for health-related research of public interest. Research teams can apply to use the bioresource via the study website (https://covid-consortium.com/application-for-samples/). The access principles are those standard to many bioresources: to maximise yield of timely science, to make results available to other researchers in a reasonable timeframe via a data lake, and to reward researchers with appropriate levels of authorship and, where present, intellectual property in a fair, transparent and swift way. We encourage teams to apply and to link their analysis datasets of hospitalised patients with severe disease. We also encourage applications from commercial entities as long as the core principles above apply.

Statistical analysis
When designing the initial study, we aimed to sample the population prior to exposure. At the time of ethics submission, there were no data to provide precise estimates. The initial n=400 was therefore pragmatic and conservative, aiming for rapid recruitment without compromising selection criteria, and limited by the logistical challenges of conducting research within a pandemic environment. Following initial recruitment success, a more formal sample size calculation was possible for study expansion, based on an expected average baseline frequency of SARS-CoV-2 infection of 5% in previously undiagnosed HCWs according to published studies 5. Accordingly, the estimated sample size was n=786 for a β=0.20 and two-sided α=0.05. We targeted a sample size of 1,000 to account for a 20% drop-out rate. However, the specific responses we are seeking are emergent and unknown, and a wider strategy is to link with other studies.
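As a quick arithmetic check of the drop-out allowance described above (a sketch only; it assumes the 20% drop-out applies uniformly to the recruited total, which the text does not state explicitly):

```latex
\[
n_{\text{retained}} = 1000 \times (1 - 0.20) = 800 \;\geq\; 786,
\qquad
n_{\text{required}} = \frac{786}{1 - 0.20} = 982.5 \;\Rightarrow\; 983 \leq 1000 .
\]
```

Under that assumption, a recruitment target of 1,000 leaves a small margin over the estimated n=786 after drop-out.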
This is a preliminary analysis of the key baseline characteristics of the data. We present discrete variables as absolute frequencies with percentages, and continuous normally distributed variables as mean ± standard deviation. Continuous data were checked for normal distribution using the Kolmogorov-Smirnov test and visual assessment of Q-Q plots. Comparisons of continuous variables between groups were performed using Student's t-test, while categorical variables were compared using Fisher's exact test. Two-sided p-values <0.05 were considered significant. Statistical analysis was performed using SPSS (version 24.0, IBM Corp., Armonk, NY, USA).
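The abstract reports 95% confidence intervals for the baseline PCR and IgG proportions without naming the interval method. As a hedged illustration, the sketch below computes a Wilson score interval from the reported counts (28 of 396 and 15 of 385). The choice of the Wilson method and the function name `wilson_ci` are assumptions made for illustration; the analysis itself was run in SPSS, and the text does not say which interval was used.

```python
from math import sqrt
from statistics import NormalDist


def wilson_ci(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion.

    Assumption: the Wilson method is used here for illustration only;
    the source does not name the interval method.
    """
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts as reported in the abstract for the initial cohort
pcr_low, pcr_high = wilson_ci(28, 396)
igg_low, igg_high = wilson_ci(15, 385)

print(f"PCR positive: {28 / 396:.1%} ({pcr_low:.1%} to {pcr_high:.1%})")
print(f"IgG positive: {15 / 385:.1%} ({igg_low:.1%} to {igg_high:.1%})")
```

With these inputs the Wilson interval agrees, to one decimal place, with the intervals quoted in the abstract, which suggests (but does not prove) that a score-type interval was used.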

Results
Baseline characteristics for the first 400 HCWs (single-centre, recruited just before peak transmission, St Bartholomew's Hospital) and subsequent 331 multicentre study expansion participants (after peak transmission, n=101 in St Bartholomew's Hospital, n=10 in Nightingale Hospital and n=220 in Royal Free Hospital) are presented in Tables 4-6. This reflects all baseline visits between March and May 2020 (total n=731).

Demographics
The mean age of all study participants was 38 ± 11 years (0.7% >65 years); 67% were female and 37% were of black, Asian or minority ethnicities. Demographics are further detailed in Table 4.

Community/social exposure
The proportion of HCWs with a household size of at least three people was 48% (n=348), and a third of the participants reported having children at home (Table 6). Only eight participants (1%) had a proven contact with a confirmed COVID-19 case at home (Table 6). Overall, 41% (n=299) of HCWs reported having travelled overseas in 2020.

Symptoms, infection and serology
The prevalence of COVID-related symptoms in the two weeks prior to recruitment was 34% (n=249/731), and was significantly higher in the early cohort recruited in March than in the late cohort (41% vs 26%, p<0.001). More HCWs from the multisite cohort recruited at the later time point thought that they had had prior COVID-19 (24% vs 7% in the initial cohort, p<0.001). Overall, the most prevalent symptom was nasal congestion (13%), followed by odynophagia (11%), dry cough (8%) and fatigue (8%). A recent (<3 months) respiratory tract infection was reported in 20% of the participants.
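To illustrate how a between-cohort comparison such as the symptom prevalence above can be tested with Fisher's exact test, here is a minimal standard-library sketch. The per-cohort counts (164 of 400 and 85 of 331) are reconstructed from the reported percentages and the total of 249 of 731, so they are assumptions rather than figures taken from the tables; the function name `fisher_exact_two_sided` is also hypothetical, and the published analysis was run in SPSS.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    x_min = max(0, col1 - row2)
    x_max = min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    p_value = sum(
        p for p in (prob(x) for x in range(x_min, x_max + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Assumed counts: symptomatic / not symptomatic in the two weeks before baseline
early_symptomatic, early_total = 164, 400   # 41% of the March cohort
late_symptomatic, late_total = 85, 331      # 249 - 164, about 26% of the later cohort

p_value = fisher_exact_two_sided(
    early_symptomatic, early_total - early_symptomatic,
    late_symptomatic, late_total - late_symptomatic,
)
print(f"Two-sided Fisher's exact p-value: {p_value:.2g}")
```

Exact integer arithmetic via `math.comb` avoids overflow for tables of this size; for routine use, an established statistics package would be the more usual choice.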

Discussion
This study is establishing a bioresource (COVID-consortium) derived from health care workers, with samples taken at the time of pre-symptomatic incident infection, linked to data on clinical outcomes, serology and follow-up sampling to evaluate the quality and duration of immune memory to the virus. Here we present preliminary baseline data on the first 731 participants, comprising a single-centre cohort recruited in March 2020 just before the time of peak community transmission in London, and a subsequent expanded multicentre cohort recruited from mid-April 2020. This resource should enable collaborative science, and approved investigators can apply for sample access or access to the resultant data lake to address specific questions or for incorporation into larger COVID-19 datasets.

HCWs baseline characteristics
SARS-CoV-2 can rapidly spread to patients and HCWs in hospitals, and HCWs generally have been particularly hard hit with high reported rates of infection 2,3,5,24. Our cohort is representative of a multi-ethnic urban UK population of working age, and more specifically of the NHS workforce across different clinical roles and departments. Confirmed COVID-19 contacts were low in the community (1%), but much higher in-hospital (43% patients, 30% colleagues), particularly in the second cohort (recruited later). All participants were self-reported as fit to attend work on all clinical visits, and at baseline the majority of participants had been asymptomatic and did not think that they had been infected. Nevertheless, 1 out of 10 participants had baseline SARS-CoV-2 infection confirmed by PCR and/or a positive serology test, which could represent current or previous infection, at the beginning of peak transmission in March.
Table 6. Baseline exposure to SARS-CoV-2 and symptoms.

Of interest, two different timepoints are presented here. As one would expect, the proportion of HCWs who reported prior symptoms was significantly higher in participants recruited just before peak community transmission 25 , and those recruited a month later more often reported they suspected that they had already had COVID-19.

The COVID-19 bioresource
The scientific community has joined forces to tackle this unprecedented pandemic. Between the start of January 2020 and 31st May 2020, 160 research projects on COVID-19 received a favourable opinion from the UK NHS Health Research Authority (list last updated on 3rd June 2020), the majority focused on confirmed COVID-19 patients 6. Emergent studies are now targeting mild and population disease, but almost all missed peak transmission. Larger-scale community surveillance studies typically also do not have the temporal granularity to detect early disease changes and may be more focused on providing data to improve modelling than on host-pathogen biology. Studying HCWs is a middle ground: subjects are deemed fit to work, but have higher exposure rates to confirmed COVID-19 cases, and can also be frequently assessed. COVID-19 bioresources in the general population and HCWs have already been established. Studies such as the COVID-19 Emergency Response Assessment (CERA) 26 and the Rapid European SARS-COV-2 Emergency research Response (RECOVER) 27 use qualitative instruments to assess the physical and psychological well-being of frontline doctors at different phases of the pandemic. The SARS-CoV-2 Acquisition in Frontline Health Care Workers - Evaluation to Inform Response (SAFER) 28 study will perform qualitative interviews and collect nose and throat swabs twice weekly and serum samples monthly from healthcare staff. Preliminary results from the SAFER study revealed a higher PCR positive rate of 21% 29.
The COVID-19 Staff Testing of Antibody Responses Study (CO-STARS) follows a similar design, with serologies performed monthly for 6 months and then 6-monthly for a total of 6 years 30 .
The comprehensive (questionnaires and biosamples) serial assessment of asymptomatic participants starting just before peak community transmission of SARS-CoV-2 makes our bioresource a valuable dataset for the scientific community. We expect that the data sampled from HCWs will facilitate understanding of mild disease and subclinical infection at a more rapid rate than data from the general population, allowing comparison with those more severely affected or hospitalised for COVID-19.
The COVID-19 consortium (https://covid-consortium.com) and the "COVID-19 Immune Protection and Pathogenesis in Healthcare Worker Bioresource" (NCT04318314) thus encourage research teams to apply and even potentially link their own datasets to ours (with results expected to be returned to the data lake for collaborative science). Some of the fields worth exploring include immune responses during the subclinical phases of infection, properties of the immunoglobulins and immune cellular reactivity (correlations between viral RNA PCR and subsequent serology, persistence of neutralizing antibodies, immune decay and longevity of serological responses), host and viral genetic variation, and other environmental or acquired risk factors.

Limitations
The three centres initially included reflect the epidemiological curve of a single city (London). The COVID-19 bioresource started at peak community transmission with prevalent asymptomatic infection in 7.1% and seropositivity of 3.9% at baseline. In data from the subsequent four weeks, we have already reported that the incident asymptomatic infections fell in line with reductions in the London-wide incidence 31. Nationwide data are accruing to assess the generalisability of our findings, but there are also opportunities to expand geographical coverage of our bioresource through collaborations with other studies that include serial sampling of HCWs. Although our cohort is ethnically diverse (37% non-white), the frequency of comorbidities is relatively low, there are no children and elderly subjects are under-represented. In addition, our cohort of hospital HCWs is unlikely to be generalisable to other institutional settings such as care homes, or to the wider population given that all participants are of working age and in good general health.
A selection bias could also be present as HCWs who felt at higher risk of exposure could have been more motivated to participate in the study than the rest of the hospital staff. However, it is worth highlighting that participants had to sign consent forms recognising that they would not receive results in real-time (particularly when access to tests was limited early in the pandemic), and therefore it should have minimised bias in recruitment.

Conclusions
Just before the peak of COVID-19 infections in the UK we established a rich and granular bioresource of healthcare workers with the aim of gathering insights into early disease and asymptomatic SARS-CoV-2 infection. By combining exposure data with multiple qualitative and quantitative assessments, we envision a more complete picture of the immune response in this context. The samples and data securely curated in this bioresource are now accessible to the wider scientific community by application.

Data availability
The COVID-19 consortium has developed access systems to facilitate the use of this bioresource and the data underlying this article by scientists for health-related research of public interest. However, although participants are pseudonymised, there are data regarding home addresses, household characteristics, and other details that could potentially lead to identification. Research teams can therefore apply to use the bioresource via the application form that can be found on the study website (https://covid-consortium.com/application-for-samples/).

This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Berta Grau-Pujol
Barcelona Institute for Global Health (ISGlobal), Hospital Clínic - Universitat de Barcelona, Barcelona, Spain
The study is clearly described, as are the changes that occurred in study procedures. The emerging COVID-19 pandemic and the availability of tests and materials during this period justify these changes. Although it provides only baseline information, it is worth sharing, and thus indexing.
A few comments on methodology:
○ It is not described which statistical power they had.
○ Study design and procedures in Cape Town and Sydney should be further described.
○ In core analysis, the authors use future tense, not sure that is correct.
○ Further statistical analysis could be conducted, such as regression analysis. Participants' site could also be considered.

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility?

Are the conclusions drawn adequately supported by the results? Yes
Reviewer Expertise: Infectious diseases, microbiology, epidemiology.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.