Data note for linking the Avon Longitudinal Study of Parents and Children (ALSPAC) with the Public Health England (PHE) COVID-19 dataset [version 1; peer review: awaiting peer review]

This data note describes the test results for infection by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19, among the index participants of the Avon Longitudinal Study of Parents And Children birth cohort study (ALSPAC, also known as the Children of the 90s). The records were collated, processed, record-linked and then supplied to ALSPAC by Public Health England (PHE). ALSPAC provided PHE with the NHS numbers of 12,774 of the index cohort ‘children’, who were aged between 27 and 30 years during this period, and 1,033 of the index children’s ‘parents’ (data on only a small subset of the mothers were permissible for this dataset during this period). PHE conducted the linkage using deterministic methods and returned periodic data extracts of all the COVID-19 test results they could match to those participants. ALSPAC obtained both the positive and negative COVID-19 test results from PHE from the time when testing was first available in February 2020 until late August 2021, just before PHE was dissolved. ALSPAC is uniquely placed to provide a longitudinal dataset of the health, education and other factors on the cohort participants before and during the COVID-19 pandemic. This provides the opportunity to place the COVID-19 test data provided by PHE (the timings, patterns, and results) in context. The result is a resource for current and future research into the COVID-19 pandemic.


Introduction
The Avon Longitudinal Study of Parents and Children (ALSPAC) is a multigenerational birth cohort study established with the stated aim of compiling information on participants' health and social exposures and subsequent outcomes across the course of their lives. Within this there is a need to characterise disease exposures, which is particularly relevant during the coronavirus disease 2019 (COVID-19) pandemic. Information about the cohort participants' COVID-19 experiences was collected prospectively, at key timepoints, using self-reported questionnaires i, and efforts were made to collect samples ii. To complement the self-report data, ALSPAC was permitted access to extracts of centrally collected data from Public Health England (PHE). This Data Note describes data provided via record linkage from the PHE pillar 1 and pillar 2 COVID-19 test records. These data were available from February 2020 until August 2021, when PHE was disbanded.
The PHE data set includes information on positive and negative COVID-19 tests. The PHE records include polymerase chain reaction (PCR) tests and some lateral flow tests (LFT). The nature of the test is only recorded for positive tests. The PHE data can be linked with other ALSPAC self-reported data, assayed biological samples, abstracted clinical notes, and used for COVID-19 and other research. The COVID-19 data collected from PHE has contributed to case selection for more detailed study into COVID-19 and the after-effects, for example long-COVID.

Materials and methods
The ALSPAC Sample
The Avon Longitudinal Study of Parents and Children (ALSPAC) is a prospective, population-based study. The initial recruitment took place from September 1990 to December 1992 inclusive, and has been described in detail in previous papers iii,iv. The Avon area is a former county covering Bristol and the surrounding areas in the south-west of the UK, as shown in Figure 1 together with the current county boundaries for context. Within ALSPAC the original pregnant women and their partners are referred to as Generation Zero (G0) and the index children as Generation One (G1). This provides a baseline sample of 14,901 G1 participants who were alive at 1 year of age v; many of their parents and carers are also still involved in the study. Starting in 2011, the participants were sent 'fair processing' materials describing ALSPAC's intended use of their health and administrative records and were given clear means to consent or object via a written form. Data were not extracted for participants who have objected, or who were not sent fair processing materials.
Please note that the study website contains details of all the data that is available through a fully searchable data dictionary and variable search tool.
Public Health England and COVID-19 data
Public Health England (PHE) was an executive agency of the Department of Health and Social Care (DHSC) in England which began operating on 1 April 2013 to protect and improve health and wellbeing and reduce health inequalities. Its formation came as a result of the reorganisation of the National Health Service (NHS) in England outlined in the Health and Social Care Act 2012. It took on the role of the Health Protection Agency, the National Treatment Agency for Substance Misuse and a number of other health bodies. It was a distinct delivery organisation with operational autonomy.
The DHSC set PHE's priorities annually, and for 2019/20 this included an "integrated surveillance system" and "investigation and management of outbreaks of infectious diseases". PHE carried out contact tracing in the early stages of the COVID-19 pandemic and began providing COVID-19 epidemiology surveillance information from late April 2020, combining community, primary care, secondary care, virology and mortality surveillance data to support national and regional planning in relation to the pandemic.

The ALSPAC-PHE COVID-19 dataset
The whole ALSPAC cohort was initially run through a filter. For the index children's generation (G1), this removed any participants who had either opted out of sharing any health data or for whom we did not have permission under section 251 vii to request any health data. For the participating parents and carers (G0), it removed anyone who had not given explicit permission to use their health data. This filtering was initially performed by ALSPAC, and PHE then additionally applied the National Opt-Out to those participants supported under section 251. ALSPAC did not have any section 251 support for the parents or carers at the time that this dataset was generated. Figure 2 shows how the sub-cohort of 13,807 ALSPAC participants was constructed to be sent to PHE for data matching.
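The eligibility filter described above can be illustrated with a minimal Python sketch. It is not the ALSPAC implementation: the `Participant` fields and the function names are hypothetical, and the rule ordering (an objection overrides everything, explicit consent is checked before section 251 support) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Participant:
    # All field names are hypothetical, chosen for this illustration only.
    link_id: str
    generation: str         # "G1" (index children) or "G0" (parents/carers)
    explicit_consent: bool  # explicit permission to use health data
    objected: bool          # opted out of sharing any health data
    s251_supported: bool    # section 251 support applies to this person


def legal_basis(p: Participant) -> Optional[str]:
    """Return 'consent', 'section251', or None if the participant is excluded."""
    if p.objected:
        return None
    if p.explicit_consent:
        return "consent"
    # Section 251 support was only available for the G1 generation.
    if p.generation == "G1" and p.s251_supported:
        return "section251"
    return None


def eligible_for_linkage(cohort: List[Participant]) -> List[Participant]:
    """Keep only participants with a legal basis for data linkage."""
    return [p for p in cohort if legal_basis(p) is not None]


cohort = [
    Participant("A", "G1", True, False, False),   # consented G1
    Participant("B", "G1", False, False, True),   # G1 under section 251
    Participant("C", "G0", False, False, True),   # G0 without explicit consent
    Participant("D", "G1", True, True, True),     # objected
]
print([p.link_id for p in eligible_for_linkage(cohort)])
```

Returning the legal basis rather than a simple yes/no keeps the consented and section 251 groups distinguishable, which matters because the National Opt-Out was applied only to the latter.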

Record linkage methodology
A unique Link ID was generated, and the identifiers were encrypted to the AES-256 standard and sent to PHE through a secure file transfer. A 20-character password was provided through an alternative channel. Two files were sent to PHE containing the NHS numbers and dates of birth of the ALSPAC cohort for whom data linkage was allowed. The first file contained the NHS numbers of the consented participants and the second file contained details of the participants supported under section 251. This allowed PHE to apply the National Opt-Out viii only to the second (section 251) file, and to allow the full flow of data for the consented file. The record linkage was performed using the NHS numbers, with the dates of birth used as a confirmation cross-check. The PHE data were returned to ALSPAC through the secure file transfer system EGRESS. These files were also encrypted and password protected. They were provided as separate Excel files. There were initially six files provided, although these were quickly consolidated into two:
• COVID Positive - initially as separate files for section 251 G1 participants, consented G1 participants and consented G0 mums.
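As an illustration of deterministic linkage on NHS number with a date-of-birth cross-check, the following Python sketch matches test records to a cohort lookup. It is a sketch under stated assumptions rather than PHE's actual procedure: the record layout, the function name `link_tests`, and the choice to set aside (rather than link) records whose date of birth disagrees are all hypothetical. The NHS numbers shown are dummy values.

```python
from datetime import date


def link_tests(cohort, tests):
    """Deterministically link test records to cohort members.

    cohort: dict mapping NHS number -> (link_id, date_of_birth)
    tests:  iterable of dicts with 'nhs_number', 'date_of_birth' and result fields
    Returns (linked, dob_mismatches). Linked records carry the Link ID only,
    with the direct identifiers removed.
    """
    linked, dob_mismatches = [], []
    for test in tests:
        entry = cohort.get(test["nhs_number"])
        if entry is None:
            continue  # exact match on NHS number failed: not a cohort member
        link_id, dob = entry
        if test["date_of_birth"] != dob:
            # Assumption: disagreeing dates of birth are held back for review.
            dob_mismatches.append(test)
            continue
        record = {k: v for k, v in test.items()
                  if k not in ("nhs_number", "date_of_birth")}
        record["link_id"] = link_id
        linked.append(record)
    return linked, dob_mismatches


cohort = {"0000000001": ("L001", date(1991, 5, 1))}
tests = [
    {"nhs_number": "0000000001", "date_of_birth": date(1991, 5, 1), "result": "negative"},
    {"nhs_number": "0000000001", "date_of_birth": date(1990, 1, 1), "result": "positive"},
    {"nhs_number": "0000000002", "date_of_birth": date(1992, 2, 2), "result": "negative"},
]
linked, mismatches = link_tests(cohort, tests)
print(linked)
print(len(mismatches))
```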
The first dataset was provided in October 2020 and included 3,851 test results. It is important to note the effect of the National Opt-Out here. There were 2,003 test results from 1,189 of the 7,573 participants whose data provision was supported by section 251. It is known that there were 35 individuals who had at least one test record but whose data was not permitted to be shared. It is not known how many tests this represented nor how many, if any, of these tests were positive.
(*) Excluded includes those who may have died or those who have moved out of area and lost contact. The terms 'Safe-guarding' and 'Care-case' were sometimes used in the early days of ALSPAC to indicate that the child was taken into care or there were safe-guarding concerns, however it may also have meant any contact may be inappropriate for any other reason and that manual review might be required. Due to this lack of clarity, they have not been given the opportunity to consent since the advent of GDPR UK, and so have been excluded here.
There were duplicate samples with test records which had the same specimen date but more than one report date. These were consolidated into a single test record and the latest report date was retained.
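The consolidation of duplicate test records can be sketched in Python as follows. This is a minimal illustration, not the code used to build the dataset: the field names are hypothetical, and it assumes that a duplicate is identified by the combination of participant Link ID and specimen date.

```python
from datetime import date


def consolidate_duplicates(records):
    """Keep one record per (link_id, specimen_date), retaining the latest report date."""
    latest = {}
    for record in records:
        key = (record["link_id"], record["specimen_date"])
        kept = latest.get(key)
        if kept is None or record["report_date"] > kept["report_date"]:
            latest[key] = record
    return list(latest.values())


records = [
    {"link_id": "L001", "specimen_date": date(2020, 10, 1), "report_date": date(2020, 10, 2)},
    {"link_id": "L001", "specimen_date": date(2020, 10, 1), "report_date": date(2020, 10, 4)},
    {"link_id": "L002", "specimen_date": date(2020, 10, 1), "report_date": date(2020, 10, 3)},
]
deduplicated = consolidate_duplicates(records)
print(len(deduplicated))
```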
A refresh and update of the data was provided in November 2020 which included 5,394 test results. There were an additional 1,563 test results in the November 2020 update; however, it was observed that 20 test results from the October 2020 dataset were dropped from the November 2020 dataset (3,851 + 1,563 - 20 = 5,394). PHE explained that the dataset was constructed so that once an individual tested positive for COVID-19, all their earlier negative results were dropped from the dataset. PHE advised they could not provide any information on how many negative test results were dropped. For this reason, the final dataset was constructed by consolidating all the individual datasets. This maximised the granularity of the negative tests, and the positive test data were unchanged.
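One way to picture the consolidation of extracts is as a union of records across all extracts, so that a negative result present in an earlier extract survives even if it is dropped from a later one. The Python sketch below illustrates this idea only; the field names and the choice of (Link ID, specimen date, result) as the identity of a test are assumptions, not details given by PHE or ALSPAC.

```python
from datetime import date


def combine_extracts(extracts):
    """Union test records across extracts (oldest first); later copies replace earlier ones."""
    combined = {}
    for extract in extracts:
        for record in extract:
            key = (record["link_id"], record["specimen_date"], record["result"])
            combined[key] = record
    return list(combined.values())


october = [
    {"link_id": "L001", "specimen_date": date(2020, 9, 20), "result": "negative"},
]
# In the later extract the participant has tested positive, and the
# earlier negative result is no longer present.
november = [
    {"link_id": "L001", "specimen_date": date(2020, 11, 5), "result": "positive"},
]
final = combine_extracts([october, november])
print(sorted(r["result"] for r in final))
```

A negative result that was dropped between two consecutive extracts, without ever appearing in any extract, cannot be recovered this way.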
PHE provided a total of 43,438 test results. There were 1,451 (3.34%) positive test results and 41,987 negative test results. It is unknown how many negative test results were dropped throughout the reporting period. The further descriptions of the ALSPAC-PHE COVID-19 dataset refer to the final combined dataset. It is important to be aware that each time a data extract was provided the National Opt-Out was applied to the group of participants supported by section 251, although the number of individuals affected was only provided for the first data extract. This may explain why there are 15 positive cases missing from the September 2021 data extract when compared to the combined dataset. Figure 3, below, shows how the number of COVID-19 tests performed on the ALSPAC participants and reported by PHE changed over time. It is important to note that once a participant had a positive test, any negative test results for that participant since the previous update sent to ALSPAC were lost.
The tests reported in Figure 3 are aggregated weekly. Figure 4, below, shows the COVID-19 cases of the ALSPAC participants over time as indicated by a positive test result provided by PHE. The graph shows the individual regions of the UK aggregated into weekly totals. The regional sub-totals are provided below in Table 2. The South-West region accounts for 84.2% of cases. The UK national cases, weekly, are shown in Figure 5, below, for reference.

Table 2. Regional sub-totals of positive COVID-19 test results.
Region                      Positive tests
East Midlands               19
East of England             10
London                      96
North-East                  3
North-West                  19
South-East                  47
South-West                  1,222
West Midlands               24
Yorkshire and The Humber    11

The delay in days between the date given as when the specimen was taken and the date given as when the result was reported is shown in Table 8, below.
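The reporting delay is the difference in whole days between the specimen date and the report date. A minimal Python sketch of that calculation, with hypothetical field names and made-up example dates, is:

```python
from collections import Counter
from datetime import date


def reporting_delay_days(specimen_date, report_date):
    """Whole days between the specimen being taken and the result being reported."""
    return (report_date - specimen_date).days


def delay_distribution(records):
    """Count how many test records fall at each delay, in days."""
    return Counter(
        reporting_delay_days(r["specimen_date"], r["report_date"]) for r in records
    )


# Illustrative records only; these are not values from the dataset.
records = [
    {"specimen_date": date(2021, 1, 4), "report_date": date(2021, 1, 5)},
    {"specimen_date": date(2021, 1, 4), "report_date": date(2021, 1, 6)},
    {"specimen_date": date(2021, 1, 10), "report_date": date(2021, 1, 11)},
]
distribution = delay_distribution(records)
print(sorted(distribution.items()))
```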
The proportion of the ALSPAC participants with a positive test result presenting with symptoms at the time the test was taken is shown below in Table 9.
Other variables were provided by PHE for the 1,451 positive COVID-19 test results, but these were either very poorly populated, completed with a value indicating the variable status was generally unknown, or unavailable for research due to a potential to be disclosive of ALSPAC participants.

Consent
Permissions for the use of data collected via questionnaires, clinics and record linkage were based on the recommendations of the ALSPAC Ethics and Law Committee and the NHS Research Ethics Committees at the time. Study participants have the right to withdraw their consent for elements of the study or from the study entirely at any time. Full details of the ALSPAC consent procedures are available on the study website.

Data availability
ALSPAC data access is through a system of managed open access. The steps below highlight how to apply for access to the data included in this data note and all other ALSPAC data:
i. Please read the ALSPAC access policy which describes the process of accessing the data and samples in detail, and outlines the costs associated with doing so.
ii. You may also find it useful to browse our fully searchable research proposals database, which lists all research projects that have been approved since April 2011.
iii. Please submit your research proposal for consideration by the ALSPAC Executive Committee. You will receive a response within 10 working days to advise you whether your proposal has been approved.
The availability of our linked participant records is dependent on our ethical approvals and contractual arrangements with the NHS. If you are interested in using these data then please contact the ALSPAC Data Linkage Team (alspac-linkage@bristol.ac.uk).