Household serial interval of COVID-19 and the effect of Variant B.1.1.7: analyses from prospective community cohort study (Virus Watch)

Introduction: Increased transmissibility of the B.1.1.7 variant of concern (VOC) in the UK may explain its rapid emergence and global spread. We analysed data from putative household infector-infectee pairs in the Virus Watch community cohort study to assess the serial interval of COVID-19 and whether this was affected by emergence of the B.1.1.7 variant. Methods: The Virus Watch study is an online, prospective, community cohort study following up entire households in England and Wales during the COVID-19 pandemic. Putative household infector-infectee pairs were identified where more than one person in the household had a positive swab matched to an illness episode. Data on whether or not individual infections were caused by the B.1.1.7 variant were not available. We therefore developed a classification system based on the percentage of cases estimated to be due to B.1.1.7 in national surveillance data for different English regions and study weeks. Results: Out of 24,887 illnesses reported, 915 tested positive for SARS-CoV-2 and 186 likely ‘infector-infectee’ pairs in 186 households amongst 372 individuals were identified. The mean COVID-19 serial interval was 3.18 (95%CI: 2.55-3.81, sd=4.36) days. There was no significant difference (p=0.267) between the mean serial interval for VOC hotspots (mean = 3.64 days, 95%CI: 2.55-4.73) and non-VOC hotspots (mean = 2.72 days, 95%CI: 1.48-3.96). Conclusions: Our estimates of the average serial interval of COVID-19 are broadly similar to estimates from previous studies and we find no evidence that B.1.1.7 is associated with a change in serial intervals. Alternative explanations such as increased viral load, longer period of viral shedding or improved receptor binding may instead explain the increased transmissibility and rapid spread and should undergo further investigation.


Introduction
The serial interval is defined as "the period of time between analogous phases of an infectious illness in successive cases of a chain of infection that is spread person to person" (Feinleib, 2001). The serial interval is often measured as the duration between symptom onset of a primary case and symptom onset of its secondary cases. This is a key epidemiological measure because it allows investigation of epidemiological links between cases, and it is an important parameter in infection transmission models used to inform infection control strategies. The doubling time of epidemic infections depends in part on both the generation time (the time between infections, regardless of symptomatic status) and the R number (the average number of secondary infections each infection produces). Diseases with a shorter generation time but similar R values will have shorter doubling times. Since the generation time is seldom observable, in practice the serial interval is used as a proxy for it. Mean serial intervals vary widely for different respiratory infections and have been estimated at 2.2 days for influenza A H3N2, 2.8 days for pandemic influenza A(H1N1)pdm09, 7.5 days for respiratory syncytial virus, 11.7 days for measles, 14 days for varicella, 17.7 days for smallpox, 18.0 days for mumps, 18.3 days for rubella and 22.8 days for pertussis (Vink et al., 2014).
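The dependence of doubling time on generation time and R can be illustrated with a minimal sketch. It assumes, purely for illustration, that every generation takes exactly the same number of days, so that cases multiply by R once per generation; the function name `doubling_time` and the example values are hypothetical and are not taken from the study.

```python
import math


def doubling_time(r_number, generation_time_days):
    """Doubling time in days under the simplifying assumption of a fixed
    generation time: cases grow as R ** (t / generation_time_days)."""
    if r_number <= 1:
        raise ValueError("doubling time is only defined for R > 1")
    return generation_time_days * math.log(2) / math.log(r_number)


# Same R, different (hypothetical) generation times.
shorter = doubling_time(1.5, 3.0)
longer = doubling_time(1.5, 5.0)
```

With R held equal, the shorter generation time gives the shorter doubling time, which is the relationship described in the paragraph above. Real generation times are distributed rather than fixed, so this is an approximation only.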
Published estimates of the serial interval of coronavirus disease 2019 (COVID-19) are largely from Asian countries prior to the emergence of variants of concern (VOC). A meta-analysis of serial interval estimates for COVID-19 found mean serial intervals ranged from 4.2 to 7.5 days with a pooled mean of 5.2 (95%CI: 4.9-5.5) (Alene et al., 2021). A more recent concerning feature of COVID-19 epidemiology has been the emergence of a range of SARS-CoV-2 variants with mutations that may increase transmissibility, reduce the protective effect of immunity acquired from natural infection or vaccination and/or increase clinical severity (Alene et al., 2021). These include B.1.1.7 (first described in England), 501Y.V2 (first described in South Africa) and P.1 (B.1.1.28.1, first described in Brazil). Each of these variants rapidly became dominant in the country in which it was first described. For the B.1.1.7 variant, increased transmissibility is thought to explain the rapid emergence and global spread. Since either increased R or decreased serial interval could potentially explain more rapid emergence of B.1.1.7, it is important to understand whether the serial interval differs. To date, however, there are no published comparisons of the serial interval for B.1.1.7 and previously circulating strains. We analysed data from putative household infector-infectee pairs in the Virus Watch community cohort study to assess the serial interval of COVID-19 and whether this was affected by emergence of the B.1.1.7 variant.

Methods
This study has been approved by the Hampstead NHS Health Research Authority Ethics Committee. Ethics approval number: 20/HRA/2320. All members of participating households provided informed consent for themselves and, where relevant, for children that they were responsible for. This was electronically collected during registration. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
The Virus Watch study is an online, prospective, community cohort study following up entire households in England and Wales during the COVID-19 pandemic. Participants were recruited into the Virus Watch study using a range of methods including by post, social media, SMS messages or personalised letters with incentives from General Practices. Participants were eligible if all household members agreed to take part and if they had access to the internet (Wi-Fi, fixed or on a mobile phone) and an email address. At least one household member had to be able to read English to complete the surveys. Participants were not eligible if their household was larger than 6 people (due to limitations of the online survey infrastructure). The full study design and methodology have been described elsewhere (Hayward et al., 2021). Study data were collected and managed using REDCap electronic data capture tools hosted at University College London (Harris et al., 2009). Data collection began on 24 June 2020 and is ongoing. As of 11th April 2021, 49,149 people across England and Wales have joined the study. Participants prospectively complete detailed daily symptom diaries recording the presence and severity of any symptoms of acute respiratory, gastrointestinal and other illnesses. At the end of each week participants are emailed a link to complete a weekly online survey where they report any symptoms from that previous week as well as the dates and outcomes of any COVID-19 swabbing conducted as part of NHS Test and Trace, work-based testing schemes, and other research studies.
Amendments from Version 1
The distinction between the generation time and the serial interval and their relation to the epidemic doubling time was added in the introduction. Any further responses from the reviewers can be found at the end of the article.

Symptom data were extracted and grouped into illness episodes using Stata (StataCorp, 2019). The start date of an illness episode was defined as the first day any symptoms were reported, and the end date was the final day of reported symptoms. A seven-day washout period where no symptoms were reported was used to define the end of one illness episode and the start of a new illness episode. Swab results were matched to illnesses that were within 14 days of each other. Putative household infector-infectee pairs were identified where more than one person in the household had a positive swab matched to an illness episode. Although negative serial intervals are possible, in practice it is not possible to assess the direction of transmission between pairs, so it was assumed that the minimum interval was zero days. According to the World Health Organization report on SARS-CoV-2 transmission, transmission can occur up to nine days after the infector's symptom onset, and the incubation period for the infectee can be up to 14 days (WHO, 2020). Thus, the longest time interval between an infector's and an infectee's onset of symptoms was taken to be 23 days. Our analysis, conducted using R version 4.0.3 (R Core Team, 2020), considered pairs of cases with symptom onset occurring between 0 and 23 days apart in households as possible transmission pairs (Geismar, 2021). Serial interval was calculated as the number of days between symptom onset of the pairs of cases. Figure 1 presents a hypothetical household with four confirmed COVID-19 cases and their respective symptom onset dates. Any case with a symptom onset date within 23 days of a previous case will be paired. Where there were multiple potential infectors for the same infectee, these pairs were excluded from the analysis. 'Person #4' has two potential infectors, as her symptom onset date is within 23 days of the symptom onset dates of both 'Person #2' and 'Person #3'. Pairs containing 'Person #4' as an infectee will be removed since we cannot determine her "true" infector. Thus, we only retain the two most likely transmission pairs: 'Person #1' to 'Person #2' and 'Person #2' to 'Person #3'.
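The pairing rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's R code: the function name `find_pairs`, the household and person identifiers, and the onset dates are all hypothetical, chosen so that the example reproduces the four-person household described for Figure 1. Same-day onsets are paired in sorted order, which is an assumption of this sketch.

```python
from collections import defaultdict
from datetime import date

MAX_INTERVAL_DAYS = 23  # 9 days of transmission + 14 days of incubation


def find_pairs(cases):
    """Return (household, infector, infectee, serial_interval_days) tuples.

    cases: iterable of (household_id, person_id, onset_date).
    An infectee is kept only if exactly one earlier household case has
    an onset date 0 to 23 days before theirs.
    """
    by_household = defaultdict(list)
    for household_id, person_id, onset in cases:
        by_household[household_id].append((onset, person_id))

    pairs = []
    for household_id, members in by_household.items():
        members.sort()  # earliest symptom onset first
        for j, (onset_j, infectee) in enumerate(members):
            candidates = [
                (infector, (onset_j - onset_i).days)
                for onset_i, infector in members[:j]
                if (onset_j - onset_i).days <= MAX_INTERVAL_DAYS
            ]
            # Infectees with several potential infectors are excluded.
            if len(candidates) == 1:
                infector, interval = candidates[0]
                pairs.append((household_id, infector, infectee, interval))
    return pairs


# Hypothetical onset dates mimicking the household described for Figure 1.
example_cases = [
    ("H1", "P1", date(2021, 1, 1)),
    ("H1", "P2", date(2021, 1, 16)),
    ("H1", "P3", date(2021, 1, 31)),
    ("H1", "P4", date(2021, 2, 6)),
]
pairs = find_pairs(example_cases)
```

In this example 'P4' falls within 23 days of both 'P2' and 'P3' and is therefore dropped, leaving the pairs 'P1' to 'P2' and 'P2' to 'P3'.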
Data on whether or not individual infections were caused by the B.1.1.7 variant were not available. We therefore developed a classification system based on the percentage of cases estimated to be due to B.1.1.7 in national surveillance data for different English regions and study weeks. These surveillance data utilise a proxy indicator of B.1.1.7 known as Spike-gene target failure (SGTF) which can be picked up on most polymerase chain reaction (PCR) assays used in English community testing programmes. Infections in regions and weeks when >75% of strains were SGTF were classified as occurring in B.1.1.7 "hotspots". Infections in regions and weeks when <25% of strains were SGTF were classified as occurring in "non-hotspots". Infections in regions and weeks when 25% to 75% of strains were SGTF were classified as "undetermined" since no significant threshold was reached. Mean serial interval and 95% confidence intervals were compared in hotspot and non-hotspot areas using Welch two-sample t-tests.
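The classification thresholds and the comparison of means can be sketched as follows. This is a minimal illustration using only the Python standard library rather than the R code used in the study; the function names are hypothetical, and the confidence interval uses a normal approximation (1.96 standard errors), which is an assumption of this sketch rather than a statement of the authors' method.

```python
import math
import statistics


def classify_hotspot(sgtf_percent):
    """Classify a region-week by its percentage of SGTF strains."""
    if sgtf_percent > 75:
        return "hotspot"
    if sgtf_percent < 25:
        return "non-hotspot"
    return "undetermined"


def welch_t(sample_a, sample_b):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


def mean_ci_95(mean, sd, n):
    """Approximate 95% confidence interval for a mean (normal approximation)."""
    half_width = 1.96 * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Summary values reported in the abstract: mean 3.18 days, sd 4.36, 186 pairs.
lower, upper = mean_ci_95(3.18, 4.36, 186)
```

Rounded to two decimals, `lower` and `upper` agree with the interval reported in the abstract (2.55 to 3.81). Turning the t statistic into a p-value additionally requires the Student t distribution, which the standard library does not provide; a statistics package would be needed for that step.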

Discussion
Our estimate of the mean serial interval of COVID-19 (3.18 days (95%CI: 2.55-3.81)) is within the range of previous studies reviewed by Griffin et al. (2020), but slightly lower than pooled estimates from meta-analysis of data from international studies in the first few months of the pandemic (5.2 (95%CI: 4.9-5.5)) (Alene et al., 2021). Differences in populations, social contact, and timeframes may explain the range of estimates reported. The implementation of control measures, regular testing, isolation and improved knowledge of SARS-CoV-2 transmission since the start of the pandemic may have reduced the potential for an infected person to transmit the disease over a long period of time. Multiple studies observed a decrease in the serial interval and attributed it to increased control measures (Bi et al., 2020; Lavezzo et al., 2020; Zhao et al., 2020). Ali et al. (2020) modelled the serial interval over time accounting for timeliness of cases' isolation and found that "serial intervals are positively associated with isolation delay". Another potential explanation for shorter serial intervals is the frequent and close contact among household members. This could lead to transmissions occurring earlier in the course of infection, which would result in shorter serial intervals.
Strengths of the study include the relatively large number of pairs compared to most studies, the prospective daily recording of symptoms and weekly reporting of swab test results in a large household cohort, and our ability to assess whether a variant with apparent increases in transmissibility has an altered serial interval. Limitations of our analysis include reliance on samples taken during the national symptomatic testing programme to assess infection, meaning we are likely to have missed some infections and household transmission events. The pooled asymptomatic proportion of SARS-CoV-2 infections is estimated at 23% (95% CI 16%-30%) and we cannot assess serial intervals when either case is asymptomatic (Beale et al., 2020). We also cannot assess the possibility of negative serial intervals, which may arise when transmission occurs prior to symptom onset and the incubation period is short. Missing negative serial interval values could lead to an overestimation of our mean serial interval estimate. Finally, we do not have data on whether individual infections were caused by the B.1.1.7 variant, and instead relied on a classification based on regional surveillance data.

Our analysis does not provide evidence to suggest that changes in serial interval explain the rapid emergence of B.1.1.7. Alternative explanations such as increased viral load (Kidd et al., 2021) or improved receptor binding may instead explain the increased transmissibility and rapid spread and should undergo further investigation.

Data availability
Underlying data
We aim to share aggregate data from this project via a "Findings so far" section on our website (https://ucl-virus-watch.net/). We will also be sharing individual record level data on a research data sharing service such as the Office for National Statistics Secure Research Service. In sharing the data we will work within the principles set out in the UKRI Guidance on best practice in the management of research data. Access to use of the data whilst research is being conducted will be managed by the Chief Investigators (ACH and RWA) in accordance with the same principles. We will put analysis code on publicly available repositories to enable its reuse. Given the content of our dataset for this study (information on infector-infectee pairs per geographic region), we currently cannot release the data at the individual level. Data access requests can be made to the Virus Watch chief investigators (ACH or RWA) at the following email address: viruswatch@ucl.ac.uk.

The exclusion of negative serial intervals might lead to overestimation of the mean serial interval? This could be clarified at the end of the second Discussion paragraph?
The standard deviation of serial intervals could also be presented in the abstract and also in Table 2 and the main text.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
This comparison presents a number of difficulties, most of which are discussed in the paper. These difficulties are inherent to estimates of the serial interval distribution and, although they should be taken into account, in my opinion they do not invalidate the conclusions, since data about different variants were obtained at about the same time, using the same procedure.

○ The data is restricted to infector-infectee pairs within households, so pairs with people in different households are not taken into account. This will probably bias down the observed serial intervals since contacts within households tend to be closer.
○ In the introduction, it is mentioned that serial intervals and the doubling time of the epidemic are related to the reproduction number, but to be precise, it is the generation interval (time between infections) that has this relation. Since the generation interval is seldom observable, the serial interval is used as a proxy for it. I believe this should be acknowledged in the paper.
○ Due to limitations in the software used, households with more than 6 people were not included in the study. It would be good to include in the text how many households are in this situation. One limitation of the study is that households with many people would tend to lower the serial interval distribution, since if more people live together, they may presumably have closer contacts than in households with fewer people. However, since the same kind of bias was applied to all virus variants, I believe this would not affect the final conclusion.
○ Asymptomatic cases were not included (since the serial interval is not defined for asymptomatic cases). This might be a source of bias if one variant has a larger percentage of asymptomatic infections than the other.
○ Negative serial intervals were also not considered (the absolute value of the serial interval is used instead). Previous studies report a rate of approximately 13% negative serial intervals (Griffin et al., 2020).
○ The distribution of serial intervals will depend on NPIs. Despite the fact that the study considers cases occurring in different regions at the same time, it might be important to check whether different regions had different sets of restrictions in place (for example, in a region under lockdown, people would stay at home most of the time, potentially decreasing the within-household serial interval). This potential difference between hotspots was not discussed in the paper.
○ Although the mean serial interval is particularly important, the whole distribution affects estimates of the reproduction number, so it might be worth comparing the distributions, not only their means.

The study is an important contribution to understanding the observed differences between variants, and a worthy contribution to the literature.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Partly

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Signal processing/statistics applied to epidemiology
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Author Response 29 Oct 2021
Cyril Geismar, University College London, London, UK

1. The data is restricted to infector-infectee pairs within households, so pairs with people in different households are not taken into account. This will probably bias down the observed serial intervals since contacts within households tend to be closer.
We thank the reviewer for their valuable insights. We agree that the clustering method we implemented may impact our results. We state this limitation in the first paragraph of the Discussion section. To consider transmissions occurring outside the household we would require 1) data about the contact between members in our cohort and 2) data from those outside our cohort. Due to ethical and security concerns, point 1 is not possible as it requires us to have access to a deeper volume of data akin to surveillance, for which we do not have ethical approval. Point 2 is not possible as our study actively requires participants to knowingly provide data; we only have this at the household level.

2. In the introduction, it is mentioned that serial intervals and the doubling time of the epidemic are related to the reproduction number, but to be precise, it is the generation interval (time between infections) that has this relation. Since the generation interval is seldom observable, the serial interval is used as a proxy for it. I believe this should be acknowledged in the paper.
We thank the reviewer for pointing this out and will update for all audiences.

3. Due to limitations in the software used, households with more than 6 people were not included in the study. It would be good to include in the text how many households are in this situation. One limitation of the study is that households with many people would tend to lower down the serial interval distribution, since if more people live together, they may presumably have closer contacts than in households with less people. However, since the same kind of bias was applied to all virus variants, I believe this would not affect the final conclusion.
We thank the reviewer for providing their insights into the limitation on household size. We agree that the bias applies to all virus variants and is not likely to impact the conclusion. Furthermore, according to the ONS report "Families and households in the UK: 2020": "The average household size in the UK is 2.4 while there were 162,900 (0.6%) households in the UK with seven or more people". Therefore we believe that our results are representative. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/families/bulletin

4. Asymptomatic cases were not included (since the serial interval is not defined for asymptomatic cases). This might be a source of bias if one variant has a larger percentage of asymptomatic infections than the other.
We agree with the reviewer. We acknowledged in the paper that we were unable to measure asymptomatic transmissions and that the estimates provided are strictly measures of the serial interval when both the infector and infectee are symptomatic.

5. Negative serial intervals were also not considered (the absolute value of the serial interval is used instead). Previous studies report a rate of approximately 13% negative serial intervals (Griffin et al., 2020).
We thank the reviewer for pointing this out with supporting evidence from the rapid review by Griffin et al. (2020). We did acknowledge the possibility of negative serial intervals for COVID-19 and explained why we were not able to consider this: "Although negative serial intervals are possible, in practice it is not possible to assess the direction of transmission between pairs, so it was assumed that the minimum interval was zero days." Given the nature of our data (self-reported), we were not able to determine cases when transmission occurs prior to symptom onset. Although we acknowledge the possibility of a significant proportion of negative serial intervals in COVID-19 transmissions, we do not believe that there is sufficient evidence from enough studies with significant transmission pairs to report a specific percentage.

6. The distribution of serial intervals will depend on NPIs. Despite the fact that the study considers cases occurring in different regions at the same time, it might be important to check if different regions were not having different sets of restrictions in place (for example, in a region under lockdown, people would stay at home most of the time, potentially decreasing the within-household serial interval). This potential difference between hotspots was not discussed in the paper.
We thank the reviewer for their comments. To mitigate any variation related to the localised implementation of policies, our recruitment process was conducted using the Royal Mail list to recruit evenly across the UK; therefore our results are a close estimate of the average and were less likely to be affected by local variation.

7. Although the mean serial interval is particularly important, the whole distribution affects estimates of the reproduction number, so it might be worth comparing the distributions, not only their means.
We used the Welch two-sample t-test to compare VOC and non-VOC hotspots. This statistical test is a test of the distribution and not the mean itself as it accounts for the standard deviation which is a measure of the distribution.

Competing Interests: No competing interests were disclosed.