Infection patterns of endemic human coronaviruses in rural households in coastal Kenya

Background: The natural history and transmission patterns of endemic human coronaviruses are of increased interest following the emergence of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). Methods: In rural Kenya, 483 individuals from 47 households were followed for six months (2009-10) with nasopharyngeal swabs collected twice weekly regardless of symptoms. A total of 16,918 swabs were tested for human coronavirus (hCoV) OC43, NL63 and 229E and other respiratory viruses using polymerase chain reaction. Results: From 346 (71.6%) household members, 629 hCoV infection episodes were defined, with 36.3% being symptomatic, a proportion that varied by hCoV type and decreased with age. Symptomatic episodes (aHR=0.6 (95% CI: 0.5-0.8)) or those with elevated peak viral load (medium aHR=0.4 (0.3-0.6); high aHR=0.31 (0.2-0.4)) had longer viral shedding compared to their respective counterparts. Homologous reinfections were observed in 99 (19.9%) of 497 first infections. School-age children (55%) were the most common index cases, with those having medium (aOR=5.3 (2.3-12.0)) or high (aOR=8.1 (2.9-22.5)) peak viral load most often generating secondary cases. Conclusion: Household coronavirus infection was common, frequently asymptomatic and mostly introduced by school-age children. Secondary transmission was influenced by viral load of index cases. Homologous-type reinfection was common. These data may be insightful for SARS-CoV-2.


Introduction
Four endemic species of human coronavirus (hCoV), HKU1, OC43, NL63 and 229E, are widespread and associated primarily with mild acute respiratory illness 1 . Infections with endemic hCoVs are reportedly more severe in young children and the elderly 2, 3 . In the last two decades, three new members of this virus family have emerged as human pathogens: severe acute respiratory syndrome coronavirus (SARS-CoV) 4 , Middle East respiratory syndrome coronavirus (MERS-CoV) 5 and, most recently, SARS-CoV-2 6 . The pandemic spread and continued circulation beyond the initial wave of infection suggest a potential for SARS-CoV-2 to become resident within the human population. A focus on the natural history and transmission characteristics of the currently little-studied endemic species of hCoV may give insight into the future behaviour of this emergent relative 7 .
Using data from a study of 47 households in rural Kenya, we have previously reported baseline data on the occurrence of hCoV 8 and a detailed analysis of reinfection with hCoV-NL63 9 . In the present study, we investigate the natural history of infection and transmission patterns of three endemic hCoV within these households.

Household data
This study utilizes data from a prospective household-based cohort study conducted in one administrative location within the Kilifi health and demographic surveillance system (KHDSS) 8,10 on the Kenyan coast. The study design and methods have been described elsewhere 8,11 . Briefly, with a primary objective of characterising 'who infects whom' with respiratory syncytial virus (RSV), households with an infant born after the end of the 2008/2009 RSV season (referred to as the study infant) and at least one elder sibling (aged <13 years) were enrolled. The study period spanned a complete RSV season from 8th December 2009 to 5th June 2010. Nasopharyngeal specimens (NPS) were collected from all household members irrespective of symptoms, once a week in the first four weeks and twice a week thereafter until the study end. A household was defined as members (who need not be related) of one or more building units who share the same cooking facility. The study had a good retention rate (>80%) of households and of individuals over the study period 11 .
The study was approved by the Kenyan National Ethical Review Committee and the University of Warwick's Biomedical Research Ethical Committee in the United Kingdom. Individual written informed consent was obtained from all study participants aged ≥18 years. For those <18 years old, written consent was obtained from the parent or guardian.

Molecular testing of the NPS collections using multiplex RT-PCR assay
A previously described real-time multiplex RT-PCR (mPCR) assay with targets for 15 respiratory viruses was used 12 . The target pathogens were human coronavirus (hCoV species (also called types) OC43, NL63 and 229E), RSV A and B, rhinovirus (RV), adenovirus (AdV), parainfluenza virus (types 1-4), influenza virus (types A, B and C) and human metapneumovirus (hMPV). A preliminary screen of the NPS showed the last three virus groups were uncommon during the surveillance period, and hence they were not screened for in the remainder of the NPS collections 8 . A specimen with a cycle threshold (Ct) value of ≤35.0 for a specific virus target was considered positive.

Statistical analysis
Data analyses were undertaken in STATA Version 13.0 (Stata-Corp, College Station, Texas, USA). Descriptive statistics for continuous variables are presented as mean (± standard deviation) and median (interquartile range (IQR)). Categorical variables were summarised using counts and proportions, and the chi-square test of association was used to examine their independence. Mood's median test was used to investigate equality of median times across levels of categorical variables. Two or more groups were compared using a test for equality of proportions.
Type-specific individual hCoV infection episodes were defined as a period with positive mPCR result(s) of the same type no more than 14 days apart 13 . Episodes where no samples were collected and tested for >7 days before or after the infection episode were considered left- or right-censored, respectively. An episode was considered symptomatic if the individual was identified with any of the following symptoms during the infection episode: cough, runny nose, sore throat, nasal flaring, indrawing, crackles, wheeze, fever, unable to feed, head nodding, lethargy, unable to talk, cyanosis or difficulty breathing. Co-infection was assigned when, within the hCoV infection episode, an NPS collection was mPCR positive for a different hCoV species or another of the viruses tested, namely RSV, RV or AdV. Detection of two or more individual infection episodes by the same hCoV type in a household within a span of 14 days constituted a household outbreak. For each household hCoV introduction, a primary (index) case was defined as the first person(s) to test positive for hCoV by mPCR, while secondary case(s) were the remaining members who were part of the same household outbreak. For individuals with multiple hCoV infection episodes, reinfections were classified as either homologous (same hCoV species) or heterologous (different hCoV species) with respect to previously detected species during the study period. As an example, if an individual has three infections in the temporal order OC43, NL63 and OC43, then the second infection episode would be heterologous to the first, and the third homologous to the first infection episode and heterologous to the second episode.
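The episode and reinfection definitions above can be illustrated with a minimal Python sketch. This is not the authors' analysis code (the analysis was run in STATA); the function names `group_episodes` and `classify_reinfections` and the example dates are hypothetical, and the sketch assumes that one person's positive sample dates for a single hCoV type are already available as a list.

```python
from datetime import date

# From the text: positives of the same type no more than 14 days apart
# belong to the same infection episode.
MAX_GAP_DAYS = 14


def group_episodes(positive_dates, max_gap_days=MAX_GAP_DAYS):
    """Group one person's positive dates for one hCoV type into episodes.

    Hypothetical helper: returns a list of episodes, each a sorted list of dates.
    """
    episodes = []
    for sample_date in sorted(positive_dates):
        # Extend the current episode if the gap to its last positive is small enough.
        if episodes and (sample_date - episodes[-1][-1]).days <= max_gap_days:
            episodes[-1].append(sample_date)
        else:
            episodes.append([sample_date])
    return episodes


def classify_reinfections(types_in_order):
    """Label every episode after the first against each earlier episode.

    Returns one list per reinfection, with one label per earlier episode.
    """
    labels = []
    for i in range(1, len(types_in_order)):
        current = types_in_order[i]
        labels.append([
            "homologous" if current == earlier else "heterologous"
            for earlier in types_in_order[:i]
        ])
    return labels


# Positives 4 days apart form one episode; a 20-day gap starts a new one.
episodes = group_episodes([date(2010, 1, 1), date(2010, 1, 5), date(2010, 1, 25)])

# The worked example from the text: OC43, then NL63, then OC43.
labels = classify_reinfections(["OC43", "NL63", "OC43"])
```

For the OC43, NL63, OC43 sequence, the second episode is labelled heterologous to the first, and the third is labelled homologous to the first and heterologous to the second, matching the example in the text.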
Durations of virus shedding were estimated using a midpoint method, in which the shedding period starts midway between the first positive sample and the previous negative sample and ends midway between the last positive sample and the subsequent negative sample. Further details on this approach are provided elsewhere 13 . Kaplan-Meier (KM) curves were used to describe the survival functions (time to end of virus shedding) by different categorical variables across the three endemic hCoV types. Adjusted hazard ratios (aHR) obtained from multivariable Cox proportional hazards (PH) models were used to estimate the influence of several factors on the duration of shedding and symptoms. Logistic regression models were used to identify risk factors for spread of infection from the primary cases to other household members. The risk factors considered were age, sex, household size, presence of respiratory symptoms, presence of other respiratory pathogens and peak viral load in an infection episode. The peak (highest) viral load was defined as the lowest Ct value in an individual infection episode and was categorised into three levels: low (>=30), medium (20-29) and high (<20). To account for clustering at either the individual or the household level, a robust cluster variance estimator was used in the Cox PH and logistic regression models described above.
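As a rough illustration of the midpoint method and the peak viral load categories described above, the following Python sketch may help. It is not the authors' code; the function names and example values are hypothetical, and it assumes that non-integer Ct values between 29 and 30 fall in the medium category (the text gives the bands only as >=30, 20-29 and <20).

```python
from datetime import date


def midpoint_shedding_days(prev_negative, first_positive, last_positive, next_negative):
    """Shedding duration in days under the midpoint method.

    Shedding is taken to start midway between the previous negative and the
    first positive sample, and to end midway between the last positive and
    the subsequent negative sample.
    """
    start_offset = (first_positive - prev_negative).days / 2
    observed = (last_positive - first_positive).days
    end_offset = (next_negative - last_positive).days / 2
    return start_offset + observed + end_offset


def peak_viral_load_category(ct_values):
    """Categorise an episode by its lowest Ct value (lower Ct = higher load)."""
    peak_ct = min(ct_values)
    if peak_ct < 20:
        return "high"
    if peak_ct < 30:  # assumption: 20 <= Ct < 30 counts as medium
        return "medium"
    return "low"


# Hypothetical episode: negative 1 Mar, positives 4-8 Mar, negative 12 Mar.
duration = midpoint_shedding_days(
    date(2010, 3, 1), date(2010, 3, 4), date(2010, 3, 8), date(2010, 3, 12)
)  # 1.5 + 4 + 2 = 7.5 days

category = peak_viral_load_category([33.1, 24.6, 28.0])  # lowest Ct is 24.6
```

An episode with a single positive sample has an observed span of zero days, so its estimated duration comes entirely from the two half-gaps to the neighbouring negative samples.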

Baseline characteristics
A total of 483 individuals from 47 households had NPS collected over the six-month period. The mean number of household members was 10.5 (SD=6.5), with households classified as small (4-7 members), medium (8-16 members) or large (17-37 members). The median age of participants at the start of sampling was 10.7 years (IQR: 4.0-23.4). The cohort had 214 (44.3%) male participants. Of the 47 study infants, the average age at the start of the study was 3.9 (SD=2.6) months and 22 (46.8%) of the infants were males. A total of 16,918 NPS from 483 individuals were successfully tested for OC43, 229E and NL63. The median number of NPS collected from study participants was 41 (IQR: 30-44).

HCoV infection episodes
The pattern of shedding of each of the three hCoV types, and of all hCoVs, is displayed in Figure 1. [...] and 105 (16.7%) of OC43, NL63, 229E or any hCoV infection episodes, respectively, were either right- or left-censored and were excluded from the survival analysis.
On average, the peak viral load of the individual infection episodes was higher in symptomatic compared to asymptomatic episodes (Figure 2).

Reinfection
Details of reinfections in the study cohort are given in Table 3.

Transmission of hCoV in households
All 47 households had at least one of the three hCoVs detected, while hCoV-OC43, NL63 and 229E were detected [...]. The risk of generating a secondary case after introduction of any of the three endemic hCoVs in the household was higher for index cases whose peak viral load was medium (aOR=5.29, 95% CI: 2.34-11.96, p-value <0.001) or high (aOR=8.12, 95% CI: 2.92-22.51, p-value <0.001) compared to those with a low peak viral load. However, being a symptomatic index case was not associated with increased risk of infecting other members of the household (aOR=0.97, 95% CI: 0.42-2.21, p-value=0.933) compared to asymptomatic index cases (Extended data: Supplementary Table S3).
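
Readers who wish to sanity-check reported odds ratios against their confidence intervals can use a short Python sketch like the one below. It assumes the intervals are Wald-type, that is, symmetric around the estimate on the log-odds scale; the source does not state this explicitly, and the function name `log_scale_summary` is hypothetical. Under that assumption, the geometric mean of the two bounds should recover the point estimate, and the width of the interval on the log scale gives the implied standard error.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_scale_summary(lower, upper, z=Z_95):
    """Return (implied point estimate, implied SE of the log odds ratio).

    Assumes a Wald-type interval: exp(beta - z*SE) to exp(beta + z*SE).
    """
    log_lower = math.log(lower)
    log_upper = math.log(upper)
    point = math.exp((log_lower + log_upper) / 2)  # geometric mean of the bounds
    se = (log_upper - log_lower) / (2 * z)
    return point, se


# Intervals reported in the text for medium, high and symptomatic index cases.
medium_point, medium_se = log_scale_summary(2.34, 11.96)
high_point, high_se = log_scale_summary(2.92, 22.51)
symptomatic_point, symptomatic_se = log_scale_summary(0.42, 2.21)
```

For the three intervals quoted above, the implied point estimates agree with the reported aORs of 5.29, 8.12 and 0.97 to within rounding, which is consistent with the stated assumption.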

Discussion
Longitudinal studies of households have played an important role in developing understanding of the epidemiology of respiratory viruses 7,14 . Here we continue this approach, reporting an intensive surveillance of 483 household members in rural coastal Kenya 11 , to delineate the natural history of infection and transmission patterns of three endemic coronaviruses (OC43, NL63 and 229E). This involved the application of sensitive molecular diagnostic methods 7,14 , and additionally applied sampling that was frequent and irrespective of observed symptoms 8 . The hCoV types were common in this setting with each of the 47 households, and about 72% of the enrolled household members, experiencing infection with at least one of three targets over the six months of the study. A note of caution in interpreting the results of this study is that infection status determined by PCR assay is not necessarily indicative of active infection or an individual's infectiousness.
Crude attack rates were highest for hCoV-OC43 and lowest for 229E, higher in general for younger age classes (<15 years of age), school-age children and for males. These results are broadly consistent with the findings by Monto et al. who also found highest incidence for OC43, lowest for 229E, and higher incidence among those aged below 5 years for NL63 and OC43 7 .
The three hCoV types had differing durations of shedding ranging from 3.5 days (229E) to 7.5 days (OC43). However, these median time estimates are influenced by our sampling frequency: predominantly every 3-4 days. The duration of shedding was longer in episodes with high peak virus load and which were symptomatic. Consistent with findings from other studies, we report occurrence of hCoV infection episodes among asymptomatic individuals 15-17 who had lower viral load 18 and shorter durations of virus shedding compared to symptomatic episodes. Despite asymptomatic infections being predominant (>70% of episodes) the above findings suggest they were less likely to transmit infection compared to symptomatic individuals. The duration of symptomatic episodes was related to peak virus load as reported elsewhere [25] and tended to decline with increasing age.
Participants of all ages had appreciable risk of infection for the three endemic viruses, suggesting previous infection does not provide solid immunity. This is supported by our observation that, within the short period of the study, reinfections were common and as frequently of homologous as heterologous type. Overall, 20% of individuals with a first infection of one or other type were reinfected by the same type at least once, most commonly for type NL63 (24.5%). Homologous reinfections were frequently (>30%) symptomatic. We report no difference in the proportion of symptomatic cases between the first episodes and reinfection episodes, and note that the time to homologous reinfection was similar to the time to heterologous reinfection (~40 days). Our observations indicate that immunity to reinfection is commonly short lived and does not appear to be type specific. A recent serological study involving 10 adult men detected reinfections from seasonal coronaviruses, but most frequently occurring after an interval of 12 months 19 . A limitation of our analysis is that reinfections might in fact have been prolonged shedding from a single infection. This is likely not a major effect, as in most presumed reinfections (>70%) there were at least 4 PCR test negative results between episodes.
Older children (siblings and cousins) and other adults were the major introducers of hCoV into the household, whereas for RSV transmission in the same households older children (>32%) were the leading primary cases 11 . Similarly, children have been reported to form the highest proportion of index cases in the USA and UK 7,20 . However, presence of older adults, children, smokers and individuals with chronic ailments within the households in the UK study was associated with increased household transmission 20 . Secondary transmission of hCoV to other household members upon introduction was high (48%) for any of the three hCoVs (ranging from 39% to 62% across types). This differs from a recent study in England which concluded that the vast majority (>90%) of observed hCoV infections were acquired outside the household 20 . In our study, the risk of secondary transmission was higher among index cases with high viral loads. Interestingly, there was no significant association between the presence of symptoms among index cases and the risk of secondary transmission, as observed elsewhere 20 .
In conclusion, endemic coronaviruses are common within the household setting, infecting all age groups, and often without eliciting symptoms. Secondary transmission following household introduction is associated with viral load but not, it appears, with symptomatic status, and homologous reinfection is common for all hCoV types.

Dongsheng Hu
School of Public Health, Shenzhen University Health Science Center, Shenzhen, China
[...] on in the discussion but I feel that this could be elaborated.
Results: in baseline characteristics, for gender references 'male' is referred to when these were a minority in the study set. The majority population (female) should instead be referred to unless gender was unavailable/not disclosed for some subjects.

7. Results: The sentence 'Symptomatic individuals contributed to 240…of samples that tested negative' is confusing to read and I am not really sure what this is saying; please rephrase for clarity.

8. Results: There are several places where 'any HCoV type' is referred to, but I am not sure whether this is referring to specifically mixed infections; in some cases, it seems to be. Can this be defined somewhere?

9. Results: Reinfection data describe occasions where individuals tested positive in sequential tests, but these were categorised as reinfection. On what basis was that judgement made? It seems counterintuitive.

12. Results: Reinfection data describe the frequencies of reinfection with each of the three HCoVs studied; do these frequencies differ from the overall frequencies?

13. Discussion: The abstract mentions SARS-CoV-2, but no parallels are made in the discussion; either take this out of the abstract or make an interpretation in the discussion.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Partly
Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: RNA virology
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.