Revealing the extent of the COVID-19 pandemic in Kenya based on serological and PCR-test data

Policy makers in Africa need robust estimates of the current and future spread of SARS-CoV-2. Data suitable for this purpose are scant. We used national surveillance PCR test, serological survey and mobility data to develop and fit a county-specific transmission model for Kenya. We estimate that the SARS-CoV-2 pandemic peaked before the end of July 2020 in the major urban counties, with 34 - 41% of residents infected, and will peak elsewhere in the country within 2-3 months. Despite this penetration, reported severe cases and deaths are low. Our analysis suggests the COVID-19 disease burden in Kenya may be far less than initially feared. A similar scenario across sub-Saharan Africa would have implications for balancing the consequences of restrictions with those of COVID-19.


Main Text:
The potential risk from SARS-CoV-2 to Africa was identified early in the global pandemic (1). As the epicenter of transmission moved from East Asia to West Asia and Europe and then to North America, there was speculation as to the likely impact of the pandemic on the African continent with its young populations, infectious disease burden, undernutrition and fragile health infrastructure. However, as health systems and economies of high-income countries strained, the reported burden of COVID-19 cases and associated deaths in Africa remained low (2). Even today, other than for South Africa, this remains true throughout sub-Saharan Africa. The question is whether this is the result of lower risk due to demographic structure (young age (3)) or cross-reacting immunity (e.g. pre-existing SARS-CoV-2 crossreactive T cells (4)) a low reproduction number from imposed interventions (such as school closures and lockdowns (5)) or environmental conditions (e.g. temperature and humidity (6)), or under-reporting. The reason this remains a conundrum is, at least in part, a paucity of good quality data to reveal the probable extent of SARS-CoV-2 spread in African populations.
Following the first confirmed COVID-19 case in Kenya on 13th March 2020, the Kenyan Government moved rapidly, closing international borders, schools, restaurants, bars and nightclubs, banning meetings and social gathering and imposing a dusk to dawn curfew and movement restrictions in the two major city counties, Nairobi and Mombasa, considered epidemic hotspots at the time (7). The major concerns from unmitigated spread were a limited surge capacity of the Kenyan health system (8) and groups of the Kenyan population identified as potentially highly vulnerable to infection and/or severe disease (9). Throughout the months of April, May and into June 2020 few people in Kenya were reported SARS-CoV-2 test positive by polymerase chain reaction (PCR), or severely diseased or dying with COVID-19 as the established cause (10). There followed a relaxation of some measures in June and July including opening of restaurants and places of worship and the removal of travel restrictions into and out of Mombasa and Nairobi counties. As of 10 th August 2020, there have been 23,873 laboratory-confirmed positive swab tests out of over 320,000 tests, and 391 deaths with a positive test result in Kenya. This should be compared with the 200-250,000 cases and 30-40,000 deaths attributable to SARS-CoV-2 for similar sized countries in Europe (France, Italy, UK) at equivalent months into their epidemics.
The reason for this apparently low level of COVID-19 disease in Kenya is unknown, and non-reporting is a potential explanation. However, two pieces of information suggest an explanation that SARS-CoV-2 has spread extensively with a smaller proportion than expected having severe disease. First, a regionally-stratified seroprevalence study of 3098 Kenyan blood donors sampled between May and June reported a national estimate of 5.2% (adjusted to reflect the population distribution by age, sex and region) (11). Seroprevalence was higher in Nairobi (8.5%) and Mombasa (9.3%). These levels of seropositivity are comparable to those reported in May in the UK (12), April/May in Spain (13), and March/April in some US cities (14), where in contrast to Kenya, high numbers of PCRpositive cases, hospitalizations and deaths have also been reported. Second, early in August 2020, we noticed that test-positive PCR cases were declining in Mombasa, the second largest city in Kenya. This went counter to evidence of increased mixing, and hence reproduction potential, arising from Google Mobility data for Mombasa showing a steady reversion in mobility towards pre-COVID-19 intervention levels from early April (Fig. S1).
We developed a simple SEIR compartmental mechanistic and data-driven transmission model for Kenya, which integrates three sources of longitudinal data: national time series PCR tests, the Kenyan serological survey and Google mobility behavioural data (15). Our aim is to derive a coherent picture of SARS-COV-2 epidemiology in Kenya and reveal the historic and future patterns of spread across the country and by county. This approach allows us to generate a crude estimate of infection-to-fatality ratio (IFRcrude) between estimated infections and observed deaths with a positive test result; reported deaths are not used as primary data for inference, but rather as a validation data set for model predictive accuracy (see supporting information for description of model validation).
As at 10 th August 2020, a substantial proportion of PCR positive tests have been samples from the capital Nairobi (10,575 positive tests), and Kenya's second largest city, Mombasa, has reported the next highest number of PCR positive tests (1,962). We find that the rate of new infections peaked on May 31st 2020 (CI May 25th -June 15th) in Mombasa and July 10th 2020 (CI July 2nd -July 19th) in Nairobi, and is now declining (Fig. 1H,G). The model suggests that the PCR test and serology data can be explained by the initial presence of <10 infected individuals in Mombasa and <50 in Nairobi on 21st February, three weeks before the first reported case in Kenya. Thereafter, growth of transmission was rapid in both counties.
In early March, the reproductive ratio was 2.23 and 2.01 in Mombasa and Nairobi, . CC-BY-NC 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20186817 doi: medRxiv preprint respectively, and the doubling-time estimates were 3.85 and 4.55 days, respectively. After March, the transmission curves flattened substantially. By April 16 th , the reproductive ratio was 1.18 and 1.04, respectively and the doubling times were 27.7 and 93.5 days, respectively. This change is consistent with the introduction of containment measures by the Kenyan government, and evidence of substantial reduction in mobility (see Google Mobility data Fig.   S1). From late April through May and June and into July the evidence suggests movement restrictions became steadily less effective ( Fig. 2; Fig.S1). The waning effectiveness of movement restrictions results in an inferred increase in R and an increased rate of epidemic growth; however, in Mombasa and Nairobi we predict that reduction in susceptibility of the population had already caused the effective reproductive ratio (Reff; the mean number of secondary cases accounting for reduced susceptibility) to fall significantly below the basic R value (Fig. 2).
By accounting for the delay of an average of 19 days between infection and death (supporting information for details on infection to death distribution) we find the transmission curve, estimated from PCR tests and serology, generates a good prediction of the observed daily deaths in Nairobi and Mombasa (Fig. 1 E&F). We emphasize that we did not use daily death data in transmission model inference, therefore the good fit to the observed trend in deaths with a PCR-confirmed test result represents an out-of-sample validation of the modelling (16). We predict the peak of positive PCR test samples occurred at the end of July or early August in Nairobi and earlier, mid-June, in Mombasa. The lag between transmission peak and positive swab testing peak being explained by both the delay between infection and becoming detectable by PCR, and the period after an infected individual has ceased being actively infectious but remains detectable by PCR (17) (Fig. 1 A&B). As of the beginning of August 2020 we estimate that about 28.5% (CI 17.3%-38.9%) of the Nairobi population, and 27.1% (CI 19.2-37%) of the Mombasa population would be serologically positive with SARS-CoV-2, (Fig. 1 C&D). This estimated level of seropositivity is substantially higher than has been estimated in countries that have been hit hard by the pandemic (12)(13)(14) however they are in broad agreement with a recent study in Niger state, Nigeria (18). We caution over-interpreting the forecasts of seroprevalence; the projections could be markedly lower if we assumed significant decay in specific antibody over the timescale of observation of a few months. Importantly, our projections of the proportion exposed are not conditional on the assumption of waning seropositivity. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20186817 doi: medRxiv preprint Accounting for the sensitivity of the serological assay, and the delay between infection and seroconversion, we estimate that the actual exposure of the population to SARS-CoV-2 by August 1st was 40.9% (CI 24.3%-54.7%) in Nairobi and 33.8% (CI 23.7%-46.5%) in Mombasa ( Fig. 1 C&D). Such levels of population exposure are predicted to be associated with herd-immunity in these urban populations given the estimated reproductive numbers and effective population size at risk of exposure (Peff) due to heterogeneity in the susceptibility, transmissibility and social interactivity in the population (supporting information for more details on effective population size in transmission modelling); the effective population size for Nairobi was inferred as 77.0% of actual population size (CI 45.4%-98.8%), for Mombasa 56.4% (CI 39.6%-76.8%). These low herd immunity estimates rest upon inferred variation in risk across the population and reduced reproductive potential due to present interventions.
There remains a possibility of future increase in transmission if population mobility continues to rise, if population mixing patterns alter leading to changed risk heterogeneity or if immunity is short lived, leading to a rebound in reported cases. It is possible that nonantibody mechanisms of immunity (for instance T cells (3)) also contribute to apparent herd immunity at low serological prevalence.
The fitted IFRcrude values for both Nairobi (IFRcrude = 0.014% (CI: 0.010%-0.023%) ) and Mombasa (IFRcrude = 0.02% (CI:0.014%-0.028%)) are substantially lower than the ageadjusted IFR expected for Kenya under full ascertainment (IFRverity = 0.26%; (19) and supporting information). This is a crude observational value for the infection to fatality ratio, since we do not currently have an estimate of the reporting bias of deaths of individuals infected with SARS-CoV-2. Therefore, our estimate of IFRcrude potentially reflects lower detection in Kenya compared to China, as well as any lower mortality risk due to fewer comorbidities. We extended our model-based inference to each of the 47 counties in Kenya (see dataset S1 for parameter estimates, peak time estimates and IFRcrude estimates for each county). For counties with enough data we replicated the inference approach used for Nairobi and supporting information for details on how counties were matched to similar counties). We find that, in addition to the two main Kenyan city counties, the rate of new infections in semiurban counties neighbouring Nairobi (Kiambu, Kajiado, and Machakos) and Mombasa is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20186817 doi: medRxiv preprint (Kwale), and the Ugandan border county Busia, also peaked before August 2020; Kwale county (neighbouring Mombasa on the Tanzanian border) is predicted to have been the first to peak (posterior mean of peak time May 21 st 2020). Note, these early peaks in Kwale and Busia counties might be an artifact of cross-border importations and a testing strategy skewed towards a highly mobile population (i.e. truck drivers). However, the infection rate is predicted to still be rising in other more rural counties, most notably in counties constituting the former Rift valley, Eastern and North Eastern provinces. We forecast that the final county to reach peak infection rate will be the arid and sparsely populated Wajir county (posterior mean 24th September 2020; Fig. 3A).
Because of the lag between infection and the observability of the infected person (whether by swab PCR test, serology test, or death), we estimate that both daily PCR positive test detections and daily observed deaths attributed to COVID-19 across all of Kenya were peaking in early August 2020. Accessing complete data on hospitalisation rates in Kenya has been challenging. However, sentinel clinical surveillance from county hospitals suggests a modest increase in adult admissions trending towards pre-COVID numbers in June and July 2020 but, on average, lower than observed counts from at a similar period in 2018 and 2019 (20). Our prediction is that rates of new PCR positive tests and deaths attributed to COVID-19 will enter a long-tailed decline from August 2020 onwards; that is, the rate of epidemic decline will be less steep than the rapid growth phase as the decrease in the main cities is partially off-set by growth in the rest of Kenya (Fig. 3 B&C). It is likely hospitalisation incidence will follow a similar trend to the infection and death rates. By fitting an IFRcrude value for each county in Kenya, we expect that by the end of December 2020 less than 1000 deaths in Kenya will be cumulatively attributed to COVID-19 (Fig. 3C). It should be noted that our projections are based on an assumption that the PCR testing rate in Kenya is either increasing or flat in each county, if the testing rate declines then we expect the rate of PCR positive tests to decline even more steeply than predicted in this study.
Our modelling analysis provides a coherent account of the SARS-CoV-2 pandemic in Kenya.
Whilst the full picture of the epidemiology in Kenya will not be established until causespecific mortality data become available (e.g. from resumption of Demographic Surveillance System and verbal autopsy activities), out model, fitted to three sources of nationwide longitudinal data, suggests that the number of COVID-19 cases and the mortality risk attributable to the SARS-CoV2 epidemic are substantially lower in Kenya than in Europe and the USA. Given that the model suggests 34-41% of the urban population have already been is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20186817 doi: medRxiv preprint exposed, and that the epidemic has peaked during a period of relatively little restrictions and physical distancing, it seems likely that a significant factor in the epidemic resolving is through population immunity. If this trend continues, then the direct and indirect health and socio-economic consequences of control measures need to be balanced carefully as we obtain more accurate estimates of the consequences of disease. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20186817 doi: medRxiv preprint  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20186817 doi: medRxiv preprint

Fig. 2. Estimated basic and effective reproductive numbers in Kenya since
Feb 21st 2020. The posterior mean reproductive number for Nairobi (red curves), Mombasa (green curves), and the averaged over reproductive number estimates for all other Kenyan counties (blue curve). Shown are both the basic reproductive numbers (expected secondary infections in a susceptible population adjusted for mobility changes since the epidemic start; solid curves), their near-term projections (dashed curves), and effective reproductive numbers (expected secondary infections accounting for depletion of susceptible prevalence in the population; dotted curves). The effective reproductive number varied significantly from county to county and is not shown. Restrictions aimed at reducing mobility in risky transmission settings (black dotted lines) are labeled in groups. The chronologically ordered restrictions in each group are: 1) First PCRconfirmed case in Kenya, suspension of all public gatherings, closure of all schools and universities, and retroactive quarantine measures for recent returnees from foreign travel, 2) suspension of all inbound flights for foreign nationals, imposition of a national curfew, and regional lockdowns of Kilifi, Kwale, Mombasa and Nairobi counties, and 3) additional no-movement restriction of worst affected areas within Mombasa and Nairobi, and, closure of the border with Somalia and Tanzania. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. .  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. The views expressed in this publication are those of the author(s) and not necessarily those of any of the funding agencies. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. Data and materials availability: The full Kenyan SARS-CoV-2 line list contains sensitive personal information that could potentially allow the identification of individual cases. The analysis performed in this study only required a highly aggregated dataset derived from the Kenyan linelist. Other data used in this paper was openly available. All data is available in the main text or the supplementary materials. The code base underlying the analysis is accessible at the open github repository https://github.com/ojal/KenyaSerology.

Figures S1-S4
Tables S1 Dataset S1-S6 . CC-BY-NC 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20186817 doi: medRxiv preprint

Data description
We use three daily time series as sources of data to infer model parameters for each Kenyan county: • PCR positive swab samples by collection date, ( + ) , denoting the number of observed on day n in county c. The county-specific time series of PCR positive swabs was derived from the Kenya linelist by identifying cases with reported PCR positive swabs in that county by their date of sample collection (or lab confirmation date if collection date was not available). We excluded positive swabs that were traced either at entry into Kenya or due to being a contact of an identified case, so as to focus on Kenyan population surveillance.
• Numbers of sero-positive ( + ) , and sero-negative ( − ) , blood samples collected from regional centres of the Kenyan National Blood Transfusion Service (KNBS) on day n originating from county c.
• Daily estimates of relative human mobility , compared to a baseline of the same date in the previous year (2019) derived from Google mobility trends (15).
• Deaths with a positive PCR-confirmed test by date of death for each county, ( + ) .
The ( + ) , time-series ran between 13th March and 10th August, however we removed data from 7th until 10th August because of delay between the sample date and the positive swab test being confirmed and added to the Kenyan linelist. The ( + ) , and ( − ) , data was collected in May and June. Residual blood samples for serology were obtained from regular blood donors attending 4 regional KNBTS centres (Mombasa, Nairobi, Eldoret and Kisumu). The study methodology is fully described in Uyoga et al (11).
We assumed that changes in trends in SARS-CoV-2 transmission in Kenya were due to changes in the underlying population mobility. In particular, by changing frequency of indoor congregations. Therefore, we calculated , as the average change in baseline mobility over the "retail and recreation", "grocery and pharmacy", "transit stations", and, "workplaces" settings (Google defined categories), and also over the week prior to day n, in order to average over weekend effects. Due to incomplete data, and the likely bias introduced by using a mobility estimate derived from smartphone users in predicting the mobility of semi-rural populations outside of the major urban conurbations in Kenya, we consider only three areas: Nairobi, Mombasa and the pan-Kenyan aggregate (Fig. S1).

Transmission model
The dynamics of transmission were assumed to follow a simple SEIR transmission model with an effective population size parameter ( ) (21). The parameter accounted for the effect of population heterogeneity in lowering the proportion of the total population that must become immune for incidence to start decreasing, due to depletion of susceptibles rather than increased social distancing, compared to the prediction of a fully homogeneous model. In homogeneous SEIR models both the early exponential growth rate in incidence and the proportion of the total population that must become immune for incidence rate to start decreasing, e.g. "herd-immunity", can be determined from the reproductive number 0 and . CC-BY-NC 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20186817 doi: medRxiv preprint the mean durations of latency and infectiousness (22). For heterogeneous models of transmission, where potentially different at-risk groups are at different risk of contracting the infectious pathogen and have different infectious potential, determining 0 from early growth in incidence aggregated over the different at-risk groups doesn't give sufficient information to estimate the overall proportion of the population required to become immune before achieving herd immunity. This aspect of heterogeneous models of transmission has been widely investigated, for example, in the context of comparing vaccination coverage thresholds for elimination between uniform and targeted vaccination policies (23). In the context of the SARS-CoV-2 pandemic modelling literature, the role of population heterogeneity in lowering the herd-immunity threshold compared to the prediction of a homogeneous population transmission model has again been identified (24, 25). In this study, we have taken a phenomenological approach; the effect of heterogeneity in the population was encoded in the effective population parameter , and this parameter was inferred jointly with 0 .
The rate of infectious contacts per infectious individual, , was given as the time-varying instantaneous reproductive number ( ), that is the number of secondary cases per infected individual assuming both a fully susceptible population and the time-varying mobility rate being fixed at time t (26), rescaled by the recovery rate from infection ( ). The instantaneous reproductive number for a county was assumed to change daily proportional to the change in the mobility rate: when t was in day n.
The "area" of the county was Nairobi or Mombasa for those counties, otherwise the Pan-Kenyan mobility estimate was used. The baseline reproductive number 0 was inferred for each county.
Equation (2) implies the effective (instantaneous) reproductive, that is the instantaneous reproductive ratio where the reduction in susceptibility is also accounted for: = ( )/( ).
In total, the transmission model for each county had four unknowns is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20186817 doi: medRxiv preprint scale ( ), the number of latently infected individuals on 21st Feb ( (0)), and the number of actively infectious individuals on 21st Feb ( (0)).

Observation model
The underlying transmission of SARS-CoV-2 is not observed, rather we have access to PCR positive swab tests and serological tests (positive and negative) aggregated by date and county. Therefore, we developed an observation model that connects unobserved daily transmission rates, which depend on the unknown transmission parameters, to a likelihood of the observed test data. The observation model itself had two unknown parameters per county: the mean detection rate per PCR-detectable individual per day by swab testing, and the clustering factor of these detections. By defining a likelihood-based observation model we gained access to modern techniques in Bayesian inference to infer both the unknown parameters of the underlying transmission process and the unknown parameters of the observation process.
The number of people who would test positive, either as PCR positive or as sero-converted, on each day n depended on: 1) the rate of new incidence on each day s < n, and, 2) the probability that someone who was infected on day s is detectable = − days later. The daily numbers of new incidence ( ) on each day n predicted by the transmission model was, The probability that an infected individual would be determined as having been infected days after infection by either a PCR test or a serology test was denoted, respectively, ( ) and ( ). The maximum sensitivity of the PCR test in a typical Kenyan setting was part of our inference methodology (see below), therefore we treated ( ) as being 0 for the first 5 days post-infection (the mean time between infection and symptoms for symptomatic cases (27)) and then was fitted to data on diagnostic accuracy given in Zhou et al (17), the fitted functional form for PCR-detectability more than 5 days after infection was: Where was the tail distribution function of a Gamma distribution with fitted shape parameter � = 18.4 and fitted scale parameter � = 1.1. This aligns with Zhou et al that the median period to become PCR undetectable after symptoms was 20 days (CI 17-24 days) (17). The lag between symptoms and maximum detectability by serological assays has been reported as 21 days in a recent metastudy of reported diagnostic sensitivities (28). We assumed that, given an additional 5 day lag after infection, the sensitivity of the serological assay increased linearly from 0 at infection to a maximum of 82.5% (11) over 26 days, ( ) = 0.825 /26 for 0 < < 26 days, ( ) = 0.825 for ≥ 26 days.
The number of people in the county who would test positive on each day, with a PCR test ( + ) , and/or a serology test ( + ) , is, therefore, where was the midpoint of day n.
. CC-BY-NC 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20186817 doi: medRxiv preprint was the false positive rate for the serology assay (see table S1). Underlying equation (6) is an assumption that the PCR test is 100% specific to SARS-CoV-2.
The total number of swab samples collected, including negative tests, on each day was not available. The total number of tests performed in coastal counties which were referred to the KWTRP laboratory for PCR was available, and showed considerable variation per day (see Fig S3 for total swab tests per day in Mombasa referred to KWTRP). Moreover, there was variation in the setting at which the swab test was collected, e.g. at hospital, from a walk-up testing facility etc, as the focus of Kenyan public health teams has shifted over the course of the epidemic. Therefore, there was a high degree of variability in the detection probability per PCR-detectable infected individual per day. To account for the high variability in detection, we modelled the distribution of PCR positive swab tests collected on each day using a negative binomial distribution, Where was the mean detection rate per PCR-detectable, per day. Hence, is the expected number of positive swab tests that would be sampled on day n, given ( + ) PCRdetectable infected individuals, and was the clustering coefficient 1 for the sampling process: = 0 recovers a Poisson distribution for the number of positive swab tests collected each day, > 0 allowed the model to infer much greater variance in daily positive than expected from a Poisson distribution (29). It should be noted that the parameter effectively absorbs swab sample biasing, either for or against selecting individuals who have been exposed to SARS-CoV-2, amongst the swab tested subjects along with the sensitivity of the PCR test as performed in realistic situations.
The number of donor blood samples tested by day of sample collection was available, however, the reported uncertainty in the maximum sensitivity of serology assay was fairly high: the posterior mean sensitivity was 82.5% (credible interval 69.6-91.2%; (11)). The posterior uncertainty in the serological sensitivity influenced the confidence the inference method placed on the serological sample data; if the test sensitivity was known to high precision we would treat each days serological samples as a binomial draw from an underlying proportion of seroconverted individuals given by equation (4). Given that the sensitivity of the serological assay was itself an uncertain factor we fitted the posterior uncertainty in the testing sensitivity to a beta distribution: ∼ ( � = 33.6,̂= 7.13). This implied that the appropriate observation model for the number of positive serological samples on day n (( + ) ), out of the total number of serological samples being collected on day n, , = ( + ) + ( − ) , was a Betabinomial distribution, Given an underlying realization of the transmission process the mean number positive serological samples on day n is  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20186817 doi: medRxiv preprint

Parameter inference for each Kenyan County
The observation model gives the following log-likelihood function for the unknown parameters = ( , ) given the sampling data for a county, = {( + ) , ( − ) , ( + ) } =1,2,... : Where and are, respectively, the probability mass functions for the negative binomial and beta-binomial distributions, as described in the observation model subsection. The first day where samples were included in the log-likelihood calculation was 12th April, due to testing being even more irregular before that date (Fig. S2), and the last day was 6th August, due to right-censoring of samples collected on 7th-10th August appearing in the linelist dated 10th August.
We divided the Kenyan counties into two groups: those counties with and without serology samples. The availability of serology data was highly important for this study because without serology data, the detection rate was not identifiable jointly with and the initial conditions using weak priors. Therefore, we inferred unknown parameters in the counties with serology data using weak priors (priors described below), whereas for counties without serology data we used strong priors based on the posterior distribution of the counties that had serology data (method described below). There were two exceptions to the county grouping: Kajiado county, which has no attributed serology samples but a substantial number of PCR-positives, and was added to group of counties fitted using weak priors, and West Pokot county, which had some serology data but small numbers of PCR-positives, and was added to the group using strong priors.
The prior distributions for 0 , , and were the same for each county using weak priors: 0 ∼ ( � = 2, � = 2.5/2), ∼ ( � = 2,̂= 1), ∼ ( � = 3, � = 0.5/3). These priors reflect weak confidence that: 1) the fundamental 0 value in Kenya would be similar to that seen in the early Chinese epidemic, 2) the most apriori likely value of was 1, thus recovering the standard homogeneous SEIR model, and 3) that the daily detected PCRpositive swab data would be moderately overdispersed compared to a Poisson distribution. The prior distributions for and the initial numbers of infected (0) + (0) in counties with weak priors differed by county: • Nairobi.  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20186817 doi: medRxiv preprint The choice of weak priors for the counties with serology data was informed by estimates of urban density, and reflect our belief that the epidemic in Kenya was seeded by international travel to both Nairobi and Mombasa international airports and that there was apriori likely to be a higher number of infected individuals in Nairobi and Mombasa on February 21st. Our prior belief about mean detection rate was informed by openly available testing rate data for Kenya (31) (Fig. S3), which was likely to be mainly concentrated around Nairobi, available data on testing rate in Mombasa (Fig. S2) and a weak belief that detection elsewhere would be about half as effective as in Mombasa.
The prior distributions for all parameters for counties without serology data were much stronger, reflecting that in the absence of significant amounts of data, we expected these counties to have similar parameters to other Kenyan counties. To create the strong priors we pooled the posterior parameter draws of exemplar counties, and then projected this multivariate empirical distribution onto a collection of independent marginal distributions by applying maximum likelihood to each set of parameters (each parameter was assumed to have a Gamma marginal distribution except , which was assumed to have a Beta marginal distribution. Therefore, inference in the counties without serology penalised against drawing parameters that were outside the univariate credible intervals of their exemplar county. The exemplars used were: • Isiolo and Nakuru counties. Exemplar counties were Kajiado and Machakos.
• Baringo, Bomet, Elgeyo Marakwet, Embu, Kakamega, Kirinyaga, Laikipia, Mandera, Meru, Murang'a, Nandi, Nyandarua, Samburu and Tharaka Nithi, West Pokot counties. Exemplar counties were Homa bay, Tana River, Turkana, and Wajir counties. The choice of exemplar counties was based on Kajiado and Machakos containing a mixture of urban and rural dwelling broadly similar to Isiolo and Nakuru counties. Whereas, the other Kenyan counties lacking serology data were predominantly rural. Therefore, we used a mixture of four rural exemplar counties distributed over the country.
for each county using the NUTS-HMC sampler implemented by the Julia language package dynamicHMC.jl. Solving the likelihood function for a proposed value of involved solving the ODE system (2), we used the highly performant DifferentialEquations.jl package for ODE solutions (33). The HMC method required a log-likelihood gradient, , which, for our use-case of a small ODE system with a low number of parameters, was most efficiently supplied by forward-mode automatic differentiation (34) implemented by the package ForwardDiff.jl.
The MCMC chain converged for each county (all MCMC chains and MCMC diagnostics can be accessed through the linked open code repository). The posterior mean (and 95% Cis) for each parameter, as well as date of maximum infection rate and estimated county-specific IFR, can be found in supplementary Data S1. West Pokot county (1 positive PCR-confirmed test and 2/17 positive serology samples) was an outlier in the analysis in that the inference converged on a posterior mean estimate of 0 which was less than one.
Nowcasting and Forecasting for Kenyan counties . CC-BY-NC 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20186817 doi: medRxiv preprint The parameter draws from the posterior distribution defined the uncertainty in our model nowcasts and forecasts for each county, since the underlying transmission model was deterministic. Therefore, posterior distributions for epidemic quantities were created by simulating the epidemic for each ( ) , = 1,2, . . . ,10 4 posterior draws, for example the draws from the distribution of day of peak infection rate were, The central estimate is the posterior mean, [ | ] = 1 10 4 ∑ ( ) , and the 95% credible intervals were the 2.5% and 97.5% empirical quantiles of the draws from the day of peak. This approach was used through-out the main document. It should be noted that for the daily positive PCR tests we showed both the posterior mean and CIs for the mean daily detection rate of new PCR-postive swab tests, [ | ], and the posterior prediction interval, [( + ) | ] (see equation (7)). The two quantities have the same posterior average, but ( + ) | has substantially higher variance since it includes both uncertainty in the parameter estimates and the fundamental variance in the observed daily detection of positive swab tests.

Inferring a crude infection fatality ratio
The commonly used infection fatality ratio (IFR) by age estimates from Verity et al (19), weighted by the Kenyan population distribution given by the 2019 census, implied a basic IFR prediction in Kenya of = 0.264%; that is 264 deaths per 10,000 infections. This assumed an uniform attack rate across age-groups in Kenya.
We used the posterior predictions of the underlying daily infections in Kenya counties to infer a crude infection fatality ratio ( ) for each Kenyan county. The lag between infection and death, for those infected individuals who die, was defined as the convolution between three time duration distributions: 1. The duration of time between infection and symptoms (days), which we assumed was distributed ( � = 1.644, � = 0.363) (27). 2. The duration of time between initial symptoms and severe symptoms (days), sufficient to seek hospitalisation, which we assumed was distributed (1,5) (35). 3. The duration of time between severe symptoms and death estimated from UK hospital data (36). This was an empirical distribution with mass function . We discretized the first two distributions to give probability mass functions for the number of days between infection and symptoms, and for the number of days between symptom onset and severe symptom onset. The probability mass function for the (discrete) number of days between infection and death, for those who died, , was given as a discrete convolution over these probability mass functions: ( ) = * * ( ) for the probability that death occurs days after infection.
The most likely lag between infection and death was 14 days, however, the distribution was fairly heavy-tailed with mean lag between infection and death 19 days (Fig. S4).
We assumed that the number of deaths observed each day were Poisson distributed, and accounted for the lag between infection and death, . CC-BY-NC 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. .
The conditionality on is given to emphasise that the number of infections per day depended on the unknown parameters for the county. Using | ∼ (̂= 1/ ) as a weak prior for each of the county specific estimates of the crude IFR observed in Kenya, conditional on a realization of , we found that the posterior mean estimator took a simple form (see Posterior distribution of crude infection fatality ratio in supplementary text below): Similar to equation (11) Model refinement to account for increased test rate at the capital The publicly available ourworldindata.org coronavirus dataset (31) showed that the daily number of tests in Kenya increased from 29th March at an approximate rate of 1.6% per day relative to the overall mean tests per day (mean 2482 tests per day in Kenya between 29th March -3rd August; Fig. S3). This increase in testing was not reflected in increased testing at KWTRP testing laboratory (Fig. S2), therefore, we hypothesised that the increased rate of testing in Kenya was being driven by increased testing in Nairobi and the surrounding counties of Kajiado, Kiambu and Machakos. To test this hypothesis we considered an alternate model for detection where is calendar day dependent: The baseline mean detection rate remained the target of inference. The same period for inference (12th April -6th August) was used.
Incorporating the increased testing rate into the inference model improved the model information criteria relative to the basic hypothesis that mean detection rate had been constant since 12th April. This was observed using two different model information scores: the popular in-sample Bayesian model fit score, Deviance information criterion (DIC; (16)) defined as , and the out-of-sample log predictive density (LPD; (16)) for deaths. The LPD for a model was defined as, The LPD measure is the sum log probability of observing the death data for the county, ( | ) was the probability mass function for a Poisson distribution with mean , the notation ( ) emphasises that the number of infections on day m depends on the k th draw of the parameters. Although the IFR was optimised to fit the death data, we also wish to emphasise that the deaths were not used to infer other model parameters; this justified our description of LPD as an out-of-sample metric. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. The average value for the posterior P-value over the counties was = 0.66 (minimum-maximum spread: 0.12-0.85, Data S1), therefore, the model was typically more accurate at predicting the daily number of deaths actually observed than replicated data. This might be caused by deaths occurring with lower variance than predicted by a Poisson distribution.

Notation for distributions used in this study
In this study, we have used a number of parameter symbols that are also the most commonly used symbols for various common parametric distributions. Moreover, these parametric distributions are used in the underlying analysis frequently with their distribution parameters defined as functions of underlying transmission model states. To reduce misunderstanding reserve symbols with "hats" as referring to the parameters of a parametric distribution and use "=" to refer to the value of the parameter. Find below the choice of parametrization for the parametric distributions used in the study: • Exponential distribution.
(̂= ), with mean . is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20186817 doi: medRxiv preprint

Posterior distribution of crude infection fatality ratio
Here we give details of the posterior distribution of . It is well known that the exponential distribution is a conjugate prior to the mean of a Poisson distribution; that is that given a model where data ∼ (̂= )for = 1, . . . , , and a prior ∼ (� = 1/ 0 ), then the posterior distribution of is gamma distributed with an analytical solution: | ∼ ( � = 1 + ∑ , � = 1/( 0 + )).
We demonstrate that the posterior crude infection fatality ratio, conditional on some fixed value of the parameters , | ( + ) 1 , . . . , ( + ) , , and with the prior distribution | ∼ (̂= 1/ 0 ) is also gamma distributed. This holds despite the mean number of deaths changing daily, and the posterior distribution only depends on the total number of observed deaths and the cumulative number of time delayed infections. First, we define the time delayed infections on each day using the lag distribution [equation (12)], � = ∑ ( − ) < . We define the total time delayed infections, ̃= ∑̃, and the total deaths, + = ∑ ( + ) .
Then, after cancelling terms, the posterior distribution of | given the daily deaths is, By identifying the denominator as the ( + ) ℎ moment of an exponential distribution we arrive at, The right hand side of this expression is the density function of a ( � = + + 1, distribution, that only depends on the total deaths and time-delayed infections. Therefore, the posterior mean estimator for the crude infection fatality ratio conditional on the death data and a fixed value of is: is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020.  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020.  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020.   is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020.   is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20186817 doi: medRxiv preprint Table S1. Parameters  Probability that an infected individual would be detectably seropositive on day after infection, ( ) ( ) is linearly increasing over 26 days to saturate at 82.5% sensitivity, based on report delay in seroconversion (28) and maximum sensitivity of serological assay (11). is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020.   is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20186817 doi: medRxiv preprint All datafiles described below can be found in the opendatacsvs/ folder on the github repository.
Data S1. (separate file) Inferred parameters and posterior predictive P-values for each Kenyan county. The posterior mean and 95% credible intervals for each model parameter (both transmission and observation models), with posterior mean and 95% credible intervals for the peak day of infection rate and county-specific infection fatality ratio. Final column is the proportion of synthetic replicated death time series with greater log-predictive density score than the observed death time series for that county (the posterior predictive P-value for the model). is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20186817 doi: medRxiv preprint