Lessons learned and lessons missed: impact of the coronavirus disease 2019 (COVID-19) pandemic on all-cause mortality in 40 industrialised countries and US states prior to mass vaccination

Background: Industrialised countries had varied responses to the COVID-19 pandemic, which may lead to different death tolls from COVID-19 and other diseases.
Methods: We applied an ensemble of 16 Bayesian probabilistic models to vital statistics data to estimate the number of weekly deaths that would have occurred had the pandemic not taken place in 40 industrialised countries and US states from mid-February 2020 through mid-February 2021. We subtracted these estimates from the actual number of deaths to calculate the impact of the pandemic on all-cause mortality.
Results: Over this year, there were 1,410,300 (95% credible interval 1,267,600-1,579,200) excess deaths in these countries, equivalent to a 15% (14-17) increase and 141 (127-158) additional deaths per 100,000 people. In Iceland, Australia and New Zealand, mortality was lower than would be expected in the absence of the pandemic, while South Korea and Norway experienced no detectable change. The USA, Czechia, Slovakia and Poland experienced >20% higher mortality. Within the USA, Hawaii experienced no detectable change in mortality and Maine a 5% increase, contrasting with New Jersey, Arizona, Mississippi, Texas, California, Louisiana and New York, which experienced >25% higher mortality. Mid-February to the end of May 2020 accounted for over half of excess deaths in Scotland, Spain, England and Wales, Canada, Sweden, Belgium, the Netherlands and Cyprus, whereas mid-September 2020 to mid-February 2021 accounted for >90% of excess deaths in Bulgaria, Croatia, Czechia, Hungary, Latvia, Montenegro, Poland, Slovakia and Slovenia. In the USA, excess deaths in the northeast were driven mainly by the first wave, in southern and southwestern states by the summer wave, and in the northern plains by the post-September period.
Conclusions: Prior to widespread vaccine-acquired immunity, minimising the overall death toll of the pandemic requires policies and non-pharmaceutical interventions that delay and reduce infections, effective treatments for infected patients, and mechanisms to continue routine health care.


Introduction
Many industrialised countries experienced a rise in all-cause mortality in the first wave of the coronavirus disease 2019 (COVID-19) pandemic, while others avoided any excess deaths 1 . These excess deaths were due to infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), delays and disruptions in the provision and use of healthcare for other diseases, loss of jobs and income, disruptions of social networks and support, and changes in nutrition, drug and alcohol use, transportation, crime, and violence 2,3 .
The decline in infections following initial lockdowns and other restrictions, and advances in knowledge about SARS-CoV-2 transmission and infection, presented a window of opportunity for countries to implement pandemic control measures and strengthen health and social care provision that would minimise the impacts of subsequent waves [4][5][6][7][8] . Comparative analysis of excess deaths prior to mass vaccination against COVID-19 helps us understand how effectively these measures were implemented and how resilient the health and social care system was in each country. We quantified the weekly mortality impacts of the first year of the COVID-19 pandemic, from mid-February 2020 to mid-February 2021, in 40 industrialised countries, listed below. We used this period because mortality due to the pandemic was negligible before mid-February 2020 1 , and vaccination rates against SARS-CoV-2 were still relatively low before mid-February 2021 in these countries (no more than 4% of the population had received both doses in any of these countries, as per Our World in Data 9 ). After mid-February 2021, the effect of vaccines on mortality was expected to appear in some countries, which should be the subject of a distinct analysis.

Data sources
We included industrialised countries with complete or near-complete registration of deaths in our analysis if:
• Their total population in 2020 was more than 100,000. We excluded countries (e.g., Liechtenstein) with available data but smaller populations because, in many weeks, the number of deaths would be small or zero. This would, in turn, lead either to large uncertainty that would make it hard to distinguish places with and without an effect, or to unstable estimates because the model would be fitted to many weeks with zero deaths.
• We could access up-to-date weekly data on all-cause mortality divided by age group and/or sex that extended through February 2021.
• The time series of data went back at least to the beginning of 2016 so that model parameters could be reliably estimated. For countries with longer time series, we used data starting in 2010.
The 40 countries in our analysis were divided into six geographical regions: the Pacific (Australia, New Zealand, South Korea), the Americas (Canada, Chile, the USA), Central and Eastern Europe (Austria, Bulgaria, Croatia, Czechia, Estonia, Hungary, Latvia, Lithuania, Montenegro, Poland, Romania, Serbia, Slovakia, Slovenia), Southwestern Europe (Cyprus, France, Greece, Italy, Malta, Portugal, Spain), Northwestern Europe (Belgium, England and Wales, Germany, Luxembourg, the Netherlands, Northern Ireland, Scotland, Switzerland) and Nordic (Denmark, Finland, Iceland, Norway, Sweden). In addition to national estimates, we separately estimated excess deaths for all 50 US states and the District of Columbia. Some US states are larger than most countries included in our analysis (e.g., California's population of ~40 million is larger than those of 33 of the countries in the analysis), and the extent and temporal dynamics of the pandemic were heterogeneous across states due to their relative autonomy in policy formulation and implementation.
The sources of population and mortality data are provided in Table 1. We calculated weekly population through interpolation of yearly population, consistent with the approach taken by national statistical offices for intra-annual population calculation 10 . Population for 2020 and 2021, where not available, was obtained through linear extrapolation from the last five years. We obtained data on temperature from ERA5 11 , which uses data from global in situ and satellite measurements to generate a worldwide meteorological dataset, with full space and time coverage over our analysis period. We used gridded temperature estimates measured four times daily at a resolution of 30 km to generate weekly temperatures for each first-level administrative region, and gridded population data to generate population estimates by first-level administrative region in each country. We weighted weekly temperature by population of each first-level administrative region to create national level weekly temperature summaries.
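The interpolation, extrapolation and population-weighting steps described above can be sketched as follows. This is an illustrative Python sketch under our own assumptions about array shapes and function names, not the authors' code:

```python
import numpy as np

def weekly_population(yearly_pop, n_weeks=52):
    """Linearly interpolate consecutive yearly population estimates to weeks,
    as national statistical offices do for intra-annual population."""
    yearly_pop = np.asarray(yearly_pop, dtype=float)
    years = np.arange(len(yearly_pop))
    # One point per week between the first and last yearly estimate.
    week_positions = np.linspace(0, len(yearly_pop) - 1,
                                 (len(yearly_pop) - 1) * n_weeks + 1)
    return np.interp(week_positions, years, yearly_pop)

def extrapolate_population(yearly_pop, n_future=2):
    """Linear extrapolation from the last five years, for years (e.g. 2020,
    2021) where population data were not yet available."""
    recent = np.asarray(yearly_pop, dtype=float)[-5:]
    slope, intercept = np.polyfit(np.arange(len(recent)), recent, 1)
    future_x = np.arange(len(recent), len(recent) + n_future)
    return slope * future_x + intercept

def national_weekly_temperature(region_temps, region_pops):
    """Population-weighted national temperature summary.

    region_temps: (n_regions, n_weeks) weekly mean temperature by
                  first-level administrative region.
    region_pops:  (n_regions,) population of each region.
    """
    weights = np.asarray(region_pops, dtype=float) / np.sum(region_pops)
    return weights @ np.asarray(region_temps, dtype=float)
```

For example, two regions with populations in a 1:3 ratio contribute with weights 0.25 and 0.75 to the national weekly temperature.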

Statistical methods
We used a probabilistic model averaging approach to estimate what deaths were expected to be between mid-February 2020 and mid-February 2021 had the pandemic not occurred, and compared these estimates with actual deaths from all causes in each country. The analytical method was designed to enhance comparison across countries and over time, and account for medium-long-term secular trends in mortality, the potential dependency of death rates in each week on those in preceding week(s) and in each year on those in preceding year(s), and factors that affect mortality including seasonality, temperature and public holidays.

Amendments from Version 1
Based on the helpful comments from reviewers, we have added results on US states in the abstract, reorganised the text, included additional citations, and done a sensitivity analysis on how different models are weighted. We have also updated the section Comparison with other estimates to include the most recently published results from other sources.
Any further responses from the reviewers can be found at the end of the article.

Table 1. Sources of data on deaths and population.

[Table 1, not reproduced here, lists for each country the source of death data (e.g., Australia: ABS), the start of the time series, and the sex-specific analysis and analysis age groups used (see Methods for details).]

The total mortality impact of the COVID-19 pandemic is the difference between the observed number of deaths from all causes and the number of deaths had the pandemic not occurred, which is not directly measurable. The most common approach to calculating the number of deaths had the pandemic not occurred has been to use the average number of deaths over previous years, e.g., the most recent five years, for the corresponding week or month of the comparison. This approach, however, does not take into account long- and short-term trends in mortality, or time-varying factors, like temperature, that are largely external to the pandemic but also affect death rates.
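For contrast, the simple baseline critiqued above can be written in a few lines. This is a hypothetical Python sketch (the function name and input layout are our own):

```python
import numpy as np

def naive_expected_deaths(history, n_years=5):
    """The common simple baseline: expected deaths for each week of the year
    are the average over the same week in the previous `n_years` years.

    history: (n_years_available, 52) array of weekly deaths, one row per
             preceding year. Ignores mortality trends and temperature.
    """
    return np.asarray(history, dtype=float)[-n_years:].mean(axis=0)
```

This baseline ignores secular mortality trends and year-to-year factors such as temperature, which motivates the model ensemble developed below.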
We developed an ensemble of 16 Bayesian mortality projection models, each of which estimates the weekly death rates that would have been expected if the COVID-19 pandemic had not occurred 12 . We used multiple models because there is inherent uncertainty in the choice of the model that best predicts death rates in the absence of the pandemic. These models were formulated to incorporate features of weekly death rates, and how they behave in the short term (week to week) and medium term (year to year), as follows:
• First, death rates may have a medium-to-long-term trend 13 that would lead to lower or higher mortality in 2020-2021 compared with earlier years. Therefore, all models included a linear trend term over weekly death rates.
• Second, death rates have a seasonal pattern [14][15][16][17] . We included weekly random intercepts for each week of the year. To account for the fact that seasonal patterns "repeat" (i.e., late December and early January are seasonally similar) we used a seasonal structure 18,19 for the random intercepts. The seasonal structure allows the magnitude of the random intercepts to vary over time, and implicitly incorporates time-varying factors such as annual fluctuations in flu season.
• Third, death rates in each week may be related to rates in preceding week(s), due to short-term phenomena such as severity of the flu season. We formulated four sets of models to account for this relationship. The weekly random intercepts in these models had a first, second, fourth or eighth order autoregressive structure 18,19 . The higher-order autoregressive models allow death rates in any week to be informed by those in a progressively larger number of preceding weeks. Further, trends not picked up by the linear or seasonal terms would be captured by these autoregressive terms.
• Fourth and additionally, mortality in one year may depend on mortality in the previous year, in a different way for each month, because phenomena such as seasonal flu may lead to longer-term dependencies in mortality. To allow for this possibility, we used two sets of models, with and without a first-order autoregressive term over years for each month.
• Fifth, beyond having a seasonal pattern, death rates depend on temperature, and specifically on whether temperature is higher or lower than its long-term norm during a particular time of year [20][21][22][23][24][25] . The effect of temperature on mortality varies throughout the year, and may be in opposite directions for different times of year. We used two sets of models, one without temperature and one with a weekly term for temperature anomaly, defined as deviation of weekly temperature from the local average weekly temperature over the entire analysis period.
• Finally, death rates may be different around major holidays such as Christmas and New Year either because of changes in human activities and behaviour or, for the countries whose data are registration based, because of delays in registration. We included effects (as fixed intercepts) for the weeks containing Christmas and New Year in all countries. For England and Wales, Scotland and Northern Ireland, we also included effects for the week containing other public holidays, because reported death rates in weeks that contain a holiday were different from other weeks. This term was tested but not included for other countries because the effect was negligible.
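The temperature anomaly used in the fifth point above is the deviation of each week's temperature from that week-of-year's long-term local mean. A minimal Python sketch (the function name and inputs are our own assumptions):

```python
import numpy as np

def weekly_temperature_anomaly(temps, week_index):
    """Deviation of each week's temperature from the local average for that
    week of the year, taken over the entire analysis period.

    temps:      1-D array of weekly temperatures across the analysis period.
    week_index: 1-D array of the same length with each week's index (1-52).
    """
    temps = np.asarray(temps, dtype=float)
    week_index = np.asarray(week_index)
    anomaly = np.empty_like(temps)
    for w in np.unique(week_index):
        mask = week_index == w
        # Subtract this week-of-year's long-term norm.
        anomaly[mask] = temps[mask] - temps[mask].mean()
    return anomaly
```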
These choices led to an ensemble of 16 Bayesian models (2 yearly autoregressive options × 4 weekly autoregressive options × 2 temperature anomaly options); Table 2 shows which terms appear in each model. Each model relates the logarithm of the weekly death rate to a sum of the terms described above:

log(death rate_week) = α_0 + α_holiday(week) + β⋅week + θ_week + ζ^(i)_week + η_month,year + (γ + ν_week of year)⋅temperature anomaly_week + ε_week

The term α_0 denotes the overall intercept and α_holiday(week) is the holiday intercept, applied to weeks with a holiday. For example, if a week includes the 25 th of December then α_holiday(week) = α_Christmas . For weeks that did not contain a holiday, this term did not appear in the above expression. All intercepts were assigned N(0,1000) priors. The term β⋅week represents the linear time trend; the coefficient β was also assigned an N(0,1000) prior.
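The factorial design of the ensemble (2 × 4 × 2 options) can be enumerated directly. An illustrative sketch, with our own field names:

```python
from itertools import product

# Enumerate the 16-model ensemble: 2 yearly-autoregressive options x
# 4 weekly-autoregressive orders x 2 temperature-anomaly options.
YEARLY_AR = (False, True)        # eta term over years, per month
WEEKLY_AR_ORDER = (1, 2, 4, 8)   # order of the weekly zeta term
TEMPERATURE = (False, True)      # (gamma + nu) * temperature anomaly term

ensemble = [
    {"yearly_ar": y, "weekly_ar_order": w, "temperature": t}
    for y, w, t in product(YEARLY_AR, WEEKLY_AR_ORDER, TEMPERATURE)
]
```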
The models used different orders (first, second, fourth or eighth) of the autoregressive term ζ^(i)_week, with the superscript i denoting the order for weekly mortality patterns. The first-order autoregressive term is defined as ζ^(1)_week ~ N(φ⋅ζ^(1)_week−1, σ²_ζ), where the parameter φ lies between -1 and 1 and captures the degree of association between the number of deaths in each week and the preceding week. Hyperpriors are placed on the parameters κ_1 = log((1−φ²)/σ²_ζ) and κ_2 = log((1+φ)/(1−φ)), which were assigned logGamma(0.001,0.001) and N(0,1) distributions, respectively. Similarly, an i-th order autoregressive term is given by ζ^(i)_week ~ N(φ_1⋅ζ^(i)_week−1 + ... + φ_i⋅ζ^(i)_week−i, σ²_ζ), with −1 < φ_j < 1. The parametrisation of these models was based on the partial autocorrelation function of the sequence φ_j .

The term η_month,year is an autoregressive term of order 1 over years and independent across months, indexed by the month and year to which each particular week belongs. For each month, the autoregressive prior for η_month,year was the same as that for ζ^(1)_week described above. As described above, this term appeared in half of our models.
The term θ_week captures seasonality in mortality trends with a period of 52 weeks. The sums of every 52 consecutive terms, θ_week + θ_week+1 + ... + θ_week+51, were modelled as independent Gaussian with zero mean and variance σ²_θ. We used a logGamma(0.001,0.001) prior on the log precision, log(1/σ²_θ). Each week is assigned an index between 1 and 52 depending on which week of the current year it is (the incomplete week 53 is mapped to either index 1 or 52, depending on whether it has greater overlap with week 52 of the current year or week 1 of the following year).
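The mapping of calendar weeks to seasonal indices 1-52 can be made concrete with Python's ISO-calendar support. A sketch; the tie-breaking rule for week 53 is our reading of the description above:

```python
import datetime as dt

def seasonal_week_index(date):
    """Map a date to a seasonal index between 1 and 52.

    ISO weeks 1-52 map to themselves; the occasional week 53 is assigned
    to index 52 or 1 depending on whether more of its seven days fall in
    the old calendar year or the new one.
    """
    year, week, _ = date.isocalendar()
    if week <= 52:
        return week
    # Week 53: count how many of its days fall in the old calendar year.
    monday = dt.date.fromisocalendar(year, 53, 1)
    days_in_old_year = sum(
        1 for d in range(7) if (monday + dt.timedelta(days=d)).year == year
    )
    return 52 if days_in_old_year >= 4 else 1
```

For example, 1 January 2021 belongs to ISO week 53 of 2020, which had four of its days in 2020, so it maps to index 52.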
The effect of temperature anomaly on death rates is captured by the two terms γ and ν_week of year. The term γ⋅temperature anomaly_week is the overall association between (log-transformed) death rates and temperature anomaly in a week. The term ν_week of year⋅temperature anomaly_week captures deviations from the overall association for each week of the year. It consists of 52 terms with an independent and identically distributed prior, ν_week of year ~ N(0, σ²_ν). Finally, the term ε_week is a zero-mean term that accounts for additional variability; it is assigned an independent and identically distributed prior ε_week ~ N(0, σ²_ε). Table 2 shows the terms included in each of the 16 models in the ensemble.
We used data on weekly deaths from the start of time series through mid-February 2020 to estimate the parameters of each model, which were then used to predict death rates for the subsequent 52 weeks as estimates of the counterfactual death rates if the pandemic had not occurred. These were then compared to reported deaths to calculate excess mortality due to the pandemic. For the projection period, we used recorded temperature so that our projections take into consideration actual temperature in 2020-2021. This choice of training and prediction periods assumes that the number of deaths that are directly or indirectly related to the COVID-19 pandemic was negligible through mid-February 2020 in these countries 1 , and separates the training data from subsequent weeks when impacts may have appeared.
All models were fitted using integrated nested Laplace approximation (INLA) 27 , implemented in the R-INLA software (version 20.03). We used a model averaging approach to combine the predictions from the 16 models in the ensemble 28,29 . Specifically, we took 2,000 draws from the posterior distribution of predicted deaths under each of the 16 models, and pooled the 32,000 draws to obtain the posterior distribution of deaths if the COVID-19 pandemic had not taken place. This approach generates a distribution of estimates with an equal number of samples from each model in the ensemble, and hence incorporates both the uncertainty of estimates from each model and the uncertainty in the choice of model. The reported credible intervals represent the 2.5 th and 97.5 th percentiles of the resultant posterior distribution of the draws from the entire ensemble. We report the number of excess deaths, excess deaths per 100,000 people, and relative (percent) increase in deaths, together with their corresponding 95% credible intervals. For reporting, we rounded numbers of deaths of 1,000 or more to the nearest hundred, to avoid giving a false sense of precision in the presence of uncertainty; numbers below 1,000 were rounded to the nearest ten. We also report the posterior probability that an estimated increase (or decrease) in deaths corresponds to a true increase (or decrease). The posterior probability represents the inherent uncertainty in how many deaths would have occurred in the absence of the pandemic. In a country and week in which the actual number of deaths equals the posterior median of the number expected in a no-pandemic counterfactual, an increase in deaths is statistically indistinguishable from a decrease; in such a situation, there is a 50% posterior probability of an increase and a 50% posterior probability of a decrease.
Where the entire posterior distribution of the number of deaths expected without the pandemic is smaller than the actual number of deaths, there is a ~100% posterior probability of an increase and a ~0% posterior probability of a decrease and vice versa. For most countries and weeks, the posterior distribution of the number of deaths expected without the pandemic covers the observed number, but there is asymmetry in terms of whether much of the distribution is smaller or larger than the observed number. In such cases, there would be uneven posterior probabilities of an increase versus decrease in deaths, with the two summing to 100% (for example, 80% and 20%).
Posterior probabilities more distant from 50%, toward either 0% or 100%, indicate more certainty. We also evaluated the sensitivity of our results to how the different models are weighted. Specifically, in the sensitivity analysis, the number of draws from each model was inversely proportional to the absolute error of prediction in the validation analyses described below. The results of the sensitivity analysis were virtually identical to those with equal draws, with weekly median excess deaths estimates differing by up to 2.4% for individual countries, and by 0.1% when averaged across all countries and weeks.
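The pooling, reporting and sensitivity-weighting rules described above can be sketched as follows. This is illustrative Python with simulated draws, not the authors' R-INLA pipeline, and the function names are our own:

```python
import numpy as np

rng = np.random.default_rng(0)

def summarise_excess(pooled_draws, observed):
    """Median excess deaths, 95% credible interval (2.5th and 97.5th
    percentiles), and the posterior probability that deaths increased."""
    excess = observed - np.asarray(pooled_draws, dtype=float)
    lo, med, hi = np.percentile(excess, [2.5, 50.0, 97.5])
    return med, (lo, hi), float(np.mean(excess > 0))

def round_for_reporting(x):
    """Round results >= 1,000 to the nearest hundred, else to the nearest ten."""
    base = 100 if abs(x) >= 1000 else 10
    return int(round(x / base) * base)

def draws_per_model(abs_errors, total_draws=32_000):
    """Sensitivity analysis: allocate draws inversely proportional to each
    model's absolute validation error (equal errors give 2,000 each)."""
    w = 1.0 / np.asarray(abs_errors, dtype=float)
    w /= w.sum()
    n = np.floor(w * total_draws).astype(int)
    n[np.argmax(w)] += total_draws - n.sum()  # keep the total fixed
    return n

# Illustration with simulated draws (not real estimates): 2,000 draws
# from each of 16 models, pooled into 32,000 draws.
model_draws = [rng.normal(10_000, 300, size=2_000) for _ in range(16)]
pooled = np.concatenate(model_draws)
median_excess, ci95, p_increase = summarise_excess(pooled, observed=11_000)
```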
We did all analyses separately by sex and age group (0-44 years, 45-64 years, 65+ years) for countries with 2020 population of at least two million, where age-and sex-specific data were available (Table 1). For countries with 2020 population less than 2 million, we did our analyses for two age groups (0-64 years and 65+ years) because, in many weeks, the number of deaths in the age group 0-44 was small or zero, which would have led to either large uncertainty or unstable estimates. For the same reason, for countries with population under 500,000 (Iceland and Malta), we did our analyses for both sexes and all age groups combined. Models were also run for all ages and both sexes combined; the posterior medians of resultant estimates were nearly identical to the sum of the age-sex-specific ones, with a mean relative difference of 0.2%, ranging from -1.7% to 1.1%. For this reason, in figures and tables that are for all ages and both sexes, we report results from the combined model so that the uncertainty of the estimates is correctly reported.
We report results for the entire year, as well as for three non-overlapping periods: the first wave of the pandemic (from mid-February 2020 through end of May), the (northern hemisphere) summer period (from beginning of June to mid-September 2020) and subsequent wave(s) (from mid-September 2020, when schools normally open in the northern hemisphere, to mid-February 2021).

Validation of no-pandemic counterfactual weekly deaths
We tested how well our model ensemble estimates the number of deaths expected had the pandemic not occurred by withholding data for 52 weeks starting from mid-February (i.e., the same projection period as done for 2020-2021) for an earlier year and using the preceding time series of data to train the models.
In other words, we created a situation akin to 2020-2021 for an earlier year. We then projected death rates for the weeks with withheld data, and evaluated how well the model ensemble projections reproduced the known-but-withheld death rates. We repeated this for three different periods: 2017-2018 (i.e., train model using data from January 2010 to mid-February 2017 and test for the subsequent 52 weeks), 2018-2019 (i.e., train model using data from January 2010 to mid-February 2018 and test for the subsequent 52 weeks), and 2019-2020 (i.e., train model using data from January 2010 to mid-February 2019 and test for the subsequent 52 weeks). We performed these tests for each country using data for both sexes and all ages. We report the projection error (which measures systematic bias) and absolute projection error (which measures any deviation from the data). Additionally, we report coverage of the projection uncertainty; if projected death rates and their uncertainties are well estimated, the estimated 95% credible intervals should cover 95% of the withheld data.
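The validation metrics described here (projection error, absolute projection error, and 95% coverage) can be sketched as follows; array shapes are our own assumptions:

```python
import numpy as np

def validate_projection(pred_draws, withheld):
    """Summarise one withheld-year validation run.

    pred_draws: (n_draws, n_weeks) posterior draws of projected weekly deaths.
    withheld:   (n_weeks,) the known-but-withheld weekly deaths.
    Returns relative projection error (systematic bias), relative absolute
    error (any deviation), and coverage of the 95% credible intervals.
    """
    pred_draws = np.asarray(pred_draws, dtype=float)
    withheld = np.asarray(withheld, dtype=float)
    med = np.median(pred_draws, axis=0)
    rel_error = np.mean((med - withheld) / withheld)
    rel_abs_error = np.mean(np.abs(med - withheld) / withheld)
    lo = np.percentile(pred_draws, 2.5, axis=0)
    hi = np.percentile(pred_draws, 97.5, axis=0)
    coverage = np.mean((withheld >= lo) & (withheld <= hi))
    return rel_error, rel_abs_error, coverage
```

If the projections are well calibrated, the relative errors should be near zero and coverage near 95%.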
The results of model validation (Table 3) show that the Bayesian model ensemble's estimates of how many deaths would have been expected had the pandemic not occurred were unbiased, with mean relative projection errors of 1.5% (between 0.5% and 2.2% in different years). The mean relative absolute error was between 8.0% and 8.7% in different years. The 95% coverage, which measures how well the posterior distributions of projected deaths coincide with the withheld data, was 96% for all years, which shows that the posterior distribution is well estimated.

Excess mortality between mid-February 2020 and mid-February 2021
Taken over the entire year, for both sexes and all ages, an estimated 1,410,300 (95% credible interval 1,267,600-1,579,200) more people died in these 40 countries than would have been expected had the pandemic not taken place. This is equivalent to 141 (127-158) additional deaths per 100,000 people and a 15% (14-17) increase in deaths over this period in all of these countries combined. The number of deaths assigned to COVID-19 in these countries over the same period was 1,256,861, which is 89% of the excess all-cause death toll ( Table 4). The number of excess deaths was largest in the USA (623,100; 521,200-750,700), followed by Italy (118,800; 88,500-149,300) and England and Wales (102,100; 75,300-128,600) ( Figure 1 and Table 4). Within the USA, California (71,800; 64,100-79,500) and Texas (57,400; 48,100-67,200) experienced the largest number of excess deaths, about the same as excess deaths in Spain and France, respectively ( Figure 2).
In Iceland, Australia and New Zealand, mortality was 3-6% lower over this period than what would be expected if the pandemic had not occurred, with posterior probabilities of the estimated decrease being a true decrease ranging from 82% to 94% ( Figure 3). South Korea and Norway experienced no detectable change in mortality (54% and 74% probability of an increase respectively, with posterior median estimated increases <2%), and Finland, Greece, Cyprus and Denmark experienced increases of 2-5% ( Figure 3A), with posterior probabilities that these changes represent an increase in deaths ranging from 84% to 97%. At the other extreme, the populations of the USA, Czechia, Slovakia and Poland experienced at least 20% higher mortality over these 52 weeks than they would have had the pandemic not occurred; the increase was between 15% and 20% in England and Wales, Italy, Portugal, Spain, Romania, Slovenia, Lithuania, Bulgaria, Chile, Belgium and Switzerland; the posterior probabilities that these countries experienced an increase in deaths were >99%. Because baseline mortality (i.e., death rates expected without the pandemic) varied across countries, the ordering of countries in terms of excess deaths per 100,000 people ( Figure 3B) differed from the ranking of percent increase. Bulgaria, Romania, Lithuania, Czechia, Poland, Slovakia and Portugal experienced more than 200 excess deaths per 100,000 people, and Italy, the USA, England and Wales, Slovenia, Spain, Croatia, Belgium and Montenegro between 150 and 200, all with posterior probabilities of an increase in deaths >99%. There was as much variation in excess mortality across US states as across the 40 countries together, with Hawaii having experienced the same level of mortality as would have been expected without the pandemic, Maine a 5% increase, and, at the other extreme, New Jersey, Arizona, Mississippi, Texas, California, Louisiana and New York at least 25% higher mortality over this year ( Figure 4).

Dynamics of excess mortality
There was substantial heterogeneity across countries in the patterns and dynamics of excess mortality over time ( Figure 5 and Figure 6). Some countries in Central and Eastern Europe -Bulgaria, Lithuania, Poland, Romania, Serbia and Montenegro -had no or little excess mortality in the first wave of the pandemic (mid-February 2020 to end of May 2020), but experienced a 5% to 13% increase in mortality during the (northern hemisphere) summer (June 2020 to mid-September 2020; Figure 7A). In contrast, some countries with medium or high levels of excess mortality in the first wave returned to death rates in the summer that were about the same as the no-pandemic baseline (England and Wales, Belgium, Scotland, Northern Ireland, Sweden, Netherlands, France, Switzerland, Luxembourg and Cyprus) or only slightly higher than this baseline (Canada, Italy and Spain). Portugal and the USA experienced a similar increase in mortality over the summer -10% (1-21) and 17% (12-24), respectively -to what they had in the first wave. During the same period, Australia, New Zealand and Iceland had a mortality deficit compared to levels that would have been expected without a pandemic. In Australia and New Zealand, which were in their winter season during this period, this reduction has been attributed to fewer deaths from seasonal flu due to reduced contact among people 30-33 . Chile, the other southern hemisphere country in our analysis, had 12% (8-17) higher mortality in the first wave, followed by an even larger increase of 21% (15-26) during the (southern hemisphere) winter period.
The subsequent wave(s) of the pandemic (mid-September 2020 to mid-February 2021) saw yet more changes in excess deaths patterns across countries. While New Zealand, Australia, Iceland, Finland, Norway, Cyprus and South Korea remained resilient to the rise in mortality (i.e., no or <2% increase in mortality compared to the no-pandemic baseline), many countries in Europe, especially in Central Europe, experienced a rise in mortality compared to the no-pandemic baseline: by >40% in Slovakia, Czechia and Poland, and by 20-40% in England and Wales, Italy, Austria, Hungary, Montenegro, Croatia, Portugal, Switzerland, Romania, Lithuania, Bulgaria and Slovenia, all with posterior probabilities of positive excess mortality greater than 99%. Excess deaths also reappeared in other countries that had experienced a medium to large toll in the first wave including Belgium, Spain, Scotland, Northern Ireland, Sweden, Canada, France and the Netherlands -some at the same level (France and Northern Ireland) and others at lower levels (Canada, Scotland, Spain, Belgium, Sweden) than the first wave but all lasting for many weeks during this period. The USA had an even larger increase in mortality compared to the no-pandemic baseline after mid-September than it had in the first wave and summer months, making it the only country to maintain a steady burden of excess mortality. There were nonetheless variations in excess deaths over time across different states in the USA (Figure 8), reflecting the autonomy that states, and their governors and legislatures, had with regard to key responses 6,34 .
As a result of these heterogeneous dynamics, there was virtually no correlation between excess mortality in the first wave and the summer period among countries (correlation coefficient of percent increase in the two periods = 0.03), and a weak negative correlation between excess mortality in the first wave and from mid-September onwards (correlation coefficient = -0.15). This translated into a variable distribution of the excess mortality burden across the three periods ( Figure 7B).

Age and sex distribution of excess mortality
Countries differed in how excess deaths were distributed across age groups ( Figure 9). In Denmark, Sweden, France, Switzerland, Belgium and Slovenia, >95% of all excess deaths were in those aged 65 years and older. On the other hand, Estonia, Finland (which had the smallest detectable excess mortality of any country), the USA, Canada, Lithuania and Chile had the largest share of excess deaths in people aged younger than 65 years. Of the 35 countries with a detectable increase in mortality (defined as a median estimated increase of >2%) and sufficient data to analyse by age group, Canada experienced the largest share of excess deaths in those aged younger than 45 years (16% of all excess deaths), followed by the USA (5%) and Finland (5%; noting that excess death rates in Finland, although detectable, were lower than in other countries). The high mortality toll in younger Canadians may have been due to COVID-19 deaths at home 35 and an increase in deaths from drug overdose 36 . This division arises largely from how much specific segments of society, such as workers or care home residents, were exposed to infection. The percent increase in mortality was similar between men and women in most countries ( Figure 10). There were nonetheless some exceptions, e.g.
in Chile, Montenegro, Serbia and the Netherlands, deaths increased by a larger percent in men (12%-16%) than in women (6%-9%); in contrast, in Slovenia, women (15%) experienced a slightly larger percent increase than men (14%).

Strengths and limitations
The main strength of our work is the development and application of a method to systematically and consistently use time series data from previous years to estimate how many deaths would be expected in the absence of the pandemic through early 2021. The models incorporated important features of mortality, including seasonality of death rates, how mortality in one week or year may depend on previous week(s) and year(s), and the seasonally variable role of temperature. To our knowledge, our models are the only ones that formally incorporated the role of temperature on weekly mortality, and accounted for the dependency of mortality in one week on preceding week(s) and in one year on preceding year(s). This methodology allows more robust estimation of the total impacts of the pandemic, especially as more time elapses since the beginning of the pandemic. It also enables comparisons of excess deaths across countries on a real-time basis. By modelling death rates, rather than simply the number of deaths as is done in most other analyses, we account for changes in population size and age structure. We used an ensemble of models, which typically leads to more robust projections and better accounts for both the uncertainty associated with each individual model and that of model choice. As a result, our approach gives a more complete picture of the inherent uncertainty in how many excess deaths the pandemic has caused than approaches that are not probabilistic or that use a single model.
A limitation of our work is that we did not have data on underlying cause of death. Having a breakdown of deaths by underlying cause will help develop cause-specific models and understand which causes have exceeded or fallen below the levels expected. Nor did we have data on total mortality by individual or community sociodemographic status to understand inequalities in the impacts of the pandemic beyond deaths assigned to COVID-19 as the underlying cause of death. Where data have been analysed for population subgroups, excess mortality tends to be higher in marginalised individuals and communities 37-39 . More detailed data will allow more granular analysis of the impacts of the pandemic, which can in turn inform resource allocation and a more targeted approach to mitigating both the direct and indirect effects of the COVID-19 pandemic.

Comparison with other estimates
The Financial Times' and The Economist's excess deaths trackers report the number of excess deaths for various countries based on comparisons of deaths in 2020 and 2021 with 2015-2019 averages. This approach does not account for general trends in mortality nor for factors like temperature that affect mortality and vary from year to year. The Economist has also recently published a set of excess deaths estimates using data from the Human Mortality Database and the World Mortality Dataset 40, and an ensemble of gradient boosted decision trees. Countries with small, medium and large numbers of excess deaths are largely consistent between our analysis and these sources. There are nonetheless some differences. For example, we estimated ~76,100 excess deaths for Spain, compared to ~81,300 by Financial Times and ~86,300 and ~89,600 respectively by the two models from The Economist. Our median excess death estimate for Denmark was about twice as large as that of Financial Times, and those for Greece and Serbia about one third smaller. Similarly, The Economist's two models predicted a mortality deficit of ~2,900 and ~3,200 deaths, respectively, for South Korea, while we estimated no detectable change in mortality. Nonetheless, the 95% credible intervals of our estimates contained those of Financial Times and The Economist.
The Institute for Health Metrics and Evaluation has released numbers of excess deaths by fitting a model for seasonality (the details of the seasonal model are not currently available) and projecting the pre-2020 residuals using a spline model. Unlike ours, their models do not account for temperature, although hot summer weeks with particularly large numbers of deaths were excluded. Several sources have commented that the estimates are likely overestimates, especially in their earlier version 41-43. For example, the Institute estimated ~156,800 excess deaths for the UK for the same period as our analysis, compared to ~111,500 by us and ~118,500 by the national statistical offices for England, Wales, Scotland and Northern Ireland. They estimated ~573,500 excess deaths for the USA (revised downwards from an estimate of ~760,000 in September 2021) compared to ~623,100 by us after accounting for temperature. They estimated ~33,900 deaths for Canada, compared to ~19,800 by us and ~20,900 by Statistics Canada, and ~31,100 excess deaths for Portugal, compared to ~20,700 by us. EuroMoMo fits a sinusoidal seasonal model to death counts but does not report country-specific excess deaths and hence could not be compared with our results.
The UK Office for National Statistics (ONS) calculated a number of age-standardised measures of excess mortality for 15 European countries based on comparisons of deaths in 2020 with 2015-2019 averages 44, as did Eurostat for the monthly number of deaths. These analyses did not account for temperature and holidays, and the Eurostat analysis did not account for changes in population. The ONS concluded that Norway, Finland, Denmark, Latvia, Cyprus and Estonia had a mortality deficit, whereas our estimates indicated no detectable excess mortality for Norway, and increases of 2% to 8% for the other countries. Differences between our results and those of the ONS may be partly related to the fact that the ONS analysis also included the pre-pandemic months and did not account for interannual variations in temperature. For example, in the northern hemisphere, the first and last three months of 2020 were on average warmer than the average of the past five years, but weeks 13-40 were on average slightly cooler.

Discussion
The magnitude of excess mortality in the first wave of the COVID-19 pandemic was related to two factors, as seen in quantitative and qualitative studies on the response to, and impacts of, the pandemic's early waves 1,6,8,45-61. First, how well countries, and subnational entities such as US states, managed the early months of the pandemic -specifically the agility of imposing timely lockdowns and other social distancing measures and border controls (e.g., complete or partial travel restrictions and/or quarantine for travellers) and adequate and effective testing, contact tracing and isolation of infected individuals and their contacts. Second, how prepared and resilient the health and social care system was to control the spread of infection, in the community as well as in health facilities and care homes, while continuing routine care.
Countries eased or maintained travel restrictions and distancing measures of the first wave to different extents and at different paces 5,62 . They also differed in terms of testing for surveillance and identifying infected individuals, how well and how fast they traced contacts, and how they supported the isolation of infected individuals and their contacts 51,59 . Australia and New Zealand took advantage of their island geographies and pursued an approach of disease elimination 63 -following strict lockdowns they imposed tight border control which kept cases to sporadic small numbers and allowed careful contact tracing and isolation. Iceland, Norway and South Korea did not close their borders but put in place various forms and durations of quarantine/isolation and testing for travellers. They also effectively integrated their well-coordinated public health capabilities 64 with modern biomedical (e.g., genomics) and digital technologies (e.g., data from credit card transactions, mobile phones and CCTV [closed-circuit television] footage), and did widespread symptomatic and asymptomatic testing to identify, track and isolate infected individuals and their contacts, and to successfully suppress the epidemic 47,65-69 , with additional restrictions only when there was a surge in infections. All three countries also have a strong healthcare system that continued to provide routine care alongside care for COVID-19 patients.
At the other extreme, many countries in Central and Eastern Europe, which had put strict measures in place and had experienced no detectable excess mortality during the first half of 2020, removed restrictions on travel and social contact in summer of 2020, at times to a greater extent or at a faster pace than their Western European counterparts 58,62,70,71. With virtually the entire population still susceptible to infection, this set community transmission into motion, coinciding with the introduction of more transmissible variants of SARS-CoV-2, which were not controlled as quickly or as strictly as earlier in 2020. This led to their true 'first wave' in autumn 2020, equivalent to or worse than those in their Western European counterparts in magnitude and duration (Figure 6 and Figure 7). Some Mediterranean countries, such as Malta and Greece, and Northwestern European countries, such as Austria and Germany, were also largely spared during the first half of 2020, only to see an increase in deaths in autumn and winter, due to a combination of (tourism-related) travel and increased local mobility and social interactions 72.
Between these extremes, other countries in Europe and Canada mandated or encouraged masks and face coverings, continued some forms of distancing measures (including occasional lockdowns), increased their testing capacity to various extents, and restarted routine healthcare. There were also improvements in treatments and protocols following large-scale trials and analyses of routine care data 73-75. These changes meant that, despite the repeated rise in infections, the mortality toll from COVID-19 and other diseases was lower than in the first wave but nonetheless considerable in these countries 73. The continued death toll in these countries may have been because distancing measures were not as stringent as those in the first wave, and because testing, contact tracing and isolation support did not reach the coverage or depth needed to contain transmission, as they did in Iceland and South Korea 51,76. This was compounded by more transmissible variants and by the fact that the second wave occurred in winter, when more time is spent indoors with less ventilation. The experience of the USA did not resemble that of any of the other countries. Rather, different states saw a rise in infections and deaths at different times 77, because there was little coordinated national response and because periods of extensive travel, such as the Thanksgiving and Christmas holidays, led to spread of infection across states.
The observed patterns of excess mortality in the first year of the pandemic indicate that the pandemic's death toll in the next year is likely to depend on three factors. The first, and most important, factor in the countries analysed here will be the breadth and pace of vaccination, including whether vaccination is extended to school-aged children and the use of boosters to enhance immunity, especially against new variants of SARS-CoV-2, because vaccines have been shown to be highly effective in preventing (severe) COVID-19 and deaths in trials and in real-world settings 78-81. Even with high vaccine coverage, some adherence to other measures may be needed when the number of infections rises, because vaccine efficacy is less than 100%, especially against new variants of SARS-CoV-2 81, and because the morbidity and longer-term health impacts of infection may be significant. Second, as the direct impacts of the COVID-19 pandemic are reduced through vaccination, the indirect impacts will become more visible. These include how much the backlog of routine care and persistently high health system pressure affect deaths from other conditions, and the impacts on jobs and income. Mitigating these requires economic and social policies that generate secure employment and income support, and strengthening of health and social care. A third, and perhaps more uncertain, factor is the magnitude of direct COVID-19 deaths that might be expected in (northern hemisphere) winter 2021-2022, because retraction of non-pharmaceutical interventions that mandate or facilitate social distancing and use of masks before the entire population is vaccinated may lead to circulating SARS-CoV-2 infections in countries as a whole as well as in specific geographical and sociodemographic subgroups of the population.
In mid-February 2021, vaccination rates were still low in the countries included in our analysis, with the highest rates in the UK (22% of adults with one dose and 1% with two doses), Serbia (12% and 3%, respectively), the USA (11% and 4%, respectively) and Chile (11% and 0.3%, respectively). Since then, vaccination has accelerated in industrialised countries and emerging economies, and in many countries 70% or more of the population have been vaccinated. Even in those, specific geographical or social subgroups of the population may have lower vaccination rates. Further, for countries where supply and access limit the pace of vaccination, the coming year could look as it did for the countries in this paper: a choice between lockdowns and a large death toll. To avoid these scenarios, which both have adverse health and wellbeing impacts, vaccine access and roll-out must be accelerated and accompanied by actions to both delay and contain infections, especially new variants of concern, through effective and timely testing, contact tracing and isolation support, and measures such as mask wearing in crowded indoor settings.

Data availability
Underlying data Input data on deaths, population and temperature are available at https://doi.org/10.5281/zenodo.5535829 12 .
This repository contains the following underlying data:
• data/data.csv (data on deaths, temperature and population by age group, sex, country and week)
• output/result_summaries.csv (weekly estimates of predicted deaths, excess deaths, excess death rates per 100,000 and relative increase in deaths)

Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).
The original data sets used in the study are publicly available from the following locations:

My only concern is about the "model averaging approach" followed by the authors. First of all, why not select a single model using the different model selection criteria available in INLA? Some of the models are nested, so it makes me wonder whether the 'extra' components are needed. Most likely the answer is yes, and then the authors should choose the model with the extra components.
Secondly, I agree that following a Bayesian model averaging approach is useful, but I am not sure that this can be achieved by simply "pooling" the samples obtained from INLA (see, for example, Gómez-Rubio et al., 2020 1, where we illustrate how to do BMA with INLA by weighting the different models according to the marginal likelihood). The authors should have weighted the samples according to the values of the marginal likelihood for each model (which are reported by INLA). These values could have been used to compute the posterior probability of each model for model selection as well. Having said this, it looks like the results produced are similar to those provided by other analyses, which may indicate that weighting the samples has no major impact (i.e., all models produce very similar death estimates), but I think this is something that the authors should check.
Which model performs best in out-of-sample prediction depends on the country and year, even among models that have related terms, as seen in Appendix Figure 1 of Kontis et al. 1. This year-to-year and country-to-country variability in which model is best itself reflects the complex behaviour of mortality data over space and time, and means that the best model for the pandemic period is uncertain. In our validation analyses, 13 out of 16 models ranked best (i.e., had the smallest absolute projection error) for at least one country and one validation year. Even when looking at individual countries, in only 2 out of 33 countries did the same model emerge as the best in all three validation years. Which model is best is not only about choosing the one with the most components, but also about the choice of prior (e.g., the order of the autoregressive term). Finally, and as a consequence of the aforementioned issues, the use of multiple models and model averaging provides a more complete picture of the inherent uncertainty in how many excess deaths the pandemic has caused than approaches that are not probabilistic or use a single model.
Although in-sample measures of fit are well-suited for applications where there is an interest in the parameters of the model, for projection tasks like ours, models must be evaluated and weighted based on their out-of-sample performance, which is what they have been designed for. Projections of individual models can be combined in a fully Bayesian approach, by assigning weights based on out-of-sample performance, or simply by assigning uniform weights (i.e., using an unweighted average). In this work, we used the latter approach for the following reasons: First, methods for combining projections in a fully Bayesian manner typically use simpler base models (e.g., simple linear regressions) 4,5 and, in our case, such an approach would not have been feasible due to its computational complexity. Second, calculating weights based on out-of-sample performance requires further validation to decide how weights should be defined (e.g., as the reciprocal of absolute projection error, or in some other way that penalises models with larger errors), as we have done in prior work 1. Validating the choice of model weights in turn requires an additional hold-out period, which in our case was not available for many countries, due to the short duration of their time series. Choosing uniform model weights avoids this problem and has been shown to produce equal or superior results to using non-equal weights in many applications 2,6.
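A minimal sketch of this unweighted pooling, with simulated draws standing in for the posterior projection samples of the 16 fitted models (all names and numbers are hypothetical; the real models would be fitted with INLA rather than simulated):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for posterior projection draws from 16 models; pooling
# equal-sized samples gives each model the same (uniform) weight.
n_models, draws_per_model = 16, 1_000
model_draws = [rng.normal(9_000.0 + 50.0 * m, 200.0, size=draws_per_model)
               for m in range(n_models)]
ensemble = np.concatenate(model_draws)

# By the law of total variance, the pooled spread combines the average
# within-model uncertainty with the between-model disagreement, so it
# is at least as large as the average single-model spread.
within = np.mean([np.var(d) for d in model_draws])
```

This is why an ensemble gives a more complete picture of uncertainty than any single model: disagreement between models widens the pooled credible interval.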
Nonetheless, based on this comment, we now include a sensitivity analysis on how the different models are weighted. Specifically, in the sensitivity analysis, the number of draws from each model was inversely proportional to the absolute error of prediction in the validation analyses. The results are robust to this choice, with the current approach having the advantage of being usable for countries with shorter time series.
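The allocation rule used in this sensitivity analysis can be sketched as follows (the function name and the error values are made up for illustration, not the authors' actual validation errors):

```python
import numpy as np

def draws_per_model(abs_errors, total_draws):
    """Number of posterior draws to take from each model, inversely
    proportional to its absolute projection error in validation."""
    weights = 1.0 / np.asarray(abs_errors, dtype=float)
    weights /= weights.sum()
    return np.round(weights * total_draws).astype(int)

# Hypothetical absolute validation errors for four models
# (smaller error = better performance = more draws).
counts = draws_per_model([120.0, 60.0, 240.0, 80.0], total_draws=1_000)
```

With these illustrative errors the best-performing model (error 60) contributes 400 of the 1,000 pooled draws, and the worst (error 240) only 100.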
We believe that the Delta and Omicron waves should be the subject of distinct analyses for two reasons: First, as stated in the Introduction of our paper, the effectiveness of vaccination in reducing deaths motivates distinct analyses and interpretation prior to versus after mass vaccination, with the focus of our paper on the former. Second, because the number of deaths that would be expected had the pandemic not occurred is not directly measurable, its estimation requires projection modelling based on past time-series of weekly deaths. The longer the period of projection since the beginning of the pandemic, the more difficult and uncertain the projections, possibly requiring methods that are different from those used for the first year.

Specific comments: Abstract:
Data from the USA and US states have been prominently displayed in the Results section but are missing from the abstract.
We have included these in the abstract, as correctly suggested.

The conclusion statement does not mention the lessons missed (NPI, delays, etc).
We have made the use of NPIs, previously implicit, explicit in the abstract's conclusion.

Methods:
Reference to the "Our World in Data" website is not provided.
We had initially done this as a hyperlink. We have now also cited the corresponding publication.
The authors have extensively adjusted for temperature variation. This is important as temperature is an important determinant. Two other variables could be important: (a) the age distribution of populations in these countries; and (b) ambient pollution. Data on both are widely available and it would be interesting to evaluate whether these variables influence the outcomes.
Our statistical models were designed to estimate weekly death rates, had the pandemic not occurred. With this objective, the models do not adjust for temperature or any other variable. Rather the models include, as covariates, variables that are predictors of mortality, vary from week to week (because those that do not vary from week to week are captured by other model terms), and are themselves not affected by the pandemic (so that we make unbiased estimates of counterfactual, no-pandemic mortality).
Temperature was included in our model because it is an established predictor of short-term mortality, and was itself not affected by the pandemic. We did not include air pollution in the model, because the pandemic and the associated policy and behavioural responses, in particular the reduction in vehicular traffic and industrial activity, led to changes in air pollution 1,2 . This in turn means that the observed pollution is different from what pollution levels would have been without the pandemic. In contrast, researchers have estimated a short-term pandemic impact of only ~0.01°C on ambient temperature 3 , which is orders of magnitude smaller than seasonal and interannual variation.
The age distribution of each country is already embedded in past data on deaths and population, and is therefore implicitly used to produce our projections. More details provided below.
As the Reviewer correctly points out, our main results are based on analysis that used data from all ages together, for three reasons: First, at least three terms in our model implicitly take into account any change in age distribution that might occur from week to week, which is the relevant time interval for our work: (1) The model has a random intercept for each week of the year (θ_week) which allows deaths to change from week to week, some of which may be partly due to differential changes in age groups; the weekly terms repeat across years although their magnitude can change due to variations in temperature. (2) The autoregressive structure of the model allows these terms to be related to deaths in the previous week, which means that phenomena that gradually change the age structure of deaths on a weekly scale (e.g., weekly dynamics of flu) are captured. (3) Similarly, we have a term that captures monthly mortality, also with autoregressive structure at annual scale, which implicitly captures phenomena that change deaths at annual scale in a correlated way. Second, when the time unit of analysis is the week, rather than the year as is the case in most mortality projections, the number of deaths in many country-week-age units is zero or small. Using all ages together can make model fits more stable, hence trading off age-related precision with how well model parameters are estimated. Third, running a single model for all ages means that the uncertainty of the estimates is correctly reported.
That being said, we have also repeated the analysis by age group and compare the estimates with those of the combined model in the section titled statistical methods. This comparison shows that the two approaches generate similar estimates of counterfactual no-pandemic mortality, for the aforementioned reasons.
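Schematically, the model terms described above can be written as follows; the notation here is ours and illustrative, not the paper's exact specification:

$$\log \mu_t = \alpha + \beta\,\mathrm{year}(t) + \theta_{\mathrm{week}(t)} + \phi_{\mathrm{month}(t)} + f(x_t),$$

where $\mu_t$ is the expected level of mortality in week $t$, $\theta_{\mathrm{week}(t)}$ is the weekly random intercept with autoregressive structure over weeks, $\phi_{\mathrm{month}(t)}$ is the monthly term with autoregressive structure across years, and $f(x_t)$ is the seasonally varying effect of temperature $x_t$. Observed deaths would then follow a count likelihood (e.g., Poisson) with mean proportional to $\mu_t$ and population, so that death rates rather than raw counts are modelled.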
Assuming the comment refers to the sentence starting with "We used data on weekly deaths from the start of time series through mid-February 2020 …", "mid-February 2020" is correct as estimation of model parameters was based on data before the pandemic.

Results:
Do we have data on excess mortality for US states? May be interesting to add (unless it is part of a forthcoming study). No data are presented regarding "lessons learned". May be useful to study the social determinants such as GDPs, income, and health systems.
The countries studied here are all high-income, although they differ in health care spending and health service infrastructure 4. It would be a separate body of work, and a distinct paper, to quantify how various features of health systems -from local public health and primary care to specialist care -may have affected the outcome of the pandemic. We now discuss this issue at the beginning of the Discussion section, with reference to a number of studies that have qualitatively and quantitatively analysed the role of both health system resilience and early social distancing measures on the impacts of the pandemic.

Discussion:
Please discuss the data presented and their limitations, and then focus on the implications of the study.
We have reorganised the text as suggested.