COVID-19 – exploring the implications of long-term condition type and extent of multimorbidity on years of life lost: a modelling study [version 2; peer review: 1 approved, 2 not approved]

Background: COVID-19 is responsible for increasing deaths globally. Estimates focused on numbers of deaths do not quantify potential years of life lost (YLL) through COVID-19. As most people dying with COVID-19 are older with underlying long-term conditions (LTCs), some speculate that YLL are low. We aim to estimate YLL attributable to COVID-19, before and after adjustment for number/type of LTCs. Methods: We first estimated YLL from COVID-19 using WHO life tables, based on published age/sex data from COVID-19 deaths in Italy. We then used aggregate data on number/type of LTCs to inform a Bayesian model for likely combinations of LTCs among people dying with COVID-19. From these, we used routine UK healthcare data from Scotland and Wales to estimate life expectancy based on age/sex/combinations of LTCs using Gompertz models. We then calculated YLL based on age, sex, type of LTCs and multimorbidity count. Results: Using the standard WHO life tables, YLL per COVID-19 death was 14 for men and 12 for women. After adjustment for number and type of LTCs, the mean YLL was slightly lower, but remained high (11.6 and 9.4 years for men and women, respectively). The number and type of LTCs led to wide variability in the estimated YLL at a given age (e.g. at ≥80 years, YLL was >10 years for people with 0 LTCs, and <3 years for people with ≥6). Conclusions: Deaths from COVID-19 represent a substantial burden.


Introduction
The pandemic of SARS-CoV-2, the virus causing COVID-19, emerged in late 2019 and continues to have substantial impact on populations and healthcare systems throughout the world. This manuscript presents a revised version of an analysis initially conducted in March 2020, at which time Italy, the first European nation to experience a major outbreak of COVID-19, was seeing rapidly escalating numbers of cases and deaths. In the UK, at that time, the initially small number of hospitalisations and deaths was beginning to rise. The analysis sought to estimate the burden of COVID-19 deaths in terms of potential years of life lost (YLL), at a time when individual-level data on COVID-19 deaths were scarce.
When severe, coronavirus disease 2019 (COVID-19) causes acute respiratory failure, often requiring mechanical ventilation 1. At the beginning of April 2020, more than 1,200,000 confirmed cases had been reported globally, including 67,000 deaths 2. In response to this threat, governments introduced non-pharmaceutical interventions such as physical distancing, and the delivery of health services changed radically, with resources diverted towards the management of COVID-19 and away from their usual activities 3. These measures aimed to limit a surge in cases that risked overwhelming healthcare services 4, and have been continued and repeated in various forms throughout the world.
Since few health care systems could have responded adequately to the increased need for acute care without these changes, these decisions were in some ways inevitable. However, as societies seek to "return to normal", decisions about the extent and nature of ongoing measures to limit spread of COVID-19 will be more difficult. These choices will require balancing the likely direct effects on mortality from COVID-19 against the likely indirect impacts on mortality for other conditions, due, for example, to inadequate access to necessary services for many people with long-term conditions (LTCs), potential reluctance of the public to attend for acute events such as myocardial infarction, or impacts from forced unemployment, loss of income and social isolation. The indirect effects are likely to be complex, most will be downstream, and they will require extensive research to be better understood. However, we need to capture the direct effects of COVID-19 as accurately as possible now, via currently available data and methodologies.
In April 2020, most reports of COVID-19 deaths used raw counts 2. This may give a distorted picture of the mortality burden, however, as it does not consider how long someone who died from COVID-19 might otherwise have been expected to live. As people dying from COVID-19 are predominantly older and have pre-existing LTCs 5-7, some have speculated that many of these people would soon have died of other causes and that life expectancy may therefore not be greatly impacted 8,9. While multimorbidity, the presence of multiple LTCs, is known to be associated with increased mortality 10, people with multimorbidity can nonetheless be expected to live for many years 11. Raw counts of deaths may therefore mislead policy-makers and the public, causing them to either over- or under-estimate the total impact of COVID-19 related deaths.
Within epidemiology, there is a standard measure used to account for this difficulty, the years of potential life lost (YLL) 12 . YLL can be expressed per-capita as the average number of years an individual would have been expected to live had they not died of a given cause. The conventional approach to YLL uses data on the age at which deaths occurred combined with typical life expectancy at a given age, to estimate a weighted average of the number of years lost. YLL is used to allow fair comparisons of the health impact of different policies, such as different measures to address the pandemic. However, given the controversial role of multimorbidity in COVID-19 deaths it is also important to calculate YLL additionally considering the effects of the presence of a single LTC or multimorbidity. Therefore, we propose to quantify the burden of mortality related to COVID-19, both using the conventional age-based YLL measure, and YLL additionally accounting for type and number of underlying LTCs. We draw upon data sources available in April 2020, as this modelling study aimed to estimate the potential YLL at an early stage in the pandemic, when the impact was emerging. It should be noted, however, that events unfolding throughout the pandemic are likely to impact the YLL. Any estimate, particularly in the context of a pandemic, is dependent on what populations are exposed, and to what extent. Updated estimates, taking account of events which transpired in the UK and beyond, are the subject of ongoing collaborative efforts and we have not attempted to model these. Rather, this manuscript provides a detailed and reproducible quantification of YLL using techniques targeting the specific challenges of estimation at the early stages of the pandemic.

Amendments from Version 1
This is a revised version of the manuscript in response to reviewer comments. All changes are detailed in the response to each of the reviewer comments. In summary, we have revised the text to clarify that this is a modelling approach which was developed early in the COVID-19 pandemic to meet the challenge of estimating years of life lost reflecting multimorbidity at a time when individual patient data were scarce. Also, we have refined our modelling and the Bayesian long-term conditions model now fully converges; this is reflected in Table 1 and Table 2. The overall findings, along with their interpretation, are largely unchanged. The online repository containing all code and data has been updated along with a README file with links to appropriate sections.

WHO standard YLL approach
The standard approach for calculating years of life lost is to apply the distribution of ages among those who died from a specific cause to a standard life-table. For the purposes of international comparison, we opted to use the WHO 2010 Global Burden of Diseases table as the reference 13, which presents YLL by age, but not by sex or extent of multimorbidity. This method involves summing the expected years of life remaining from the table according to the number (or for the mean YLL the proportion) of people dying within each age-band. We applied the age distribution of COVID-19 deaths in Italy from published data to estimate the YLL 14.
We chose the WHO life tables to allow comparison of the burden of COVID-19 deaths with other conditions in an international context. However, these, unlike many national-level life tables, do not stratify by sex. Furthermore, our subsequent modelling draws upon data from specific settings based on availability early in the pandemic (namely data on COVID-19 deaths from Italy, and life-expectancy estimates based on data from Wales). Therefore, following comments from academic colleagues via social media, we performed sensitivity analyses using life tables from Italy (2017), the United Kingdom (2016-2018) and, for comparison, the United States (2017).
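The conventional calculation described in this section is a weighted average, which can be sketched in a few lines. The snippet below is a minimal illustration in Python (the study itself was carried out in R). The function name `mean_yll`, the age bands, the death shares and the remaining life expectancies are all hypothetical placeholders and are not values from the WHO tables or the ISS report.

```python
# Illustrative only: the age bands, death shares and remaining life
# expectancies below are made-up placeholders, not WHO or ISS values.


def mean_yll(death_share_by_band, life_expectancy_by_band):
    """Weighted mean of remaining life expectancy over age bands.

    Each band's remaining life expectancy is weighted by the share of
    deaths that occurred in that band.
    """
    total_share = sum(death_share_by_band.values())
    weighted = sum(
        share * life_expectancy_by_band[band]
        for band, share in death_share_by_band.items()
    )
    return weighted / total_share


# Hypothetical share of deaths in each age band (sums to 1).
death_share = {"50-59": 0.05, "60-69": 0.15, "70-79": 0.40, "80+": 0.40}
# Hypothetical remaining life expectancy (years) for each band.
remaining_years = {"50-59": 30.0, "60-69": 22.0, "70-79": 14.0, "80+": 8.0}

print(round(mean_yll(death_share, remaining_years), 2))
```

Swapping in a different life table only changes the second dictionary, which is why the sensitivity analyses with national life tables leave the method itself untouched.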
Overview of modelling to accommodate long-term conditions and multimorbidity
The remainder of the methods describes our approach to estimating YLL accounting for number and type of underlying LTC, along with age and sex. Our modelling comprised three main components: (i) estimating the prevalence of, and correlations between, LTCs among people dying with COVID-19; (ii) modelling UK life expectancy based on age, sex, and each combination of these LTCs separately; and (iii) combining these models to calculate the estimated YLL per death with COVID-19. These are summarised by age-group, sex, and multimorbidity counts (that take into account different combinations of LTCs).
The data sources used for each of these stages of modelling are summarised in Figure 1.

Rapid review
To inform our estimates of number and type of LTCs, we first sought to identify the most detailed data available for underlying long-term conditions among people dying of COVID-19. We performed a rapid review, searching the WHO repository of COVID-19 studies on 24th March 2020. To identify studies reporting data on LTCs among people who had died from COVID-19, we screened titles and abstracts of all epidemiological, clinical, case-series and review articles (n=1685). We identified and screened 77 potentially relevant full-text articles, of which four reported aggregate data on LTCs among people who had died of COVID-19. Three were small studies (32, 44, and 54 deaths, respectively) based in Wuhan, China 5-7. However, the fourth was a comprehensive report from the Istituto Superiore di Sanità (ISS) (published each Tuesday and Wednesday) including data on 11 common LTCs (ischaemic heart disease, atrial fibrillation, heart failure, stroke, hypertension, diabetes, dementia, chronic obstructive pulmonary disease, active cancer in the past 5 years, chronic liver disease and chronic renal failure), as well as the number of patients who had 0, 1, 2 or ≥3 LTCs for 710 of the 6801 people who died with COVID-19 in Italy 14. In view of the smaller sizes of the Chinese studies, and the greater dissimilarity of these populations with the UK relative to the Italian data, we opted not to include these in the analysis. The Italian data were used to construct a plausible scenario for the prevalence of combinations of LTCs among people who died from COVID-19 for the modelling presented here.

Long-term condition prevalence and correlation models.
This first stage of our modelling aimed to estimate the prevalence and correlation between specific LTCs among people dying with COVID-19.
We utilised aggregate data on COVID-19 deaths from the Istituto Superiore di Sanità in Italy. Since we were unable to obtain individual patient data for the Italian case-series of deaths from COVID-19, we had to infer the joint prevalence of LTCs from the summarised information available, i.e. the marginal distribution of multimorbidity counts (the row sums, or total number of diseases for each patient, wherein counts of ≥3 LTCs were collapsed into the single category of 3+) and the marginal distributions of LTC frequency (the column sums, or the total number of patients with each LTC). To that end, we developed a Bayesian latent process model of disease prevalence and correlation and fitted it using Markov chain Monte Carlo (MCMC) to both elements in the published data. This analysis was applied jointly to the small number of deaths that had occurred in Scotland, primarily to aid convergence in Bayesian model fitting by providing some information about the correlation between LTCs 15. The Scottish subset of the data contained a partial record of known LTCs for individual patients, but the multimorbidity count per patient, as well as the marginal frequency of each LTC, were missing (hence, modelled as latent). Bayesian priors for the correlations between diseases were specified with a tendency to zero (shrinkage). Numerical investigations indicated little sensitivity of convergence to the strength of shrinkage, so we opted for weak shrinkage as a precautionary approach. This model gave us the full matrix of correlations between every combination of LTCs at the level of individuals, therefore providing us with a complete dependence structure of LTCs present within the sample of COVID-19 deaths. In order to propagate uncertainty through the analysis, from this fitted model (effective sample size of MCMC 410) we simulated 10,000 notionally "typical" patients, with plausible combinations of LTCs (under the combined Italian and Scottish data).
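To give a feel for how a latent process can generate correlated binary LTCs from marginal prevalences, the following Python sketch simulates notional patients from a one-factor latent Gaussian threshold model. This is not the Bayesian model fitted in the study (which was written in JAGS and estimated a full correlation matrix from the data); it is a simplified stand-in. The function `simulate_ltc_patients`, the single shared correlation `rho` and its value of 0.2 are assumptions made purely for illustration. The three prevalences are the ISS proportions reported for hypertension, diabetes and ischaemic heart disease.

```python
import math
import random
from statistics import NormalDist


def simulate_ltc_patients(prevalence, rho, n_patients, seed=0):
    """Simulate binary LTC indicators with a shared latent factor.

    Assumption (not the study's model): every pair of LTCs shares the
    same latent correlation `rho`, with 0 <= rho < 1.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    # A patient has the LTC when the latent score exceeds this threshold,
    # chosen so that each marginal prevalence is reproduced on average.
    thresholds = {
        ltc: std_normal.inv_cdf(1.0 - p) for ltc, p in prevalence.items()
    }
    patients = []
    for _ in range(n_patients):
        shared = rng.gauss(0.0, 1.0)  # factor common to all LTCs
        row = {}
        for ltc, cut in thresholds.items():
            noise = rng.gauss(0.0, 1.0)  # LTC-specific component
            latent = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * noise
            row[ltc] = int(latent > cut)
        patients.append(row)
    return patients


# Marginal prevalences taken from the ISS report for three of the 11 LTCs.
iss_prevalence = {
    "hypertension": 0.73,
    "diabetes": 0.313,
    "ischaemic_heart_disease": 0.278,
}
notional = simulate_ltc_patients(iss_prevalence, rho=0.2, n_patients=10_000)
for name in iss_prevalence:
    print(name, sum(p[name] for p in notional) / len(notional))
```

The simulated marginal frequencies should sit close to the input prevalences, while the shared factor makes patients with one LTC somewhat more likely to have the others.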
To test the sensitivity of our findings to the estimated correlations, we also estimated the YLL under two opposite extremes: (i) that LTCs were independent and (ii) that LTCs were highly correlated. Unlike the Bayesian LTC model, these sensitivity analyses did not use the information on the multimorbidity counts from the ISS report, but only the proportion of patients with each of the eleven comorbidities. For the "independent" scenario we created 11 vectors comprising 1s and 0s (respectively with and without the long-term condition) corresponding in length to the number of patients. We then sampled from these vectors with replacement to obtain 10,000 simulated patients. For the "highly correlated" scenario we first sorted each vector, then combined them to form a 710 × 11 matrix, then sampled each row with replacement to obtain 10,000 simulated patients. This generated a dataset where individuals with one comorbidity which reduces life expectancy were more likely to have other comorbidities which reduce life expectancy (and vice versa).
Age models. Next, we modelled the relationship between age and multimorbidity counts among people dying with COVID-19. We were unable to obtain direct estimates of the association between age and extent of multimorbidity among patients who had died from COVID-19. Therefore, we modelled two scenarios: independence between age and multimorbidity count (i.e. no correlation between age and multimorbidity count among people dying of COVID-19), and a positive association between age and multimorbidity count. To inform the latter, we examined data within the Secure Anonymised Information Linkage (SAIL) databank for 145 patients who had influenza recorded as the cause of death on their death certificate in 2011. SAIL is a repository of routinely collected healthcare data (including primary care, hospital episodes, and mortality data) from a representative sample covering approximately 70% of the population of Wales. While influenza is a different condition, these data were used for the sole purpose of estimating correlations between age and multimorbidity counts (conditioning on death), and did not inform the model in any other way. We found that for men, age increased by 4.7 years per unit increase in the number of LTCs until the count reached 6, after which there was no evidence of further increase. For women, the figure was 2.6 years. Therefore, we performed the modelling assuming that for COVID-19 the mean age increased by 5 years per unit increase in multimorbidity count across the range from 0 to 6 LTCs in men. We allowed for some degree of uncertainty around this estimate by sampling from a normal distribution, for which we arbitrarily chose a standard deviation of 0.5. We estimated this similarly for women, but using a mean increase in age of 3 years per unit increase in multimorbidity count. We incorporated this information in a model fitted to the summary age data provided in the Italian case report.
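The two extreme scenarios can be sketched directly from the description above. The Python code below is an illustrative reconstruction, not the authors' R code; the function names are hypothetical, and only three of the eleven LTCs are used (with their ISS proportions) to keep the example short.

```python
import random


def indicator_vector(prevalence, n_patients):
    """Vector of 1s (with the LTC) and 0s (without) matching a prevalence."""
    n_with = round(prevalence * n_patients)
    return [1] * n_with + [0] * (n_patients - n_with)


def independent_scenario(vectors, n_sim, rng):
    """Draw each LTC separately, so that LTCs are independent."""
    return [[rng.choice(vec) for vec in vectors] for _ in range(n_sim)]


def highly_correlated_scenario(vectors, n_sim, rng):
    """Sort every vector, bind them as columns, then resample whole rows."""
    sorted_vectors = [sorted(vec, reverse=True) for vec in vectors]
    matrix = list(zip(*sorted_vectors))  # one row per original patient
    return [list(rng.choice(matrix)) for _ in range(n_sim)]


rng = random.Random(2020)
# ISS proportions for hypertension, diabetes and ischaemic heart disease.
vectors = [indicator_vector(p, 710) for p in (0.73, 0.313, 0.278)]

independent = independent_scenario(vectors, 10_000, rng)
correlated = highly_correlated_scenario(vectors, 10_000, rng)

# In the correlated scenario LTCs are nested: anyone with a rarer LTC
# also has every more common one.
print(all(row[0] >= row[1] >= row[2] for row in correlated))
```

Sorting before binding the columns is what forces the co-occurrence: the same rows carry the 1s for every LTC, so sampling a row with a rare condition necessarily brings the common ones with it.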
We obtained 10,000 samples from the posterior distribution for inclusion in the YLL calculations. SAIL analyses were approved by SAIL Information Governance Review Panel (Project 0830). Approval for the use of individual patient data in the analysis was given by the NHS Public Health Scotland Caldicott officer.
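As a rough illustration of the positive-association scenario, the Python sketch below draws a slope from a normal distribution (mean 5 years for men, standard deviation 0.5, as stated above) and applies it per additional LTC up to a count of 6. The function name, the baseline mean age of 65 and the assumption that the mean age stays flat beyond 6 LTCs are hypothetical choices for this example; the study's actual age model was fitted to the Italian summary age data and is not reproduced here.

```python
import random


def sample_mean_age_by_count(base_age, slope_mean, slope_sd, rng,
                             max_count=6, highest_count=11):
    """Mean age at death for each multimorbidity count, for one slope draw.

    Assumptions for illustration: `base_age` is the mean age with no LTCs,
    and the mean age is held constant above `max_count` LTCs.
    """
    slope = rng.gauss(slope_mean, slope_sd)  # years per additional LTC
    return {
        count: base_age + slope * min(count, max_count)
        for count in range(highest_count + 1)
    }


rng = random.Random(42)
# Men: mean increase of 5 years per LTC, standard deviation 0.5.
ages_men = sample_mean_age_by_count(65.0, 5.0, 0.5, rng)
# Women: mean increase of 3 years per LTC, same standard deviation.
ages_women = sample_mean_age_by_count(65.0, 3.0, 0.5, rng)
print(ages_men[0], ages_men[6] == ages_men[11])
```

Repeating the draw many times gives a distribution of age profiles, which is one simple way of carrying the uncertainty in the slope forward into later calculations.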
Survival models. For patients aged 50 years or older at death, we estimated mortality according to age, sex and combinations of each LTC using the SAIL databank. From these data, we identified all participants aged over 49 years who were registered with a participating practice for the duration of 2011 (approximately 0.85 million people). This period was selected as electronic coding of diagnoses was well established, and it allowed >6 years of follow-up. Age and sex were extracted from primary care records. We also identified all LTCs for which we had information on COVID-19 deaths from Italy. LTCs were identified using a combination of primary care data (using Read diagnostic codes) and hospital episodes (using ICD-10 codes). Individuals were considered to have an LTC if they had a relevant diagnostic code entered prior to 31st December 2011. Relevant codes were identified from the Charlson comorbidity index and the Elixhauser comorbidity index 16,17, which have established algorithms for identification from ICD-10 codes 18, and have been adapted for using Read codes in primary care 19. Code lists are available in the supplementary material 15.
All-cause mortality was assessed by linkage to national mortality registers from 1st January 2012 until August 2018 (last available data). Participants were censored if they de-registered from a participating SAIL practice. We used the flexsurv package in R (version 1.1.1) to fit a Gompertz model treating age as the timescale 20. We assessed the fit of this distribution graphically (supplementary material) 15. In models stratified by sex we included all the LTCs as main effects as well as age-LTC interactions that improved the model fit in terms of the Akaike information criterion. In sensitivity analyses we also included two-way (comorbidity-comorbidity) and three-way (comorbidity-comorbidity-age) interaction terms for the four comorbidities with the largest effect measure estimates (COPD, heart failure, liver failure and dementia), requiring 12 additional parameters. To propagate uncertainty from the survival models we obtained 10,000 samples of the coefficient estimates by sampling from a multivariate normal distribution corresponding to the coefficients and variance-covariance matrix from the regression models.
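For readers less familiar with the Gompertz model, the Python sketch below writes out its survival function using the shape and rate parameterisation (hazard = rate × exp(shape × age)), with LTCs entering as log hazard ratios on the rate. It is a simplified illustration rather than the fitted model: the parameter values and the single `heart_failure` coefficient are invented, the helper names are hypothetical, and the age-LTC interactions used in the study are omitted.

```python
import math


def gompertz_survival(age, shape, rate):
    """S(age) = exp(-(rate / shape) * (exp(shape * age) - 1))."""
    return math.exp(-(rate / shape) * math.expm1(shape * age))


def rate_with_ltcs(log_baseline_rate, log_hazard_ratios, ltcs):
    """Rate parameter after adding the log hazard ratio of each LTC present."""
    return math.exp(
        log_baseline_rate + sum(log_hazard_ratios[ltc] for ltc in ltcs)
    )


def conditional_survival(age, age_at_death, shape, rate):
    """Probability of surviving to `age` given survival to `age_at_death`."""
    return (gompertz_survival(age, shape, rate)
            / gompertz_survival(age_at_death, shape, rate))


# Hypothetical parameter values, chosen only to give plausible-looking curves.
shape = 0.1
log_baseline_rate = math.log(2e-5)
log_hazard_ratios = {"heart_failure": 0.7}

rate_none = rate_with_ltcs(log_baseline_rate, log_hazard_ratios, [])
rate_hf = rate_with_ltcs(log_baseline_rate, log_hazard_ratios, ["heart_failure"])

# Ten-year survival for a notional person who died at 80, without and with
# heart failure.
print(conditional_survival(90.0, 80.0, shape, rate_none))
print(conditional_survival(90.0, 80.0, shape, rate_hf))
```

Because age is the timescale, conditioning on the age at death amounts to dividing by the survival probability at that age, which is why each simulated patient's curve starts at 1.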

Combination of comorbidity and mortality models.
In the final analysis, we combined 10,000 samples from all three sources: LTC combination models, age models and survival models. We used the rate and shape parameters with the cumulative distribution function implemented in the flexsurv package to calculate the survival probabilities at 3-month intervals from age 50 to 120 (to allow all curves to descend to zero). From these times and survival probabilities we estimated the mean survival, or life expectancy.
Bayesian models were written in the JAGS language 21 and implemented using runjags for R (version 2.0.4) 22; survival models were fitted using the flexsurv package in R (version 1.1.1) 20; and for the final analysis the model outputs were also combined in R (version 3.6.1). The 95% uncertainty intervals were obtained using empirical bootstrapping, with the number of samples in the mean equal to the effective sample size from the LTC correlation model. All code, data (except individual-level data for Scotland), intermediate outputs and diagnostic plots are provided on GitHub (https://github.com/dmcalli2/covid19_yll_final) 15.
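The step from survival probabilities to life expectancy can be illustrated as numerical integration of the survival curve over a 3-month grid. The Python sketch below assumes a Gompertz curve with hypothetical shape and rate values, starts the grid at the age at death rather than at 50, and uses the trapezoidal rule; the integration rule and the function name are assumptions for this example, since the text does not specify how the mean was computed from the grid.

```python
import math


def remaining_life_expectancy(age_at_death, shape, rate,
                              max_age=120.0, step=0.25):
    """Area under the conditional Gompertz survival curve, in years.

    The curve is evaluated every `step` years (0.25 = 3 months) from the
    age at death up to `max_age`, then integrated with the trapezoidal rule.
    """
    def survival(age):
        return math.exp(-(rate / shape) * math.expm1(shape * age))

    baseline = survival(age_at_death)
    n_steps = int(round((max_age - age_at_death) / step))
    ages = [age_at_death + i * step for i in range(n_steps + 1)]
    probs = [survival(a) / baseline for a in ages]
    return sum(
        0.5 * (probs[i] + probs[i + 1]) * step for i in range(n_steps)
    )


# Hypothetical parameters; doubling the rate mimics a person with an LTC
# whose hazard ratio is 2.
shape, rate = 0.1, 2e-5
print(remaining_life_expectancy(80.0, shape, rate))
print(remaining_life_expectancy(80.0, shape, 2 * rate))
```

Averaging this quantity over the simulated patients, each with their own age, sex and LTC combination, gives a per-death YLL of the kind reported in the results.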
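One way to read the bootstrapping step is sketched below in Python: each bootstrap mean is computed from a resample whose size equals the effective sample size (410) rather than the full 10,000 simulated patients, so that the interval is not artificially narrow. The function name, the percentile method for the interval and the simulated YLL values are assumptions for illustration only and are not the study's data or code.

```python
import random


def bootstrap_mean_interval(values, resample_size, n_boot, rng, level=0.95):
    """Percentile interval for the mean, using resamples of a fixed size."""
    means = []
    for _ in range(n_boot):
        draw = [rng.choice(values) for _ in range(resample_size)]
        means.append(sum(draw) / resample_size)
    means.sort()
    lower_idx = int((1.0 - level) / 2.0 * n_boot)
    upper_idx = int((1.0 + level) / 2.0 * n_boot) - 1
    return means[lower_idx], means[upper_idx]


rng = random.Random(7)
# Placeholder per-patient YLL values; real values would come from the
# combined LTC, age and survival models.
simulated_yll = [rng.uniform(1.0, 25.0) for _ in range(10_000)]

lower, upper = bootstrap_mean_interval(
    simulated_yll, resample_size=410, n_boot=1_000, rng=rng
)
print(lower, upper)
```

Using the effective sample size as the resample size widens the interval relative to resampling all 10,000 patients, reflecting the autocorrelation in the MCMC output that generated them.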

WHO life tables
The proportion of men and women in 10-year age-bands was reported for the 6801 deaths included in the ISS case report. On applying the proportion in each age-band to the WHO Global Burden of Disease 2010 life tables for men, we found that the YLL was 14.4 per person using the whole cohort and 14 after excluding those aged under 50. For women, comparable figures were 12.2 and 11.8 years, respectively. In sensitivity analyses using alternative life tables, life expectancy was lower (particularly for men); however, the estimated YLL remained above 10 years for both men and women, regardless of life table used (detailed results shown in https://github.com/dmcalli2/covid19_yll_final/blob/master/Scripts/Addendum.md).

Comorbidity models
For 710 patients who had died with COVID-19 for whom information on LTCs was presented in the ISS report 14, the proportion with each LTC was as follows: ischaemic heart disease 27.8%, atrial fibrillation 23.7%, heart failure 17.1%, stroke 11.3%, hypertension 73%, diabetes 31.3%, dementia 14.5%, chronic obstructive pulmonary disease 16.7%, active cancer in the past 5 years 17.3%, chronic liver disease 4.1%, chronic renal failure 22.2%. The ISS report also presented the proportion of patients who died with each of the following multimorbidity counts: 0 (2.1%), 1 (21.3%), 2 (25.9%) and ≥3 (50.7%). Using these data, alongside individual-level patient data for a small number of patients from Scotland to aid with model fitting, we were able to simulate a set of realistic notional patients with specific combinations of LTCs. The correlations between every pair of LTCs are shown in the appendix and the full posterior distributions from the modelling are available at GitHub (https://github.com/dmcalli2/covid19_yll_final) 15.

Age models
Based on the proportions reported for each age-band, for men the mean age for the ISS deaths was 77.9 years when people aged less than 50 were excluded and 77.4 years overall. For women the figure was 81.1 for both. The models we fitted to these data, to smooth out the distribution and to make it easier to accommodate different scenarios for the association between age and multimorbidity counts, are shown in Figure 2; the distribution of age and multimorbidity counts for men and women are shown under the assumption that these are independent, and under the assumption that multimorbidity is associated with age.

Survival models
The coefficients for the survival models are shown in the supplementary appendix. Briefly, all LTCs other than hypertension were associated with increased mortality (in a model including 10 other LTCs), and for each LTC the association with mortality was attenuated as the baseline age increased. Figure 3 shows the survival curves applied to different ages and combinations of LTCs, stratified by age-band and multimorbidity count. This figure shows how these associations and age relate to survival across the age range from 50 to 110 years old.

Years of life lost
For men the average YLL on adjusting for number and type of LTC as well as age was 11.6 (10.9-12.4). For women this value was 9.4 (8.7-10). The results were similar under the different assumptions for the age-multimorbidity association and in both sensitivity analyses, whether assuming strongly correlated or independent LTCs (Table 1). For comparison, the YLL based on age alone using the WHO tables was 14.0 and 11.8 for men and women, respectively.

Figure 2. Modelled distribution of age in ISS population, assuming age is associated with comorbidity counts, and assuming age and comorbidity are independent. Coloured bars indicate the comorbidity count from zero (dark/blue) to 11 (light/yellow).

Figure 3. Individual lines represent survival curves for a single simulated patient with a given set of LTCs. From light to dark (yellow to blue) they show decreasing multimorbidity counts (11 to 0). There are 10,000 lines, one for each notional patient. Lines run from the age at which each simulated patient died (survival probability = 1) to when they would have died under the model (survival probability = 0). Patients with the same age and total multimorbidity count will have a different survival curve if they have a different set of the 11 LTCs.
Across the simulated patients there was substantial variation in YLL adjusted for multimorbidity count (Figure 4).
On stratifying the YLL estimates by sex, age and multimorbidity count (for the simulated patients) there were clear differences (Figure 5, Table 2), with the YLL ranging from around 2 years per person in men or women aged 80 with large numbers of LTCs, to around 35 years in younger people without any LTCs (Table 2). For most age-bands and most multimorbidity counts the YLL per person remained above 5. In sensitivity analyses including the survival models with additional comorbidity-comorbidity and comorbidity-comorbidity-age interaction terms (despite these models having a better fit based on AIC than the model presented here), the YLL only changed minimally from that seen in the main analysis. This was true for the overall YLL for each sex (13.1, 95% CI 12.2-14.0 and 10.5, 95% CI 9.7-11.3 for men and women, respectively) and on additionally stratifying on age and multimorbidity count (as shown in Table 2). For the latter comparison, the largest difference (0.7 YLL) was seen in women aged 50-59 with six comorbidities. For most age-comorbidity bands the YLL was the same, to one decimal place, under both survival models.

Summary of main findings
Using published data on people who have died from COVID-19 and survival models based on age and multimorbidity count in a general population in the United Kingdom, we estimated the burden (years of life lost) from COVID-19 related mortality. We make a number of important observations. First, using the WHO GBD 2010 life tables as the reference 13, the estimated YLL was over a decade for COVID-19 deaths, with 14 YLL in men and 12 in women. As such, mortality from COVID-19 represents a substantial burden to individuals and is comparable to high burden LTCs such as ischaemic heart disease and chronic obstructive pulmonary disease. Second, estimating YLL from models using the prevalence of underlying LTCs based on patients dying from COVID-19 in Italy and age-, sex- and multimorbidity count-specific survival models in the UK did not drastically change the result: the YLL dropped to 11.6 and 9.4 years for men and women, respectively. Third, across most age and multimorbidity count strata the estimated YLL per person remained substantial and generally above 5 years. This means that even after accounting for multimorbidity count, most individuals lost considerably more than the "1-2 years" suggested by some commentators 23, perhaps 24,25 reflecting the high prevalence of multimorbidity in this population, especially in those over the age of 50 years 26,27. Finally, whilst the YLL remained high across most age- and multimorbidity count strata, the presence of multimorbidity did indeed influence the magnitude of the YLL. For example, in the elderly, over the age of 80, the estimated YLL in people with no LTCs was 7 years, falling to less than two years with an increasing multimorbidity count.
YLL is a widely used metric to compare the relative impact of different causes of death and is used to guide policy-making and health service delivery and to prioritise interventions aimed at preventing deaths 28. Using UK reports for approximate comparisons, the per-capita YLL in England and Wales for other conditions ranged from 8.2 for chronic obstructive pulmonary disease to 21.6 for asthma, with 11.6 for coronary heart disease and 13.1 for pneumonia 29. Therefore, against these benchmarks, mortality from COVID-19 represents a substantial burden to individuals. It should be noted, however, that YLL for an emergent infection such as COVID-19, particularly in a pandemic, will be sensitive to the specific circumstances of the virus spreading, mitigation strategies, and potential future treatment or vaccines. These estimates, therefore, relate to the specific conditions at the time of modelling and will need to be updated, particularly as vaccination or other strategies alter susceptibility or severity of infection. It is important to note, however, that it would be a misuse of any such modelling if it were used to criticise decision-making undertaken at the time.
Table 1 (extract). Independent: 11.5 (10.9-12.1), 9.6 (9.1-10.2).

The estimated YLL can vary substantially depending on the reference population chosen and the age distribution among those who die. Moreover, where attempts are made to account for underlying conditions in those who died, the accuracy will depend on the quality and completeness of data both for those deaths, and in the reference population used to obtain estimates of survival according to those underlying conditions. Nonetheless, although imperfect, we would argue that public health agencies should present estimates of YLL for COVID-19, alongside the more usual counts of deaths. We have already seen that if agencies do not do so, commentators can and will fill this vacuum, sometimes making substantial errors such as using life expectancy at birth to make inferences about the years of life lost by someone who has already lived into later life and thereby considerably underestimating the impact of the disease on individuals 23. In addition to reporting YLL, metrics such as excess deaths and quality-adjusted life years are important to fully contextualise the loss of life seen in the pandemic.
It should be noted that these estimates were made early in the pandemic and could not account for specific patterns and events which emerged within the UK. For example, these analyses were performed before the impact of COVID-19 in care homes in the UK became apparent. SAIL contains data on all participants registered with a GP (and so would include care-home residents); however, our estimates of life expectancy do not distinguish between people who live in care homes and those who do not. As such, our analyses would not reflect the YLL at a population level where care homes are disproportionately impacted. Our estimates, given the data sources which were available at the time, are more likely to reflect the YLL of COVID-19 deaths among hospitalised patients.
Finally, our estimates of YLL only attempt to quantify the direct effects of COVID-19. Indirect impacts on mortality (e.g. through pressure on healthcare services or unintended consequences of lockdown measures) should also be considered, and are not captured by our YLL calculation.

Strengths and limitations
Our analysis is novel in that it adjusts YLL for the number and type of underlying LTCs. This is important as people with underlying multimorbidity are recognised to be more vulnerable to COVID-19. However, although we had data for eleven common and important LTCs, we did not have markers of underlying disease severity among those who died. Severity of the underlying LTC has considerable impact on life expectancy 30. Moreover, we had no data for rarer severe LTCs, which may nonetheless be common among those who die from COVID-19 at younger ages. As such, the attenuation of YLL following adjustment for LTCs may be an underestimate. However, we think that this effect is unlikely to be substantial enough to reduce YLL to the orders of magnitude suggested by some commentators. Indeed, on stratifying by age and multimorbidity counts, we rarely found average YLLs below three years. Also, we were not able to adjust our estimates for other factors and exposures (such as socioeconomic status, occupation, smoking, health behaviours) which would have given a more accurate representation of life expectancy in the absence of COVID-19.
Socioeconomic status is a particularly pertinent issue, as it may influence not only outcomes from infection (e.g. through multimorbidity and other risk factors) but also the likelihood of exposure (e.g. higher proportions of occupations for which home-working was not feasible). Since socioeconomic status also predicts mortality, there is a possibility of residual confounding due to the lack of data on socioeconomic status available for our models. To prevent inflation of the mean through rare deaths in younger people, we only modelled deaths in people over 50 years; however, deaths among younger people may influence estimates of YLL.
We did not have access to large quantities of individual-level data with which to estimate the prevalence of different combinations of LTCs. Therefore, we fitted a complex model (which was methodologically innovative and will be the subject of a separate publication) to estimate the joint probabilities, using the overall (marginal) estimates of each LTC, and the overall multimorbidity counts alongside a small amount of individual-level data from Scotland to help with model fitting. This model (i) represents the best estimate for the joint probabilities given the available data and importantly, (ii) the results for overall YLL remained substantially similar in widely different sensitivity analyses assuming either that LTCs are highly correlated among people dying from COVID-19 or that they are entirely independent.
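The two extreme sensitivity scenarios described above can be illustrated with a minimal sketch. It is not the authors' model or code: the prevalence values are hypothetical, the function name `simulate_ltc_matrix` is invented for illustration, and the "highly correlated" case is represented here by an assumed shared-uniform (comonotonic) construction, which keeps each LTC's marginal prevalence fixed while making the conditions co-occur as much as possible.

```python
import numpy as np

def simulate_ltc_matrix(prevalence, n_people, correlated, seed=0):
    """Return a boolean (n_people x n_ltc) matrix of simulated LTC presence.

    correlated=False: each LTC is drawn independently for each person.
    correlated=True: one shared uniform draw per person decides all LTCs
    (an illustrative assumption, not the procedure used in the paper).
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(prevalence, dtype=float)
    if correlated:
        u = rng.random((n_people, 1))        # one draw per person, shared across LTCs
    else:
        u = rng.random((n_people, p.size))   # one draw per person per LTC
    return u < p                             # broadcasts against the prevalence vector

# Hypothetical marginal prevalences for four LTCs (not taken from the paper).
prevalence = [0.7, 0.3, 0.2, 0.1]

indep = simulate_ltc_matrix(prevalence, 50_000, correlated=False)
corr = simulate_ltc_matrix(prevalence, 50_000, correlated=True)

# Both scenarios reproduce the marginals, but the multimorbidity counts differ.
for name, sim in [("independent", indep), ("correlated", corr)]:
    counts = sim.sum(axis=1)
    print(name, "marginals:", sim.mean(axis=0).round(2),
          "count variance:", counts.var().round(2))
```

Under these assumptions both scenarios match the same marginal prevalences, while the correlated scenario concentrates conditions in fewer people and so widens the spread of multimorbidity counts. Comparing YLL under both extremes is one way to bracket the effect of the unknown joint distribution.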
Finally, given the emergent nature of the coronavirus pandemic, this study was conducted rapidly and under pressure of time. We chose the best data for age, sex and prevalence of LTCs that were available to us at the time of our modelling, but better-quality individual-level data specific to individual countries will yield substantially more reliable estimates. We would suggest that each public health agency should produce country-specific estimates, using the same LTC definitions in those who died as in the reference population and ideally to an agreed international protocol. Our study has used complex state-of-the-art statistical modelling and inference techniques, which rely on expensive computer simulations. Given the time constraints, we had to accept a trade-off between estimation accuracy and speed. Therefore, we will continue to refine our work to improve the convergence of the numerical procedures, although we do not expect that our conclusions, either about the overall YLL per capita, or about the distribution of YLL within the population, will substantially change. We have also provided all our data (except individual-level data from the Scottish population, for which we provide a simulated substitute dataset) and code to allow others to check our modelling and correct any errors 15.

Conclusion
Among patients dying of COVID-19, there appears to be a considerable burden in terms of years of life lost, commensurate with diseases such as coronary heart disease or pneumonia. While media coverage of the pandemic has focused heavily on COVID-19 affecting people with 'underlying health conditions', and while the number and type of LTCs certainly influence the life expectancy and YLL for individuals, adjustment for number and type of LTCs only modestly reduces the estimated YLL due to COVID-19 compared to estimates based only on age and sex. Public health agencies and governments should report on YLL, ideally adjusting for the presence of underlying LTCs, to allow the public and policy-makers to better understand the burden of this disease.

Data availability
All code, data (except individual-level data for Scotland), intermediate outputs and diagnostic plots are provided on GitHub: https://github.com/dmcalli2/covid19_yll_final.

Additionally, a few of my previous comments appear to have been misinterpreted (apologies for any lack of clarity), so I provide a few clarifications at the end.

Comment requiring additional input:
Your main clarification is: "Our analysis was therefore an attempt to model, at a very early stage in the pandemic and with limited and incomplete data, the impact of COVID-19 on years of life lost (YLL)… Our analytic approach was specifically designed and developed to estimate YLL in spite of the sparse data available at the time. It was highly novel in that it used aggregated data on comorbidity to derive estimates of combinations of comorbidity. We did not, and do not, suggest that ours should be the definitive estimate of YLL." I appreciate this, effectively you argue it is a methods piece then (almost), providing a framework that can potentially "be useful for early-stage estimation in future pandemics". I agree, this is novel and interesting.
However, while the manuscript now captures these limitations fairly well, the abstract does not, whatsoever. This is particularly important since this is all most people are likely to read and it could be extremely misleading. It is currently framed as an estimation of, and implies policy conclusions around, the YLL figure(s). I would suggest, as a minimum, amending the abstract to make the data issues clear and up-front and that the estimated YLL is potentially flawed for this reason.
If it is primarily a methods paper though, i.e. with any scope for usefulness for policymaking at an early stage of future pandemics as you suggest, I would also expect validation of the methods. The adage "all models are wrong, but some are useful" comes to mind. The method only has scientific/policy value if the conclusions the model gives reflect reality. It is not useful if it would give a completely different answer given better data, and so would imply different policy decisions to what would have been enacted (the model needs to effectively mimic the results given 'gold standard' data). Having used "sparse and aggregate data" it is currently still not obvious without this validation whether the model is actually useful in a policy/scientific sense or whether waiting for good quality data is a necessary input to accurately predict YLL accounting for multimorbidity. Ideally, as in my previous review, I would still like to see the method validated in some way, e.g. with more robust updated data from a coherent setting and not across multiple countries, and the manuscript presented as a methods piece. (As above, however, I do see that the manuscript now better reflects the limitations of the study, so deemphasising the policy significance of the presented YLL figure (particularly in the abstract), might be enough for this paper if you plan to do this validation separately. If this is the case though, you should probably add an additional limitation to this version that the method requires validation prior to recommending use in future pandemics.)

Clarifications to previous comments:
"Author response. Thank you for these comments. Taking first the issue of using data from patients who have died to estimate YLL…it is necessary to use data from people who have died…" -The issue is not taking 'people who have died', of course data on deaths is required to estimate YLL (thank you for pointing this out). The issue is, as I alluded to in other comments, if all (or half, more reasonably, even) the people who died were in a care home, as an example, the life expectancy of this selected group is very different from a general population of the same age (median, around 20 months has been estimated, although data is not great -https://www.pssru.ac.uk/pub/3211.pdf). You need to be able to say something about observables (co-variates) of the population of people in your death dataset compared to observable of deaths (/life expectancy) in the general population (especially when the deaths and the life expectancy are from completely different countries/contexts). That is the potential selection effect I mentioned, i.e. it is likely hiding a lot of other confounding. From your response, though, it seems like you are trying to demonstrate what is possible to estimate with flawed data at the start of a pandemic, which is fine. In which case, I would expect this selection effect to be picked up in the limitations (like you have now incorporated in response to my other comments).
○ "The standard approach to YLL typically uses just age and sex. We develop this further to include comorbidity, to address the specific point raised in the context of COVID-19, that because of common comorbidities most people who died with COVID-19 would have died soon in any case". Yes, but adding 'comorbidity' in this way does not pick up the severity (proxied by care home/deprivation and other observables I mentioned). Sex and age are largely more straight-forward data points, but co-morbidity measurement is not (there is a lot of literature on this).
○ "By definition, YLL cannot be observed, neither by hospital data, nor by controlled experiment…". Again, thank you for pointing this out. But it can, of course, be estimated based on (either older or more updated) data, so it can be 'observed' in that sense. What I was getting at with this comment was similar to the above, if around half of the excess deaths are care home residents in most of Europe, and severe co-morbidity is frequently observed, then the YLL you estimate does not intuitively 'fit' with these additional observations from the existing literature (it might! But, it needs more work to validate first, I think. As you say, the YLL likely reflects a sub-population of non-care home/without severe conditions - potentially, the YLL in a system, Italy, having suffered long-term austerity and where the health system is overwhelmed). This is where I was getting at the need for some sort of validation in terms of latterly available (superior) data.
"We would also note that none (to our knowledge) have attempted to model the observed events in care homes". - Comas-Herrera et al. ~42% recorded deaths on average, higher (around half) in earlier parts of epidemic 1. Reference 2 - excess care home mortality in England, ~55% of PHE estimated excess mortality in general population over same period.
○ "life expectancy at birth. Once an individual is known to have survived to an older age, their life expectancy is much higher than it is at their birth". Yes, but on average in population, not for most deprived/severely multimorbid/care homes, for instance.
○ "Rather we calculate years of life lost for a given set of observed deaths. Since it is not possible to observe years of life lost we are not able to present an observed versus expected comparison". Yes, but there is now better data on this, even from the same period you analyse. How does it compare to the updated set(s) of 'observed deaths'?
○ "To summarise, we think that this YLL can reasonably be generalised to high-income countries…". Why is that reasonable? The Covid death profile in UK/Germany/Sweden/Denmark/etc. is completely different, isn't it? Dependent on some combination of context-specific pre-existing conditions (e.g. 'stock' of health and risk factors in the population/resilience and capacity of services to respond) and reactive policy measures, surely?
○ "However, since ten years is already rather a long period, we think that the former is the more pertinent for policy-making". The more pertinent for policy-making should be the "true" value, surely (i.e. best available estimate given available up-to-date data

Peter Hanlon, University of Glasgow, Glasgow, UK

Thank you for your comments that the previous revision has significantly improved the manuscript. Thank you also for your further comments and clarifications, which we respond to below. In short, we agree with the need to revise the abstract to reflect the clarifications and limitations highlighted within the main text. We also agree with the need to interpret the policy-relevance of the findings in the context of their limitations and have revised the discussion with this in mind. Our responses to the specific points raised are detailed below, along with all changes to the abstract and manuscript text.

Reviewer comment:
Your main clarification is: "Our analysis was therefore an attempt to model, at a very early stage in the pandemic and with limited and incomplete data, the impact of COVID-19 on years of life lost (YLL)… Our analytic approach was specifically designed and developed to estimate YLL in spite of the sparse data available at the time. It was highly novel in that it used aggregated data on comorbidity to derive estimates of combinations of comorbidity. We did not, and do not, suggest that ours should be the definitive estimate of YLL." I appreciate this, effectively you argue it is a methods piece then (almost), providing a framework that can potentially "be useful for early-stage estimation in future pandemics". I agree, this is novel and interesting.
However, while the manuscript now captures these limitations fairly well, the abstract does not, whatsoever. This is particularly important since this is all most people are likely to read and it could be extremely misleading. It is currently framed as an estimation of, and implies policy conclusions around, the YLL figure(s). I would suggest, as a minimum, amending the abstract to make the data issues clear and up-front and that the estimated YLL is potentially flawed for this reason.

Author response
Thank you for this comment, and for highlighting that the abstract requires updating to reflect the clarifications and limitations highlighted in the text. We have added the following to the background section of the abstract to highlight the limitations of the data sources on which the findings are based: "We aim to estimate YLL attributable to COVID-19, before and after adjustment for number/type of LTCs, using the limited data available early in the pandemic." We have also changed the final sentence of the conclusion to emphasise the points raised (particularly around residual confounding, and the lack of other markers of underlying health status): "More comprehensive and standardised collection of data (including LTC type, severity, and potential confounders such as socioeconomic-deprivation and care-home status) is needed to optimise YLL estimates for specific populations, and to understand the global burden of COVID-19, and guide policy-making and interventions."

Reviewer comment
If it is primarily a methods paper though, i.e. with any scope for usefulness for policymaking at an early stage of future pandemics as you suggest, I would also expect validation of the methods. The adage "all models are wrong, but some are useful" comes to mind.

Author response
Thank you for these comments. We agree with the reviewer's point about the limitations of this model to guide (particularly current) policy decisions or in the context of a future pandemic. It is also entirely fair that the abstract requires updating for this revised version. Changes to the abstract as a whole are detailed in the response to the comment above. We have added the following text to the conclusion of the abstract to emphasise the limitations of our model: "More comprehensive and standardised collection of data (including LTC type, severity, and potential confounders such as socioeconomic-deprivation and care-home status) is needed to optimise YLL estimates for specific populations, and to understand the global burden of COVID-19, and guide policy-making and interventions." We would maintain that further work to estimate YLL based on updated or more detailed data that may now be available is beyond the scope of this manuscript, and will be better explored in separate studies focused on specific populations and policy contexts.
We agree with the need for data from 'a coherent setting', as you point out, and have added the following to the limitations: "Our model, due to limited data available at the time, combined data on Covid-19 deaths and life expectancy data from different countries and contexts. While this synthesis of data sources allowed an estimation to be generated at an early stage, it limits the generalisability to specific contexts. Summaries of YLL relating to a specific country or context should ideally use data (both life-expectancy and Covid-19 related) from that context. A comparison of such estimates (based on individual-level and country-specific data) with our approach (modelling aggregate- and individual-level data from multiple sources early in the pandemic) would be important to test the utility of this approach for future pandemics.
Despite these limitations, our findings do indicate that adjusting for number and type of LTCs does not substantially reduce the estimated YLL compared to the standard approach. Our analysis does not, however, offer a definitive estimation of YLL across all contexts, nor does it necessarily fully adjust for underlying health status. For example, further work based in Scotland has illustrated that the life expectancy in care-home residents, and therefore the estimated YLL, is substantially different from the general population. This is important given the large proportion of COVID-19 deaths that have occurred in care homes. Additionally, it indicates that additional factors are likely to influence underlying health status, life expectancy, likelihood of dying from Covid-19, and by extension YLL. These factors are not fully represented by the presence or absence of specific LTCs. Some of these factors are likely to be challenging to estimate from routine data alone, and producing YLL estimates which account for these factors should be an area of future investigation."

Reviewer: Clarifications to previous comments:

Author:
Thank you for the clarifications below, and our apologies for the previous misinterpretation.
In short, we agree with the points now clarified and have incorporated these into the revised text for the limitations section given above. We highlight the changes in response to the specific points below.

Author response
Thank you for this clarification. The additional text in the abstract conclusion highlights the need to address residual confounding (as you rightly raise here). We also agree about the selection effect and expand on this in the following (also quoted above in fuller context): "Our analysis does not, however, offer a definitive estimation of YLL across all contexts, nor does it necessarily fully adjust for underlying health status. For example, further work based in Scotland has illustrated that the life expectancy in care-home residents, and therefore the estimated YLL, is substantially different from the general population. This is important given the large proportion of COVID-19 deaths that have occurred in care homes. Additionally, it indicates that additional factors are likely to influence underlying health status, life expectancy, likelihood of dying from Covid-19, and by extension YLL."

Reviewer comment:
"The standard approach to YLL typically uses just age and sex. We develop this further to include comorbidity, to address the specific point raised in the context of COVID-19, that because of common comorbidities most people who died with COVID-19 would have died soon in any case". Yes, but adding 'comorbidity' in this way does not pick up the severity (proxied by care home/deprivation and other observables I mentioned). Sex and age are largely more straightforward data points, but co-morbidity measurement is not (there is a lot of literature on this).

Author response
We agree with the reviewer that adjusting for underlying health status is complex and challenging and that an adjustment for the presence/absence of some specific comorbidities does not fully capture that (as proxied by care home status, etc, as the reviewer rightly points out). We highlight this point in the new text quoted in the comment above.

Reviewer comment
"By definition, YLL cannot be observed, neither by hospital data, nor by controlled experiment…".

Author response:
Thank you for clarifying this point. We agree that the YLL is likely to also reflect underlying factors of the population, health system, etc. We have added the following text (again repeated from the longer section above) to highlight this point: "While this synthesis of data sources allowed an estimation to be generated at an early stage, it limits the generalisability to specific contexts. Summaries of YLL relating to a specific country or context should ideally use data (both life-expectancy and Covid-19 related) from that context. A comparison of such estimates (based on individual-level and country-specific data) with our approach (modelling aggregate- and individual-level data from multiple sources early in the pandemic) would be important to test the utility of this approach for future pandemics."

Reviewer comment:
"We would also note that none (

Author response
Thank you. We have added these references, along with another estimation of life expectancy and YLL in care-home residents compared to the general population, to the discussion section where the issue of care homes is discussed.

Reviewer comment
"life expectancy at birth. Once an individual is known to have survived to an older age, their life expectancy is much higher than it is at their birth". Yes, but on average in population, not for most deprived/severely multimorbid/care homes, for instance.

Author response
We agree these are additional important sources of potential confounding, and have added to the abstract and discussion sections as quoted in response to comments above.

Reviewer comment
"Rather we calculate years of life lost for a given set of observed deaths. Since it is not possible to observe years of life lost we are not able to present an observed versus expected comparison". Yes, but there is now better data on this, even from the same period you analyse. How does it compare to the updated set(s) of 'observed deaths'?

Author response
We discuss this at greater length in response to the second comment above. In summary, we agree that comparison of our estimate to 'context-specific' estimates based on individual-level data will be important, particularly if this method for dealing with limited data is to be utilised in future settings. As we discuss above, and in our previous response, we believe such a comparison is beyond the scope of this manuscript: "Summaries of YLL relating to a specific country or context should ideally use data (both life-expectancy and Covid-19 related) from that context. A comparison of such estimates (based on individual-level and country-specific data) with our approach (modelling aggregate- and individual-level data from multiple sources early in the pandemic) would be important to test the utility of this approach for future pandemics."

Reviewer comment
"To summarise, we think that this YLL can reasonably be generalised to high-income countries…". Why is that reasonable? The Covid death profile in UK/Germany/Sweden/Denmark/etc. is completely different, isn't it? Dependent on some combination of context-specific pre-existing conditions (e.g. 'stock' of health and risk factors in the population/resilience and capacity of services to respond) and reactive policy measures, surely?

Author response
This is a very reasonable point, and we have added to the limitations section to highlight this: "While this synthesis of data sources allowed an estimation to be generated at an early stage, it limits the generalisability to specific contexts. Summaries of YLL relating to a specific country or context should ideally use data (both life-expectancy and Covid-19 related) from that context."

Reviewer comment
"However, since ten years is already rather a long period, we think that the former is the more pertinent for policy-making". The more pertinent for policy-making should be the "true" value, surely (i.e. best available estimate given available up-to-date data). Would need to further demonstrate this with validation checks.

Author response
We agree that policy decisions should be based on the best available data, using an analytic approach that limits bias and confounding.

Author response:
This is a fair point; we did not mean to claim that these studies directly support our findings with respect to multimorbidity. However, the overall estimates we give are broadly in keeping with subsequent attempts to model YLL. The challenge of 'other relevant factors', as the reviewer points out, is important, as highlighted in response to other comments.

Author response:
This is a reasonable interpretation, as discussed above in response to other comments. The following additional text from the discussion (also quoted above) speaks to this:

Mei Sum Chan
University of Oxford, Oxford, UK

Thank you for the opportunity to review this manuscript. This review relates to the version of the manuscript submitted for review (v1) and does not cover the addendum as the addendum was not mentioned in this version, nor does it refer to the authors' responses to comments on the manuscript. The review also takes into account that this version was posted on 23 April 2020 and hence did not have access to the better quality data and results from similar studies that were published after this date.
Overall: This is a highly topical investigation into the demographic impact of mortality from COVID-19 and the role of underlying conditions in these mortality patterns. The message that there is a large direct mortality burden from COVID-19, measured in terms of YLL, and that this burden increases with number of underlying conditions, is clearly made by the study, particularly through results from the first approach (the standard WHO life table approach) and sensitivity analyses of the second approach. However, insufficient evidence was provided on the validity of the second and main approach, a novel complex simulation study of COVID-19 mortality in people with different LTC profiles, both in terms of the statistical rigour and the use of health data within the model. I recommend that substantial model development is carried out to resolve statistical issues and to assess model diagnostics, preferably in the separate methodological investigation that the authors mentioned, before being used in its final form in this empirical study to make policy recommendations.
The majority of the sections were clear, and compelling reasons were given for conducting this research and for using these results to inform policy and public dialogue. The use of a rapid review to acquire and assess data on underlying conditions is also commendable. The aims were not fully clear and did not match the rest of the manuscript: a target or reference population was not specified, no results by type of multimorbidity were reported even though they were accounted for in the model, and the manuscript focused on the design and application of the novel second approach when this was not a stated aim. The methods, while detailed, were also unclear for the second approach: multiple datasets, model descriptions and procedures were interweaved in no particular order in each subsection, it was difficult to differentiate the role of data vs assumptions, and apart from scenario testing, no model diagnostic or checking procedures were mentioned for the Bayesian components. Some key results on the LTC model convergence and posterior distribution were reported in the Discussion rather than the Results section. One of the conclusions did not appear to match the results well.
The authors have provided the non-IPD data, annotated code and codelists for this analysis, and given details on the access request procedures for the IPD datasets. Display items for modelled distributions and YLLs were useful but were difficult to interpret for survival curves (Figure 3). A summarised version of Figure 3 would be more useful.
My key concerns were: The authors reported in the Discussion section that the LTC model in the second approach 'did not fully converge'. This is a key statistical issue should have been reported much earlier in the manuscript and steps taken to ensure model convergence at the analysis stage. No model diagnostics (for model selection, convergence or goodness of fit) were reported for the LTC model and age model components. Therefore I have major concerns over the validity of the model structure and its results. I also wonder how the authors managed to extract results from the LTC model and how these results should be interpreted since the model was not optimised. The authors' expectation that the results will not change substantially with future modifications to achieve convergence is not sufficient. If the model lacked input data, why were influenza rather than COVID-19 deaths (specifically the influenza deaths in 2011; Methods section) used to supplement the age model but not the LTC model? What quantity of data or correlation structure in the data was required to enable the model to converge? Is there a way to jointly model the LTC joint prevalence and age distribution? The authors should consider placing more emphasis on the first approach or the simpler sensitivity analyses, or eliminating features of the model that exacerbated non-convergence or may have led to overspecification in this study. Hopefully this would also reduce the reliance of the model on distributional assumptions input by the authors. 1.
The abstract described little of the data and methods used for the second approach. The source of data was stated as 'UK healthcare data', which was somewhat misleading when in fact only a combination of Scottish and Welsh health records were used and were mixed with aggregate level Italian death data. I could not detect the use of statistical models from reading the abstract, much less a simulation study using a combination of two Bayesian models and one parametric survival model.

2.
It was not clear which population the authors intended to model or make recommendations 3.
for. This was not stated in the aims, results or discussion, but the choice of data appeared to have a UK focus. The authors have used Scottish, Welsh and Italian data mixed togetherpresumably these were the datasets that were available to the authors in that short window. However, which population(s) would the results relate to? If only Italian data on underlying conditions was identified in the rapid review, why not focus on Italian data for the rest of the analysis for consistency? The choice of country matters as patterns in multimorbidity and mortality with multimorbidity depend on care provision and demographic characteristics (Nunes et al 2016). The results would also be more informative if country-level differences in mortality profiles at ages 50+ in Italy, Scotland and Wales did not contribute to the reported YLLs.
Additionally, the populations used in each model component ( Figure 1) were mismatched. The LTC and age models were based on deaths from COVID-19 and/or influenza and their LTC profile at the time of death in several datasets, while the survival model was based on the unselected SAIL population and their LTC profile at baseline. These choices were reasonable within each model component, but inconsistent when these components were combined. The choice of the complex LTC and age model design appeared to be a consequence of using sparse data (deaths from COVID-19 and/or influenza). It would be useful to assess whether the correlation structures of age and LTCs of those who died from these causes were substantially different from the structures of their respective wider populations.

4.
Since the authors reported in the Discussion section that the LTC model 'had wide posteriors (indicating substantial uncertainty)', the rationale for using a complex LTC model component as part of the main model in addition to the two extreme scenarios of independent LTCs and highly correlated LTCs is unclear. Running the model using those two scenarios alone was sufficient to provide a range of YLLs that would still address the aims of this study and avoid many of the issues raised in this review. In fact, Table 1 shows that the ranges of YLLs described by these two scenarios were not large and the YLLs were substantially larger than zero when age and multimorbidity counts were treated as associated.

5.
The authors' second conclusion that 'adjustment for number and type of LTCs only modestly reduces the estimated YLL due to COVID-19 compared to estimates based only on age and sex' seems to gloss over the more policy-relevant finding that there were substantial differences in YLLs by multimorbidity count (Table 2).

Minor comments: Please refer to the relevant citation for 'age distribution of COVID-19 deaths in Italy from published data' in the WHO standard YLL approach subsection.
Please spell out the first mentions of 'IPD' and 'SAIL', and move the description of SAIL to the first mention.

Please specify what the 'high correlation' scenario is rather than the procedure used to obtain it (I think this scenario is specified more clearly in the R code rather than the main text.) The 'independent' scenario probably does not require explanation, but if explained, the word 'random' should be added.

The authors' comparisons with YLLs from a 'UK report' for pneumonia was useful, and vital in the absence of LTC model and age model diagnostics. However, this 'UK report' only contains YLLs for England and Wales and not the whole of UK. Published YLLs for other flu pandemics or outbreaks could also be more relevant comparisons.

It should be acknowledged that the analysis does not take into account COVID-19 deaths at ages under 50 years, which were rare but may have a non-negligible impact on the summary YLLs for the second approach.

The authors refer to the same Github repository for all supplementary material and appendices, which contains a large number of files, mostly datasets, codelists, code and raw output, and no explanation of its contents. It would be helpful to report the interpretations of the outputs and to refer to specific files in the repository in each instance.
The estimation of YLLs by type of LTCs was alluded to in the aims but not reported. Some results on YLLs for particular combinations, perhaps the most prevalent combinations, would be more clinically informative than the YLLs by multimorbidity counts. Alternatively, a ranking of LTCs by their lethality or a combined measure of lethality and prevalence would be informative too.

The colour coding in Figure 3 was not explained (but is presumably the same as the other figures).
The resolutions of the figures could be improved so that the axis labels are more readable.
The caption for Table 2 should be revised to clarify that YLLs are tabulated by LTC count and not type.

The authors acknowledged that YLL for COVID-19 is an 'imperfect' measure, but did not explain why. It would be helpful to add limitations of either YLL approach (eg the YLL models used here do not allow for competing risks) or consider alternative health metrics (eg YLD or DALY) in the Discussion section.

The 'orders of magnitude suggested by some commentators' mentioned in the Strength and limitations section should cite the relevant reference. Evidence of suggestions or claims by multiple commentators rather than a single commentator would also be appreciated.

The Discussion section focused on YLLs by number of LTCs and did not consider the wider context of multiple interlinked risk factors for COVID-19 deaths, several of which are available in the SAIL data (socioeconomic and health behavioural factors). The relative importance of age, sex and number of LTCs (and types of LTCs) should be given some consideration too.

The YLL terminology is correct, but it may be worth clarifying (especially to readers who are unfamiliar with YLL or epidemiology) that these YLLs estimate the direct impact of COVID-19 deaths rather than the indirect impact of COVID-19-related outcomes.
The comparison of raw death counts with YLLs per person as reporting metrics should be refined as the authors and health organisations appear to stand at cross purposes. One key use of raw death counts is to track the evolution of the pandemic in real time, whereas these YLLs do not track time trends or describe the aggregate mortality burden.

I have limited experience with Bayesian models and would recommend that a reviewer with specific expertise in Bayesian modelling in JAGS is invited at the next round to add any further comments on the Bayesian components of the models if these remain in the analyses.

If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required.
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Partly
Competing Interests: No competing interests were disclosed.

Author Response 29 Jan 2021
Peter Hanlon, University of Glasgow, Glasgow, UK
Thank you for your review. We have uploaded a revised version of the manuscript based on your comments and those of the other reviewers. Our responses to each of your comments, along with quoted changes to the manuscript text, are shown below.
Reviewer comments are shown in italics.

Reviewer comment
Thank you for the opportunity to review this manuscript. This review relates to the version of the manuscript submitted for review (v1) and does not cover the addendum as the addendum was not mentioned in this version, nor does it refer to the authors' responses to comments on the manuscript. The review also takes into account that this version was posted on 23 April 2020 and hence did not have access to the better quality data and results from similar studies that were published after this date.

Overall: This is a highly topical investigation into the demographic impact of mortality from COVID-19 and the role of underlying conditions in these mortality patterns. The message that there is a large direct mortality burden from COVID-19, measured in terms of YLL, and that this burden increases with number of underlying conditions, is clearly made by the study, particularly through results from the first approach (the standard WHO life table approach) and sensitivity analyses of the second approach. However, insufficient evidence was provided on the validity of the second and main approach, a novel complex simulation study of COVID-19 mortality in people with different LTC profiles, both in terms of the statistical rigour and the use of health data within the model. I recommend that substantial model development is carried out to resolve statistical issues and to assess model diagnostics, preferably in the separate methodological investigation that the authors mentioned, before being used in its final form in this empirical study to make policy recommendations.
The majority of the sections were clear, and compelling reasons were given for conducting this research and for using these results to inform policy and public dialogue. The use of a rapid review to acquire and assess data on underlying conditions is also commendable. The aims were not fully clear and did not match the rest of the manuscript: a target or reference population was not specified, no results by type of multimorbidity were reported even though they were accounted for in the model, and the manuscript focused on the design and application of the novel second approach when this was not a stated aim. The methods, while detailed, were also unclear for the second approach: multiple datasets, model descriptions and procedures were interweaved in no particular order in each subsection, it was difficult to differentiate the role of data vs assumptions, and apart from scenario testing, no model diagnostic or checking procedures were mentioned for the Bayesian components. Some key results on the LTC model convergence and posterior distribution were reported in the Discussion rather than the Results section. One of the conclusions did not appear to match the results well. Figure 3 would be more useful.

Author response:
We thank the reviewer for this summary of their assessment. We outline our response to each of their comments in detail below. In summary, we have added text to explain and justify the use of multiple datasets (due to modelling being conducted early in the pandemic in the context of sparse available data). We have redeveloped our model-fitting approach to achieve statistical convergence and include a detailed description of this, along with model diagnostics, in the online appendix.

Reviewer comment
My key concerns were: The 1.

Author response:
Thank you for these comments. We have revised our model fitting which now converges fully. We thus arrived at very similar multimorbidity estimates to the previous model, and as such the overall findings of the paper are unchanged. Model-fitting details are discussed on the online repository, along with all model diagnostics. We summarise briefly here. The multimodality of the posterior and extreme flexibility of our original model led to poor convergence. By constraining the model a small amount through 1) soliciting more informative priors that led to fewer samples being rejected during sampling, 2) changing how the correlation matrix between conditions was sampled during MCMC fitting to sample whole matrices rather than individual elements which is much more efficient, and 3) treating the absence of a diagnosis in the individual patient data as absence of the disease at a clinically significant level (as all the cases were reviewed by medical teams at the ISS to this standard) we were able to dramatically improve MCMC mixing and satisfy established convergence diagnostics. The previous model maximised flexibility (in terms of joint probabilities of LTCs) and propagated this uncertainty throughout the final estimates. The updated model sacrificed some of this flexibility while still propagating uncertainty which facilitated model convergence. Both approaches result in similar estimates, and are based on all available data.
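For readers unfamiliar with MCMC convergence checks, the following is a minimal sketch of one widely used diagnostic, the classic Gelman-Rubin potential scale reduction factor (R-hat), computed for a single parameter from several chains. It is an illustration only: the response does not say which diagnostics were applied (these are described in the authors' repository), and the function name and input layout are assumptions made for this example.

```python
from math import sqrt
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic potential scale reduction factor (R-hat) for one parameter.

    `chains` is a list of equal-length lists of posterior draws
    (a hypothetical input layout chosen for this sketch).
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need two or more equal-length chains of two or more draws")

    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: average within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled estimate of posterior variance
    return sqrt(pooled / within)
```

Values close to 1 suggest the chains are sampling the same distribution; chains stuck in different modes of a multimodal posterior, as described above for the original model, would give values well above 1.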

Reviewer comment
The abstract described little of the data and methods used for the second approach. The source of data was stated as 'UK healthcare data', which was somewhat misleading when in fact only a combination of Scottish and Welsh health records were used and were mixed with aggregate level Italian death data. I could not detect the use of statistical models from reading the abstract, much less a simulation study using a combination of two Bayesian models and one parametric survival model.

Author response:
Thank you. We have added detail to the abstract, within the word limits of the journal, to highlight the variety of data sources and modelling strategies employed.

Reviewer comment
It was not clear which population the authors intended to model or make recommendations for. This was not stated in the aims, results or discussion, but the choice of data appeared to have a UK focus. The authors have used Scottish, Welsh and Italian data mixed together; presumably these were the datasets that were available to the authors in that short window. However, which population(s) would the results relate to? If only Italian data on underlying conditions was identified in the rapid review, why not focus on Italian data for the rest of the analysis for consistency? The choice of country matters as patterns in multimorbidity and mortality with multimorbidity depend on care provision and demographic characteristics (Nunes et al 2016). The results would also be more informative if country-level differences in mortality profiles at ages 50+ in Italy, Scotland and Wales did not contribute to the reported YLLs.

Author response:
We think that this YLL can reasonably be generalised to high-income countries, but most likely to hospitalised patients rather than those who died in care homes. We also argue that public health agencies should produce YLL estimates for their own specific settings using the individual-level patient data to which they now have access, as is currently underway in at least one country. We have added a caveat around care homes to highlight that our estimates cannot be extended to this population. The following text has been added to the manuscript:
"We chose the WHO life tables to allow comparison of the burden of COVID-19 deaths with other conditions in an international context. However, these, unlike many national-level life tables, do not stratify by sex. Furthermore, our subsequent modelling draws upon data from specific settings based on availability early in the pandemic (namely data on COVID-19 deaths from Italy, and life-expectancy estimates based on data from Wales). Therefore… we performed sensitivity analyses using life tables from Italy (2017), United Kingdom (2016-2018) and, for comparison, the United States (2017)."
"We chose the best data for age, sex and prevalence of LTCs that was available to us at the time of our modelling, but better-quality individual-level data specific to individual countries will yield substantially more reliable estimates. We would suggest that each public health agency should produce country-specific estimates, using the same LTC definitions in those who died as in the reference population and ideally to an agreed international protocol."
"It should be noted that these estimates were made early in the pandemic and could not account for specific patterns and events which emerged within the UK. For example, these analyses were performed before the impact of COVID-19 in care homes in the UK became apparent. SAIL contains data on all participants registered with a GP (and so would include care-home residents), however our estimates of life expectancy do not distinguish between people who live in care-homes and those who do not. As such our analyses would not reflect the YLL at a population level where care-homes are disproportionately impacted. Our estimates, given the data sources which were available at the time, are more likely to reflect the YLL of COVID-19 deaths among hospitalised patients."

Reviewer comment
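Additionally, the populations used in each model component (Figure 1) were mismatched. The LTC and age models were based on deaths from COVID-19 and/or influenza and their LTC profile at the time of death in several datasets, while the survival model was based on the unselected SAIL population and their LTC profile at baseline. These choices were reasonable within each model component, but inconsistent when these components were combined. The choice of the complex LTC and age model design appeared to be a consequence of using sparse data (deaths from COVID-19 and/or influenza). It would be useful to assess whether the correlation structures of age and LTCs of those who died from these causes were substantially different from the structures of their respective wider populations.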

Author response:
The age and sex models were based on people who had died from COVID-19 as YLL can only be inferred as a counterfactual based on the characteristics of those who have died with COVID-19. The use of influenza death data was simply to obtain a plausible estimate for how strongly correlated multimorbidity would be with age in the context of death. As the reviewer suggests, we have compared the correlation of age and LTCs in these deaths with the population as a whole, and obtained similar results (the point estimate for the whole population lay within the confidence interval for those who died with influenza, although the estimate for the population as a whole is more precise owing to the larger sample size). Finally, in order to estimate YLL it was necessary to obtain estimates of life expectancy based on characteristics of people who died of COVID-19. Assessing life expectancy required us to analyse living patients (which we identified from SAIL, and analysed based on LTC profile, as the reviewer points out). It is true that these estimates are drawn from a range of different populations, and we have added the following text to provide context for these decisions: "We draw upon data sources available in April 2020, as this modelling study aimed to estimate the potential YLL at an early stage in the pandemic, when the impact was emerging. It should be noted, however, that events unfolding throughout the pandemic are likely to impact the YLL. Any estimate, particularly in the context of a pandemic, is dependent on what populations are exposed, and to what extent. Updated estimates, taking account of events which transpired in the UK and beyond, are the subject of ongoing collaborative efforts and we have not attempted to model these. Rather, this manuscript provides a detailed and reproducible quantification of YLL using techniques targeting the specific challenges of estimation at the early stages of a pandemic."
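To make the link between a survival model and YLL concrete, the sketch below computes expected remaining years of life at a given age under a Gompertz hazard, h(x) = a·exp(b·x), by numerically integrating the conditional survival curve. The manuscript states that Gompertz models were used, but this particular parameterisation, the function name, and the parameter values in the usage note are assumptions for illustration and are not the authors' fitted estimates.

```python
from math import exp


def gompertz_remaining_life(age, a, b, horizon=70.0, step=0.01):
    """Expected remaining years at `age` under hazard h(x) = a * exp(b * x).

    Integrates S(age + t) / S(age) over t in [0, horizon] with the
    trapezoidal rule. `horizon` and `step` are illustrative defaults.
    """
    scale = (a / b) * exp(b * age)

    def conditional_survival(t):
        # Probability of surviving a further t years, given survival to `age`.
        return exp(-scale * (exp(b * t) - 1.0))

    n = int(round(horizon / step))
    total = 0.5 * (conditional_survival(0.0) + conditional_survival(n * step))
    for i in range(1, n):
        total += conditional_survival(i * step)
    return total * step
```

Under this sketch, the YLL attributed to one death is the remaining life expectancy at the age of death for a person with the same characteristics. For example, calling `gompertz_remaining_life(80, a=3e-5, b=0.1)` and then repeating with a larger, hypothetical `a` to represent a heavier LTC burden gives a shorter expectancy, which is the mechanism by which adjustment for LTCs lowers YLL.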

Reviewer comment
Since the authors reported in the Discussion section that the LTC model 'had wide posteriors (indicating substantial uncertainty)', the rationale for using a complex LTC model component as part of the main model in addition to the two extreme scenarios of independent LTCs and highly correlated LTCs is unclear. Running the model using those two scenarios alone was sufficient to provide a range of YLLs that would still address the aims of this study and avoid many of the issues raised in this review. In fact, Table 1 shows that the ranges of YLLs described by these two scenarios were not large and the YLLs were substantially larger than zero when age and multimorbidity counts were treated as associated.

Author response:
Thank you for this comment. As we have discussed above, and in the online GitHub repository, the model now converges. The overall findings of the paper are unchanged based on this new (converged) model.

Reviewer comment
The authors' second conclusion that 'adjustment for number and type of LTCs only modestly reduces the estimated YLL due to COVID-19 compared to estimates based only on age and sex' seems to gloss over the more policy-relevant finding that there were substantial differences in YLLs by multimorbidity count (Table 2).

Author response:
We think both findings are interesting, but ultimately the wider point is the most relevant for policy. We are happy to amend the conclusion to reflect that both points are important: "While media coverage of the pandemic has focused heavily on COVID-19 affecting people with 'underlying health conditions', and while the number and type of LTCs certainly influence the life expectancy and YLL for individuals, adjustment for number and type of LTCs only modestly reduces the estimated YLL due to COVID-19 compared to estimates based only on age and sex."

Reviewer comment
Minor comments: Please refer to the relevant citation for 'age distribution of COVID-19 deaths in Italy from published data' in the WHO standard YLL approach subsection.

Author response:
Thank you. We have added the relevant citation.

Reviewer comment
Please spell out the first mentions of 'IPD' and 'SAIL', and move the description of SAIL to the first mention.

Author response:
Thank you. We have expanded the first mentions of these.

Reviewer comment
Please specify what the 'high correlation' scenario is rather than the procedure used to obtain it (I think this scenario is specified more clearly in the R code rather than the main text.) The 'independent' scenario probably does not require explanation, but if explained, the word 'random' should be added.

Author response:
The high-correlation scenario was intended to maximise the multimorbidity load at the individual level. We have added the following text: "This generated a dataset where individuals with one comorbidity which reduces life expectancy were more likely to have other comorbidities which reduce life expectancy (and vice versa)."

Reviewer comment
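The authors' comparisons with YLLs from a 'UK report' for pneumonia was useful, and vital in the absence of LTC model and age model diagnostics. However, this 'UK report' only contains YLLs for England and Wales and not the whole of UK. Published YLLs for other flu pandemics or outbreaks could also be more relevant comparisons.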

Reviewer comment
It should be acknowledged that the analysis does not take into account COVID-19 deaths at ages under 50 years, which were rare but may have a non-negligible impact on the summary YLLs for the second approach.

Author response:
We have added the following text to the discussion: "To prevent mean inflation through rare deaths in younger people, we only modelled deaths in people over 50 years."

Reviewer comment
The authors refer to the same GitHub repository for all supplementary material and appendices, which contains a large number of files, mostly datasets, codelists, code and raw output, and no explanation of its contents. It would be helpful to report the interpretations of the outputs and to refer to specific files in the repository in each instance.

Author response:
Thank you for this comment. We have organised the GitHub repository including the addition of a README file, and have edited the references to this in the text to allow the relevant sections to be more easily located.

Reviewer comment
The estimation of YLLs by type of LTCs was alluded to in the aims but not reported. Some results on YLLs for particular combinations, perhaps the most prevalent combinations, would be more clinically informative than the YLLs by multimorbidity counts. Alternatively, a ranking of LTCs by their lethality or a combined measure of lethality and prevalence would be informative too.

Author response:
This statement refers to the modelling approach where we simulated likely combinations of LTCs among people who had died of COVID-19. Our life expectancy estimates are therefore based on specific combinations of LTCs, which are then aggregated across LTC counts. We do not report estimates for specific combinations of LTCs as our YLL calculations are, by their nature, population-based estimates and presenting specific LTC combinations could give the impression that these estimates should be applied to individuals.
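The aggregation step described in this response can be sketched as follows: each simulated death carries a specific combination of LTCs and a YLL derived from the life expectancy for that combination, and results are then averaged within each LTC count. The data layout, condition names and YLL values below are hypothetical and chosen only to show the mechanics.

```python
from collections import defaultdict


def mean_yll_by_ltc_count(simulated_deaths):
    """Average YLL within each multimorbidity count.

    `simulated_deaths` is an iterable of (ltc_set, yll) pairs, where
    `ltc_set` is the set of conditions for one simulated death
    (a hypothetical layout for this sketch).
    """
    totals = defaultdict(lambda: [0.0, 0])  # count -> [sum of YLL, number of deaths]
    for ltcs, yll in simulated_deaths:
        bucket = totals[len(ltcs)]
        bucket[0] += yll
        bucket[1] += 1
    return {count: s / n for count, (s, n) in sorted(totals.items())}


# Hypothetical simulated deaths: the combination determines the YLL,
# but only the per-count average is reported.
example = [
    (set(), 14.0),
    ({"diabetes"}, 10.0),
    ({"copd"}, 6.0),
    ({"diabetes", "copd"}, 4.0),
]
summary = mean_yll_by_ltc_count(example)
```

In this toy input the two single-condition deaths have different YLLs but are reported together under a count of 1, which illustrates why the tabulated figures are population-level summaries rather than estimates for any individual combination.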

Reviewer comment
The colour coding in Figure 3 was not explained (but is presumably the same as the other figures).

Author response:
Thank you, we have added this to the legend for figure 3.

Reviewer comment
The resolutions of the figures could be improved so that the axis labels are more readable.

Author response:
We shall re-upload the figures with the revised version.

Reviewer comment
The caption for Table 2 should be revised to clarify that YLLs are tabulated by LTC count and not type.

Author response:
Thank you. We have added the following footnote to the table to clarify this: "*Estimates are based on life-expectancy calculations for specific types and combinations of LTCs, which are then aggregated across LTC counts."

Reviewer comment
The authors acknowledged that YLL for COVID-19 is an 'imperfect' measure, but did not explain why. It would be helpful to add limitations of either YLL approach (eg the YLL models used here do not allow for competing risks) or consider alternative health metrics (eg YLD or DALY) in the Discussion section.

Author response:
We have added the following text to the discussion to expand on this: "In addition to reporting YLL, metrics such as excess deaths and quality-adjusted life years are important to fully contextualise the loss of life seen in the pandemic."

Reviewer comment
The 'orders of magnitude suggested by some commentators' mentioned in the Strength and limitations section should cite the relevant reference. Evidence of suggestions or claims by multiple commentators rather than a single commentator would also be appreciated.

Author response:
Thank you, we have added citations to this section.

Reviewer comment
The Discussion section focused on YLLs by number of LTCs and did not consider the wider context of multiple interlinked risk factors for COVID-19 deaths, several of which are available in the SAIL data (socioeconomic and health behavioural factors). The relative importance of age, sex and number of LTCs (and types of LTCs) should be given some consideration too.

Author response:
We have added the following text to the discussion section to highlight this issue: "Also, we were not able to adjust our estimates for other factors and exposures (such as socioeconomic status, occupation, smoking, health behaviours) which would have given a more accurate representation of life-expectancy in the absence of COVID-19. Socioeconomic status is a particularly pertinent issue, as it may influence not only outcomes from infection (e.g. through multimorbidity and other risk factors) but also the likelihood of exposure (e.g. higher proportions of occupations for which home-working was not feasible). Since socioeconomic status also predicts mortality there is a possibility of residual confounding due to the lack of data on socioeconomic status available for our models. We also only modelled deaths in people over 50, and deaths in younger people, while rare, are likely to result in high YLL."

Reviewer comment
The YLL terminology is correct, but it may be worth clarifying (especially to readers who are unfamiliar with YLL or epidemiology) that these YLLs estimate the direct impact of COVID-19 deaths rather than the indirect impact of COVID-19-related outcomes.

Author response:
We highlight the issue of direct versus indirect effects in the introduction. We have edited these to emphasise that YLL estimation concerns the direct impact of COVID-19, and have added text to the discussion: "These choices will require balancing the likely direct effects on mortality from COVID-19 against the likely indirect impacts on mortality for other conditions, due for example to inadequate access to necessary services for many people with long term conditions (LTCs), potential reluctance of the public to attend for acute events such as myocardial infarction, or impacts from forced unemployment, loss of income and social isolation. The indirect effects are likely to be complex, most will be downstream, and will require extensive research to be better understood. However, we need to capture the direct effects of COVID-19 as accurately as possible now, via currently available data and methodologies." "Finally, our estimates of YLL only attempt to quantify the direct effects of COVID-19. Indirect impacts on mortality (e.g. through pressure on healthcare services or unintended consequences of lockdown measures) should also be considered, and are not captured by our YLL calculation."

Reviewer comment
The comparison of raw death counts with YLLs per person as reporting metrics should be refined as the authors and health organisations appear to stand at cross purposes. One key use of raw death counts is to track the evolution of the pandemic in real time, whereas these YLLs do not track time trends or describe the aggregate mortality burden.

Author response:
We agree that there are limitations to YLL, and that other measures (such as raw death counts, excess deaths etc.) are important to be used alongside. We have added the following to the discussion: "In addition to reporting YLL, metrics such as excess deaths and quality-adjusted life years are important to fully contextualise the loss of life seen in the pandemic."

Reviewer comment
I have limited experience with Bayesian models and would recommend that a reviewer with specific expertise in Bayesian modelling in JAGS is invited at the next round to add any further comments on the Bayesian components of the models if these remain in the analyses.

Author response:
Thank you for your review.

Division of Population Health, Health Services Research & Primary Care, University of Manchester, Manchester, UK
Thank you for the opportunity to review a very interesting article exploring the impact of COVID-19 on years of life lost (YLL), adjusting for multimorbid patterns in the population. The topic is clearly highly relevant, and the authors do a good job of explaining how moving beyond the raw numbers of COVID deaths is increasingly important for policymakers, one way of doing so is through YLL. They use a variety of datasets from different countries/contexts to estimate a raw YLL and a multimorbidity-adjusted YLL. The headline figures the authors report are startling, 13 and 11 years for men and women respectively after adjusting for multimorbidity. I do, however, have some major concerns on these estimates, mostly to do with data sources, potential selection effects, and how the reported modelling figure relates to observed actual figures. I would like to see the authors justifying/addressing these prior to recommending indexing. Below I provide more details on the major concerns and then more specific comments on each section. I hope the authors find these useful for strengthening their paper and look forward to seeing the revised version.
Major comments: You draw on a variety of data sources. I'm not entirely convinced it makes sense to put all that you've chosen together, though, or that the datasets (especially now) are the best available. You draw on data primarily from Italy for the 'LTC' and 'Age' models. But Italy is quite a specific context in terms of COVID-19, one where the health system was overwhelmed and, like any other one context, where specific policies were implemented, both of which likely affect the death profile. Is the profile of deaths in this context actually going to be generalisable to any other? The 'Survival' model, on the other hand, is from Welsh data. Again, I'm not convinced the mortality profile will be comparable to Italy in 'normal times' let alone 'COVID times'. For example, life expectancy in the UK is 2-3 years less than in Italy and the UK burden of premature death by LTCs is higher than the EU average. The use of WHO life tables is again a potential concern, particularly with no sex-specific details. Life expectancy differs significantly across sex, as does the negative effect of COVID-19. Particularly for the Italian data, the very small numbers are also concerning. You say in the article that at the time this was best available, which makes sense. Surely now there is a better source, though? It would be nice to see all data sources coming from the same country to avoid additional confounding introduced by policy/other context differences. Also, YLL will likely vary significantly by place and over time (e.g. with improvements to policy/treatments and as immunity begins to develop in populations).
Making the data sources clearer and more unified would help justify the where/when of the YLL figure you actually present. At the moment it is not at all clear how to interpret the YLL to any context.

Linked to the above, but probably a more fundamental concern, I wonder if some of the datasets come from a biased population (i.e. selection concerns)? My concern is that both the 'LTC' and 'Age' models use data from dead patients only, but the 'Survival' model then uses all patients with healthcare contacts in Wales. The data sets focusing only on patients who have died can't be taking into account differences in likelihood of exposure to the virus (for example, more deprived persons with more comorbidities are more likely to be exposed through types of employment and subsequently to die; there are likely going to be different LTC patterns/relationships in the dead versus general population) or severity (for example, a majority of deaths in the UK and other countries are from care home patients; a person with asthma in the general population is going to have a very different life expectancy to someone in a care home with asthma). Can you justify the use of LTC patterns only in patients who have died? On this, I do wonder if there is something you can do with data on those who test positive to at least slightly address this point, perhaps in a two-part model, i.e. first conditioning on exposure to the virus given a set of observables (ideally containing at least deprivation, preferably also setting, e.g. care home), then on subsequent probability of death?

The YLL figure just doesn't seem to sit with observed reality. I realise this is a modelling study, but it would be nice to compare your findings to what we have actually observed. For example, what is the average age of death expected from your model compared to observed COVID age of death? Something to contextualise the very high observed average age of death of COVID patients (by some accounts very similar or higher than the average life expectancy for the general population) with the YLL figure you give is needed in the discussion section and/or throughout the results.

Introduction
I agree with the authors that YLL is an improvement over raw counts of deaths. But, there are also other methods that improve on raw counts, for example excess deaths. It would be good to contextualise/compare the YLL concept with excess deaths, and with concepts such as quality-adjusted life years (the latter might come up again in limitations section also).

As above, agree that LTCs and multimorbidity are clearly important considerations for COVID YLL. However, so are other confounders, such as deprivation, etc. Again, would just be nice for some discussion of these other important aspects that are not currently considered in your analysis, either briefly here and/or discussion section.

Methods
○ Already mentioned in my summary at the top, but the neglect of sex-stratification in the WHO tables seems a major one. Can you justify/look for alternatives?
○ Can you justify the data sources from multiple countries and what the YLL that comes from combining them means? And/or, can you update the datasets to reflect a more coherent, understandable context? The strongest data seems to be the SAIL data you have access to. So, maybe drawing on ONS/CQC data (and life tables too) and/or other publications from the UK context might be a way to do this now? And/or, the US CDC seems to have a lot of data reporting on co-morbidities now, with at least much larger numbers if you still want to try and estimate a 'global' YLL estimate.
○ "IPD" (p.4) - spell out
○ For the LTC models, it is not obvious to me how/why you incorporate the Scottish data here. Can you make more clear what this adds and why it is necessary (especially with n=33)?
○ For the age models, why are you using SAIL data on influenza deaths? This is a different disease, so not obvious this is relevant. As above, perhaps there is better data available now?
○ "We arbitrarily chose a standard deviation of 0.5" (p.5). How much influence does this have? Sensitivity analysis?
○ "SAIL is a…" (p.5) You have already introduced SAIL data. Might want to move this up to first introduction.

Results
○ Throughout, I wondered how the figures you present would compare to the observed. This information should be available now, it would be nice to see validation of the model in this/another way.
○ The definition of a COVID death varies somewhat across different countries, but tends to be fairly loose (e.g. suspected, or a positive test within X-days but not necessarily the recorded primary cause on the death certificate). Can you discuss implications of using this data, especially in relation to impact of other LTCs? Is it reducing the YLL of other conditions?
○ "As such, the attenuation of YLL following adjustment for LTCs may be an underestimate." As above, could also be an over-estimate without accounting for exposure to virus/severity of LTCs/different health system and time context.
○ "This model did not fully converge and had wide posteriors (indicating substantial uncertainty) for the correlation between LTCs. We nonetheless included the results of this model in our analysis". This sounds like a pretty big issue. Why is it not converging exactly? What happens when you simplify the model to allow it to converge?
○ "Finally, given the emergent nature of the coronavirus pandemic…". This paragraph fine for a pre-print, but I think highlights the need for an update now in light of emerging data/evidence.

Are sufficient details of methods and analysis provided to allow replication by others? Partly
If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required.
Are all the source data underlying the results available to ensure full reproducibility? Partly
Are the conclusions drawn adequately supported by the results? Partly

your comments and those of the other reviewers. Our responses to each of your comments, along with quoted changes to the manuscript text, are shown below.
Reviewer comments are shown in italics.

Reviewer comment
Thank you for the opportunity to review a very interesting article exploring the impact of COVID-19 on years of life lost (YLL), adjusting for multimorbid patterns in the population. The topic is clearly highly relevant, and the authors do a good job of explaining how moving beyond the raw numbers of COVID deaths is increasingly important for policymakers, one way of doing so is through YLL. They use a variety of datasets from different countries/contexts to estimate a raw YLL and a multimorbidity-adjusted YLL. The headline figures the authors report are startling, 13 and 11 years for men and women respectively after adjusting for multimorbidity. I do, however, have some major concerns on these estimates, mostly to do with data sources, potential selection effects, and how the reported modelling figure relates to observed actual figures. I would like to see the authors justifying/addressing these prior to recommending indexing. Below I provide more details on the major concerns and then more specific comments on each section. I hope the authors find these useful for strengthening their paper and look forward to seeing the revised version.

Author response
We thank the reviewer for their comments on this manuscript and are grateful for their recognition of the importance and relevance of the topic. We also appreciate that the reviewer highlights areas where the estimates could be improved with current data, particularly around data sources, selection effects, and how these estimates relate to the observed patterns of mortality in the pandemic. We would like to frame our response to these, however, by clarifying what this work is, and what it is not. We undertook this work at an early stage in the pandemic in Europe (published 23rd April). At this point, data on deaths in the UK was only just beginning to emerge and was not yet available for analysis. The outbreak in Italy was underway but data on the extent of the impact there remained sparse and poorly resolved (partly for confidentiality reasons). Our analysis was therefore an attempt to model, at a very early stage in the pandemic and with limited and incomplete data, the impact of COVID-19 on years of life lost (YLL). A major motivation for doing so was to identify whether the high levels of comorbidity among people dying with COVID-19 might mean, as some commentators suggested, that very few years of life were being lost, expressed in the phrase that people dying 'would have died soon anyway'. Our analytic approach was specifically designed and developed to estimate YLL in spite of the sparse data available at the time. It was highly novel in that it used aggregated data on comorbidity to derive estimates of combinations of comorbidity. We did not, and do not, suggest that ours should be the definitive estimate of YLL from COVID-19 more than 6 months later, when large quantities of individual-level data from single countries are now available and when the diagnosis and hospital treatment of the disease have undergone such rapid improvement.
Indeed, in our original conclusion we suggested that "Public health agencies and governments should report on YLL, ideally adjusting for the presence of underlying LTCs, to allow the public and policy-makers to better understand the burden of this disease".
An example of one such effort is a large collaborative effort to calculate YLL in the UK including data on COVID-19 deaths in Wales (SAIL). This project will not use the specific methods developed for this work, which were designed to estimate YLL adjusting for multimorbidity in the context of sparse and aggregate data (such as in the early stages of a pandemic), but will use simpler methods that are possible when individual-level data are available. Therefore, while the reviewer is correct that various other data-sources could now be utilised to yield estimates of YLL, and that these data would be more complete and more specific to the circumstances of the outbreak in the UK, we do not believe that a revised version of this paper is the place for such an analysis.
The question, therefore, is whether the work reported via Wellcome Open Research has, to use the term from the journal's guidance, sufficient "academic merit" to meet the standard for publication. We would argue that the current paper has academic merit for three reasons:

1. We addressed the important clinical/policy question at the time, as to whether the high prevalence of comorbidity/multimorbidity among people dying with COVID-19 meant that the YLL from COVID-19 was substantially lower than the life expectancy conditional on age and sex alone. We demonstrated that it did not, and several other publications have confirmed this conclusion. This was an important finding.

2. We demonstrated that emerging data on the characteristics of people who die during pandemics, even if available only in aggregate form, could be used to robustly estimate years of life lost. We also provided all of the data and code required to implement this method.

3. In doing that, we have developed a novel methodological workflow for inference from incomplete, sparse data. This is methodologically a greater challenge than inference from more recent complete, rich data. Our analytical approach is neither necessary, nor statistically efficient for very large, patient-level data sets, but will be useful for early-stage estimation in future pandemics.
Therefore, in revising this paper, we have incorporated the reviewer's comments in two ways:
○ Where the concerns are methodological and based on our approach to data available at the early stages of the pandemic, we have updated analyses and amended our results accordingly.
○ Where the reviewer highlights data which emerged, or events which transpired, after our initial analysis, we highlight this in the discussion.

We have also rewritten the paper to contextualise the analysis as an attempt to estimate YLL early in the pandemic, with analytical approaches specific to the challenges faced at that time.
Detailed responses to the comments are given below.

Author response
Thank you for these comments. Our choice of data source was based on the best available data at the time. Given that time has now passed and events developed, and continue to evolve, we have added the following to the introduction to clarify that. "The SARS-CoV-2 pandemic, the virus causing COVID-19, emerged in late 2019 and continues to have substantial impact on populations and healthcare systems throughout the world. This manuscript presents a revised version of an early analysis -using data sources available in April 2020 -at which time Italy, the first European nation to experience a major outbreak of COVID-19, was seeing rapidly escalating numbers of cases and deaths. In the UK, at that time, the initially small number of hospitalisations and deaths were beginning to rise. The analysis sought to estimate the burden of COVID-19 deaths in terms of potential years of life lost (YLL), at a time when individual-level data on COVID-19 deaths was largely lacking." and "We draw upon data sources available in April 2020, as this modelling study aimed to estimate the potential YLL at an early stage in the pandemic, when the impact was emerging. It should be noted, however, that events unfolding throughout the pandemic are likely to impact the YLL. Any estimate, particularly in the context of a pandemic, is dependent on what populations are exposed, and to what extent. Updated estimates, taking account of events which transpired in the UK and beyond, are the subject of ongoing collaborative efforts and we have not attempted to model these. Rather, this manuscript provides a detailed and reproducible quantification of YLL using techniques targeting the specific challenges of estimation at the early stages of a pandemic."

Reviewer comment
Linked to the above, but probably a more fundamental concern, I wonder if some of the datasets come from a biased population (i.e. selection concerns)? My concern is that both the 'LTC' and 'Age' models use

Author response
Thank you for these comments.
Taking first the issue of using data from patients who have died to estimate YLL. As the calculation of YLL aims to estimate the potential life expectancy of people dying of a specific cause (under the counterfactual that they had not died of this cause), it is necessary to use data from people who have died. This is a well-established approach for calculating YLL.
The standard approach to YLL typically uses just age and sex. We develop this further to include comorbidity, to address the specific point raised in the context of COVID-19, that because of common comorbidities most people who died with COVID-19 would have died soon in any case. We agree however that other factors do influence life expectancy, and agree that in future work, and depending on the exact policy question, variables such as socioeconomic deprivation and ethnicity could be included. We fully agree concerning care homes, however. The high proportion of deaths from COVID-19 occurring among care-home residents only emerged after our analysis, as we indicated in an addendum we rapidly published, partly in response to this issue of care homes, 3 weeks after the original publication, and now incorporate into the main manuscript. Nonetheless, we strongly agree that care home residents, are a special population, in whom severe disease, multimorbidity and frailty are likely to be commoner. We also agree that there are good biological reasons for suspecting that care home residents may be overrepresented among COVID-19 deaths compared to other causes of death (because it is an infectious disease and people in care homes are in a communal residence), and that the inclusion of more care home residents would likely have lowered the YLL in our analyses. Therefore, we would argue that the best approach for determining life expectancy in this group would be to estimate it using data which includes care home residency status ( https://github.com/dmcalli2/covid19_yll_final/blob/master/Scripts/Addendum.md) We have now incorporated this point, and the point about socio-economic status into the main manuscript as follows: "It should be noted that these estimates were made early in the pandemic and could not account for specific patterns and events which emerged within the UK. 
For example, these analyses were performed before the impact of COVID-19 in care homes in the UK became apparent. SAIL contains data on all participants registered with a GP (and so would include care-home residents), however our estimates of life expectancy do not distinguish between people who live in care-homes and those who do not. As such, our analyses would not reflect the YLL at a population level where care-homes are disproportionately impacted. Our estimates, given the data sources which were available at the time, are more likely to reflect the YLL of COVID-19 deaths among hospitalised patients." And "Socioeconomic status is a particularly pertinent issue, as it may influence not only outcomes from infection (e.g. through multimorbidity and other risk factors) but also the likelihood of exposure (e.g. higher proportions of occupations for which home-working was not feasible). Since socio-economic status also predicts mortality there is a possibility of residual confounding due to the lack of data on socioeconomic status available for our models." And "Additionally, while our estimates do account for multimorbidity, they do not take account of other factors which may impact the likelihood of contracting, or dying from, COVID-19. Socioeconomic status has been found to be an important risk factor for exposure and severity of COVID-19. So too is ethnicity." While the reviewer makes an interesting point about exploring patterns in people who test positive, this is beyond the scope of this manuscript (as described above), particularly as such data did not exist at the time at which these estimates were calculated.

figure you give is needed in the discussion section and/or throughout the results.

Author response
By definition, YLL cannot be observed, neither by hospital data, nor by controlled experiment. It can only be inferred as a counterfactual based on the characteristics of those who have died with COVID-19. Age and sex are the most important characteristics, and to these our study has added comorbidity. We agree that additional characteristics should now be added, but such data were not available when we developed our models and, as we have argued, it is beyond the scope of the current project to add these now. Moreover, a number of other studies have since attempted to estimate YLL from COVID-19, obtaining estimates that have been broadly similar to our own. We would also note that none (to our knowledge) have attempted to model the observed events in care homes (see our qualifications in response to the previous point). Finally, we would point out that the average life expectancy referred to by the reviewer is the average life expectancy at birth. Once an individual is known to have survived to an older age, their expected age at death is higher than the life expectancy that applied at their birth.
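The last point, that expected age at death rises once someone is known to have survived to an older age, can be illustrated with a short sketch. It assumes a Gompertz hazard h(t) = a·exp(b·t). The parameter values `a` and `b` below are hypothetical, chosen only to give a plausible-looking curve, and are not the fitted values from the manuscript's SAIL models.

```python
import math

def gompertz_remaining_life(age, a=2e-5, b=0.1, max_age=130.0, step=0.01):
    """Remaining life expectancy at `age` under hazard h(t) = a * exp(b * t).

    a and b are hypothetical illustration values, not estimates from the study.
    """
    def survival(t):
        # P(alive at t | alive at `age`) for a Gompertz hazard
        return math.exp(-(a / b) * (math.exp(b * t) - math.exp(b * age)))

    # Trapezoidal integration of the conditional survival curve
    total = 0.0
    t = age
    while t < max_age:
        nxt = min(t + step, max_age)
        total += 0.5 * (survival(t) + survival(nxt)) * (nxt - t)
        t = nxt
    return total

le_birth = gompertz_remaining_life(0)    # life expectancy at birth
le_80 = gompertz_remaining_life(80)      # remaining years for someone alive at 80
expected_age_at_death_given_80 = 80 + le_80

print(f"Life expectancy at birth: {le_birth:.1f}")
print(f"Expected age at death given survival to 80: {expected_age_at_death_given_80:.1f}")
```

Under any such hazard, the expected age at death conditional on reaching 80 exceeds life expectancy at birth, which is why comparing the mean age at COVID-19 death with life expectancy at birth does not imply that few years were lost.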

Reviewer comment
Introduction I agree with the authors that YLL is an improvement over raw counts of deaths. But, there are also other methods that improve on raw counts, for example excess deaths. It would be good to contextualise/compare the YLL concept with excess deaths, and with concepts such as quality-adjusted life years (the latter might come up again in limitations section also).

Author response
Thank you. We agree that these other metrics are important and that an assessment of YLL should be viewed alongside these other measures. We have added the following to the discussion: "In addition to reporting YLL, metrics such as excess deaths and quality-adjusted life years are important to fully contextualise the loss of life seen in the pandemic."

Reviewer comment
As above, agree that LTCs and multimorbidity are clearly important considerations for COVID YLL. However, so are other confounders, such as deprivation, etc. Again, would just be nice for some discussion of these other important aspects that are not currently considered in your analysis, either briefly here and/or discussion section.

Author response
We agree with the reviewer that these are important additional factors. We have added the following text (discussed more fully above): "Socioeconomic status is a particularly pertinent issue, as it may influence not only outcomes from infection (e.g. through multimorbidity and other risk factors) but also the likelihood of exposure (e.g. higher proportions of occupations for which home-working was not feasible). There is therefore a likelihood of residual confounding due to the lack of data on socioeconomic status available for our models." And "Additionally, while our estimates do account for multimorbidity, they do not take account of other factors which may impact the likelihood of contracting, or dying from, COVID-19. Socioeconomic status and ethnicity have been found to be important risk factors for severity of COVID-19."

Reviewer comment
Methods Already mentioned in my summary at the top, but the neglect of sex-stratification in the WHO tables seems a major one. Can you justify/look for alternatives?

Author response
We have added the following additional analyses to the text: Methods: "We chose the WHO life tables to allow comparison of the burden of COVID-19 deaths with other conditions in an international context. However, these, unlike many national-level life tables, do not stratify by sex. Furthermore, our subsequent modelling draws upon data from specific setting based on availability early in the pandemic (namely data on COVID-19 deaths from Italy, and life-expectancy estimates based on data from Wales). Therefore, following comments from academic colleagues via social media we performed sensitivity analyses using life tables from Italy (2017)

Author response
As discussed in detail above, our analysis and approach are based on data sources available early in the pandemic. We have retained this approach but added the following text: "To inform our estimates of number and type of LTCs, we first sought to identify the most detailed data available for underlying long-term conditions among people dying of COVID-19. We performed a rapid review…"

Reviewer comment
"IPD" (p.4) - spell out

Author response
We have changed to "individual patient data".

Reviewer comment
For the LTC models, it is not obvious to me how/why you incorporate the Scottish data here. Can you make more clear what this adds and why it is necessary (especially with n=33)?

Author response
This was an issue of statistical identifiability. Estimating joint probabilities (i.e. dependence between comorbidities) from marginal probabilities (aggregated disease counts) alone is an ill-posed problem. Individual-level patient data is much more informative and even sparse noisy individual-level data from Scotland about the correlation between comorbidities was sufficient to break symmetries between otherwise equivalent and hence unidentifiable comorbidity configurations and thereby improve model convergence.
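A minimal sketch of why marginal data alone cannot identify the joint distribution: two hypothetical conditions, A and B, are given the same marginal prevalences under two different joint distributions. The prevalences and the overlap value are invented for illustration and are not taken from the ISS or Scottish data.

```python
def joint_from(p_a, p_b, p_both):
    """Build a 2x2 joint distribution from two marginals and the overlap."""
    return {
        (1, 1): p_both,
        (1, 0): p_a - p_both,
        (0, 1): p_b - p_both,
        (0, 0): 1 - p_a - p_b + p_both,
    }

def marginals(joint):
    """Recover P(A=1) and P(B=1) from a joint distribution."""
    p_a = sum(p for (a, _), p in joint.items() if a == 1)
    p_b = sum(p for (_, b), p in joint.items() if b == 1)
    return p_a, p_b

# Hypothetical marginal prevalences of conditions A and B
p_a, p_b = 0.3, 0.2

independent = joint_from(p_a, p_b, p_a * p_b)  # no association
correlated = joint_from(p_a, p_b, 0.15)        # strong positive association

print(marginals(independent), independent[(1, 1)])
print(marginals(correlated), correlated[(1, 1)])
```

Both tables reproduce the same aggregate counts, yet they disagree on how often the conditions co-occur. Some individual-level information, even from a small sample, is needed to choose between them.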

Reviewer comment
For the age models, why are you using SAIL data on influenza deaths? This is a different disease, so not obvious this is relevant. As above, perhaps there is better data available now?

Author response
This approach was taken as a compromise given the lack of individual patient data on COVID deaths that was available at the time. We had only aggregate data on LTCs among people who had died, and therefore did not have data on the correlation between age and number of comorbidities. It is possible, but not self-evident, that these would be correlated (as conditioning on COVID-19 death may have influenced the relationship between multimorbidity and age). Therefore, we performed two analyses: assuming no correlation, and assuming correlation between age and multimorbidity. The use of influenza death data was simply to obtain a plausible estimate for how strongly correlated multimorbidity would be with age in the context of death. We accept that influenza and COVID-19 are different diseases, however in the absence of COVID-19 data we had to inform these correlations in some way and so selected a respiratory virus known to cause increased mortality particularly in older people and those with comorbidities. We have added the following to the methods to clarify. "While influenza is a different condition, these data were used for the sole purpose of estimating correlations between age and multimorbidity counts (conditioning on death), and did not inform the model in any other way"

Reviewer comment
"We arbitrarily chose a standard deviation of 0.5" (p.5). How much influence does this have? Sensitivity analysis?

Author response
Increasing the standard deviation to 1 led to an estimated YLL of 13.7 (95% CI 12.7-14.9) for men and 11 (95% CI 10.1-11.9) for women, so it had very little impact on the findings. We have modified the R code in the publicly available repository to allow others to modify the SD as they see fit.

Reviewer comment
"SAIL is a…" (p.5) You have already introduced SAIL data. Might want to move this up to first introduction.

Author response
Thank you. We have moved this information to the first mention of SAIL.

Results
Throughout, I wondered how the figures you present would compare to the observed. This information should be available now, it would be nice to see validation of the model in this/another way.

Author response
We agree this is an interesting and important question. However, as we describe in detail above, this is an entirely distinct methodology to the current manuscript. In the introduction of the manuscript we set out the definition of YLL:-"Within epidemiology, there is a standard measure used to account for this difficulty, the years of potential life lost (YLL).(12) YLL can be expressed per-individual who died as the average number of years an individual would have been expected to live had they not died of a given cause. The conventional approach to YLL uses data on the age at which deaths occurred combined with typical life expectancy at a given age, to estimate a weighted average of the number of years lost. YLL is used to allow fair comparisons of the health impact of different policies -such as different measures to address the pandemic. However, given the controversial role of multimorbidity in COVID-19 deaths it is also important to calculate YLL additionally considering the effects of the presence of a single LTC or multimorbidity"
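The conventional calculation described in the quoted definition can be sketched as a weighted average. The death counts and remaining life expectancies below are hypothetical placeholders, not the Italian death data or WHO life-table values used in the manuscript.

```python
def mean_yll(deaths_by_age, remaining_life_expectancy):
    """Average years of life lost per death.

    Each death contributes the remaining life expectancy at the age of death;
    the result is the death-weighted mean of those values.
    """
    total_deaths = sum(deaths_by_age.values())
    total_years = sum(
        n * remaining_life_expectancy[age] for age, n in deaths_by_age.items()
    )
    return total_years / total_deaths

# Hypothetical inputs for illustration only
deaths = {60: 10, 70: 30, 80: 60}            # number of deaths at each age
remaining = {60: 24.0, 70: 16.0, 80: 9.0}    # life-table years remaining at each age

yll = mean_yll(deaths, remaining)
print(f"Mean YLL per death: {yll:.1f}")
```

Adjusting for LTCs, as the manuscript does, amounts to replacing the age-only lookup with a remaining life expectancy that also depends on sex and the combination of conditions.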

Reviewer comment
The generalisability of the estimate is discussed throughout the manuscript. To summarise, we think that this YLL can reasonably be generalised to high-income countries, but most likely to hospitalised patients rather than those who died in care homes. We also argue that public health agencies should produce YLL estimates for their own specific settings using the individual-level patient data to which they now have access, as is currently underway in at least one country.

Reviewer comment
You compare the estimated YLL to established infections/LTCs. Can you say something about likely changes as vaccines/treatments emerge (in light of the billions and billions being spent on them relative to other conditions), and as COVID-19 becomes endemic in populations?

Author response
Thank you for this comment. We have added the following text: "It should be noted, however, that YLL for an emergent infection such as COVID-19, particularly in a pandemic, will be sensitive to the specific circumstances of the virus spreading, mitigation strategies, and potential future treatment or vaccines. These estimates, therefore, relate to the specific conditions at the time of modelling and will need to be updated, particularly as vaccination or other strategies alter susceptibility or severity of infection. It is important to note, however, that it would be a misuse of any such modelling if it were used to criticise decision-making undertaken at the time."

Reviewer comment
The definition of a COVID death varies somewhat across different countries, but tends to be fairly loose (e.g. suspected, or a positive test within X-days but not necessarily the recorded primary cause on the death certificate). Can you discuss implications of using this data, especially in relation to impact of other LTCs? Is it reducing the YLL of other conditions?

Author response
We agree that in future burden-of-disease estimation studies, where COVID-19 deaths are compared to other causes, difficult decisions will need to be made about the attribution of deaths to different causes such as dementia and COVID-19. However, we think that this is beyond the scope of the current manuscript which was focussed on estimating YLL from COVID-19 while accounting for the age, sex and multimorbidity of those dying with COVID-19.

Reviewer comment
"As such, the attenuation of YLL following adjustment for LTCs may be an underestimate." As above, could also be an over-estimate without accounting for exposure to virus/severity of LTCs/different health system and time context.

Author response
We agree that the attenuation of YLL (i.e. the decrement in YLL when life expectancy is calculated using age, sex and comorbidity rather than just age and sex) could (compared to a model containing all predictors of mortality) be over or underestimated due to unmeasured confounding. However, since ten years is already rather a long period, we think that the former is the more pertinent for policy-making.

Reviewer comment
"This model did not fully converge and had wide posteriors (indicating substantial uncertainty) for the correlation between LTCs. We nonetheless included the results of this model in our analysis". This sounds like a pretty big issue. Why is it not converging exactly? What happens when you simplify the model to allow it to converge?

Author response
We have revised our model fitting which now converges fully. We thus arrived at very similar multimorbidity estimates to the previous model, and as such the overall findings of the paper are unchanged.
Model-fitting details are discussed on the online repository, along with all model diagnostics. We summarise briefly here. The multimodality of the posterior and extreme flexibility of our original model led to poor convergence. By constraining the model a small amount through 1) soliciting more informative priors that led to fewer samples being rejected during sampling, 2) changing how the correlation matrix between conditions was sampled during MCMC fitting to sample whole matrices rather than individual elements which is much more efficient, and 3) treating the absence of a diagnosis in the individual patient data as absence of the disease at a clinically significant level (as all the cases were reviewed by medical teams at the ISS to this standard) we were able to dramatically improve MCMC mixing and satisfy established convergence diagnostics. The previous model maximised flexibility (in terms of joint probabilities of LTCs) and propagated this uncertainty throughout the final estimates. The updated model sacrificed some of this flexibility while still propagating uncertainty which facilitated model convergence. Both approaches result in similar estimates, and are based on all available data.
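The response does not say which convergence diagnostics were applied; those details are in the repository. As a hedged illustration of what such a check looks like, the sketch below computes the classic Gelman-Rubin potential scale reduction factor (R-hat) for one parameter across several chains. The chains are simulated, and the function name is an assumption for this example rather than anything from the authors' code.

```python
import random
from statistics import mean, variance

def gelman_rubin_rhat(chains):
    """Classic (non-split) Gelman-Rubin R-hat for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # mean within-chain variance
    between = n * variance(chain_means)            # between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return (pooled / within) ** 0.5

rng = random.Random(0)

# Four simulated chains exploring the same distribution (well mixed)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]

# Four simulated chains stuck in two separate modes (poorly mixed)
stuck = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (0.0, 0.0, 3.0, 3.0)]

rhat_mixed = gelman_rubin_rhat(mixed)
rhat_stuck = gelman_rubin_rhat(stuck)
print(f"R-hat, well-mixed chains: {rhat_mixed:.3f}")
print(f"R-hat, chains stuck in separate modes: {rhat_stuck:.3f}")
```

Values close to 1 suggest the chains agree. A multimodal posterior of the kind described above tends to push R-hat well above 1 because different chains settle in different modes.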

Reviewer comment
"Finally, given the emergent nature of the coronavirus pandemic…". This paragraph fine for a pre-print, but I think highlights the need for an update now in light of emerging data/evidence.

Author response
For health technology assessments of new interventions (e.g. of a novel drug), we would agree that an interim analysis based on aggregate data would be of limited use and that the decision should wait for a full analysis with all available data. However, as we have outlined more fully in our opening statement, we think that this work does have scientific and policy value as it stands, not least because it specifically addressed the question of whether the high prevalence of comorbidity/multimorbidity per se among those dying with COVID-19 should mean that the consequences of deaths from COVID-19 to individuals and society are less than the total rates would imply. Using the available data we showed that this view was not supported by the data. This conclusion has been built upon via subsequent work by other researchers:

Comments on this article
Alex Robinson, Self, USA
Figure 3 is misleading. Looks like an artifact of painting lots of lines in little space. The lines should be much smaller than 1 pixel in thickness, but are not.
Too, it might be interesting to see medians in addition to averages for some of these conclusion numbers.
Also, with respect to YLL's, it might be worthwhile to assign some values to those years. As in, do you consider the year when you're 80 years old to be equivalent in any way to the year you're 30? Could be better. Could be worse. But I'll bet on worse. And, way, way shorter if they are the same.
behind the analysis is more fundamentally flawed than previous commentators have noted.
Let me explain: The study starts with the correct assumption that, as people get older, their life expectancy increases since they can no longer die younger than what they already are at each point in time (so, their life expectancy is based on the average lifespan of other people who have lived at least as long as the person is now). However, comparing the age of a person who has just died with the distribution of lifespans of people who got to live at least the same time does not answer any relevant question here. Following this study's logic, you could get lifespan data from people who matched any arbitrary variable -say, people whose first name started with the letter "D" -compare their lifespan (which will not deviate from that of the general population) with their life expectancy on the day they died and find out that these people still had more than 10 years to live. Now does having a first name that starts with the letter "D" cost you 10 years of your life? Of course not.
I think a more meaningful comparison would be the lifespan of people with and without a certain feature, in this case the presence of COVID-19. It is stated," The ISS report also presented the proportion of patients who died with each of the following multimorbidity counts: 0 (2.1%), 1 (21.3%), 2 (25.9%) and ≥3 (50.7%). " Then it is stated," the proportion with each LTC was as follows:-ischaemic heart disease 27.8%, atrial fibrillation 23.7%, heart failure 17.1%, stroke 11.3%, hypertension 73%, diabetes 31.3%, dementia 14.5%, chronic obstructive pulmonary disease 16.7%, active cancer in the past 5 years 17.3%, chronic liver disease 4.1%, chronic renal failure 22.2%. " Then it is stated: "As such, mortality from COVID-19 represents a substantial burden to individuals and comparable to high burden LTCs such as ischaemic heart disease and chronic obstructive pulmonary disease. " Then it is stated, "Using UK reports for approximate comparisons, the YLL for other conditions ranged, per capita from 8.2 for chronic obstructive pulmonary disease, 11.6 for coronary heart disease, 13.1 for pneumonia, and 21.6 for asthma ." My comment: What we have is most with COVID have comorbidities in which the comorbidities have YLL comparable to COVID. You cannot divide one from the other. 75% have both COVID AND 2 or more comorbidities. There are simply not enough deaths from 0 comorbidity patients to say much about COVID YLL as a disease in and of itself.
Again, we return to the point that 75% have two or more comorbidities and only 2% have none.
What is the tie vote? A common statistical practice is to use overall survival, and the assault to the body which comes first is the culprit. Given only one thing to be labelled as the reason for death, a patient riddled with metastatic breast cancer is first assumed to die of breast cancer and not the PE, or HAI, or COVID or other alphabet.
Also, we have to be very careful in how we editorialise about burden in the conclusions, since we also know that countries with poverty have a lower life expectancy than those more fortunate. The economic devastation and job loss are a great threat to YLL for our young people, most of whom do not have secure academic positions.
expectancy, implemented via an Excel tool. This would allow the commentators or others to explore the impact of different mortality rate ratios based on different assumptions as to the degree of residual confounding.
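To illustrate the kind of sensitivity analysis described here, the following is a minimal sketch (not the Excel tool itself) of how remaining life expectancy under a Gompertz hazard changes when the hazard is multiplied by an assumed mortality rate ratio. The function name, the parameters `a` and `b`, and all numeric values are hypothetical and chosen only for illustration; they are not the fitted values from the study.

```python
import math


def remaining_life_expectancy(age, a, b, rate_ratio=1.0, horizon=120.0, step=0.01):
    """Expected remaining years at `age` under a Gompertz hazard
    h(t) = rate_ratio * a * exp(b * t), where t is age in years.

    The survival curve beyond `age` is integrated with the trapezoidal rule.
    """
    # Hazard at the current age, scaled by the assumed rate ratio.
    h0 = rate_ratio * a * math.exp(b * age)

    def survival(t):
        # Probability of surviving a further t years, given alive at `age`.
        return math.exp(-(h0 / b) * (math.exp(b * t) - 1.0))

    n = int(round(horizon / step))
    total = 0.5 * (survival(0.0) + survival(horizon))
    for i in range(1, n):
        total += survival(i * step)
    return total * step


if __name__ == "__main__":
    # Hypothetical Gompertz parameters, for illustration only.
    a, b = 3e-5, 0.1
    for rr in (1.0, 1.5, 2.0, 3.0):
        le = remaining_life_expectancy(80, a, b, rate_ratio=rr)
        print(f"rate ratio {rr}: remaining life expectancy at 80 = {le:.1f} years")
```

Varying `rate_ratio` plays the role of the assumed degree of residual confounding: a larger ratio represents the assumption that those dying were frailer than their age, sex and LTC profile suggests, and so gives a smaller YLL.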
We have taken the former approach, as we are not aware of any empirical evidence that would provide us with an estimate of the magnitude of the residual confounding due to unmeasured characteristics (e.g. frailty, functional limitation).
This is because we would argue that empirical evidence is needed to support the assertion that those dying from COVID-19 are atypical of their fellows who are similar in terms of age, sex and comorbidity. Not least because, although we cannot know how strong they are, there may be selection pressures in the opposite direction. For example, someone with relatively mild COPD might go food shopping themselves, whereas someone with more severe disease might have someone else shop for them, thereby reducing their infection risk. Since the risk of death is the product of the risk of infection and the case fatality, this mechanism would tend to select for less severe COPD among those dying from COVID-19.
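A minimal sketch of this selection mechanism follows. All numbers (population shares, infection risks and case fatality) are entirely hypothetical and are not estimates from any data; the function name is likewise an illustrative assumption.

```python
def share_of_deaths(groups):
    """Return each group's share of all deaths.

    `groups` maps a group name to a tuple of
    (population_share, infection_risk, case_fatality).
    Risk of death is taken as infection_risk * case_fatality.
    """
    deaths = {
        name: share * p_infection * case_fatality
        for name, (share, p_infection, case_fatality) in groups.items()
    }
    total = sum(deaths.values())
    return {name: d / total for name, d in deaths.items()}


# Scenario 1 (hypothetical): both groups have the same infection risk.
equal_exposure = {
    "mild COPD": (0.7, 0.10, 0.05),
    "severe COPD": (0.3, 0.10, 0.15),
}

# Scenario 2 (hypothetical): people with severe COPD are less exposed,
# for example because someone else shops for them.
reduced_exposure = {
    "mild COPD": (0.7, 0.10, 0.05),
    "severe COPD": (0.3, 0.03, 0.15),
}

if __name__ == "__main__":
    print(share_of_deaths(equal_exposure))
    print(share_of_deaths(reduced_exposure))
```

Under these assumed inputs, lowering the infection risk in the severe group raises the share of deaths occurring in the mild group, even though case fatality is unchanged.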
We argue that additional data, ideally on functional limitations (e.g. able to walk to shops, able to walk up stairs) and frailty measures (e.g. grip strength, lung capacity, six-minute walking distance) should be obtained to allow us to estimate the YLL more accurately using more empirical evidence.
Nonetheless, we think that this reasoning should not be applied to care home residents. Our results came out before the large numbers who were dying in care homes became apparent, and this was not the focus of our work. Instead, we agree that we should estimate mortality (and YLL) in care homes separately. Importantly, care home residents are a well-defined population, so the task of estimating life expectancy in this group should be achievable in most settings.
with COVID-19 from a care home and COVID-19 deaths within a care home (although further complicated by ONS capturing both death directly from COVID-19 where COVID-19 or suspected COVID-19 was mentioned anywhere on the death certificate). Your data set of 701 deaths in Italy is quite small. With the rapid increase in UK deaths and the model established, updating the model with a larger data set has, I believe, some urgency, although ONS together with Palantir should already have this analysis.