Studying accelerated cardiovascular ageing in Russian adults through a novel deep-learning ECG biomarker [version 1; peer review: awaiting peer review]

Background: A non-invasive, easy-to-access marker of accelerated cardiac ageing would provide novel insights into the mechanisms and aetiology of cardiovascular disease (CVD), as well as contribute to risk stratification of those who have not had a heart or circulatory event. Our hypothesis is that differences between the ECG-predicted and chronologic age of participants (δage) would reflect accelerated or decelerated cardiovascular ageing.
Methods: A convolutional neural network model trained on over 700,000 ECGs from the Mayo Clinic in the U.S.A. was used to predict the age of 4,542 participants in the Know Your Heart study conducted in two cities in Russia (2015-2018). Thereafter, δage was used in linear …
Wellcome Open Research 2021, 6:12 Last updated: 25 JAN 2021


Introduction
The concept of biological age as distinct from chronologic age has attracted substantial interest. From a public health and clinical perspective, a valid and reliable measure of biological age would contribute to risk stratification and to prioritising individuals or groups for treatments and interventions that reduce exposure to risk factors.
Various molecular biomarkers of biological age have been proposed, including telomere length 1 and DNA methylation patterns 2,3 . These markers require invasive procedures, including blood sampling and processing of DNA samples. Moreover, determining the biological age of a particular organ requires tissue-specific samples, as opposed to techniques that derive a general biological age from, for example, leucocyte telomere length 4 . These factors make such markers unsuitable for routine use in clinical practice and make their collection and analysis in large population-based studies prohibitively expensive. A new biomarker that is non-invasive and easily deployable at large scale in all settings, including resource-constrained ones, would therefore have many advantages.
The concept of heart age has previously been used to assess the physiologic ageing of the cardiovascular system 5 . A range of heart age prediction tools have been developed; these typically require a combination of biochemical inputs (e.g. serum cholesterol levels (LDL, HDL), diabetes diagnosis) and other inputs (e.g. chronologic age, body mass index (BMI), diastolic and systolic blood pressure (DBP and SBP) and mean arterial pressure (MAP)) 6 . Others incorporate imaging parameters, such as coronary artery calcification and carotid intima-media thickness, into their risk prediction models 5,7,8 .
The electrocardiogram (ECG) is a ubiquitous and key clinical tool for cardiovascular disease (CVD) risk stratification, diagnosis and prognosis 9,10 . We have recently developed a deep-learning algorithm, applied to routinely collected digitised 12-lead ECGs from a large clinical population in the U.S.A., that achieved high accuracy in the prediction of left ventricular systolic dysfunction (LVSD), both retrospectively 11 and prospectively 12 , and in the detection of atrial fibrillation from ECGs recorded in normal sinus rhythm 13 .
We have also developed a separate deep-learning model, based on the same population, to accurately predict biological age exclusively from digitised 12-lead ECG data 14 . Subtracting chronologic age from ECG-predicted age (ECG-age) yields a difference that we call δage (delta-age). This metric is positive for individuals predicted to be biologically older than their chronologic age, and negative for those predicted to be younger. In the US clinical population used to train the prediction algorithm, the predicted age had a mean absolute error of 7 years (SD: 5.6 years) compared with chronologic age 14 . A δage of 7 years or greater predicted a higher risk of CVD events, as well as a higher prevalence of low ejection fraction, hypertension and coronary disease 14 .
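The δage definition, the mean absolute error and the 7-year threshold described above can be written as a short computation. The sketch below is a minimal illustration assuming ages are given in years as floats; the function names and the toy values are hypothetical and are not taken from the study data.

```python
from statistics import mean


def delta_age(ecg_age: float, chronologic_age: float) -> float:
    """δage = ECG-predicted age minus chronologic age (years)."""
    return ecg_age - chronologic_age


def mean_absolute_error(ecg_ages, chronologic_ages) -> float:
    """Mean absolute difference between ECG-predicted and chronologic age."""
    return mean(abs(delta_age(p, c)) for p, c in zip(ecg_ages, chronologic_ages))


def older_by_ecg(delta: float, threshold: float = 7.0) -> bool:
    """Flag individuals whose δage reaches the 7-year threshold cited in the text."""
    return delta >= threshold


# Hypothetical toy values, not study data
predicted = [62.0, 48.5, 55.0]
chronologic = [54.0, 50.5, 55.0]
deltas = [delta_age(p, c) for p, c in zip(predicted, chronologic)]
print(deltas)                              # [8.0, -2.0, 0.0]
print([older_by_ecg(d) for d in deltas])   # [True, False, False]
print(mean_absolute_error(predicted, chronologic))
```

A positive value marks someone who looks older by ECG; the threshold is kept as a parameter because clinically relevant cut-offs for δage are still to be established.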
We suggest that δage may be regarded as a biologically proximal marker of the relative rate of cardiovascular ageing that reflects cumulative damage to date. We propose that it may have potential for CVD risk stratification and case management, as well as being a powerful research tool for elucidating novel mechanisms of accelerated cardiac ageing and identifying potential drivers of CVD differences across populations and between individuals.
In this study, we investigate how the novel ECG-derived biomarker δage relates to CVD risk profiles. We do this by studying the association of δage with established CVD risk factors and with markers of age-related cardiovascular abnormalities in a large cross-sectional study of adults (35-69 years) from two Russian cities (Arkhangelsk and Novosibirsk; 2015-18).

Data sources and population
We used data obtained in a cross-sectional study in the Russian cities of Arkhangelsk and Novosibirsk. The study population included 4,542 men and women aged 35-69 years recruited from the general population between 2015 and 2018, as described in detail elsewhere 15 . All participants provided written informed consent, as detailed in a previous publication describing the study protocol 15 . Information on smoking status, alcohol use and education level was obtained through computer-assisted personal interview (CAPI) questionnaires at the baseline interview 15 . Alcohol use was assessed using the Alcohol Use Disorders Identification Test (AUDIT) instrument 16 . Education was coded into three categories (incomplete secondary, secondary and higher education), as used in an earlier Russian study 17 .
Participants were invited to a health check, at which a questionnaire and physical examination were completed in one of three study clinics 15 . Of those contacted and invited to take part, 47% attended the health check. A resting digital 12-lead ECG of 10 to 25 seconds was recorded using the Cardiax device (IMED Ltd, Hungary), and the raw ECG signal was stored digitally 15 . Blood pressure and pulse rate were measured using the OMRON 705 IT automatic blood pressure monitor (OMRON Healthcare). The mean of the second and third readings of systolic (SBP) and diastolic blood pressure (DBP) was used in the analysis. Mean arterial pressure (MAP) was calculated as:

$$\mathrm{MAP} = \mathrm{DBP} + \tfrac{1}{3}\left(\mathrm{SBP} - \mathrm{DBP}\right)$$

Body mass index (BMI) was calculated using the following formula:

$$\mathrm{BMI} = \frac{w}{h^{2}}$$

where height (h; metres) was averaged across two measurements obtained following previously defined methods 15 , and weight (w; kg) was obtained using the TANITA BC 418 body composition analyser (TANITA, Europe GmbH), or a weighing scale when the TANITA could not be used 15 . Pulse wave velocity was obtained using the Vicorder device (Skidmore Medical Ltd, UK). Up to 8 pulse wave measurements were obtained, depending on measurement quality 15 , and the pulse wave velocity used in this study was the average of all available measurements.
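The derived measures described above (mean of the second and third blood pressure readings, MAP, BMI from averaged height, and the average of available pulse wave velocity readings) can be sketched as follows. This is a minimal illustration assuming three blood pressure readings per participant in mmHg and the standard MAP approximation DBP + (SBP − DBP)/3; the function names and example values are hypothetical.

```python
from statistics import mean


def mean_second_third(readings):
    """Average of the 2nd and 3rd readings; the first reading is discarded."""
    if len(readings) < 3:
        raise ValueError("expected at least three readings")
    return (readings[1] + readings[2]) / 2


def mean_arterial_pressure(sbp, dbp):
    """Standard approximation (assumed): MAP = DBP + (SBP - DBP) / 3, in mmHg."""
    return dbp + (sbp - dbp) / 3


def body_mass_index(weight_kg, heights_m):
    """BMI = w / h^2, with height averaged across repeated measurements."""
    h = mean(heights_m)
    return weight_kg / h ** 2


def mean_pwv(measurements):
    """Average of all available pulse wave velocity readings (None marks a missing one)."""
    valid = [m for m in measurements if m is not None]
    return mean(valid) if valid else None


# Hypothetical participant
sbp = mean_second_third([142, 136, 134])   # (136 + 134) / 2 = 135.0
dbp = mean_second_third([88, 84, 86])      # (84 + 86) / 2 = 85.0
print(mean_arterial_pressure(sbp, dbp))
print(round(body_mass_index(80.0, [1.75, 1.77]), 1))
print(mean_pwv([8.1, None, 8.5, 8.3]))
```

Discarding the first blood pressure reading follows the text; how missing pulse wave readings are represented in the study data is an assumption here.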
During the health check, blood was obtained from participants, who had been asked not to eat or drink alcohol during the 4 hours immediately before their booked appointment. Serum samples were prepared by centrifugation, aliquoted into 1.8-mL barcoded cryovials and frozen at −80°C within 2 hours. These were then analysed in a single batch at the end of the fieldwork at an independent laboratory in Moscow. The lipid profile of each sample was measured and included high-density and low-density lipoproteins.
High-sensitivity cardiac troponin T (hs-cTnT) and N-terminal pro-B-type natriuretic peptide (NT-proBNP) were measured using a high-sensitivity electrochemiluminescence immunoassay (Roche Diagnostics GmbH, Hitachi, Japan) on a Cobas e411 analyser. Of the total 3,864 participants, 69 (1.79%) had hs-cTnT values below the lower limit of detection (3 ng/L), and these values were recoded to 2.9 ng/L 18 . Similarly, 22 participants (0.57%) had NT-proBNP values below the lower limit of detection (5 ng/L), and their values were recoded to 4.9 ng/L 18 .
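The recoding of values below the lower limit of detection (LOD) can be sketched as below. This assumes below-LOD results are stored as numbers smaller than the limit; the marker keys `hs_ctnt` and `nt_probnp` and the function name are hypothetical, while the limits and substitute values are those stated in the text.

```python
# Lower limit of detection and substitute value (ng/L), as reported in the text.
# The dictionary keys are hypothetical names.
LIMITS = {
    "hs_ctnt": (3.0, 2.9),
    "nt_probnp": (5.0, 4.9),
}


def recode_below_lod(values, marker):
    """Replace values below the LOD with the fixed substitute; return values and count recoded."""
    lod, substitute = LIMITS[marker]
    recoded, n_below = [], 0
    for v in values:
        if v < lod:
            recoded.append(substitute)
            n_below += 1
        else:
            recoded.append(v)
    return recoded, n_below


values, n = recode_below_lod([1.0, 3.0, 12.5], "hs_ctnt")
print(values, n)   # [2.9, 3.0, 12.5] 1

# Consistency check of the reported percentages (3,864 participants)
print(round(100 * 69 / 3864, 2), round(100 * 22 / 3864, 2))   # 1.79 0.57
```

Substituting a value just below the limit keeps these observations positive, so the later natural-log transformation remains defined.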
The presence of plaques was assessed by high-resolution ultrasound using the Vivid q machine (GE HealthCare) with a 6–13 MHz linear transducer in both carotid arteries (common carotid artery, bifurcation and internal carotid artery). A still image of each identified plaque and cine loops in transverse and sagittal views were recorded digitally. Off-line reading of the recorded images was performed with EchoPAC software (GE Vingmed Ultrasound, Horten, Norway) by an experienced reader (AR). Based on the Mannheim consensus 19 , carotid plaque was defined as a focal structure encroaching into the arterial lumen by at least 0.5 mm or by 50% of the surrounding intima-media thickness (IMT) value, or as an IMT >1.5 mm measured from the media-adventitia interface to the intima-lumen interface. Detailed methods have been described previously 19,20 . The current analysis uses the presence or absence of any plaque as a dichotomous variable.
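The plaque definition above, and its reduction to a participant-level yes/no variable, can be expressed as a simple predicate. This is a minimal sketch assuming each carotid segment is described by a hypothetical tuple of (lumen encroachment, surrounding IMT, IMT), all in millimetres; it is not the study's reading software.

```python
def is_plaque(encroachment_mm, surrounding_imt_mm, imt_mm):
    """Plaque per the definition in the text: encroachment into the lumen of at least
    0.5 mm or at least 50% of the surrounding IMT, or an IMT above 1.5 mm."""
    return (
        encroachment_mm >= 0.5
        or encroachment_mm >= 0.5 * surrounding_imt_mm
        or imt_mm > 1.5
    )


def any_plaque(segments):
    """Dichotomous participant-level variable: 1 if any scanned segment meets the definition."""
    return int(any(is_plaque(*segment) for segment in segments))


# Hypothetical segments (e.g. common carotid, bifurcation, internal carotid on each side)
segments = [(0.1, 0.8, 0.9), (0.45, 0.8, 1.0), (0.0, 0.7, 1.1)]
print(any_plaque(segments))   # 1, because 0.45 mm >= 50% of 0.8 mm
```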

ECG analysis of patients with multiple ECGs
During data collection, 402 randomly selected participants (202 in Arkhangelsk and 200 in Novosibirsk) were invited to repeat the complete assessment, including both the baseline questionnaire and the health-check measurements. The repeat interviews and examinations were performed one year after the first interview, within a window of two weeks either side 21 . Of the 402 records, 369 repeat ECGs were available and were linked to the original ECGs taken a year earlier; δage was calculated for both ECGs in each pair, and the between-pair Pearson correlation was calculated using R v3.4.4 base functions.
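Linking baseline and repeat δage values and computing the between-pair Pearson correlation can be sketched as follows. The study used R base functions (R's `cor()` defaults to the Pearson method); this Python version is only an illustration, assuming hypothetical dictionaries keyed by participant ID.

```python
from math import sqrt


def paired_delta_ages(baseline, repeat):
    """Link baseline and repeat δage by participant ID, keeping only IDs with both ECGs."""
    ids = sorted(set(baseline) & set(repeat))
    return [baseline[i] for i in ids], [repeat[i] for i in ids]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical δage values; participant "p4" has no repeat ECG and is dropped
baseline = {"p1": 4.2, "p2": -1.0, "p3": 9.5, "p4": 2.0}
repeat = {"p1": 5.0, "p2": 0.3, "p3": 8.1}
x, y = paired_delta_ages(baseline, repeat)
print(pearson_r(x, y))
```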
Overview of the AI model
A convolutional neural network (CNN) model built using Keras with a TensorFlow 2 (Google, Mountain View, CA) backend was previously developed and validated. ECGs from a total of 774,783 unique subjects were used to develop the neural network: 399,750 in the training set, 99,977 in the internal validation set and 275,056 in the holdout testing set. The network contained stacked blocks of convolutional, max pooling and batch normalization layers 22 . A detailed description of the network is given in our previously published paper 14 . The network had a single continuous output (ECG-age). The model was not retrained using the Russian data. δage was obtained by subtracting chronologic age from ECG-age. Thus, a positive δage represents an individual whose ECG predicts an older age than their chronologic age (older by ECG), while a negative δage represents an individual who appears younger by ECG.
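To make the idea of stacked convolution, batch normalization and max pooling blocks feeding a single continuous output concrete, the NumPy sketch below runs a forward pass through a toy network of that shape. It is not the published model: the number of blocks, filter counts, kernel size, ReLU activation, global average pooling head and input length are all hypothetical choices, and the weights are random, so the output carries no meaning.

```python
import numpy as np

# Hypothetical toy sizes; the published architecture (ref 14) differs.
rng = np.random.default_rng(0)


def conv1d_same(x, w, b):
    """'Same'-padded 1D convolution. x: (time, in_ch), w: (k, in_ch, out_ch), b: (out_ch,)."""
    k = w.shape[0]
    left = k // 2
    xp = np.pad(x, ((left, k - 1 - left), (0, 0)))
    out = np.empty((x.shape[0], w.shape[2]))
    for t in range(x.shape[0]):
        # Contract the (time window, input channel) axes against the kernel
        out[t] = np.tensordot(xp[t:t + k], w, axes=([0, 1], [0, 1])) + b
    return out


def batch_norm(x, mean, var, gamma, beta, eps=1e-3):
    """Batch normalization in inference mode, using stored per-channel statistics."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def max_pool(x, size=2):
    """Non-overlapping max pooling along time; leftover samples are dropped."""
    t = (x.shape[0] // size) * size
    return x[:t].reshape(-1, size, x.shape[1]).max(axis=1)


def init_block(in_ch, out_ch, k=7):
    return {
        "w": rng.normal(0.0, 0.1, size=(k, in_ch, out_ch)),
        "b": np.zeros(out_ch),
        "mean": np.zeros(out_ch),
        "var": np.ones(out_ch),
        "gamma": np.ones(out_ch),
        "beta": np.zeros(out_ch),
    }


def block(x, p):
    """One block: convolution -> batch normalization -> ReLU (assumed) -> max pooling."""
    h = conv1d_same(x, p["w"], p["b"])
    h = batch_norm(h, p["mean"], p["var"], p["gamma"], p["beta"])
    h = np.maximum(h, 0.0)
    return max_pool(h)


def predict_age(ecg, blocks, w_out, b_out):
    """Stacked blocks, global average pooling over time, then one linear output (ECG-age)."""
    h = ecg
    for p in blocks:
        h = block(h, p)
    return float(h.mean(axis=0) @ w_out + b_out)


channels = [12, 16, 32, 64]           # 12 leads in; filter counts are hypothetical
blocks = [init_block(a, c) for a, c in zip(channels[:-1], channels[1:])]
w_out = rng.normal(0.0, 0.1, size=channels[-1])
b_out = 50.0                          # arbitrary bias; all weights are untrained
ecg = rng.normal(size=(1000, 12))     # synthetic stand-in for a digitised 12-lead ECG
print(predict_age(ecg, blocks, w_out, b_out))
```

Each block halves the time dimension (1000, 500, 250, 125 samples here) while increasing the number of feature channels, and the final linear layer produces one number per ECG, matching the single continuous output described in the text.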
Statistical methods
Statistical analysis was performed using R v3.4.4. Continuous variables were standardised to SD units by subtracting the mean and dividing by the standard deviation, so that regression coefficients are comparable across variables. Hs-cTnT and NT-proBNP measurements were natural-log transformed because their distributions were not Gaussian.
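The log transformation and standardisation to SD units can be sketched as below. This assumes all biomarker values are positive (below-detection values already recoded) and uses the n − 1 sample standard deviation, which is what R's `sd()` computes; the function names and example values are hypothetical.

```python
import math
from statistics import mean, stdev


def log_transform(values):
    """Natural log; assumes all values are positive."""
    return [math.log(v) for v in values]


def standardise(values):
    """Express values in SD units: (x - mean) / SD, using the n - 1 sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical NT-proBNP values in ng/L
nt_probnp = [4.9, 35.0, 80.0, 410.0]
print([round(z, 2) for z in standardise(log_transform(nt_probnp))])
```

After standardisation, a regression coefficient is read as the change in δage (years) per 1 SD increase in the exposure, which is how effects are reported in the results.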
We used linear regression models to assess the associations of risk factors (exposures) with δage (outcome). All models were adjusted for a priori confounders: chronologic age, sex and the clinic where the measurement was taken, the latter accounting for any potential systematic differences in protocol and device calibration. Including chronologic age in every model also removes the effect of any potential correlation between δage and chronologic age. A test for trend for smoking was performed by converting the ordered categories of number of cigarettes smoked into integer values. Basic models, including one risk factor at a time, were further adjusted for the other risk factors and for education level.
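The adjusted regression and the integer coding used for the smoking trend test can be sketched with ordinary least squares in NumPy. This is an illustration only (the study used R): the smoking category labels, the dummy coding with the first clinic as reference and the simulated data are hypothetical, and none of the numbers come from the study.

```python
import numpy as np

# Hypothetical ordered smoking categories mapped to integer scores for a trend test
SMOKING_SCORES = {"never": 0, "former": 1, "current <10/day": 2,
                  "current 10-19/day": 3, "current 20+/day": 4}


def smoking_score(categories):
    return np.array([SMOKING_SCORES[c] for c in categories], dtype=float)


def design_matrix(exposure, age, male, clinic):
    """Exposure plus a priori confounders: age, sex and dummy-coded clinic
    (the first clinic in sorted order is the reference)."""
    clinic = np.asarray(clinic)
    dummies = [(clinic == c).astype(float) for c in np.unique(clinic)[1:]]
    return np.column_stack([exposure, age, male, *dummies])


def ols(y, X):
    """Ordinary least squares with an intercept; returns coefficients and standard errors."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    sigma2 = resid @ resid / (X1.shape[0] - X1.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))
    return beta, se


# Simulated data only
rng = np.random.default_rng(1)
n = 500
age = rng.uniform(35, 69, n)
male = rng.integers(0, 2, n).astype(float)
clinic = rng.integers(0, 3, n)
bmi_z = rng.normal(size=n)
smoking = rng.choice(list(SMOKING_SCORES), n)
delta = 1.6 * bmi_z + 0.5 * smoking_score(smoking) - 0.25 * (age - 55) + rng.normal(0, 6, n)

beta, se = ols(delta, design_matrix(bmi_z, age, male, clinic))
print(f"δage per SD of BMI: {beta[1]:.2f} (SE {se[1]:.2f})")
beta, se = ols(delta, design_matrix(smoking_score(smoking), age, male, clinic))
print(f"trend per smoking category: {beta[1]:.2f} (SE {se[1]:.2f})")
```

The coefficient in position 1 is the exposure effect; for the integer-coded smoking score it is the per-category trend. Mutual adjustment amounts to adding further exposure columns to the same design matrix.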
We assessed whether any of the main effects varied by sex by testing for interactions. To assess the extent to which any results might be driven by the subset of people with pre-existing cardiovascular disease, we carried out a sensitivity analysis in which we excluded any participant with ECG-indicated myocardial infarction (MI) or a self-reported history of stroke or MI, and repeated the same linear regression models.
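The sex-interaction test and the sensitivity exclusion described above can be sketched as follows. This is a minimal NumPy illustration, not the study's R code: the indicator inputs, function names and simulated data are hypothetical, and the clinic adjustment is omitted for brevity.

```python
import numpy as np


def keep_for_sensitivity(ecg_mi, self_mi, self_stroke):
    """Mask of retained participants: no ECG-indicated MI and no self-reported MI or stroke."""
    prior_cvd = (np.asarray(ecg_mi, dtype=bool)
                 | np.asarray(self_mi, dtype=bool)
                 | np.asarray(self_stroke, dtype=bool))
    return ~prior_cvd


def fit_with_sex_interaction(y, exposure, male, age):
    """OLS of δage on exposure, sex, exposure x sex and age.
    The exposure x sex coefficient estimates how the exposure effect differs in men."""
    X = np.column_stack([np.ones(len(y)), exposure, male, exposure * male, age])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", "exposure", "male", "exposure_x_male", "age"], beta))


# Simulated data only
rng = np.random.default_rng(3)
n = 400
exposure = rng.normal(size=n)
male = rng.integers(0, 2, n).astype(float)
age = rng.uniform(35, 69, n)
ecg_mi, self_mi, self_stroke = (rng.random(n) < 0.03 for _ in range(3))
delta = 1.2 * exposure - 0.25 * (age - 55) + rng.normal(0, 6, n)

keep = keep_for_sensitivity(ecg_mi, self_mi, self_stroke)
coef = fit_with_sex_interaction(delta[keep], exposure[keep], male[keep], age[keep])
print(int(keep.sum()), round(coef["exposure_x_male"], 2))
```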

Results
A total of 4,780 resting 15-second ECGs were collected from 4,378 participants in the main study sample (MSS); 402 of these were repeat ECGs recorded one year later for a randomly selected subset of MSS participants, forming the repeat study sample (RSS).
After exclusion of participants with missing data for any variable considered, the final set consisted of 3,863 and 369 ECGs for the MSS and RSS, respectively. Overall, mean chronologic age was 54.6 years (SD 9.7) on the day of the first ECG measurement and 1,597 (41.3%) of them were men.
The ECG-age for all participants had a mean of 59.8 years (SD 10.6). The average δage of this study population was 5.32 years (SD 7.6) (Figure 1). This difference between predicted and chronologic age was relatively constant for age groups between 35 and 60 years, with ECG-age converging towards the line of identity at older ages (Figure 2). δage values in the RSS were correlated with the corresponding MSS values from the previous year (Pearson's r = 0.72) (Figure 3).
Linear regression of δage on known CVD risk factors (adjusted for sex, chronologic age and the clinic where the ECG was taken) showed that SBP, DBP, MAP, BMI, LDL/HDL ratio and smoking status were associated with an increase in δage (P<0.0001). DBP and BMI showed the strongest effects, which were almost identical (a 1 SD increase corresponded to approximately +1.6 years of δage) (Table 1). These effects were attenuated on mutual adjustment, but all variables other than the AUDIT alcohol score remained statistically significant (Table 2). Mutual adjustment for smoking status and BMI increased the estimated effect of smoking. Further adjustment for education had minimal effect (results not shown).
In the fully adjusted model (Table 2), chronologic age showed a negative association with δage, with a 2.66 (95% CI: 2.40, 2.92) year reduction in δage for a 10-year increase in chronologic age. This effect was driven by the older age groups and is consistent with the line of best fit moving closer to the line of identity with increasing chronologic age (Figure 2). δage differed between the two cities: the Novosibirsk population had a lower δage than that in Arkhangelsk (-1.21, 95% CI: -1.47, -0.54 years). Sex showed no significant effect on δage, either alone or in relation to individual CVD risk factors, and no significant sex interactions were observed. In the fully adjusted model (…)

Two markers of cardiac pathology (NT-proBNP and hs-cTnT) showed positive associations between their log concentrations and δage (P<0.0001) (Table 3). Adjustment for other CVD risk factors marginally increased the strength of the association for NT-proBNP and attenuated the association of hs-cTnT with δage. The associations of both biomarkers with δage remained significant after adjustment for risk factors (Table 3), and were marginally reduced by further adjustment for education level.
Carotid plaque was associated with δage after adjustment for age, sex and clinic but this association was attenuated to null after further adjustment for CVD risk factors. Pulse wave velocity was positively associated with δage, although adjustment for CVD risk factors attenuated the effect appreciably (Table 3); this was largely accounted for by a shared association with systolic blood pressure.
Finally, we estimated the independent associations of MAP and pulse wave velocity with δage (Table 4). Both showed strong, statistically significant associations with δage, with a 1.57 (95% CI: 1.33, 1.81) year increase in δage per SD increase in MAP (Table 4, Model 1). On mutual adjustment, the association between pulse wave velocity and δage was substantially reduced, while the association between MAP and δage was only marginally reduced. Additional adjustment for other CVD risk factors further attenuated these associations; however, the association with MAP remained statistically significant (P<0.0001).
Sensitivity analyses were performed to further explore the robustness of the observed effects. After excluding all participants who self-reported a myocardial infarction or stroke or who had ECG-based evidence of myocardial infarction, associations with CVD risk factors changed only slightly. The adjusted associations with smoking and the LDL/HDL ratio increased, while the associations with hs-cTnT and NT-proBNP were weaker (Tables 5 to 7).

Discussion
In this study we have shown that the difference between the ECG-age obtained from a convolutional neural network and chronologic age, called here delta-age (δage), contains information about the level of exposure of an individual to established CVD risk factors and to markers of cardiac damage.
This novel biomarker is strongly associated in this Russian study population with established risk factors of CVD (systolic blood pressure, body mass index, smoking and serum cholesterol levels) in a direction that is intuitively consistent with δage being a marker of accelerated cardiovascular (vascular) ageing. In parallel we have found strong evidence that δage is associated with established markers of cardiac damage such as NTproBNP and hs-cTnT.
In order to explore the potential of this marker as a measure of accelerated cardiovascular ageing, we studied its association with pulse wave velocity, an established marker of vascular ageing. Carotid-femoral pulse wave velocity is a proxy measurement for aortic stiffness 23 . Pulse wave velocity is known to be pressure dependent 24 , which could account for the strong attenuation of its association with δage that we observe when arterial pressure is accounted for. Nevertheless, intrinsic stiffness independent of measured arterial pressure is a marker of ageing 25 . We assessed this by including both MAP and pulse wave velocity in the same model, which showed a positive association of pulse wave velocity with δage even after adjustment for MAP. Given the complex interaction between pulse wave velocity and blood pressure, further research in well-powered cohorts would be required to disentangle the relationship of δage with traditional vascular markers of ageing.
The robustness of the δage biomarker, and its ability to capture information about the intrinsic cardiovascular health of an individual, is supported by the reproducibility of our measurements taken one year apart in this study (Pearson's r = 0.72) and across several years in a previous study 14 . Both results suggest that this measure is stable over time.
In addition, δage captured the effect of complex interactions, such as the negative confounding between smoking and BMI and the positive trend in δage with an increasing number of cigarettes smoked. It would be of particular interest to assess the effect on δage of an intervention aimed at improving cardiovascular health and reducing CVD risk.
One of the most striking findings of our study is that an algorithm trained exclusively on a clinical population from the U.S.A. provides such an informative picture of CVD risk in a different population in Russia. This suggests that the information captured by the algorithm is likely to be a generalizable marker of accelerated or decelerated cardiovascular ageing of an individual, and therefore robust and applicable across a number of healthcare settings.

Table 7. Sensitivity analysis on the effect of markers of heart abnormalities and ageing biomarkers on δage (n = 3,521). Participants were excluded if they presented with either self-reported myocardial infarction, self-reported stroke or ECG-based evidence of myocardial infarction.
Model 1: Adjusted for age, sex and clinic.
In the context of this Russian study, it is striking that the mean δage (5.32 years in the overall study population) is consistently elevated across younger adults aged 35 to 60 years, and that the associations with both risk factors and markers of cardiac abnormalities described in this study become stronger when only these younger adults are included in the models (data not shown). Previous studies have shown that reduced telomere length, a biomarker of ageing, is associated with an increased risk of CVD and all-cause mortality in an adult Russian cohort 4 . Furthermore, the difference between the ECG-predicted age from this model and chronologic age (δage) has been shown to be a strong independent predictor of cardiovascular and all-cause mortality in complementary work in a separate population 26 . Combining these results with the data presented here on CVD risk factors and markers of cardiac damage, one could hypothesise that this novel biomarker contains information that might help us understand the causes of the abnormally high CVD mortality among working-age adults in Russia and its relationship to the observed average δage, although similar mortality studies would need to be performed in Russia to confirm this.
A weakness of this study is that it was based on a population sample with a suboptimal response rate (47% of those invited attended the health check). However, we are primarily focussed on understanding associations within the studied population rather than generalising to the target population of all residents of the two cities. To this extent, the potential lack of representativeness may not be that important. Another potential weakness is that the devices and protocols used to record ECGs differed between the training sample (model development in the U.S.A.) and the test sample (CVD risk analysis in Russia). Although this non-random misclassification could add noise to our analysis, the associations with CVD risk factors were in the expected directions, suggesting that these measurements are robust. Finally, a strength of this study is the availability of a wide range of CVD risk factors and markers of cardiovascular abnormalities.
Obtaining this novel proximal cardiovascular biomarker involves no cost beyond performing a 12-lead ECG. It therefore has the potential to be widely used in routine clinical settings, including in low- and middle-income countries, and can be applied retrospectively to digitally acquired or digitised ECGs. Hand-held devices are currently available that can record clinically approved ECGs to diagnose cardiovascular conditions such as atrial fibrillation. The use of these devices might open the door to even less costly, real-time longitudinal assessment of cardiovascular health using deep-learning models 27 .
Further research is needed to explore whether the δage biomarker has similar patterns of associations in other populations with different CVD contexts, as well as further explore and disentangle the determinants of the δage gap using established ECG features known to be affected by ageing and other CVD risk factors. Other research avenues might explore the genetic and epigenetic determinants of δage. For implementation in a clinical setting, it will be important to identify clinically relevant thresholds of this biomarker in order to establish CVD risk and streamline case management and prioritisation of patients.

Data availability
Data cannot be made openly available because access requests must be reviewed by a steering committee. All data used in this study can be requested through the Know Your Heart metadata portal (https://metadata.knowyourheart.science/) upon registration. Registered users can browse the variable-level metadata for the Know Your Heart study; however, the site does not provide access to the data themselves.
Bona fide researchers may apply to the study steering group (email: david.leon@lshtm.ac.uk) for an anonymised subset of the data. The primary criteria used by the steering group are the scientific coherence of the proposed use of the data and the match between the stated aims of the proposed research and the variables requested. This means that applicants need to justify the sets of variables they wish to have. In some instances, the research topic of an application may be close to, or replicate, ongoing research either within the core team or by previous applicants. In these cases, the overlap is communicated to the applicants with a suggestion for the way forward. More information can be found in the 'User Guide' document on the website.