Performance evaluation of a multinational data platform for critical care in Asia

Background: The value of medical registries strongly depends on the quality of the data collected. This must be objectively measured before large clinical databases can be promoted for observational research, quality improvement, and clinical trials. We aimed to evaluate the quality of a multinational intensive care unit (ICU) network of registries of critically ill patients established in seven Asian low- and middle-income countries (LMICs). Methods: The Critical Care Asia federated registry platform enables ICUs to collect clinical, outcome and process data for aggregate and unit-level analysis. The evaluation used the standardised criteria of the Directory of Clinical Databases (DoCDat) and a framework for data quality assurance in medical registries. Six reviewers assessed structure, coverage, reliability and validity of the ICU registry data. Case mix and process measures on patient episodes from June to December 2020 were analysed. Results: Data on 20,507 consecutive patient episodes from 97 ICUs in Afghanistan, Bangladesh, India, Malaysia, Nepal, Pakistan and Vietnam were included. The quality level achieved according to the ten prespecified DoCDat criteria was high (average score 3.4 out of 4) as was the structural and organizational performance -- comparable to ICU registries in high-income countries. Identified strengths were types of variables included, reliability of coding, data completeness and validation. Potential improvements included extension of national coverage, optimization of recruitment completeness validation in all centers and the use of interobserver reliability checks. Conclusions: The Critical Care Asia platform evaluates well using standardised frameworks for data quality and equally to registries in resource-rich settings.


Introduction
The availability of high quality data systems to inform delivery, evaluation and improvement of health care is recognised as a central tenet of high quality health systems 1 . In critical care, where patient populations are heterogeneous, treatments complex and where the sequelae of care requires considerable human and financial resource, intensive care unit (ICU) registries have been instrumental in providing a mechanism for continuous, sustainable, wide scale data collection to enable service evaluation and facilitate national benchmarking of care quality. Until recently, these registries have been concentrated in high income countries, with the notable exceptions of networks in Brazil 2 and Sri Lanka 3 . Absence of these systems in resource constrained countries severely hamper efforts to build accountability for healthcare quality.
The need to invest in systems which provide data to drive research and improvement has been highlighted by recent recommendations as part of a series of strategies to address the imbalance in quality of care that exists internationally 1 . Recent growth in global internet connectivity and mobile technology has given opportunity for the digital health information system to be implemented and scaled in low and middle-income countries (LMICs).
The global coronavirus disease 2019 (COVID-19) pandemic has accelerated the role of registries in driving global research. For example, registries in Brazil, Australia, Europe, and in Asia have been instrumental as part of collaborations for pre-COVID-19 large scale multicentre studies 4,5 , observational research on COVID-19 6 and more recently interventional research, as exemplified by the randomized, embedded, multi factorial adaptive platform for community acquired pneumonia (REMAP-CAP) operational through registries in the USA and in South Asia 6 .
Whilst registries are increasingly being promoted for their role in enabling greater accountability of healthcare quality, and for their ability to facilitate multi centre clinical trials, the quality of data such systems provide requires rigorous evaluation 7,8 . To date, evaluation of existing vertical programme assessments for digital clinical and research registries, and for the World Health Organisation (WHO) endorsed district health information system platform 9 , have focused predominantly on the ongoing challenges of missingness and inaccuracies in reporting 10 . Few evaluations have extended to assess the timeliness, consistency, interoperability and accessibility of the data for external comparison 11,12 , despite these dimensions of data quality being essential for clinical research 13 . This study evaluates a network of seven federated registries operational in Asia which together use a single cloud-based platform as part of a collaboration for implementation and research in critical care. Critical Care Asia (CCA) is a collaborative programme of critical care research, training and quality improvement in Asia 14 . The CCA currently connects 97 ICUs in seven countries to provide diverse high-quality data to generate evidence and feedback in near real time for service improvement and research, akin to the foundations of a learning health system 15 . We sought to systematically evaluate the performance of CCA registries in Afghanistan, Bangladesh, India, Malaysia, Nepal, Pakistan and Vietnam using two pre published quality assurance frameworks 16,17 . We hypothesized that the quality of data arising from this federated network of registries would be high and comparable to the quality arising from ICU registries in high-resource settings.

Ethical considerations
This performance evaluation was classified as an audit and exempted from ethical review by the Oxford Tropical Research Ethics Committee (OxTREC) on June 16 th , 2020. The evaluation was conducted on registry data collected between June and December 2020.
Frameworks for assessment of performance The Directory of Clinical Databases (DoCDat) framework was established to inform researchers and clinicians on currently functioning clinical databases and to provide an independent assessment of their scope and quality 16 . Several high quality national registries in Australia, New Zealand and in the United Kingdom have used this same framework to evaluate data quality previously 11,12 . The framework (Table 1) consists of 10 items; four relating to registry coverage and six relating to reliability and validity of the data. Each item is rated on a scale of 1 to 4, with level 1 representing the least rigorous methods and Level 4 representing the most rigorous. The instrument was shown to have good face and content validity and to have no floor or ceiling effects 16 . A further framework to objectively assess registry quality especially in the development and implementation phase was published in 2002 and is also used in this evaluation (Table 2) 17 . This framework is divided into three main categories, and each category was applied to the central coordinating center and to the local sites. In case of disagreement between reviewers, final scoring was reached by consensus.

Performance review
Features and functions of the platform pertaining to data capture, quality and management were described and made available to a total of six reviewers. To maximize insight into the registry network while minimize potential sources of bias, a variety of scorers were identified. Three reviewers were independent reviewers with established track records in high quality critical care registry implementation and research in both high-income settings and LMICs. Three scorers were members of the CCA coordinating team (LP, TR, AB). Independent reviewers had full access to documentation, reports, training material and platform code, pertinent to the quality assurance

Amendments from Version 1
In this version we expanded the discussion identifying challenges with source data verification and quality assurance procedures that can be taken to mitigate the issue of registry data reliability.
Any further responses from the reviewers can be found at the end of the article  6 , and the inclusion of newly implemented registries (Afghanistan, Bangladesh, Malaysia and Vietnam). Basic information on these registries is detailed in Table 3.

Registry structure overview
Registry structure for established registries in India, Pakistan and Nepal was already published 15,19,21 . In brief, the CCA platform has a modular structure, where a core dataset of 33 variables  captured within the first 24 hours of admission to ICU and 5 variables at discharge, provides episodic information to enable evaluation of case mix, acuity, organ support and outcomes 19,22 . Additional modules complement the core data set providing stakeholders with a mechanism for embedding measures to evaluate care processes synonymous with care quality, and undertake observational and interventional research ( Figure 1). The registry platform has a customisable user mobile and desktop interface and accessible data entry support tools. Minimum data connectivity requirements (3G data and offline function) along with downloadable data exports facilitate the registries adoption in settings which may previously have failed to implement digital systems due to poor internet coverage or limited access to hardware. Integrated analytics dashboards and reports displaying trends in information, activity and quality indicators provide a mechanism for service reporting and cycles of audit and feedback with the clinical teams 15 .
The network has a federated system for registry data storage, whereby national registries house their data and are supported to establish infrastructure and skills to manage and curate data. All anonymised registry and trial data is backed up to a central server. A summary of registry implementation procedures reported using the template for intervention description and replication (TIDieR) checklist is detailed in the extended data 18 .

Data collection procedures
Data is recorded prospectively and extracted directly from patient charts by data collectors daily and contemporaneous to clinical care. Laboratory tests are reported in the ICU's routine unit of measurement and harmonised to a single measure. A comprehensive field specification and data collection guide are made available to all stakeholders through the platform. Data collectors are remotely trained prior to commencing data collection using a demo platform and ongoing 24 hr online support is available. Follow up meetings are offered weekly to enable ongoing feedback and improvement regarding data quality and support with registry led research and audit. Census checks with independent admission data are used to monitor cohort inclusion daily or weekly at users' preference. The platform's existing internal data quality mechanisms, field completeness, value range validity and branching logic prompt users to missed or potentially spurious responses.

Results
Assessment of performance using the DoCDat criteria A summary of the performance of the registries using the DoCDat criteria is shown in Table 4 18 , and compared to the average evaluation of other existing DoCDat databases 11,16 . The median score achieved by the registries across all criteria was 3.4 (minimum 1.4, maximum 4). Detailed scoring of each criterion is described below, while the score assigned by each external and internal reviewer is detailed in  (Table 3). Registry team members contact each ICU on a daily or weekly basis as preferred by the registry and validate admission, discharge and bed occupancy. The recruitment completeness was assessed through a dedicated section of the online platform. The process of daily or weekly validation of recruitment completeness was conducted in all but one registry, and in 77% of all ICUs (Table 6).

C. Variables included.
Mean score 3.3. All seven registries reported the core data set and were able to derive severity of illness and prediction of mortality using published   (Table 7). Two registries (IRIS in India and PRICE in Pakistan) also collected medium to long term patient centred outcomes (i.e. after hospital discharge) and quality of life indicators such as the Euro quality of Life 5-dimensions tool (EQ5D-3L) 24 and scales for post traumatic stress disorders (PTSD).

D. Completeness of variables.
Median score 3.8. All core variables were reported in the seven registries with < 5 % missingness, sustained over the 6-month period (Table 7). Overall, the availability of the core data set was 98.9%.  All vital signs had a completeness >97%, while the variable with lowest score regarded type and dose of vasoactive drugs (96.6%).

E. Capture of raw variables.
Median score 3.8. Raw data accounts for all fields in the core data set. Weekly meetings and 24/7 remote support between the CCA platform team and collaborating registries were reported using an online project management tool, which provided an audit trail of user queries, responses and platform development in response to recurring themes from user feedback.

F. and G. Explicit rules for how variables are recorded.
Median scores 4.0 and 3.8 respectively. A detailed data dictionary complete with field specifications was available for all variables in the dataset and was uniform across the registries.

H. Reliability of coding.
Mean score 3.7. The CCA platform's use of SNOMED CT (www.snomed.org) and Observational Medical Outcomes Partnership (OMOP) common data model mapping (www.ohdsi.org), ensures international standardized nomenclature covering both diagnostic conditions and operative procedures in all collaborating registries. However, no intra-rater or inter-rater reliability testing was performed.
I. Independence of observations. Median score 3.8. The primary outcome assessment for all episodes of care, was observed independent of patient care and independent from the clinical team. Similarly, secondary outcomes pertaining to vital status as 30 days-up to one year following ICU admission were captured by investigators blinded to existing encounter data.
J. Data validation. Mean score 3.5. Data is validated internally according to the CCA dataset definitions. Fields are validated for completeness, consistency of response across sibling or parent-child fields. Inbuilt mandatory rules developed based on cycles of testing and analysis in CCA network sites ensure completeness of core dataset, and alerts within the user interface prompt users to complete supplemental fields. Illogicalities and inconsistencies in relational fields are minimised using inbuilt branching logic. Data validation reports, updated every 24hrs, are accessible to end users via the platforms reports interface. Clinicians and administrators can also interrogate the CCA data set directly by downloading reports, viewing data via the real-time dashboards, or by submitting requests for analyses to the CCA registry implementation team. Free text fields are used only to supplement predetermined menus which have been generated from pre-existing guidance e.g. for Center of Disease Control definitions, or for the Acute Physiology and Chronic Evaluation (APACHE) IV diagnostic codes).
Assessment using the framework of procedures proposed by Arts et al. 17 The CCA platform fulfilled all criteria proposed by this framework, with the exception of 1 item ( Table 2) pertaining to the central coordinating center checking on interobserver variability. The scoring for this framework was homogeneous across all reviewers.

Discussion
This independent evaluation of federated critical care registries from seven LMICs in Asia performed better than previously reported evaluations of multi centre databases using the DoCDat criteria 2,11 . Key components of the platform were standardised field specification, inbuilt validation at data entry, audit reporting on completeness, consistency and validity checks of the data. The greatest limitation of the registries when evaluated against the criteria were in national geographical coverage and the absence of source verification of data.
The representativeness criteria was the lowest scoring as the CCA network spread is inhomogeneous with large differences across countries. The primary goal of capturing outcomes information is to identify high-performance hospitals or health-care delivery systems in order to uncover the best practices responsible for their superior outcomes and seek to implement them in other settings. A limited coverage across the collaborating registries limits the ability to benchmark care nationally and internationally, but such benchmarking may have limited utility in healthcare systems in developing countries. This is due to both difficulty in capturing outcomes after ICU discharge and infeasibility of complex risk adjusted stratification. Although historically national coverage has been considered a key criterion to enhance data quality, we do not consider this to be the case for a federated network system spanning across several countries. The focus is on the community of practice rather than the extent of coverage, on the actual use of the data for unit level or multicenter quality improvement initiatives, audit and feedback rounds and clinical trials. Yet, efforts to increase expansion inside individual countries continue, with new centers joining the registry on a regular basis.
Some of the challenges faced by the CCA registry are specific to LMICS, others are more common and observed across registries worldwide 11,12 . Completeness of recruitment is still not assessed in one third of the CCA ICUs and limits the exact knowledge of patients missed by the registry. On the other hand, the patient census often was higher than the reported admitted patients on ICU admission books, questioning the reliability of routinary admission books as a representation of the exact count of admitted patients. Staffing and retention of dedicated data collectors are also recognized challenges faced by registries worldwide 11,12 .
Data collection, data entry and verification are frequently carried out by staff from diverse clinical or non-clinical backgrounds with verification of data accuracy that may be seldom performed at unit level. Despite no formal audit of a sample of medical records was performed, similar rates of discrepancies (i.e. around 5%) found in previous registries 11,25 . may be expected from the CCA federated registry system. Source data verification (SDV), whilst not a formal part of routine registry data quality assessment, is conducted on registry data, used for clinical trials. A powerful infrastructure for enabling clinical research in settings where trial resource and experience is limited, CCA collaborating centers participate in international clinical trials, including REMAP-CAP and MegaROX in part because the trial CRFs have been embedded into the registry platform. Up to 100% of study data for trial enrolled patients is subject to SDV. The operationalisation of clinical research through the registry platform is an important mechanism for assessing and improving overall data quality, and for establishing a culture of clinical audit, feedback and research whereby there is direct linkage of data collected to evaluating patient outcomes and delivering service improvement. What remains perhaps more uncertain is the reliability of the underlying source documentation. Reviewing source documentation (SDR) to assess the underlying quality of the data is largely absent from healthcare data internationally. Assessing documentation for patterns of data and deviations is likely to reflect significant biases in both what and when information (individual data and clinical events) is recorded as clinical practice varies widely both within a given setting and internationally. Limitations and potential flaws in reliability of registry data have been highlighted in the past 26 . Rigorous and regular assessments of registry data such as the one performed in this article may overcome some of these limitations. Continuous audit and analysis at unit, regional and national level also contribute to strengthening data collection and interpretation procedures.
With the increased use of registries for registry-embedded clinical trials and observational research there is a drive for improved data quality 27 . In addition to the mandatory field completeness, range checks, primitive and entity data-type constraints, additional mechanisms are in place for data quality assurance: data version management, access control for curated data sets, role-based access, verified audit trails and source verification of data. Registries can also allow a better understanding of how close standard care arms are to routine care, through the validation of trial data in the context of pre-existing registry data. Finally, data interoperability across multinational registries is currently being facilitated by the increasing integration of international coding systems (e.g. SNOMED), use of Common Data Models and the participation in data sharing initiatives such as the Linking of Global Intensive Care (LOGIC) consortium 28 .
The architecture of the CCA registry facilitates ICUs retaining ownership of submitted data. The CCA registry provides contributors with a platform for capture of unit level data using a common data structure, and enables real time analysis to inform clinical care and service delivery via dashboards and collated reports. In fact, ICU beds in Asian hospitals constitute an average 9% of hospital beds, highlighting the importance of reliable and comparable data 29 . Leveraging the same data platform, ICUs can contribute patient and hospital de-identified data to the CCA for benchmarking, multi-centre research purposes and quality improvement. Investigator initiated research can also be started by ICU registry leads within the network and on approval and agreement of clinical and institutional collaborators.
Similarly to the DoCDat criteria, Arts et al. suggested the need for transparent data definitions, standardized data collection guidelines and central training of individuals involved in data collection 17 . The CCA failed to meet one of the suggested criteria concerning the interobserver variability checks on collected data. This would require the data collection performed by different individuals with a subsequent check against the source files, a resource-intensive procedure that constitutes a challenge for all quality clinical registries 27 . Yet, all the other domains pertaining to both the central coordinating center and peripheral ICUs were fulfilled. This provides factual endorsement for the federated system experimented by the CCA network of multiple registries with both national and international coordination.
Across the globe, registries are now being leveraged to support large scale multi-centre clinical trials and evaluate complex improvement interventions. Regarding trial recruitment, adapted registry platforms promote rapid onboarding, inform site selection and improve patient recruitment, and can facilitate study monitoring through inbuilt data quality and validation processes 6 . Potential limitations of registry-based trials concern the controlling for confounding and bias 30 . The CCA network is already supporting several of the REMAP-CAP arms trials 6,31,32 , while also enabling observational and outcome research 23 .
This study has some limitations. The assessment was limited to core data as this dataset was available throughout all registries in the network. While other data domains will presumably share similar infrastructure scoring, the completeness of data may vary. The assessment included registries with diverse size and experience, with aggregate scoring performed without emphasis on single registry's scores and improvement points.

Conclusions
The CCA federated registry system is a rapidly growing network that provides high quality ICU data concerning case mix, processes of care and clinical outcomes from seven Asian countries. The system had a high performance when assessed using rigorous predefined scoring systems tackling completeness, reliability, validity and organizational infrastructure. While representativeness and interobserver reliability checks were identified as potential areas for improvement, overall performance was equal to national registries in high income settings. To Reviewer 1 (Paul Young): Comment by reviewer: This study describes a performance evaluation of the Critical Care Asia Registry using a recognised framework. This technical report is well written and thorough. As outlined, the framework that has been used is well-established. The greatest uncertainty is probably about whether the data that are entered into the registry are truly an accurate reflection of what is in the source data. This is a concern in all registries but could conceivably be a particularly troublesome issue in lower income countries where resources are more limited. My only significant comment is whether more attention to this point in the Discussion would be appropriate.

Data availability
Response: Thank you for this insightful comment. We acknowledge that the issue of registry data reliability is a constant concern. We expanded the paragraph in the discussion identifying challenges with source data verification and quality assurance procedures that can be taken to mitigate this issue, as follows: "Data collection, data entry and verification are frequently carried out by staff from diverse clinical or non-clinical backgrounds with verification of data accuracy that may be seldom performed at unit level. Despite no formal audit of a sample of medical records being performed, similar rates of discrepancies (i.e. around 5%) found in previous registries [11,24] may be expected from the CCA federated registry system. Source data verification, whilst not a formal part of routine registry data quality assessment, is conducted on registry data, used for clinical trials. A powerful infrastructure for enabling clinical research in settings where trial resource and experience is limited, CCA collaborating centers participate in international clinical trials, including REMAP-CAP and MegaROX in part because the trial CRFs have been embedded into the registry platform. Up to 100% of study data for trial enrolled patients is subject to SDV. The operationalisation of clinical research through the registry platform is an important mechanism for assessing and improving overall data quality, and for establishing a culture of clinical audit, feedback and research whereby there is direct linkage of data collected to evaluating patient outcomes and delivering service improvement. What remains perhaps more uncertain is the reliability of the underlying source documentation. Reviewing source documentation (SDR) to assess the underlying quality of the data is largely absent from healthcare data internationally. Assessing documentation for patterns of data and deviations is likely to reflect significant biases in both what and when information (individual data and clinical events) is recorded as clinical practice varies widely both within a given setting and internationally. Limitations and potential flaws in reliability of registry data have been highlighted in the past [25]. Rigorous and regular assessments of registry data such as the one performed in this article may overcome some of these limitations. Continuous audit and analysis at unit, regional and national level also contribute to strengthening data collection and interpretation procedures." To Reviewer 2 (Stefano Finazzi)