Healthcare use by people who use illicit opioids (HUPIO): development of a cohort based on electronic primary care records in England [version 1; peer review: 2 approved]

Background: People who use illicit opioids such as heroin have substantial health needs, but there are few longitudinal studies of general health and healthcare in this population. Most research to date has focused on a narrow set of outcomes, including overdoses and HIV or hepatitis infections. We developed and validated a cohort using UK primary care electronic health records (Clinical Practice Research Datalink GOLD and AURUM databases) to facilitate research into healthcare use by people who use illicit opioids (HUPIO).
Methods: Participants are patients in England with primary care records indicating a history of illicit opioid use. We identified codes including prescriptions of opioid agonist therapies (methadone and buprenorphine) and clinical observations such as ‘heroin dependence’. We constructed a cohort of patients with at least one of these codes and aged 18-64 at cohort entry, with follow-up between January 1997 and March 2020. We validated the cohort by comparing patient characteristics and mortality rates to other cohorts of people who use illicit opioids, with different recruitment methods.
Results: Up to March 2020, the HUPIO cohort included 138,761 patients with a history of illicit opioid use. Demographic characteristics and all-cause mortality were similar to existing cohorts: 69% were male; the median age at index for patients in CPRD AURUM (the database with more included participants) was 35.3 (IQR 29.1-42.6); the average age of new cohort entrants increased over time; 76% had records indicating current tobacco smoking; patients disproportionately lived in deprived neighbourhoods; and the all-cause standardised mortality ratio was 5.4 (95% CI 5.3-5.5).


Introduction
Opioids are a class of controlled drugs that include illicit substances such as heroin, substitution therapies such as methadone and buprenorphine, and pain medication such as morphine and codeine. While these drugs have both therapeutic and recreational uses, compared with most psychoactive drugs there is a high risk of physical or psychological dependence 1 . The Diagnostic and Statistical Manual of Mental Disorders describes mild, moderate or severe 'opioid use disorders' 2 and the International Classification of Diseases provides criteria for 'harmful patterns of use of opioids' and 'opioid dependence' 3 .
The frequency of illicit opioid use is difficult to estimate, in part because people who use these drugs are poorly represented in traditional epidemiological surveys. One study suggests that 0.8% of people aged 15-64 in England are dependent on illicit opioids 4 , corresponding to approximately 300,000 individuals. There have also been growing concerns about dependence on prescription opioids, although the scale of this problem remains unclear 5,6 .
The health harms associated with illicit opioids are well-known. Many cohort studies have found high mortality rates 7 . Use of illicit opioids is directly associated with multiple health and social harms such as infections, accidents, homelessness, and imprisonment 1 . The co-occurrence of tobacco smoking, poor nutrition, and poor access to healthcare means that almost all causes of death are more common among people who use illicit opioids than in the general population 8 .
The main treatment for dependence on illicit opioids is opioid agonist therapy (OAT), a pharmacological treatment using long-acting opioids such as methadone or buprenorphine. In England, OAT is provided by specialist drug and alcohol services or GPs, depending on local commissioning arrangements. A large body of evidence demonstrates the effectiveness of OAT across outcomes including mortality, physical and mental health, and criminal activity 9-13 .
Despite extensive evidence regarding the risks of illicit opioids and the benefits of OAT, there are important unanswered research questions. Epidemiological research and health interventions have focused on outcomes perceived to be 'drug-related', such as overdoses and HIV or hepatitis infections. Meanwhile, there is limited research into engagement with primary care services, healthcare quality, and treatment options for non-communicable diseases and mental health problems 14 . These are important areas of research because the population of people who use illicit opioids in England (as in many other countries) is ageing 15 and the majority of excess deaths are now caused by non-communicable diseases such as liver disease, chronic obstructive pulmonary disease, and cardiovascular disease 16 .
To facilitate research in these areas, we aimed to develop and validate a phenotype that identifies people with a history of illicit opioid use in longitudinal UK primary care electronic health records (Clinical Practice Research Datalink GOLD and AURUM databases).

Study design
We developed and validated a cohort of people with a history of illicit opioid use, including cross-sectional analysis of participant characteristics at baseline, and a cohort analysis of all-cause mortality rates.

Data sources
The Clinical Practice Research Datalink (CPRD) GOLD and AURUM are databases of anonymised electronic health records from primary care, including approximately 8% and 13% of the populations in the UK and England respectively 17-19 . Although the databases include similar clinical information, they differ in terms of data collection software, clinical classification system and geographical coverage. CPRD GOLD includes data from GP practices throughout the UK, while CPRD AURUM initially included England only, and more recently practices in Northern Ireland have been added. To maximise comparability we have restricted the cohort to patients registered in England, though the methods can be used to include patients in other parts of the UK.

Entry and exit dates
We selected patients who were registered at participating GP practices between 1 January 1997 and 31 December 2018 for GOLD, and between 1 January 1997 and 31 March 2020 for AURUM. Cohort entry was defined as the latest of: 1 January 1997; the first date when good quality data were available for that patient; and the date of the first code indicating illicit opioid use. Cohort exit was defined as the earliest of: the date when the practice stopped being observed ('last collection date'); the date when the patient stopped participating in CPRD (that is, transferred out of a participating GP practice); and the date of death as recorded in the primary care record. In addition to these criteria, we excluded patients who were aged under 18 or 65 or older at cohort entry.
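The entry and exit rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code (the study itself used R): the function names, the use of a full date of birth, and the requirement that entry falls on or before exit are all assumptions made for the example.

```python
from datetime import date

STUDY_START = date(1997, 1, 1)


def cohort_entry(data_quality_start, first_opioid_code):
    """Latest of study start, start of good-quality data, and first opioid code."""
    return max(STUDY_START, data_quality_start, first_opioid_code)


def cohort_exit(last_collection, transfer_out=None, death=None):
    """Earliest of last collection date, transfer out, and death (None = did not occur)."""
    return min(d for d in (last_collection, transfer_out, death) if d is not None)


def age_on(birth, on):
    """Age in completed years on a given date (full birth date is an assumption)."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def eligible(birth, entry, exit_date):
    """Aged 18-64 at entry; requiring entry <= exit is an added assumption."""
    return 18 <= age_on(birth, entry) <= 64 and entry <= exit_date
```

For example, a patient whose first opioid-related code predates the start of good-quality data would enter the cohort on the later of that start date and 1 January 1997.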

Selection of patients with a history of illicit opioid use
We focused on patients with a history of opioid use (rather than specifically current use) due to the typically long duration of opioid use 20-22 and the likelihood that opioid use would not be recorded regularly in patients' records. We therefore included patients with illicit opioid use recorded prior to the cohort entry date.
CPRD data include two main types of codes: product codes and clinical codes. Product codes indicate a prescription made in a primary care setting, whilst clinical codes indicate a diagnosis or other clinical observation (sometimes also a prescription). We selected patients by identifying product codes indicating a prescription of OAT and clinical codes indicating a history of illicit opioid use, such as 'heroin dependence' (see extended data for a full code list 23 ). We prioritised specificity over sensitivity, aiming to use codes that are only applied to the target population. Our process for selecting codes is summarised in Figure 1.

Product codes.
In the UK, treatment for opioid dependence involves the prescription of methadone or buprenorphine 24 . However, these medications are also licensed for other indications including pain and palliative cough 25,26 . We therefore developed a method to identify medications that are specific to OAT.
We searched CPRD dictionaries to identify all methadone and buprenorphine product codes (full search terms are available as extended data 23 ). We found 175 codes in CPRD GOLD and 136 codes in CPRD AURUM. We compared these lists to an existing list of OAT medicines 27 and found no additional codes. We then followed a two-step process to identify products specific to OAT. First, we described the age and sex distribution of patients at the time of the first prescription. This showed two distinct groups: drugs mainly prescribed to younger men, and drugs mainly prescribed to older women (see extended data 23 ). Data from specialist drug treatment services show that the population receiving OAT is three-quarters male and predominantly aged 18-64 28 . In contrast, the population prescribed opioids for pain relief is mainly older and female 29 . We therefore excluded medications where more than half of patients were female, the lower quartile of age was younger than 18 years, or the upper quartile of age was older than 64, as these codes are unlikely to relate specifically to OAT. The majority of codes excluded were transdermal buprenorphine patches, which are not indicated for OAT 26 . Second, a prescribing professional working in a community drug and alcohol service reviewed the remaining products to check that they are used for OAT.
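The first (demographic) step of this screen can be illustrated with a short Python sketch. It is not the authors' implementation: the input format (one `(age, sex)` pair per patient at first prescription), the `"F"` coding for female, and the quartile method are assumptions chosen for the example, and the second step (clinical review) is not represented.

```python
from statistics import quantiles


def passes_demographic_screen(first_prescriptions):
    """Return True if a product is retained as a candidate OAT-specific product.

    first_prescriptions: list of (age, sex) pairs, one per patient, describing
    the patient at their first prescription of this product (hypothetical format).
    Needs at least two patients to compute quartiles.
    """
    ages = [age for age, _ in first_prescriptions]
    female_share = sum(1 for _, sex in first_prescriptions if sex == "F") / len(ages)
    # Quartile definition is an assumption; the paper does not specify one.
    lower_q, _, upper_q = quantiles(ages, n=4, method="inclusive")
    excluded = female_share > 0.5 or lower_q < 18 or upper_q > 64
    return not excluded


# Illustrative (made-up) products: an oral solution and a transdermal patch.
oral_solution = [(25, "M"), (30, "M"), (35, "F"), (40, "M"), (45, "M")]
patch = [(60, "F"), (70, "F"), (75, "M"), (80, "F"), (85, "F")]
```

With these made-up inputs the oral solution is retained and the patch is excluded, mirroring the two groups described above.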
Clinical codes.
CPRD GOLD and AURUM differ in the clinical coding system used: CPRD GOLD uses Read codes whilst AURUM uses SNOMED codes. We used keywords to search CPRD dictionaries to find Read and SNOMED clinical codes that may indicate illicit opioid use (methadone; buprenorphine; abus*; addict; dependen*; drug user; heroin; inject; misus*; opiate; opioid; overdose). Our search identified 1,098 Read codes and 1,800 SNOMED codes. Two authors (DL and PP) screened the codes for relevance, with conflicts resolved through discussion.
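A keyword search of this kind could be sketched as follows. This is an illustration only: the dictionary is represented as a hypothetical mapping from code to description, and the matching rule (case-insensitive substring match, with `*` standing for any trailing word characters) is an assumption, since the paper does not state how wildcards were applied. The output is a candidate list for manual screening, not a final code list.

```python
import re

# Search terms as listed in the paper.
SEARCH_TERMS = [
    "methadone", "buprenorphine", "abus*", "addict", "dependen*", "drug user",
    "heroin", "inject", "misus*", "opiate", "opioid", "overdose",
]


def term_to_regex(term):
    """Convert a search term to a regex; '*' matches trailing word characters (assumed)."""
    return re.escape(term).replace(r"\*", r"\w*")


PATTERN = re.compile("|".join(term_to_regex(t) for t in SEARCH_TERMS), re.IGNORECASE)


def candidate_codes(dictionary):
    """Return codes whose description matches any search term.

    dictionary: mapping of code -> description (hypothetical structure).
    """
    return sorted(code for code, text in dictionary.items() if PATTERN.search(text))


# Made-up example entries, not real Read or SNOMED codes.
example = {"A1": "Heroin dependence", "B2": "Drug misuser", "C3": "Asthma"}
```

Here `candidate_codes(example)` keeps the first two entries for review and drops the third.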
Where codes were likely to indicate illicit opioid use, but did not specifically mention opioids, we classified them as 'probable'. For example, codes indicating injection of illicit drugs were classified as 'probable' because an estimated 94% of people who inject drugs in the UK use heroin 30 . These codes have been excluded from our analyses to maximise specificity, but can be included in future research if greater sensitivity is needed.
Some clinical codes described prescriptions, tests or adverse reactions relating to methadone and buprenorphine. We excluded these where the indication was unclear.
After agreeing a list of codes, we checked the age and sex distribution of patients with these codes in the same way as we did with the product codes. A small number of codes were either recorded for a majority of female patients or had an upper quartile of age older than 64. All of these codes represented dependence on medications prescribed for analgesia (for example 'misuse of Codeine tablets'), which we classified as 'probable' and excluded from our analysis.

External validation
We validated the HUPIO cohort by comparing it to other samples of people who use opioids. We anticipated the following characteristics: (a) the average age of patients entering the cohort would increase over time, as the population of people who use illicit opioids in England is ageing 15 ; (b) a high prevalence of smoking, as a systematic review found that an average of 84% of people enrolled in addiction services currently smoke 31 , and 70% of patients starting treatment for opioid dependence in England in 2018 were recorded as current tobacco smokers 28 (we reported the prevalence of current and ex-smoking based on existing codelists for smoking histories 32 ); (c) disproportionate representation of patients living in more deprived areas, as illicit opioid use and opioid-related deaths are consistently associated with deprivation 33,34 ; and (d) higher mortality rates than the general population, as studies of mortality in this population consistently show very high mortality rates 7 . We compared the standardised mortality ratio (SMR) for our cohort to those reported in existing studies of all-cause mortality in this population in England, identified by a brief literature search of PubMed using the terms (opiate OR opioid OR heroin) AND (mortality OR death), without restrictions on language or publication date.
In addition to these characteristics, we reported the proportion of patients with recorded histories of homelessness, prison, and alcohol dependence, based on existing phenotypes 35 and searches of clinical codes. We expected these experiences to be common among people with a history of illicit opioid use 36 . However, we did not know how consistently these experiences would be recorded in primary care, and therefore did not use these variables for validation purposes.

Statistical analysis (estimation of mortality rates and ratios)
We calculated mortality rates and standardised mortality ratios by: (a) calculating the duration of follow-up in the HUPIO cohort, stratified by sex, single year of age, and calendar year (we accounted for ageing by expanding follow-up for each participant into days, and summarising the number of days by sex, single year of age and calendar year); (b) applying mortality rates in the general population of England 37 to these strata to calculate the number of expected deaths; and (c) dividing the number of observed deaths by the number of expected deaths and calculating 95% confidence intervals using the exact Poisson method.
All data manipulation and analysis was conducted using R version 3.6.2 38 .
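The three steps described above (stratified follow-up, expected deaths, and an SMR with exact Poisson limits) can be sketched as follows. This is a Python illustration under stated assumptions, not the authors' R code: function names are hypothetical, a full date of birth is assumed, the exit date is counted as a day at risk, person-years are taken as days divided by 365.25, and the exact limits are found by bisection on the Poisson distribution function rather than with a statistics library.

```python
import math
from collections import Counter
from datetime import date, timedelta


def person_days_by_stratum(sex, birth_date, entry, exit_date):
    """Expand follow-up into single days, counted by (sex, age, calendar year)."""
    days = Counter()
    day = entry
    while day <= exit_date:  # assumption: the exit date counts as a day at risk
        had_birthday = (day.month, day.day) >= (birth_date.month, birth_date.day)
        age = day.year - birth_date.year - (0 if had_birthday else 1)
        days[(sex, age, day.year)] += 1
        day += timedelta(days=1)
    return days


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), with each term computed in log space."""
    if k < 0:
        return 0.0
    if mu <= 0:
        return 1.0
    terms = (math.exp(i * math.log(mu) - mu - math.lgamma(i + 1)) for i in range(k + 1))
    return min(1.0, sum(terms))


def _bisect(f, lo, hi, iterations=200):
    """Root of an increasing function f with f(lo) < 0 < f(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_ci(observed, alpha=0.05):
    """Exact confidence limits for the mean of a Poisson count."""
    upper_start = observed + 10 * math.sqrt(observed + 1) + 10
    if observed == 0:
        lower = 0.0
    else:
        lower = _bisect(lambda mu: (1 - poisson_cdf(observed - 1, mu)) - alpha / 2,
                        0.0, float(observed))
    upper = _bisect(lambda mu: alpha / 2 - poisson_cdf(observed, mu),
                    float(observed), upper_start)
    return lower, upper


def smr(observed, person_days, reference_rates):
    """SMR and CI; reference_rates maps stratum -> deaths per person-year."""
    expected = sum(d / 365.25 * reference_rates[s] for s, d in person_days.items())
    lower, upper = exact_poisson_ci(observed)
    return observed / expected, lower / expected, upper / expected
```

In use, the per-patient counters returned by `person_days_by_stratum` would be summed across the cohort before being passed to `smr` together with general-population rates for the same strata.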

Patient and public involvement
People who use illicit opioids were involved in discussions about the need for research into health and healthcare for this population.
[...] as GP practices stopped contributing data; and in AURUM increased over time to a maximum of 44,935 in October 2018.

Cohort size and characteristics
The majority of patients in both databases had a relevant clinical code and no OAT prescription, while the majority of patients with an OAT prescription also had a relevant clinical code (Figure 3).
The median age at baseline was 33.5 in GOLD and 35.3 in AURUM. The distribution of age groups was similar to that of patients entering treatment for opioid dependence in England (see extended data 23 ). We observed a linear increase in the mean age of patients entering our cohort, parallel to, and three years older than, the mean age of patients entering treatment for opioid dependence in England (see extended data 23 ). When stratified by date, the mean age of patients entering the cohort was similar for GOLD and AURUM. The older average age in AURUM is therefore explained by patients entering the cohort at later dates than in GOLD. In both GOLD and AURUM, 69% of patients were male, similar to 72% of patients in opioid agonist treatment in England in 2018 28 , and 72% of participants in the Unlinked Anonymous Monitoring Survey of People who Inject Drugs 39 . There was a clear association between deprivation and a history of opioid use, with over 40% of patients living in the most deprived quintile of neighbourhoods.
In both databases, approximately three-quarters of patients were current smokers (at the most recent record of smoking) and a further 10% were ex-smokers. Characteristics of patients are shown in Table 1.

Mortality rates and ratios
Mortality rates were similar for patients in the two databases. During a combined 777,384 person-years of follow-up, there were 8,908 deaths (crude mortality rate of 11.5 deaths per 1,000 person-years). Given age- and sex-specific mortality rates in the general population of England, we expected 1,634 deaths, giving an SMR of 5.4 (95% CI 5.3-5.5). Table 2 provides a summary of follow-up time, deaths, and mortality rates, stratified by database, age and sex. We identified two studies that also reported SMRs in populations with a history of illicit opioid use in England. The first identified 198,247 opiate users from national drug treatment and criminal justice databases between 2005 and 2009 8 and used linkage to national mortality records to estimate an SMR of 5.7 (95% CI 5.5-5.9). The second identified 6,683 people entering treatment for heroin dependence in South London between 2006 and 2019 16 , again using linkage to national mortality records to calculate an SMR of 6.6 (95% CI 6.1-7.1). In both studies, as in our cohort, the crude mortality rate was higher for men, while the SMR was higher for women.

Discussion
We developed and validated an electronic healthcare record phenotype that identified approximately 139,000 patients with a history of illicit opioid use registered at primary care practices in England. Patient characteristics (age, sex, smoking history, and deprivation) and mortality rates were comparable to other samples of this population.

Strengths and limitations
To our knowledge, this is the first study to develop a method for identifying people with a history of illicit opioid use within primary care records. Earlier studies have focused specifically on people prescribed opioids 40,41 , general illicit drug use or dependence 42,43 , and people prescribed OAT 28, 44 . The latter is a limited subset of this population, particularly given that OAT in England is not always prescribed by GPs. These studies have included patients prescribed any methadone or buprenorphine product and excluded those with doses suggesting indications other than OAT (such as pain or palliative cough). Yet over 70% of daily doses for these medications are missing from CPRD 45 , and therefore require imputation or exclusion. The method developed in this study avoids the need for imputation by using products that are specific to OAT. It is possible that we excluded some medicines that are used for OAT in addition to other indications, though few patients with prescriptions of excluded methadone or buprenorphine products had other codes indicating a history of illicit opioid use (see extended data 23 ).
For individuals prescribed OAT in primary care, CPRD includes details of these prescriptions. However, in England, many individuals are prescribed OAT in other settings and information about these prescriptions is not available. The main alternative source of national data on people prescribed OAT in England is the National Drug Treatment Monitoring System, which provides data on people in specialist community drug and alcohol treatment services 28 . Although the population using these services is likely to overlap with those in primary care, there may also be important differences. For example, people accessing only drug and alcohol services may have more complex drug treatment needs, whilst those only in primary care may have greater physical comorbidity.
The main strength of CPRD in relation to other research datasets for this population is that it offers unique insights into primary healthcare. It can be linked to external datasets to obtain information on care in hospitals, cancer services, and mental health services, as well as causes of mortality. Algorithms have also been developed to facilitate identification of particular aspects of healthcare such as pregnancy 46 , or health outcomes such as cardiovascular disease 47 .
There are also limitations in the data, particularly because all data are derived from routine healthcare records. For example, in CPRD there is no systematic recording of the type and frequency of drug use, or of the degree of drug dependence. The data on smoking presented in this article suggest that some characteristics of this population are well-captured by GPs, as fewer than 10% had no records and the prevalence of smoking is comparable to that found in other studies. Other characteristics may be less well-captured; for example, fewer patients than expected had records of homelessness or prison.
Depending on the research question, selection biases are likely to be important. To be included in the CPRD sample, individuals need to be registered with a GP, attend an appointment, and disclose their drug use. At present, we do not know what proportion of this population is registered with a GP. In one study of homeless people who inject drugs in London, a subgroup likely to have relatively high barriers to GP registration, 70% provided GP details 48 , suggesting that a large proportion of this population is registered. However, disclosure of drug use is likely to differ. Groups more likely to disclose drug use may include those prescribed OAT (either in primary care or specialist drug and alcohol services), and those who are more unwell and therefore have more GP appointments. This latter factor may lead to an overestimation of differences in morbidity and mortality when comparing people with a history of illicit opioid use to the general population. Consideration of selection biases is important when interpreting analyses using these data and designing sensitivity analyses. Where possible, triangulation with other sources, such as the National Drug Treatment Monitoring System 28 , can improve confidence in findings.

Implications for future research
To date, research into people who use illicit opioids has focused mainly on a narrow range of outcomes such as blood-borne viruses and overdoses. The average age of people with a history of illicit opioid use is increasing, and consequently the importance of chronic health conditions is also increasing 16 . Key areas for future research include the epidemiology of these health issues, assessing the risks and benefits of existing interventions such as OAT in terms of a broader range of health outcomes, and understanding utilisation and quality of general healthcare for this population.

Conclusion
People with a history of illicit opioid use have substantial unmet health needs. Yet to date, large-scale longitudinal studies of healthcare and holistic health outcomes in this population have been limited. We developed and validated a method of identifying people who have a history of illicit opioid use in primary care data to facilitate further research and support improvements in healthcare.

This important methodological study sets out an approach for identifying a cohort of primary care patients in England with a history of illicit opioid use. Consistent with good Open Science principles, codes and algorithms are provided for transparency, to provide detail for future publications based on the cohort identified, and to enable other researchers to apply the same or an adapted approach in their own work with the CPRD, AURUM or related databases. The combined cohort produces an extremely large resource (over 135,000 people), and the longitudinal nature of the data will allow for progression and change to be tracked at the patient level.

Further details
Key external validation work is presented, confirming the cohort profile is consistent with other profiles and with expectations. It would be useful for future validation work to address what proportion of people with a history of illicit opioid use have this noted in their primary care records, and whether biases exist in whether this is (a) reported by patients and (b) recorded by practitioners. For example, recognition may be more likely in those who use primary healthcare more, feel more able to disclose illicit behaviour, and for whom opioid use is having greater health consequences. Practitioners can be reluctant to record drug use, and there is qualitative research that could provide useful context on such biases (e.g. Davies-Kershaw 1 ).
This will be a powerful and pragmatic tool for examining a wider than previously considered range of health outcomes for a cohort of people with a history of illicit opioid use, and their longer term progression.
Minor: Spell out IQR acronym in abstract.

Summary: This paper described a method to establish a cohort of people with a history of opioid use in England using primary care electronic records from 1997 to 2020. Records were searched based on an algorithm of primary care notes on opioid use and OAT. At validation, the cohort was found to be comparable to other cohorts of opioid users in the region by age, gender, tobacco use and mortality. The limitations of identifying opioid users from primary care records included the over-representation of people with more physical conditions, who were more connected to primary care and who were more likely to disclose their drug use.

Overall, the manuscript is well written and presented, and demonstrates good scientific methods and discussion. I have a few comments on the content and some questions for consideration:
○ Abstract: Background: Is there a typo in "people who use illicit opioid use"?
○ Entry and Exit Dates: Over such a long recruitment period, are participants comparable in terms of case ascertainment, i.e. do you expect that identifying opioid use has become better or worse over the period? And have there been any changes in the healthcare system such as access to OAT or uptake of primary care over the recruitment period?
○ Table 1: Considering the length of the study period, do you know why the median follow-up period was relatively short at under 4 years? Do patients frequently transfer out of GP practices, or is this specific to this cohort? Are participants uniquely identifiable across different participating GP practices over the study period?
○ In future, it would be interesting to compare the frequency at which they present to primary and other healthcare services. This may speak to some of the current limitations such as (i) how representative the cohort is of other international opioid-using populations in terms of health service use, and (ii) whether there are subgroups within the cohort with distinct patterns of primary care use (that may have other distinct characteristics such as drug use patterns, homelessness, deprivation, etc).

Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others?