Healthcare use by people who use illicit opioids (HUPIO): development of a cohort based on electronic primary care records in England

Background: People who use illicit opioids such as heroin have substantial health needs, but there are few longitudinal studies of general health and healthcare in this population. Most research to date has focused on a narrow set of outcomes, including overdoses and HIV or hepatitis infections. We developed and validated a cohort using UK primary care electronic health records (Clinical Practice Research Datalink GOLD and AURUM databases) to facilitate research into healthcare use by people who use illicit opioids (HUPIO).
Methods: Participants are patients in England with primary care records indicating a history of illicit opioid use. We identified codes including prescriptions of opioid agonist therapies (methadone and buprenorphine) and clinical observations such as ‘heroin dependence’. We constructed a cohort of patients with at least one of these codes and aged 18-64 at cohort entry, with follow-up between January 1997 and March 2020. We validated the cohort by comparing patient characteristics and mortality rates to other cohorts of people who use illicit opioids, with different recruitment methods.
Results: Up to March 2020, the HUPIO cohort included 138,761 patients with a history of illicit opioid use. Demographic characteristics and all-cause mortality were similar to existing cohorts: 69% were male; the median age at index for patients in CPRD AURUM (the database with more included participants) was 35.3 (interquartile range 29.1-42.6); the average age of new cohort entrants increased over time; 76% had records indicating current tobacco smoking; patients disproportionately lived in deprived neighbourhoods; and all-cause mortality risk was 6.6 (95% CI 6.5-6.7) times that of the general population of England.
Conclusions: Primary care data offer new opportunities to study holistic health outcomes and healthcare of this population. The large sample enables investigation of rare outcomes, whilst the availability of linkage to external datasets allows investigation of hospital use, cancer treatment, and mortality.


Introduction
Opioids are a class of controlled drugs that include illicit substances such as heroin, substitution therapies such as methadone and buprenorphine, and pain medication such as morphine and codeine. While these drugs have both therapeutic and recreational uses, compared with most psychoactive drugs there is a high risk of physical or psychological dependence 1 . The Diagnostic and Statistical Manual of Mental Disorders describes mild, moderate or severe 'opioid use disorders' 2 and the International Classification of Diseases provides criteria for 'harmful patterns of use of opioids' and 'opioid dependence' 3 .
The frequency of illicit opioid use is difficult to estimate, in part because people who use these drugs are poorly represented in traditional epidemiological surveys. One study suggests that 0.8% of people aged 15-64 in England are dependent on illicit opioids 4 , corresponding to approximately 300,000 individuals. There have also been growing concerns about dependence on prescription opioids, although the scale of this problem remains unclear 5, 6 .
The health harms associated with illicit opioids are well-known. Many cohort studies have found high mortality rates 7 . Use of illicit opioids is directly associated with multiple health and social harms such as infections, accidents, homelessness, and imprisonment 1 . Co-occurrence of tobacco smoking, poor nutrition, and poor access to healthcare mean that almost all causes of death are more common among people who use illicit opioids than the general population 8 .
The main treatment for dependence on illicit opioids is opioid agonist therapy (OAT), a pharmacological treatment using long-acting opioids such as methadone or buprenorphine. In England, OAT is provided by specialist drug and alcohol services or GPs, depending on local commissioning arrangements. A large body of evidence demonstrates the effectiveness of OAT across outcomes including mortality, physical and mental health, and criminal activity 9-13 .
Despite extensive evidence regarding the risks of illicit opioids and the benefits of OAT, there are important unanswered research questions. Epidemiological research and health interventions have focused on outcomes perceived to be 'drug-related', such as overdoses and HIV or hepatitis infections. Meanwhile, there is limited research into engagement with primary care services, healthcare quality, and treatment options for non-communicable diseases and mental health problems 14 . These are important areas of research because the population of people who use illicit opioids in England (as in many other countries) is ageing 15 and the majority of excess deaths are now caused by non-communicable diseases such as liver disease, chronic obstructive pulmonary disease, and cardiovascular disease 16 .
To facilitate research in these areas, we aimed to develop and validate a phenotype that identifies people with a history of illicit opioid use in longitudinal UK primary care electronic health records (Clinical Practice Research Datalink GOLD and AURUM databases).

Study design
We developed and validated a cohort of people with a history of illicit opioid use, including cross-sectional analysis of participant characteristics at baseline, and a cohort analysis of all-cause mortality rates.

Data sources
The Clinical Practice Research Datalink (CPRD) GOLD and AURUM are databases of anonymised electronic health records from primary care, including approximately 8% and 13% of the populations in the UK and England respectively 17-19 . Although the databases include similar clinical information, they differ in terms of data collection software, clinical classification system and geographical coverage. CPRD GOLD includes data from GP practices throughout the UK, while CPRD AURUM initially included England only, and more recently practices in Northern Ireland have been added. To maximise comparability we have restricted the cohort to patients registered in England, though the methods can be used to include patients in other parts of the UK.

Entry and exit dates
We selected patients who were registered at participating GP practices between 1 January 1997 and 31 December 2018 for GOLD, and between 1 January 1997 and 31 March 2020 for AURUM. Cohort entry was defined as the latest of 1 January 1997, the first date when good quality data were available for that patient, and the date of the first code indicating illicit opioid use. Cohort exit was the earliest of the date when the patient stopped being observed ('last collection date') or participating in CPRD (the patient transferred out of a participating GP practice), or death as recorded in their primary care record. In addition to these criteria, we excluded patients who were aged under 18 or 65 or older at cohort entry.

Amendments from Version 1
We have revised our article following feedback from reviewers. This includes:
1. Correction of errors identified by reviewers.
2. Discussion of barriers to recording of illicit opioid use in GP practices (which has been the subject of previous qualitative work).
3. An internal validation exercise using hospital admissions related to opioid use.
4. Correction of the mortality rates after discovery that some participants had immortal time. A feature of our CPRD cohort was that all participants should have at least 12 months of follow-up after entry into CPRD. Where participants entered the cohort earlier than 12 months after entry into CPRD (because their first record of opioid use occurred earlier than 12 months after entry into CPRD), the time during this first 12 months was immortal. We, therefore, adjusted the analysis to start follow-up at the latest of 12 months after entry into CPRD or the first record of opioid use. This changed the overall SMR (quoted in the abstract) from 5.4 to 6.6.
Any further responses from the reviewers can be found at the end of the article.
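The entry and exit rules given under 'Entry and exit dates' (entry at the latest of three dates, exit at the earliest of three dates, and an age restriction at entry) could be sketched as follows. This is an illustrative sketch only, not the study code: the function and argument names are hypothetical, an exact date of birth is assumed to be available, and treating patients with no follow-up time as excluded is an assumption rather than a rule stated in the text.

```python
from datetime import date
from typing import Optional, Tuple

STUDY_START = date(1997, 1, 1)  # earliest possible cohort entry stated in the text


def age_on(day: date, birth: date) -> int:
    """Age in completed years on a given day."""
    return day.year - birth.year - ((day.month, day.day) < (birth.month, birth.day))


def cohort_window(
    birth: date,                      # hypothetical: exact birth date assumed known
    data_quality_start: date,         # first date with good quality data
    first_opioid_code: date,          # first code indicating illicit opioid use
    last_collection: date,            # last collection date for the practice
    transfer_out: Optional[date] = None,
    death: Optional[date] = None,
) -> Optional[Tuple[date, date]]:
    """Return (entry, exit) dates, or None if the patient is excluded."""
    entry = max(STUDY_START, data_quality_start, first_opioid_code)
    exit_ = min(d for d in (last_collection, transfer_out, death) if d is not None)
    if not 18 <= age_on(entry, birth) <= 64:
        return None  # aged under 18, or 65 or older, at cohort entry
    if exit_ <= entry:
        return None  # assumption: no follow-up time means exclusion
    return entry, exit_
```

For example, a patient born in June 1970 whose first relevant code was recorded in September 2003 and who transferred out in February 2010 would be followed between those two dates.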

Selection of patients with a history of illicit opioid use
We focused on patients with a history of opioid use (rather than specifically current use) due to the typically long duration of opioid use 20-22 and the likelihood that patients would not have regularly recorded opioid use. We therefore included patients with illicit opioid use recorded prior to the cohort entry date.
CPRD data include two main types of codes: product codes and clinical codes. Product codes indicate a prescription made in a primary care setting, whilst clinical codes indicate a diagnosis or other clinical observation (sometimes also a prescription). We selected patients by identifying product codes indicating a prescription of OAT and clinical codes indicating a history of illicit opioid use, such as 'heroin dependence' (see extended data for a full code list 23 ). We prioritised specificity over sensitivity, aiming to use codes that are only applied to the target population. Our process for selecting codes is summarised in Figure 1.

Product codes.
In the UK, treatment for opioid dependence involves the prescription of methadone or buprenorphine 24 . However, these medications are also licensed for other indications including pain and palliative cough 25,26 . We therefore developed a method to identify medications that are specific to OAT.
We searched CPRD dictionaries to identify all methadone and buprenorphine product codes (full search terms are available as extended data 23 ). We found 175 codes in CPRD GOLD and 136 codes in CPRD AURUM. We compared these lists to an existing list of OAT medicines 27 and found no additional codes. We then followed a two-step process to identify products specific to OAT. First, we described the age and sex distribution of patients at the time of the first prescription. This showed two distinct groups: drugs mainly prescribed to younger men, and drugs mainly prescribed to older women (see extended data 23 ). Data from specialist drug treatment services show that the population receiving OAT is three-quarters male and predominantly aged 18-64 28 . In contrast, the population prescribed opioids for pain relief is mainly older and female 29 . We therefore excluded medications where more than half of patients were female, the lower quartile of age was younger than 18 years, or the upper quartile of age was older than 64, as these codes are unlikely to relate specifically to OAT. The majority of codes excluded were transdermal buprenorphine patches, which are not indicated for OAT 26 . Second, a prescribing professional working in a community drug and alcohol service reviewed the remaining products to check that they are used for OAT.
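The first screening step (excluding products where more than half of patients were female, the lower quartile of age was under 18, or the upper quartile of age was over 64) can be illustrated with a short sketch. The function name and input layout are hypothetical, and the quartile method (`inclusive`) is an assumption, since the text does not say how quartiles were computed.

```python
from statistics import quantiles
from typing import Sequence


def passes_demographic_screen(ages: Sequence[float], sexes: Sequence[str]) -> bool:
    """Apply the three exclusion rules to one product code.

    ages  : age at first prescription, one value per patient
    sexes : 'F' or 'M', one value per patient (same order as ages)
    """
    prop_female = sum(s == "F" for s in sexes) / len(sexes)
    # Quartiles of age at first prescription (quartile method is an assumption).
    q1, _median, q3 = quantiles(ages, n=4, method="inclusive")
    excluded = prop_female > 0.5 or q1 < 18 or q3 > 64
    return not excluded
```

Products passing this screen would still need the second step described above, review by a prescribing professional, which cannot be automated in this way.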
Clinical codes.
CPRD GOLD and AURUM differ in the clinical coding system used. CPRD GOLD uses Read codes whilst AURUM uses SNOMED codes. We used keywords to search CPRD dictionaries to find Read and SNOMED clinical codes that may indicate illicit opioid use (methadone; buprenorphine; abus*; addict; dependen*; drug user; heroin; inject; misus*; opiate; opioid; overdose). Our search identified 1,098 Read codes and 1,800 SNOMED codes. Two authors (DL and PP) screened the codes for relevance, with conflicts resolved through discussion.
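A keyword search of this kind could be sketched as below. The keyword list is taken from the text; everything else is an assumption for illustration, including the dictionary layout (code mapped to description), the example codes, and the choice of case-insensitive substring matching, under which a trailing `*` adds nothing beyond the stem. The text does not specify the matching rules actually used.

```python
import re
from typing import Dict, List

KEYWORDS = [
    "methadone", "buprenorphine", "abus*", "addict", "dependen*", "drug user",
    "heroin", "inject", "misus*", "opiate", "opioid", "overdose",
]


def keyword_pattern(keywords: List[str]) -> "re.Pattern[str]":
    """Build one case-insensitive pattern from the keyword stems."""
    stems = [re.escape(kw.rstrip("*")) for kw in keywords]
    return re.compile("|".join(stems), re.IGNORECASE)


def search_dictionary(dictionary: Dict[str, str], keywords: List[str] = KEYWORDS) -> List[str]:
    """Return codes whose description contains any keyword stem."""
    pattern = keyword_pattern(keywords)
    return [code for code, description in dictionary.items() if pattern.search(description)]
```

A broad search like this also returns irrelevant terms (for example, a description such as 'Insulin dependent diabetes mellitus' matches the stem 'dependen'), which is why manual screening of the results is needed.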
Where codes were likely to indicate illicit opioid use, but did not specifically mention opioids, we classified them as 'probable'. For example, codes indicating injection of illicit drugs were classified as 'probable' because an estimated 94% of people who inject drugs in the UK use heroin 30 . These codes have been excluded from our analyses to maximise specificity, but can be included in future research if greater sensitivity is needed.
Some clinical codes described prescriptions, tests or adverse reactions relating to methadone and buprenorphine. We excluded these where the indication was unclear.
After agreeing a list of codes, we checked the age and sex distribution of patients with these codes in the same way as we did with the product codes. A small number of codes were either recorded mainly for female patients or had an upper quartile of age older than 64. All of these codes represented dependence on medications prescribed for analgesia (for example 'misuse of Codeine tablets'), which we classified as 'probable' and excluded from our analysis.

External validation
We validated the HUPIO cohort by comparing it to other samples of people who use opioids. We anticipated the following characteristics: (a) the average age of patients entering the cohort would increase over time, as the cohort of people who use illicit opioids in England is ageing 15 ; (b) high prevalence of smoking, as a systematic review found that an average of 84% of people enrolled in addiction services currently smoke 31 , and 70% of patients starting treatment for opioid dependence in England in 2018 were recorded as current tobacco smokers 28 . We reported the prevalence of current and ex-smoking based on existing codelists for smoking histories 32 ; (c) disproportionate representation of patients living in more deprived areas, as illicit opioid use and opioid-related deaths are consistently associated with deprivation 33,34 ; (d) higher mortality rates than the general population, as studies of mortality in this population consistently show very high mortality rates 7 . We compared the standardised mortality ratios (SMR) for our cohort to those reported in existing studies of all-cause mortality in this population in England, identified by a brief literature search of PubMed using the terms (opiate OR opioid OR heroin) AND (mortality OR death) without restrictions on language or publication date.
In addition to these characteristics, we reported the proportion of patients with recorded histories of homelessness, prison, and alcohol dependence, based on existing phenotypes 35 and searches of clinical codes. We expected these experiences to be common among people with a history of illicit opioid use 36 . However, we did not know how consistently these experiences would be recorded in primary care, and therefore did not use these variables for validation purposes.

Internal validation
We used hospital admissions where the patient had a diagnosis of 'mental and behavioural disorders due to use of opioids' (ICD10 code F11) to test the sensitivity of the primary care codelist. Among all patients in CPRD with linked Hospital Episode Statistics data, we requested dates of hospital admissions where F11 was recorded in any diagnostic position. We then reported the proportion of these patients who had a primary care code indicating a history of illicit opioid use based on the HUPIO codelist, and whether the primary care code was before the first hospital admission, in the 30 days after the first admission, or more than 30 days after admission. We compared the timing of these events because information may be recorded in primary care databases following receipt of hospital discharge summaries.
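The timing comparison described above can be sketched as a simple classification of each hospitalised patient. The function names and category labels are hypothetical, and counting a primary care code recorded on the day of admission as 'within 30 days after' is an assumption, as the text does not say how same-day records were handled.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Optional, Tuple


def classify_timing(first_admission: date, first_primary_care_code: Optional[date]) -> str:
    """Classify when the first primary care code falls relative to the first F11 admission."""
    if first_primary_care_code is None:
        return "no primary care code"
    days_after = (first_primary_care_code - first_admission).days
    if days_after < 0:
        return "before admission"
    if days_after <= 30:  # assumption: day 0 counts as 'after'
        return "within 30 days after"
    return "more than 30 days after"


def timing_proportions(patients: Iterable[Tuple[date, Optional[date]]]) -> Dict[str, float]:
    """Proportion of hospitalised patients in each timing category."""
    counts = Counter(classify_timing(adm, code) for adm, code in patients)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}
```

Under this sketch, the sensitivity of the primary care codelist is one minus the proportion in the 'no primary care code' category.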

Statistical analysis (estimation of mortality rates and ratios)
We calculated mortality rates and standardised ratios for the subset of patients with linked ONS mortality data (further details about the linkage process are available in CPRD documentation) 18,19 . We requested mortality data for these patients, with a final date of follow-up of 1 May 2019. To minimise bias due to delayed death registration, which is likely to occur disproportionately for people who use illicit opioids due to the involvement of coroners in deaths due to drug poisoning, we stopped follow-up six months before this date (i.e. 30 October 2018). To calculate the standardised mortality ratio, we: (a) calculated the duration of follow-up in the HUPIO cohort, stratified by sex, single-year-of-age, and calendar year. We accounted for ageing by expanding follow-up for each participant into days, and summarising the number of days by sex, single-year-of-age and calendar year; (b) applied mortality rates in the general population of England 37 to these strata to calculate a number of expected deaths; (c) divided the number of observed deaths by expected deaths and calculated 95% confidence intervals using the exact Poisson method.
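Steps (a) to (c) can be illustrated with a short, standard-library sketch. This is not the study code (the analysis was run in R): the function names, the half-open follow-up interval, and the input layouts are assumptions. The exact Poisson interval is obtained here by bisection on the Poisson distribution function, which is one of several equivalent ways to compute it.

```python
import math
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Tuple

Stratum = Tuple[str, int, int]  # (sex, age in years, calendar year)


def person_days(sex: str, birth: date, start: date, end: date) -> Counter:
    """Step (a): expand follow-up into days and count them per stratum.
    Assumption: follow-up covers start (inclusive) to end (exclusive)."""
    counts: Counter = Counter()
    day = start
    while day < end:
        age = day.year - birth.year - ((day.month, day.day) < (birth.month, birth.day))
        counts[(sex, age, day.year)] += 1
        day += timedelta(days=1)
    return counts


def expected_deaths(person_years: Dict[Stratum, float], rates: Dict[Stratum, float]) -> float:
    """Step (b): apply reference rates (deaths per person-year) to each stratum."""
    return math.fsum(py * rates[s] for s, py in person_years.items())


def _poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), summed on the log scale for stability."""
    if k < 0:
        return 0.0
    if mu <= 0:
        return 1.0
    log_mu = math.log(mu)
    return min(1.0, math.fsum(math.exp(i * log_mu - mu - math.lgamma(i + 1)) for i in range(k + 1)))


def _solve_mu(k: int, p: float) -> float:
    """Find mu with P(X <= k | mu) = p by bisection (the CDF falls as mu rises)."""
    lo, hi = 0.0, k + 10 * math.sqrt(k + 1) + 20
    for _ in range(60):
        mid = (lo + hi) / 2
        if _poisson_cdf(k, mid) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def smr_with_ci(observed: int, expected: float, alpha: float = 0.05) -> Tuple[float, float, float]:
    """Step (c): SMR with an exact Poisson confidence interval."""
    lower = 0.0 if observed == 0 else _solve_mu(observed - 1, 1 - alpha / 2)
    upper = _solve_mu(observed, alpha / 2)
    return observed / expected, lower / expected, upper / expected
```

Called with the observed and expected death counts reported in the results, `smr_with_ci(12404, 1872)` gives an SMR of about 6.6 with a 95% interval of about 6.5 to 6.7.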
All data manipulation and analysis was conducted using R version 3.6.2 38 .

Patient and public involvement
People who use illicit opioids were involved in discussions about the need for research into health and healthcare for this population.

Cohort size and characteristics
The median age at baseline was 33.5 in GOLD and 35.3 in AURUM. The distribution of age groups was similar to that of patients entering treatment for opioid dependence in England (see extended data 23 ). We observed a linear increase in the mean age of patients entering our cohort, parallel to, and three years older than, the mean age of patients entering treatment for opioid dependence in England (see extended data 23 ). When stratified by date, the mean age of patients entering the cohort was similar for GOLD and AURUM. The older average age in AURUM is therefore explained by patients entering the cohort at later dates than in GOLD. In both GOLD and AURUM, 69% of patients were male, similar to the 72% of patients in opioid agonist treatment in England in 2018 28 and the 72% of participants in the Unlinked Anonymous Monitoring Survey of People who Inject Drugs 39 . There was a clear association between deprivation and a history of opioid use, with over 40% of patients living in the most deprived quintile of neighbourhoods. In both databases, approximately three-quarters of patients were current smokers (at the most recent record of smoking) and a further 10% were ex-smokers. Characteristics of patients are shown in Table 1.

Mortality rates and ratios
In CPRD AURUM, linkage to ONS mortality records was conducted for 75,807/108,270 patients (70%), and in CPRD GOLD, linkage was conducted for 23,241/30,491 patients (76%). Mortality rates were similar in the two databases. During a combined 910,567 person-years of follow-up, there were 12,404 deaths (crude mortality rate of 13.6 deaths per 1,000 person-years). Given age- and sex-specific mortality rates in the general population of England, we expected 1,872 deaths, giving an SMR of 6.6 (95% CI 6.5-6.7). Table 2 provides a summary of follow-up time, deaths, and mortality rates, stratified by database, age and sex. We identified two studies that also reported SMRs in populations with a history of illicit opioid use in England. The first identified 198,247 opiate users from national drug treatment and criminal justice databases between 2005 and 2009 8 and used linkage to national mortality records to estimate an SMR of 5.7 (95% CI 5.5-5.9). The second identified 6,683 people entering treatment for heroin dependence in South London between 2006 and 2019 16 , again using linkage to national mortality records to calculate an SMR of 6.6 (95% CI 6.1-7.1). In both studies, as in our cohort, the crude mortality rate was higher for men, while the SMR was higher for women.

Internal validation
Among patients hospitalised with a diagnosis of 'mental and behavioural disorders due to use of opioids', 89% of patients in GOLD and 88% of patients in AURUM also had a record in primary care data indicating a history of illicit opioid use. In GOLD and AURUM respectively, 72% and 67% of hospitalised patients had the first relevant primary care code prior to hospitalisation, 2% and 3% had the primary care code in the 30 days after hospitalisation, and 15% and 18% had the code more than 30 days after hospitalisation. A table of this information is provided in extended data.

Discussion
We developed and validated an electronic healthcare record phenotype that identified approximately 139,000 patients with a history of illicit opioid use registered at primary care practices in England. Patient characteristics (age, sex, smoking history, and deprivation) and mortality rates were comparable to other samples of this population.

Strengths and limitations
To our knowledge, this is the first study to develop a method for identifying people with a history of illicit opioid use within primary care records. Earlier studies have focused specifically on people prescribed opioids 40,41 , general illicit drug use or dependence 42,43 , and people prescribed OAT 28,44 . The latter is a limited subset of this population, particularly given that OAT in England is not always prescribed by GPs. These studies have included patients prescribed any methadone or buprenorphine product and excluded those with doses suggesting indications other than OAT (such as pain or palliative cough). Yet over 70% of daily doses for these medications are missing from CPRD 45 , and therefore require imputation or exclusion. The method developed in this study avoids the need for imputation by using products that are specific to OAT. It is possible that we excluded some medicines that are used for OAT in addition to other indications, though few patients with prescriptions of excluded methadone or buprenorphine products had other codes indicating a history of illicit opioid use (see extended data 23 ).
For individuals prescribed OAT in primary care, CPRD includes details of these prescriptions. However, in England, many individuals are prescribed OAT in other settings and information about these prescriptions is not available. The main alternative source of national data on people prescribed OAT in England is the National Drug Treatment Monitoring System, which provides data on people in specialist community drug and alcohol treatment services 28 . Although the population using these services is likely to overlap with those in primary care, there may also be important differences. For example, people accessing only drug and alcohol services may have more complex drug treatment needs, whilst those only in primary care may have greater physical comorbidity.
The main strength of CPRD in relation to other research datasets for this population is that it offers unique insights into primary healthcare. It can be linked to external datasets to obtain information on care in hospitals, cancer services, and mental health services, as well as causes of mortality. Algorithms have also been developed to facilitate identification of particular aspects of healthcare such as pregnancy 46 , or health outcomes such as cardiovascular disease 47 .
There are also limitations in the data, particularly because all data are derived from routine healthcare records. For example, in CPRD there is no systematic recording of the type and frequency of drug use, or of the degree of drug dependence. The data on smoking presented in this article suggest that some characteristics of this population are well-captured by GPs, as fewer than 10% had no records and the prevalence of smoking is comparable to that found in other studies. Other characteristics may be less well-captured; for example, fewer-than-expected patients had records of homelessness or prison.
Depending on the research question, selection biases are likely to be important. To be included in the CPRD sample, individuals need to be registered with a GP, attend an appointment, and disclose their drug use. At present, we do not know what proportion of this population is registered with a GP. In one study of homeless people who inject drugs in London (a subgroup likely to have relatively high barriers to GP registration), 70% provided GP details 48 , suggesting that a large proportion of this population is registered. However, disclosure of drug use is likely to differ. Groups more likely to disclose drug use may include those prescribed OAT (either in primary care or specialist drug and alcohol services), and those who are more unwell and therefore have more GP appointments. This latter factor may lead to an overestimation of differences in morbidity and mortality when comparing people with a history of illicit opioid use to the general population. Qualitative research has found both practice-level and individual-level barriers to disclosing and recording illicit drug use 49 . In particular, patients and GPs who feel more stigma towards illicit drug use may be less likely to discuss the issue. Our internal validation suggested that approximately 90% of patients with opioid use recorded in hospitals also have illicit opioid use recorded in primary care, and the timings of these records suggest they are independent. This supports good sensitivity of the primary care codelist for patients who are registered at a GP practice. It does not provide evidence of sensitivity at a population level, because some individuals are not registered at a GP practice. Consideration of selection biases is important when interpreting analyses using these data and designing sensitivity analyses. Where possible, triangulation with other sources, such as the National Drug Treatment Monitoring System 28 , can improve confidence in findings.
The data presented here cover a long period of time, during which both the population and health services changed. For example, investment in opioid agonist therapy has changed, while the average age of the population and the prevalence of long-term conditions have increased. This means that selection biases are likely to change over follow-up. In some cases, it may help to restrict analysis to a shorter time-period, or stratify by time-period.

Implications for future research
To date, research into people who use illicit opioids has focused mainly on a narrow range of outcomes such as blood borne viruses and overdoses. The average age of people with a history of illicit opioid use is increasing, and consequently the importance of chronic health conditions is also increasing 16 . Key areas for future research include the epidemiology of these health issues, assessing the risks and benefits of existing interventions such as OAT in terms of a broader range of health outcomes, and understanding utilisation and quality of general healthcare for this population.

Conclusion
People with a history of illicit opioid use have substantial unmet health needs. Yet to date, large-scale longitudinal studies of healthcare and holistic health outcomes in this population have been limited. We developed and validated a method of identifying people who have a history of illicit opioid use in primary care data to facilitate further research and support improvements in healthcare.

Data availability
Underlying data
Researchers can study the HUPIO cohort by applying to the CPRD Independent Scientific Advisory Committee (ISAC). Approval is required if access to anonymised patient level data is being requested for research purposes. Details of the application process and conditions of access are provided by CPRD at https://www.cprd.com/Data-access.

This important methodological study sets out an approach for identifying a cohort of primary care patients in England with a history of illicit opioid use. Consistent with good Open Science principles, codes and algorithms are provided for transparency, to provide detail for future publications based on the cohort identified, and to enable other researchers to apply the same or an adapted approach in their own work with the CPRD, AURUM or related databases. The combined cohort produces an extremely large resource (over 135,000 people), and the longitudinal nature of the data will allow for progression and change to be tracked at the patient level.
Key external validation work is presented, confirming the cohort profile is consistent with other profiles and with expectations. It would be useful for future validation work to address what proportion of people with a history of illicit opioid use have this noted in their primary care records, and whether biases exist in whether this is (a) reported by patients and (b) recorded by practitioners. For example, recognition may be more likely in those who use primary healthcare more, feel more able to disclose illicit behaviour, and for whom opioid use is having greater health consequences. Practitioners can be reluctant to record drug use, and there is qualitative research that could provide useful context on such biases (e.g. Davies-Kershaw 1 ).
This will be a powerful and pragmatic tool for examining a wider than previously considered range of health outcomes for a cohort of people with a history of illicit opioid use, and their longer term progression.
Minor: Spell out IQR acronym in abstract.

Response: The first version of this article used death data in primary care records. Since then, we have received linked mortality data from the Office for National Statistics for the HUPIO cohort. The mortality rates and ratios in the revised paper use this data. While there are undoubtedly errors in the linkage process, it is generally considered to be high quality and miss few deaths. The linkage uses NHS numbers, a unique identifier that is assigned to everyone registered at a GP practice, in combination with other identifiers. Further information about the linkage methodology is available here: Padmanabhan S, Carty L,
Comment: In Table 1, considering the length of the study period, do you know why the median follow up period was relatively short at under 4 years? Do patients frequently transfer out of GP practices, or is this specific to this cohort?
Response: There are two reasons for this. First, GP practices are continually joining and exiting CPRD. The pool of practices participating in CPRD GOLD is shrinking, and when a practice leaves the database, follow-up for its patients ends. Likewise, the pool of practices for CPRD AURUM is growing and many practices joined later than 1997. Second, this is a mobile population and patients do frequently transfer out of GP practices. We have added a note to the table to clarify that these values are the follow-up duration for primary care data only, and longer follow-up is available where linked data are used (such as Hospital Episode Statistics or Office for National Statistics mortality records). In later analyses we will use matched cohorts of patients who do not have a history of illicit opioid use, for comparison. In these cohorts, follow-up is typically a bit longer because the general population is less mobile than people who use illicit opioids.
Comment: Are participants uniquely identifiable across different participating GP practices over the study period? In future, it would be interesting to compare the frequency at which they present to primary and other healthcare services. This may speak to some of the current limitations, such as (i) how representative the cohort is of other international opioid-using populations in terms of health service use, and (ii) whether there are subgroups within the cohort with distinct patterns of primary care use (that may have other distinct characteristics such as drug use patterns, homelessness, deprivation, etc).
Response: No, unfortunately participants are not uniquely identifiable across different participating GP practices. It is possible that some patients move from one participating practice to another, and therefore appear in the data twice. This is thought to be sufficiently rare in CPRD that each "patient" (in fact a patient-registration episode) can reasonably be considered unique. We agree that it will be interesting to analyse healthcare frequency. CPRD provides linkage to hospital data, and this will allow comparison with existing studies of hospital admissions in patients recruited via specialist OAT clinics.