The potential for linking cohort participants to official criminal records: a pilot study using the Avon Longitudinal Study of Parents and Children (ALSPAC)

Introduction: Linking longitudinal cohort resources with police-recorded records of criminal activity has the potential to inform public health style approaches to policing, and may reduce potential sources of bias from self-reported criminal data collected by cohort studies. A pilot linkage of police records to the Avon Longitudinal Study of Parents and Children (ALSPAC) allows us to consider the acceptability of this linkage, its utility as a data resource, differences in self-reported crime according to consent status for data linkage, and the appropriate governance mechanism to support such a linkage. Methods: We carried out a pilot study linking data from the ALSPAC birth cohort to Ministry of Justice (MoJ) records on criminal cautions and convictions. This pilot was conducted on a fully anonymous basis, meaning we cannot link the identified records to any participant or the wider information within the dataset. Using ALSPAC data, we used summary statistics to investigate differences in socio-economic background and self-reported criminal activity by consent status for crime linkage. We used MoJ records to identify the geographic and temporal concentration of criminality in the ALSPAC cohort. Results: We found that the linkage appears acceptable to participants (4% of the sample opted out), levels of criminal caution and conviction are high enough to support research, and that the majority of crimes occurred in Avon & Somerset (the policing area local to ALSPAC). Those who did not respond to consent requests had higher levels of self-reported criminal behaviour compared to participants who provided explicit consent. Conclusions: These findings suggest that data linkage in ALSPAC provides opportunities to study criminal behaviour and that linked individual-level records could provide robust research in the area. 
Our findings also suggest the potential for bias when only including participants who have explicitly consented to data linkage, highlighting the limitations of opt-in consent strategies.


Introduction
Policing in the UK increasingly seeks to take a public health approach to tackling crime, where the focus is on proactive prevention, the tackling of upstream risk factors, and on populations rather than individuals 1 . This approach is multidisciplinary, takes a joint agency approach, and relies on 'the skilled use and interpretation of data and the evidence base to ensure that interventions are designed, delivered and tailored to be as effective as possible' 1,2 . This can now be seen in operation within some UK police forces, for example within Thames Valley Police 3 . Epidemiological analysis is an important approach for identifying risk and protective factors for criminal and antisocial behaviours. Police records of criminality (e.g. convictions and cautions) do not contain data relating to an individual's exposure to potential risk factors for perpetrating crime, whereas longitudinal birth cohort studies have a wealth of data on the lives of their participants, and often their families, peers, and wider contexts, across the life course. Therefore, linking police data with cohort studies has the potential to add considerable value to research on criminal behaviour.
Accurate measures of participants' criminal behaviours are necessary for any such research to be valid. Some cohorts contain their own measures of criminality; these may be self-reported by the participants or reported by related individuals (e.g. teachers or parents). While this is a relatively straightforward way of measuring such behaviours, and has the advantage of capturing crimes irrespective of whether they appear on any official records, measurement error may be introduced through recall error (not being able to accurately recall past behaviours), or social desirability bias (choosing not to disclose certain behaviours). Further, there is a potential for measurement error based on questionnaire design (e.g. study wording or response options) and valuable data may not be recorded (e.g. details of criminal behaviour). Finally, a known limitation of cohort studies is that attrition is associated with socio-economic, demographic and health status which, in turn, may be associated with criminal behaviour. By relying on self-report measures of criminality, it is likely that cohort studies underestimate rates of criminality compared to the wider population 4 .
Record linkage of cohort data to official police records has the potential to address some of the limitations of self-reported data. As official records are not affected by recall error or social desirability bias, they can potentially provide greater detail and accuracy than would be feasible via self-report. Furthermore, attrition can be addressed using record linkage as criminality outcomes can be followed in individuals even if they miss opportunities to participate in study data collections. However, not all crimes come to the attention of the police or result in a formal record and so to rely solely on police records would under-estimate the prevalence of criminality in a cohort 5 . There is evidence to suggest that violence between people who know each other, less serious violence, and violence that involves alcohol are less likely to be reported to the police, and males are less likely to report violent victimisation than females 5 . Conversely, violence that involves injury or weapons, and violence perpetrated by a stranger are more likely to come to the police's attention. Finally, there is some evidence that offences against residents of the most deprived neighbourhoods are less likely to be reported to the police than offences against residents of less deprived areas 6 . The impact of this on accurate estimates would be enhanced where the factors (e.g. ethnicity) associated with policing practice were also predictive of failure to participate in study follow-up. In police records, data quality issues within the records may also lead to error (e.g. failure to link resulting from poor or inaccurate personal identifiers) and this may disproportionately impact some population groups. Linking cohort and official records also enables research questions to be addressed that would not otherwise be possible, e.g. investigating whether self-reported and official records measure different constructs, analysing discrepancies between these data sources, and comparing outcomes of those who self-report criminality vs. those with officially recorded criminality (and thus, those who have come to the attention of the justice system) 7,8 .

Amendments from Version 1
The reviewers' comments were addressed. This includes changes throughout the manuscript to refocus it specifically on the background, aims, results and limitations of the pilot study.
In the previous version of the paper, the aims and limitations of a wider full linkage were mixed in with the details of this pilot, meaning the reasoning behind the pilot was not as clear as it should have been to the reader. One of the research questions has been removed (on age periods) as on reflection this is too minor to warrant it being listed as a main question.
The Methods section has been improved: the sample description is now clearer; details of the antisocial and criminal behaviour measures have been added; and more detail added on the statistical analyses. The results section has been revised so that it aligns with the four research questions, and more detail added.
We have added a flow chart (Figure 1) to describe the study sample. Figure 2 (which was previously Figure 1) has been re-done so that the spacing between each of the bars in the histogram is equal (the data are identical to the original version).
The tables have been revised and reordered, to reflect the restructuring of the results section. Table 1 is now the summary of number of offences by police force (previously Table 2). Table 2 (previously Table 1) and Table 3 and Table 4 (Table 5, Table 6, and Table 7 -which are equivalent to Table 2, Table 3 and  Table 4).

In sum, a combination of official police records with self-reported criminal behaviours could allow research that uses the strengths of both sources of crime data while addressing some of their respective limitations. However, achieving linkage of a longitudinal cohort to any routine health or administrative data can be a complex and time-consuming process. In Scotland, the Edinburgh Study of Youth Transitions and Crime has successfully linked police records to a longitudinal population-based cohort 7,8 . That cohort study has a criminality focus, its team works closely with Scottish criminal justice policy makers and practitioners, and police record linkage was integral to the study's design from the start 7,8 . The legal, ethical and practical example set through their successful linkage is therefore not a precedent that other UK cohorts with a more general purpose can necessarily follow. Also, Scotland and England have differing legal systems and police records. However, linkages of police records to general cohorts have been achieved in other countries, such as the NSW-CDS cohort study in Australia 9 , and the Swedish National Cohort Study 10 .
As with all data linkage projects in longitudinal studies, there are specific considerations relating to data protection and confidentiality, and wider considerations relating to participant trust and the acceptability of novel forms of data use. In the UK, criminal records were deemed 'sensitive' data in the Data Protection Act 1998 and are now considered 'special category' data in the EU General Data Protection Regulations (GDPR) and the UK's Data Protection Act 2018 (DPA). Both categories are subject to elevated levels of protection. The DPA 1998 allowed for the use of criminal records where studies gained explicit consent from study participants or where the data were anonymised (and therefore no longer relatable to an individual, thus no longer being subject to data protection and confidentiality law). In contrast, the new DPA 2018 provides a separate legal basis for using identifiable 'special category' records for scientific research which is in the public interest, subject to utilising sufficient safeguards (GDPR Article 89). Nevertheless, these routes available to meet DPA 2018 requirements do not alter the requirement for research use of individual data to meet the Common Law Duty of Confidentiality, which can be met through consent, anonymisation or meeting a public interest test. However, data linkage based on consent may systematically omit some individuals and population sub-groups and introduce bias into study findings. Therefore, alternative mechanisms to use data for individuals who have not necessarily provided consent are needed to minimise the risk of selection bias. 
Further to addressing the legal basis for record linkages, it is also necessary to examine the acceptability of data linkage to crime records for cohort participants and -in order to justify the intrusion to privacy of non-consented approaches -to determine whether the group of participants who do consent to data linkage are, in terms of criminal behaviour, representative of the wider cohort (in which case consent could be a practical basis for this data use).
It is also necessary to consider if any linkage is proportionate: to be ethical, it has to be useful. In the case of linking cohorts to police records, it is currently unclear whether the levels of criminality are sufficient for a longitudinal population study to be a viable resource for future research projects. Furthermore, gaining a better understanding of the ages at which crimes are committed, and the areas in which they occur, can help to identify key age periods and geographical locations where data linkage may be the most valuable for research. This paper describes a pilot linkage project of participants from the Avon Longitudinal Study of Parents and Children (ALSPAC) to criminal conviction and official caution records in the UK Police National Computer (PNC) database held by the Ministry of Justice (MoJ). To our knowledge, this pilot project is the first to link criminal records to an English general population longitudinal cohort. The overall aim of this pilot was to test the feasibility of linking ALSPAC to official criminality records, and to determine if full linkage is likely to be worthwhile in terms of creating a useful resource for future research. Our specific research questions were: (1) What can participant responses to the study's proposed linkage to criminality records suggest about the level of acceptability of this to ALSPAC participants? (2) Are there sufficient levels of recorded criminal caution or conviction for the data resource to be useful in future research? (3) In what geographical area are crimes most commonly committed by ALSPAC participants? (4) Are those we have consent to link to crime data representative of the wider cohort in terms of their self-reported criminal behaviours?
The linkage in our pilot was restricted to an anonymous data extract of historic criminal convictions and cautions of ALSPAC study participants. No identifiers are present in the file meaning it cannot be linked to any participant records held within the ALSPAC databank.

Avon Longitudinal Study of Parents and Children
ALSPAC is a birth cohort study that recruited pregnant women who were resident in and around the city of Bristol, with a due date between April 1991 and December 1992. Full details are available in the cohort profiles 11,12 and a searchable data dictionary can be accessed from the study's website (http://www.bristol.ac.uk/alspac/researchers/access/). In brief, there were 14,541 pregnancies resulting in 13,988 children alive at one year of age (known as the 'core sample'). By age 18 years, an additional 718 children, who were eligible under the original study eligibility definition, but whose mothers had not joined the study during pregnancy, had also been recruited. The mothers, their partners, and the study children have been followed ever since through questionnaires and clinic visits.
The Project to Enhance ALSPAC through Record Linkage (PEARL)
When the ALSPAC children reached legal adulthood (age 18 years), there was a postal campaign that aimed to re-enrol them into the study and to seek permission for linkage to their routine health and administrative records, including education, employment, earnings and benefits, and criminal conviction and caution records (hereafter, 'criminality records'). This was part of the Wellcome Trust funded 'Project to Enhance ALSPAC through Record Linkage' (PEARL). Each participant was sent a pack that included an information booklet and consent form, which provided a clear means to opt out of ALSPAC, or of any of the proposed linkages. Due to factors related to establishing an appropriate ethico-legal basis for record linkage in ALSPAC and the negotiation of access to linked health records (i.e. unrelated to this crime data linkage), the participant information materials were initially issued in two batches. Batch one sought opt-in consent, which stated that linkage would only occur with explicit participant approval, while batch two was structured as an opt-out approach and notified participants that their routine records would be linked to ALSPAC unless they specifically opted out (i.e. linkage would occur in the event of non-response). Participants who did not respond to batch 1 were sent a new opt-out pack. Following participant consultation, the opt-in/out materials were structured as a series of specific linkage permission options to allow for individual level decision making. This led to participants returning forms that in effect indicated consent for some linkage categories even when the overall campaign was structured as an opt-out (e.g. an individual may have objected to the study's use of their employment, earnings and benefits records while consenting to the study's use of other records).
The following participants were excluded from the pilot criminality linkage: participants who no longer wished to be part of ALSPAC; those who objected to linkage to their criminality records; those where we had evidence the participant had not received their information pack (e.g. it was returned by the postal service as 'addressee unknown'); and those who lacked capacity to consent. Due to the inclusion of a randomised controlled trial of linkage information materials 13 and other study factors, the participants in batch 1 and batch 2 were not selected at random and are likely to over-represent participants with good histories of study participation.
Following the ALSPAC-MoJ pilot linkage, the study continued to issue opt-out linkage materials to all participants via postal campaigns and online promotion. Where practicable, consent was sought where participants attended a study clinic visit. This means there is an increasing number of participants who have opted in to record linkage over time.

Linkage of ALSPAC to Police National Computer (PNC) data
The Police National Computer (PNC) is a large administrative database that was started in 1974 and contains information about police cautions and court convictions held on individual offenders in England and Wales 14,15 . Following negotiations between ALSPAC and the MoJ, it was agreed to conduct a pilot linkage exercise which would test the feasibility of the linkage mechanism through the production of an anonymous linked extract. For individuals for whom ALSPAC had permission to link to criminality records (those who opted in to crime linkage from batch 1 or batch 2, and non-responders to batch 2, except excluded cases), the following identifiers were sent to the MoJ: forename, surname, date of birth, current address, last four known addresses.
No attribute data about the participants was provided. This linkage was done in March 2013.
The MoJ conducted the linkage to the PNC using a deterministic linkage protocol with manual review (see 'Linkage Protocol' section below). Once linked, the MoJ provided an anonymised data extract detailing all historic criminal convictions and cautions that were linked to study participants. Direct individual identifiers were removed and replaced with two pseudonymised identifiers: 1) 'lcr_id', which uniquely identified individuals in the data set; and, 2) 'lcr_caseid', which identified unique cases and the criminal acts associated with it, which were nested within each individual's overall record (i.e. each individual with a link would have one or more criminality event records associated with at least one 'case'). ALSPAC has no means to reverse these pseudonyms to the participants' personal identifiers. The extract was securely sent to the ALSPAC data linkage team for analysis within their PEARL Data Safe Haven (at the University of Bristol).
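The nesting of events within cases within individuals can be illustrated with a minimal sketch. Only the field names 'lcr_id' and 'lcr_caseid' come from the extract described above; the example values and the 'disposal' field are hypothetical and are used purely for illustration.

```python
from collections import defaultdict

# Hypothetical rows from an anonymised extract: one row per criminality event.
# Only the names lcr_id and lcr_caseid are taken from the text; all values
# and the 'disposal' field are invented for this illustration.
rows = [
    {"lcr_id": "P001", "lcr_caseid": "C01", "disposal": "caution"},
    {"lcr_id": "P001", "lcr_caseid": "C02", "disposal": "conviction"},
    {"lcr_id": "P001", "lcr_caseid": "C02", "disposal": "conviction"},
    {"lcr_id": "P002", "lcr_caseid": "C03", "disposal": "conviction"},
]


def summarise(event_rows):
    """Group event rows by individual (lcr_id), then by case (lcr_caseid)."""
    nested = defaultdict(lambda: defaultdict(list))
    for row in event_rows:
        nested[row["lcr_id"]][row["lcr_caseid"]].append(row["disposal"])
    return {pid: dict(cases) for pid, cases in nested.items()}


summary = summarise(rows)
n_individuals = len(summary)
n_cases = sum(len(cases) for cases in summary.values())
n_events = sum(len(ev) for cases in summary.values() for ev in cases.values())
```

In this toy extract there are two individuals, three cases and four events; every individual has at least one case, and every case has at least one event.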

Linkage protocol
The linkage was conducted by MoJ staff. In summary, they received a file of identifiers from ALSPAC and then processed (cleaned) these. They then searched the Home Office Police National Computer (HOPNC) live database. Where matches were found, the individual's PNC ID was extracted and subsequently used to extract criminality outcomes.
The automated HOPNC database search process returns a set of results, indicating varying levels of matching success according to a set of deterministic match rules. Matches are graded from 01 to 24, and in general, the higher the number, the more suspect the match. The process accommodates the tendency for criminal convictions to be assigned to alias identities rather than true identities. Each match level may be sub-divided into A or B levels, where B also uses data contained in Alias and AliasDateOfBirth tables. 'Suspect' matches are manually matched against the HOPNC live database by MoJ staff in order to obtain either an accurate PNCID or a status of no match.
ALSPAC was not provided with information on match strength or as to whether suspect matches were manually reconciled, dropped or retained. This was due to the primary aim of the project being to demonstrate the feasibility of subsequent research and to test the workflow process (i.e. the aims did not require the full linkage protocol to be implemented).
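As an illustration only, graded deterministic matching of the kind described above might be sketched as follows. The actual MoJ match rules were not available to us, so the three rules, the cleaning step, and all function and field names below are hypothetical; only the general scheme (numbered grades where higher numbers are weaker, with sub-level B drawing on alias identities) follows the description in the text.

```python
from datetime import date


def clean(name):
    """Standardise a name: keep letters only, upper-cased (an assumed rule)."""
    return "".join(ch for ch in name.upper() if ch.isalpha())


def standardise(rec):
    return {
        "forename": clean(rec["forename"]),
        "surname": clean(rec["surname"]),
        "dob": rec["dob"],
    }


# Hypothetical rules ordered from strongest (01) to weakest (03).
RULES = [
    (1, lambda a, b: (a["surname"], a["forename"], a["dob"])
        == (b["surname"], b["forename"], b["dob"])),
    (2, lambda a, b: a["surname"] == b["surname"] and a["dob"] == b["dob"]
        and a["forename"][:1] == b["forename"][:1]),
    (3, lambda a, b: a["surname"] == b["surname"] and a["dob"] == b["dob"]),
]


def grade_match(person, record, aliases=()):
    """Return a code such as '01A' or '02B', or None when no rule is met.

    'A' means the rule was met on the record's primary identity;
    'B' means it was met only via one of the record's alias identities.
    """
    p = standardise(person)
    primary = standardise(record)
    alias_recs = [standardise(a) for a in aliases]
    for grade, rule in RULES:
        if rule(p, primary):
            return f"{grade:02d}A"
        if any(rule(p, a) for a in alias_recs):
            return f"{grade:02d}B"
    return None


person = {"forename": "Anne-Marie", "surname": "O'Neil", "dob": date(1992, 1, 5)}
exact = {"forename": "ANNEMARIE", "surname": "ONEIL", "dob": date(1992, 1, 5)}
code = grade_match(person, exact)
```

Under such a scheme, weaker grades would be the natural candidates for manual review, although the threshold used in practice is not known to us.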

Cleaning & standardisation
The cleaning process aimed to standardise identifiers prior to matching:
When the study children reached legal adulthood (age 18), ALSPAC initiated a postal fair processing campaign to formally re-enrol the children into the study (prior to this parent-based consent was mandatory, although from age 9 children assented to data collection as well) and to simultaneously establish permissions for ALSPAC to link to their health and administrative records. All participants have been offered the right to opt out (which is respected). This approach was developed with participant involvement.

Measures
Data were cleaned, managed and analysed using Stata version 15 16 .

ALSPAC data.
A variable was derived to summarise criminality linkage consent status at the time of the pilot linkage: opted in to criminality linkage; non-responder to batch 2; not sent to MoJ for criminality linkage (this includes those who opted out of ALSPAC or of criminality linkage, those who were non-responders to batch 1, and those who never received a PEARL pack). Current (September 2019) criminality linkage consent status was also summarised in a similar way.
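The derivation of the three-category consent status can be sketched as a simple rule-based function. This is a minimal illustration, not the code used in the study (which was written in Stata): the function name, argument names and category labels are hypothetical, and other exclusions (e.g. lack of capacity to consent) are omitted for brevity.

```python
def consent_status(withdrawn, opted_out_crime, opted_in_crime, batch, pack_received):
    """Classify a participant's criminality linkage consent status at the pilot.

    All argument names are hypothetical. 'batch' is the most recent PEARL
    batch sent to the participant (1, 2, or None if no pack was sent).
    """
    # Withdrawal, objection, or an undelivered pack always blocks linkage.
    if withdrawn or opted_out_crime or not pack_received:
        return "not sent"
    if opted_in_crime:
        return "opted in"
    # Non-response only permits linkage under the opt-out (batch 2) materials.
    if batch == 2:
        return "batch 2 non-responder"
    return "not sent"  # batch 1 non-responders, or never sent a pack


example = consent_status(
    withdrawn=False, opted_out_crime=False, opted_in_crime=False,
    batch=2, pack_received=True,
)
```

The ordering of the checks matters: an explicit objection takes precedence over any opt-in, and non-response is treated as permission only where the participant was sent the opt-out materials.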
Measures related to family socio-economic position (SEP) were reported by the mother during her pregnancy with the study child: family occupational social class, defined as the higher of maternal and paternal social class and categorised as high (I-IIIN, professional, managerial, and non-manual skilled occupations) and low (IIIM-IV, manual skilled, semi-skilled and unskilled occupations); highest maternal education (university degree, A level, O level, vocational/none); housing tenure (owned/mortgaged, privately rented, council rented, other); and financial difficulties (quartiles of score with range 0-15, where the upper quartile (6+) is considered high). Child variables included sex and ethnicity (reported by the mother: White, non-White [no further disaggregation was possible due to small numbers]).
Antisocial and criminal behaviours were reported by the children at ages 14, 15.5, 17.5 and 18 years. A series of binary variables were derived based on whether they reported doing each of the behaviours in the previous 12 months (no, yes): theft (includes stealing cars, from a person, from a shop etc.); hit, kicked or punched someone on purpose; carried a knife or weapon for protection or use during a fight; deliberately damaged or destroyed property belonging to someone else; deliberately set fire to property or building (or attempted to); rowdy or rude in a public place. At 17.5 years, they also reported a series of measures related to having had involvement with the police and criminal justice system in the past year. A series of binary variables were derived (no/yes): in trouble with police; picked up by police and taken home; picked up by police and taken to station; told off/told to move on by police officer; stopped and told to empty pockets or bag; received official police caution; charged for committing a crime; been on trial in court. Due to the small numbers reporting these outcomes, two further aggregate variables were derived: received any 'punishment' (answered yes to receiving fine, in a young offenders' institution, having an Antisocial Behaviour Order (ASBO), or having mediation as an offender); and any criminal justice involvement (answered yes to having had a caution, conviction, being on trial, or receiving a 'punishment').
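The two aggregate variables can be illustrated with a short sketch. The item names below are hypothetical labels for the questionnaire items listed above, and the handling of missing responses (an aggregate is missing unless at least one item is 'yes' or all items are 'no') is an assumption made for this example rather than a description of the derivation used in the study.

```python
def any_yes(responses):
    """Aggregate binary items coded 1 (yes), 0 (no) or None (missing).

    Assumed rule: 1 if any item is yes; 0 only if every item is no;
    otherwise None (missing).
    """
    responses = list(responses)
    if any(r == 1 for r in responses):
        return 1
    if responses and all(r == 0 for r in responses):
        return 0
    return None


def derive_aggregates(items):
    """Derive 'any punishment' and 'any criminal justice involvement'."""
    punishment = any_yes(
        items.get(k) for k in ("fine", "young_offenders", "asbo", "mediation")
    )
    involvement = any_yes(
        [items.get("caution"), items.get("conviction"), items.get("trial"), punishment]
    )
    return {"any_punishment": punishment, "any_cj_involvement": involvement}


# Hypothetical participant: an ASBO but no caution, conviction or trial.
result = derive_aggregates({
    "fine": 0, "young_offenders": 0, "asbo": 1, "mediation": 0,
    "caution": 0, "conviction": 0, "trial": 0,
})
```

Because 'punishment' feeds into the involvement measure, a participant who reports only an ASBO is still counted as having had criminal justice involvement.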

Statistical analyses.
We used descriptive statistics to summarise the number of convictions and cautions, the year the offences were committed (as a proxy for age of the participants), and where they were committed (which policing area). We then used ALSPAC questionnaire data to compare: (1) participants whose identifiers were sent to the MoJ to those whose identifiers were not sent; and (2) within the sent for linkage group, the participants who explicitly opted in to linkage to those in batch 2 who did not opt out, in terms of child sex and ethnicity, early life family SEP, and child-reported anti-social and criminal behaviours. Finally, we repeated these comparisons by current criminality linkage consent status. For these descriptive analyses, we excluded triplets and quadruplets (as their ALSPAC data are not released to researchers) and those who have withdrawn consent from ALSPAC participation, giving a sample size of 14,683. Note that due to missing data in the ALSPAC measures, the denominator for each individual comparison varies.

Results
Acceptability of linkage to criminality records
At the time of the pilot linkage (March 2013), batch 1 (sent in 2011 to 7,790 participants) sought opt-in consent, while batch 2 (n=5,379, which included 4,708 non-responders from batch 1) gave participants the option to opt out of the linkage, which would proceed in the event of non-response (given the granular nature of the opt-out form, some respondents to batch 2 opted in to criminality linkage while opting out of other linkage data sources). This resulted in permission to link to the criminality records of 7,361 participants (comprising 2,966 who opted in to crime linkage, and 4,395 who were non-responders to batch 2) (Figure 1). Note that these figures represent a moment in time. Since the pilot linkage, there has been an increase in the number of participants for whom we have permission to link to criminality records. As of the present day (September 2019), 13,239 participants have been sent an opt-out PEARL pack, or have been asked in person for their explicit consent at a point where this was practicable (e.g. when attending an ALSPAC clinic). With regard to criminality linkage, of these participants: 5,062 (38%) have opted in, 7,619 (58%) have not responded, 477 (4%) have opted out of criminality linkage, and 81 (<1%) have withdrawn from the ALSPAC study overall.
The criminality opt-out rate is only slightly higher than that observed for education and health records (both 3%), and lower than that for earnings and benefits records (6%). Of the 477 who have opted-out of criminality linkage, 52% (n=247) have opted-out of all linkages, while the rest have consented to linkage to at least one other linkage data source. Note that the current day numbers include 177 participants (174 opt-in and 3 non-response) who enrolled with ALSPAC after the pilot linkage: these participants are not included in the analysis by current day consent status below as they do not have questionnaire data at earlier timepoints.

Levels of police-recorded criminality in the ALSPAC cohort
Of those whose identifiers were sent to the MoJ for linkage (n=7,361), 885 (12%) were successfully linked to a criminality record. These participants had a conviction, caution, reprimand or warning for 4,000 separate offences, comprising 2,635 criminal convictions and 1,365 official cautions, warnings or reprimands. Of those linked, 394 (44.5%) had received at least one conviction and 84 (9.5%) had received 10 or more convictions.
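The proportions reported above follow directly from the counts, as the short check below shows. The variable names and the helper function are ours and are used only for illustration; all counts are taken from the text.

```python
# Counts reported in the text.
sent_for_linkage = 7361
linked = 885
convictions = 2635
cautions_warnings_reprimands = 1365
with_conviction = 394
with_ten_or_more = 84


def pct(numerator, denominator, digits=1):
    """Percentage rounded to the given number of decimal places."""
    return round(100 * numerator / denominator, digits)


total_offences = convictions + cautions_warnings_reprimands
linked_pct = pct(linked, sent_for_linkage)          # share of those sent who were linked
conviction_pct = pct(with_conviction, linked)       # share of linked with >= 1 conviction
ten_plus_pct = pct(with_ten_or_more, linked)        # share of linked with >= 10 convictions
```

Note that the denominator changes between the first figure (all participants sent for linkage) and the latter two (participants successfully linked to a record).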
The offence class with the greatest number of offences was summary offences excluding motoring, followed by theft and handling of stolen goods, breach offences, drug offences, and violence against the person. Almost a third (31.6%) of offences related to serious crimes (defined as class 1-5).
Where and when crimes were committed
The majority of the offences (n=3,454, 86%) were committed in the area covered by the Avon and Somerset constabulary (Table 1). Neighbouring areas and London generally had higher numbers than areas further from the study catchment area. The earliest linked records were recorded in March 2002 (when participants would have been aged between 11 and 12 years). Of the years covered (up to 2013), offences were carried out most commonly in 2009 (n=629, 16%; Figure 2), when participants were approximately 18 years old.

Representativeness of sample included in data linkage
Of the 14,683 participants included in our analyses, 7,357 (50.1%) had their identifiers sent to the MoJ for linkage (2,963 of these had opted in and 4,394 were batch 2 non-responders) and 7,326 did not have their identifiers sent (this group includes those who opted out of linkage to criminality records, and those who had not received a consent pack, either because they had not been included in the consent campaign by March 2013, or because their pack was returned to sender). The overall pattern was of greater differences within the sent for linkage group (i.e. between those who opted in and those who were non-responders) than between the sent for linkage group overall and the not sent for linkage group. Those who opted in to criminality data linkage were more likely to be female, of White ethnicity, and from a socio-economically advantaged background, compared to those in batch 2 who did not respond to the opt-out request (Table 2). Participants who were in the linkage sample generally reported fewer criminal behaviours than those excluded from the linkage sample (Table 3). Further, those who opted in generally reported fewer criminal behaviours than those who were non-responders. The pattern was broadly similar for contact with the criminal justice system, although the proportion of participants reporting such outcomes was small (Table 4).
The comparisons by current consent status included 4,884 opt-in, 7,612 non-responders, and 2,187 individuals with no permission for linkage. Overall, the patterns observed by current day consent status in SEP (Table 5), self-reported criminal behaviours (Table 6), and contact with the criminal justice system (Table 7) were similar to those observed by consent status at the time of the pilot linkage.
It is important to note that there are also differences in the proportions of missing data by consent status for each variable: those with opt-in consent have a lower proportion of missing data than those who have not responded to the consent campaign. This is true of both early-life (reported by participant's mother) variables and those reported by the participant themselves later in adolescence. For example, of those who had opted in at the time of the pilot, 7% are missing maternal education data, compared to 25% of those who were non-responders and 13% of those who were not sent for linkage (percentages by current day consent status are 8%, 18% and 22% respectively). In general, the proportion of missing data increases over time; the differences between the consent groups in terms of missing data also increase. For example, for theft reported at age 18 years, the opt-in group at time of pilot linkage had 37% missing data, compared to 96% of the non-response group and 82% of the not sent for linkage group. The equivalent numbers by current consent status are 48%, 93% and 89% respectively.
Table notes: 1 The N in each column differs by measure and time-point and is not shown in the table. 2 Numbers are suppressed in the 'opt-in' column for these variables to prevent calculation of the small n in the 'non-response' column.
Finally, we did not find a consistent pattern in self-reported anti-social and criminal behaviours when comparing participants who dissented to criminality linkage but did agree to at least one other linkage, compared to participants who did not dissent to any data linkage, or those who dissented to all data linkage options (but agreed to continue in ALSPAC). Comparison of these groups in terms of self-reported involvement with the criminal justice system was precluded by small numbers.
Table 7. Self-reported contact with criminal justice system by age 18 years by current crime linkage consent status.

Discussion
We completed a pilot record linkage in 2013 to determine the feasibility of linking an English population-based cohort study (ALSPAC) to official criminality records, and to inform whether a full linkage would be a worthwhile future endeavour in terms of creating a useful resource for research. The pilot was conditional on the extract being anonymous and not able to be linked to information on individual participants within the ALSPAC databank.
We first aimed to determine whether linkage to criminality records was acceptable to study participants, and whether there was sufficient criminality in the sample for research purposes. Criminal behaviour is a potentially sensitive area and so it was a positive finding that almost 900 participants with criminality record(s) enabled the linkage to happen through either explicit consent (in response to the opt-in request) or not objecting (in response to the opt-out fair processing campaign): out of a sample of 7,361 ALSPAC participants, 885 participants were linked to one or more criminality records held in the Police National Computer database. Further, our finding that, to date, only 4% of the sample have explicitly opted-out of linkage to criminality records supports the view that such linkage is acceptable to the majority of study participants. The group of participants who dissented to criminality linkage, but not to all linkage data sources, was small and within this group levels of self-reported criminality were low.
With the available data we cannot determine if this sub-group of dissenters had engaged in a greater level of criminality compared to the rest of the sample and considered the research use of their criminality record to be sensitive. However, the proportion of participants who self-reported criminality and who did provide explicit consent could imply that participants trust the study to use these records appropriately for research. Whilst this could benefit from further research (ideally using mixed methods designs), this could inform future study designs and governance frameworks, and the considerations of ethical review boards.
In the sample of participants with criminal records, 4,000 convictions and cautions were recorded, many relating to serious crimes. If the linkage were repeated today, we would expect the number of criminal records to be substantially higher because (1) we now have permission to link to a larger sample and (2) there would now be more than 7 years of additional data. Therefore, we believe that there is a sufficient level of criminality in the ALSPAC sample for it to be a useful resource for crime-related research. However, it is unlikely that ALSPAC would have sufficient rates of less common crimes for these to be studied individually. We found criminal records from around the age of 12 years, but the majority of offences in our sample were committed later in adolescence. Therefore, ALSPAC may not have sufficient numbers for research using linked criminality records at younger ages. Note that the PNC database is not 'weeded' (i.e. historical/spent convictions are not removed after a period of time); weeding therefore does not explain the small number of records at younger ages in our sample.
While all participants in the pilot were informed about the linkage and had not objected, only a sub-set of these had provided explicit consent. We found evidence suggesting different rates of self-reported criminal activity, and socio-demographic differences, according to consent status. Participants who explicitly consented to data linkage were more likely to be female, had higher socioeconomic status and lower levels of missing data, and were less likely to self-report criminal behaviour. This pattern is similar to that found for general ALSPAC participation 11. This suggests that studies using only an opt-in sample may underestimate rates of criminal behaviour in the full study population. As such, it is necessary to consider the potential for selection bias in samples that rely on explicit opt-in consent, both when designing linkage methodologies and when considering the appropriateness of data sharing requests.
Finally, in order to inform which sources of crime data could be worthwhile pursuing for future linkage and research, we determined where the crimes committed by our sample took place. Our finding that the majority of offences in the pilot linkage were committed in the Avon and Somerset Police (A&SP) area, which has a similar geographical footprint to the ALSPAC recruitment area, suggests linkage to local police data held by A&SP, which contain more detail than that held in the national PNC, would capture most offences (at least to age 18). Working at a local level provides the opportunity to identify areas of research of local importance. However, at older ages criminal activity may become less geographically clustered, meaning centralised national records may be of increasing value.

Strengths and limitations
There are several strengths and limitations to be considered in our pilot study. A strength was the wealth of data available on demographic measures and self-reported crime collected at multiple time-points, which allowed us to examine patterns in these variables by consent status. The ability to disaggregate our 'sent for linkage' group into those who actively opted-in and those who did not respond was a further strength as it enabled us to highlight the many differences between these groups. This is an important finding for other studies that are considering how to structure their consent campaigns, and will help inform the decision making of those reviewing this use of linked data in longitudinal studies. However, our evaluation is complicated by the fact that the sub-sample of participants included in the consent campaign, and those who were included in batch 1 versus batch 2, were not selected at random. This weakness is mitigated by the fact that this pilot study is intended to demonstrate viability rather than provide accurate association or prevalence estimates. Also, given that the sub-sample disproportionately included participants with strong levels of engagement, it can be hypothesised that this has led to an underestimate of recorded criminality within the sample.
The quality of data linkage relies on the accuracy of identifier records in both datasets (e.g. name, post code etc.). While ALSPAC's administrative database is generally of good quality, it is likely to be out of date for some participants who are lost to follow-up. For the PNC data, the identifier database is known to have accuracy problems and includes pseudonyms, out of date information and duplicates 14. For example, individuals may report a false identity to the police. The linkage methodology used relied on deterministic matching that incorporates fuzzy parameters (i.e. where the requirement for all elements of the personal identifiers to agree is relaxed in varying combinations). ALSPAC was not provided with a match quality score (an estimate of the likelihood that two records relate to the same individual), which is counter to expectations that linkage quality estimates are transparent and available to the analyst 17.
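To make the idea of deterministic matching with relaxed identifier combinations concrete, the following is a minimal illustrative sketch in Python. It is not the procedure used for the pilot linkage, whose exact rules are not described here: the identifier fields, the order and content of the rules in `MATCH_RULES`, the normalisation step and the toy records are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Person:
    """Hypothetical identifier record; field names are illustrative only."""
    forename: str
    surname: str
    dob: str       # date of birth as an ISO string, e.g. "1992-01-05"
    postcode: str


def norm(value: str) -> str:
    """Normalise an identifier: remove all whitespace and lower-case it."""
    return "".join(value.split()).lower()


# Assumed rule set: each rule names the identifiers that must agree exactly.
# Rule 0 is the strictest; later rules relax one identifier at a time.
MATCH_RULES = [
    ("forename", "surname", "dob", "postcode"),
    ("forename", "surname", "dob"),
    ("surname", "dob", "postcode"),
    ("forename", "dob", "postcode"),
]


def match_rule(a: Person, b: Person) -> Optional[int]:
    """Return the index of the first (strictest) rule under which a and b agree."""
    for index, fields in enumerate(MATCH_RULES):
        if all(norm(getattr(a, f)) == norm(getattr(b, f)) for f in fields):
            return index
    return None


def link(cohort: List[Person], records: List[Person]) -> List[Tuple[int, int, int]]:
    """Return (cohort index, record index, rule index) for every matching pair."""
    links = []
    for ci, participant in enumerate(cohort):
        for ri, record in enumerate(records):
            rule = match_rule(participant, record)
            if rule is not None:
                links.append((ci, ri, rule))
    return links


# Toy data, invented purely for illustration.
cohort = [
    Person("Ann", "Smith", "1992-01-05", "BS1 4AA"),
    Person("Bob", "Jones", "1991-07-20", "BS2 8BB"),
]
records = [
    Person("ann", "SMITH", "1992-01-05", "bs14aa"),
    Person("Robert", "Jones", "1991-07-20", "BS2 8BB"),
    Person("Carl", "Brown", "1992-03-03", "BS3 1CC"),
]

links = link(cohort, records)
```

In this sketch the first participant matches under the strictest rule once case and spacing are normalised, while the second matches only under a relaxed rule that ignores forename. Recording which rule produced each link gives a crude indication of match strictness, but it is not a substitute for the probabilistic match quality score discussed above.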
It is also important to consider the quality of the data that are being linked from both sources and their potential limitations for answering questions in this research area. For PNC data, this depends on reliable and accurate testimony and record keeping. For ALSPAC records, the use of self-reported measures of criminal behaviours is vulnerable to social desirability bias, although the figures provided here illustrate that many participants are willing to report criminal and anti-social activity. Furthermore, drop out by participants may lead to bias.
Finally, as this pilot only produced a fully anonymous file, which cannot be linked to the wider ALSPAC dataset, there were limits to what could be included in this evaluation. For example, we could not examine relationships between official criminal records and the self-reported measures.

Conclusions
We found differences in socio-demographic characteristics and rates of criminality according to the consent status of participants (i.e. explicit consent versus non-response to opt-out approaches), which suggest that methods of securing data must be considered carefully in future studies to reduce the risk of bias.
This pilot study illustrates that a full linkage of ALSPAC to crime records at an individual level would be a worthwhile future endeavour that would create a valuable resource for crime-related research. Both local (Avon and Somerset) and national police records would be suitable for linkage, and linkage to both would be worthwhile. Advances in privacy preserving record linkage and 'Trusted Research Environment' secure research infrastructure, together with legislative changes (Digital Economy Act, DPA), may now enable linkage and the joint analysis of linked study-criminal record data under sufficiently controlled conditions to mitigate potential risks to confidentiality and help ensure that this form of data use is publicly acceptable. Individual-level linkage would enable direct comparisons between police-collected and self-reported criminal data, inform statistical strategies to account for missing data, and allow investigation of research questions related to the causal pathways to criminal behaviours using the wealth of life-course information collected by ALSPAC or other longitudinal studies. Once linked, these studies could provide valuable evidence to inform public health approaches to tackling crime.

Data availability
Underlying data
ALSPAC data access, including linked PNC data, is through a system of managed open access. The steps below highlight how to apply for access to ALSPAC data.
• Please read the ALSPAC access policy which describes the process of accessing the data in detail, and outlines the costs associated with doing so.
• You may also find it useful to browse the fully searchable research proposals database, which lists all research projects that have been approved since April 2011 including those using linked data.
For enquiries regarding linked data, please contact data-linkage@alspac.ac.uk.

This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Kimberlie Dean
School of Psychiatry, University of New South Wales, Sydney, Australia

The submitted manuscript describes a pilot study in which a longitudinal cohort sample is linked to official criminal records. The submission is interesting and presents a useful contribution to the important and growing field of research using data arising from the linkage of cohorts/samples to large administrative datasets. The authors could consider the following points in making any amendments to the paper:

○ There have been other studies employing a similar approach (i.e. linkage of criminal records/police data to longitudinal cohorts) that could have been cited (e.g. the NSW Child Development Study in Australia, Whitten et al. (2020) 1).

○ The authors rightly identify the inability to actually link the criminal records obtained for the sample to the ALSPAC dataset; this is a very serious limitation of the approach (and really undermines the whole point of linking cohorts to administrative data; it would actually be better to obtain an anonymised randomly selected sample from the criminal records database and not link it to the ALSPAC cohort at all!). The authors should consider being clearer about this and explain why direct linkage is not possible; it is certainly managed in other settings (in some Australian states, a third party record-linkage organisation, e.g. the NSW Centre for Health Electronic Record Linkage or CHeReL, undertakes the linkage to manage privacy/confidentiality).

○ The most interesting aspect of the study is really the reported differences between those consenting to linkage, those not consenting and those not objecting (including in terms of self-reported criminal offending); the authors should consider formally testing these differences rather than just presenting descriptive statistics.

including the N for each age). The terminology in Table 3 (CCQ, TF3... etc) does not map directly onto the questionnaire names included in the link at the bottom of the table, which makes finding more information about the questionnaires slightly cumbersome.
It would also be useful to know how far back the PNC linkage goes, and whether the Home Office Police National Computer (HOPNC) live database is ever 'weeded' (i.e. if historical/spent convictions are removed after a given period of time). Would we expect future linkage to list all convictions for the ALSPAC cohort (the age of criminal responsibility in England and Wales is 10 years old)? If PNC is not weeded it would be useful to say this explicitly in the text.