(2019). Geographically distributed longitudinal nitrogen dioxide and other air pollution sensor measurements in the Avon Longitudinal Study of Parents and Children cohort catchment area.

Longitudinal cohort studies provide unique opportunities to investigate the health impact of air pollution. We aimed to enhance the Avon Longitudinal Study of Parents and Children (ALSPAC) birth cohort study through the systematic collection of routinely monitored air pollution data recorded by local authorities and the Department for Environment, Food and Rural Affairs (DEFRA) using a range of sensor technologies. These sensor data are not in themselves well suited to population epidemiology; rather, they are primarily used for validating and calibrating modelled air pollution concentration data over study areas. In this data note we describe the sources of routine air pollution monitoring data and detail the pollutant data, including nitrogen dioxide, nitric oxide, nitrogen oxides, particulate matter, benzene and ozone, collated from the local authorities that overlap the ALSPAC catchment area (Bristol, North Somerset, South Gloucestershire and part of Bath and North East Somerset).


Introduction
Systematic air pollution monitoring has been undertaken in the UK for nearly a hundred years, with extensive coverage from 1961 when the National Survey was established to monitor black smoke and sulphur dioxide at around 1200 sites [1]. In 1987, the European Union (EU) Limit and Guide Values and World Health Organisation (WHO, 1987) provided the framework for air pollution monitoring in the UK [2], and in 1992 the Department of Environment in the UK established the Enhanced Urban Network (EUN) to measure five pollutants: carbon monoxide (CO), nitrogen oxides (NOx), sulphur dioxide (SO2), ozone (O3) and particulate matter (PM10) [3]. The subsequent Environment Act 1995 requires UK governments to produce a national air quality strategy, and in 1998 the Automatic Urban and Rural Network (AURN) was formed to provide comprehensive and continuous monitoring of a range of pollutants including nitrogen dioxide (NO2). AURN sites also measure a range of other parameters including NOx, O3, CO and particles (PM10 and PM2.5) [4]. There are currently 147 sites across the UK in the AURN, and these sites are the source of the most granular routine measurements available: NO2 hourly means. Additional NO2 and other air pollution measurements, typically subject to monthly readings, are made by a network of non-automatic monitoring sites. These sites, managed by DEFRA between 1993 and 2005, are now managed by local authorities who, to fulfil the Environment Act 1995, are required to produce an annual Air Quality Status Report (ASR) containing monthly NO2 measurements. Areas where NO2 levels are above those defined by the European Union (EU) Directive limits and the UK's national air quality objectives are declared as Air Quality Management Areas (AQMAs) and require the implementation of a Local Air Quality Action Plan (LAQM) to reduce NO2 to acceptable levels.
ALSPAC is a transgenerational prospective birth cohort study investigating influences on health and development across the life course [5]. In summary, 14,541 pregnant women who were resident in and around Bristol, England, and due to deliver between April 1, 1991 and December 31, 1992 were initially recruited, resulting in 14,062 live births and 13,988 children who were alive at one year. Subsequent phases of recruitment increased the number of enrolled children alive at one year to 14,899. The children and their families have been followed up intensively through questionnaires, study clinics and through linkage to routine datasets. Further information about ALSPAC is given on the study website. There is also a searchable dictionary of available data [6].
ALSPAC maintain a lifelong database of participant residential addresses and other geographical data, e.g. school addresses. Index participant addresses have been geocoded at the property and postcode centroid level, and schools at the postcode centroid level. The ALSPAC geocoded database is updated iteratively. This database forms a foundation to which natural or social environment data with a geospatial aspect (e.g. air pollution) can be mapped to participants [7].
The ERICA (Enhancing Environmental data Resources in Cohort studies: ALSPAC exemplar) project aimed to transfer the expertise available within the natural environment research community to the medical research community, particularly the data managers of longitudinal cohort studies. ERICA also sought to understand the governance and data management requirements needed to manage the provision of natural environment exposure data in health outcome investigations, so that cohort studies such as ALSPAC can operate as effective platforms for investigating health outcomes of environmental exposures. To help inform the development of this platform, and to demonstrate its processes, ERICA included an exemplar research strand aiming to investigate the impact of NO2 air pollution exposure, particularly in utero, on later child respiratory outcomes. To inform this investigation, ERICA collected data on outdoor, fixed-site NO2 concentrations and other air pollution monitored within the study catchment. NO2 concentrations were used to calibrate separate work to model detailed NO2 exposure for the participants of ALSPAC using a diverse set of information sources and a methodology developed in a previous ALSPAC project [8]. Staff from the Bristol City Council Geographical Information Systems (GIS) team collaborated with this project to provide expertise on local authority NO2 monitoring.

Methods
We sought to collect sensor data from four local authorities, Bristol City Council (BCC), Bath and North East Somerset (BaNES), North Somerset and South Gloucestershire, which correspond approximately with ALSPAC's original catchment area: the Bristol and Weston, Southmead and Frenchay District Health Authorities (Figure 1). We used two methods to obtain data: 1) searching online via publicly accessible repositories and local authority websites and downloading the publicly available datasets we found; and 2) directly approaching the local authorities. Within this latter category our relationship with BCC was distinct, as they had contributed to the development of our project plan and were project collaborators. There was no formal relationship with the other three local authorities. We developed a framework defining which data and aligned metadata were of interest to this project (Table 1) using insights from meetings with BCC. We provided the framework to our partners at BCC and asked that they identify and extract all the relevant data between 1990 and 2017. An individual was funded to work on this at BCC, but not at the other three local authorities. A member of the ERICA team used the framework to inform their internet search strategy to identify relevant data and metadata relating to BaNES, North Somerset and South Gloucestershire. We then approached these three local authorities (via email) to request granular air pollution records.
Internet search protocol for routine air pollution data collected in the ALSPAC catchment area
The purpose of the internet search was to collect air pollution monitoring data for the ALSPAC eligible area from publicly available online repositories in the most granular form available. Air pollution monitoring data are tables of numeric data, often Excel spreadsheets, that can be imported to Stata and merged with a master data set of monitoring data. However, tabulated data in PDF reports are another source. These cannot be abstracted in an automated manner, but PDF reports containing monitoring data can at least be collated in the first instance.
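The collation step described above can be illustrated with a minimal sketch. The study itself used Stata; the Python below is only an illustration, and it assumes each spreadsheet has first been exported to CSV text with the hypothetical columns `site_id`, `date`, `pollutant` and `value`. The function names and the rule of skipping duplicate site, date and pollutant combinations are assumptions made for this example, not the authors' procedure.

```python
import csv
import io


def read_table(text, source):
    """Parse one exported monitoring table (CSV text) into master-format rows.

    The column names are hypothetical; real spreadsheets vary by authority.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        raw_value = row["value"].strip()
        rows.append({
            "site_id": row["site_id"].strip(),
            "date": row["date"].strip(),
            "pollutant": row["pollutant"].strip(),
            # Keep a missing reading as None rather than dropping the record.
            "value": float(raw_value) if raw_value else None,
            "source": source,
        })
    return rows


def append_to_master(master, new_rows):
    """Append rows to the master list, skipping any site/date/pollutant
    combination that is already present (an assumed de-duplication rule)."""
    seen = {(r["site_id"], r["date"], r["pollutant"]) for r in master}
    for r in new_rows:
        key = (r["site_id"], r["date"], r["pollutant"])
        if key not in seen:
            master.append(r)
            seen.add(key)
    return master
```

Recording the `source` of each row keeps the provenance of a reading visible after tables from different authorities have been combined.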
We searched for additional metadata based on the item list developed with Bristol City Council (see Table 1, Data framework and aligned metadata). Rich metadata are provided in the annual monitoring reports that each local authority is obliged to produce. The strategy was to use internet search engines with consistent search terms and to supplement this by directly contacting the local authority teams responsible for air quality management to provide links and direct data/metadata. Google was used to conduct the search, given that this is non-specialist material likely to be hosted on local authority and government websites. The search was carried out between June and September 2018, with some further searching during March 2019. The keywords were 'air pollution' and 'air quality', prefixed by the name of the local authority: 'North Somerset', 'BaNES' or 'Bath & North East Somerset', or 'South Gloucestershire'. Our objective was not to select a representative set of responses, but rather to locate online datasets with prior knowledge of the information we were seeking based on our joint work with Bristol City Council. We therefore took an exploratory approach where the researcher investigated the returned links, and onward links from those sites, for access to air quality assessment records.
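The search terms described above (each keyword prefixed by a local authority name) can be enumerated as follows. This is a small sketch for illustration only: the function name `build_queries` is hypothetical, and the searches themselves were carried out manually in a search engine.

```python
from itertools import product

# Local authority names and keywords as stated in the search protocol.
AUTHORITIES = [
    "North Somerset",
    "BaNES",
    "Bath & North East Somerset",
    "South Gloucestershire",
]
KEYWORDS = ["air pollution", "air quality"]


def build_queries(authorities, keywords):
    """Return every keyword prefixed by every local authority name."""
    return [f"{authority} {keyword}"
            for authority, keyword in product(authorities, keywords)]


if __name__ == "__main__":
    for query in build_queries(AUTHORITIES, KEYWORDS):
        print(query)
```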
When searching for 'air pollution' prefixed with the name of the local authority, the first links listed were typically for the given local authority's 'Air Quality Status Reports'. These reports are produced annually and are a main source of local air pollution data and metadata. We also identified links to the Department for Environment, Food and Rural Affairs (DEFRA) website describing the local authority's Local Plan for 'atmospheric pollution', which in turn links to the 'UK AIR' site. By selecting the 'data' page and then 'Data Selector', we identified a main source of national air pollution data. Repeated searching using slightly different terms did not yield main sources of data other than the local authority websites and DEFRA. (Communications with local environmental monitoring teams at local authorities confirmed these as the key sources of internet data.) The search was broadened by checking whether 'Air Quality Status Reports' that were previously, but are no longer, available on local authority websites could be retrieved using a longitudinal online snapshot resource called the 'Wayback Machine' (https://archive.org/web/). After using this we could be confident that our collation of 'Air Quality Status Reports' was comprehensive.
Pending Reports (e.g. AQMAs, air quality assessment reports for planning) and anecdotal guidance, historical insights, corporate knowledge from BCC staff
Please note: Metadata information can be presented annually (e.g. in 1991 the measurement was X metres from the kerb) or in date ranges (e.g. between 1991 and 1998 the measurement was X metres from the kerb).

UK air pollution data sources background
The main national resource for air pollution is provided by DEFRA and is called. This site provides historic data that predates the transfer of monitoring responsibility to local authorities. From around 2009, the local authority reports have been published in the standardised format of Air Quality Annual Status Reports (ASRs), and monthly mean NO2 readings are routinely presented, as well as the type and locations of the monitors. (Local authorities are also invited, on a voluntary basis, to upload monthly NO2 readings to the Local Air Quality Management section of the DEFRA website.)

Air pollution dataset
Data collected
Our dataset contains 3,362,846 records (Table 2). Each record is a point in time where a measurement may be expected: where the interval is monthly a record will exist for each month, and where the interval is hourly a record will exist for each hour. Measurements may be missing for a given record, and there are multiple measurements for each record for sites where several pollutants are continuously monitored. The dataset is not cleaned, and all measures including apparent errors (sub-zero measurements) are included (Figure 2). The data set was processed using Stata version 15 [9].
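The record structure described above can be sketched in code. The example below is illustrative only and is not the processing used to build the dataset (which was done in Stata). It assumes a start-inclusive, end-exclusive time window, that monthly records fall on the first day of each month, and a hypothetical `sub_zero` flag; consistent with the uncleaned dataset, sub-zero values are flagged but retained.

```python
from datetime import datetime, timedelta


def expected_timestamps(start, end, interval):
    """List the points in time at which a record is expected.

    start is inclusive and end is exclusive. For the monthly interval,
    start is assumed to be the first day of a month.
    """
    stamps = []
    current = start
    while current < end:
        stamps.append(current)
        if interval == "hourly":
            current += timedelta(hours=1)
        elif interval == "monthly":
            # Roll over to January of the next year after December.
            year = current.year + current.month // 12
            month = current.month % 12 + 1
            current = current.replace(year=year, month=month)
        else:
            raise ValueError(f"unsupported interval: {interval}")
    return stamps


def build_records(stamps, measurements):
    """Create one record per expected timestamp.

    A missing measurement is kept as None; sub-zero values are flagged
    as apparent errors but not removed.
    """
    records = []
    for ts in stamps:
        value = measurements.get(ts)
        records.append({
            "timestamp": ts,
            "value": value,
            "sub_zero": value is not None and value < 0,
        })
    return records
```

Under these assumptions a site monitored hourly yields 24 records per day whether or not a reading was captured, which is why the record count can exceed the number of usable measurements.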

Types of air pollution monitoring
In the study catchment area air pollutants are monitored using various methods and equipment. In this section we describe the data by three categories of monitor type and measured pollutant: 1) NO2 diffusion tubes (measuring NO2), 2) non-AURN automatic monitoring (measuring a range of pollutants including NO2, NO, NOx, CO, PM10, PM2.5, O3 and benzene) and 3) AURN automatic monitoring (measuring a comprehensive set of pollutants, including those measured at non-AURN sites, with additional methodologies and a set of meteorological measures).

NO2 diffusion tube monitoring
Diffusion tubes are 7 cm acrylic or polypropylene tubes that contain a chemical reagent to absorb NO2 [10], which is measured in units of micrograms per cubic metre (µg/m³). Local authorities (LAs) are responsible for fulfilling the requirements of the Local Air Quality Management (LAQM) process as part of fulfilling the Environment Act 1995. For NO2 this has meant one-hour mean measurements not exceeding 200 µg/m³ more than 18 times in a year, and an annual mean not exceeding 40 µg/m³.
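The two NO2 objectives stated above can be expressed as a simple check. The sketch below is illustrative and makes several assumptions: it takes one year of hourly mean values in µg/m³ (which would come from automatic monitors, since diffusion tubes are typically read monthly), it ignores missing readings, and it applies no data-capture or rounding rules. The function and key names are hypothetical.

```python
HOURLY_LIMIT = 200.0          # one-hour mean objective, in µg/m³
MAX_HOURLY_EXCEEDANCES = 18   # permitted exceedances of the hourly limit per year
ANNUAL_MEAN_LIMIT = 40.0      # annual mean objective, in µg/m³


def assess_no2_year(hourly_means):
    """Compare one year of hourly NO2 means against the two objectives.

    Missing readings (None) are ignored. No data-capture threshold is
    applied, which is a simplification made for this sketch.
    """
    valid = [v for v in hourly_means if v is not None]
    if not valid:
        raise ValueError("no valid hourly measurements supplied")
    exceedances = sum(1 for v in valid if v > HOURLY_LIMIT)
    annual_mean = sum(valid) / len(valid)
    return {
        "exceedances": exceedances,
        "annual_mean": annual_mean,
        "meets_hourly_objective": exceedances <= MAX_HOURLY_EXCEEDANCES,
        "meets_annual_objective": annual_mean <= ANNUAL_MEAN_LIMIT,
    }
```

With monthly diffusion tube readings only the annual mean objective could be assessed in this way, because the hourly objective needs hourly data.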

Automatic monitoring (non-AURN)
The Automatic Urban and Rural Network (AURN), run by the Department for Environment, Food and Rural Affairs (DEFRA), monitors air pollution continuously (producing hourly mean data) using standardised methods produced by the European Commission [11]. The monitoring at the 5 sites that are or were part of the AURN in the ALSPAC eligible area (Bath Roadside, Bristol Centre, Bristol Old Market, Bristol St Pauls, and Temple Gate) is described in the following section (Table 4). In this section we describe the monitoring records we have collected on the remaining 17 sites in the ALSPAC eligible area that monitor data continuously but are not part of the AURN and thus not subject to the standardised AURN reference methods. The pollutants monitored by these sites are at the discretion of the local authority that runs the sites.
The monitoring techniques and geolocations of all automatic monitoring sites (irrespective of AURN affiliation) are outlined in the LA-produced Air Quality Management Progress Reports (Table 2.1, Details of Automatic Monitoring Sites). For Bristol sites our dataset contains easting and northing coordinates, location type (kerbside, roadside façade, rural, urban background, urban centre) and the distance of either the diffusion tube or the receptor to the kerbside. For BaNES sites the dataset
NO2 1997-2018 (n = 178,367); NO 1997-2009, 2015-2018; NOx 1997-2018 (n = 178,391)

Underlying data
The ALSPAC databank is accessible as a managed access resource for the international research community. Prospective data users are encouraged to: 1) browse the catalogue of existing projects (http://bristol.ac.uk/alspac/researchers/publications/): data use is non-exclusive and it is the applicant's duty to maintain awareness of duplicate or overlapping initiatives; 2) consider the ALSPAC data access policy [6]; and 3) apply for access (https://proposals.epi.bristol.ac.uk/). Standard geolocated data (e.g. IMD, urban/rural status, pseudonymized geographies for multi-level modeling) are available at each data time point. Selected subsets of location-based data are available via the UK Data Archive (https://www.data-archive.ac.uk/). Those considering bespoke linkages of spatially indexed information should contact PEARL, who manage ALSPAC data linkages (alspac-linkage@bristol.ac.uk). All applications are assessed for compliance with ALSPAC's governance and third-party data-use arrangements. Data users are required to return newly generated or derived data along with rigorous metadata for future reuse in ALSPAC. All users must abide by information security and governance requirements and uphold participant confidentiality.
Published outputs are reviewed for conformance to a publication checklist (http://www.bristol.ac.uk/media-library/sites/alspac/documents/alspac-publications-checklist.pdf). ALSPAC reserves the right to request changes to publications to address risks relating to participant disclosure or bringing the study into disrepute.

Ethical statement
Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees.