Long term cognitive outcomes of early term (37-38 weeks) and late preterm (34-36 weeks) births: A systematic review

Background: There is a paucity of evidence regarding long-term outcomes of late preterm (34-36 weeks) and early term (37-38 weeks) delivery. The objective of this systematic review was to assess long-term cognitive outcomes of children born at these gestations. Methods: Four electronic databases (Medline, Embase, clinicaltrials.gov and PsycINFO) were searched. The last search was on 5th August 2016. Studies were included if they reported gestational age, an IQ measure and the age at assessment. The protocol was registered with the International prospective register of systematic reviews (PROSPERO Record CRD42015015472). Two independent reviewers assessed the studies. Data were abstracted and eligible papers were critically appraised. Results: Of 11,905 potential articles, seven studies reporting on 41,344 children were included. For early term births, four studies (n = 35,711) consistently showed higher cognitive scores for infants born at full term (39-41 weeks) than for those born at early term (37-38 weeks), with increases for each week of term (difference between 37 and 40 weeks of around 3 IQ points), despite differences in age of testing and method of IQ/cognitive testing. Four studies (n = 5,644) reporting childhood cognitive outcomes of late preterm births (34-36 weeks) also differed in study design (cohort and case control), age of testing and method of IQ testing; they found no differences in outcomes between late preterm and term births, although risk of bias was high in the included studies. Conclusion: Children born at 39-41 weeks have higher cognitive outcome scores than those born at early term (37-38 weeks). This should be considered when discussing timing of delivery. For children born late preterm, the data are scarce and, when compared with term births (37-42 weeks), did not show any difference in IQ scores.


Introduction
Globally, preterm birth rates are rising, with 10% of neonates born at less than 37 weeks' gestation 1 . Late preterm births (34-36 weeks) account for three quarters of all preterm births 2 . Early term births (37-38 weeks' gestation) have also increased and contribute substantially to an overall decrease in gestational age at delivery. In the US, the average gestational age at delivery decreased from 40 weeks in 1994 to 39 weeks in 2004 3 .
Early term delivery is associated with increased short-term physical morbidity, including respiratory distress syndrome, transient tachypnoea of the neonate and ventilator use, as well as an increased risk of infant mortality at 37 weeks compared to full-term delivery 4-6 . It is for this reason that both the Royal College of Obstetricians and Gynaecologists UK (RCOG) 7 and the American College of Obstetricians and Gynecologists (ACOG) 8 endorse the policy of elective birth after 39 weeks in order to reduce the risk of adverse outcome in infants born before full term (39-40 weeks gestation). There is a paucity of evidence regarding the long-term morbidity of this group, in particular the impact on cognitive function. Advanced gestational age is associated with a lower risk of having special educational need at school 9 . Davis et al. 10 have also shown that, even amongst the weeks of term, advanced gestational age is associated with better neurodevelopment as demonstrated by magnetic resonance imaging (MRI). As obstetric efforts worldwide continue to attempt to reduce stillbirth amongst term deliveries, induction of labour at an earlier gestational age is becoming more common, despite the guidance above. It is therefore imperative to consider the long-term outcomes of deliveries before full term to guide clinicians and parents on optimum timing of delivery.
The association between preterm birth and long-term neurological morbidity is better established: the risk increases with decreasing gestational age, and extremely preterm babies (≤ 26 weeks) have the worst neurological outcomes 11,12 . This is hypothesized to be due to the disruption of the pathways of dendritic arborisation, synaptogenesis and the thickening of the developing cortex 13 . However, there is less evidence regarding long-term cognitive outcomes of late preterm/early term infants, and given that they account for the largest proportion of singleton preterm births, more research is necessary. A systematic review of 29,375,675 late preterm infants (34-36 weeks) 14 found increased risks of cerebral palsy (RR 3.1, 95% CI 2.3-4.2) and a lower likelihood of finishing school in the late preterm born infants (RR 0.96, 95% CI 0.95-0.97), but we could find no prior reviews on cognitive outcomes for early term births.
The aims of this systematic review are to describe the objectively measured cognitive outcomes in childhood up to the age of 18 years i) within each gestational week of term (37-42 weeks) and ii) of late-preterm (34-36 weeks) compared to term (37-42 weeks) deliveries. The results are necessary for informed decision making regarding timing of delivery.

Methods
This systematic review of the literature was conducted according to the STROBE guidelines 15 and reported according to the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines 16 (see Supplementary File 1). The study protocol was registered with the University of York Centre for Reviews and Dissemination International prospective register of systematic reviews (PROSPERO Record CRD42015015472). MEDLINE, EMBASE (1947-2016) and PsycINFO (1945-2016) were searched using a search strategy developed and tested in collaboration with a librarian experienced in literature searching (Supplementary File 2). The searches were supplemented with a manual search through the reference lists of selected primary articles. A forward citation search was performed on all included studies. The first search date was 12th January 2015 and the last search was 5th August 2016. A subsequent search on Clinicaltrials.gov was performed on 2nd June 2017.

Study selection
One reviewer (JL) screened all titles and abstracts and a second reviewer (BG) independently screened a 10% sample of the 10,882 articles, by reading the titles and abstracts of the first 100 articles of every 1000. The search was updated in August 2016, which yielded an additional 1023 titles and abstracts that were screened independently by two reviewers (SM, KM). After a consensus was reached, the full texts were retrieved and critically appraised by both reviewers independently (SM, KM). We contacted the individual authors of the included studies to obtain the data necessary to complete the results table. Reasons for exclusion were recorded.
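The second reviewer's sampling rule and the screening totals can be checked with a short sketch. It assumes the 10,882 records were held in one fixed, ordered list and that the final, partial block of 882 records also contributed its first 100; the function name `screening_sample` is hypothetical and is not taken from the review.

```python
def screening_sample(n_articles, block=1000, take=100):
    """Return 0-based positions of the records in the second reviewer's sample:
    the first `take` records of every consecutive block of `block` records."""
    return [i for i in range(n_articles) if i % block < take]


INITIAL_SEARCH = 10_882  # titles and abstracts from the first search
UPDATE_SEARCH = 1_023    # additional titles and abstracts from the August 2016 update

sample = screening_sample(INITIAL_SEARCH)
sample_fraction = len(sample) / INITIAL_SEARCH   # share of records double-screened
total_screened = INITIAL_SEARCH + UPDATE_SEARCH  # should match the 11,905 reported in Results

print(len(sample), round(sample_fraction, 3), total_screened)
```

Under these assumptions the rule selects 1,100 records, slightly more than 10% because of the partial final block, and the two searches together account for the 11,905 articles reported in the Results.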
Late-preterm birth was defined as a live birth from 34 to 36+6 completed weeks of gestation, and term birth as a live birth from 37 to 42 completed weeks of gestation. The primary outcome was the result of standardised general intelligence quotient (IQ) tests before age 18, rather than specific domains of cognition. General cognitive ability of a physically and neurologically normal, healthy population of individuals was the key outcome measure recorded.
Studies were included if they reported the range of the participants' gestational age, an assessment of IQ using a validated score, and the age at which IQ was assessed. There were no restrictions by study design, language or method of gestational age assessment. Preterm participants were included as long as there was a clear subgroup of gestational age of 34-36 weeks. Studies were excluded if the method of cognitive testing was unclear; if only selected domains of cognition (e.g. verbal intelligence) were tested; if educational outcomes rather than IQ were reported; or if high-risk or atypical groups were used as controls (e.g. multiple births, intra-uterine growth restriction, those with bronchopulmonary dysplasia or brain haemorrhage). Full details of inclusion and exclusion criteria are presented in Supplementary File 3.
The quality of studies was assessed based on the representativeness of the general population, the method of measurement of gestational age and the recording of intelligence testing, using the Risk of Bias Assessment Tool for Nonrandomized Studies (RoBANS) 17 (Supplementary File 4). Two independent reviewers extracted data from each paper on study location, design, population, IQ score used and the main results.
The studies differed widely in outcome measures of cognition used, and due to the large heterogeneity between the study designs and methods a meta-analysis was not possible.

Results
The database and additional record search identified 11,905 articles after removal of duplicates. After exclusions (see flow diagram, Figure 1), six studies and one conference abstract (which reported on both late preterm and outcomes within term and is therefore included in both groups), reporting on 41,344 children/adolescents, were included in the review; four studies comparing the outcomes within term (37-42 weeks) and three studies comparing the outcomes of late preterm delivery (34-36 weeks) with term delivery.
The studies comparing the outcomes within term differed in a number of ways, including: age of testing (1 year, 4 years and 6 years); method of IQ testing (Bayley Scales of Infant Development (BSID), the Stanford-Binet general IQ test and the Wechsler Abbreviated Scale of Intelligence (WASI); details in Supplementary File 5); and the categories of gestational age investigated (37-41 weeks in one study, 37-42 weeks in two studies and 37-43 weeks in one study).
The studies comparing the outcomes of late preterm delivery (34-36 weeks) also differed in a number of ways, including: study design (three prospective cohort studies and one case control study); age of testing (2 years and 13-14 years); and the method of IQ testing (Bayley Scales of Infant Development (BSID) and the Wechsler Intelligence Scale for Children (WISC); details in Supplementary File 5). Table 1 provides details on the characteristics of the included studies. Three studies 18-20 and one study that was only available as a conference abstract 21 (on contacting the author, no further information was available as it is not yet published) reported on term deliveries (35,711 children/adolescents), and three studies 22-24 (as well as the conference abstract) reported on late preterm deliveries (5,644 children/adolescents).
Table 2 shows the cognitive outcomes of each week of gestation among term births (range 37-43 weeks). In general, although the [...] 20 was the only study to measure outcomes up to post-term (43 weeks gestation) and found an inverse U-shaped relationship between IQ score and gestational age. In this large study (n = 13,824), with a moderate risk of bias, full term (39-41 weeks) was used as the reference group; mean IQ scores were lower than full term at early term (37-38 weeks) and also at post term (42-43 weeks), where the mean difference from full term was larger than at early term (for full risk of bias see Supplementary File 4). The effect size cannot be summarised due to differences between the studies, but the IQ difference between 37 and 40 weeks was approximately 3 IQ points. This may not be clinically significant at an individual level, but would have an impact at a population level.
Table 3 shows the results of the three studies included reporting childhood cognitive outcomes of late preterm birth (34-36 weeks) compared to term birth (37-42 weeks).
The abstract 21 did not specifically compare late preterm and term deliveries statistically; however, the results were available for each week of gestation and have therefore been recorded in the table. This was the only study that showed a difference in IQ scores between late preterm (mean IQ 92.5) and term born children (mean IQ 98.3); however, this was not statistically analysed in the published abstract and standard deviations were not available on contacting the authors. This was a large study (n = 20,093), but was assessed as having a high risk of bias as there was no information on the method of gestational age measurement. The two studies using the Bayley Scales of Infant Development did not show a difference in scores between late preterm and term born infants; however, testing was only done at age 2 and there was no further follow up of the infants. The study by Romeo et al. 23 was assessed as having a high risk of bias as there was no mention of how gestational age was measured. The study by Narberhaus et al. 24 provided the longest follow up of the late preterm-born children, testing IQ using the Wechsler Intelligence Scale for Children (WISC) 25 at ages 13-14. No statistically significant difference was found in the IQ scores between late preterm (mean IQ 112.7, standard deviation [SD] 13.8) and term born children (mean IQ 113.6, SD 11.5). However, these results should be interpreted with caution as the risk of bias was high (no way to determine selective outcome reporting, only some studies mentioning blinding of outcome assessments and no clear indication of how gestational age was calculated).

Main findings
In this systematic review of seven studies (reporting on 41,344 children), the four studies investigating IQ scores within term deliveries found that children born at early term (37-38 weeks) had lower IQ scores at ages one, four and six compared to those born at full term (39-41 weeks). One study (n = 13,824) 20 found a decrease in IQ score at >42 weeks. In the four studies comparing late-preterm deliveries (34-36 weeks gestation) to their term counterparts, there were no differences in cognitive outcomes at ages two, four and 14. Studies were heterogeneous and several were at high risk of bias, and therefore summary effect sizes cannot be reported. No studies were identified comparing

Strengths and limitations
The strengths of this review include the comprehensive and extensive search strategy, with no language restriction, combined with detailed pre-defined eligibility criteria for study selection. At the screening stage, to reduce reporter bias, two reviewers independently screened a selected sample to check for accuracy and agreement regarding inclusion of studies. Two reviewers critically appraised all included studies independently. A wide range of cognitive assessments was used in the included studies, which provides a good overview of the various tests available but makes comparison between studies more difficult.
Despite the comprehensive nature of the search, the possibility of missing relevant papers cannot be excluded. We did not have the resources to formally translate the papers in foreign languages; however, we translated them non-expertly to check whether any fitted the inclusion criteria, and none were thought to be relevant. Another limitation was the categorisation of gestational age. A number of studies (22 studies reporting cognitive outcomes of 3,357 infants) only reported <37 weeks of gestational age (all preterm births), which inevitably included those <34 weeks, and therefore these studies were excluded in their entirety. Excluding these studies may have exacerbated the risk of publication bias. Due to limited resources, we did not attempt to contact the individual authors of these studies to see whether data were available for 34-36 weeks of gestation.
At delivery, birthweight and gestational age are highly correlated. There is a small but statistically significant correlation between birthweight and cognition in childhood and adulthood (each 1 kg increase is associated with a 0.13 standard deviation increase in test score) 26 . Some studies accounted for birthweight in their analyses and some did not. Adjusting for birthweight may not be appropriate because of this high correlation, and birthweight may be a mediator of the relationship between gestational age and cognition rather than a confounder. Future studies should report both birthweight and gestational age as continuous measures, to allow their relative contributions to be measured. Structural equation modelling or similar techniques could be used to model the potential competing causal pathways. We excluded studies of intra-uterine growth restriction (IUGR) because we wanted to study normal, healthy singletons who were appropriate for gestational age, and because IUGR may be associated with adverse cognitive outcomes.
This review is based on observational data with high levels of between-study heterogeneity, and statistical pooling of the studies was therefore not possible, given that the studies were not directly comparable. Because of the observational nature of the data, only limited conclusions can be drawn regarding the mechanism by which gestational age affects long-term cognitive outcomes. There were a number of potential sources of bias across the included studies. Although most studies stated how their participants were chosen in an attempt to reduce selection bias, it is difficult to determine the generalizability of the results outwith the populations that were studied. There was a large variation in the number of confounders adjusted for across the studies (Table 1); many did not account for indication for delivery and some did not account for socio-economic factors (which are strongly associated with cognitive outcomes). There is therefore a risk of residual confounding among the studies.

Interpretation
Comparing the outcomes within the weeks of term (37-42 weeks), this review has shown that cognitive scores in childhood differ throughout the weeks of term delivery and are lowest in those individuals born in early term gestation (37-38 weeks) when measured at ages one, four and six. Although this review specifically set out to look at cognitive outcomes, [...] 1,536,482) showed that children born late term (41 weeks) performed better in school at the age of five through to 18 compared to those born at full term (39 or 40 weeks). Only one of the studies in the review looked at the effect of post-term delivery (>42 weeks) on cognitive outcomes and found those individuals to have a lower score compared to full term (39-40 weeks). This U-shaped relationship has previously been described in the study by MacKay et al. (n = 407,503), which found the lowest risk of special educational need at school in those born at 41 weeks gestation compared to those born <41 weeks and >41 weeks 9 . We identified a previous systematic review published in 2015 that also found a reduction in long term cognitive outcomes of children born early term compared to those born full term, but we were unable to reconcile data included in this previous review with source data 29 .
The mechanism by which early term (37-38 week) delivery leads to lower cognitive outcome scores compared to full term delivery (39-41 weeks) is likely to be multifactorial. Vohr et al. described how brain weight increases rapidly in the last trimester of pregnancy, with brain weight at 38 weeks being 90% of the weight at full term, which may account for the increased vulnerability of early term infants at school 5 . For those born post term (>42 weeks), it is thought that the increased vulnerability at school age is due to poorer placental perfusion 30 .
Only three studies were included in the review comparing the cognitive outcomes of children born late preterm (34-36 weeks) with those born at term (37-41 weeks). There were no (statistically or clinically) significant differences in cognition found between these groups at ages two, four and 13. However, the quality of evidence from these observational studies is poor due to high risk of bias (high chance of residual confounding, no outcome assessor blinding and no way to ascertain if selective outcome reporting took place). We therefore do not make any clinical recommendations relating to the timing of delivery, as these observational data cannot be used for this purpose. The included studies all used term deliveries as the control group, but as this includes early term births (which, as described above, have lower cognitive outcomes than births at 39+ weeks), there is a possibility that differences between <37 weeks and later were masked. Although the conference abstract did not specifically compare the results for late preterm versus term delivery, comparing late preterm deliveries (mean IQ 92.5) with only full term deliveries (39-41 weeks, mean IQ 98.3) gives a large difference, which provides further evidence that treating term deliveries as a continuum partially dilutes the results. This was a large study; however, the data were taken from an old cohort study performed between 1959 and 1966, and many of the variables, including method of gestational age measurement, were not available. Three out of four of the studies had a high risk of bias, and they all assumed homogeneity between the term cases, which, as shown above, is not the case. Although a previous systematic review has shown a clear increase in physical morbidity associated with late preterm delivery (34-36 weeks) compared to 37+ weeks 14 , there remains a paucity of evidence regarding long term cognitive outcomes in this group.
Future studies should use a full-term delivery group (39-41 weeks) as the control group and adopt uniform gestational age categorizations, ideally with similar outcome measures, to allow for easy comparison between studies. Individual level data should be made available as soon as possible to allow large scale individual participant meta-analysis.

Conclusion
Overall, this systematic review has found that children born at full term (39-41 weeks) have higher cognitive outcome scores than those born at early term (37-38 weeks). Given the high prevalence of early term deliveries (the fastest growing proportion of singleton births in the US), small differences at an individual level in cognitive outcomes are likely to have a large impact at a population level. Further research is required to look at the potential reasons for this, and to consider outcomes of late-preterm delivery using a suitable control group of full term (39-41 weeks). The findings from this review have important implications for clinicians, and the long-term cognitive outcomes associated with gestation at delivery should be discussed with parents when considering the optimum timing of delivery.
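The contrast between a small individual-level difference and a larger population-level impact can be illustrated with a minimal sketch. It is not an estimate from the included studies: it assumes IQ scores are normally distributed with a reference mean of 100 and a standard deviation of 15 (a common convention, whereas the included studies used several different instruments), applies the approximately 3-point difference between 37 and 40 weeks as a uniform shift, and uses a threshold of 70 purely as an illustrative cut-off.

```python
from statistics import NormalDist

SD = 15.0               # assumed standard deviation of a standardised IQ scale
REFERENCE_MEAN = 100.0  # assumed mean for the reference (later term) group
SHIFT = 3.0             # approximate 37 vs 40 week difference reported in the review
THRESHOLD = 70.0        # illustrative cut-off, 2 SD below the assumed reference mean

reference = NormalDist(mu=REFERENCE_MEAN, sigma=SD)
shifted = NormalDist(mu=REFERENCE_MEAN - SHIFT, sigma=SD)

effect_size_sd = SHIFT / SD             # the shift expressed in SD units
p_reference = reference.cdf(THRESHOLD)  # share below the cut-off, reference group
p_shifted = shifted.cdf(THRESHOLD)      # share below the cut-off after the shift
relative_increase = p_shifted / p_reference

print(f"shift = {effect_size_sd:.2f} SD")
print(f"below {THRESHOLD:.0f}: {p_reference:.2%} vs {p_shifted:.2%}")
print(f"relative increase = {relative_increase:.2f}x")
```

Under these assumptions a 3-point shift is 0.2 SD, yet the share of scores below the cut-off rises from roughly 2.3% to roughly 3.6%. A difference that is small for any one child can therefore matter when applied to a large and growing group of births.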

3.
sections in these groups compared to full term infants? Maybe these infants have a lower IQ due to circumstances other than being born at early term? This is mentioned briefly in the discussion section under "the mechanism of early term……" but needs further discussion.
Conclusion in both abstract and manuscript: …"for discussion with parents regarding optimum timing of delivery." Do you mean counselling parents who wish or request a caesarean section? Or all parents including those with sick foetuses? Should be specified in the conclusion.
Are the rationale for, and objectives of, the Systematic Review clearly stated? Yes

Is the statistical analysis and its interpretation appropriate? Partly
Are the conclusions drawn adequately supported by the results presented in the review? Partly
No competing interests were disclosed.

Is the statistical analysis and its interpretation appropriate? Yes
Are the conclusions drawn adequately supported by the results presented in the review? Yes
No competing interests were disclosed.
This systematic review attempts to identify differences in IQ scores in 2 comparison groups. The first comparison is among term infants, comparing early term (37-38 weeks) with full term (39-41 weeks' gestation) and identifying differences with each gestational week. The second compares the IQ scores of late preterm infants (34-36 weeks' gestation) with those of term infants (37-42 weeks' gestation).
There is clearly a paucity of data in this regard and this study highlights this.
The search criteria were well identified by the researchers. The methods are well explained and the results clearly outlined. The aims of the study are clear, and it is very important to publish them in order to highlight the need for clear decisions regarding timing of delivery, where that is possible.
The difficulty I have as a clinician is in the interpretation of the results, which I think needs some further qualification by the authors in the discussion.
The study gives us a clear message that term infants are better off being delivered at 39-41 weeks' gestation in order to maximise their IQ potential. However, the biggest difficulty in interpreting and extrapolating the results shown is that we have no idea of the reason for early term birth versus full term birth, or whether it is different or similar between the groups in these studies. The researchers can only deal with what has been published, but they have made no comment regarding the indications for delivery and the potential impact these might have on the results if the studies were adjusted for them. Potentially, the infants born at 37-38 weeks were electively delivered/induced, but more likely there was a mix of both clear indications for delivery and elective deliveries, and this may have been very different to the 39-41 week indications for delivery and contribute to the differences found.
Of the 4 articles looking at term infants' IQ scores, only Espel et al. 2014 (the smallest study) adjusted for obstetric complications. Although there are clearly differences in the IQ scores of the 2 groups, and this is important, can this be helped? The only way to truly disentangle this is to look prospectively at term infants delivered electively, which is an almost impossible study to do with the numbers needed to evaluate the outcome measures. Alternatively, adjustment should be made for the reason for delivery, particularly if there is an indication that can impact on the infant long term.
The preterm data have smaller numbers. The largest study was adjusted for obstetric complications on analysis, but similarly the delivery indications are paramount to teasing out the answer to this question. The researchers have clearly highlighted in the discussion that including infants born at 37-38 weeks in the comparison group has potentially diluted the results to show no significant difference, which in itself is interesting. Is being born at 37 weeks as bad as being born at 34 weeks? The studies are generally small, making differences difficult to establish, but adding the late preterm data from the Gyamfi-Bannerman et al. 2014 study has shown there are likely differences in IQ to be found.
A recent study by Cheong et al. showed that moderate and late preterm (MLPT) children do worse on all domains of assessment using the BSID II compared with term born controls, though moderate preterm children (32-33 weeks) were included. Within the MLPT group, there was little evidence of an association between gestational age at birth and neurodevelopment or social-emotional development at age 2. The group acknowledged that there were potential differences between the MLPT infants in the study and other MLPT infants, as they were all born in a tertiary setting and hence may have been sicker. The rates of induction and caesarean section are well delineated in the paper.
The question of term IQ differences is a very interesting and worthwhile subject to pursue researching.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.