Elucidating the genetic architecture underlying IGF1 levels and its impact on genomic instability and cancer risk [version 1; peer review: awaiting peer review]

Background: Insulin-like growth factor-1 (IGF1) has been implicated in mitogenic and anti-apoptotic mechanisms that promote susceptibility to cancer development and growth. Previous epidemiological studies have described phenotypic associations of higher circulating levels of IGF1 in adults with higher risks for breast, prostate, ovarian, colorectal, melanoma and lung cancers. However, such evidence is prone to confounding and reverse causality. Furthermore, it is unclear whether IGF1 promotes only the survival and proliferation of cancerous cells, or also the malignant transformation of healthy cells. Methods: We perform a genome-wide association study in 428,525 white European ancestry individuals in the UK Biobank study (UKBB) and identify 831 independent genetic determinants of circulating IGF1 levels, double the number previously reported. Results: Collectively these signals explain ~7.5% of the variance in circulating IGF1 levels in EPIC-Norfolk, with individuals in the highest 10% of genetic risk exhibiting ~1 SD higher levels than those in the lowest 10%. Using a Mendelian randomization approach, we demonstrate that genetically higher circulating IGF1 levels are associated with greater likelihood of mosaic loss of chromosome Y in leukocytes in men in UKBB (OR per +1 SD = 1.038 (95% CI: 1.010-1.067), P=0.008) and 23andMe, Inc. (P=6.8×10^-5), a biomarker of genomic instability involved in early tumorigenesis. Genetically higher IGF1 is also associated with higher risks for colorectal (OR = 1.126 (1.048-1.210), P=1.3×10^-3) and breast cancer (OR = 1.075 (1.048-1.103), P=3.9×10^-8), with similar effects on estrogen positive (ER+) (OR = 1.069 (1.037-1.102), P=2.3×10^-5) and estrogen negative (ER-) (OR = 1.074 (1.025-1.125), P=3.9×10^-8) subtypes. Conclusions: These findings give an insight into the genetic
Page 1 of 12 Wellcome Open Research 2021, 6:20 Last updated: 01 FEB 2021


Introduction
Insulin-like growth factor-1 (IGF1) is a central component of an evolutionarily conserved system that regulates the development and function of many tissues during embryonic development, growth and adulthood 1 . While circulating IGF1 is produced primarily in the liver, IGF1 is expressed in almost every human tissue and plays a key role in mediating metabolic, endocrine and anabolic effects of growth hormone (GH) 2 . GH stimulates the production of IGF1, which is used in clinical settings as an informative circulating marker for the diagnosis and monitoring of disease states of GH deficiency or excess. More widely, it has been posited as a potential marker with broader application for prediction of susceptibility to cancer and non-cancer aging traits 3-5 . Numerous epidemiological and clinical studies have described phenotypic associations of higher circulating levels of IGF1 in adults with higher risks for several cancers, including breast, prostate, ovarian, endometrial, colorectal and lung cancer, but also lower risk of ovarian cancer 6-12 . However, phenotypic associations provide limited inference on causal effects, due to the potential effects of reverse causality and confounding. Studies of patients with acromegaly, a disease model of GH-IGF1 axis overactivity, indicate elevated risks for colorectal cancer and also for (small, low-risk) thyroid malignant nodules. Most clinical guidelines recommend routine screening by colonoscopy in such patients, although the evidence is far from unequivocal 13 .
Findings from in vivo and in vitro experiments show that GH and IGF1 stimulate cell proliferation, differentiation and motility, and inhibit apoptosis 14 . The lifecycle of a healthy human cell is maintained through tight regulation of cell proliferation, senescence and apoptosis orchestrated by intracellular and extracellular signaling factors. Imbalance in this regulation that lowers apoptosis and stimulates higher proliferation rates leads towards more frequent mitosis, favouring carcinogenesis 15 . In searching for biomarkers that can be applied to predict and diagnose cancer, previous research has highlighted the potential roles of IGF1 in both cancer development and progression 14,16 . Upon binding to the type 1 IGF receptor (IGF1-R), IGF1 stimulates a signal transduction cascade that results in higher proliferation rates and cell survival. This mitogenic activity is activated by RAS and AKT pathways that upregulate genes involved in cell proliferation, survival and angiogenesis 17 . Besides promoting mitogenesis, this cascade triggers downregulation of cell cycle suppressors, such as p27 and PTEN. The AKT pathway enables cancer cells to escape apoptosis mechanisms through the inhibition of pro-apoptotic proteins, such as BAD and FKHR, and activation of anti-apoptotic factors including NF-kappa B and MDM2 18 . Therefore, IGF1 may activate both mitogenic and anti-apoptotic pathways to promote cancer development and growth. However, it is debated whether its role is largely in promoting the survival, proliferation and metastasis of already abnormal cells, as opposed to driving genomic instability and malignant transformation of healthy cells.
Mendelian randomization (MR) provides a powerful genetic epidemiological approach to infer the causal effects of a candidate risk factor on an outcome, using genetic variants that are robustly associated with that exposure as instrumental variables 19 . However, previous efforts to apply this approach to study the role of IGF1 in cancer have been inconclusive. The first genome-wide association studies (GWAS) of circulating IGF1 identified only 10 genome-wide significant signals in ~31,000 individuals of European ancestry 20,21 . Two recent GWASs of circulating IGF1 in the UK Biobank study (UKBB) reported 265 independent signals in 194,174 women of European ancestry 22 and 416 signals in 358,072 individuals of European descent 23 . The former study used these genetic signals in an MR framework to examine the role of IGF1 in breast cancer and found a positive effect of IGF1 on risk of estrogen receptor-positive (ER+) but not ER- breast cancer, while the latter study examined various cancer risks, and found a positive effect on risk of colorectal cancer but inconclusive or no evidence for prostate, breast, and other cancers.
Here, we conduct a GWAS of circulating IGF1 in UKBB using a more extensive approach to sample selection and signal selection than in previous studies 22-25 . We identify 831 independent signals for circulating IGF1 levels in white European ancestry adults, double the number of previously reported signals 23 . We use this expanded genetic instrumental variable to make more conclusive inferences regarding the causal roles of IGF1 in colorectal and breast cancers, and in mosaic loss of chromosome Y in leukocytes, a biomarker of genomic instability.

Discovery of IGF1 signals
After quality control procedures, genotype and phenotype data on circulating IGF1 levels were available in 428,525 white European individuals in UKBB. Genetic association testing, using a linear mixed model to control for relatedness and population structure, led to the discovery of 150,336 variants (MAF >0.1%) associated with IGF1 levels at genome-wide significance (P<5×10^-8) (Figure 1). After distance-based clumping and approximate conditional analysis, these resolved to 831 independent signals (Extended data, Supplementary Table 1) 26 . We sought replication in an independent dataset (the EPIC-Norfolk study) of IGF1 measurements in 7544 individuals 27 . Although this study was substantially smaller than UKBB, we detected strong directional consistency of effects (in 601/791 autosomal signals; binomial sign test P=1×10^-50, including 118 at P<0.05) (Extended data, Supplementary Table 2) 26 . A genetic risk score (GRS) of the 791 identified IGF1 signals explained ~7.5% of the variance in IGF1 levels in EPIC-Norfolk, and individuals in the highest 10% of the GRS had 0.96 SD [0.86-1.065] higher IGF1 levels (5.5 nmol/L) compared to those in the lowest 10% of the GRS. The GRS also explained 2.6% of the variance in levels of IGF-binding protein-3 (IGFBP3), the major binding protein for IGF1, in EPIC-Norfolk (N=3314).
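As an illustration of the directional consistency check reported above, the following is a minimal sketch of an exact binomial sign test. It assumes a one-sided test against a null probability of 0.5; whether the original analysis was one- or two-sided is not stated, so the function name and that choice are assumptions made for illustration.

```python
from math import comb


def sign_test_p_value(n_consistent: int, n_total: int) -> float:
    """One-sided exact binomial P(X >= n_consistent) under a null of p = 0.5."""
    # Sum the upper tail as exact integers, then divide once to limit rounding error.
    tail = sum(comb(n_total, k) for k in range(n_consistent, n_total + 1))
    return tail / 2 ** n_total


# Counts taken from the text: 601 of 791 autosomal signals were directionally consistent.
p_sign = sign_test_p_value(601, 791)
print(f"one-sided sign test P = {p_sign:.1e}")
```

Working with exact integers avoids the underflow that a naive product of floating-point probabilities would suffer at this sample size.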
There was strong enrichment of GWAS signals in or near 60% of genes known to be involved in IGF1 core pathways, including IGF1 secretion, IGF1 serum balance and growth hormone secretion (Extended data, Supplementary Table 5) 26 . We also performed two hypothesis-free approaches to explore pathways that contribute to circulating IGF1. First, we looked for nonsynonymous variants that are highly correlated (linkage disequilibrium r 2 > 0.8) with any lead IGF1 signal, and identified 220 such variants, tagging 165 unique genes (Extended data, Supplementary Table 3) 26 . Predicted deleterious variants associated with IGF1 levels were identified in genes involved in the IGF-GH axis (CREB3L2, GH1, GHRHR, IGFBP3, JAK2, MAPT, RASIP1, SH2B1, SSTR5) and regulation of adrenal hormone secretion (CRHR1, MC3R, PDE11A, POMC), as well as cell growth, differentiation and death (PTPN13, CDKN1A, ZC3HC1). Secondly, we interrogated IGF1 associations at all variants genome-wide using MAGENTA and identified further pathways implicated in the regulation of IGF1 levels, including signaling by insulin, estrogen, and steroid hormone metabolism (Extended data, Supplementary Table 6) 26 . These analyses also highlighted many pathways with less obvious links to circulating IGF1 levels, including pathways responsible for neurodevelopment and immune response, more specifically viral response. Comparison between the pathways highlighted by MAGENTA and non-synonymous SNP analysis showed 79% overlap, indicating good consistency between these distinct approaches.

Examining the impact of IGF1 on genomic instability
We combined the identified 831 independent signals into a single genetic instrument in an MR framework in order to estimate the causal relationship between circulating IGF1 levels and loss of chromosome Y (LOY) in leukocyte DNA in men. LOY represents a marker of genomic instability and processes involved in early tumorigenesis 28,29 . In UKBB, each 1 SD (5.5 nmol/L) genetically predicted increase in IGF1 levels conferred higher odds of LOY, measured using PAR-LOY (OR = 1.038; 95% CI: 1.010-1.067, P=0.008 in the inverse-variance weighted (IVW) model, with directionally consistent associations in sensitivity models and a non-significant MR Egger intercept, P=0.56) (Extended data, Supplementary Table 7) 26 . Conversely, we found no evidence for a causal effect of LOY on IGF1 levels (P=0.09). To confirm the validity of the identified relationship between IGF1 and LOY, we performed a replication analysis in an independent dataset available for 653,019 male participants of European ancestry from the consumer genetics company 23andMe, Inc. Using a continuous measure of Y-chromosome relative intensity as the outcome (mLRR-Y), with negative mLRR values representing more loss of chromosome Y, we confirmed the effect of higher genetically predicted IGF1 levels on LOY (Beta: -0.0003, 95% CI: -0.0004 to -0.0002, P=6.8×10^-5), with consistent findings across the sensitivity analyses (Extended data, Supplementary Table 7) 26 .

Examining the impact of IGF1 on cancer risks
We then applied the same genetic instrument for circulating IGF1 levels to examine the potential causal influences of IGF1 on various cancer risks in published large-scale cancer consortium data and also UKBB data, which included smaller numbers of cases but covered a wider range of cancers.
Colorectal and thyroid cancers are more common in disease models of GH-IGF1 axis overactivity 13 and therefore may be regarded as positive controls for this MR approach. Unfortunately, there were too few cases of thyroid cancer (N=375 in UKBB only) to provide meaningful MR estimates. However, we found highly consistent effects of IGF1 on higher risk for colorectal cancer across IVW and all sensitivity models (IVW model: OR = 1.126; 1.048-1.210, P=1.3×10^-3; N=5486 cases in UKBB only).
For prostate cancer in men, IGF1 levels had positive effects on cancer risk in IVW models both in consortium data (OR  Figure 3). However, in both datasets this association showed moderate heterogeneity between variants (I2 = 58% and 33% in consortia data and UKBB, respectively) and failed to reach significance in any sensitivity model (Extended data, Supplementary Among other cancers that have been previously reported as phenotypically associated with circulating IGF1 levels 6-12 , we found no genetic evidence linking circulating IGF1 levels to risks of endometrial, lung, or ovarian cancers. We further explored causal influences of IGF1 on 17 other cancers in UKBB and consortia data as hypothesis generating. Genetically higher IGF1 levels conferred higher risk of multiple myeloma (OR = 1.261 per 1 SD; 1.026-1.550, P=0.028) with consistent findings after MR Radial filtering. Higher IGF1 was also associated with lower risks of biliary tract (OR = 0.731; 0.559-0.956, P=0.022) and liver cancers (OR = 0.699; 0.522-0.937, P=0.017) both before and after MR radial filtering (Figure 3).

Multivariable MR models
IGF1 is a major driver of childhood growth and genetic associations with disease outcomes may therefore reflect lifetime IGF1 exposure. This is supported by our LD score regression analysis where genetic correlations with IGF1 levels were seen with anthropometric traits, including BMI (rg=-0.091; P=2.09×10^-5), body fat (rg=-0.191; P=4.50×10^-10) and height (females at age 10 and males at age 12: rg=0.157; P=3.34×10^-5). In addition, the MR analysis examining the effect of BMI on IGF1 levels showed a significant causal association (OR 0.714, 95% CI 0.612-0.816, P=0.016), suggesting that BMI can act as a confounder of the effect of IGF1 levels on cancer risk.
Consequently, we performed multivariable MR analyses to distinguish causal effects of adult circulating IGF1 levels from effects of adult height, as a summary indicator of childhood IGF1 exposure, and also adult BMI. In these multivariable models, genetically predicted IGF1 levels remained associated with higher risk of breast cancer (BMI-adjusted OR

Discussion
Here, we identified 831 independent genetic variants associated with circulating IGF1 levels, and used these variants to provide insights into the genetic regulation of IGF1 levels and the causal impact of IGF1 on early tumorigenesis and cancer risks. Our approach to GWAS sample inclusion and signal selection, using a combination of clumping and conditional analyses, led to a twofold increase in the number of associated genetic determinants. The resulting stronger genetic instrument for IGF1 allowed us to detect evidence to support causal associations of IGF1 with LOY, a marker of genomic instability, and with higher risks for colorectal and breast cancers, for which previous genetic studies had found inconclusive evidence 22-25 .
We showed that genetically predicted higher IGF1 levels were associated with higher risk of breast cancer, with similar effects on both ER+ and ER- subtypes, and consistent effects on any breast cancers in both BCAC and UKBB datasets. Recent MR studies using a smaller set of IGF1-associated variants found evidence for a role of IGF1 in ER+ but not ER- breast cancer 22,23 . There is previous evidence that IGF1 may promote early risk factors for breast cancer, such as higher mammographic density 30 , and also later stages of progression, such as proliferation and chemotherapy resistance of breast cancer cells 31-33 . Our findings linking IGF1 and LOY support its role in early tumorigenesis. While LOY is measured in leukocytes and exclusively in men, recent studies have demonstrated that it represents ubiquitous DNA repair mechanisms and cell regulatory processes, with relevance to cancers in various tissues, and in both sexes, including higher risk of breast cancer in women 28,29 .
Furthermore, previous genetic studies had suggested that any effect of IGF1 on breast cancer was restricted to ER+ subtypes 22,23 , a view supported by experimental evidence for crosstalk in cells between the estrogen and IGF1 signaling pathways 34 . Estrogen increases the expression of IGF receptors in breast cancer cells, thus increasing IGF1 mitogenic activity. By contrast, we showed that genetically predicted higher IGF1 levels conferred similar effects on both ER+ and ER- breast cancer subtypes, which suggests a mechanism, independent of estrogen receptors, by which IGF1 stimulates breast cancer formation.
Our finding for colorectal cancer is consistent with previous epidemiological studies 25,35,36 , including a phenotypic analysis in UKBB which showed that higher prediagnostic levels of circulating IGF1 were associated with higher risks of proximal, distal and rectal colorectal cancers, associations which persisted after controlling for other serologic factors (CRP, testosterone, SHBG, and HbA1c) 25 . Colonocytes express IGF1-receptors; these are frequently overexpressed in neoplastic cells, and monoclonal antibody blockade of these receptors inhibits cell proliferation.
Previous phenotypic studies have reported a role of the IGF axis in prostate cancer progression and mortality 37,38 . We found a genetic association between IGF1 and risk of prostate cancer in both PRACTICAL and UKBB. However, no association was seen in any sensitivity analysis that controlled for the wide observed heterogeneity between effects of individual variants. Heterogeneity might arise because some variants are affected by confounding factors, such as BMI, or reflect differing effects of specific pathways in IGF1 regulation. For example, a previous MR analysis using a single genetic instrument in the IGFBP3 region reported the opposite (i.e. protective) effect of IGF1 on prostate cancer progression and mortality 24 . IGFBP3 binds 99% of circulating IGF1 and hence may protect tissues from effects of bioavailable IGF1 and may even have separate biological effects.
We acknowledge a number of limitations in our work. We identified preliminary evidence suggesting the role of IGF1 in multiple myeloma, hepatoma and biliary tract cancers. However, these outcomes were only available in UKBB, the same data source that we used to derive the genetic instrument for IGF1 levels. This is a potential source of bias and these findings require confirmation in future studies. Unfortunately, data on IGFBP3 levels were unavailable in UKBB, so we were unable to distinguish independent effects of IGF1 and its major binding protein. However, in the independent EPIC-Norfolk study, our IGF1 genetic score explained a much smaller proportion of variance in circulating IGFBP3 (~2.6%) than IGF1 (~7.5%) levels. Therefore, the disease effects indicated by our IGF1 genetic score are predominantly mediated by IGF1 alone. The MR approach assumes a linear relationship between the exposure and outcome. Hence, we could not test for threshold effects. However, previous phenotypic studies have found no evidence for such non-linear relationships, e.g. with risk of colorectal cancer 25 .
Our findings may have potential implications for disease prevention and management. First, they support the further exploration of lifestyle approaches to avoid high IGF1 levels, for example by reducing total protein and dairy protein intakes 39,40 . Secondly, our data underline existing clinical approaches for cancer screening in patients with disorders of GH-IGF1 axis overactivity 13 , and may suggest extension of such protocols beyond colorectal cancer. Thirdly, our findings may encourage renewed interest in the use of anti-IGF1 therapies for cancer outcomes 41,42 .
In conclusion, our findings provide new insight into the genetic regulation of IGF1 levels, and demonstrate the likely causal role of IGF1 in early tumorigenesis and susceptibility to breast and colorectal cancers, with preliminary evidence for other cancers.

Phenotype preparation in the UK Biobank
The genome-wide association study (GWAS) for IGF1 levels was performed in the UK Biobank study (UKBB) 43,44 . Details of this study, including data collection and processing, are extensively described elsewhere [45][46][47] . Briefly, UKBB involves ~500,000 adult participants with information on genotypes and phenotypes, including the measurements of 34 biomarkers. Informed consent was provided by all participants. Study approval was received from the National Research Ethics Service Committee North West-Haydock and all study procedures were performed according to the World Medical Association Declaration of Helsinki ethical principles for medical research. Here, we used genetic data from the UKBB 'v3' release 45 , containing the full set of Haplotype Reference Consortium (HRC) and 1000 Genomes imputed variants 48 .
Genetic data and IGF1 levels were available in up to 461,554 UKBB participants, with mean age 56.5 years (range: 37-73) and mean serum IGF1 concentration 21.4 nmol/L (range: 1.445-127.766 nmol/L). In addition to the quality control criteria applied by UKBB, we performed further procedures. In order to study a sample of relatively homogeneous ancestry, we restricted our GWAS analysis to UKBB participants of 'white European' ancestry, identified using K-means clustering of the first four principal components derived from genotype information. Individuals who clustered in this group but did not self-identify as white Europeans in the online questionnaire were excluded. Individuals with extreme levels of IGF1 (beyond 5 SD from the mean value) were also excluded. After these exclusions, data were available on up to 428,525 UKBB individuals, with a mean IGF1 concentration of 21.3 nmol/L (range: 1.445-42.449 nmol/L).
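The exclusion of extreme IGF1 values can be sketched as follows. This is an illustrative example only: the function name is hypothetical, the population standard deviation is an assumed choice (the text does not say which estimator was used), and the data are made up.

```python
from statistics import mean, pstdev


def exclude_extremes(values, n_sd=5.0):
    """Keep values lying within n_sd standard deviations of the mean."""
    mu = mean(values)
    sd = pstdev(values)  # assumption: population SD; the sample SD would behave similarly
    return [v for v in values if abs(v - mu) <= n_sd * sd]


# Made-up measurements: 99 typical values and one implausibly high value.
levels = [20.0] * 99 + [1000.0]
kept = exclude_extremes(levels)
```

Note that extreme values inflate the mean and SD they are judged against, so a single-pass rule like this one is conservative.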

Genetic discovery analysis in the UK Biobank
The association between genetic variants and circulating IGF1 concentrations was examined using linear mixed models, implemented in BOLT-LMM v2.3.4, as this approach enables robust control for population stratification and cryptic relatedness 49 . We performed three sets of association analyses: in male and female data separately, adjusted for age, and in a sex-combined dataset, adjusted for both age and sex. Statistically independent signals for IGF1 were identified using 1 Mb distance-based clumping of all variants with P-value < 5×10^-8, MAF > 0.1% and an imputation quality score > 0.5. Lead signals were defined as independent signals with the lowest P-value, which were uncorrelated with other independent signals in each LD block (r^2 < 0.05). These were supplemented with additional independent signals ('secondary signals'), which were identified using approximate conditional analysis implemented in genome-wide complex trait analysis (GCTA) 50 . Conditional analysis considers LD between variants in a reference dataset, and selects additional signals independently associated with circulating IGF1 concentrations when conditioning on the effect of the lead signal at that locus. Secondary signals met the following criteria: (1) P < 5×10^-8 in both pre- and post-conditional analyses, (2) uncorrelated with another signal (r^2 < 0.05) and (3) a beta estimate that changed by < 20% between pre- and post-conditional models 51 .
To assess the genetic correlation in circulating IGF1 concentrations between men and women, linkage disequilibrium score regression (LDSC) was applied to the summary statistics from the sex-stratified GWAS models 52 . The same approach was used to examine shared genetic aetiology between IGF1 and other complex health and behavioural traits using LD Hub, a centralized database and web interface which contains publicly available GWAS summary statistics for 245 traits of individuals with European ancestry 53 .
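The distance-based clumping step described above can be sketched as a greedy procedure. This is a simplified illustration, not the pipeline used in the study: the `Variant` type, the function name and the reading of "1 Mb" as a distance of up to 1 Mb on either side of a lead variant are all assumptions, and the LD and conditional-analysis steps are omitted.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    rsid: str
    chrom: str
    pos: int
    p_value: float


def distance_clump(variants, window_bp=1_000_000, p_threshold=5e-8):
    """Greedily pick the most significant variant, then skip neighbours within window_bp."""
    significant = sorted(
        (v for v in variants if v.p_value < p_threshold),
        key=lambda v: v.p_value,
    )
    leads = []
    for v in significant:
        # Keep v only if no already-selected lead sits on the same chromosome within the window.
        if all(v.chrom != lead.chrom or abs(v.pos - lead.pos) > window_bp for lead in leads):
            leads.append(v)
    return leads


# Made-up variants for illustration.
demo = [
    Variant("rsA", "1", 1_000_000, 1e-20),
    Variant("rsB", "1", 1_500_000, 1e-10),  # within 1 Mb of rsA, so absorbed by it
    Variant("rsC", "1", 3_000_000, 1e-9),
    Variant("rsD", "2", 1_200_000, 1e-12),
    Variant("rsE", "2", 5_000_000, 1e-3),   # not genome-wide significant
]
leads = distance_clump(demo)
```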
Replication of GWAS identified signals was performed in an independent dataset from the European Prospective Investigation into Cancer and Nutrition (EPIC)-Norfolk cohort, a population-based prospective study 27 . Data on IGF1 levels were available for 7544 participants with genotype data only on autosomal variants. We examined consistency in the directions of association between individual signals and IGF1 levels, as well as their statistical significance. Where a particular signal was not present in EPIC-Norfolk, we searched the UKBB white European dataset for proxies (within 1 Mb and r 2 > 0.5) and chose the proxy with the highest r 2 value (Extended data, Supplementary Table 11) 26 . Additionally, regression models were conducted on ventiles of the combined genetic score for higher IGF1 levels, and were controlled for assay batch, age, sex, BMI, height and ten genetic principal components as covariates. Allelic scoring was performed using the --score command in PLINK 1.9, weighted for effect estimates from the UKBB discovery analysis.
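The weighted allelic score and its division into ventiles can be sketched in a few lines. This is an illustration under stated assumptions: the function names, variant identifiers, weights and dosages are hypothetical, the score is a plain weighted sum, and PLINK's own output scaling and missing-genotype handling may differ from this sketch.

```python
def weighted_score(dosages, weights):
    """Sum of (count of IGF1-increasing alleles x discovery effect estimate)."""
    shared = dosages.keys() & weights.keys()
    return sum(dosages[v] * weights[v] for v in shared)


def assign_quantile_bins(scores, n_bins=20):
    """Rank-based bins: 0 holds the lowest scores, n_bins - 1 the highest."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    bins = [0] * n
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // n
    return bins


# Hypothetical variant IDs, weights and dosages, for illustration only.
weights = {"rs_a": 0.10, "rs_b": 0.05, "rs_c": 0.20}
person = {"rs_a": 2, "rs_b": 1, "rs_c": 0}
score = weighted_score(person, weights)  # 2*0.10 + 1*0.05 + 0*0.20
```

With `n_bins=20` the bins correspond to ventiles, so the top and bottom 10% of the score are the two highest and two lowest bins respectively.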

Identifying putatively functional genes
We applied the Ensembl Variant Effect Predictor (VEP) 54 to identify predicted deleterious variants correlated (r 2 >0.8) with any IGF1 signal. The identified variants, including missense, stop-gained and splice site disrupting variants, were then classified in three categories, high, medium and low, based on the impact of a variant on the structure and function of a protein using VEP, SIFT and PolyPhen 55,56 . 'High impact' variants were defined as stop gained, frameshift or splice site disrupting by VEP. 'Medium impact' variants were defined as: missense variants with moderate impact by VEP and deleterious by SIFT and at least possibly damaging by PolyPhen. 'Low impact' variants were defined as: missense variants with moderate impact by VEP and tolerated and/or benign by PolyPhen (Extended data, Supplementary Table 3) 26 . LD was calculated from best guess genotypes for 1000 Genomes Phase 3 and HRC imputed variants using PLINK v1.9, based on 25,000 white British, unrelated participants from the UK Biobank. UCSC LiftOver was used to convert variant locations from b37 to b38.
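The three impact categories can be expressed as a small decision rule. The sketch below is one reading of the definitions above and not the authors' code: the function name is hypothetical, the consequence and prediction labels follow the terms commonly reported by VEP, SIFT and PolyPhen, and the 'low impact' rule interprets "tolerated and/or benign" as SIFT tolerated or PolyPhen benign, which is an assumption.

```python
HIGH_IMPACT_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}


def classify_impact(consequence, sift=None, polyphen=None):
    """Return 'high', 'medium', 'low', or None when no category applies."""
    if consequence in HIGH_IMPACT_CONSEQUENCES:
        return "high"
    if consequence == "missense_variant":
        damaging = polyphen in {"possibly_damaging", "probably_damaging"}
        if sift == "deleterious" and damaging:
            return "medium"
        # Assumed reading of "tolerated and/or benign".
        if sift == "tolerated" or polyphen == "benign":
            return "low"
    return None
```

For example, `classify_impact("missense_variant", "deleterious", "possibly_damaging")` falls in the medium category, while a synonymous variant is left unclassified.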

Pathway analysis
We annotated GWAS signals located within 300 kb of genes known to be involved in the regulation of IGF1, including three specific pathways (IGF1 secretion, growth hormone secretion and IGF1 serum balance), using KEGG, Reactome and manual curation. We then performed gene set enrichment analysis (GSEA) using STRING to identify pathways highlighted by the genes mapped by IGF1-associated non-synonymous variants (maximum distance 500 kb).
Pathway associations were tested more broadly using Meta-analysis Gene-set Enrichment of Variant Associations (MAGENTA) software, which implements a GSEA-based approach 57 . MAGENTA maps each gene in the genome to GWAS variants within upstream and downstream limits of 110 kb and 40 kb, respectively, and assigns each gene a score according to the lowest variant P-value for association with IGF1. Genes in the HLA region were excluded due to their strong and complex linkage disequilibrium patterns and high gene density. The gene score was then corrected for SNP density, LD-related properties and gene size in a regression model. In total, 3216 pathways from Gene Ontology, PANTHER, KEGG and Ingenuity were tested for enrichment of genes associated with IGF1 levels, using 10,000 permutations. Individual pathways were considered significant at FDR<0.05 for either the 75th or 95th percentile for enrichment, and 90 pathways met this threshold (Extended data, Supplementary Table 6) 26 .
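The variant-to-gene mapping described above can be illustrated with a short sketch. This is not MAGENTA's implementation: the function name and input layout are hypothetical, variants are assumed to lie on the same chromosome as the gene, and the subsequent regression correction for SNP density, LD and gene size is omitted.

```python
def gene_score(gene_start, gene_end, strand, variants,
               upstream=110_000, downstream=40_000):
    """Lowest P-value among variants in the gene's extended window, or None if empty.

    variants: iterable of (position, p_value) pairs on the gene's chromosome.
    """
    if strand == "+":
        lo, hi = gene_start - upstream, gene_end + downstream
    else:
        # On the reverse strand, "upstream" lies at higher coordinates.
        lo, hi = gene_start - downstream, gene_end + upstream
    p_values = [p for pos, p in variants if lo <= pos <= hi]
    return min(p_values) if p_values else None


# Made-up variants around a hypothetical gene spanning 1,000,000-1,050,000.
demo_variants = [(900_000, 1e-9), (1_020_000, 1e-3), (1_095_000, 1e-4)]
score_plus = gene_score(1_000_000, 1_050_000, "+", demo_variants)
score_minus = gene_score(1_000_000, 1_050_000, "-", demo_variants)
```

The two strands give different scores for the same variants because the asymmetric window flips with the direction of transcription.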

Mendelian randomization: instrumental variable selection
MR analysis was applied to examine the likelihood of a causal effect of IGF1 on genomic instability and cancer risks. In this approach, genetic variants that are significantly associated with an exposure of interest are used as instrumental variables (IVs) to test the causality of that exposure on the outcome of interest. For a genetic variant to be a reliable instrument, the following assumptions should be met: (1) the genetic instrument is associated with the exposure of interest, (2) the genetic instrument should not be associated with any other competing risk factor that is a confounder, and (3) the genetic instrument should not be associated with the outcome, except via the causal pathway that includes the exposure of interest 19 .
We used the 831 IGF1 genome-wide significant signals as IVs. If a particular signal was not present in the outcome GWAS, we searched the UKBB white European dataset for proxies (within 1 Mb and r^2 > 0.5) and chose the variant with the highest r^2 value (Extended data, Supplementary Table 11) 26 . Genotypes at all variants were aligned to designate the IGF1-increasing alleles as the effect alleles.
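Proxy selection and allele alignment can be sketched as below. The function names and tuple layout are hypothetical, candidates are assumed to lie on the same chromosome as the lead signal and to be present in the outcome GWAS, and the alignment step assumes that both effect estimates are reported for the same effect allele.

```python
def pick_proxy(lead_pos, candidates, max_dist=1_000_000, min_r2=0.5):
    """Return the eligible (rsid, pos, r2) candidate with the highest r2, or None."""
    eligible = [
        c for c in candidates
        if abs(c[1] - lead_pos) <= max_dist and c[2] > min_r2
    ]
    return max(eligible, key=lambda c: c[2]) if eligible else None


def align_to_increasing_allele(effect_allele, other_allele, beta_exposure, beta_outcome):
    """Flip alleles and signs so that the effect allele raises the exposure."""
    if beta_exposure < 0:
        return other_allele, effect_allele, -beta_exposure, -beta_outcome
    return effect_allele, other_allele, beta_exposure, beta_outcome


# Made-up candidates for a lead signal at position 5,000,000.
candidates = [("rs1", 5_200_000, 0.60), ("rs2", 4_900_000, 0.92), ("rs3", 7_000_000, 0.99)]
proxy = pick_proxy(5_000_000, candidates)  # rs3 is too far away; rs2 has the highest r2
aligned = align_to_increasing_allele("A", "G", -0.03, 0.01)
```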

MR: outcome data
Mosaic LOY in leucocyte DNA in men, an established marker of genomic instability, was used in a bidirectional MR analysis, thus examining the effect of IGF1 on LOY, and vice versa. This was performed using UKBB data, where the presence of LOY in individuals was identified using the pseudoautosomal region (PAR)-LOY method 58 . This approach uses the diploid nature of the PAR to detect LOY based on the differences between maternal (X PAR) and paternal (Y PAR) allelic intensities at heterozygous sites. In the presence of LOY, Y PAR intensities are lower compared to X PAR intensities, and positive values for PAR-LOY indicate more loss of chromosome Y. We used the genetic signals for LOY identified by Thompson et al, 2019 using the same PAR-LOY method 28 as genetic instruments to assess the causal inference of LOY on IGF1. The replication analysis was performed using the data generated from the customer base of 23andMe Inc., a consumer genetics company. This was done using a continuous measure of Y chromosome relative intensity as the outcome where LOY was quantified on the basis of median genotyping intensity over the non-PAR of the Y chromosome (mLRR-Y) using the protocol described previously 29 . The data were available for 653,019 male participants of European ancestry. Unlike in PAR-LOY, negative values for mLRR-Y indicate more loss of chromosome Y.
According to the previously reported phenotypic evidence, we considered the following primary cancer outcomes: colorectal, thyroid, breast, prostate, ovarian, melanoma and lung cancer 14 . There is a potential for bias in MR where IVs and outcomes are drawn from the same sample. Therefore, where available we used independent cancer GWAS datasets from consortia without participant overlap with UKBB 59 and chose the largest publicly available GWAS dataset for each outcome. These include: the BCAC consortium for breast cancer (any cancer, and estrogen-receptor status specific cancers) 60 , ECAC for endometrial cancer 61 , OCAC for ovarian cancer 62 , ILCCO for lung cancer 63 and PRACTICAL for prostate cancer 64 . Information on the data sources is provided in Extended data, Supplementary Table 10 26 . We additionally assessed the causal effect of IGF1 levels on cancer outcomes using data on 22 site-specific cancers available in UKBB for 367,586 unrelated white Europeans (Extended data, Supplementary Table 9) 26 . Cancer data were available in UKBB from national cancer registries, electronic health records, hospital episode statistics data, death certification data, and self-reported information validated through an interview. For sex-specific cancers (breast, cervical, ovarian, prostate and uterine cancer), the data were available for individuals of the relevant sex only (198,838 women and 168,748 men).

MR models
Our primary MR model was the inverse-variance weighted (IVW) model, which offers the most statistical power 65 . However, it does not correct for heterogeneity in outcome risk estimates between individual variants 66 , which appeared to be significant for several cancer outcomes (Extended data, Supplementary Tables 8 and 9) 26 . We therefore applied a number of sensitivity MR methods that better account for heterogeneity 67 . MR Egger was used to identify and correct for unbalanced heterogeneity ('horizontal pleiotropy'), indicated by a significant Egger intercept (P<0.05) 68 . To correct for balanced heterogeneity, we used the weighted median (WM) and penalised weighted median (PWM) MR models 69 . We also applied the MR Radial method which excludes variants from each model if they are recognized as outliers 70 . Finally, we used Steiger filtering to assess for potential reverse causality (i.e. variants with stronger association with the outcome than with the exposure) 71 . This identified only one genetic variant (with a larger effect on endometrial cancer than on IGF1 levels), and excluding this variant did not change the findings. I2 statistics were used to assess heterogeneity in outcome risk estimates between individual SNPs.
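The IVW estimate, the heterogeneity statistics and the MR Egger intercept can be illustrated with per-variant summary statistics. The sketch below is a textbook-style simplification and not the software used in the study: it uses first-order weights and a fixed-effect standard error, the function names are hypothetical, the data are made up, and variants are assumed to be aligned to the exposure-increasing allele.

```python
from math import sqrt


def ivw(bx, by, se_y):
    """Fixed-effect IVW estimate, its SE, Cochran's Q and I^2 from summary statistics."""
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]
    ratios = [y / x for x, y in zip(bx, by)]  # per-variant Wald ratios
    total = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total
    se = sqrt(1.0 / total)
    q = sum(w * (r - beta) ** 2 for w, r in zip(weights, ratios))
    df = len(bx) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return beta, se, q, i2


def mr_egger(bx, by, se_y):
    """Weighted regression of by on bx with an intercept (weights 1/se_y^2)."""
    w = [1.0 / s ** 2 for s in se_y]
    sw = sum(w)
    mean_x = sum(wi * x for wi, x in zip(w, bx)) / sw
    mean_y = sum(wi * y for wi, y in zip(w, by)) / sw
    sxx = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, bx))
    sxy = sum(wi * (x - mean_x) * (y - mean_y) for wi, x, y in zip(w, bx, by))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope  # (intercept, slope)


# Made-up summary statistics: a true effect of 0.5, with and without a constant
# pleiotropic offset of 0.02 on the outcome.
bx = [0.10, 0.20, 0.30, 0.40]
se_y = [0.01] * 4
by_valid = [0.5 * x for x in bx]
by_pleio = [0.5 * x + 0.02 for x in bx]
beta_ivw, se_ivw, q_stat, i2 = ivw(bx, by_valid, se_y)
intercept, slope = mr_egger(bx, by_pleio, se_y)
```

In the second toy dataset the offset is absorbed by the Egger intercept while the slope still recovers the underlying effect, which is the behaviour the sensitivity analysis relies on.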

Multivariable MR
Previous research on IGF1 showed its association with BMI and height: IGF1 is a major driver of childhood growth, thus adult data on height and BMI may represent a convenient indicator of childhood IGF1 exposure 72 . In order to confirm the effect of BMI on the levels of IGF1, we performed the primary MR analysis. BMI instruments were derived from the European sex-combined GIANT consortium dataset 73 . After confirming the association, we investigated whether and to what extent BMI and height mediate the relationship between IGF1 and outcomes of interest using multivariable MR. To include BMI and height as covariates, we performed the lookup of IGF1 SNPs in BMI and height data available in the UKBB. The effects of each genetic instrument on multiple exposures are included as covariates, which after the adjustment allows the measurement of the independent, direct effect of IGF-1 levels on cancer outcomes not mediated by BMI and height 74 .
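One common way to implement multivariable MR is a weighted regression, without an intercept, of the variant-outcome effects on the variant-exposure effects for all exposures jointly. The sketch below illustrates that idea under stated assumptions and is not necessarily the implementation used here: the function names are hypothetical, the weights are taken as 1/se_y^2, standard errors of the estimates are omitted, and the data are made up.

```python
def solve_linear_system(a, c):
    """Solve a @ x = c by Gaussian elimination with partial pivoting."""
    n = len(c)
    m = [row[:] + [ci] for row, ci in zip(a, c)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def multivariable_ivw(exposure_betas, by, se_y):
    """Direct effect of each exposure; exposure_betas holds one row per variant."""
    k = len(exposure_betas[0])
    w = [1.0 / s ** 2 for s in se_y]
    # Weighted normal equations: (X' W X) theta = X' W y.
    xtwx = [[sum(wi * row[i] * row[j] for wi, row in zip(w, exposure_betas))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(wi * row[i] * y for wi, row, y in zip(w, exposure_betas, by))
            for i in range(k)]
    return solve_linear_system(xtwx, xtwy)


# Made-up effects of five variants on (IGF1, BMI, height) and assumed direct effects.
rows = [[0.10, 0.02, 0.00], [0.20, -0.01, 0.03], [0.15, 0.05, -0.02],
        [0.05, 0.00, 0.04], [0.30, 0.01, 0.01]]
true_effects = [0.5, -0.2, 0.1]
by = [sum(b * t for b, t in zip(row, true_effects)) for row in rows]
estimates = multivariable_ivw(rows, by, [0.01] * 5)
```

The first element of `estimates` is the effect of IGF1 on the outcome holding the genetically predicted BMI and height constant.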
Results presented in the main text are expressed per +1SD increase in IGF1 levels (equivalent to about 5.5 nmol/l in UKBB). Values in Extended data, Supplementary Tables 7-9 26 are raw data, per unit IGF1.
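The rescaling between per-unit and per-SD estimates can be illustrated as follows. This sketch assumes that the supplementary values are log odds ratios per 1 nmol/l of IGF1, which the text does not state explicitly; the function names are hypothetical and the SD of 5.5 nmol/l is the approximate value quoted above.

```python
from math import exp, log

SD_IGF1_NMOL_L = 5.5  # approximate SD of IGF1 in UKBB, as quoted in the text


def per_unit_to_per_sd(log_or_per_unit, sd=SD_IGF1_NMOL_L):
    """Odds ratio per +1 SD from a log odds ratio per 1 nmol/l (assumed scale)."""
    return exp(log_or_per_unit * sd)


def per_sd_to_per_unit(or_per_sd, sd=SD_IGF1_NMOL_L):
    """Log odds ratio per 1 nmol/l from an odds ratio per +1 SD."""
    return log(or_per_sd) / sd


# Round trip using the LOY odds ratio reported in the main text.
log_or_unit = per_sd_to_per_unit(1.038)
or_sd = per_unit_to_per_sd(log_or_unit)
```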

Data availability
Underlying data
Access to UK Biobank data is available to registered researchers worldwide from across academia and industry, without the need for collaboration, to perform health-related research that is in the public interest. The IGF1 data from UKBB are available in the Discovery dataset: Data-Field 30772.
Similarly, EPIC-Norfolk (Replication study) data are available by application. To request data from EPIC-Norfolk, complete the Data Request Form; the Data Dictionary lists the variable names collected by EPIC-Norfolk. To contact the EPIC-Norfolk team, please email epic-norfolk@mrc-epid.cam.ac.uk.
The data from 23andMe on loss of chromosome Y (LOY) used in our study are provided in the Extended data, Supplementary Table 12. Researchers wishing to access the full LOY dataset should follow the steps described on the 23andMe webpage, where they will be asked to submit a research proposal summary.