Genome-wide association study of vegetarianism in UK Biobank identifies association with VRK2 [version 1; peer review: 3 approved with reservations]

Prospective studies have observed differences in risks for several health outcomes when comparing meat-eaters and vegetarians, but the mechanisms underlying these differences remain uncertain. Identifying genetic factors related to vegetarianism may be valuable for assessing causality. We report a genome-wide association study (GWAS) of vegetarianism in 367,198 participants from UK Biobank. We identified one locus, rs10189138, near the vaccinia related kinase 2 (VRK2) gene, significantly associated with vegetarianism (β=0.153, p=3×10⁻⁸). The associations between rs10189138 and 40 traits were estimated, and the rs10189138 T allele (MAF=0.12) was found to be significantly associated with greater height after controlling the false discovery rate (FDR). Correlations between genetically predicted vegetarianism and 855 other genetically predicted traits were also estimated, and vegetarianism had significant positive genetic correlations with fluid intelligence and age at menarche after controlling the FDR. Future research in an independent sample is needed to see if this GWAS result can be replicated.


Introduction
Vegetarian diets, defined by abstention from consuming meat, meat products, poultry and fish, have been gaining popularity in Western countries. Previous prospective cohort studies have reported lower risks for vegetarians compared with meat-eaters for several conditions, including obesity 1 and ischemic heart disease 2, but higher risks have been reported for haemorrhagic stroke 2 and some fractures 3.
Understanding the factors that contribute to a person's decision to follow a vegetarian diet could be important for understanding the mechanisms that underlie differences in disease risks. Further, identifying genetic factors related to this behavioural trait may aid in assessing causality through Mendelian randomization methods. The genetic contribution to vegetarianism is largely unknown because, until recently, large studies of diet and genetic variants have been lacking. However, previous studies have reported moderate to strong heritability for dietary patterns, food preferences, and appetite 4. We aimed to identify genetic loci associated with vegetarianism through a genome-wide association study (GWAS) using the UK Biobank cohort.

UK Biobank data
The UK Biobank is a prospective cohort of approximately 500,000 people aged 40-69 years who were recruited between 2006 and 2010 across the United Kingdom. A description of the study protocol and design can be found online. Briefly, participants were identified from National Health Service registers and lived within 25 miles of one of the 22 assessment centres. At recruitment, participants completed a touchscreen questionnaire, had physical measurements and blood samples taken, and provided informed consent. Details of the blood sample assay methods and quality procedures can be found online. The study was approved by the North West Multi-Centre Research Ethics Committee (reference number 06/MRE08/65).
We used information from the touchscreen 'food frequency' questionnaire (FFQ) to classify participants as either vegetarian or non-vegetarian. The FFQ included 29 questions on the frequency of consumption of foods, with answers ranging from 'never' to 'once or more daily', and has been shown to be highly reliable for meat and fish intake 5. Participants were defined as vegetarian if they answered 'never' to all questions related to the consumption of meat (including processed meat, beef, lamb or mutton, pork, and chicken, turkey or other poultry) and fish (including oily fish and other fish). Participants who did not answer the relevant questions were excluded (n=11,557).
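The classification rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the question keys in `MEAT_FISH_QUESTIONS` and the answer strings are hypothetical placeholders for the corresponding UK Biobank touchscreen fields, and we assume that a missing answer to any one of these questions leads to exclusion.

```python
from typing import Optional

# Hypothetical keys for the meat and fish FFQ items listed in the text.
MEAT_FISH_QUESTIONS = (
    "processed_meat", "beef", "lamb_mutton", "pork", "poultry",
    "oily_fish", "other_fish",
)


def classify_vegetarian(answers: dict) -> Optional[bool]:
    """Return True (vegetarian), False (non-vegetarian) or None (excluded).

    Vegetarian only if every meat and fish question is answered 'never'.
    Assumption: a missing answer to any of these questions means exclusion.
    """
    responses = [answers.get(q) for q in MEAT_FISH_QUESTIONS]
    if any(r is None for r in responses):
        return None  # did not answer all relevant questions
    return all(r == "never" for r in responses)
```

For example, `classify_vegetarian({q: "never" for q in MEAT_FISH_QUESTIONS})` returns `True`, while changing any single answer to a non-'never' frequency returns `False`.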
Participants recruited from April 2009 onwards were additionally invited to complete the Oxford WebQ, a 24-hour dietary assessment tool. Participants who provided an email address were invited to complete the WebQ up to four additional times between February 2011 and June 2012. We excluded WebQ records with energy intakes outside pre-specified sex-specific limits (<3,349 kJ or >16,747 kJ for men and <2,093 kJ or >14,654 kJ for women; n=9,157) or that were non-representative due to illness or fasting (n=2,971), and WebQ data were used only for participants who completed three or more questionnaires (n=57,915). Estimated mean weekly intakes of total, red, and processed meat, and poultry were calculated by combining responses from both the FFQ and WebQ.
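The WebQ record filters described above can be sketched like this. The energy limits are the sex-specific values quoted in the text; the `WebQRecord` layout is a hypothetical stand-in for the real data, and applying the three-questionnaire rule after dropping invalid records is our assumption, since the text does not state the order.

```python
from collections import Counter
from dataclasses import dataclass

# Plausible energy-intake limits (kJ per day) quoted in the text, by sex.
ENERGY_LIMITS_KJ = {"male": (3349, 16747), "female": (2093, 14654)}


@dataclass
class WebQRecord:  # hypothetical record layout, for illustration only
    participant_id: int
    sex: str
    energy_kj: float
    non_representative: bool  # e.g. a day affected by illness or fasting


def valid_record(rec: WebQRecord) -> bool:
    """Keep records within the sex-specific limits that are representative."""
    low, high = ENERGY_LIMITS_KJ[rec.sex]
    return low <= rec.energy_kj <= high and not rec.non_representative


def usable_records(records, min_records=3):
    """Drop invalid records, then keep participants with >= min_records left."""
    kept = [r for r in records if valid_record(r)]
    counts = Counter(r.participant_id for r in kept)
    return [r for r in kept if counts[r.participant_id] >= min_records]
```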
The additive association between the binary vegetarian phenotype and each SNP was tested using a logistic regression model, adjusted for the first twenty principal components, genotype array, sex, age, and age-squared.
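To illustrate what an additive logistic model means here, the sketch below codes genotype as the number of effect alleles (0, 1 or 2) and fits a logistic regression by Newton-Raphson on simulated toy data. It is not the authors' pipeline: the paper does not name its GWAS software, the real model also adjusts for twenty principal components, genotype array, sex, age and age-squared, and the "true" coefficients in the simulation are hypothetical values chosen only so the fit can be checked.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(x_rows, y, max_iter=50, tol=1e-8):
    """Logistic regression by Newton-Raphson (the same updates as IRLS).

    An intercept column is prepended to each covariate vector.
    Returns [intercept, b1, b2, ...] on the log-odds scale.
    """
    X = [[1.0] + [float(v) for v in row] for row in x_rows]
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p                      # score vector X'(y - mu)
        hess = [[0.0] * p for _ in range(p)]  # information matrix X'WX
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def allele_dosage(rng, maf):
    """Additive coding: count of effect alleles (0, 1 or 2) under Hardy-Weinberg."""
    return int(rng.random() < maf) + int(rng.random() < maf)


# Simulated toy data with a hypothetical true model (not UK Biobank data).
rng = random.Random(1)
rows, y = [], []
for _ in range(5000):
    g = allele_dosage(rng, 0.12)
    age = rng.gauss(0.0, 1.0)  # stand-in for one standardised covariate
    eta = -1.0 + 0.8 * g + 0.3 * age
    y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)
    rows.append((g, age))

beta = fit_logistic(rows, y)
print(f"per-allele log-odds = {beta[1]:.3f}, OR = {math.exp(beta[1]):.3f}")
```

The coefficient on the dosage column is the per-allele change in log-odds, which is the quantity reported as β in the abstract.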
Linear regression models, with the same covariates as listed above, were used to estimate the per-allele difference in selected baseline characteristics, blood biomarker concentrations, and meat intakes (in non-vegetarians). These estimates were used to calculate the percent change per allele relative to the mean value in participants with the rs10189138 GG genotype. A false discovery rate (FDR) controlling procedure (Benjamini-Hochberg) was used to account for multiple testing.
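One plausible reading of the percent-change calculation is the per-allele coefficient divided by the trait mean in the GG group. The sketch below assumes rs10189138 is a biallelic T/G variant, so that GG carries zero copies of the T (effect) allele; it is an illustration, not the authors' code.

```python
from statistics import mean


def percent_change_per_allele(beta, values, dosages):
    """Express a per-allele regression coefficient as a % of the GG-group mean.

    values  : trait measurements (e.g. height in cm)
    dosages : copies of the effect (T) allele per participant; 0 = GG genotype
    """
    mean_gg = mean(v for v, d in zip(values, dosages) if d == 0)
    return 100.0 * beta / mean_gg
```

For instance, a coefficient of 0.5 units per allele with a GG mean of 10 units corresponds to a 5% difference per allele.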
We used linkage disequilibrium (LD) score regression to estimate genetic correlation between genetically predicted vegetarianism and publicly available GWAS summary-level results for 855 heritable traits on LDHub 6 .
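For readers unfamiliar with the method, the LD score regression models as usually formulated are shown below; the paper does not report its exact settings. Here $\chi^2_j$ and $z_{1j}, z_{2j}$ are the association statistics for SNP $j$, $\ell_j$ its LD score, $M$ the number of SNPs, $N$ (or $N_1, N_2$) the GWAS sample sizes, $a$ a confounding term, $N_s$ the number of overlapping participants, $\rho$ the phenotypic correlation among them, and $\rho_g$ the genetic covariance. Sample overlap enters only through the intercept of the cross-trait model.

```latex
\mathbb{E}\left[\chi^2_j \mid \ell_j\right] = \frac{N h^2}{M}\,\ell_j + N a + 1

\mathbb{E}\left[z_{1j} z_{2j} \mid \ell_j\right]
  = \frac{\sqrt{N_1 N_2}\,\rho_g}{M}\,\ell_j + \frac{\rho\, N_s}{\sqrt{N_1 N_2}}

r_g = \frac{\rho_g}{\sqrt{h_1^2\, h_2^2}}
```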
VRK2 encodes a serine/threonine kinase that is important in several signal transduction cascades 7. Previous studies have reported associations at this locus with differences in the overall intake of beef, lamb/mutton, pork, and processed meat 8; higher risk of schizophrenia, major depressive disorder and genetic generalized epilepsy 9; and longer sleep duration 7.
Using linear regression models, we found that the rs10189138 T allele was significantly associated with greater height (0.10 cm per allele; 95% CI: 0.05 to 0.14) (Figure 1).
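As a rough consistency check, the standard error, z-statistic and two-sided p-value can be back-calculated from a reported 95% confidence interval, assuming a symmetric normal-theory interval. Because the published interval is rounded (0.05 to 0.14 is not exactly centred on 0.10), the result is only approximate and is not a figure reported by the authors.

```python
import math


def z_and_p_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Approximate SE, z and two-sided p from a symmetric normal 95% CI."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


se, z, p = z_and_p_from_ci(0.10, 0.05, 0.14)  # height, cm per T allele
print(f"SE = {se:.4f} cm, z = {z:.2f}, p = {p:.1e}")
```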
Using LD score regression to estimate genetic correlation, we found that fluid intelligence (rg=0.12; 95% CI 0.06 to 0.18) and age at menarche (rg=0.09; 95% CI 0.04 to 0.14) were significantly correlated with vegetarianism (Figure 2) after controlling the FDR. These results are in keeping with previous epidemiological evidence showing a significant association between vegetarianism and higher IQ in childhood and adulthood, as well as higher attainment of academic qualifications 10. Additionally, a small prospective study of American schoolgirls previously reported that menarche occurred 6 months later among vegetarians than among non-vegetarians 11.
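The Benjamini-Hochberg step-up procedure named in the methods can be sketched as below. The FDR level q=0.05 is our assumption; the text does not state which level was used.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return booleans marking which tests are rejected at FDR level q.

    Step-up rule: find the largest rank k with p_(k) <= (k / m) * q and
    reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected
```

With p-values `[0.001, 0.02, 0.3, 0.04]`, the two smallest meet their thresholds (0.0125 and 0.025) while 0.04 exceeds 0.0375, so only the first two tests are rejected. Because it is a step-up rule, a p-value that fails its own threshold can still be rejected if a larger p-value passes.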

Conclusion
In summary, we report one novel SNP, rs10189138, near VRK2, that is significantly associated with 16.5% higher odds per T allele of following a vegetarian diet. Future research in an independent sample is needed to see if this result can be replicated.
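The 16.5% figure follows from the logistic regression coefficient in the abstract: exponentiating the per-allele log-odds gives the odds ratio. It is a change in odds, not in probability, although for a rare outcome (roughly 5,000 vegetarians among about 367,000 participants) the odds ratio approximates the risk ratio.

```python
import math

beta = 0.153                     # per-allele log-odds reported in the abstract
odds_ratio = math.exp(beta)      # logistic regression: OR = exp(beta)
pct_higher_odds = 100 * (odds_ratio - 1)
print(f"OR = {odds_ratio:.3f}; {pct_higher_odds:.1f}% higher odds per T allele")
```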

Data availability
We used data from the UK Biobank resource under Application Number 3248 for this work. All bona fide researchers can apply to use the UK Biobank resource for health-related research that is in the public interest. Further information on the application process is available from the UK Biobank website (https://www.ukbiobank.ac.uk/register-apply/).

Figure 2. Genetic correlations of vegetarianism with selected heritable traits¹. ¹ Results are shown for 21 of the 855 heritable traits tested, where p<0.01. ² Denotes correlations that were statistically significant after applying a False Discovery Rate (FDR) controlling procedure to account for multiple testing.
In addition to their comments, my comments are as follows: 1) The title of the article must clearly state that this is a preliminary finding.
2) The introduction should be longer. An expanded review of previous research and of the potential substantial value added by the authors' research would provide context for the hypothesis. The introduction is not yet consistent.
3) The Methods section must contain all details (these might be moved to supplementary material if too long). Please state clearly the methods you have used, including QC procedures, software, adjustment for principal components, etc. This would give the reviewers and readers of the paper an understanding of its strengths and limitations.
Please describe the case and control selection. The strength of this paper is in combining different questionnaires for consistent selection of the cases. However, the final numbers of cases and controls are not described in sufficient detail.
Please provide a full description of the statistical analysis of association. Please mention the imputation panel used for these samples.
Please provide a more detailed description of the LDSC analysis. Please provide all QC steps, including the software used. I have the same concern as other reviewers: this analysis might be underpowered.
My main concern is that the identified variant may be an artifact of uncorrected population structure or some other issue with the data. Further research and combining these data with other cohorts may eliminate the significant results from this study.
4) Results and Discussion. Please provide Manhattan and LocusZoom plots first.
Please provide more details for the additional tested traits: Do they come from UKBB? How were they selected? Are there any overlaps between samples?
As mentioned above, please provide all the details about the methodology; use the other reviewers' comments and take them step by step, and it will then be easier to understand the data reported in the Results section. So far I cannot comment on the heritability analysis results because the methodology is not described in the Methods section.
Please list all strengths and limitations of your study. Please provide a more detailed description of the associated gene and locus. Please place your findings in the context of the study.

6) Please make the conclusion longer.
I have listed only the major comments. Once these are addressed, I would be happy to move to another round of review with more detailed analysis. This paper can be a good example of a concise cohort report with preliminary findings. However, all the comments and concerns raised by reviewers must be addressed first. Thank you!

Are the conclusions drawn adequately supported by the results? No
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: GWAS, genomic imprinting, reproduction, diabetes.

The authors report a single marker with a p-value just below the conventional genome-wide significance threshold of 5×10⁻⁸, which may be a true association. However, it would increase confidence in the finding somewhat if nearby markers in LD with this variant showed similar levels of association. Could the authors please provide a Manhattan plot and a regional association plot ("LocusZoom" style) as supplementary materials? As already suggested by reviewer 1, the full GWAS summary statistics should also be made available if at all possible.
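The conventional threshold is usually motivated as a Bonferroni correction of α=0.05 for roughly one million independent common-variant tests in European-ancestry genomes:

```latex
\alpha_{\mathrm{GW}} = \frac{0.05}{10^{6}} = 5 \times 10^{-8}
```

The reported p=3×10⁻⁸ therefore lies less than a factor of two below this threshold.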

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Could the authors contrast their curated phenotype based on specific food items with self-reported adherence to a vegetarian and/or vegan diet? What is the correlation between the authors' derived phenotype and self-reported vegetarian diet? What is the effect of the reported lead SNP on self-reported vegetarian diet?

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Genetic epidemiology, genome-wide association studies of complex traits.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Reviewer Report 01 February 2021 https://doi.org/10.21956/wellcomeopenres.18036.r42083
Fensom et al. conduct a genome-wide association study of vegetarianism in individuals of white British ancestry from UK Biobank. They identify one locus reaching genome-wide significance, SNP rs10189138, in an intron of the VRK2 gene, which encodes a serine-threonine kinase. Follow-up analyses include testing for association between this SNP and additional phenotypes in UK Biobank, and overall genetic correlation analysis. Though this locus has already been identified as associated with a wide variety of dietary habits, including meat and fish intake and several empirical dietary patterns, in individuals of European ancestry in UK Biobank ((Cole, Florez, & Hirschhorn, 2020) Supplemental Table 4) 1, this is to my knowledge the first report of its association with a manually curated composite phenotype of vegetarianism, providing more insight into its overall association with dietary intake and potentially the function of VRK2.
The primary strength of the study is the use and combination of multiple touchscreen questionnaire answers related to meat and fish consumption to curate a moderately sized case/control analysis of vegetarians (>5K) versus controls (>360K), as well as testing the lead SNP's association with a wide variety of outcomes in UK Biobank to better understand pleiotropy.
Below are my comments, in order of significance:

Major Comments
1. Overall, the Conclusion section is quite brief, and I believe the reader would benefit greatly from a discussion of both the limitations and the interpretation of the findings in this study.

2. UK Biobank may be subject to population structure that is not corrected by principal components in white British individuals alone (Haworth et al., 2019) 2. Given that population and geographical differences are strongly associated with dietary intake and height (Sohail et al., 2019) 3, it remains possible that the lead signal is an artifact of uncorrected sample structure. This issue merits either further analytical investigation or, at minimum, attention in the Conclusion section. Of note, the follow-up analyses I performed in Cole et al. using linear mixed models and adjustment for assessment centre found limited residual confounding via LD score regression (Cole et al., 2020) 1, but again, I believe readers should be made aware of its potential impact in the Conclusion section.

3. Calculating genetic correlation (shared heritability) between a trait with ~1% heritability and other traits seems underpowered. I am surprised by the ability to identify significant genetic correlations. Can you please A) describe how you calculated heritability, B) describe the original studies for these additional tested traits, and C) describe your results in the context of what would appear to be limited power? Also, do most of these traits come from UK Biobank, and if so, how would sample overlap affect your findings?

4. There is no description of how any of the additional phenotypes in Figure 1 are derived in UK Biobank. Given that UK Biobank provides raw measurements, some level of quality control should be applied to these variables. For example, BMI and several biomarkers have heavily skewed distributions; how did you handle those in linear regression? It may be easier to obtain summary statistics of this SNP's association with additional traits using PheWAS servers (https://biobankengine.stanford.edu/; http://big.stats.ox.ac.uk/), look-ups in publicly available GWAS (e.g. GIANT consortium), or the recently published GWAS on UK Biobank biomarkers (Sinnott-Armstrong et al., 2021) 4. These resources performed at least minimal phenotypic pre-processing.

Minor Comments
1. It is unclear how the 24-hour recall questionnaire from 57K individuals was combined with FFQ data to calculate weekly meat intake for all individuals (>360K). Does this mean that the majority of data come from the FFQ, and a subset of contributing individuals have their weekly intake value derived from both questionnaires? If so, why limit this to just the 57K individuals who took the questionnaire three times, when ~200K took it at least once? Understandably, these data are more reliable for individuals who participated in the 24HR questionnaire more than once, but at minimum a clear description of how the 24HR questionnaire supplemented the FFQ data is necessary to understand this phenotype.

2. Can you please clarify the % difference per-allele calculation in Figure 1? It appears the figure reports effect estimates in units of the predictor under an additive model at this SNP, but I am unclear how the percent change calculation is made.

3. Given that the manuscript reports methods before results, the mention of a "GG genotype" in the linear regression models section has no context.

4. Is the effect allele, T, also the minor allele?

5. The SNP appears to be in an intron of VRK2, making it a slightly more compelling candidate gene than the term "near" suggests.

6. Will the GWAS be made publicly available for download? It should at least be shared by request to the authors.

Reviewer Expertise: Genetics of complex traits, particularly that of dietary intake and its relationship with metabolic traits.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Figure 1. Association between variant rs10189138 and selected baseline characteristics, biomarker concentrations and meat intakes (in non-vegetarians) in UK Biobank. ¹ Denotes associations that were statistically significant after applying a False Discovery Rate (FDR) controlling procedure to account for multiple testing. ² Analysis only includes participants who are non-vegetarian.
Reviewer Report 18 May 2021 https://doi.org/10.21956/wellcomeopenres.18036.r43654
© 2021 Karlsson R. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Robert Karlsson, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
This research note had already been thoroughly reviewed when I was invited to review it, and as far as I can see the manuscript has not yet been updated to address the concerns raised by reviewer 1 (Dr. Cole). I agree with all of the comments already given by Dr. Cole, and just want to add the following mostly minor remarks. The methods subsection concerning genotype data analyses is a little short on details and references, specifically:
○ What software was used for the quality control and management of genome-wide genotype data?
○ What software was used for genome-wide association analysis?
○ What software was used for other association tests?

References
1. Cole J, Florez J, Hirschhorn J: Comprehensive genomic analysis of dietary habits in UK Biobank identifies hundreds of genetic associations. Nature Communications. 2020; 11(1).
2. Haworth S, Mitchell R, Corbin L, Wade K, et al.: Apparent latent structure within the UK Biobank sample has implications for epidemiological analysis. Nature Communications. 2019; 10(1).
3. Sohail M, Maier R, Ganna A, Bloemendal A, et al.: Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife. 2019; 8.
4. Sinnott-Armstrong N, Tanigawa Y, Amar D, Mars N, et al.: Genetics of 35 blood and urine biomarkers in the UK Biobank. Nature Genetics. 2021.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Partly
If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Partly
Are the conclusions drawn adequately supported by the results? Partly
Competing Interests: No competing interests were disclosed.