The genome sequence of the red deer, Cervus elaphus Linnaeus 1758 [version 1; peer review: awaiting peer review]

We present a genome assembly from an individual female Cervus elaphus (the red deer; Chordata; Mammalia; Artiodactyla; Cervidae). The genome sequence is 2,887 megabases in span. The majority of the assembly is scaffolded into 34 chromosomal pseudomolecules, with the X sex chromosome assembled.


Background
Long-term studies of individuals living in the wild offer unparalleled opportunities to study ecological and evolutionary processes (Clutton-Brock & Sheldon, 2010). One such study concerns the red deer on the Isle of Rum, Scotland (https://rumdeer.bio.ed.ac.uk/). For 50 years the lives of individuals living in the northeast part of the island have been followed from birth, through all breeding attempts, to death. This has given rise to a detailed understanding of the impacts of weather and density on individual survival and breeding success (Clutton-Brock et al., 1982) and of how these effects on individuals build up into population dynamics, with implications for deer management (Clutton-Brock et al., 2002). An ambition to measure male breeding success accurately, and thereby recover the pedigree for this wild population, led to the development of microsatellite and then single nucleotide polymorphism (SNP) assays for red deer and inspired two widely used computer programs for inferring parentage from genetic markers, CERVUS and Sequoia (Huisman, 2017; Marshall et al., 1998). The resulting pedigree has enabled several evolutionary genetic investigations, such as measurement of the heritability of different traits including breeding success and antler size (Kruuk et al., 2000; Kruuk et al., 2002) and tests of the causes of evolutionary stasis or change in traits such as antler size and birth weight (Bonnet et al., 2019; Kruuk et al., 2002). Our most recent studies have used thousands of genome-wide SNP markers to demonstrate the impact of inbreeding depression in the population (Huisman et al., 2016), to resolve a genetic map for red deer (Johnston et al., 2017) and to investigate determinants of individual variation in recombination in the population (Johnston et al., 2018). Increasingly, our studies depend on knowing precisely where each marker is in the genome.
When the Darwin Tree of Life project announced its intention to produce a high-quality genome for all UK species, we were delighted that it was proposed to sequence a red deer from Rum, so that the information gained would be maximally relevant to the Rum study population. An effort to obtain suitably high molecular weight DNA from animals on the island failed, and so we turned to the one deer from Rum living on the mainland.
Thistle was born on Rum in 1993 and had a bad start: unusually, her mother rejected her after she was weighed and tagged by the research team. In a rare move, she was taken in and hand-reared by Fiona Guinness of the deer project and later shipped to Reediehill Deer Farm, Fife, where one of us who had worked with the deer on Rum in the 1970s (JF) now farms deer. As a hand-tame deer, Thistle had a long and successful career. She featured in several television commercials and feature films but sometimes proved to be too tame to convey the desired image of wildness. She also captivated many children in her role as an educator (Figure 1). She did not socialise readily with other deer on the farm, preferring the comforts of the farmhouse, where she was wont to run upstairs to lie on the beds if given the opportunity. She did, however, produce many calves, the last being born in 2011. In early 2020, at the age of 26½, she was blood-sampled for some health checks and we took the opportunity to obtain samples suitable for high molecular weight DNA extraction. The age to which red deer live has been the subject of much speculation and terrific exaggeration over history (Blaxter, 1979), but rational analysis by the same author suggested a maximum of 26. Thistle is still alive at the time of writing, aged 28; we think she may be the oldest documented red deer in the world, and she has easily outlived the longest-lived wild hind recorded on Rum, who lived to be 25½.

Genome sequence report
The genome was sequenced from a blood sample taken from a female C. elaphus (Figure 1) collected from Reediehill Deer Farm, Fife, Scotland, UK (latitude 56.304268, longitude -3.274638). A total of 30-fold coverage in Pacific Biosciences single-molecule circular consensus (HiFi) long reads and 26-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 35 missing joins or misjoins, reducing the assembly length by 8.65% and the scaffold number by 59.66%, and increasing the scaffold N50 by 16.54%.
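For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal Python sketch of how it is commonly computed: sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the assembly span. The function names `n50` and `percent_change` and the toy scaffold lengths are illustrative assumptions; they are not taken from the assembly or from the software used in this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly span.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one sequence length")


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Hypothetical toy lengths (arbitrary units), NOT the red deer assembly.
toy_before = [40, 30, 20, 10, 10, 5, 5]   # more, shorter scaffolds
toy_after = [70, 30, 10, 10]              # after joining some scaffolds

n50_before = n50(toy_before)   # 30
n50_after = n50(toy_after)     # 70
change = percent_change(n50_before, n50_after)
```

In this toy case, joining scaffolds lowers the scaffold count and raises the N50, which is the direction of change reported for curation here; the real figures depend on the actual scaffold lengths listed in the assembly.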
The final assembly has a total length of 2,887 Mb in 145 sequence scaffolds with a scaffold N50 of 83 Mb (Table 1). The majority, 96.02%, of the assembly sequence was assigned to 34 chromosomal-level scaffolds, representing 33 autosomes (numbered by synteny to the C. elaphus genome described by Bana et al., 2018) and the X sex chromosome.

Genome annotation report
The mCerEla1.1 genome has been annotated using the NCBI RefSeq annotation pipeline.
Hi-C data were generated using the Arima v2 Hi-C kit and sequenced on an Illumina NovaSeq 6000 instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align, calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination as described previously (Howe et al., 2021). Manual curation (Howe et al., 2021) was performed using HiGlass (Kerpedjiev et al., 2018) and Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021), which performed annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.

Genome annotation
The C. elaphus assembly was annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. The annotation was generated from transcripts and proteins retrieved from NCBI Entrez by alignment to the genome assembly, as described here (Pruitt et al., 2014).

Ethical/compliance issues
The materials that have contributed to this genome note were supplied by a Tree of Life collaborator. The Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible.
The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material;
• Legality of collection, transfer and use (national and international).
Each transfer of samples is undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Tree of Life collaborator, Genome Research Limited (operating as the Wellcome Sanger Institute) and in some circumstances other Tree of Life collaborators.
The genome sequence is released openly for reuse. The C. elaphus genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.