The genome sequence of the killer whale, Orcinus orca (Linnaeus, 1758)

We present a genome assembly from an individual female Orcinus orca (the killer whale; Chordata; Mammalia; Artiodactyla; Delphinidae). The genome sequence is 2.65 gigabases in span. The majority of the assembly (93.76%) is scaffolded into 22 chromosomal pseudomolecules with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 16.4 kilobases in length.


Background
The killer whale, Orcinus orca, is the largest and most widely geographically distributed species of dolphin (Delphinidae). Killer whales are found from the Arctic to the Antarctic, and all waters in between, occurring in the greatest densities at high latitudes (Forney & Wade, 2007). In these areas of high biological productivity, they have evolved into ecotypes with different prey preferences. The best studied of these are the fish-eating and mammal-eating ecotypes of the North Pacific (Ford et al., 1998). These two ecotypes differ not only in their diet, but also in their behaviour and social structure (Ford et al., 1996). From the first studies using restriction fragment length polymorphism (RFLP) (Stevens et al., 1989) or D-loop sequences (Hoelzel & Dover, 1991) of mitochondrial DNA to complete mitochondrial genome sequences (Morin et al., 2010), and from microsatellites (Barrett-Lennard, 2000; Hoelzel et al., 1998) to whole genome sequences (Foote et al., 2016), the significant genetic differentiation between the two North Pacific ecotypes has been repeatedly and ever more robustly established. The genetic impact of post-glacial colonisation of these waters by killer whales and the genetic signatures associated with divergence into distinct ecotypes have been a focus of genetic and genomic studies (Foote et al., 2016; Foote et al., 2019; Foote et al., 2021; Hoelzel et al., 2007).
Genomic studies are also providing insights into the fitness and health of populations through the estimation of mutation load and inbreeding (Foote et al., 2021). A small population of killer whales found in UK waters has shown zero fecundity, consistent with high inbreeding (Beck et al., 2014), and genome sequencing revealed that runs of homozygosity make up 38% of the genome of one female (Foote et al., 2021). The population now consists of just two adult males and is therefore beyond rescue (Allendorf et al., 2022). There are other populations of killer whales that seasonally occur in UK waters. These include groups that migrate each summer from Iceland to the Northeast of Scotland and the Northern Isles of Orkney and Shetland, where they hunt seals close to shore (Samarra & Foote, 2015). Genomic resources are key to monitoring the health, in terms of inbreeding, of these killer whale populations.
The existing draft genome assembly for the killer whale (Oorc_1.1) was generated as part of the first high-throughput sequencing project on marine mammal genomes (Foote et al., 2015). Prior to this, only a 2x coverage genome of a bottlenose dolphin, generated using the same methods as the first human genome project, was available as a reference (Lindblad-Toh et al., 2011). The Oorc_1.1 assembly came about following an unusual event. When a young female killer whale stranded in 2010 on the coast of the Netherlands, the event made it possible to access fresh blood samples, collected by Dolfinarium Harderwijk for health checks. The priority was to extract DNA and sequence informative markers that could provide an indication of the population of origin of the killer whale, now named Morgan. This would be a key step if releasing Morgan back to the wild was deemed possible. A combination of genetic analyses and acoustic matching of Morgan's vocal repertoire to recording databases (stereotyped call repertoires are culturally transmitted down matrilineal lineages in killer whale societies; Ford, 1991) pinpointed Morgan's origins as the population that feeds upon the Norwegian spring-spawning herring, Clupea harengus (Vester & Samarra, 2011). The blood sample (BioSample: SAMN01180276) also provided DNA of sufficiently high molecular weight to generate the long-insert libraries needed for the first draft of the killer whale genome.
Although the goal of rehabilitating and releasing Morgan back to the wild was ultimately not achieved, the genome assembly that resulted from that process has provided the backbone for several genomic studies which have contributed to our understanding of wild killer whale populations. The announcement by the Darwin Tree of Life (DToL) project that it would generate high-quality assemblies for all UK species provided an opportunity to improve the killer whale genome. Archived blood from Morgan once again provided the material, this time for generating Hi-C data and single-molecule long reads. The improved contiguity of this new DToL assembly will usher in a new phase of genomics research on killer whales, which will include identifying structural variants and phasing population genomic data.

Genome sequence report
The genome was sequenced from a juvenile female O. orca collected from Dolfinarium, Harderwijk, Netherlands (Figure 1). A total of 34-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data, used with permission from the DNA Zoo Consortium (dnazoo.org). Manual assembly curation corrected 100 missing joins or misjoins, reducing the scaffold number by 13.87% and increasing the scaffold N50 by 8.19%.

Figure 1. Image of Morgan, the Orcinus orca individual whose blood sample was used for the genome assembly. Morgan now resides at Loro Parque, Tenerife. Image credit: Dolfinarium, Harderwijk, Netherlands.
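The summary statistics quoted in this section (fold coverage, scaffold N50, and percentage changes after curation) follow from simple arithmetic on sequence lengths. The sketch below illustrates one common way to compute them. The function names and all input values are hypothetical toy numbers chosen for illustration; they are not taken from the assembly itself, and the authors' own tooling may differ.

```python
from typing import Iterable


def fold_coverage(total_read_bases: float, genome_span: float) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome span."""
    return total_read_bases / genome_span


def n50(lengths: Iterable[int]) -> int:
    """Return the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() needs at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    # Only reachable for unusual input (e.g. negative lengths).
    return ordered[-1]


def percent_change(before: float, after: float) -> float:
    """Signed percentage change from 'before' to 'after'."""
    return (after - before) / before * 100.0


# Toy scaffold lengths (arbitrary units), NOT values from this assembly.
pre_curation = [80, 70, 50, 40, 30, 20]
# Hypothetical curation step: the two shortest scaffolds are joined.
post_curation = [80, 70, 50, 50, 40]

print("N50 before:", n50(pre_curation))
print("N50 after:", n50(post_curation))
print("Scaffold count change (%):",
      round(percent_change(len(pre_curation), len(post_curation)), 2))
```

Joining scaffolds leaves the total span unchanged while lowering the scaffold count, which is why curation can reduce the number of scaffolds and raise (or leave unchanged) the N50 at the same time.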
The final assembly has a total length of 2.65 Gb in 447 sequence scaffolds with a scaffold N50 of 114.2 Mb (Table 1). The majority, 93.76%, of the assembly sequence was assigned to 22 chromosomal-level scaffolds, representing 21 autosomes (numbered by sequence length) and the X sex chromosome (Figure 2-Figure 5; Table 2). The order and orientation of scaffolds in a repetitive region of chromosome 17 (~33.6 Mb) are uncertain.

blood vial before transportation on dry ice to Copenhagen where it was frozen at -80 °C. DNA was extracted from the blood sample of mOrcOrc1 by Andrew Foote at the University of Copenhagen.
The pre-extracted DNA arrived at the Tree of Life laboratory, Wellcome Sanger Institute, courtesy of Andrew Foote. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. HMW DNA was sheared to an average fragment size of between 12 and 20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit.

Sequencing
Pacific Biosciences HiFi circular consensus sequencing libraries were constructed according to the manufacturers'

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2022). The assembly was checked for contamination as described previously (Howe et al., 2021). Manual curation was performed using HiGlass (Kerpedjiev et al., 2018) and Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021), which performs annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.
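As a rough illustration of how the first steps of such a workflow chain together, the sketch below builds (but does not run) basic command lines for Hifiasm, an index step, and YaHS. It is a minimal sketch under stated assumptions: the file names, output prefix, and thread count are hypothetical, the multi-step purge_dups stage and the conversion of Hifiasm output to FASTA are only marked by a comment, and the parameters actually used for this assembly are not given in this section.

```python
from typing import List


def plan_assembly_commands(
    hifi_reads: str,
    purged_contigs_fasta: str,
    hic_bam: str,
    prefix: str = "asm",
    threads: int = 16,
) -> List[List[str]]:
    """Return an ordered list of commands as argument lists (dry run only).

    All paths are hypothetical placeholders supplied by the caller.
    """
    return [
        # Contig assembly from HiFi reads: output prefix, threads, input reads.
        ["hifiasm", "-o", prefix, "-t", str(threads), hifi_reads],
        # (Omitted here: converting the assembly graph to FASTA and the
        #  multi-step purge_dups run that yields purged_contigs_fasta.)
        # YaHS expects an indexed contig FASTA.
        ["samtools", "faidx", purged_contigs_fasta],
        # Scaffolding: contig FASTA plus Hi-C reads aligned to those contigs.
        ["yahs", purged_contigs_fasta, hic_bam],
    ]


# Print the planned commands; nothing is executed.
for command in plan_assembly_commands(
    "hifi_reads.fastq.gz", "purged_contigs.fa", "hic_to_contigs.bam"
):
    print(" ".join(command))
```

Keeping the plan as data rather than executing it directly makes it easy to log the exact commands alongside the software versions listed in Table 3.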

Acknowledgements
Unpublished genome assemblies and sequencing data are used with permission from the DNA Zoo Consortium (dnazoo.org).

The work is exciting, and genome data from female killer whales will be necessary. However, the following comments should be addressed:

1. MT genome analysis can be included as a phylogeny. If the MT genome shows polymorphism, it can be significant.
2. Please show PacBio raw data information and the reasons for combining Hi-C data. Table 1 should include the number of raw PacBio reads and the Hi-C data incorporated. Is the N50 of contigs equal to span/genome size...? Please check the BUSCO results of the contig assembly.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes