The genome sequence of the ornate tailed digger wasp, Cerceris rybyensis (Linnaeus, 1771)

We present a genome assembly from an individual female Cerceris rybyensis (the ornate tailed digger wasp; Arthropoda; Insecta; Hymenoptera; Crabronidae). The genome sequence is 574 megabases in span. The majority of the assembly, 89.81%, is scaffolded into 14 chromosomal pseudomolecules.


Background
The ornate tailed digger wasp, Cerceris rybyensis, is a widespread, mid-sized (6-12 mm) digger wasp that occurs throughout the Palearctic. In the UK it is common across southern England in habitats with sandy soil and on chalk grassland. Adults are yellow and black, with the distinctively ribbed metasoma characteristic of the genus. In this species the metasoma is irregularly banded black and yellow, which distinguishes it from other Cerceris species in the UK.
It is a univoltine species, with a flight period from late June to early September. It often nests in dense aggregations, occasionally intermixed with Cerceris arenaria. Adult females dig a 10-15 cm deep burrow with multiple cells branching off laterally (Else, 1997). The females hunt small to medium-sized bees of several genera, particularly Lasioglossum and Halictus. Bees returning to their own nests fully laden with pollen are often favoured, although males and unladen females are also taken. Females perform arcing orientation flights around the nest as they leave, which aid navigation back to it (Zeil, 1993). Prey is paralysed by stinging and malaxated just behind the head, before being carried back to a pre-prepared nest cell within the burrow. Paralysis is achieved by venom delivered via the stinger; the venom contains a complex mix of biogenic amines, peptides and proteins that act as toxins, neuromodulators, immunomodulators, metabolic modulators and antimicrobial agents (Kote et al., 2019). Each cell is stocked with 5-8 paralysed bees, which remain alive for around 2 days (Lomholdt, 1975). A single egg is laid in each cell; the larva hatches and consumes the prey provisions before pupating. Individuals remain within the pupal cocoon over winter. Adults visit the flowers of various species to feed on nectar, including common hogweed (Heracleum sphondylium), wild carrot (Daucus carota), yarrow (Achillea millefolium) and creeping thistle (Cirsium arvense).

Genome sequence report
The genome was sequenced from a single female C. rybyensis (Figure 1) collected from Wytham Woods, Oxfordshire, UK (latitude 51.772, longitude -1.338). A total of 58-fold coverage in Pacific Biosciences single-molecule, circular consensus (HiFi) long reads and 91-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 13 missing joins or misjoins and removed 1 haplotypic duplication, reducing the scaffold number by 1.47%.
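As a rough guide to what these coverage figures mean in sequence yield, fold coverage can be read as total sequenced bases divided by genome size. The sketch below assumes that coverage is expressed relative to the 574 Mb assembly span; the function name is hypothetical, and the resulting yields are back-of-envelope estimates, not values reported for this data set.

```python
def bases_for_coverage(fold_coverage, genome_span_bp):
    """Total sequenced bases implied by a given fold coverage of a genome."""
    return fold_coverage * genome_span_bp

# Assumption: coverage is measured against the 574 Mb assembly span.
ASSEMBLY_SPAN_BP = 574_000_000

hifi_gb = bases_for_coverage(58, ASSEMBLY_SPAN_BP) / 1e9  # PacBio HiFi, 58-fold
tenx_gb = bases_for_coverage(91, ASSEMBLY_SPAN_BP) / 1e9  # 10X read clouds, 91-fold

print(f"HiFi: ~{hifi_gb:.1f} Gb, 10X: ~{tenx_gb:.1f} Gb")
# prints: HiFi: ~33.3 Gb, 10X: ~52.2 Gb
```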
The final assembly has a total length of 574 Mb in 51 sequence scaffolds with a scaffold N50 of 19.0 Mb (Table 1). Of the assembly sequence, 89.81% was assigned to 14 chromosomal-level scaffolds, representing 14 autosomes (numbered by sequence length) (Figure 2-Figure 5; Table 2). The assembly also contains a large number of repetitive scaffolds that could not be assigned to chromosomes. The assembly has a BUSCO v5.1.2 (Manni et al., 2021) completeness of 96.8% (single 96.6%, duplicated 0.3%) using the hymenoptera_odb10 reference set. While not fully phased, the deposited assembly represents one haplotype. Contigs corresponding to the second haplotype have also been deposited.
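For readers unfamiliar with the scaffold N50 statistic quoted above, a minimal sketch of how it is conventionally computed follows. The function name and the toy scaffold lengths are hypothetical and are not taken from this assembly; the assembly statistics tools used for the reported value may differ in detail.

```python
def scaffold_n50(lengths):
    """Return the N50: the length of the scaffold at which the running total,
    taken from longest to shortest, first reaches half of the total span."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length

# Hypothetical toy lengths in Mb (total 100 Mb), not the real scaffolds.
toy_lengths_mb = [30, 25, 20, 15, 10]
print(scaffold_n50(toy_lengths_mb))  # prints 25
```

In the toy example the two longest scaffolds (30 + 25 = 55 Mb) are the first to cover at least half of the 100 Mb total, so the N50 is 25 Mb.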

Sample acquisition and DNA extraction
A single female C. rybyensis was collected from Wytham Woods, Oxfordshire, UK (latitude 51.772, longitude -1.338) by Liam Crowley, University of Oxford, using a net. The specimen was identified by the same individual and preserved on dry ice prior to transfer to the Wellcome Sanger Institute.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute. The iyCerRyby1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Abdomen tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 200-ng aliquot of extracted DNA using a 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared to an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit.
Fragment size distribution was evaluated by running the sample on the FemtoPulse system.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud sequencing libraries were constructed according to the manufacturers' instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL II and Illumina NovaSeq 6000 instruments. Hi-C data were generated from remaining thorax/abdomen tissue using the Arima Hi-C+ kit and sequenced on a NovaSeq 6000 instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align and calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation (Howe et al., 2021) was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021) and annotated using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 lists the versions of all software tools used, where appropriate.