The genome sequence of the common wasp, Vespula vulgaris (Linnaeus, 1758) [version 1; peer review: awaiting peer review]

We present a genome assembly from an individual female Vespula vulgaris (the common wasp; Arthropoda; Insecta; Hymenoptera; Vespidae). The genome sequence is 188 megabases in span. The majority of the assembly is scaffolded into 25 chromosomal pseudomolecules.


Introduction
The common wasp, Vespula vulgaris, is one of the most widespread species of social wasp in the UK. This species is eusocial, living in colonies with a reproductive queen, sterile workers and reproductive males. Colonies are annual in the UK, typically producing up to around 10,000 workers (Archer, 1981). Nests are commonly constructed underground, particularly within old rodent holes, but are also frequently found in aerial situations such as roof spaces, sheds and outhouses. Nests are constructed out of a paper-like substance produced from macerated wood fibres mixed with saliva. The nest consists of hexagonal cells arranged into combs, covered by a nest envelope. The nest envelope of this species is yellow to brown in colour due to the mix of partially rotted wood used to make the pulp.
Overwintered queens emerge from early March, found a nest around May, and produce the first workers from early June (Archer, 2008). Worker numbers build up throughout the summer, with males and new queens produced around September. Nests usually last until late October, although exceptionally, some may last through the winter to February. This species is a generalist predator, with workers preying on a wide range of insect and other arthropod species, which are killed, butchered and malaxated before being carried back to the nest to be fed to the developing brood. Adults feed on carbohydrate-rich substances including nectar, sap, honeydew and secretions from the larvae. The propensity of adults to visit flowers, particularly shallow blooms, means this species may act as an important pollinator. We note the recent production of a high-quality genome assembly for V. vulgaris and two other species of this genus (Harrop et al., 2020), and believe the sequence described here, generated as part of the Darwin Tree of Life project, will further aid understanding of the biology and ecology of the common wasp.

Genome sequence report
The genome was sequenced from a single female V. vulgaris collected from Wytham Woods, Oxfordshire, UK (latitude 51.774, longitude -1.332). A total of 87-fold coverage in Pacific Biosciences single-molecule long reads and 190-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 18 missing joins or misjoins, reducing the assembly length by 0.001% and the scaffold number by 34.15%, and increasing the scaffold N50 by 2.74%. The final assembly has a total length of 188 Mb in 28 sequence scaffolds with a scaffold N50 of 9 Mb (Table 1). Of the assembly sequence, 99.5% was assigned to 25 chromosomal-level scaffolds (numbered by sequence length) (Figure 1–Figure 4; Table 2). The assembly has a BUSCO (Simão et al., 2015) v5.1.2 completeness of 96.4% using the hymenoptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
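For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of how it, and the fraction of sequence held in the largest scaffolds, can be computed from a list of scaffold lengths. The function names and the toy lengths are hypothetical and chosen only for illustration; they are not the scaffold lengths of this assembly, and this is not the tooling used to produce the reported figures.

```python
def scaffold_n50(lengths):
    """Return the N50 of a set of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def fraction_in_largest(lengths, n):
    """Return the fraction of total length held in the n longest scaffolds."""
    ordered = sorted(lengths, reverse=True)
    return sum(ordered[:n]) / sum(ordered)


# Hypothetical toy lengths (arbitrary units), NOT data from this assembly.
toy_lengths = [10, 9, 8, 7, 3, 2, 1]

print(scaffold_n50(toy_lengths))
print(fraction_in_largest(toy_lengths, 4))
```

Applied to the real scaffold lengths, the same two calculations would correspond to the scaffold N50 and the proportion of sequence assigned to chromosomal-level scaffolds reported in this section.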

Methods
A single female V. vulgaris was collected from Wytham Woods, Oxfordshire, UK (latitude 51.774, longitude -1.332) by Liam Crowley, University of Oxford, using a net. The sample was snap-frozen using dry ice and stored in a CoolRack.
DNA was extracted from head/thorax tissue of iyVesVulg1 at the Wellcome Sanger Institute (WSI) Scientific Operations core according to the manufacturer's instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL II and Illumina HiSeq X instruments. Hi-C data were sequenced on HiSeq X.
Assembly was carried out with Hifiasm (Cheng et al., 2021). Haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align and calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext. The mitochondrial genome was assembled with MitoHiFi (Uliano-Silva et al., 2021). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020).
The genome sequence is released openly for reuse. The V. vulgaris genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated using the RNA-Seq data and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.