The genome sequence of the two-banded wasp hoverfly, Chrysotoxum bicinctum (Linnaeus, 1758)

We present a genome assembly from an individual female Chrysotoxum bicinctum (the two-banded wasp hoverfly; Arthropoda; Insecta; Diptera; Syrphidae). The genome sequence is 913 megabases in span. The majority of the assembly (98.81%) is scaffolded into five chromosomal pseudomolecules, with the X sex chromosome assembled.


Background
Chrysotoxum bicinctum, the two-banded wasp hoverfly, is one of Britain's most distinctive hoverflies. Its chocolate-coloured wing markings and bright yellow bars on the second and fourth abdominal segments make this fly unmistakeable in the field (Ball et al., 2015; van Veen, 2010). The genus Chrysotoxum comprises more than 110 species of large wasp-mimic hoverflies with long, elegant antennae (Masetti et al., 2006). This wasp mimicry likely gives protection against predation by birds through Batesian mimicry (Leavey et al., 2021). Across its flight period of May to September, this species is common across southern Britain, but its abundance decreases with northerly latitude. C. bicinctum inhabits grassy meadows and open woodland rides, feeding on a range of flowers but with a preference for composites and umbellifers (Ball et al., 2015). Very little is known about the larval biology of this hoverfly, but the larvae are thought to feed upon the root aphids residing within the nests of Lasius niger ants (Speight, 1976). Observations of ovipositing behaviour include a female C. bicinctum repeatedly laying eggs about a Lasius ant nest (Rotheray & Gilbert, 1989). It is not known how the hoverfly larvae avoid predation by the ants, which are usually highly protective of their root aphid charges. Potential avenues of research include pheromone mimicry of the aphids by the hoverfly larvae, or simple armour to negate the attacks of the ants. Insight into the life history of this distinctive hoverfly is currently severely lacking. It is hoped that the production, for the first time, of a high-quality Chrysotoxum bicinctum genome, generated as part of the Darwin Tree of Life project, will aid understanding of the biology and ecology of this hoverfly.

Genome sequence report
The genome was sequenced from a single female C. bicinctum (Figure 1) collected from Wytham Great Wood, Oxfordshire, UK (latitude 51.769, longitude -1.33). A total of 29-fold coverage in Pacific Biosciences single-molecule long reads and 37-fold coverage in 10X Genomics read clouds (from molecules with an estimated N50 of 60 kb) were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 326 missing joins or misjoins and removed 87 haplotypic duplications, reducing the assembly length by 3.27% and the scaffold number by 64.06%, and increasing the scaffold N50 by 644.27%.
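As an aid to reading the curation percentages, the following minimal sketch shows how a relative change is computed and what fold-change a given percentage increase corresponds to. The function names are illustrative assumptions and are not part of any pipeline used in this study; only the 644.27% figure is taken from the text.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    if before == 0:
        raise ValueError("`before` must be non-zero")
    return (after - before) / before * 100.0


def fold_from_percent_increase(pct: float) -> float:
    """Convert a percentage increase into a multiplicative fold-change."""
    return 1.0 + pct / 100.0


# A 644.27% increase in scaffold N50 means the curated N50 is about
# 7.44 times the pre-curation value.
fold = fold_from_percent_increase(644.27)
print(round(fold, 4))

# Hypothetical round trip: a value rising from 100 to 744.27 units.
print(round(percent_change(100.0, 744.27), 2))
```

Read this way, a decrease of 64.06% in scaffold number leaves roughly 36% of the original scaffold count.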
The final assembly has a total length of 913 Mb in 92 sequence scaffolds with a scaffold N50 of 118 Mb (Table 1). The majority, 98.81%, of the assembly sequence was assigned to 5 chromosomal-level scaffolds, representing 4 autosomes (numbered by sequence length), and the X sex chromosome (Figure 2–Figure 5; Table 2). The assembly has a BUSCO (Simão et al., 2015) completeness of 96.6% (single 95.5%, duplicated 1.1%) using the diptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
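For readers unfamiliar with the summary statistics above, the sketch below illustrates how a scaffold N50 is conventionally computed and how the BUSCO completeness figure is the sum of its single-copy and duplicated components. The scaffold lengths are toy values chosen for illustration, not the lengths of the C. bicinctum scaffolds, and the function is an assumption for illustration rather than the tool used in this study.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together contain at least half of the total assembly length."""
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy scaffold lengths (arbitrary units, hypothetical).
toy_lengths = [50, 40, 30, 20, 10]
print(n50(toy_lengths))

# BUSCO completeness reported in the text: single-copy plus duplicated.
single, duplicated = 95.5, 1.1
print(round(single + duplicated, 1))
```

Because N50 depends only on the distribution of scaffold lengths, joining scaffolds during curation can raise it sharply even while the total assembly length falls slightly.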

Sample acquisition and nucleic acid extraction
A female (idChrBici1) C. bicinctum was collected from Wytham Great Wood, Oxfordshire, UK (latitude 51.769, longitude -1.33) by Will Hawkes, University of Exeter, who also identified the sample. A second sample of unknown sex (idChrBici2) was collected by Matt Smith from Hartslock Reserve, Oxfordshire, UK (latitude 51.511263, longitude -1.112222). The samples were collected using a net and snap-frozen on dry ice. DNA was extracted from the whole organism of idChrBici1 at the Wellcome Sanger Institute (WSI) Scientific Operations core from head/thorax tissue using the Qiagen MagAttract HMW DNA kit, according to the manufacturer's instructions. Following this, further DNA was extracted for a PacBio top-up. Tissue was cryogenically disrupted to a fine powder using a Covaris cryoPREP Automated Dry Pulveriser, receiving multiple  and Illumina HiSeq 4000 (RNA-Seq) instruments. Hi-C data were generated from abdomen tissue of idChrBici1, and head and abdomen tissue of idChrBici2, using the Arima Hi-C+ kit and sequenced on HiSeq X (idChrBici1) and Illumina NovaSeq 6000 (idChrBici2) instruments. Manual curation was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021) and annotated with MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.
The genome sequence is released openly for reuse. The C. bicinctum genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated using the RNA-Seq data and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.

Shu-Jun Wei
Institute of Plant Protection, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China

Li-Jun Cao
Institute of Plant Protection, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China

This data note reports the genome of the two-banded wasp hoverfly Chrysotoxum bicinctum. The hoverflies are very interesting species of insects. Genomes of hoverflies are usually large and difficult to assemble. I am happy to see the publication of this hoverfly genome. The genome was assembled using long reads, linked reads, and Hi-C technologies. Data were well analyzed using available methods and software. The results will provide invaluable resources for hoverfly studies and references for genome assembly of other related species. I have some minor comments.

1. The genome was assembled into five pseudochromosomes with the X sex chromosome. What is the karyotype of this species or other related species? How did the authors make sure the identified X chromosome is correct? There is a lack of analysis of the X chromosome to confirm the identification.
2. Sampling information was repeated in the "Genome sequence report" and "Methods" sections. Remove one of the duplicated contents.
3. I noticed that many figures of this manuscript were generated by the BlobToolKit pipeline and used directly. However, there are many format problems in these figures. I suggest the authors revise these figures carefully. Some typical issues: Figure 3, change "gc" to "GC content", "sum" to "Sum", "total" to "Total", "no-hit" to "No-hit". The same problems were found in other figures where the first letter of the first word was not capitalized. Figure 5, length in Mb should be marked on the axes.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: I work on insect genomics, population genetics, and pest control research. I used population genomics approaches to trace insects' origin, invasion routes, long-distance migration, and local adaptation to pesticide and environmental stresses.
We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.