The genome sequence of the plain-faced dronefly, Eristalis arbustorum (Linnaeus, 1758)

We present a genome assembly from an individual female Eristalis arbustorum (the plain-faced dronefly; Arthropoda; Insecta; Diptera; Syrphidae). The genome sequence is 451 megabases in span. The majority of the assembly (94.71%) is scaffolded into 6 chromosomal pseudomolecules, with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 16.0 kilobases in length.


Background
The plain-faced dronefly, Eristalis arbustorum, is a smaller member of the Eristalis genus. Both sexes lack an obvious median stripe down the centre of their face (which E. nemorum has) and have more plumose arista and darker tips of the mid tibiae than E. abusiva (van Veen, 2010). Like others in the Eristalis genus, E. arbustorum has large pale patches on either side of its abdomen and is a Batesian mimic of the honeybee Apis mellifera, gaining protection from predators such as birds. This mimicry is not limited to visual similarity but extends to behavioural and acoustic similarities (Golding & Edmunds, 2000; Moore & Hassall, 2016). E. arbustorum is widespread and common in the UK, and can be found in a variety of open habitats feeding on a range of flowers including Apiaceae and Asteraceae, making it an important generalist pollinator (Ball et al., 2015; Doyle et al., 2020). The larvae of E. arbustorum are colloquially known as rat-tailed maggots and feed on decaying organic matter in organically rich pools (Ball et al., 2015; van Veen, 2010). Interestingly, as in E. tenax, reproduction by larval stages (paedogenesis) has been suggested to occur (Achterkamp et al., 2000). E. arbustorum, along with other Eristalis flies, plays a highly important ecological role in decomposition (Hurtado et al., 2008). E. arbustorum has also been used as a model for the plastic response to temperature of traits such as colour pattern and wing size, and has featured in studies of fine-scale population structure (Francuski et al., 2020; Ottenheim et al., 1998). This is the first high-quality E. arbustorum genome to be produced, and we believe that the sequence described here, generated as part of the Darwin Tree of Life project, will further aid understanding of the biology and ecology of this hoverfly.

Genome sequence report
The genome was sequenced from a single female E. arbustorum collected from Wytham Great Wood, Oxfordshire, UK (latitude 51.769, longitude -1.330) (Figure 1). A total of 19-fold coverage in Pacific Biosciences single-molecule long reads and 87-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 370 missing/misjoins and removed 5 haplotypic duplications, reducing the assembly size by 0.11% and scaffold number by 33.59%, and increasing the scaffold N50 by 244.23%.
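As a rough illustration of what the fold-coverage figures imply, the sketch below converts coverage into an approximate sequencing yield. It assumes that coverage was calculated relative to the final assembly span of 451,042,988 bp (the span given in the Figure 2 legend); the text does not state the reference size used, so the results are indicative only. The function name `approx_yield_gb` is hypothetical.

```python
# Assumption: fold coverage is relative to the final assembly span.
ASSEMBLY_SPAN_BP = 451_042_988


def approx_yield_gb(fold_coverage, genome_bp=ASSEMBLY_SPAN_BP):
    """Approximate sequencing yield in gigabases for a given fold coverage."""
    return fold_coverage * genome_bp / 1e9


if __name__ == "__main__":
    for platform, fold in [("PacBio long reads", 19), ("10X read clouds", 87)]:
        print(f"{platform}: ~{approx_yield_gb(fold):.1f} Gb")
```

Under this assumption, 19-fold and 87-fold coverage correspond to roughly 8.6 Gb and 39.2 Gb of sequence, respectively.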
The final assembly has a total length of 451 Mb in 257 sequence scaffolds with a scaffold N50 of 77.5 Mb (Table 1). The majority, 94.71%, of the assembly sequence was assigned to 6 chromosomal-level scaffolds, representing 5 autosomes (numbered by sequence length), and the X sex chromosome (Figure 2-Figure 5; Table 2). Based on the published karyotype, potential micro-chromosomes have not been recovered in the curated assembly (Rozek et al., 1995). The assembly has a BUSCO v5.1.2 (Manni et al., 2021) completeness of 96.5% (single 95.9%, duplicated 0.7%) using the diptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
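For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows how an Nx value is conventionally computed from a list of scaffold lengths. It is a generic illustration using toy lengths, not the tooling used to produce Table 1; the function name `nx` is hypothetical.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx length for a collection of sequence lengths.

    Sequences are taken from longest to shortest; Nx is the length of the
    sequence at which the running total first reaches `fraction` of the
    summed length (fraction=0.5 gives N50, fraction=0.9 gives N90).
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    return min(lengths)  # only reached if fraction > 1


if __name__ == "__main__":
    toy_scaffolds = [50, 40, 30, 20, 10]  # toy data, not the real assembly
    print("N50:", nx(toy_scaffolds))
    print("N90:", nx(toy_scaffolds, 0.9))
```

With the toy lengths, half of the 150-unit total is reached within the second-longest sequence, so the N50 is 40; a higher N50 therefore indicates that more of the assembly sits in long scaffolds.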

Sample acquisition and nucleic acid extraction
Two female E. arbustorum samples, idEriArbu1 and idEriArbu2, were collected from Wytham Great Wood, Oxfordshire, UK (latitude 51.772, longitude -1.339) by Will Hawkes, University of Exeter on 7 (idEriArbu1) and 8 (idEriArbu2) August 2019. The specimens were caught with a net, snap-frozen on dry ice and stored using a CoolRack.
DNA was extracted from the head/thorax of idEriArbu1 at the Wellcome Sanger Institute (WSI) Scientific Operations core from the whole organism using the Qiagen MagAttract HMW DNA kit, according to the manufacturer's instructions. RNA from abdomen tissue of idEriArbu2 was extracted in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and Qubit Fluorometer using the Qubit RNA Broad-Range (BR) Assay kit. The integrity of the RNA was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics Chromium read cloud sequencing libraries were constructed according to the manufacturers' instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL II (HiFi), Illumina HiSeq X (10X) and Illumina HiSeq 4000 (RNA-Seq) instruments. Hi-C data were generated from remaining head/thorax tissue of idEriArbu1 using the Arima v2 Hi-C kit in the Tree of Life laboratory and sequenced at the Scientific Operations core on an Illumina NovaSeq 6000 instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align, calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination as described previously (Howe et al., 2021). Manual curation was performed using HiGlass (Kerpedjiev et al., 2018) and Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021), which performs annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.

Abby Davis
The University of New England, NSW, Australia
This data report assembles the genome of the fly Eristalis arbustorum, a syrphid fly capable of providing dual ecosystem services to the environment. The authors describe how the fly representatives were acquired, how the DNA and RNA of the specimens were extracted, and how these data were sequenced. The genome data supplied here will be important to syrphid fly phylogenetics and taxonomic research.
The rationale for creating the dataset is clearly described in the Background, the protocols for constructing this dataset are appropriate, the authors have listed sufficient details of methods and materials to allow replication by others, and the raw datasets have been deposited in the INSDC databases with the genome sequence of E. arbustorum free to use.

Zhao Le
School of Biological Sciences and Engineering, Shaanxi University of Technology, Hanzhong, China
The manuscript 'The genome sequence of the plain-faced dronefly, Eristalis arbustorum (Linnaeus, 1758)' applies HiFi long-read technology to assemble the genome of a species of syrphid fly, a group of overwhelming importance for ecological stability and the agricultural economy. To date, the available high-quality genome data for syrphids are limited, so this newly generated data will contribute to syrphid research.
I have some comments: Although this paper is intended as a data report, the descriptions of the results are too short to be understood. The authors present several figures and tables in the paper, but almost none of them are well explained or discussed in the current version of the manuscript. For readers, the meaning of each table and figure is quite unclear. At the least, the authors need to discuss the important parameters and results shown in each plot or table to reflect the quality of this assembly version or, if previously published genomes of closely related species are available, it is worth making a comparison.

Overall, it feels like an unfinished manuscript and quite a lot of content needs to be filled.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Syrphidae, genome, evolution
I confirm that I have read this submission and believe that I have an appropriate level of

Figure 1. Image of the Eristalis arbustorum specimen used in genome sequencing taken during preservation and processing.

Figure 2. Genome assembly of Eristalis arbustorum, idEriArbu1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 451,042,988 bp assembly. The distribution of chromosome lengths is shown in dark grey with the plot radius scaled to the longest chromosome present in the assembly (115,465,858 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 chromosome lengths (78,272,982 and 66,425,613 bp), respectively. The pale grey spiral shows the cumulative chromosome count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idEriArbu1.1/dataset/CAKAIZ01/snail.
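The size-ordered binning described in the legend can be sketched as follows. This is one plausible reading of the legend (each bin reports the length of the sequence that covers that fraction of the cumulative, longest-first assembly), written for illustration with toy data; it is an assumption, not BlobToolKit's actual implementation, and the function name `binned_lengths` is hypothetical.

```python
def binned_lengths(lengths, n_bins=1000):
    """For each of n_bins equal slices of the total span, return the length
    of the sequence covering the end of that slice, with sequences ordered
    from longest to shortest.

    Integer arithmetic is used so that bin boundaries are exact.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    idx = 0
    cumulative = ordered[0]
    result = []
    for b in range(1, n_bins + 1):
        # Advance until the cumulative span reaches the end of bin b,
        # i.e. cumulative >= b * total / n_bins.
        while cumulative * n_bins < b * total:
            idx += 1
            cumulative += ordered[idx]
        result.append(ordered[idx])
    return result


if __name__ == "__main__":
    toy = [50, 40, 30, 20, 10]  # toy data, not the real assembly
    bins = binned_lengths(toy, n_bins=10)
    print(bins)
    print("N50 read from the bin ending at 50%:", bins[4])
    print("N90 read from the bin ending at 90%:", bins[8])
```

With 1,000 bins and a 451,042,988 bp assembly, each bin spans roughly 451,043 bp; under this reading, the N50 and N90 arcs correspond to the values in the bins ending at 50% and 90% of the cumulative span.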

Figure 5. Genome assembly of Eristalis arbustorum, idEriArbu1.1: Hi-C contact map. Hi-C contact map of the idEriArbu1.1 assembly, visualised in HiGlass. Chromosomes are arranged in size order from left to right and top to bottom.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Reviewer Expertise: Syrphidae, Diptera taxonomy
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.