The genome sequence of the ferruginous bee-grabber, Sicus ferrugineus (Linnaeus, 1761)

We present a genome assembly from an individual male Sicus ferrugineus (the ferruginous bee-grabber; Arthropoda; Insecta; Diptera; Conopidae). The genome sequence is 312 megabases in span. The majority of the assembly (99.67%) is scaffolded into 7 chromosomal pseudomolecules (5 autosomes plus the X and Y sex chromosomes). The complete mitochondrial genome was also assembled and is 16.9 kilobases in length.


Background
The ferruginous bee-grabber, Sicus ferrugineus (Linnaeus, 1761), is the commonest and most abundant member of the dipteran family Conopidae (thick-headed flies) found in the British Isles (Conopid Recording Scheme of Britain & Ireland, personal communication). Widespread throughout Europe (Stuke, 2017), these enigmatic flies inhabit grassland, woodland, hedgerow and garden habitats. Often seen resting on and around the flowering plants on which they feed, adults hold their elongated abdomens curled under the body (Smith, 1969). At such food sources, S. ferrugineus females can readily be seen 'grabbing' several species of bumblebee (Bombus spp.), both in the air and on surfaces (Schmid-Hempel & Schmid-Hempel, 1996b). S. ferrugineus is an endoparasite of these bees, and these lunging 'grabs' are usually the point of egg delivery. Female S. ferrugineus bear specialised abdominal structures used to seize the target bee and inject a single egg into its body cavity. These include the theca, a grasping structure under sternite 5. The theca is notably smaller in S. ferrugineus than in the only other recorded British Sicus species, S. abdominalis (Smith, 1969), and is the primary morphological structure used for identifying species of this genus. S. ferrugineus eggs bear a hooked micropyle (Kotrba, 2011; Smith, 1969) and, once hatched, the resulting larva feeds on the haemolymph of the host, reaching the pupal stage in around 11 days (Schmid-Hempel & Schmid-Hempel, 1996a). The high-quality S. ferrugineus reference genome presented here is the first full genome sequence of a conopid fly and presents a unique opportunity to better understand the fascinating parasitic ecology of this species.

Genome sequence report
The genome was sequenced from a single male S. ferrugineus collected from Wytham Great Wood, Oxfordshire (Biological vice-county: Berkshire), UK (latitude 51.770, longitude -1.339) (Figure 1). A total of 49-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 83-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 103 missing/misjoins, reducing the assembly size by 0.16% and the scaffold number by 65.88%, and increasing the scaffold N50 by 86.05%.
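As a rough illustration of what the fold-coverage figures above imply, the following Python sketch converts a coverage value and an assembly span into an approximate sequencing yield. It assumes that coverage is simply total sequenced bases divided by the 312 Mb assembly span; the function name is hypothetical, and the actual yields are not stated in this report.

```python
def approx_yield_gb(fold_coverage: float, span_mb: float) -> float:
    """Approximate sequenced bases (in Gb) implied by a fold coverage over a span (in Mb).

    Assumption: coverage = total sequenced bases / assembly span.
    """
    return fold_coverage * span_mb / 1000.0


# Inputs taken from the report: 312 Mb span, 49-fold HiFi, 83-fold 10X read clouds.
hifi_gb = approx_yield_gb(49, 312)
tenx_gb = approx_yield_gb(83, 312)
print(f"HiFi: ~{hifi_gb:.1f} Gb; 10X: ~{tenx_gb:.1f} Gb")
```

These are back-of-envelope estimates only; reported coverage may have been computed against a different genome size estimate.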
The final assembly has a total length of 312 Mb in 29 sequence scaffolds with a scaffold N50 of 44.9 Mb (Table 1). The majority, 99.67%, of the assembly sequence was assigned to 7 chromosomal-level scaffolds, representing 5 autosomes (numbered by sequence length), and the X and Y sex chromosomes (Figure 2-Figure 5; Table 2). The orientation and location of some pieces of heterochromatic repeat are less clear than for other regions of the assembly, particularly for heterochromatic regions in chromosome 2. The assembly contains many regions of pentameric repeat that cause problems with Hi-C mapping and are visible as drops in association.
The assembly has a BUSCO v5.1.2 (Manni et al., 2021) completeness of 95.4% (single 94.3%, duplicated 1.0%) using the diptera_odb10 reference set (n=3285). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
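For readers unfamiliar with the scaffold N50 statistic reported above, the sketch below shows one common way to compute it from a list of scaffold lengths. This is a minimal illustration, not the pipeline used for this assembly; the function name and the toy lengths are hypothetical and do not correspond to the real idSicFerr1 scaffolds.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy lengths (in Mb), NOT the real scaffolds of this assembly.
toy_lengths = [50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Because the statistic depends only on the sorted lengths, joining scaffolds during curation raises the N50 even when the total assembly length barely changes, which is consistent with the changes reported after manual curation.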

Sample acquisition and DNA extraction
One male S. ferrugineus sample, idSicFerr1, was collected from Wytham Great Wood, Oxfordshire (Biological vice-county: Berkshire), UK (latitude 51.770, longitude -1.339) by Liam Crowley, University of Oxford, on 15 June 2020. The specimen was caught in grassland with a net, identified by the same individual, snap-frozen on dry ice and stored using a CoolRack.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute. The idSicFerr1 sample was weighed and dissected on dry ice, with tissue set aside for Hi-C sequencing. Thorax/abdomen tissue was cryogenically disrupted to a fine powder using a Covaris cryoPREP Automated Dry Pulveriser, receiving multiple impacts. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 200-ng aliquot of extracted DNA using a 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared to an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics Chromium read cloud sequencing libraries were constructed according to the manufacturers' instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL II (HiFi) and Illumina HiSeq X (10X) instruments. Hi-C data were generated in the Tree of Life laboratory from head tissue of idSicFerr1 using the Arima v2 kit and sequenced on a HiSeq X instrument.

Nico Fuhrmann
Biogeography, University of Trier, Trier, Rhineland-Palatinate, Germany
The article presents the first chromosome-level genome assembly for the ferruginous bee-grabber, Sicus ferrugineus. It is the first publicly available genome sequence of an abundant member of the dipteran family Conopidae. The authors sequenced DNA extracted from a single male individual on the PacBio SEQUEL II and Illumina HiSeq X. The initial assembly was done with Hifiasm and followed by short-read polishing. Using Hi-C data for the genome scaffolding reduced the scaffold numbers significantly. The resulting assembly consists of five autosomes, the X and Y chromosomes and the mitochondrion of the species. The quality analyses of the assembly indicate large scaffolds and that 95% of dipteran BUSCOs were found complete. This high-quality genome sequence is available in the ENA nucleotide database. A genome annotation is still missing. However, the authors mention that the genome will be annotated in the future and made available through Ensembl.

Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes