The genome sequence of the large burdock Cheilosia, Cheilosia vulpina (Meigen, 1822)

We present a genome assembly from an individual female Cheilosia vulpina (the large burdock Cheilosia or stocky blacklet; Arthropoda; Insecta; Diptera; Syrphidae). The genome sequence is 913 megabases in span. The majority of the assembly (98.81%) is scaffolded into sixchromosomal pseudomolecules, with the X sex chromosome assembled.


Background
Cheilosia vulpina (the large burdock Cheilosia or the stocky blacklet) is a stocky, medium-large Cheilosia with outstanding hairs on the face. The males can be distinguished from other large hairy-faced Cheilosia by the longer body hairs, especially the scutellar marginals and hairs around the margins of the abdomen (much shorter in C. lasiopa) and pale bases to the tibiae (dark in the much longer-winged C. variabilis). The females have very noticeable 'fasciae' of pale hairs across the tergites and look like large, well-marked specimens of C. proxima or the scarcer C. velutina in the field (neither of which have outstanding facial hairs).
In the UK, C. vulpina is widespread though localised to southern England and seems to be most frequent in calcareous grasslands. It has been reared from the roots of greater and lesser burdocks in Germany (the probable foodplants in Britain) and also globe artichoke outside of the UK. There are two generations a year, and the spring brood averages larger and hairier than the summer one. The summer generation was once regarded as a separate species, C. conops. Adults are particularly keen on the flowers of umbellifers, including cow parsley in spring, and angelica, wild parsnip, and upright hedge-parsley in summer.

Genome sequence report
The genome was sequenced from a single female C. vulpina (Figure 1) collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.77, longitude -1.331). A total of 51-fold coverage in Pacific Biosciences single-molecule long reads and 120-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 11 missing joins or misjoins, reducing the assembly length by 0.11% and the scaffold number by 33.33%, and increasing the scaffold N50 by 52.44%.
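The percentage changes from curation can be inverted to estimate the pre-curation values. The following is a minimal sketch, assuming the final values given in the next paragraph (20 scaffolds, scaffold N50 of 69.4 Mb, total length of 405 Mb); the helper name `value_before` is hypothetical and the results are only as precise as the rounded figures reported in the text.

```python
def value_before(after: float, percent_change: float) -> float:
    """Invert a signed percentage change, where after = before * (1 + pct / 100)."""
    return after / (1.0 + percent_change / 100.0)


# Final values as reported in the text; a negative sign encodes a reduction,
# a positive sign an increase.
scaffolds_before = value_before(20, -33.33)
n50_before_mb = value_before(69.4, 52.44)
length_before_mb = value_before(405, -0.11)

print(round(scaffolds_before), round(n50_before_mb, 1), round(length_before_mb, 1))
```

Under these assumptions the scaffold count before curation comes out at about 30, which is consistent with a one-third reduction to 20 scaffolds.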
The final assembly has a total length of 405 Mb in 20 sequence scaffolds with a scaffold N50 of 69.4 Mb (Table 1). The majority, 99.92%, of the assembly sequence was assigned to 6 chromosomal-level scaffolds, representing 5 autosomes (numbered by sequence length), and the X sex chromosome (Figure 2 to Figure 5; Table 2). The assembly has a BUSCO completeness of 97.1% (single 96.7%, duplicated 0.5%) using the diptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
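For readers unfamiliar with the scaffold N50 statistic, the sketch below shows one common way to compute it from a list of scaffold lengths. The function name `scaffold_n50` and the example lengths are hypothetical and do not correspond to the C. vulpina scaffolds; this is an illustration of the definition, not the tooling used for this assembly.

```python
def scaffold_n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest scaffold downwards, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mb (not taken from the assembly).
example_lengths_mb = [80, 70, 60, 50, 40, 10, 5]
print(scaffold_n50(example_lengths_mb))
```

Because the statistic depends only on the sorted lengths, joining two scaffolds during curation can raise the N50 even when the total assembly length barely changes.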

Methods
Sample acquisition and nucleic acid extraction
A female C. vulpina (idCheVulp2) and a second C. vulpina of unknown sex (idCheVulp1) were collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.77, longitude -1.331) by Steven Falk, independent researcher, who also identified the specimens. The specimens were collected using a net and snap-frozen on dry ice.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute. The idCheVulp2 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Abdomen tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 200-ng aliquot of extracted DNA using the 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared to an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics Chromium read cloud sequencing libraries were constructed according to the manufacturers' instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL II and Illumina NovaSeq 6000 instruments. Hi-C data were generated from head tissue of idChrBici1 and idCheVulp2 using the Arima Hi-C+ kit and sequenced on a NovaSeq 6000 instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with

The genome sequence is released openly for reuse. The C. vulpina genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.

Ljiljana Šašić Zorić
BioSense Institute, Novi Sad, Serbia

"The genome sequence of the large burdock Cheilosia, Cheilosia vulpina (Meigen, 1822)" is presented in the form of a technical report on the genome assembly of the hoverfly species Cheilosia vulpina. The article is an important contribution to further research on hoverflies, which are known as the second most important pollinators among insects. It describes the technical aspects of genome assembly clearly and concisely. The applied approach includes PacBio and 10X sequencing data, with scaffolding using Hi-C data. The methodology is suitable and overall well described, although some minor inconsistencies should be resolved before final publication.

Suggestions for revision:

Abstract
○ The genome assembly total length and per cent of the assembly sequence assigned to 6 chromosomal-level scaffolds (913M and 98.81%) are not in agreement with the main text (405M and 99.92%).
○ "sixchromosomal" should be written separately: "six chromosomal".

Genome sequence report
○ It would be good if the caption for Figure 1 referred to the name of the species, not only the ID. I would suggest editing the figure caption in the following way: "Image of the sequenced Cheilosia vulpina (idCheVulp2) specimen…"
○ In the second paragraph, it is stated that the final assembly has a total length of 405 Mb in 20 sequence scaffolds. The same is stated in Table 1. However, based on Figure 2 and the results available at https://blobtoolkit.genomehubs.org/view/idCheVulp2.1/dataset/CAKAIU01/snail, it seems that there are 21 scaffolds.

Methods
○ It is stated that the gender of specimen idCheVulp1 is not known. What is the reason for not determining the gender? Is the specimen damaged? Please state this.
○ There is an inconsistency in the text regarding the use of the idCheVulp1 specimen in the analysis. Although, based on Table 1, it was used for Hi-C, it is stated only for the idCheVulp2 sample that tissue was set aside for Hi-C sequencing. Additionally, it is written that Hi-C data were generated from the head tissue of idChrBici1 and idCheVulp2, but there is no additional information on specimen idChrBici1.
○ There is no information on the parameters used in data analysis. If the default values were used, please state that clearly or, if the data analysis follows a procedure previously described in detail, please refer to the source.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes