The genome sequence of a snail-killing fly, Coremacera marginata (Fabricius, 1775)

We present a genome assembly from an individual female Coremacera marginata (Arthropoda; Insecta; Diptera; Sciomyzidae). The genome sequence is 980 megabases in span. The majority of the assembly (99.84%) is scaffolded into six chromosomal pseudomolecules, with the X sex chromosome assembled.


Background
Sciomyzidae (Diptera) are commonly known as snail-killing flies or marsh flies, the latter name reflecting the habitat preference of many species in the family. Coremacera marginata (Diptera, Sciomyzidae) is a dark grey-brown fly with a characteristic wing pattern consisting of a strongly infuscated wing margin (darker near the costal vein) and a dark brown base colour with numerous pale spots across the rest of the wing. The species is fairly common and widely distributed in England and Wales; in Scotland it has only been recorded from around the Moray and Dornoch Firths (Ball, 2017). It prefers open and dry habitats, particularly calcareous grasslands, but also coastal dunes, open scrubby woods and old fields on woodland margins, and is occasionally found in wetland habitats (Ball, 2017; Rozkošný, 1984). The flight period runs from mid-May until the beginning of October (Ball, 2017; Speight & Knutson, 2012).
Coremacera marginata is oviparous. The eggs are laid on or near the host. The larvae are parasitoids of terrestrial snails (Knutson, 1970; Rozkošný, 1984; Rozkošný, 1987), with a preference for Cochlicopa and Discus species in laboratory conditions (Knutson, 1973; Rozkošný, 1984). Upon hatching, the larva feeds on a living snail. The host survives for up to ten days unless it is infested with multiple larvae (up to 11 have been reported to attack a single snail), in which case death can occur within 24 hours (Knutson, 1973; Rozkošný, 1984). The larva continues to feed on the decomposing tissues until it reaches the second or third instar. It then moves to a second snail to continue feeding, killing this host in one to two days. Rarely, the larva requires a third snail to complete its development. Pupation occurs outside the shell. The larval stage lasts from 22 to 97 days, with an average of 52 days, and the pupal stage from 47 to 124 days (Knutson, 1973; Rozkošný, 1984). The species overwinters as a mature larva or as a pupa (Ball, 2017; Speight & Knutson, 2012). Adults feed on flowers, dead insects and snails, and also on insect eggs and live snails' secretions (Berg & Knutson, 1978). The first and third larval instars and the puparium have been described by Knutson (1973).
Coremacera marginata was split into two subspecies, Coremacera marginata marginata (Fabricius, 1775) and Coremacera marginata pontica, by Elberg (1968) based on paler specimens from southern European Russia and Iran. This was subsequently rejected by Knutson (1973) due to a lack of differentiating structural characters that would support the separation.
The high-quality genome sequence described here is the first one reported for Coremacera marginata and has been generated as part of the Darwin Tree of Life project. It will aid in understanding the biology, physiology and ecology of the species.

Genome sequence report
The genome was sequenced from a single female C. marginata (Figure 1) collected from Wigmore Park, Luton, UK (latitude 51.88378, longitude -0.36861422). A total of 25-fold coverage in Pacific Biosciences single-molecule long reads and 33-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 617 missing joins or misjoins and removed 8 haplotypic duplications, reducing the assembly size by 0.18% and the scaffold number by 82.91%, and increasing the scaffold N50 by 268.04%.
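To make the curation percentages concrete, they can be inverted to estimate the pre-curation figures. The following is a minimal sketch, not part of the published analysis: it assumes each percentage is a simple relative change from the pre-curation value, and it uses the final values reported in this note (60 scaffolds, scaffold N50 of 184.1 Mb, 980 Mb span). The function name `pre_curation_value` is hypothetical.

```python
def pre_curation_value(final, pct_change):
    """Invert a relative change: final = initial * (1 + pct_change / 100).

    pct_change is negative for a reduction and positive for an increase.
    """
    return final / (1 + pct_change / 100)


# Final values and percentage changes as reported in the text.
scaffolds_before = pre_curation_value(60, -82.91)     # scaffold count
n50_before_mb = pre_curation_value(184.1, 268.04)     # scaffold N50, Mb
span_before_mb = pre_curation_value(980, -0.18)       # assembly span, Mb

print(f"scaffolds before curation: {scaffolds_before:.0f}")
print(f"scaffold N50 before curation: {n50_before_mb:.1f} Mb")
print(f"assembly span before curation: {span_before_mb:.1f} Mb")
```

Under these assumptions the assembly had roughly 351 scaffolds, a scaffold N50 of about 50 Mb and a span of about 981.8 Mb before curation. Because the reported inputs are rounded, these are approximations only.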
The final assembly has a total length of 980 Mb in 60 sequence scaffolds with a scaffold N50 of 184.1 Mb (Table 1). The majority, 99.84%, of the assembly sequence was assigned to 6 chromosomal-level scaffolds, representing 5 autosomes (numbered by sequence length) and the X sex chromosome (Figure 2–Figure 5; Table 2). The assembly has a BUSCO v5.1.2 (Manni et al., 2021) completeness of 97.2% (single 96.2%, duplicated 1.1%) using the diptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
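For readers unfamiliar with the scaffold N50 statistic, it is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The sketch below illustrates that definition; it is not the tool used to produce Table 1, the function name `scaffold_n50` is hypothetical, and the lengths in the example are toy values, not the scaffold lengths of this assembly.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Scaffolds are taken from longest to shortest; the N50 is the length
    of the scaffold at which the running total first reaches half of
    the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths in Mb): total 150, half is 75.
# 50 alone is below 75; 50 + 40 = 90 reaches it, so the N50 is 40.
toy_lengths = [50, 40, 30, 20, 10]
toy_n50 = scaffold_n50(toy_lengths)
print(toy_n50)
```

An N50 of 184.1 Mb in a 980 Mb assembly therefore means that scaffolds of at least 184.1 Mb account for at least half of the sequence, which is consistent with most of the assembly lying in six chromosomal scaffolds.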

Sample acquisition and nucleic acid extraction
A female C. marginata (idCorMarg1) was collected from Wigmore Park, Luton, UK (latitude 51.88378, longitude -0.36861422) by Olga Sivell, Natural History Museum, and identified by Duncan Sivell, Natural History Museum, based on Rozkošný (1984) and Ball (2017). The specimen was collected using a net and snap-frozen on dry ice.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The idCorMarg1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C and RNA sequencing. Thorax tissue was cryogenically disrupted [...] RNA was extracted from abdomen tissue in the Tree of Life laboratory at the WSI using TRIzol (Invitrogen), according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. Analysis [...] Hi-C data were generated from abdomen tissue of the same specimen using the Arima Hi-C+ kit and sequenced on an Illumina NovaSeq 6000 instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align, calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination as described previously (Howe et al., 2021). Manual curation (Howe et al., 2021) was performed using HiGlass (Kerpedjiev et al., 2018) and Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021), which performed annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). The genome sequence is released openly for reuse. The C. marginata genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases.
The genome will be annotated using the RNA-Seq data and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.