The genome sequence of Tachina fera Linnaeus, 1761, a tachinid fly [version 1; peer review: 1 approved, 1 approved with reservations]

We present a genome assembly from an individual female Tachina fera (Arthropoda; Insecta; Diptera; Tachinidae). The genome sequence is 752 megabases in span. The majority of the assembly (99.98%) is scaffolded into 6 chromosomal pseudomolecules, with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 17.4 kilobases in length.


Background
Tachina fera (Linnaeus, 1761) is one of the most striking flies commonly encountered in the UK countryside. With adults ranging between 9 and 14 mm in length, it is an easily noticeable fly. Spiky bristles, characteristic of the Tachinidae, adorn a chestnut abdomen with a dark central stripe. Tachina fera is abundant across Europe, North Africa and Asia (Tschorsnig & Herting, 1994). In the UK, T. fera is bivoltine, with adults in flight from May to June, and from July to September (Belshaw, 1993). Adults feed at a range of flowers throughout the landscape. Tachina fera has mainly been recorded emerging from noctuid moth caterpillars (Belshaw, 1993). The method of parasitism used by T. fera is notable because the egg is not placed into the host by the mother but is laid, pre-incubated, onto leaves close to it. The larva, once hatched, makes its own way to the host, stimulated by vibration (Belshaw, 1993; Stireman et al., 2006). The parasitic nature of tachinid species such as T. fera means they are important, but underappreciated, regulators of insect herbivory in our ecosystem (Stireman et al., 2006), as well as playing important roles in pollination (e.g. Martel et al., 2021). The chromosome-level genome assembly presented here is, to our knowledge, the first high-quality resource developed for a tachinid and represents a key step in understanding the complex ecology of these beautiful and spiky flies.

Genome sequence report
The genome was sequenced from a single female T. fera collected from Wytham Woods, Oxfordshire (Biological vice-county: Berkshire), UK (latitude 51.770, longitude -1.338) (Figure 1). A total of 41-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 46-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 246 missing/misjoins and removed 60 haplotypic duplications, reducing the assembly size by 1.88% and the scaffold number by 94.81%, and increasing the scaffold N50 by 120.51%.
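For readers less familiar with the scaffold N50 statistic and the percentage changes quoted above, the following is a minimal sketch of how such values can be computed. The function names and the toy scaffold lengths are illustrative assumptions only; they are not the tooling or the data used for this assembly.

```python
def scaffold_n50(lengths):
    """Return the N50: the length of the scaffold at which the running
    total, taken from longest to shortest, first reaches half of the
    total assembly length."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def percent_change(before, after):
    """Signed percentage change from a pre-curation to a post-curation value."""
    return (after - before) / before * 100


# Hypothetical scaffold lengths (arbitrary units), not values from this assembly.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = scaffold_n50(toy_lengths)
```

Comparing `scaffold_n50` before and after curation with `percent_change` gives a figure of the same kind as the N50 increase reported here; a negative result corresponds to a reduction, as for assembly size and scaffold number.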
The final assembly has a total length of 752 Mb in 12 sequence scaffolds with a scaffold N50 of 142 Mb (Table 1). The majority, 99.98%, of the assembly sequence was assigned to 6 chromosomal-level scaffolds, representing 5 autosomes (numbered by sequence length), and the X sex chromosome (Figure 2–Figure 5; Table 2). The order and orientation of contigs within the centromere of chromosome 2 are not known. A large amount of apparent haplotypic duplication was excised from this region owing to a divergent Hi-C pattern and apparently low coverage (which was somewhat ambiguous owing to read coverage levels in this repetitive region).
The assembly has a BUSCO v5. [...] (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 [...]
The genome sequence is released openly for reuse. The T. fera genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated using the RNA-Seq data and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.

[...] 1844). This is a bit problematic for the story as the females are difficult to tell apart with certainty. It is very unlikely that you would have happened to sample the rarer species, but you might need to make this reservation in the introduction. Figure 1 is quite poor when it comes to specimen details. For vouchering, I would recommend taking more detailed images. I would for example think (based on the shape of the abdomen and apparently narrow frons, although this is poorly visible) that this is a male specimen. Or could it be that the image is from the other specimen used for the RNA-seq?

Author information
Genome sequence report: Looks very good. To be sure of the specimen, you could check the sex (most calyptrate flies follow the XY system of sex determination) by looking for the dominant male-determining factor in your sequence data (e.g. Vicoso & Bachtrog, 2015). Was the X chromosome present as diploid or haploid sequence count?
However, I see no presentation of the RNA-seq data? How many transcripts, what coverage etc?
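The X chromosome dosage question raised above can be approached by comparing read depth on the X scaffold with read depth on the autosomes. The sketch below is illustrative only: the function names, the per-window depth values and the 0.15 tolerance are assumptions, not part of the DToL pipeline, and real data would first need filtering for repeats and low mapping quality.

```python
from statistics import median


def x_to_autosome_ratio(x_depths, autosome_depths):
    """Ratio of median read depth on the X to median depth on the autosomes."""
    return median(x_depths) / median(autosome_depths)


def infer_x_dosage(ratio, tolerance=0.15):
    """Classify the ratio: about 1.0 suggests two X copies, about 0.5 one copy.
    The tolerance is an arbitrary illustrative threshold."""
    if abs(ratio - 1.0) <= tolerance:
        return "two X copies (consistent with an XX female)"
    if abs(ratio - 0.5) <= tolerance:
        return "one X copy (consistent with an XY male)"
    return "ambiguous"


# Hypothetical per-window depths, not values from this dataset.
ratio = x_to_autosome_ratio([40, 42, 41], [41, 40, 43, 42])
call = infer_x_dosage(ratio)
```

A ratio near 1.0 would support the specimen being female, whereas a ratio near 0.5 would suggest a male; intermediate values are left as ambiguous rather than forced into either class.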

Sample acquisition & nucleic acids extraction:
Were the whole specimens destroyed in the DNA and RNA extractions or is there some reference tissue left? Where (and how) are these stored and how can they be located? If the reference specimens still exist, describe all associated labels in detail. For later morphological analysis it would be great to preserve at least the abdomens and legs as a voucher. The voucher should be placed in a public collection. I am sure you have some established practice within DToL but it needs to be described here.
Also: "The specimen was caught in […]" -> "The specimens were caught …" (there were two).

RNA extraction:
Was there no poly-A purification? How was the rRNA depleted before the sequencing?

Suggestion for the future:
Depending on your RNA-seq results, it might make sense to extract RNA from the head+thorax (especially when there is more than one specimen). The RNA yield from the abdomen is often poor (on average) due to high levels of digestive enzymes, mass of gut content, fat etc. The head+thorax could also give a better overview of gene expression.