The genome sequence of the peach blossom moth, Thyatira batis (Linnaeus, 1758)

We present a genome assembly from an individual male Thyatira batis (the peach blossom moth; Arthropoda; Insecta; Lepidoptera; Drepanidae). The genome sequence is 315 megabases in span. The majority of the assembly (99.68%) is scaffolded into 31 chromosomal pseudomolecules, with the Z sex chromosome assembled. The mitochondrial genome was also assembled and is 15.4 kilobases in length. Gene annotation of this assembly on Ensembl has identified 12,238 protein-coding genes.


Introduction
Thyatira batis (peach blossom) is one of the most striking moths in the UK, its forewings marked with bright pink blotches that resemble the petals of peach tree flowers. The species has been used as a model to study the effectiveness of disruptive coloration for predator avoidance (Schaefer & Stobbe, 2006). T. batis is common in woodland habitats in Britain and Ireland, and is found across the Palearctic, from Europe to Japan. The genome of T. batis was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all of the named eukaryotic species in the Atlantic Archipelago of Britain and Ireland. Here we present a chromosomally complete genome sequence for T. batis, based on one male specimen from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK.

Genome sequence report
The genome was sequenced from a single male T. batis collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.772, longitude -1.337) (Figure 1). A total of 29-fold coverage in Pacific Biosciences single-molecule long reads (N50 13 kb) and 122-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 36 missing joins or misjoins and removed six haplotypic duplications, reducing the assembly size by 2.57% and the scaffold number by 38.55%, and increasing the scaffold N50 by 8.70%.
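To give a sense of scale for the coverage figures above, the following minimal sketch converts fold coverage into an approximate read-base yield. It assumes the 315 Mb assembly span is a reasonable proxy for genome size; the text does not state which genome size estimate was used to compute coverage, and the function names are illustrative only.

```python
def bases_for_coverage(coverage, genome_size):
    """Total sequenced bases implied by a given fold coverage."""
    return coverage * genome_size


def fold_coverage(total_read_bases, genome_size):
    """Fold coverage implied by a total number of sequenced bases."""
    return total_read_bases / genome_size


# Assumption: the assembly span (315 Mb) stands in for the true genome size.
GENOME_SIZE = 315_000_000

pacbio_bases = bases_for_coverage(29, GENOME_SIZE)   # PacBio long reads
tenx_bases = bases_for_coverage(122, GENOME_SIZE)    # 10X read clouds

print(f"PacBio: ~{pacbio_bases / 1e9:.1f} Gb")
print(f"10X:    ~{tenx_bases / 1e9:.1f} Gb")
```

Under this assumption, the two data types correspond to roughly 9 Gb and 38 Gb of sequence respectively.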
The final assembly has a total length of 315 Mb in 52 sequence scaffolds with a scaffold N50 of 11 Mb (Table 1). Of the assembly sequence, 99.7% was assigned to 31 chromosomal-level scaffolds, representing 30 autosomes (numbered by sequence length), and the Z sex chromosome (Figure 2–Figure 5; Table 2). The assembly has a BUSCO (Simão et al., 2015) v5.1.2 completeness of 99.0% (single 98.7%, duplicated 0.3%, fragmented 0.3%, missing 0.8%) using the lepidoptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
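The scaffold N50 reported above is the length of the scaffold at which the cumulative length, counted from the longest scaffold downwards, first reaches half of the total assembly length. A minimal sketch of that calculation follows; the scaffold lengths in the example are hypothetical and are not taken from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mb, for illustration only.
toy_scaffolds = [12, 11, 10, 9, 8, 5, 3, 2]
print(n50(toy_scaffolds))
```

Because N50 depends on how sequence is distributed among scaffolds, joining scaffolds during curation can raise it even while the total assembly size falls.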

Gene annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the Thyatira batis assembly (GCA_905147785.1, see https://rapid.ensembl.org/Thyatira_batis_GCA_905147785.1/; Table 1). The annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019) and OrthoDB (Kriventseva et al., 2008). The prediction tools CPC2 (Kang et al., 2017) and RNAsamba (Camargo et al., 2020) were used to help identify protein-coding genes.

Methods
A male T. batis (ilThyBati1) and a second sample of unknown sex were collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.772, longitude -1.337) by Douglas Boyes in July 2019 (ilThyBati1) and July 2020 (ilThyBati2). The samples were snap-frozen on dry ice and stored using a CoolRack.
DNA was extracted from thorax/abdomen tissue of ilThyBati1 at the Wellcome Sanger Institute (WSI) Scientific Operations core from the whole organism using the Qiagen MagAttract HMW DNA kit, according to the manufacturer's instructions. RNA from thorax/abdomen tissue of ilThyBati1 and abdomen tissue of ilThyBati2 was extracted in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNase-free water, and its concentration was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. RNA integrity was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.
Pacific Biosciences HiFi circular consensus and 10X Genomics Chromium read cloud sequencing libraries were constructed according to the manufacturers' instructions. Poly(A)
Table 3 contains a list of all software tool versions used, where appropriate. The materials that have contributed to
The genome sequence is released openly for reuse. The T. batis genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. Raw data and assembly accession identifiers are reported in Table 1.