The genome sequence of the square-spot rustic, Xestia xanthographa (Schiffermuller, 1775)

We present a genome assembly from an individual male Xestia xanthographa (the square-spot rustic; Arthropoda; Insecta; Lepidoptera; Noctuidae). The genome sequence is 934 megabases in span. The majority of the assembly (99.94%) is scaffolded into 31 chromosomal pseudomolecules, with the Z sex chromosome assembled.


Background
Xestia xanthographa (square-spot rustic) is a widespread noctuid moth found across much of the Palearctic, Europe, North Africa and North America; its larvae are nocturnal feeders on various grasses. In the UK, adults are abundant in late summer from August to September and the species overwinters as a larva. Xestia xanthographa was a key species in a recent study revealing the detrimental effects of street-lighting on caterpillar abundance in the UK (Boyes et al., 2021). The species has also been recorded as a common prey species for autumn-flying bats (Razgour et al., 2011), and, as an adaptation to facilitate bat avoidance, the auditory sensitivity of X. xanthographa is broadly tuned with an optimal frequency of 30 kHz (Norman & Jones, 2008).
The genome of X. xanthographa was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all of the named eukaryotic species in the Atlantic Archipelago of Britain and Ireland. Here we present a chromosomally complete genome sequence for X. xanthographa, based on one male specimen from Wytham Woods, Oxfordshire, UK.

Genome sequence report
The genome was sequenced from a single male X. xanthographa collected from Wytham Woods, Oxfordshire, UK (latitude 51.772, longitude -1.337) (Figure 1). A total of 27-fold coverage in Pacific Biosciences single-molecule long reads (N50 12 kb) and 40-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 86 missing/misjoins and removed 17 haplotypic duplications, reducing the assembly size by 1.41% and scaffold number by 44.86%, and increasing the scaffold N50 by 1.61%.
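The curation percentages above are relative changes between the pre-curation and post-curation assemblies. As a minimal sketch of that arithmetic, the snippet below back-calculates the scaffold count before curation from the final count of 59 scaffolds and the reported 44.86% reduction. The function names are hypothetical, and the resulting pre-curation count is an inference from the reported figures, not a value stated in this note.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100


def value_before(after, pct_change):
    """Invert percent_change: recover the starting value from the end value."""
    return after / (1 + pct_change / 100)


# Reported in this note: 59 scaffolds after curation, a 44.86% reduction.
scaffolds_after = 59
reported_reduction = -44.86

# Inferred (not reported): scaffold count before manual curation.
implied_before = round(value_before(scaffolds_after, reported_reduction))

# Sanity check: the inferred count reproduces the reported percentage.
recomputed = round(percent_change(implied_before, scaffolds_after), 2)
print(implied_before, recomputed)
```

Under these assumptions the two reported figures are mutually consistent with a drop from roughly 107 to 59 scaffolds.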
The final assembly has a total length of 934 Mb in 59 sequence scaffolds with a scaffold N50 of 31 Mb (Table 1). Of the assembly sequence, 99.94% was assigned to 31 chromosomal-level scaffolds, representing 30 autosomes (numbered by sequence length), and the Z sex chromosome (Figure 2–Figure 5; Table 2). The assembly has a BUSCO (Simão et al., 2015) completeness of 98.7% using the lepidoptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
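For readers unfamiliar with the scaffold N50 statistic quoted above, it is the length of the scaffold at which the cumulative length, summed from the longest scaffold downwards, first reaches half of the total assembly length. The following is a minimal sketch of that definition; the function name and the toy lengths are hypothetical and are not taken from the X. xanthographa assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that brings the running sum to half the total
        # defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy scaffold lengths in Mb (illustration only).
toy_lengths = [40, 35, 31, 20, 10, 4]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

Because the statistic depends only on the sorted lengths, joining scaffolds during curation can raise the N50 even while the total assembly length falls.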

Sample acquisition and nucleic acid extraction
A male X. xanthographa (ilXesXant1) and a second specimen of unknown sex (ilXesXant2) were collected from Wytham Woods, Oxfordshire, UK (latitude 51.772, longitude -1.337) by Douglas Boyes, University of Oxford, using a light trap. The specimens were identified by the same individual and snap-frozen on dry ice. DNA was extracted from whole organism tissue of ilXesXant1 at the Wellcome Sanger Institute (WSI) Scientific Operations core using the Qiagen [...] and Illumina HiSeq 4000 (RNA-Seq) instruments. Hi-C data were generated from abdomen tissue of ilXesXant1 using the Arima v1.0 kit and sequenced on HiSeq X.

Figure 1. Image of the Xestia xanthographa specimens taken prior to preservation and processing. Above, ilXesXant1, used for genome and Hi-C sequencing; below, ilXesXant2, used for RNA-Seq.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021). Haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align, calling variants with freebayes [...] and Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021), which performed annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.
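A BUSCO score is a breakdown of the reference gene set into complete (single-copy or duplicated), fragmented and missing genes, each expressed as a percentage of the set. The sketch below shows how such a summary line could be derived from raw category counts. The function name and the counts are hypothetical toy values chosen only to illustrate the arithmetic; they are not the counts obtained for this assembly, and the code does not call BUSCO or BlobToolKit.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Format category counts as a BUSCO-style one-line summary."""
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("at least one gene must be counted")

    def pct(count):
        return count / total * 100

    # Complete genes are the single-copy and duplicated genes together.
    complete = single + duplicated
    return (
        f"C:{pct(complete):.1f}%[S:{pct(single):.1f}%,D:{pct(duplicated):.1f}%],"
        f"F:{pct(fragmented):.1f}%,M:{pct(missing):.1f}%,n:{total}"
    )


# Hypothetical toy counts (illustration only, not from this assembly).
toy_summary = busco_summary(single=980, duplicated=7, fragmented=5, missing=8)
print(toy_summary)
```

Reporting the full breakdown rather than a single completeness figure makes it clear how much of the complete fraction is duplicated.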

Xin Liu
State Key Laboratory of Agricultural Genomics, BGI (Beijing Genomics Institute)-Shenzhen, Shenzhen, China

The data note by Douglas et al. describes the dataset of the assembled square-spot rustic genome. The manuscript clearly presents the methods used for sampling, sample preparation, sequencing, and data analysis, with the related statistics. The genome assembly is of good quality, with a long scaffold N50 and scaffolds anchored to chromosomes, making it valuable for later genomic and other studies. I think the current manuscript is good enough to be published as a data note. Meanwhile, I have the following suggestions for the authors to consider:

1. In the 'Genome sequence report' section on Page 5, the statement of the BUSCO results might need to be modified. The BUSCO result should be interpreted as "98.7% of the BUSCOs in the lepidoptera_odb10 dataset were found to be complete", as you can also observe the other 1% to be either in multiple copies (duplicated) or fragmented. This reflects the completeness of the genome assembly.

2. Also, in the part mentioned above, I would suggest providing some statistics (at least the length, perhaps) for the assembled second haplotype, which would be informative for understanding the heterozygosity in the genome.

3. I found that all the data notes in the same gateway directly use figures from the BlobToolKit Viewer. I would suggest revising the figures in the data note so that they are a better fit. For example, the axis labels and font sizes might be revised in some cases, and the dataset label could be removed. I don't know what 'Scale' means in Figure 2, and I think Figure 3 and Figure 4 are quite difficult to understand, at least for me.

Last, I would like to express my condolences on Douglas Boyes' death. Having learned of this from the manuscript and from searching online, I would like to thank him for his contributions in providing this valuable dataset.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes