The genome sequence of the broad-bordered yellow underwing, Noctua fimbriata (Schreber, 1759) [version 1; peer review: awaiting peer review]

We present a genome assembly from an individual female Noctua fimbriata (the broad-bordered yellow underwing; Arthropoda; Insecta; Lepidoptera; Noctuidae). The genome sequence is 574 megabases in span. The complete assembly is scaffolded into 32 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled.


Background
Noctua fimbriata (broad-bordered yellow underwing) is a common noctuid moth with marked sexual dimorphism in its wing colouration: females are orange/buff coloured, whereas males are darker brown. There is also variation between individuals in wing colour and pattern, but only on the upper surface of the forewing; Owen & Whiteley (1989) have pointed out that restriction to the visual surface is consistent with polymorphism maintained by frequency-dependent selection by predators. The species is found across Europe and western parts of Asia; it occurs throughout the UK, but is less common in the north of England and in Scotland. Larvae of N. fimbriata are polyphagous, feeding on many species of herbaceous plant as well as low-growing trees and shrubs; the species is common in woodlands and is also often recorded in gardens. N. fimbriata has an unusual flight period in the UK, with adults emerging in July, then undergoing a summer aestivation period before a second flight period in late August and September (Randle et al., 2019); the adaptive significance of the aestivation period is unclear.
The genome of N. fimbriata was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all of the named eukaryotic species in the Atlantic Archipelago of Britain and Ireland. Here we present a chromosomally complete genome sequence for N. fimbriata, based on one female specimen from Wytham Woods, Oxfordshire, UK.

Genome sequence report
The genome was sequenced from a single female N. fimbriata (Figure 1) collected from Wytham Woods, Oxfordshire, UK (latitude 51.772, longitude -1.338). A total of 20-fold coverage in Pacific Biosciences single-molecule long reads (N50 16 kb) and 73-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 222 missing joins or misjoins and removed 9 haplotypic duplications, reducing the assembly length by 0.37% and the scaffold number by 69.09%, and increasing the scaffold N50 by 41.75%. The final assembly has a total length of 574 Mb in 51 sequence scaffolds with a scaffold N50 of 19.0 Mb (Table 1). Of the assembly sequence, 100% was assigned to 32 chromosomal-level scaffolds, representing 30 autosomes (numbered by sequence length) and the W and Z sex chromosomes (Figure 2-Figure 5; Table 2). The assembly has a BUSCO (Manni et al., 2021) completeness of 99.0% using the lepidoptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
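For readers unfamiliar with the contiguity statistics quoted above, the following is a minimal Python sketch of how a scaffold N50 and a percentage change can be computed. It is illustrative only and is not the software used for this assembly; the function names and the toy scaffold lengths are hypothetical. The pre-curation scaffold count of 165 used in the example is not stated in this note; it is back-calculated here from the reported 69.09% reduction to 51 scaffolds and should be read as an assumption.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together account for at least half of the total span."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths supplied")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Toy scaffold lengths (hypothetical, arbitrary units), not real data.
toy_scaffolds = [40, 30, 20, 10]
print(n50(toy_scaffolds))  # 30: scaffolds of 40 and 30 cover 70 of 100

# Assumed pre-curation count (165) inferred from the reported reduction.
print(round(percent_change(165, 51), 2))  # -69.09
```

Applied to real data, the scaffold lengths would be read from the assembly FASTA index or a similar per-sequence length table.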

Methods
Sample acquisition, DNA extraction and sequencing
A single female N. fimbriata (ilNocFimb1) was collected from Wytham Woods, Oxfordshire, UK (latitude 51.772, longitude -1.338) by Peter Holland, University of Oxford, and identified by the same individual. The specimen was found alive in a rain puddle in daytime and preserved on dry ice prior to transfer to the Wellcome Sanger Institute.
DNA was extracted from thorax/abdomen tissue taken from the whole organism at the Wellcome Sanger Institute (WSI) Scientific Operations core using the Qiagen MagAttract HMW DNA kit, according to the manufacturer's instructions. Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud sequencing libraries were constructed according to the manufacturers' instructions. Sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II and Illumina HiSeq X instruments. Hi-C data were generated from remaining thorax/abdomen tissue using the Arima Hi-C+ kit in the Sanger Tree of Life core laboratory and sequenced on HiSeq X.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align and calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation (Howe et al., 2021) was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021), which performed annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores were generated. Table 3 contains a list of all software tool versions used, where appropriate.

Ethics/compliance issues
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission
The genome sequence is released openly for reuse. The N. fimbriata genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.