The genome sequence of the merveille du jour, Griposia aprilina (Linnaeus, 1758)

We present a genome assembly from an individual Griposia aprilina (the merveille du jour; Arthropoda; Insecta; Lepidoptera; Noctuidae). The genome sequence is 720 megabases in span. The majority of the assembly (99.89%) is scaffolded into 32 chromosomal pseudomolecules with the W and Z sex chromosomes assembled. The complete mitochondrial genome was also assembled and is 15.4 kilobases in length.


Background
The merveille du jour, Griposia aprilina (Linnaeus, 1758), is a species of moth belonging to the Xylenini tribe of the Noctuidae family. The species is generally common but rarely abundant; it is widespread across Europe, observed as far east as the Urals, the Caucasus, and Asia Minor, as well as the British Isles (except for the extreme north and parts of Ireland), with a distribution increasing in the UK since 1970 (Randle et al., 2019).
G. aprilina is one of the most charismatic noctuids and adults are beautifully camouflaged with black, green, and white markings that mimic lichens on bark. The Linnean species name aprilina is thought to refer to the colour of opening buds, or spring (Emmet, 1991). They prefer mature woodlands where larvae internally feed on flowers and leaves of oak trees (Quercus spp.). Adults are on the wing between September and October and feed at night on ivy blooms and berries. They overwinter as eggs on branches or within bark of the host plant ("Merveille Du Jour", n.d.).
The moth may be sister to the recently discovered Griposia jahannamah (belonging to BIN BOLD:ACJ6462 on BOLD) from Iran (Fibiger et al., 2008). It diverges only very narrowly in COI-5P from other members of its BIN, BOLD:AAC3647, including G. wegneri, G. skyvai and G. bouveti (Huemer et al., 2019), and also G. pinkeri of Greece and the Middle East, all of which are lichen-camouflaged. The genus does not yet seem to have been included in modern molecular phylogenetic works, and it would be interesting to trace the evolution of colour pattern traits once the sister taxon of Griposia is known.

Genome sequence report
The genome was sequenced from a single female G. aprilina collected from Wytham Woods, Berkshire, UK (Figure 1). A total of 31-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 59-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 26 missing joins or misjoins and removed four haplotypic duplications, reducing the assembly size by 0.17% and the scaffold number by 22.22%, and increasing the scaffold N50 by 4.16%.
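The curation and coverage figures above are related by simple arithmetic, which can be sketched as follows. This is an illustrative calculation, not part of the reported workflow: the function names are hypothetical, the pre-curation scaffold count of 54 is inferred (it is the count for which a 22.22% reduction yields the 42 scaffolds of the final assembly) rather than stated in the report, and the implied yield assumes coverage was computed against the 720 Mb assembly span.

```python
def percent_change(before: float, after: float) -> float:
    """Signed percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


def implied_yield_gb(fold_coverage: float, genome_size_mb: float) -> float:
    """Approximate sequencing yield (Gb) implied by a fold coverage.

    Assumes coverage = total sequenced bases / genome size.
    """
    return fold_coverage * genome_size_mb / 1000.0


# 42 scaffolds is the reported final count; 54 is an inferred,
# not reported, pre-curation count (assumption for illustration).
scaffolds_before_assumed = 54
scaffolds_after = 42
scaffold_change = percent_change(scaffolds_before_assumed, scaffolds_after)
print(f"Scaffold number change: {scaffold_change:.2f}%")

# Rough yields implied by the reported coverage over a 720 Mb assembly.
hifi_gb = implied_yield_gb(31, 720)
tenx_gb = implied_yield_gb(59, 720)
print(f"Implied HiFi yield: ~{hifi_gb:.1f} Gb; implied 10X yield: ~{tenx_gb:.1f} Gb")
```

The same `percent_change` helper applies to the assembly size and scaffold N50 changes once the pre-curation values are known; those values are not given here, so they are not computed.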
The final assembly has a total length of 720 Mb in 42 sequence scaffolds with a scaffold N50 of 24.6 Mb (Table 1). The majority, 99.89%, of the assembly sequence was assigned to 32 chromosomal-level scaffolds, representing 30 autosomes (numbered by sequence length) and the W and Z sex chromosomes (Figure 2-Figure 5; Table 2). The assembly has a BUSCO v5.3.2 (Manni et al., 2021) completeness of 99.0% (single 98.4%, duplicated 0.6%) using the lepidoptera_odb10 reference set (n=5,286). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
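For readers unfamiliar with the summary statistics above, the sketch below shows how a scaffold N50 is conventionally computed and how the BUSCO percentages relate to one another. It is a minimal illustration under stated assumptions: the scaffold lengths are toy values, not the lengths of this assembly, the function names are hypothetical, and the gene count derived from the percentage is approximate because the reported percentages are rounded.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def busco_complete_pct(single_pct, duplicated_pct):
    """Complete BUSCOs are the sum of single-copy and duplicated BUSCOs."""
    return round(single_pct + duplicated_pct, 1)


# Toy scaffold lengths in Mb (hypothetical, for illustration only).
toy_lengths_mb = [50, 40, 30, 20, 10]
print("Toy N50 (Mb):", n50(toy_lengths_mb))

# Reported values: single 98.4%, duplicated 0.6%, n = 5,286.
complete = busco_complete_pct(98.4, 0.6)
approx_complete_genes = round(5286 * complete / 100)
print(f"Complete BUSCOs: {complete}% (roughly {approx_complete_genes} of 5286 genes)")
```

In the toy example the total is 150 Mb, and the two longest scaffolds (50 + 40 = 90 Mb) are the first to pass the 75 Mb halfway point, so the N50 is 40 Mb.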

Sample acquisition and nucleic acid extraction
A single female G. aprilina specimen (ilGriApri1) was collected using a light trap at Wytham Woods, Berkshire, UK (latitude 51.772, longitude -1.338) by Douglas Boyes (University of Oxford). The specimen was identified by Douglas Boyes and snap-frozen on dry ice.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute. The ilGriApri1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Thorax tissue was cryogenically disrupted to a fine powder using a Covaris cryoPREP Automated Dry Pulveriser, receiving multiple impacts. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 200-ng aliquot of extracted DNA using a 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared into an average fragment size of between 12 and 20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.
RNA was extracted from the abdomen tissue of ilGriApri1 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. The integrity of the RNA was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics Chromium read cloud sequencing libraries were constructed according to the manufacturers' instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL II (HiFi), Illumina NovaSeq 6000 (10X) and Illumina HiSeq 4000 (RNA-Seq) instruments. Hi-C data were generated in the Tree of Life laboratory from head tissue of ilGriApri1 using the Arima v2 kit and sequenced on a NovaSeq 6000 instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align and calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination as described previously (Howe et al., 2021). Manual curation was performed using HiGlass (Kerpedjiev et al., 2018) and Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021), which performs annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020).
The genome sequence is released openly for reuse. The G. aprilina genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated using the RNA-Seq data and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.
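The assembly workflow described in this section can be summarised as an ordered list of stages. The sketch below is only a reading aid: it records the tools and input data named in the text and invokes none of them, the `Stage` class and stage labels are hypothetical, and the ordering follows the prose, so the actual workflow may have run some steps (for example mitochondrial assembly) in parallel.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Stage:
    """One step of the workflow as described in the text (illustrative only)."""
    label: str               # hypothetical short label for the step
    tools: Tuple[str, ...]   # tool names exactly as given in the text
    input_data: str          # data type the step consumes, per the text


PIPELINE = (
    Stage("contig assembly", ("Hifiasm",), "PacBio HiFi reads"),
    Stage("haplotypic duplication removal", ("purge_dups",), "primary contigs"),
    Stage("polishing", ("longranger align", "freebayes"), "10X Genomics reads"),
    Stage("scaffolding", ("SALSA2",), "Hi-C data"),
    Stage("contamination check", ("as in Howe et al., 2021",), "scaffolds"),
    Stage("manual curation", ("HiGlass", "Pretext"), "Hi-C contact maps"),
    Stage("mitochondrial assembly", ("MitoHiFi", "MitoFinder"), "PacBio HiFi reads"),
    Stage("evaluation", ("BlobToolKit", "BUSCO"), "final assembly"),
)

# Print the workflow as a numbered summary.
for number, stage in enumerate(PIPELINE, start=1):
    tools = " + ".join(stage.tools)
    print(f"{number}. {stage.label}: {tools} ({stage.input_data})")
```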