The genome sequence of the buff-tip, Phalera bucephala (Linnaeus, 1758) [version 1; peer review: awaiting peer review]

We present a genome assembly from an individual female Phalera bucephala (the buff-tip; Arthropoda; Insecta; Lepidoptera; Notodontidae). The genome sequence is 933 megabases in span. The majority of the assembly, 99.27%, is scaffolded into 31 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled.


Background
Phalera bucephala (buff-tip) exhibits one of the most striking examples of camouflage amongst UK moths: the yellow-tipped forewings held tent-like along the body give the convincing appearance of a broken birch twig. The moth is nocturnal and found across the UK, mainland Europe and parts of Asia. The larvae are polyphagous, feeding on the leaves of several deciduous trees including birch, beech and oak. Ford (1967) comments that the larvae can produce a pungent smell, presumably as a defence mechanism. The species can become a transient pest; for example, defoliating trees along the Maidenhead bypass in the UK in the 1970s (Port & Thompson, 1980) and apple trees in Lithuania (Molis, 1970). The species has also been used in studies to assess the effect of multiple stressors (herbivores, powdery mildew and aphids) on oak trees, revealing complex plant-pathogen-insect interactions (van Dijk et al., 2020).
The genome of P. bucephala was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all of the named eukaryotic species in the Atlantic Archipelago of Britain and Ireland. Here we present a chromosomally complete genome sequence for P. bucephala, based on one female specimen from Wytham Woods, Oxfordshire, UK.

Genome sequence report
The genome was sequenced from a single female P. bucephala (Figure 1) collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.764, longitude -1.327). A total of 34-fold coverage in Pacific Biosciences single-molecule circular consensus HiFi long reads (N50 15 kb) and 51-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 155 missing joins or misjoins and removed 4 haplotypic duplications, reducing the assembly size by 0.22% and the scaffold number by 45.28%, and increasing the scaffold N50 by 40.20%.
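The curation effects quoted above are relative changes between the pre- and post-curation assemblies. The following is a minimal sketch of that arithmetic, not the authors' code. The function name `percent_change` is hypothetical, and the pre-curation scaffold count of 212 is not reported in the text: it is back-calculated from the final count of 116 scaffolds and the stated 45.28% reduction, so it should be read as an illustrative assumption.

```python
def percent_change(before, after):
    """Return the relative change from `before` to `after`, in percent.

    Negative values indicate a reduction, positive values an increase.
    """
    if before == 0:
        raise ValueError("'before' must be non-zero")
    return (after - before) / before * 100.0


# ASSUMPTION: 212 is a back-calculated, hypothetical pre-curation scaffold
# count; only the final count (116) and the percentage are given in the text.
scaffold_change = percent_change(212, 116)
print(f"scaffold number change: {scaffold_change:.2f}%")
```

Under that assumption the result is consistent with the reported 45.28% reduction in scaffold number. The same function applies to the changes in assembly size and scaffold N50 once the pre-curation values are known.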
The final assembly has a total length of 933 Mb in 116 sequence scaffolds with a scaffold N50 of 34 Mb (Table 1). Of the assembly sequence, 99.27% was assigned to 31 chromosomal-level scaffolds, representing 29 autosomes (numbered by sequence length) and the W and Z sex chromosomes (Figure 2–Figure 5; Table 2). The assembly has a BUSCO v5.1.2 (Manni et al., 2021) completeness of 98.9% (single 97.8%, duplicated 1.0%) using the lepidoptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.

Figure 1. Image of the Phalera bucephala specimens taken prior to preservation and processing. Above, ilPhaBuce1, used for genome and Hi-C sequencing; below, ilPhaBuce2, used for RNA-Seq.
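For readers unfamiliar with the scaffold N50 statistic reported above, the sketch below shows one common way to compute it: sort the sequence lengths from longest to shortest and return the length at which the running total first reaches half of the total assembly span. This is an illustration of the standard definition, not the pipeline used for this assembly. The function name `n50` and the toy lengths are hypothetical and are not taken from the P. bucephala data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths, arbitrary units).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))
```

Applied to the 116 scaffold lengths of the final assembly, this definition would be expected to yield the reported scaffold N50 of 34 Mb. The same calculation on read lengths gives a read N50.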

Methods
Sample acquisition and nucleic acid extraction
A female P. bucephala (ilPhaBuce1) and a second specimen of unknown sex (ilPhaBuce2) were collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.764, longitude -1.327) by Douglas Boyes, UKCEH, using a net. The samples were identified by the same individual and snap-frozen on dry ice. DNA was extracted from whole organism tissue of ilPhaBuce1 at the Wellcome Sanger Institute (WSI) Scientific Operations core using the Qiagen MagAttract HMW DNA kit, according to the manufacturer's instructions. RNA was extracted from thorax/abdomen tissue of ilPhaBuce2 in the Tree of Life Laboratory at the WSI using TRIzol (Invitrogen), according to the manufacturer's instructions.
RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. RNA integrity was assessed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics Chromium read cloud sequencing libraries were constructed according to the manufacturers' instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL II (HiFi), Illumina HiSeq X (10X) and Illumina HiSeq 4000 (RNA-Seq) instruments. Hi-C data were generated from head tissue using the Qiagen EpiTect Hi-C kit and sequenced on HiSeq X.

Genome assembly
Assembly was carried out with HiCanu (Nurk et al., 2020). Haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align and calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext. The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.