The genome sequence of the small elephant hawk moth, Deilephila porcellus (Linnaeus, 1758)

We present a genome assembly from an individual male Deilephila porcellus (the small elephant hawk moth; Arthropoda; Insecta; Lepidoptera; Sphingidae). The genome sequence is 402 megabases in span. The majority of the assembly (99.99%) is scaffolded into 29 chromosomal pseudomolecules, with the Z sex chromosome assembled.


Background
Deilephila porcellus (small elephant hawk moth) is characterised by striking pink and sand markings and is distributed across Europe, reaching as far east as China. Often confused with Deilephila elpenor (elephant hawk moth), D. porcellus can be identified most easily by its slightly smaller wingspan (40-45 mm), brighter colouration and the lack of the longitudinal pink abdominal stripe typical of D. elpenor.
Deilephila porcellus is widespread throughout Britain, of rather local distribution in Southern England and Wales and scarce in Scotland and Northern England. This species flies from May to July and can be found in a range of open habitats including grassland, heathland, sand dunes and shingle beaches (Waring et al., 2017). Adults are generalists, feeding nocturnally on the nectar of numerous flowering plants, including rhododendron and honeysuckle. Orchids are frequently visited for nectar; the relative frequency of different hawk moth pollinators, with their differing proboscis lengths, has been shown to select for different spur lengths in the lesser butterfly orchid (Platanthera bifolia). In open areas in Sweden, the relatively short-tongued D. porcellus is the most frequent pollinator and the orchid's spurs are correspondingly short compared with those of woodland populations, which are mainly pollinated by the long-tongued Sphinx ligustri (Boberg et al., 2014). Caterpillars, which primarily feed on bedstraws (Galium), emerge from June to September and vary in colouration from brown to grey-green, with large eyespots situated towards the anterior end. Functionally, eyespots and behaviour act to deter avian predation; when threatened, larvae widen the anterior segments of the body, adopting defensive postures thought to mimic snakes and thus reducing the incidence of attacks (Hossie & Sherratt, 2013; Poulton, 1890). The full lifecycle takes one year to complete, with pupae over-wintering in cocoons beneath larval food plants or just below the surface of the leaf litter.
Here we present a genome sequence for D. porcellus, generated as part of the Darwin Tree of Life Project.

Genome sequence report
The genome was sequenced from a single male D. porcellus (Figure 1) collected from Wytham Woods, Oxfordshire, UK (latitude 51.772, longitude -1.337). A total of 40-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 92-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 4 missing joins or misjoins and removed 1 haplotypic duplication, reducing the scaffold number by 9.09%.
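The coverage and curation figures above can be checked with simple arithmetic. The sketch below is illustrative only: it assumes coverage is expressed relative to the 402 Mb assembly span (it may have been computed against an estimated genome size instead), and the pre-curation count of 33 scaffolds is an inference from the reported 9.09% reduction and the final count of 30 scaffolds, not a number stated in the text. The function names are hypothetical.

```python
# Illustrative arithmetic only; names and the inferred scaffold count are assumptions.

GENOME_SPAN_BP = 402_000_000  # assembly span reported in the text (402 Mb)


def bases_for_coverage(genome_size_bp: int, fold: float) -> float:
    """Approximate number of sequenced bases implied by a fold-coverage figure."""
    return genome_size_bp * fold


def percent_reduction(before: int, after: int) -> float:
    """Percentage drop in scaffold count from `before` to `after`."""
    return 100.0 * (before - after) / before


hifi_bases = bases_for_coverage(GENOME_SPAN_BP, 40)  # PacBio HiFi, 40-fold
tenx_bases = bases_for_coverage(GENOME_SPAN_BP, 92)  # 10X read clouds, 92-fold

# ASSUMPTION: 33 scaffolds before curation, inferred from 30 final scaffolds
# and the reported 9.09% reduction; the source does not state this number.
reduction = percent_reduction(before=33, after=30)

print(f"HiFi bases: ~{hifi_bases / 1e9:.1f} Gb")
print(f"10X bases:  ~{tenx_bases / 1e9:.1f} Gb")
print(f"Scaffold reduction: {reduction:.2f}%")
```

Under these assumptions, a drop from 33 to 30 scaffolds is consistent with the reported 9.09%.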
The final assembly has a total length of 402 Mb in 30 sequence scaffolds with a scaffold N50 of 15.1 Mb (Table 1). Of the assembly sequence, 99.99% was assigned to 29 chromosome-level scaffolds, representing 28 autosomes (numbered by sequence length) and the Z sex chromosome (Figure 2-Figure 5; Table 2). The assembly has a BUSCO (Simão et al., 2015) completeness of 98.8% (single 98.5%, duplicated 0.2%) using the lepidoptera_odb10 reference set (n=5286). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
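For readers unfamiliar with the scaffold N50 statistic quoted above, the following minimal sketch shows how it is conventionally computed: scaffolds are sorted from longest to shortest, and N50 is the length of the scaffold at which the running total first reaches half of the assembly length. This is a generic illustration, not the tool used to produce Table 1; the function name `n50` and the example lengths are hypothetical.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() requires at least one length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare doubled running total to avoid fractional halves.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: running total always reaches total")


# Hypothetical toy assembly (lengths in bp), not data from this study.
example_lengths = [10, 8, 6, 4, 2]
print(n50(example_lengths))
```

Applied to the 30 scaffold lengths of the real assembly, the same calculation would be expected to give the 15.1 Mb reported in the text.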

Sample acquisition and nucleic acid extraction
A male D. porcellus (ilDeiPorc1) was collected from Wytham Woods, Oxfordshire, UK (latitude 51.772, longitude -1.337) by Douglas Boyes, University of Oxford, using a light trap. The specimen was identified by the collector and snap-frozen on dry ice.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute. The ilDeiPorc1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Abdomen tissue was cryogenically disrupted to a fine powder using a Covaris cryoPREP Automated Dry Pulveriser, receiving multiple impacts. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 200-ng aliquot of extracted DNA using a 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared to an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible [...] Illumina HiSeq X (10X) instruments. Hi-C data were generated from head/thorax tissue of ilDeiPorc1 using the Arima v2 kit and sequenced on HiSeq X.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021). Haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing [...] Table 3 contains a list of all software tool versions used, where appropriate.