The genome sequence of the harlequin ladybird, Harmonia axyridis (Pallas, 1773) [version 1; peer review: awaiting peer review]

We present a genome assembly from an individual female Harmonia axyridis (the harlequin ladybird; Arthropoda; Insecta; Coleoptera; Coccinellidae). The genome sequence is 426 megabases in span. The majority (99.98%) of the assembly is scaffolded into 8 chromosomal pseudomolecules, with the X sex chromosome assembled.


Background
The harlequin ladybird, Harmonia axyridis, is a large (5-8 mm), voracious ladybird species widely considered to be one of the world's most invasive insects. Its native range is central and eastern Asia, but it was introduced to North America and Europe as a biocontrol agent. It has spread rapidly and is now established across North, Central and South America, Europe and Africa. Examination of microsatellites has demonstrated that an invasive population in eastern North America acted as the source of those which invaded Europe, South America and South Africa (Lombaert et al., 2010). The harlequin ladybird was first recorded in the UK in 2003 in southeastern England. Since its arrival it has spread rapidly and is now widespread across the UK, and has been recorded on Ireland, Orkney, Shetland, the Channel Islands, the Isles of Scilly and the Isle of Man. It is a highly polymorphic species with several recognised forms. The elytra may be yellow, orange, red or black, with 0-21 black spots or 4 or 2 red/orange spots. The legs are always brown and the underside is dark with a reddish/brown border. The harlequin ladybird is a generalist, feeding on aphids as well as soft fruit, pollen, nectar and many other soft-bodied insects, including other ladybird larvae. It overwinters as an adult and is often found in buildings where aggregations of adults form. The haemolymph of this species contains high concentrations of isopropyl methoxy pyrazine (Al Abassi et al., 1998) and harmonine (Nagel et al., 2015), and it readily autohaemorrhages when agitated. The defensive secretions have a foul odour and can cause staining. It is also known to bite humans (Ramsey & Losey, 2012), leading to this species' consideration as a minor household pest. The spread of the harlequin ladybird is associated with dramatic declines in other, native ladybird species. This is believed to be driven by Harmonia axyridis outcompeting other aphidophagous species as well as intraguild predation (Majerus et al., 2006).

Genome sequence report
The genome was sequenced from one female H. axyridis collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.772, longitude -1.338) (Figure 1). A total of 53-fold coverage in Pacific Biosciences single-molecule long reads and 93-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 158 missing joins or misjoins, reducing the assembly length by 1.32% and the scaffold number by 92.49%, and increasing the scaffold N50 by 56.15%.
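The coverage and curation figures in this paragraph are simple ratios. As a minimal sketch, assuming fold coverage is taken as total sequenced bases divided by assembly span and that each curation change is a relative difference between pre- and post-curation values, the arithmetic could look as follows. The function names and all input numbers are hypothetical illustrations, not values from this assembly.

```python
def fold_coverage(total_read_bases: int, genome_span: int) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome span."""
    if genome_span <= 0:
        raise ValueError("genome_span must be positive")
    return total_read_bases / genome_span


def percent_change(before: float, after: float) -> float:
    """Signed relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Hypothetical toy inputs, chosen only to make the arithmetic easy to follow.
    print(fold_coverage(5_000_000_000, 100_000_000))  # 5 Gb of reads over a 100 Mb span
    print(round(percent_change(400, 30), 2))          # scaffold count dropping from 400 to 30
```

A negative result from `percent_change` corresponds to a reduction (as for assembly length and scaffold number), and a positive result to an increase (as for scaffold N50).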
The final assembly has a total length of 249 Mb in 39 sequence scaffolds with a scaffold N50 of 37.2 Mb (Table 1). The majority, 99.96%, of the assembly sequence was assigned to 10 chromosomal-level scaffolds, representing 8 autosomes (numbered by sequence length), and the X and Y sex chromosomes (Figure 2-Figure 5; Table 2). Some scaffolds remain unplaced because their repetitive content gives an ambiguous Hi-C signal. A large cluster of rDNA sequences was placed on the X chromosome using Hi-C data only. The assembly has a BUSCO v5.  instructions. Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions. Sequencing was performed by the Scientific Operations core on Pacific Biosciences SEQUEL II and Illumina HiSeq X instruments. Hi-C data were generated from the whole organism using the Arima v2 Hi-C kit and sequenced on a HiSeq X instrument.
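For readers unfamiliar with the summary statistics used in the genome sequence report, the following is a minimal sketch of how a scaffold N50 and the percentage of sequence assigned to the largest scaffolds can be computed from a list of scaffold lengths. It assumes the conventional definition of N50 (the length of the scaffold at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total). The function names and the toy lengths are hypothetical and are not taken from this assembly.

```python
from typing import Sequence


def scaffold_n50(lengths: Sequence[int]) -> int:
    """Length L such that scaffolds of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids floating-point halves
            return length
    raise AssertionError("unreachable for non-empty input")


def percent_in_largest(lengths: Sequence[int], n: int) -> float:
    """Percentage of total bases contained in the n longest scaffolds."""
    ordered = sorted(lengths, reverse=True)
    return sum(ordered[:n]) / sum(ordered) * 100.0


if __name__ == "__main__":
    toy_lengths = [50, 40, 30, 20, 10]  # hypothetical scaffold lengths (arbitrary units)
    print(scaffold_n50(toy_lengths))           # 40
    print(percent_in_largest(toy_lengths, 3))  # 80.0
```

In the toy example the total is 150, so the cumulative sum first reaches half of it (75) at the second-longest scaffold, giving an N50 of 40; the three longest scaffolds hold 120 of 150 units.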

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align, calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation (Howe et al., 2021) was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.

Ethics/compliance issues
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the Darwin Tree of Life Project Sampling Code of Practice. By agreeing with and signing up to the Sampling Code of