The genome sequence of the heath fritillary, Melitaea athalia (Rottemburg, 1775)

We present a genome assembly from an individual female Melitaea athalia (also known as Mellicta athalia; the heath fritillary; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 610 megabases in span. In total, 99.98% of the assembly is scaffolded into 32 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled. Gene annotation of this assembly on Ensembl has identified 12,824 protein-coding genes.


Introduction
The heath fritillary, Melitaea athalia (also known as Mellicta athalia), is a small to medium-sized butterfly found throughout the Palaearctic from western Europe to Japan. Historically, the species has been linked with the traditional practice of woodland coppicing, earning it the nickname 'Woodman's Follower'. M. athalia is one of the UK's rarest butterflies and was on the brink of extinction during the 1970s, but conservation efforts have since helped to save the species (Warren, 1987). In the UK, M. athalia is restricted to grasslands in Cornwall and Devon, heathland in Exmoor, and coppiced woodland in Kent and Essex (Tomlinson & Still, 2002), and is a species of principal importance under the Natural Environment and Rural Communities Act 2006. However, it is listed as Least Concern in the IUCN Red List (Europe) (van Swaay et al., 2010). Up to eight forms and subspecies are recognized in Europe (Tolman & Lewington, 1997). The taxon celadussa Fruhstorfer, 1910, originally described as a subspecies of athalia from southwestern Europe, is now recognized by many authors as a distinct parapatric species, with a contact zone extending from France to Austria where hybrids are found (Wiemers et al., 2018). Fennoscandian and southern European alpine subspecies are univoltine, flying in a single brood (June-July), whilst subalpine subspecies are bivoltine and fly during May-June and late July-August (Tolman & Lewington, 1997). Females of M. athalia lay eggs in batches on the underside of leaves of a wide range of herbaceous food plants, with caterpillars feeding, aestivating, and hibernating together in silk nests (Wahlberg, 2000). The standard haploid karyotype of M. athalia consists of 30 autosomes and one sex chromosome (Bátori et al., 2012), and the female is heterogametic (WZ).

Genome sequence report
The genome was sequenced from a single female M. athalia collected from Lupşa, Transylvania, Romania (latitude 46.416, longitude 23.192) (Figure 1). A total of 30-fold coverage in Pacific Biosciences single-molecule long reads (N50 16 kb) and 64-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 82 missing joins or misjoins and removed 19 haplotypic duplications, reducing the assembly size by 1.94% and the scaffold number by 45.12%, and increasing the scaffold N50 by 7.20%.
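The curation figures above are relative changes between the pre- and post-curation assemblies. As a minimal sketch of how such percentages can be derived, the Python helper below computes a signed percentage change. The function name and the before/after values in the usage lines are hypothetical illustrations; they are not the pre-curation statistics of this assembly, which are not reported here.

```python
def percent_change(before: float, after: float) -> float:
    """Signed relative change from `before` to `after`, in percent.

    Negative values are reductions (e.g. fewer scaffolds after curation);
    positive values are increases (e.g. a larger scaffold N50).
    """
    if before == 0:
        raise ValueError("`before` must be non-zero")
    return (after - before) / before * 100.0


# Hypothetical values for illustration only (not taken from this assembly).
example_scaffolds_before = 100
example_scaffolds_after = 60
change = percent_change(example_scaffolds_before, example_scaffolds_after)
print(f"Scaffold count change: {change:.2f}%")
```

A reduction is reported as a negative number, so a value such as "reduced by 45.12%" would correspond to a change of -45.12 from this helper.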
The final assembly has a total length of 610 Mb in 46 sequence scaffolds with a scaffold N50 of 20 Mb (Table 1). Of the assembly sequence, 99.98% was assigned to 32 chromosomal-level scaffolds, representing 30 autosomes (numbered by sequence length) and the W and Z sex chromosomes (Figure 2-Figure 5; Table 2). The assembly has a BUSCO (Simão et al., 2015) completeness of 98.6% (single 97.9%, duplicated 0.7%, fragmented 0.4%, missing 1.0%) using the lepidoptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
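For readers unfamiliar with the summary statistics above, the following Python sketch shows how a scaffold N50 is conventionally computed and checks that the reported BUSCO categories are internally consistent. The function name `n50` and the toy scaffold lengths are assumptions made for illustration; only the BUSCO percentages are taken from the text, and this is not the software used to produce Table 1.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    The N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly span.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy scaffold lengths (hypothetical, not from this assembly).
toy_lengths = [50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)  # total 150, half 75; 50 + 40 = 90 >= 75

# BUSCO percentages as reported in the text.
single, duplicated, fragmented, missing = 97.9, 0.7, 0.4, 1.0
complete = round(single + duplicated, 1)          # complete = single + duplicated
total = round(complete + fragmented + missing, 1)  # all categories together

print(f"toy N50 = {toy_n50}")
print(f"BUSCO complete = {complete}%, all categories = {total}%")
```

The complete score is the sum of the single-copy and duplicated categories, and the four categories together account for the whole reference set.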

Gene annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the Melitaea athalia assembly (GCA_905220545.1, see https://rapid.ensembl.org/Mellicta_athalia_GCA_905220545.1/; Table 1). The annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019) and OrthoDB (Kriventseva et al., 2008). Prediction tools, CPC2 (Kang et al., 2017) and RNAsamba (Camargo et al., 2020), were used to aid determination of protein-coding genes.
DNA was extracted from the whole organism of ilMelAtha1 using the Qiagen MagAttract HMW DNA kit.
Table 3 contains a list of all software tool versions used, where appropriate.

Sample acquisition, nucleic acid extraction and sequencing
Ethical/compliance issues
The materials that have contributed to this genome note were supplied by a Tree of Life collaborator. The Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible.
The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material;
• Legality of collection, transfer and use (national and international).
The genome sequence is released openly for reuse. The M. athalia genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. Raw data and assembly accession identifiers are reported in Table 1.