The genome sequence of the large tortoiseshell, Nymphalis polychloros (Linnaeus, 1758)

We present a genome assembly from an individual female Nymphalis polychloros (the large tortoiseshell; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 398 megabases in span. The majority of the assembly is scaffolded into 32 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled.


Introduction
The large tortoiseshell, also known as the black-legged tortoiseshell or elm nymphalid, is a widespread but rare butterfly of woodlands across continental Europe, North Africa and Central Asia. Once common in England and Wales, N. polychloros went extinct in southern Britain in the 1960s for unknown reasons and is currently classified as 'vulnerable' in several European countries (Maes et al., 2020). It is listed as Least Concern on the IUCN Red List (Europe) (van Swaay et al., 2010). However, sightings of a breeding colony in Dorset in 2021 suggest that this species is once again resident in the UK. In adult appearance it closely resembles both the small tortoiseshell, Aglais urticae, and the scarce tortoiseshell, N. xanthomelas. The species uses a wide variety of host plants, including Pyrus, Prunus, Salix, Ulmus and Crataegus. It is univoltine and overwinters as an adult. Lorković (1941) reported a karyotype of 31 chromosomes, and the genome size estimated for its relative, Aglais io, is 363.5 Mb (Mackintosh et al., 2019).

Genome sequence report
The genome was sequenced from a single female N. polychloros (Figure 1) to 36-fold coverage in Pacific Biosciences single-molecule long reads and 84-fold coverage in 10X Genomics read clouds. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected two missing joins or misjoins, reducing the scaffold number by 5.31%. The final assembly has a total length of 398 Mb in 38 sequence scaffolds with a scaffold N50 of 14 Mb (Table 1). Of the assembly sequence, 100% was assigned to 32 chromosomal-level scaffolds, representing 30 autosomes (numbered by sequence length) and the W and Z sex chromosomes (Figure 2-Figure 5; Table 2). The assembly has a BUSCO v5.1.2 (Simão et al., 2015) completeness of 98.8% using the lepidoptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
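For readers unfamiliar with the summary statistics quoted above, the following is a minimal sketch of how a scaffold N50 and an approximate read yield can be computed. It is not the pipeline used for this assembly; the function names and the toy scaffold lengths are hypothetical and chosen only for illustration.

```python
def scaffold_n50(lengths):
    """Return the N50 of a list of scaffold lengths (in bp).

    The N50 is the length L such that scaffolds of length >= L
    together span at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk scaffolds from longest to shortest, accumulating span.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def approx_read_bases(assembly_span_bp, fold_coverage):
    """Approximate total sequenced bases implied by a fold coverage."""
    return assembly_span_bp * fold_coverage


# Toy lengths (hypothetical, not taken from the N. polychloros assembly).
toy_lengths = [50, 40, 30, 20, 10]
toy_n50 = scaffold_n50(toy_lengths)

# Rough yield implied by 36-fold coverage of a 398 Mb assembly.
pacbio_bases = approx_read_bases(398_000_000, 36)
```

With the toy lengths, the two longest scaffolds (50 + 40 = 90) are the first to reach half of the total span of 150, so the N50 is 40. Applied to the figures in the text, 36-fold coverage of a 398 Mb assembly corresponds to roughly 14.3 Gb of long-read sequence.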

Methods
The female N. polychloros specimen SC_NP_345 was collected using a net from Somiedo, Brana de Mumian, Asturias, Spain. Table 3 contains a list of all software tool versions used, where appropriate.
The materials that have contributed to this genome note were supplied by a Tree of Life collaborator. The Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible.
The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material;
• Legality of collection, transfer and use (national and international).
Each transfer of samples is undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Tree of Life collaborator, Genome Research Limited (operating as the Wellcome Sanger Institute) and in some circumstances other Tree of Life collaborators. The genome sequence is released openly for reuse. The N. polychloros genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated using the RNA-Seq data and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.
Reviewer Expertise: the phylogeny and evolutionary history of some butterfly groups.

Acknowledgements
to provide details on its observed heterozygosity. Dealing with heterozygosity is a recurrent issue in genome assembly.
Similarly, it would be good to provide the chosen parameter values for all software packages (such as hifiasm, freebayes, etc.), perhaps as an additional column in Table 3. Those values are essential for reproducibility, but also very useful for people assembling similar genomes.
It would be interesting to provide details of the improvements achieved by the different steps (for instance the polishing step, by giving statistics before and after). Again, this would be useful for other users and generally for the assembly of similar genomes. This could take the form of a table.
The note presents the generation of RNA-Seq data, which is great, but the data is not (yet) used for annotation. I wondered why this is included in the methods if it is not actually analysed.
Status/justification: This species, the large tortoiseshell, is a fairly common though elusive butterfly in its predominantly continental European range. I understand that the Darwin Tree of Life effort is motivated by sequencing "British taxa" and that the status of this species in Britain may have influenced its position on the priority list. However, the aim of a reference genome probably goes beyond that. The abundance and conservation status of taxa vary greatly depending on how far from the range margin one stands! From a broader perspective, the large tortoiseshell is a forest species with a broad European distribution. Its ecology is relatively poorly known compared to that of closely related species. Interesting questions also remain regarding the origins and status of its genetic structure (vicariance, speciation?). Nymphalis as a genus also has unclear relationships with other genera such as Polygonia and Kaniska. Perhaps a reference genome could stimulate interesting research on those aspects, which could make a better "justification" for sequencing it than the recent sighting of a colony in Dorset, where the species is teetering on its range margin.
We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.