The genome sequence of the scotch argus butterfly, Erebia aethiops (Esper, 1777)

We present a genome assembly from an individual female Erebia aethiops (the scotch argus; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 473 megabases in span. The complete assembly is scaffolded into 20 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled. The complete mitochondrial genome was also assembled and is 15.2 kilobases in length.


Background
The Scotch argus, Erebia aethiops (Esper, 1777), has a wide distribution in the Palaearctic, from Scotland to western Siberia and the Altai Mountains (Wendt et al., 2021). Unlike most other Erebia species, E. aethiops occurs in both the lowland and montane zones. The species was first described from Scotland as the subspecies E. aethiops caledonia (Newland, 2012), although this name now applies only to populations in the west and southwest of Scotland (Newland, 2012; Thomson, 1980). Populations in the north and southeast of Scotland belong to the nominate subspecies E. aethiops aethiops (Thomas & Lewington, 2016). While the two subspecies differ in their larval foodplant preference and wing morphology, with caledonia having narrower forewings and a narrower orange band, their taxonomic status remains disputed (Kirkland, 1995).
In general, E. aethiops prefers meadows near forested areas and open woodlands (Loertscher, 1991; Slamova et al., 2011; Wendt et al., 2021), with evidence for sex-specific preferences in meso- and microhabitat use (Slamova et al., 2011; Slamova et al., 2013). E. aethiops is univoltine, with hibernating larvae and an adult flight period from mid-July to mid-August. Larvae feed on a wide range of grasses, including Bromus erectus, Brachypodium pinnatum and, in the UK, Molinia caerulea and Sesleria caerulea (Slamova et al., 2013; Thomas & Lewington, 2016). The species may be vulnerable to anthropogenic habitat fragmentation (Slamova et al., 2013; Wendt et al., 2021). Although UK populations have declined and shifted their range northward over recent decades (Franco et al., 2006), and E. aethiops is now listed as Vulnerable on the UK Red List (Fox et al., 2022), it is listed as a species of Least Concern on the IUCN Red List of Europe (van Swaay et al., 2010). The karyotype of E. aethiops was first described as consisting of 21 chromosomes based on a single individual from Croatia (Lorković, 1941). Although we do not know whether this chromosome count included a W, it is inconsistent with the 20 chromosomal scaffolds of this assembly (Table 2).

Genome sequence report
The genome was sequenced from a single female E. aethiops (Figure 1) collected from Carrifran Wildwood, Scotland (latitude 55.4001, longitude -3.3352). A total of 35-fold coverage in Pacific Biosciences single-molecule circular consensus (HiFi) long reads and 61-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 30 missing joins or misjoins and removed two haplotypic duplications, reducing the assembly size by 0.04% and the scaffold number by 23.94%, and increasing the scaffold N50 by 8.42%.
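The curation percentages and coverage figures can be cross-checked with simple arithmetic. The sketch below is not part of the original analysis. It assumes that the percentages are expressed relative to the pre-curation values and that fold coverage is measured against the 473 Mb assembly span. The helper names `percent_change` and `value_before` are hypothetical; the inputs (54 scaffolds, scaffold N50 of 25.9 Mb, 35-fold and 61-fold coverage) are the values reported in this section.

```python
def percent_change(before: float, after: float) -> float:
    """Signed percentage change going from `before` to `after`."""
    return (after - before) / before * 100.0


def value_before(after: float, change_pct: float) -> float:
    """Invert percent_change: recover the starting value from the final one."""
    return after / (1.0 + change_pct / 100.0)


# Scaffold count: 54 scaffolds remain after a 23.94% reduction.
scaffolds_before = round(value_before(54, -23.94))
print(scaffolds_before)                                # 71
print(round(percent_change(scaffolds_before, 54), 2))  # -23.94

# Scaffold N50: 25.9 Mb after an 8.42% increase (estimate of the starting value).
n50_before_mb = value_before(25.9, 8.42)
print(round(n50_before_mb, 1))                         # 23.9

# Rough sequence yield implied by fold coverage.
# Assumption: coverage is relative to the 473 Mb assembly span.
ASSEMBLY_SPAN_MB = 473
hifi_yield_gb = 35 * ASSEMBLY_SPAN_MB / 1000
tenx_yield_gb = 61 * ASSEMBLY_SPAN_MB / 1000
print(round(hifi_yield_gb, 1), round(tenx_yield_gb, 1))  # 16.6 28.9
```

Under these assumptions, a reduction from 71 to 54 scaffolds reproduces the reported 23.94%. The pre-curation N50 and the yields are estimates only, since the source reports rounded values.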
The final assembly has a total length of 473 Mb in 54 sequence scaffolds with a scaffold N50 of 25.9 Mb (Table 1). The complete assembly sequence was assigned to 20 chromosomal-level scaffolds, representing 18 autosomes (numbered by sequence length), and the W and Z sex chromosomes (Figures 2-5; Table 2). The assembly has a BUSCO v5.1.2 (Manni et al., 2021) completeness of 98.5% (single 97.8%, duplicated 0.7%) using the lepidoptera_odb10 reference set (n=5,286). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.

[Table fragment] Project accession data: assembly identifier ilEreAeth2.1. Genome assembly: assembly accession GCA_923060345.
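For readers unfamiliar with the summary statistics quoted in this section, the following minimal sketch shows how a scaffold N50 is conventionally computed and how the BUSCO completeness relates to its single-copy and duplicated components. It is an illustration, not the code used for this assembly. The function name `scaffold_n50` and the toy scaffold lengths are hypothetical; only the BUSCO percentages come from the text.

```python
def scaffold_n50(lengths):
    """Return the N50: the largest length L such that scaffolds of
    length >= L together cover at least half of the total span."""
    if not lengths:
        raise ValueError("no scaffold lengths supplied")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (arbitrary units), not the real scaffold lengths.
toy_lengths = [40, 30, 20, 5, 5]
print(scaffold_n50(toy_lengths))  # 30

# BUSCO completeness is the sum of single-copy and duplicated complete genes.
single_pct, duplicated_pct = 97.8, 0.7
complete_pct = round(single_pct + duplicated_pct, 1)
print(complete_pct)  # 98.5
```

In the toy example the two longest scaffolds (40 + 30) already cover more than half of the total span of 100, so the N50 is 30.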

Sample acquisition and nucleic acid extraction
Two female E. aethiops specimens (ilEreAeth2, genome assembly; ilEreAeth1, additional HiFi and 10X reads) and a male (ilEreAeth3, Hi-C) were collected by net from Carrifran Wildwood, Scotland (latitude 55.4001, longitude -3.3352), by Oskar and Konrad Lohse, who also identified the samples. Specimens were snap-frozen at -80°C.
DNA was extracted in the Tree of Life Laboratory at the Wellcome Sanger Institute. Whole organism tissue of ilEreAeth2 and ilEreAeth1 was cryogenically disrupted to a fine powder using a Covaris cryoPREP Automated Dry Pulveriser, receiving multiple impacts. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 200-ng aliquot of extracted DNA using the 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared to an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was [...] Pacific Biosciences SEQUEL II, Illumina HiSeq X (ilEreAeth1, 10X) and Illumina NovaSeq 6000 (ilEreAeth2, 10X) instruments. Hi-C data were also generated from remaining whole organism tissue of ilEreAeth3 using the Arima v2 Hi-C kit and sequenced on an Illumina NovaSeq 6000 instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align and calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation (Howe et al., 2021) was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021), which performs annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.
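The order of operations described above can be summarised as a small data structure. This is only a descriptive sketch of the workflow as stated in the text: it does not invoke any of the tools, and it makes no claim about the command-line options or versions that were used (see Table 3 for versions). The names `Step`, `PIPELINE` and `stages_using` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    stage: str    # what the step does, as described in the text
    tools: tuple  # tool names as given in the text


# Steps listed in the order in which the text describes them.
PIPELINE = (
    Step("contig assembly", ("Hifiasm",)),
    Step("haplotypic duplication removal", ("purge_dups",)),
    Step("polishing (one round)", ("longranger align", "freebayes")),
    Step("Hi-C scaffolding", ("SALSA2",)),
    Step("contamination check and correction", ("gEVAL",)),
    Step("manual curation", ("gEVAL", "HiGlass", "Pretext")),
    Step("mitochondrial assembly and annotation", ("MitoHiFi", "MitoFinder")),
    Step("analysis and BUSCO scoring", ("BlobToolKit",)),
)


def stages_using(tool: str) -> list:
    """Return the stages, in pipeline order, that mention `tool`."""
    return [step.stage for step in PIPELINE if tool in step.tools]


for number, step in enumerate(PIPELINE, start=1):
    print(f"{number}. {step.stage}: {', '.join(step.tools)}")

print(stages_using("gEVAL"))
```

Keeping the steps in an ordered, immutable tuple makes the sequence explicit, for example that scaffolding with Hi-C data follows polishing, and that gEVAL appears in both the contamination check and manual curation.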