The genome sequence of the European peacock butterfly, Aglais io (Linnaeus, 1758)

We present a genome assembly from an individual male Aglais io (also known as Inachis io and Nymphalis io) (the European peacock; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 384 megabases in span. The majority (99.91%) of the assembly is scaffolded into 31 chromosomal pseudomolecules, with the Z sex chromosome assembled. Gene annotation of this assembly on Ensembl has identified 11,420 protein-coding genes.


Introduction
The European peacock (Aglais io; synonyms include Inachis io and Nymphalis io) is a Palearctic butterfly species. A. io is easily recognised by the large and colourful eyespots on its wings, which act as a defence against avian predators (Blest, 1957; Vallin et al., 2005). It is distributed from temperate Europe to Japan, with larvae feeding on nettles and hops (Urtica dioica, Urtica urens, and Humulus lupulus). It was introduced to Canada at the end of the 20th century. It overwinters as an adult and is generally considered univoltine in the British Isles, although in the south it may display a partial second generation. In southern Europe it has two generations per year, and occasionally a partial third. It is found throughout the British Isles, although it is rare in the Outer Hebrides, and has increased in both abundance and occurrence over the last 50 years (Fox et al., 2015). This species is listed as Least Concern in the IUCN Red List (Europe) (van Swaay et al., 2010). A. io has 31 pairs of chromosomes (Maeki & Makino, 1953; Maeki, 1953) and the female is heterogametic (WZ). Male genome size has been estimated at approximately 364 Mb using flow cytometry (Mackintosh et al., 2019).

Genome sequence report
The genome was sequenced from a single male A. io (Figure 1) collected from East Linton, East Lothian, Scotland, UK (latitude 55.977161, longitude -2.667545). A total of 64-fold coverage in Pacific Biosciences single-molecule long reads and 77-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 13 missing joins or misjoins and removed three haplotypic duplications, reducing the assembly length by 0.02% and the scaffold number by 16.33%, and increasing the scaffold N50 by 3.06%.
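As a rough illustration of what the fold-coverage figures imply, the sketch below converts coverage into an approximate sequenced-base total. It assumes that coverage was calculated against the 384 Mb assembly span, which this note does not state. The function name and constant are hypothetical and are used only for this example.

```python
def bases_for_coverage(coverage_fold, genome_size_bp):
    """Approximate total sequenced bases implied by a fold-coverage value."""
    return coverage_fold * genome_size_bp


# Assumption: the reported assembly span (384 Mb) stands in for the genome
# size used when the coverage figures were calculated.
ASSEMBLY_SPAN_BP = 384_000_000

pacbio_bases = bases_for_coverage(64, ASSEMBLY_SPAN_BP)  # PacBio long reads
tenx_bases = bases_for_coverage(77, ASSEMBLY_SPAN_BP)    # 10X read clouds

print(f"PacBio: ~{pacbio_bases / 1e9:.1f} Gb")
print(f"10X:    ~{tenx_bases / 1e9:.1f} Gb")
```

If the flow cytometry estimate of approximately 364 Mb were used as the genome size instead, the implied totals would be slightly lower.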
The final assembly has a total length of 384 Mb in 42 sequence scaffolds with a scaffold N50 of 13 Mb (Table 1). The majority, 99.91%, of the assembly sequence was assigned to 31 chromosomal-level scaffolds, representing 30 autosomes (numbered by sequence length) and the Z sex chromosome (Figure 2–Figure 5; Table 2). The assembly has a BUSCO (Simão et al., 2015) v5.1.2 completeness of 98.8% using the lepidoptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
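For readers unfamiliar with the scaffold N50 statistic quoted above, the following minimal sketch shows how it is conventionally computed: scaffolds are sorted from longest to shortest, and N50 is the length of the scaffold at which the running total first reaches half of the assembly length. The function name and the toy lengths are illustrative assumptions; they are not the A. io scaffold lengths and this is not the pipeline used for the assembly.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that takes the running total to at least
        # half of the assembly length defines the N50.
        if running >= half_total:
            return length


# Toy example with made-up lengths (total 150, half 75).
toy_lengths = [50, 40, 30, 20, 10]
print(scaffold_n50(toy_lengths))  # 40
```

A larger N50 means that more of the assembly sits in long scaffolds, which is why manual curation that repairs missing joins tends to increase it.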

Genome annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the A. io assembly (GCA_905147045.1; Table 1). The annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019) and OrthoDB (Kriventseva et al., 2008). The prediction tools CPC2 (Kang et al., 2017) and RNAsamba (Camargo et al., 2020) were used to aid determination of protein-coding genes.

Methods
The male A. io specimen SC_AI_1368 was collected from East Linton, East Lothian, Scotland, UK (latitude 55.977161, longitude -2.667545) by Konrad Lohse, University of Edinburgh, using a net. The specimen was snap-frozen in liquid nitrogen. Table 3 contains a list of all software tool versions used, where appropriate.
The materials that have contributed to this genome note were supplied by a Tree of Life collaborator. The Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible.
The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material;
• Legality of collection, transfer and use (national and international).
Each transfer of samples is undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Tree of Life collaborator, Genome Research Limited (operating as the Wellcome Sanger Institute) and in some circumstances other Tree of Life collaborators.

The genome sequence is released openly for reuse. The A. io genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. Raw data and assembly accession identifiers are reported in Table 1.

The article is well written and clearly describes how the authors constructed a high-quality reference genome of the peacock butterfly using various sequencing techniques. The genome was assembled into chromosomes and annotated for sharing among the scientific community. The only minor question I have is why a female, which carries the W chromosome, was not sequenced. I also highly recommend sequencing females instead of males in future Darwin Tree of Life projects for Lepidoptera if possible. Sequencing females may lead to a high-quality assembly of the W chromosome, an important sex chromosome that is rarely included in current Lepidoptera genome assemblies because of its highly repetitive content. The Darwin Tree of Life project's capacity to integrate multiple sequencing techniques gives it a good chance of assembling the W chromosome and making a big difference to sex chromosome research in Lepidoptera.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes

This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Canadian Food Inspection Agency, Ottawa, Canada
This note presents a near-chromosome-level genome assembly for the butterfly Aglais io using long-read, read cloud, and chromosome conformation sequencing techniques, as well as annotation using RNA-seq data. The data production and assembly steps were performed according to current best practices, and the authors have done their due diligence in reporting data processing and assembly statistics, making this a valuable resource for genomic studies of A. io and related species.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes