The genome sequence of the European water vole, Arvicola amphibius Linnaeus 1758

We present a genome assembly from an individual male Arvicola amphibius (the European water vole; Chordata; Mammalia; Rodentia; Cricetidae). The genome sequence is 2.30 gigabases in span. The majority of the assembly is scaffolded into 18 chromosomal pseudomolecules, including the X sex chromosome. Gene annotation of this assembly on Ensembl has identified 21,394 protein-coding genes.


Introduction
The European water vole, Arvicola amphibius Linnaeus 1758, is a small semi-aquatic mammal that lives on the banks of freshwater watercourses and in wetlands. A. amphibius is native to Europe, west Asia, Russia and Kazakhstan. While the IUCN Red List of Threatened Species reports that A. amphibius is of "least concern" worldwide, populations in the United Kingdom have declined to such an extent that the species is considered nationally endangered (Mathews & Harrower, 2020) owing to habitat loss and predation by the American mink, Neovison vison, an invasive alien species. An estimate by Natural England put the 2018 UK population of A. amphibius at 132,000, down from 7.3 million in 1990 (Strachan, 2004). Water voles are absent from Ireland. There have been a number of conservation projects in the UK aimed at supporting populations of A. amphibius, including efforts to restore habitat and to control the population of American mink (Bryce et al., 2011). There are also efforts to reintroduce the water vole into a number of restored urban and wild habitats. This genome sequence will be of use as a reference for researchers who wish to assess the population genomics of A. amphibius and manage reintroductions.

Genome sequence report
The genome was sequenced from a single male A. amphibius collected from the Wildwood Trust, Herne Common, Kent, UK. A total of 45-fold coverage in Pacific Biosciences single-molecule long reads (N50 20 kb) and 52-fold coverage in 10X Genomics read clouds (from molecules with an estimated N50 of 155 kb) were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. The final assembly has a total length of 2.298 Gb in 216 sequence scaffolds with a scaffold N50 of 138.7 Mb (Table 1). The majority, 99.4%, of the assembly sequence was assigned to 18 chromosomal-level scaffolds, representing 17 autosomes (numbered by sequence length, apart from chromosome 12, which is larger because the previous version of the assembly, mArvAmp1.1, mistakenly labelled it as two separate chromosomes) and the X sex chromosome (Figure 1-Figure 4; Table 2). The assembly has a BUSCO (Simão et al., 2015) v5.0.0 completeness of 96.1% using the mammalia_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
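The N50 values quoted above (read, molecule and scaffold N50) all follow the same definition: the length L such that sequences of length L or longer contain at least half of the total bases. The following is a minimal illustrative sketch of that calculation, not the software used to produce Table 1; the function name `n50` and the toy lengths are assumptions made for the example.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of all bases.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("n50() requires at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


# Toy example (hypothetical scaffold lengths, not from the assembly):
# total = 30, and 10 + 6 = 16 is the first running sum to reach 15.
toy_scaffolds = [10, 6, 5, 4, 3, 2]
toy_n50 = n50(toy_scaffolds)
```

Applied to the lengths of the 216 scaffolds, the same calculation would be expected to give the reported scaffold N50 of 138.7 Mb.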

Gene annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for an earlier version of the Arvicola amphibius assembly (GCA_903992535.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome [...]

[...] Illumina data by aligning to the assembly with longranger align, calling variants with freebayes (Garrison & Marth, 2012) and applying homozygous non-reference edits using bcftools consensus. Two rounds of the Illumina polishing were applied. The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation was performed using evidence from Bionano (using the Bionano Access viewer), using HiGlass and Pretext, and by taking marker data and inspecting 10X barcode overlap using longranger. Figure 1-Figure 3 were generated using BlobToolKit (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.
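In the polishing step described above, only homozygous non-reference calls are applied to the assembly. The sketch below illustrates that selection criterion on plain VCF text. It is not the freebayes or bcftools workflow itself, and the function names, the single-sample assumption and the toy records are all hypothetical.

```python
def hom_alt_allele(gt):
    """Return the ALT allele index if the genotype is homozygous
    non-reference (e.g. '1/1' or '2|2'), otherwise None."""
    alleles = gt.replace("|", "/").split("/")
    if len(set(alleles)) != 1:
        return None  # heterozygous, e.g. 0/1 or 1/2
    if alleles[0] in ("0", "."):
        return None  # homozygous reference or missing call
    return int(alleles[0])


def hom_alt_records(vcf_lines):
    """Yield (chrom, pos, ref, alt) for homozygous non-reference calls.

    Assumes a single-sample VCF whose FORMAT column contains GT.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        gt = sample_values[format_keys.index("GT")]
        allele = hom_alt_allele(gt)
        if allele is not None:
            # ALT may list several alleles; pick the one actually called.
            alt = fields[4].split(",")[allele - 1]
            yield fields[0], int(fields[1]), fields[3], alt


# Toy records (hypothetical, not from the water vole data).
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample",
    "chr1\t100\t.\tA\tG\t50\t.\t.\tGT\t1/1",
    "chr1\t200\t.\tC\tT\t50\t.\t.\tGT\t0/1",
    "chr1\t300\t.\tG\tA,C\t50\t.\t.\tGT:DP\t2|2:30",
]
edits = list(hom_alt_records(toy_vcf))
```

Heterozygous calls are skipped because they may reflect real differences between the two haplotypes rather than errors in the consensus sequence.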

Institute of Evolutionary Biology (CSIC-UPF), Barcelona, Spain
This manuscript presents the annotated genome sequence of the European water vole (Arvicola amphibius), a small semi-aquatic rodent distributed across Europe and Asia. Water voles of the genus Arvicola have a complex evolutionary history, with fossorial and semi-aquatic ecological types (ecotypes), so this genome sequence could be very useful for studying ecological adaptations in rodents.
The report is well structured and clearly defined. However, there are some parts in the introduction that need to be better clarified.
The controversial taxonomic status of this genus, specifically the relationship between A. amphibius and its sister species A. scherman, and the complex genetic structure found in Great Britain are not properly addressed in the introduction. In addition, water voles of the genus Arvicola have broad ecological variability that should be better explained. Based on this, the species and the ecotype analyzed should be identified. The postglacial colonization events of water voles in the United Kingdom might also be explained in more detail. All of these aspects will facilitate future applications of the sequenced specimen for the conservation of the species.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes

[...] expertise to confirm that it is of an acceptable scientific standard.