The genome sequence of the common pipistrelle, Pipistrellus pipistrellus Schreber 1774 [version 1; peer review: 2 approved with reservations]

We present a genome assembly from an individual female Pipistrellus pipistrellus (the common pipistrelle; Chordata; Mammalia; Chiroptera; Vespertilionidae). The genome sequence is 1.76 gigabases in span. The majority of the assembly is scaffolded into 21 chromosomal pseudomolecules, with the X sex chromosome assembled.


Introduction
The common pipistrelle, Pipistrellus pipistrellus, is a small species of bat with a range that extends across Europe and into Central Asia and North Africa (Boston et al., 2014). It is one of the most common species of bat in the United Kingdom and Ireland, where it is considered to be of least concern on the Mammal Society's Red List of extinction risk (Mathews & Harrower, 2020). Despite a decline in roost count, the population of common pipistrelles has increased since 1999 (Bat Conservation Trust, 2020). P. pipistrellus roosts in trees and buildings, emerging at dusk to feed on small flying insects, using laryngeal echolocation to orient and locate prey.
The cryptic species P. pipistrellus and Pipistrellus pygmaeus (the soprano pipistrelle) were originally thought to be a single species and were confirmed to be distinct only fairly recently (Barlow & Jones, 1999), although differences between the two had been noted previously (Jones & van Parijs, 1993). The two species appear morphologically similar, but differ in the peak frequency of their echolocation calls: ~45 kHz for P. pipistrellus and ~55 kHz for P. pygmaeus. They exhibit small but significant genetic differences, although hybridization between the two species has been observed (Sztencel-Jabłonka & Bogdanowicz, 2012).
This genome sequence will be of use to researchers who wish to examine in depth the genetic differences between P. pipistrellus and its cryptic partner P. pygmaeus that underlie the small but significant divergence between the species.

Genome sequence report
The genome was sequenced from a single female P. pipistrellus collected from Potten End, Berkhamsted, Hertfordshire, UK. A total of 56-fold coverage in Pacific Biosciences single-molecule long reads (N50 19 kb) and 34-fold coverage in 10X Genomics read clouds (from molecules with an estimated N50 of 31 kb) were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 408 missing/misjoins and removed 19 haplotypic duplications, reducing the scaffold number by 35.5%, increasing the scaffold N50 by 126.2% and decreasing the assembly length by 0.1%. The final assembly has a total length of 1.76 Gb in 323 sequence scaffolds with a scaffold N50 of 94.9 Mb (Table 1). The majority, 98.8%, of the assembly sequence was assigned to 22 chromosomal-level scaffolds representing 21 autosomes (numbered by sequence length), and the X sex chromosome (Figure 1-Figure 4; Table 2). The assembly has a BUSCO (Simão et al., 2015) completeness of 89.6% using the mammalia_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
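As a minimal sketch of how a statistic such as the scaffold N50 quoted above is defined, the following Python function computes N50 from a list of sequence lengths. The function name and the toy lengths are illustrative assumptions and are not taken from the assembly or from the tools used to produce Table 1.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths in Mb (hypothetical, not the real assembly).
toy_scaffolds = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_scaffolds)
print(toy_n50)  # 80: the two longest scaffolds cover 180 of 300 Mb
```

Because scaffolds are built by joining contigs, an N50 computed in this way on scaffolds would normally be at least as large as the same statistic computed on the contigs of the same assembly.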

Methods
The common pipistrelle specimen was a female individual found at a residential address in Potten End, Berkhamsted, Hertfordshire, UK. The animal had died during renovation works at a private home that were licensed by Natural England under the Bat Low Impact Class License, WML-CL21.
DNA was extracted using an agarose plug extraction from spleen tissue following the Bionano Prep Animal Tissue DNA Isolation Soft Tissue Protocol. Pacific Biosciences CLR long read and 10X Genomics read cloud sequencing libraries were constructed according to the manufacturers' instructions. Hi-C data were generated using the Arima Hi-C kit from a separate tissue sample (mPipPip2) taken from the same animal. Sequencing was performed by the Scientific Operations DNA Pipelines at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL I (long read) and Illumina HiSeq X (10X, Hi-C) instruments.
Assembly was carried out following the Vertebrate Genomes Project pipeline v1.6 (Rhie et al., 2020) with Falcon-unzip (Chin et al., 2016). Haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020), and a first round of scaffolding was carried out with 10X Genomics read clouds using scaff10x (see Table 3 for software versions and sources). Scaffolding with Hi-C data (Rao et al., 2014) was carried out with SALSA2 (Ghurye et al., 2019). The Hi-C scaffolded assembly was polished with arrow using the PacBio data, then polished with the 10X Genomics Illumina data by aligning the reads to the assembly with longranger align, calling variants with freebayes (Garrison & Marth, 2012) and applying homozygous non-reference edits using bcftools consensus. Two rounds of the Illumina polishing were applied. The assembly was checked for contamination and corrected. Manual curation was performed as described previously (Howe et al., 2021) using the gEVAL system (Chow et al., 2016), Bionano Access, HiGlass and Pretext. Figure 1-Figure 3 were generated using BlobToolKit (Challis et al., 2020).
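The logic of the Illumina polishing step, in which only homozygous non-reference calls are written back into the assembly, can be illustrated with a toy Python sketch. This is not the longranger, freebayes or bcftools implementation: the function name, the tuple layout (0-based position, reference allele, alternative allele, genotype) and the example sequence are all assumptions made for illustration. A plausible reading of the rationale is that reads disagreeing with the assembly on both haplotypes point to a consensus error, whereas heterozygous calls reflect genuine variation within the individual and are left alone.

```python
def apply_homozygous_edits(reference, variants):
    """Apply only homozygous non-reference variants to a reference string.

    Each variant is a tuple (pos, ref, alt, genotype) with a 0-based
    position. Overlapping variants are not handled in this toy version.
    """
    edits = [v for v in variants if v[3] == "1/1"]
    polished = reference
    # Apply from right to left so earlier coordinates stay valid after indels.
    for pos, ref, alt, _ in sorted(edits, key=lambda v: v[0], reverse=True):
        if polished[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        polished = polished[:pos] + alt + polished[pos + len(ref):]
    return polished


# Hypothetical toy data, not taken from the assembly.
toy_reference = "ACGTACGTAC"
toy_variants = [
    (1, "C", "T", "1/1"),    # homozygous substitution: applied
    (4, "A", "G", "0/1"),    # heterozygous call: skipped
    (7, "T", "TGG", "1/1"),  # homozygous insertion: applied
]
polished = apply_homozygous_edits(toy_reference, toy_variants)
print(polished)  # ATGTACGTGGAC
```

Running such a step twice, as with the two rounds of Illumina polishing described above, allows reads to be realigned against the corrected sequence before variants are called again.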

Data availability
Underlying data
European Nucleotide Archive: mPipPip1 (common pipistrelle), Accession number PRJEB39564: https://www.ebi.ac.uk/ena/browser/view/PRJEB39566
The genome sequence is released openly for reuse. The P. pipistrellus genome sequencing initiative is part of the Wellcome Sanger Institute's "25 genomes for 25 years" project. It is also part of the Vertebrate Genomes Project (VGP) ordinal references programme, the Darwin Tree of Life (DToL) project, and Bat1K. All raw sequence data and the assembly have been deposited in the ENA. The genome will be annotated and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.

Frank Panitz
Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
The genome sequence of the pipistrelle bat will provide a valuable tool to investigate the genetics of bats and related species. The authors apply current state-of-the-art methods to generate an assembly resulting in chromosome-level pseudomolecules. Still, some aspects regarding the assembly quality have to be re-assessed and improved. An account of the estimated (haploid) genome size and how it relates to the assembled sequences is missing. K-mer analysis to identify k-mers that are overrepresented, as well as an evaluation of repeat sequences, would help to better assess the resulting assembly.

○ The contig N50 length given in Table 1 is larger than the scaffold N50 value, which does not appear normal.

○ The coverage of the autosomes is generally between 14- and 15-fold; for chromosomal pseudomolecule 19, however, coverage is more than twice as high (> 33-fold; https://blobtoolkit.genomehubs.org/view/Pipistrellus%20pipistrellus/dataset/CAJEUD01/table?ERR3316149_cov--Active=true&ERR3316178_cov--Active=true#Lists). The authors should explain this observation. As high levels of repeats that fail to assemble correctly may account for this discrepancy, an assessment or annotation of repetitive elements should be provided. Backmapping of short (Illumina) reads is suggested to corroborate the coverage distribution over the genome.

○ The BUSCO assessment presented indicates that about 10% of the BUSCO genes are not accounted for. The authors should comment on this aspect and might consider experimental validation, e.g. by transcriptome sequencing.
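The coverage comparison raised in the points above can be sketched in Python as a simple check that flags any scaffold whose mean coverage exceeds a chosen multiple of the median across scaffolds. The function name, the threshold and the per-scaffold values are hypothetical; the numbers are only loosely modelled on the 14- to 15-fold and > 33-fold figures mentioned above and are not read from the BlobToolKit dataset.

```python
from statistics import median


def flag_coverage_outliers(coverage_by_scaffold, ratio=2.0):
    """Return scaffolds whose coverage exceeds ratio times the median."""
    typical = median(coverage_by_scaffold.values())
    return {
        name: cov
        for name, cov in coverage_by_scaffold.items()
        if cov > ratio * typical
    }


# Hypothetical mean read coverage per scaffold (fold).
toy_coverage = {
    "chr1": 14.2,
    "chr2": 14.8,
    "chr3": 15.0,
    "chr19": 33.5,
    "chrX": 14.5,
}
flagged = flag_coverage_outliers(toy_coverage)
print(flagged)  # {'chr19': 33.5}
```

The median is used rather than the mean so that a single high-coverage scaffold does not shift the baseline against which it is compared.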

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound?
The completeness assessed by BUSCO is not that high. About 9.8% of the BUSCO genes are missing in the assembly. The authors should provide an explanation for this and compare the genome quality with that of other bats, to establish whether this is a characteristic of the bat itself or whether the assembly needs further improvement.