The genome sequence of the European nightjar, Caprimulgus europaeus (Linnaeus, 1758)

We present a genome assembly from an individual female Caprimulgus europaeus (the European nightjar; Chordata; Aves; Caprimulgiformes; Caprimulgidae). The genome sequence is 1,178 megabases in span. The majority of the assembly (99.33%) is scaffolded into 37 chromosomal pseudomolecules, including the W and Z sex chromosomes.


Background
The European nightjar (Caprimulgus europaeus; also known as the Eurasian nightjar and common goatsucker) is an insectivorous, crepuscular, ground-nesting bird distributed throughout the Western Palearctic (Hagemeijer & Blair, 1997). It breeds in semi-natural dry and open habitats with scattered trees (Cramp & Brooks, 1985). Little is known about the ecology of the European nightjar (Cramp & Brooks, 1985; Polakowski et al., 2020) or, more generally, about that of the family Caprimulgidae. The family comprises peculiar species such as the only bird known to hibernate, the Common Poorwill (Phalaenoptilus nuttallii) (Carey, 2019; French, 2019; Woods et al., 2019), and one of the few birds that uses echolocation, the South American Oilbird (Steatornis caripensis) (Brinkløv et al., 2013). The European nightjar has been found to be more resistant to pathogens than other bird species (Jiang et al., 2021). Although categorized as 'least concern' by the IUCN (IUCN, 2016), the European nightjar has experienced a steady population decline in recent decades, and is of conservation concern in Europe (Eaton et al., 2015; Evens et al., 2017; Keller et al., 2010). The availability of a high-quality, chromosome-level reference genome will help to deepen knowledge of the biology and evolution of this species, boosting studies on the genomics of the peculiar family Caprimulgidae. Moreover, as genomic resources gain pre-eminence in conservation efforts (Allendorf, 2017; Fuentes-Pardo & Ruzzante, 2017; Supple & Shapiro, 2018), we expect that the reference genome presented here will aid the planning of conservation actions for the European nightjar.

Genome sequence report
The genome was sequenced from a blood sample taken from a single female C. europaeus collected from a bird ringing station in Ventotene, Italy (latitude 40.79404, longitude 13.42777). A total of 87-fold coverage in Pacific Biosciences single-molecule long reads and 62-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 144 missing joins or misjoins and removed 31 haplotypic duplications, reducing the assembly length by 0.15% and the scaffold number by 21.94%, and increasing the scaffold N50 by 26.46%.
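The curation percentages above are relative to the pre-curation assembly, so the pre-curation values can be recovered from the final ones. The following is a minimal sketch of that arithmetic, not part of the published workflow; the function name `value_before_change` is hypothetical. It uses the final scaffold count of 121 reported for this assembly and the stated 21.94% reduction.

```python
def value_before_change(final_value: float, percent_change: float) -> float:
    """Recover the starting value from a final value and a signed percent change.

    percent_change is negative for a reduction and positive for an increase,
    and is expressed relative to the starting value.
    """
    factor = 1.0 + percent_change / 100.0
    if factor <= 0:
        raise ValueError("percent_change must be greater than -100")
    return final_value / factor


# Scaffold number fell by 21.94% during curation and ended at 121.
scaffolds_before = round(value_before_change(121, -21.94))
print(scaffolds_before)  # 155
```

Because the scaffold N50 and total length are quoted in rounded megabases, applying the same function to them gives only approximate pre-curation values.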
The final assembly has a total length of 1,178 Mb in 121 sequence scaffolds with a scaffold N50 of 83 Mb (Table 1). Of the assembly sequence, 99.3% was assigned to 37 chromosome-level scaffolds, representing 35 autosomes (numbered by sequence length) and the W and Z sex chromosomes (Figure 1-Figure 4; Table 2). The assembly has a BUSCO (Simão et al., 2015) completeness of 97.4% (single 96.9%, duplicated 0.6%) using the aves_odb10 reference set. While not fully phased, the assembly deposited is of one pseudo-haplotype. Contigs corresponding to the alternate haplotype have also been deposited.
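For readers unfamiliar with the scaffold N50 statistic quoted above, it is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The sketch below illustrates this standard definition; it is not the tool used to generate Table 1, the function name `scaffold_n50` is hypothetical, and the lengths in the example are toy values rather than data from this assembly.

```python
from typing import Iterable


def scaffold_n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold lengths.

    Scaffolds are visited from longest to shortest; the N50 is the length of
    the scaffold at which the running total first reaches half of the
    assembly span.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no scaffold lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy example: total span 30, so half the span is reached at the second scaffold.
print(scaffold_n50([10, 8, 6, 4, 2]))  # 8
```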

Sample acquisition
Sampling was performed during the routine activity of the scientific ringing station located on the island of Ventotene, Latina, Italy (latitude 40.7926°, longitude 13.4241°) during spring migration. Samples were collected by ISPRA researchers as part of their institutional activities under Italian national Law n. 157/92. Bird capture was performed in the evening according to standardized protocols using mist-nets (Saino et al., 2010; Spina et al., 1993). The sample was collected with a heparinized capillary tube after puncturing the ulnar vein with an intra-epidermal needle. The blood was immediately transferred into 99% ethanol, initially kept at room temperature and then frozen.

DNA extraction and sequencing
High molecular weight DNA was extracted from the blood sample at the Scientific Operations core of the Wellcome Sanger Institute using the Bionano Prep Blood DNA Isolation Kit according to the Bionano Prep Frozen Blood protocol. Pacific Biosciences CLR long read and 10X Genomics read cloud sequencing libraries were constructed according to the manufacturers' instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL II and Illumina HiSeq X instruments. Hi-C data were generated from the same blood sample using the Arima Hi-C+ kit and sequenced on HiSeq X.

Genome assembly
Assembly was carried out following the Vertebrate Genomes Project pipeline v1.6 (Rhie et al., 2020) with Falcon-unzip (Chin et al., 2016). Haplotypic duplication was identified, and homozygous non-reference edits were applied using bcftools consensus. A complete mitochondrion was not found using mitoVGP (Formenti et al., 2021a), likely because the sample was sourced from blood, so the mitochondrial sequence NC_025773.1 (Caprimulgus indicus) was used during polishing. The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation (Howe et al., 2021) was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext. The genome was analysed, and BUSCO scores generated, within the BlobToolKit environment (Challis et al., 2020). Table 3 gives version numbers of the software tools used in this work. The genome sequence is released openly for reuse. The C. europaeus genome sequencing initiative is part of the Darwin Tree of Life (DToL) project and the Vertebrate Genomes Project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.
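The polishing step mentioned above applies only homozygous non-reference variant calls to the assembly sequence. The toy sketch below illustrates that idea in pure Python. It is not how bcftools consensus is implemented, and the `Variant` type and function names are hypothetical. It assumes biallelic, non-overlapping variants with 1-based positions as in VCF.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    pos: int       # 1-based position of the first REF base, as in VCF
    ref: str       # reference allele present in the assembly
    alt: str       # alternative allele supported by the reads
    genotype: str  # for example "1/1", "0/1" or "1|1"


def is_homozygous_alt(genotype: str) -> bool:
    """True when every called allele is the (single) ALT allele."""
    alleles = genotype.replace("|", "/").split("/")
    return all(allele == "1" for allele in alleles)


def apply_homozygous_edits(sequence: str, variants: Iterable[Variant]) -> str:
    """Return the sequence with homozygous ALT variants substituted in.

    Heterozygous and reference calls are ignored. Edits are applied from the
    right-hand end so that earlier coordinates are not shifted by indels.
    """
    edits = [v for v in variants if is_homozygous_alt(v.genotype)]
    for v in sorted(edits, key=lambda v: v.pos, reverse=True):
        start = v.pos - 1
        end = start + len(v.ref)
        if sequence[start:end] != v.ref:
            raise ValueError(f"REF allele mismatch at position {v.pos}")
        sequence = sequence[:start] + v.alt + sequence[end:]
    return sequence


calls = [
    Variant(2, "C", "T", "1/1"),   # homozygous SNP: applied
    Variant(5, "A", "G", "0/1"),   # heterozygous: ignored
    Variant(7, "GT", "G", "1|1"),  # homozygous deletion: applied
]
print(apply_homozygous_edits("ACGTACGT", calls))  # ATGTACG
```

Restricting edits to homozygous calls means that positions where the reads support both alleles, which may reflect true heterozygosity in the sequenced individual, are left unchanged.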