The genome sequence of the common toad, Bufo bufo (Linnaeus, 1758)

We present a genome assembly from an individual male Bufo bufo (the common toad; Chordata; Amphibia; Anura; Bufonidae). The genome sequence is 5.04 gigabases in span. The majority of the assembly (99.1%) is scaffolded into 11 chromosomal pseudomolecules. Gene annotation of this assembly by the NCBI Eukaryotic Genome Annotation Pipeline has identified 21,517 protein-coding genes.


Introduction
The common toad, Bufo bufo (Anura: Bufonidae), is widely distributed throughout Europe. It has a biphasic life cycle that includes aquatic, benthic larvae and terrestrial adults. Bufonids like B. bufo are notable amongst anurans in that they (1) lack maxillary teeth, (2) have Bidder's organs, and (3) have paired parotoid glands that contain alkaloid toxins. Bufo bufo has been used extensively in comparative vertebrate research, including as a model system in sensory biology (Ewert, 1974).
Based on populations from mainland Europe, the nuclear genome size of B. bufo was previously estimated to be between 5.82 and 7.75 picograms (= 5.69 and 7.58 gigabases; Gregory, 2021). These estimates are slightly larger than our 5.04 gigabase assembly. The eleven pseudomolecules in our assembly match the expected number of chromosomes in B. bufo (2N = 22; six macro- and five micro-chromosomes; Birstein & Mazin, 1982; Makino & Others, 1951). This is the third nuclear genome sequence to be reported from a bufonid anuran (Edwards et al., 2018; Lu et al., 2021). The B. bufo reference genome reported here has been used to study pseudogenization of the tooth enamel gene amelogenin in bufonids (Shaheen et al., 2021). The genome of a common toad from the UK is particularly timely as a tool for understanding the dynamics of population declines observed over the last two decades (Carrier & Beebee, 2003; Petrovan & Schmidt, 2016).
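The picogram-to-gigabase conversion quoted above can be reproduced with a short calculation. The sketch below assumes the widely used factor of 0.978 gigabases per picogram of DNA; the text does not state which factor was used, so this value and the function name are assumptions for illustration.

```python
# Convert C-value estimates (picograms of DNA) to genome size in gigabases.
# Assumption: 1 pg of DNA corresponds to roughly 0.978 Gb (978 Mb); the
# source text does not state its conversion factor.
GB_PER_PG = 0.978


def picograms_to_gigabases(picograms: float) -> float:
    """Return the approximate genome size in gigabases for a C-value in pg."""
    return picograms * GB_PER_PG


low_gb = picograms_to_gigabases(5.82)   # lower published estimate
high_gb = picograms_to_gigabases(7.75)  # upper published estimate
assembly_gb = 5.04                      # assembly span reported in the text

print(f"Estimated range: {low_gb:.2f}-{high_gb:.2f} Gb")
print(f"Assembly span as a share of the lower estimate: {assembly_gb / low_gb:.1%}")
```

Under that assumed factor the range rounds to 5.69 and 7.58 Gb, in agreement with the values given in the text.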

Genome sequence report
The genome was sequenced from one male B. bufo collected from the Natural History Museum (NHM) Wildlife Garden, London, UK (Figure 1A, B). A total of 64-fold coverage in Pacific Biosciences single-molecule long reads (N50 28 kb) and 56-fold coverage in 10X Genomics read clouds (from molecules with an estimated N50 of 29 kb) were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 3498 missing/misjoins and removed 513 haplotypic duplications, reducing the assembly length by 2.4% and the scaffold number by 49.5%, and increasing the scaffold N50 by 38.9%.
The final assembly has a total length of 5.04 Gb in 1307 sequence scaffolds with a scaffold N50 of 636 Mb (Table 1). The majority, 99.1%, of the assembly sequence was assigned to 11 chromosomal-level scaffolds (numbered by sequence length) (Figure 2-Figure 5; Table 2). The assembly has a BUSCO (Simão et al., 2015) v5.1.2 completeness of 90.1% using the tetrapoda_odb10 reference set. However, a BUSCO (v4.0.2) score of 95.3% using the same reference set was obtained for the annotated gene set of the aBufBuf1.1 assembly (see section Genome annotation), indicating that the assembly has a high level of completeness and that some genes were missed during BUSCO analysis of the whole genome assembly. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
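For readers unfamiliar with the statistics reported here, the following minimal sketch shows how a scaffold N50 and the share of sequence held in the largest scaffolds are conventionally computed. The function names and the toy lengths are hypothetical; they are not the aBufBuf1.1 scaffold lengths, and this is not the code used to produce Table 1.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


def fraction_in_largest(lengths: Sequence[int], k: int) -> float:
    """Return the fraction of total bases held in the k longest sequences."""
    ordered = sorted(lengths, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


# Toy scaffold lengths (hypothetical values, arbitrary units).
toy_lengths = [80, 70, 50, 40, 30, 20]
print("N50:", n50(toy_lengths))
print("Share in 2 largest:", round(fraction_in_largest(toy_lengths, 2), 3))
```

Applied to the real assembly, the same calculation with k = 11 would correspond to the proportion of sequence assigned to the chromosomal-level scaffolds.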

Genome annotation
The B. bufo assembly was annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. The annotation (NCBI Bufo bufo Annotation Release 100; Table 1) was generated from transcripts and proteins retrieved from NCBI Entrez by alignment to the genome assembly, as described (Pruitt et al., 2014).

Sample acquisition
A single male B. bufo was collected from a stable, isolated population in the NHM Wildlife Garden, London, UK (latitude 51.49586, longitude -0.178622, elevation 17 m) by Jeffrey W. Streicher on 1 July 2015 (Figure 1C). The specimen of B. bufo (NHMUK 2013.484, Field ID: JWS 758) was 55.5 mm snout-vent length (determined using Miyamoto digital callipers to the nearest 0.1 mm) and contained many nematode parasites in its stomach (Figure 1D). The specimen was collected with permission from the NHM Wildlife Garden management team and is part of a long-term monitoring project run by the Department of Life Sciences and the Angela Marmont Centre for UK Biodiversity. It was humanely euthanised using a saturated solution of tricaine mesylate (MS-222). Multiple tissues including heart, thigh muscle, liver, eyes, kidney, testes, Bidder's organ, and intestines were sampled into an ammonium sulfate-based RNA + DNA preservation buffer. After ~24 hours of storage at 4°C, the tissues were transferred to -80°C until they were sent for genome sequencing. Sample tissue has been accessioned by the NHM Molecular Collections Facility (NHMUK 2013.484).

DNA extraction and sequencing
DNA was extracted from heart tissue using the Bionano Prep Animal Tissue DNA Isolation kit according to the manufacturer's instructions. Pacific Biosciences CLR long read and 10X Genomics read cloud sequencing libraries were constructed according to the manufacturers' instructions. Hi-C data were generated from heart tissue using the Arima v2 Hi-C kit. Extraction and sequencing were performed by the Scientific Operations DNA Pipelines at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL I and Illumina HiSeq X instruments. DNA was labeled for Bionano Genomics optical mapping following the Bionano Prep Direct Label and Stain (DLS) Protocol and run on one Saphyr instrument chip flowcell.

Genome assembly
Assembly was carried out following the Vertebrate Genomes Project pipeline v1.6 (Rhie et al., 2021) with Falcon-unzip (Chin et al., 2016). Haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020), and a first round of scaffolding was carried out with 10X Genomics read clouds using scaff10x. Hybrid scaffolding was performed using the Bionano DLE-1 data and Bionano Solve. Scaffolding with Hi-C data (Rao et al., 2014) was carried out with HiLine, then 3D-DNA (Dudchenko et al., 2017). The Hi-C scaffolded assembly was polished with arrow using the PacBio data, then polished with the 10X Genomics Illumina data by aligning to the assembly with longranger align, calling variants with freebayes (Garrison & Marth, 2012) and applying homozygous non-reference edits using bcftools consensus. Two rounds of the Illumina polishing were applied. The mitochondrial genome was assembled at The Rockefeller University using the mitoVGP pipeline (Formenti et al., 2021). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016; Howe et al., 2021). Manual curation was performed using evidence from Bionano (using the Bionano Access viewer), using HiGlass (Kerpedjiev et al., 2018) and Pretext, as described previously (Howe et al., 2021). Figure 2-Figure 4 and BUSCO values were generated using BlobToolKit (Challis et al., 2020). Table 3 contains a list of software tools and versions, where applicable.
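The Illumina polishing step applies only homozygous non-reference calls. As a rough illustration of what that selection means, the sketch below classifies VCF-style genotype strings. The function name is hypothetical, and the pipeline described above relies on freebayes and bcftools for this step, so this is not its implementation.

```python
import re


def is_homozygous_non_reference(genotype: str) -> bool:
    """Return True if every allele in a VCF GT string (e.g. '1/1', '2|2')
    is the same non-reference allele. Missing calls ('.') are rejected."""
    alleles = re.split(r"[/|]", genotype)
    # Reject missing or malformed allele indices.
    if any(not allele.isdigit() for allele in alleles):
        return False
    # All alleles identical, and that allele is not the reference (index 0).
    return len(set(alleles)) == 1 and alleles[0] != "0"


for gt in ["1/1", "0/1", "0/0", "2|2", "./."]:
    print(gt, is_homozygous_non_reference(gt))
```

Restricting edits to such calls is a conservative choice: heterozygous sites may reflect genuine variation between haplotypes rather than errors in the consensus sequence.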

Ethical/compliance issues
The materials that have contributed to this genome note were supplied by a Tree of Life collaborator. The Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible.
