The genome sequence of the long-spined sea scorpion, Taurulus bubalis (Euphrasén, 1786)

We present a genome assembly from an individual female Taurulus bubalis (the long-spined sea scorpion; Chordata; Actinopteri; Perciformes; Cottidae). The genome sequence is 615 megabases in span. The complete assembly is scaffolded into 21 chromosomal pseudomolecules.


Background
The long-spined sea scorpion (Taurulus bubalis, Perciformes: Cottidae), also known as the longspined bullhead or fatherlasher, is named for the distinctive long spine found on its cheek (above the pectoral fin and behind the eye). It is a rocky shore fish found throughout western European waters and around all coasts of Britain and Ireland, from the shore to around 30 m depth. It is also occasionally seen in the Mediterranean. Adult fish can reach around 20 cm in length and have a broad head with a large mouth. They are sometimes confused with the bull rout, Myoxocephalus scorpius (also known as the shortspined scorpion fish), but adult bull rout are much larger and lack the distinctive cheek spine (Neal, 2008). Long-spined sea scorpions are ambush predators with cryptic coloration. Their diet is varied: crustaceans are the primary prey, but molluscs, fish and polychaetes are also consumed. Most prey are swallowed whole. Individuals grow rapidly during the first two years of life, and may begin to spawn during their second year. All individuals will have spawned by their third year, with spawning occurring between December and March (King & Fives, 1983).
The long-spined sea scorpion has a notable behavioural response to changing environmental conditions: when oxygen tension is low, individuals will emerge from the water and climb onto land, where they breathe air. They are not, however, very mobile once they have emerged (Davenport & Woolmington, 1981). The long-spined sea scorpion has been classified as "Least Concern" on the IUCN Red List (Lorance et al., 2014).

Genome sequence report
The genome was sequenced from one T. bubalis of unknown sex collected from Farland Point, Great Cumbrae, North Ayrshire, UK (latitude 55.746815, longitude -4.914907) (Figure 1). A total of 38-fold coverage in Pacific Biosciences single-molecule long reads and 51-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 161 missing/misjoins and removed 6 haplotypic duplications, reducing the assembly length by 0.28% and the scaffold number by 86.10%, and increasing the scaffold N50 by 31.71%.
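The curation statistics above (percentage change in scaffold number and scaffold N50) can be illustrated with a minimal sketch. The scaffold lengths below are made-up toy values, not data from this assembly, and the function names `n50` and `percent_change` are hypothetical helpers introduced only for illustration.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths.

    The N50 is the length L such that scaffolds of length >= L
    together contain at least half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("cannot compute N50 of an empty list")


def percent_change(before, after):
    """Percentage change from `before` to `after` (negative means a reduction)."""
    return 100.0 * (after - before) / before


# Toy scaffold lengths (arbitrary units), NOT taken from the T. bubalis assembly.
before_curation = [40, 30, 20, 10, 10, 5, 5]
after_curation = [60, 30, 20, 10]

n50_change = percent_change(n50(before_curation), n50(after_curation))
count_change = percent_change(len(before_curation), len(after_curation))
```

In this toy case, joining scaffolds during curation raises the N50 while lowering the scaffold count, the same direction of change reported for the assembly.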
The final assembly has a total length of 615 Mb in 26 sequence scaffolds with a scaffold N50 of 29.1 Mb (Table 1). The complete assembly sequence was assigned to 21 chromosomal-level scaffolds, representing 21 autosomes (numbered by sequence length) (Figure 2-Figure 5; Table 2). The assembly has a BUSCO v5.1.2 (Manni et al., 2021) completeness of 98.4% (single 97.6%, duplicated 0.8%) using the actinopterygii_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.

[...] dissected on dry ice with tissue set aside for Hi-C and RNA sequencing. Muscle tissue was cryogenically disrupted to a fine powder using a Covaris cryoPREP Automated Dry Pulveriser, receiving multiple impacts. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 200-ng aliquot of extracted DNA using 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared to an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.

RNA was extracted from fin tissue in the Tree of Life Laboratory at the WSI using TRIzol (Invitrogen), according to the manufacturer's instructions. RNA was then eluted in 50 μl RNase-free water and its concentration assessed using a Nanodrop spectrophotometer and Qubit Fluorometer using the Qubit RNA [...] Illumina HiSeq 4000 (RNA-Seq) instruments. Hi-C data were generated from gill tissue using the Arima v2 Hi-C kit and sequenced on HiSeq X.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with [...] HiGlass (Kerpedjiev et al., 2018) and Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.
The genome sequence is released openly for reuse. The T. bubalis genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated using the RNA-Seq data and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.

Thomas Desvignes
University of Oregon, Eugene, USA
In this article, the authors report on the publicly available, chromosome-scale genome assembly of the longspined bullhead Taurulus bubalis, a marine sculpin from western Europe. The assembly appears to be of great quality, using appropriate methods and following the high standard of the Darwin Tree of Life Project. I have only minor comments.
In the abstract the sequenced specimen is referred to as a female, but the first sentence of the genome sequence report and of the methods state that the specimen was of unknown sex. What is the accurate situation? Knowing the sex of the individual would obviously be beneficial.
Is there any information on the karyotype of T. bubalis or closely related species to support that the 21 pseudochromosome molecules obtained is the expected number of chromosomes?
I am confused by the statement that the 21 chromosomal-level scaffolds represent "21 autosomes"? What about potential sex chromosomes? Is there any information supporting the hypothesis that the species' sex determination mechanism is environmental? Or is it instead that the analysis of the two haplotypes did not reveal obvious differences in any chromosome pairs? In which case, if the species has a genetically determined sex, the potential sex chromosomes would be homomorphic, although still not autosomes, I believe.
According to Fishbase, "longspined bullhead" is the FAO-accepted common name for Taurulus bubalis while "longspined sea-scorpion" is a vernacular name from Ireland and the UK. Please verify and if appropriate replace "sea-scorpion" by "bullhead", also note that there is apparently no hyphen in "longspined".
Similarly, based on Fishbase, the FAO-accepted common name for Myoxocephalus scorpius is "shorthorn sculpin" while "bull-rout" is considered a vernacular name.
In the background section, the mentioned spines are preopercular spines. It might be worth simply adding this word because these preopercular spines represent a distinctive character in sculpins.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes