The genome sequence of the brown trout, Salmo trutta Linnaeus 1758

We present a genome assembly from an individual female Salmo trutta (the brown trout; Chordata; Actinopteri; Salmoniformes; Salmonidae). The genome sequence is 2.37 gigabases in span. The majority of the assembly is scaffolded into 40 chromosomal pseudomolecules. Gene annotation of this assembly on Ensembl has identified 43,935 protein-coding genes.


Introduction
The brown trout, Salmo trutta, is native to Europe, western Asia and North Africa; however, the species has been successfully introduced to many other regions (Klemetsen et al., 2003). Genetically similar S. trutta can be freshwater residents, freshwater migrants or anadromous (migrating to the sea to feed and returning to freshwater only to breed), which initially led taxonomists to believe that these forms were multiple independent species. This phenotypic difference has a genetic component but is also partly caused by environmental factors, such as food availability, which lead to changes in gene expression and drive migration and adaptation to different environments (Ferguson et al., 2019). S. trutta also exhibit considerable genetic variation within migratory or resident populations; these differences can be seen among populations in different habitats (Ferguson, 1989). This reference genome sequence will be useful to researchers who wish to sample and analyse the genetics of S. trutta populations, helping to understand the genetic drivers of migration and the reasons why different populations of brown trout are so well adapted to different conditions. As rising atmospheric CO2 continues to increase temperatures and acidify oceans, this information will help conservation of S. trutta and other species by revealing which genetic components allow populations to adapt to warmer and more acidic environments.

Genome sequence report
The genome was sequenced from a single female Salmo trutta bred at the Institute of Marine Research, Bergen, Norway. A total of 52-fold coverage in Pacific Biosciences single-molecule long reads (N50 19 kb) and 70-fold coverage in 10X Genomics read clouds (from molecules with an estimated N50 of 65 kb) were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data and 67-fold coverage of Bionano optical maps. Manual assembly curation corrected 175 missing joins or misjoins, reducing the scaffold number by 4.8% and the assembly length by 0.5%. The final assembly has a total length of 2.37 Gb in 1,441 sequence scaffolds with a scaffold N50 of 52.21 Mb (Table 1). The majority, 91.5%, of the assembly sequence was assigned to 40 chromosome-level scaffolds, representing 40 autosomes (numbered by sequence length). No sex chromosomes could be identified (Figure 1; Table 2). The assembly has a BUSCO (Simão et al., 2015) completeness of 97.2% using the actinopterygii_odb10 reference set.
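For readers less familiar with the summary statistics quoted above, the following is a minimal sketch of how a scaffold N50 and the fraction of sequence held in the largest scaffolds can be computed from a list of scaffold lengths. The function names and the toy lengths are illustrative assumptions; this is not the software used to produce Table 1, and the real fSalTru1.1 scaffold lengths are not reproduced here.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


def fraction_in_largest(lengths: Sequence[int], k: int) -> float:
    """Fraction of total length contained in the k longest sequences.

    Assumption: the k longest sequences stand in for the chromosome-level
    scaffolds, which need not hold exactly for a real assembly.
    """
    ordered = sorted(lengths, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


# Toy scaffold lengths (hypothetical, arbitrary units).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]

print(n50(toy_scaffolds))                     # 70
print(fraction_in_largest(toy_scaffolds, 2))  # 0.5
```

In the toy example the total length is 300, so the running sum first reaches half of the total (150) at the second-longest scaffold, giving an N50 of 70. The same definition applies to the read and molecule N50 values quoted for the sequencing data.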

Gene annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the fSalTru1.1 assembly (GCA_901001165.1) (Table 1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of vertebrate proteins from UniProt (UniProt Consortium, 2019). The resulting Ensembl annotation includes 122,381 transcripts assigned to 43,935 coding and 4,441 non-coding genes (Salmo trutta - Ensembl Rapid Release).

Methods
Owing to the high genetic diversity of brown trout and the variable chromosome numbers (S. trutta have 38-42 chromosomes, with multiple copies of these chromosomes), doubled haploid specimens were bred for sequencing and generation of the assembly. The doubled haploid female used in this study was bred on 26 November 2015 at the Institute of Marine Research using a protocol optimized for Atlantic salmon, Salmo salar (see Hansen et al., 2020). In summary, eggs from one Salmo trutta female from a domestic stock that originated from Lake Tunhovd in eastern Norway were fertilized with UV-irradiated milt (brown trout sperm diluted 1:40 with sperm fluid and irradiated (254 nm) for 8 mins at    Table 3 for software versions and sources) a total map length of 2.62 Gb and a map N50 of 29.37 Mb.
Assembly was carried out following the Vertebrate Genome Project pipeline v1.0 (Rhie et al., 2020) with Falcon-unzip (Chin et al., 2016) and a first round of scaffolding carried out with 10X Genomics read clouds using scaff10x. Hybrid  bcftools consensus. Two rounds of Illumina polishing were applied. The assembly was checked for contamination and corrected. Manual curation was performed as described previously (Howe et al., 2021) using the gEVAL system (Chow et al., 2016), Bionano Access, HiGlass and Pretext. Figure 1-Figure 3 and BUSCO values were generated using BlobToolKit (Challis et al., 2020).

The brown trout, Salmo trutta, is native to Europe, western Asia and North Africa and is an important fish across these regions. The authors improved the reference genome of Salmo trutta using PacBio and Hi-C sequencing technologies, which means that a much more complete chromosome-level assembly can feasibly be obtained. Based on the new assembly, genome analysis was then performed on a female individual. The manuscript did a great job demonstrating successful high-quality chromosome-level analysis in a non-model species. In particular, the Introduction provides an excellent backdrop to the findings of the paper. Some figures are clear and concise, and the analyses are sufficiently well described in the methods to enable the reader to fully understand what was done. In general, the manuscript was clearly written and the analytical methods were sound. I have only a few minor concerns about the paper.

1. Sex determination should be an extremely simple trait. Is it due to genetics or assembly error?
2. There are quite a few inconsistencies between the genetic map and the assembly. Careful checking is needed to make sure the inconsistencies are not due to assembly errors.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes