The genome sequence of the Atlantic horse mackerel, Trachurus trachurus (Linnaeus, 1758)

We present a genome assembly from an individual Trachurus trachurus (the Atlantic horse mackerel; Chordata; Actinopteri; Carangiformes; Carangidae). The genome sequence is 801 megabases in span. The majority of the assembly, 98.68%, is scaffolded into 24 chromosomal pseudomolecules. Gene annotation of this assembly on Ensembl has identified 25,797 protein-coding genes.


Background
The Atlantic horse mackerel Trachurus trachurus (Linnaeus, 1758), also known as European horse mackerel or common scad, is northern Europe's only resident representative of the Carangidae, a ray-finned fish family that includes the jacks, pompanos and trevallies. Trachurus trachurus is a benthopelagic shoaling species, typically found at depths of less than 200 m. The species has a broad distribution, including Iceland, Northeast Atlantic continental shelf waters, the Mediterranean, and north-western African coastal waters at least as far as Ghana (Healey et al., 2020). Atlantic horse mackerel are targeted by commercial fisheries using trawls, purse seines and long-lines. Major fished stocks are managed regionally. Those in Northeast Atlantic continental shelf waters are separated into a southern stock (Atlantic waters of the Iberian Peninsula), a western stock (shelf-edge seas from the Bay of Biscay to the Norwegian coast, including spawning grounds of the Celtic Sea), and a North Sea stock (central and southern North Sea, including the Skagerrak and Kattegat) (ICES, 2019). Total landings of 140,000 metric tonnes were reported in 2018, down from catches of over 450,000 metric tonnes in the mid-1990s. On the basis of declining abundance over sections of the species' range, it has been listed as Vulnerable by the International Union for Conservation of Nature (Smith-Vaniz et al., 2015).

Genome sequence report
The genome was sequenced from a single T. trachurus of unknown sex collected from Southampton Water, off the coast of Hampshire, UK. A total of 105-fold coverage in Pacific Biosciences single-molecule long reads (N50 23 kb) and 64-fold coverage in 10X Genomics read clouds (from molecules with an estimated N50 of 22 kb) were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 141 missing joins or misjoins and removed 43 haplotypic duplications, reducing the scaffold number by 22.14%, increasing the scaffold N50 by 19.37% and decreasing the assembly length by 1.57%.
The final assembly has a total length of 801 Mb in 152 sequence scaffolds with a scaffold N50 of 35.4 Mb (Table 1). The majority, 98.68%, of the assembly sequence was assigned to 24 chromosomal-level scaffolds, representing 24 autosomes (numbered by synteny to Oryzias latipes (Japanese medaka); GCF_002234675.1) (Figure 1–Figure 4; Table 2). The assembly has a BUSCO v5.1.2 (Manni et al., 2021) completeness of 98.6% using the actinopterygii_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
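For readers less familiar with the summary statistics quoted above, the following is a minimal Python sketch of how a scaffold N50 and the percentage of sequence placed on chromosomes are conventionally computed from a list of scaffold lengths. It is illustrative only and is not the pipeline used to produce this assembly; the function names (`n50`, `percent_placed`) and the toy lengths are assumptions introduced for the example.

```python
from typing import Iterable, Sequence


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    The N50 is the length L such that sequences of length >= L
    together account for at least half of the total span.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running sum always reaches half the total")


def percent_placed(chromosome_lengths: Sequence[int],
                   all_lengths: Sequence[int]) -> float:
    """Percentage of the total span held in chromosome-level scaffolds."""
    return 100.0 * sum(chromosome_lengths) / sum(all_lengths)


# Toy data (hypothetical lengths, NOT taken from the assembly).
toy_scaffolds = [50, 40, 30, 20, 10]
toy_chromosomes = [50, 40, 30]

print(n50(toy_scaffolds))                              # 40
print(percent_placed(toy_chromosomes, toy_scaffolds))  # 80.0
```

Applied to the figures reported here, 152 scaffolds less the 24 chromosomal scaffolds leaves 128 unplaced scaffolds, and the 1.32% of an 801 Mb span that is not assigned to chromosomes corresponds to roughly 10.6 Mb.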

Gene annotation
The Ensembl gene annotation system (

Only one minor error was detected: the link to the Hi-C interactive map in Figure 4 did not work at the time of this review. A working link would make it easier to examine the data in more detail.
Another small issue is that the genetic material came from a single sample of unknown sex. Sex determination in fish as a whole is complicated; in future genome studies it would be helpful to confirm the sex of the sample, if possible through gonadal cannulation or biopsy.
In addition, the genome report finds that, after the 24 autosomes were generated based on synteny to Oryzias latipes (Japanese medaka), 128 scaffolds with an N50 of 104,180 and a total length of 10.5 Mb remained as unplaced sequence. It would be interesting to determine whether some polishing of the genome or the pseudo-alternative haplotype could resolve these unplaced scaffolds, perhaps by choosing a different fish model for syntenic comparison.
An alternative haplotype was generated containing 1,049 scaffolds with total length 797 Mb and an N50 of 1.6 Mb. It would be helpful, in this reviewer's opinion, to provide a few more details about the alignments of the alternate locus reference sequences to the main chromosome sequences in the assembly to put these alternative loci into the context of the reference genome.
A few more comments on the significance of this genome report: The horse mackerel is not a true "mackerel"; the true mackerels represent another important fisheries stock. It may be helpful, therefore, to point out that true mackerels are in the family Scombridae, which holds many highly migratory species with unique challenges and adaptations. In contrast, the horse mackerel has two significant populations. The Northern stock of the horse mackerel spawns in the North Sea and heads back to Northern waters, whereas the Western stock spawns in the Bay of Biscay and heads west as the fish mature. Overall, this genome report is a significant contribution that will help in the management of important fisheries stocks that are under pressure from climate change, habitat reduction and increasing economic demand for food from the oceans.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Partly

Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Marine biology, Bioinformatics, Whole genome assemblies of non-model organisms, Metagenomics.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
© 2022 Castro L. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Luís Filipe C. Castro
CIIMAR-Interdisciplinary Centre of Marine and Environmental Research, U. Porto-University of Porto, Porto, Portugal

This genome provides a valuable resource in the context of teleost species that represent a fishery resource. The lack of a fully phased genome is not a significant problem. This will provide an opportunity for comparative genomics approaches with the ever growing number of high-quality genomes currently available for this range of taxa.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes