The genome sequence of Aplidium turbinatum (Savigny 1816), a colonial sea squirt

We present a genome assembly from an individual Aplidium turbinatum (Chordata; Ascidiacea; Aplousobranchia; Polyclinidae). The genome sequence is 605 megabases in span. The majority of the assembly (99.98%) is scaffolded into 18 chromosomal pseudomolecules. The complete mitochondrial genome was also assembled and is 18.4 kilobases in length.


Background
The polyclinid ascidian Aplidium turbinatum (formerly known as Sidnyum turbinatum; see Monniot & Monniot, 1987) has a European distribution from Norway to the Mediterranean. It is frequently encountered in shallow water around the coasts of Great Britain and Ireland.
Colonies comprise a number of lobes or 'heads' with flat tops, tapering towards a common attached base. Up to 12 (rarely 25) zooids, each approximately 5 mm long, are embedded vertically in each colony lobe. Each zooid has its own eight-lobed inhalant opening in the flat upper surface, and these openings are arranged around a single exhalant opening. The colonial tunic in which the zooids are embedded is transparent and, for a polyclinid, unusually thin, so the zooids inside are clearly visible. Pigment, which can be one of several colours, picks out the endostyle and the structure of the branchial basket of the zooids, and the inhalant openings are often pigmented as well.
A cytotoxic substance, turbinamide, has been isolated from A. turbinatum (Aiello et al., 2001). The cytotoxic effect was found to be selective, acting against neuronal cells but not immune-system cells.

Genome sequence report
The genome was sequenced from a single monoecious hermaphrodite A. turbinatum clonal colony (Figure 1) collected from the visitors' pontoon at Queen Anne's Battery Marina, Plymouth, UK. In total, 65-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 85-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 37 missing joins or misjoins and removed 10 haplotypic duplications, reducing the assembly size by 0.54%, the scaffold number by 43.48%, and the scaffold N50 by 0.71%.
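The fold-coverage figures and curation percentages above, together with the scaffold N50 quoted for the final assembly, are standard summary statistics. The sketch below shows how they are conventionally computed. It is a minimal illustration, not the pipeline used for this assembly: it assumes read and scaffold lengths are already available as plain integers (for example, parsed from a FASTA index), approximates genome size by a single number you supply, and uses hypothetical function names and toy inputs that are not data from this genome.

```python
from typing import Iterable


def fold_coverage(read_lengths: Iterable[int], genome_size: int) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome size.

    Assumption: genome_size is an estimate (e.g. the assembly span).
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def n50(lengths: Iterable[int]) -> int:
    """Length of the sequence at which the running total of lengths,
    sorted longest first, first reaches half of the total length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached for non-negative lengths


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent
    (negative values are reductions)."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100
```

For example, `n50([40, 30, 20, 10])` returns 30, and `fold_coverage([10_000] * 600, 100_000)` returns 60.0. `percent_change(46, 26)` is about -43.48. A 43.48% reduction that ends at the 26 scaffolds of the final assembly is therefore consistent with roughly 46 scaffolds before curation. That figure is an arithmetic inference, not one stated by the authors.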
The final assembly has a total length of 605 Mb in 26 sequence scaffolds, with a scaffold N50 of 34.0 Mb (Table 1). The majority (99.88%) of the assembly sequence was assigned to 18 chromosomal-level scaffolds, representing 18 autosomes numbered by sequence length (Figures 2–5; Table 2). The assembly was curated to 18 chromosomes and oriented with the centromere on the left. Of note, telomere regions on chromosomes are all at half coverage. A few small scaffolds remain unlocalised due to a lack of Hi-C signal.

DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute. The kaAplTurb1 sample was weighed and dissected on dry ice, with tissue set aside for Hi-C and RNA sequencing. Tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Fragment size analysis of 0.01–0.5 ng of DNA was then performed using an Agilent FemtoPulse. Low molecular weight DNA was removed from a 200-ng aliquot of extracted DNA by 0.8X AMPure XP purification prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared to an average fragment size of 12–20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads at a 1.8X ratio.

The assembly was checked for contamination as described previously (Howe et al., 2021). Manual curation was performed using HiGlass (Kerpedjiev et al., 2018) and Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021), which performs annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores were generated within the BlobToolKit environment (Challis et al., 2020).

Overall, I think the reference genome looks really good and that the article can be Approved for indexing.
I have identified a couple of points that could be clarified in the text:
1. What was the purpose of sequencing 65-fold coverage in PacBio HiFi and 85-fold coverage in 10X Genomics, rather than relying only on PacBio HiFi?
2. I think the following sentence could benefit from more detail: "The assembly was curated to 18 chromosomes and oriented with the centromere on the left." a) How was the curation performed? Manually with PretextView? b) Was the "real" number of chromosomes known from previous studies? Was this information used for the curation?
3. The same applies to the following sentence: "Of note, telomere regions on chromosomes are all at half coverage." a) How can these differences in coverage be explained? b) Perhaps a link to a figure or a table would help here.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound?
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.