The genome sequence of a soldier beetle, Cantharis rustica Fallén 1807

We present a genome assembly from an individual male Cantharis rustica (a soldier beetle; Arthropoda; Insecta; Coleoptera; Cantharidae). The genome sequence is 446 megabases in span. The majority (99.71%) of the assembly is scaffolded into 7 chromosomal pseudomolecules, with the X sex chromosome assembled.


Background
Cantharis rustica (Coleoptera, Cantharidae) is a soldier beetle that can be distinguished from other British soldier beetles by its black elytra, red pronotum with a black spot, and red (or partly red) femora, with the remainder of the legs black (Fitton, 1973). It is common and widely distributed in southern Britain, but scarce and localised in the north (Alexander, 2003; Alexander, 2014). The species prefers lowland grassland habitats, but also occurs in woodland and other habitats with tall grass. Adults can be found on vegetation and flower heads from mid-May until the end of June (Alexander, 1991; Alexander, 2003; Fitton, 1973).
The karyotype of Cantharis rustica has been described and illustrated by James & Angus (2007); males have an X0 sex chromosome system. The high-quality genome sequence described here is, to our knowledge, the first one reported for Cantharis rustica and has been generated as part of the Darwin Tree of Life project. It will aid in understanding the biology, physiology and ecology of the species.

Genome sequence report
The genome was sequenced from one male C. rustica collected from Wigmore Park, Luton, UK (latitude 51.88378, longitude -0.36861422). A total of 43-fold coverage in Pacific Biosciences single-molecule long reads and 48-fold coverage in 10X Genomics read clouds was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual curation of the assembly corrected 60 missing joins or misjoins and removed 7 haplotypic duplications, reducing the assembly length by 0.75% and the scaffold number by 72.13%, and increasing the scaffold N50 by 133.18%.
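The curation percentages can be related to the final assembly values reported in this section (446 Mb, 17 scaffolds, scaffold N50 of 57.8 Mb) by inverting each percentage change. The following is a minimal sketch of that arithmetic only; the function name value_before is hypothetical, and the pre-curation figures it prints are back-calculated estimates, not values reported by the authors.

```python
def value_before(after: float, pct_change: float) -> float:
    """Invert a signed percentage change: after = before * (1 + pct_change / 100)."""
    return after / (1 + pct_change / 100)


# Final values and percentage changes as stated in the text.
# Reductions are entered as negative changes, increases as positive.
scaffolds_before = value_before(17, -72.13)    # scaffold count before curation
n50_before_mb = value_before(57.8, 133.18)     # scaffold N50 before curation (Mb)
length_before_mb = value_before(446, -0.75)    # assembly length before curation (Mb)

print(f"scaffolds before curation: ~{scaffolds_before:.1f}")
print(f"scaffold N50 before curation: ~{n50_before_mb:.2f} Mb")
print(f"assembly length before curation: ~{length_before_mb:.2f} Mb")
```

Under these assumptions the scaffold count works out to roughly 61 before curation, which is consistent with a 72.13% reduction to 17 scaffolds (44 of 61 removed or merged).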
The final assembly has a total length of 446 Mb in 17 sequence scaffolds with a scaffold N50 of 57.8 Mb (Table 1). The majority, 99.71%, of the assembly sequence was assigned to 7 chromosomal-level scaffolds, representing 6 autosomes (numbered by sequence length), and the X sex chromosome (Figure 1-Figure 4; Table 2). The assembly has a BUSCO v5.1.2 (Manni et al., 2021) completeness of 97.7% (single 95.5%, duplicated 2.2%) using the endopterygota_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
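For readers unfamiliar with the scaffold N50 statistic quoted above, it is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The sketch below illustrates the definition; the function name n50 and the example lengths are hypothetical and are not the scaffold lengths of this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical scaffold lengths in Mb, for illustration only.
example_lengths = [80, 70, 60, 50, 40, 30, 20]
example_n50 = n50(example_lengths)
print(example_n50)
```

With these illustrative lengths the total is 350 Mb, and the three longest scaffolds (80 + 70 + 60 = 210 Mb) are the first to cover half of it, so the N50 is 60 Mb.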
Chromosome 2 contains a large heterochromatic region of low confidence at approximately 20-46 Mb. This block consists of a number of scaffolds with high repeat content that can be localised to chromosome 2, but their order and orientation with respect to each other are uncertain. Large islands of a similar tandem repeat with high GC content are observed near both poles of chromosome 1. Small islands of a related repeat are observed in all other chromosomes.

Sample acquisition and DNA extraction
A single female C. rustica (icCanRust1) was collected from Wigmore Park, Luton, UK (latitude 51.88378, longitude -0.36861422) by Olga Sivell, Natural History Museum, using a net. The sample was identified by Duncan Sivell, Natural History Museum and snap-frozen on dry ice. Unfortunately, as this specimen was collected during a COVID-19 lockdown, no image was captured prior to preservation.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute. The icCanRust1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Thorax tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 200-ng aliquot of extracted DNA using a 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared to an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were constructed according to the manufacturers' instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL II and Illumina HiSeq X instruments. Hi-C data were generated from abdomen tissue using the Arima Hi-C+ kit and sequenced on an Illumina NovaSeq 6000 instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with
