The genome sequence of the common red soldier beetle, Rhagonycha fulva (Scopoli, 1763)

We present a genome assembly from an individual female Rhagonycha fulva (the common red soldier beetle; Arthropoda; Insecta; Coleoptera; Cantharidae). The genome sequence is 425 megabases in span. The majority of the assembly is scaffolded into seven chromosomal pseudomolecules, with the X sex chromosome assembled.


Introduction
The common red soldier beetle, Rhagonycha fulva, is the most abundant and widespread soldier beetle (Coleoptera: Cantharidae) in the UK. It occurs in a variety of habitats, and adults are frequently encountered throughout the summer on the flowers of umbellifers (Apiaceae), thistles (Asteraceae) and ragwort (Senecio jacobaea). It can be particularly abundant on the flowers of common hogweed, Heracleum sphondylium (Grace & Nelson, 1981), and this close association with flowers suggests that the species may be an important pollinator. Adults prey on small insects but also feed extensively on floral resources. They are diurnal and fly readily; males in particular are highly mobile (Rodwell et al., 2018). Mating is prolonged, so female-male pairs are often encountered in copulation. Eggs are laid in the soil, and the predatory larvae hunt amongst the leaf litter.
Adults are easily recognised by the extensive reddish colour of the body, with black tips to the elytra and black tarsi, antennae and palps.

Genome sequence report
The genome was sequenced from one female R. fulva collected from Wytham farm, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.779, longitude -1.317). A total of 41-fold coverage in Pacific Biosciences single-molecule long reads and 103-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 73 missing joins or misjoins and removed 12 haplotypic duplications, reducing the assembly length by 1.54% and the scaffold number by 84.62%, and increasing the scaffold N50 by 238.56%. The final assembly has a total length of 425 Mb in 13 sequence scaffolds with a scaffold N50 of 116 Mb (Table 1). The majority (99.97%) of the assembly sequence was assigned to seven chromosome-level scaffolds, representing six autosomes (numbered by sequence length) and the X sex chromosome (Figure 1-Figure 4; Table 2). The assembly has a BUSCO v5.1.2 (Simão et al., 2015) completeness of 98.9% using the endopterygota_odb10 reference set. The assembly is not fully phased: the deposited assembly represents one haplotype, and contigs corresponding to the second haplotype have also been deposited.
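The scaffold N50 and fold-coverage figures quoted above follow standard definitions: the N50 is the length L such that scaffolds of length L or longer together contain at least half of the assembly, and fold coverage is the total number of sequenced bases divided by the genome size. The following Python sketch illustrates both, together with the percentage changes used to describe the effect of curation. The function names are hypothetical, and the scaffold lengths are invented for illustration rather than taken from the R. fulva assembly.

```python
"""Sketch of standard assembly summary statistics.

Function names are hypothetical; the scaffold lengths in the example are
invented for illustration and are NOT the R. fulva scaffolds.
"""


def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total length."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the total length
            return length


def fold_coverage(total_read_bases, genome_size):
    """Fold coverage: total sequenced bases divided by genome size."""
    return total_read_bases / genome_size


def percent_change(before, after):
    """Signed percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    # Hypothetical scaffold lengths in Mb (they sum to 425 Mb).
    scaffolds_mb = [120, 100, 80, 60, 40, 20, 5]
    print(n50(scaffolds_mb))  # 100
    # 41-fold coverage of a 425 Mb genome, expressed in bases.
    print(fold_coverage(41 * 425e6, 425e6))  # 41.0
    # A drop from 50 to 40 scaffolds is a 20% reduction.
    print(percent_change(50, 40))  # -20.0
```

Assuming coverage was computed against the 425 Mb assembly, 41-fold coverage would correspond to roughly 17.4 Gb of long-read sequence (41 × 425 Mb).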
Chromosome 1 contains a largely homogeneous, heterochromatic block at 24.45-95.85 Mb (Figure 4), consistent with published karyotype data (James & Angus, 2007, Figure 3). This block consists of numerous scaffolds with high repeat content that can be localised to chromosome 1, but whose order and orientation are uncertain. The assembly probably overestimates the length of this region, because haplotypic duplications within it are difficult to identify and remove.
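For scale, the heterochromatic block spans 95.85 - 24.45 = 71.4 Mb, about 17% of the 425 Mb assembly. Repeat-rich blocks of this kind often appear as long runs of soft-masked (lower-case) sequence once repeats have been masked. The sketch below assumes soft-masked input and uses an arbitrary window size, threshold and toy sequence; it merges adjacent windows whose masked fraction meets the threshold. It is an illustration only, not the method used to delimit the block in this assembly.

```python
"""Sketch: locating repeat-rich blocks from soft-masked sequence.

Assumes repeats are soft-masked (written in lower case), a common
repeat-masking convention. The window size, threshold and toy sequence
are hypothetical illustration values.
"""


def masked_fraction(seq):
    """Fraction of bases in `seq` that are lower case (soft-masked)."""
    if not seq:
        return 0.0
    return sum(1 for base in seq if base.islower()) / len(seq)


def repeat_rich_blocks(seq, window, threshold):
    """Return merged (start, end) intervals (0-based, end-exclusive) built
    from consecutive non-overlapping windows whose masked fraction is at
    least `threshold`."""
    blocks = []
    for start in range(0, len(seq), window):
        end = min(start + window, len(seq))
        if masked_fraction(seq[start:end]) >= threshold:
            if blocks and blocks[-1][1] == start:
                # Adjacent repeat-rich window: extend the previous block.
                blocks[-1] = (blocks[-1][0], end)
            else:
                blocks.append((start, end))
    return blocks


if __name__ == "__main__":
    # Toy sequence: 20 unmasked bases, 40 masked bases, 20 unmasked bases.
    toy = "ACGT" * 5 + "acgtacgtac" * 4 + "ACGT" * 5
    print(repeat_rich_blocks(toy, window=10, threshold=0.8))  # [(20, 60)]
```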

Methods
A single female R. fulva was collected from Wytham farm, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.779, longitude -1.317) by Liam Crowley, University of Oxford, and snap-frozen on dry ice using a CoolRack. A second specimen of unknown sex, icRhaFulv4, was collected from Wigmore Park, Luton, UK (latitude 51.88378, longitude -0.36861422) by Olga Sivell, Natural History Museum, and snap-frozen on dry ice.

Sequencing was performed on Pacific Biosciences SEQUEL II (HiFi), Illumina HiSeq X (10X) and Illumina HiSeq 4000 (RNA-Seq) instruments. Hi-C data were generated using the Arima v2 Hi-C kit and sequenced on a HiSeq X instrument.
Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the […] Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021). The genome was analysed and BUSCO scores were generated within the BlobToolKit environment (Challis et al., 2020). Table 3 lists the versions of all software tools used, where appropriate.

The genome sequence is released openly for reuse. The R. fulva genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated using the RNA-Seq data and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.
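BUSCO reports completeness in a one-line summary of the form C:x%[S:x%,D:x%],F:x%,M:x%,n:N, where C (complete) is the sum of single-copy (S) and duplicated (D) genes, and F and M are the fragmented and missing fractions of the N genes searched. As a minimal sketch, the following Python function parses such a line. The function name is hypothetical, and the S, D, F, M and n values in the example are invented; only the 98.9% completeness comes from this report.

```python
"""Sketch: parsing a BUSCO one-line summary string.

C (complete) = S (single-copy) + D (duplicated); F = fragmented,
M = missing, n = number of genes searched. Example values other than
C = 98.9% are hypothetical.
"""
import re

SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Return the BUSCO percentages as floats and n as an int."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    result = {key: float(value)
              for key, value in match.groupdict().items() if key != "n"}
    result["n"] = int(match.group("n"))
    return result


if __name__ == "__main__":
    example = "C:98.9%[S:98.0%,D:0.9%],F:0.4%,M:0.7%,n:2000"
    scores = parse_busco_summary(example)
    # Complete = single-copy + duplicated (equal up to rounding).
    print(round(scores["S"] + scores["D"], 1))  # 98.9
```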

© 2022 Bocak L.
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

CATRIN, Biodiversity & Molecular Evolution Olomouc, Olomouc, Czech Republic
The data note requires minor revision. The information on the species' biology is sufficient, but I would recommend mentioning the distribution of the species: researchers in other countries may use the data, and it is valuable to indicate where the species occurs. The geographic origin of the specimen, with full coordinates, is given twice. If this repetition is not part of the report template, please revise the text.
The description of applied methods is detailed, data access information is complete.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes