The genome sequence of the common green lacewing, Chrysoperla carnea (Stephens, 1836)

We present a genome assembly from an individual female Chrysoperla carnea (a common green lacewing; Arthropoda; Insecta; Neuroptera; Chrysopidae). The genome sequence is 560 megabases in span. The majority of the assembly (95.70%) is scaffolded into six chromosomal pseudomolecules, with the X sex chromosome assembled. Gene annotation of this assembly by the NCBI Eukaryotic Genome Annotation Pipeline has identified 12,985 protein coding genes.


Background
Chrysoperla carnea, a common green lacewing, is a common and widespread lacewing across the Holarctic. It is one of the most common species of lacewing in the UK, found across a wide range of habitats. This species is part of a species complex that contains several cryptic species. These species can be distinguished by differences in the substrate-borne songs produced by adults via abdominal vibrations (Henry et al., 2002). In the UK the C. carnea group is currently split into two species, C. carnea sensu stricto and C. lucasina, both of which appear to be common. A third species, Chrysoperla pallida, may also be present. Chrysoperla carnea overwinters as an adult in common with all Chrysoperla species, but has the unique trait of losing its green pigment and turning yellow-brown during the winter period. The larvae are voracious generalist predators, feeding on aphids and other insects (Rosenheim et al., 1999), including several other pest groups such as spider mites, thrips, whitefly, leafhoppers, psyllids and Lepidoptera. They have been used extensively as biocontrol agents in agricultural and horticultural systems and are commercially produced for this purpose. Adults visit flowers and feed on pollen and nectar. Females have been recorded consuming more pollen than males (Villenave et al., 2005). The eggs are laid on vegetation and are suspended off the surface on characteristic stalks.

Genome sequence report
The genome was sequenced from one male C. carnea collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.772, longitude -1.338). A total of 40-fold coverage in Pacific Biosciences single-molecule long reads and 147-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 20 missing/misjoins and removed 6 haplotypic duplications, reducing the assembly size by 5.41% and the scaffold number by 2.32% and increasing the scaffold N50 by 0.55%.
The final assembly has a total length of 560 Mb in 337 sequence scaffolds with a scaffold N50 of 94.4 Mb (Table 1). The majority of the assembly sequence (95.70%) was assigned to six chromosomal-level scaffolds, representing five autosomes (numbered by sequence length), and the X sex chromosome (Figure 1-Figure 4; Table 2). There is a very large repeat associated with the X chromosome, which has resulted in the presence of many unlocalised scaffolds in the assembly. The assembly has a BUSCO v5.1.2 (Manni et al., 2021) completeness of 95.9% (single 95.0%, duplicated 0.9%) using the endopterygota_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
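For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed from a list of scaffold lengths. The function name `scaffold_n50` and the toy lengths are illustrative assumptions; this is not the code used to produce the figures in Table 1, and the lengths are not those of the C. carnea scaffolds.

```python
def scaffold_n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together cover at least half of the total assembly span."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards, accumulating span.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy lengths (in Mb), chosen only for illustration.
toy_lengths = [120, 100, 94, 90, 80, 50, 10, 5, 3]
print(scaffold_n50(toy_lengths))
```

Because the statistic depends only on the sorted lengths, many short unlocalised scaffolds have little effect on N50 as long as most of the span sits in a few chromosome-scale scaffolds.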

Genome annotation report
The inChrCarn1.1 genome has been annotated using the NCBI RefSeq annotation pipeline.

Table 3 contains a list of all software tool versions used, where appropriate.

Genome annotation
The C. carnea assembly was annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. The annotation was generated from transcripts and proteins retrieved from NCBI Entrez by alignment to the genome assembly, as described here (Pruitt et al., 2014).

... Biosciences SEQUEL II and Illumina HiSeq X instruments. Hi-C data were generated using the Arima v2 Hi-C kit and sequenced on a HiSeq X instrument.

The Background section would benefit from polishing. The first three lines contain the word "common" three times, and it appears again twice more in this paragraph. A pedantic point, but easily resolved, e.g. change "in common with" to "similarly to".

Ethics/compliance issues
The Background section would also benefit from two additions: 1) mention of whether lacewing genomes have been sequenced before (they have, though only just, e.g. Wang et al. 2022, and they remain underexplored) and 2) why this lacewing is being sequenced and what can be learned from this genome.
The main text states that a male was sequenced, while the Methods say female; presumably it was a female that was sequenced, given the lack of a Y chromosome.
The protocols followed are fairly standard, and the techniques sound. However, the Methods are very brief and there are few parameters described, which makes replication tricky. Given this article type, which is technical in nature, a bit more detail would be useful for readers.
For instance, although it is stated that the assembly was checked for contamination, no outcome is noted. Was there contamination? And were any cobionts (non-target organisms) present in the dataset?
In the DNA extraction section of the Methods, the wording needs clearing up, e.g. "whole organism" is repeated.
All in all, a useful addition to the genome sequencing project.

Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes

In the 'Genome Sequence Report' section, it could be relevant to mention why the endopterygota_odb10 reference set was used and the effect that has on the results.

3. I suggest fixing the axis labels in Figure 2 (a raw sequence identifier is used on the y-axis; the 'sum length' is inverted) and improving the figure caption. The colours used are very similar and difficult to distinguish when the bubbles overlap. A legend relating the bubble size to the scaffold length could be beneficial. Also, some of the bars on the histogram are barely visible.

4. In the 'Methods' section, I recommend providing more details about the morphological identification of the specimens. Were pictures taken?

5. In the 'Methods' section, the assembly identifier is used in place of the species name ("DNA was extracted from the whole organism of inChrCarn1…"); this should be corrected. The phrase "the whole organism" is also repeated in this sentence.

6. The Figure 4 caption could include more details, for example, of how this figure is generated (visualization of the Hi-C data mapped to sequence data).

7. In the 'Genome Assembly' section, details about some of the parameters used in the software settings should be included when suitable.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes

© 2022 Torcivia J. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

John Torcivia
George Washington University, Washington, DC, USA

The authors present a genome assembly (along with downstream annotation) for a species of lacewing. Some background is provided on the species, and a full genome assembly of any species is helpful to report.
Protocols used appear to be fairly standard and generally accepted methodologies to perform this assembly and downstream annotation of the genome. While these techniques do not guarantee a perfect assembly, they offer a strong starting point for future research to refine as needed.
Particular techniques are provided along with supporting papers. One potential improvement to the publication would be additional technical information (for example, which parameters were chosen for each tool) to help with potential replication. Many parameters are related to computational cost, but some could have an effect on the actual assembly itself. That said, it is unlikely that those selected deviate in a significant way from the standard parameters used for these tools, and I have no concerns about the general replicability of this dataset. I like the inclusion of a table of the software tools used along with their versions and think this is a helpful addition to these types of articles.
The sequencing data is uploaded in FASTQ format which is a typical and well understood format. This is deposited into a public repository and available for download and additional utilization. Additional downstream annotation is provided in an NIH database and is represented in generally accepted and understood formats.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes