The genome sequence of Anoplius nigerrimus (Scopoli, 1763), a spider wasp [version 1; peer review: 1 approved with reservations]

We present a genome assembly from an individual Anoplius nigerrimus (Arthropoda; Insecta; Hymenoptera; Pompilidae) of unknown sex. The genome sequence is 624 megabases in span. In total, 45.75% of the assembly is scaffolded into 15 chromosomal pseudomolecules. The mitochondrial genome was also assembled and is 17.5 kilobases in length.
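As a rough consistency check, the abstract's percentages can be turned into absolute spans. This is not part of the original report. Because the 624 Mb total is itself rounded, the results are approximate. The mean chromosome length is only an average and says nothing about individual chromosomes.

```python
# Back-of-envelope check of the abstract's figures (values copied from the text).
ASSEMBLY_SPAN_MB = 624.0        # total assembly span, megabases (rounded in the text)
CHROMOSOMAL_FRACTION = 0.4575   # 45.75% placed in chromosomal pseudomolecules
N_CHROMOSOMES = 15

def chromosomal_span_mb(total_mb: float, fraction: float) -> float:
    """Sequence (Mb) contained in the chromosomal pseudomolecules."""
    return total_mb * fraction

placed = chromosomal_span_mb(ASSEMBLY_SPAN_MB, CHROMOSOMAL_FRACTION)
unplaced = ASSEMBLY_SPAN_MB - placed          # sequence left in unplaced scaffolds
mean_chrom = placed / N_CHROMOSOMES           # average pseudomolecule length

print(f"placed: {placed:.1f} Mb, unplaced: {unplaced:.1f} Mb, mean chromosome: {mean_chrom:.1f} Mb")
```

On these numbers, more than half of the assembled sequence (about 338 Mb) lies outside the 15 pseudomolecules.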


Background
Anoplius nigerrimus is one of 43 species of spider-hunting wasp known from Britain and Ireland, where it is widespread and often common (Edwards, 1998). Unlike close relatives, which are more associated with wetlands, A. nigerrimus is found in a variety of habitats (Day, 1988). Females may excavate a vertical burrow, typically under a stone, ending in a widened cell where the egg is laid, but more frequently they use existing burrows or cavities, for example in stems or snail shells. Spiders are hunted in vegetation such as grass clumps (Kurczewski, 2010) and are stung on the underside of the cephalothorax. The paralysed prey is then dragged to the nest by the wasp, which walks backwards (Kurczewski, 2010). Hosts are often Lycosidae, although other spider families are also taken.
A Holarctic species favouring a temperate climate, A. nigerrimus is univoltine, flying mainly in summer; males in particular are regular flower visitors. As the name suggests, this is an entirely black pompilid, characterised by strong, bristle-like setae on the female's 6th metasomal tergite (a characteristic of the genus Anoplius) and a rather triangular 3rd submarginal cell in the fore wing. This is the first full genome for a species of Pompilidae and as such will make a valuable contribution towards understanding the evolution of aculeate (stinging) Hymenoptera.

Genome sequence report
The genome was sequenced from a single A. nigerrimus (Figure 1) collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.769, longitude -1.339). Pacific Biosciences single-molecule long reads were generated to 29-fold coverage and 10X Genomics read clouds to 87-fold coverage. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 18 missing joins or misjoins and removed 1 haplotypic duplication. This reduced the assembly size by 0.06% and the scaffold number by 1.31%, and increased the scaffold N50 by 11.90%.
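The fold-coverage and curation figures above can be converted into absolute quantities. This is a minimal sketch under stated assumptions. First, it assumes coverage was calculated against the 624 Mb assembly span. The text does not say which genome-size estimate was actually used. Second, it uses the final values from Table 1 (1202 scaffolds, N50 2.0 Mb) to back-calculate the pre-curation values. Since 2.0 Mb is rounded, the pre-curation N50 is only approximate.

```python
GENOME_SIZE_MB = 624.0  # assumption: assembly span used as the genome-size denominator

def yield_gb(coverage: float, genome_mb: float) -> float:
    """Approximate bases sequenced (Gb) implied by a fold-coverage figure."""
    return coverage * genome_mb / 1000.0

def before_curation(after: float, pct_change: float) -> float:
    """Invert a reported percentage change, where after = before * (1 + pct_change / 100)."""
    return after / (1.0 + pct_change / 100.0)

pacbio_gb = yield_gb(29, GENOME_SIZE_MB)   # PacBio long reads, 29-fold
tenx_gb = yield_gb(87, GENOME_SIZE_MB)     # 10X read clouds, 87-fold

# Final values from Table 1 (1202 scaffolds, N50 2.0 Mb) and the curation percentages.
scaffolds_before = before_curation(1202, -1.31)   # scaffold number fell by 1.31%
n50_before_mb = before_curation(2.0, 11.90)       # scaffold N50 rose by 11.90%

print(f"PacBio ~{pacbio_gb:.1f} Gb, 10X ~{tenx_gb:.1f} Gb")
print(f"pre-curation: ~{round(scaffolds_before)} scaffolds, N50 ~{n50_before_mb:.2f} Mb")
```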
The final assembly has a total length of 624 Mb in 1202 sequence scaffolds with a scaffold N50 of 2.0 Mb (Table 1). Of the assembly sequence, 45.75% was assigned to 15 chromosomal-level scaffolds, numbered by sequence length (Figures 2 to 5; Table 2). The assembly generated is extremely fragmented, […]

[…] Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 200-ng aliquot of extracted DNA using a 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared to an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads at a 1.8X ratio of beads to sample, to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a NanoDrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the Femto Pulse system.
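SPRI clean-ups are specified as a ratio of bead volume to sample volume. A "0.8X" clean-up therefore means adding 0.8 µL of beads per µL of DNA solution, and lower ratios remove a wider range of short fragments. The sketch below shows only this arithmetic. The DNA concentration and the volume of the sheared sample are hypothetical values, because the text reports masses and ratios but not volumes. The helper names are illustrative.

```python
def bead_volume_ul(sample_volume_ul: float, ratio: float) -> float:
    """Volume of SPRI beads (µL) to add for a beads:sample ratio such as 0.8X."""
    if sample_volume_ul <= 0 or ratio <= 0:
        raise ValueError("sample volume and ratio must be positive")
    return sample_volume_ul * ratio

def volume_for_mass_ul(mass_ng: float, conc_ng_per_ul: float) -> float:
    """Volume (µL) of a DNA sample needed to deliver a target mass."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return mass_ng / conc_ng_per_ul

# Hypothetical inputs: the report gives masses and ratios, not volumes or concentrations.
CONC_NG_PER_UL = 10.0                                   # assumed Qubit reading
aliquot_ul = volume_for_mass_ul(200, CONC_NG_PER_UL)    # 200-ng aliquot for the 0.8X clean-up
beads_10x_ul = bead_volume_ul(aliquot_ul, 0.8)          # AMPure XP before 10X Chromium
beads_pacbio_ul = bead_volume_ul(50.0, 1.8)             # AMPure PB on an assumed 50 µL sheared sample
print(aliquot_ul, beads_10x_ul, beads_pacbio_ul)
```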

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud sequencing libraries were constructed according to the manufacturers' instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute.

Discussion of the manuscript
Background: I have no major issues with the background discussion of the insect. I would prefer that the final sentence be expanded; I feel that "… will make a valuable contribution towards understanding the evolution of aculeate Hymenoptera" is rather vague. What type of contribution do the authors mean: for example, to understanding the evolution of venom? Discussion of this kind would make the rationale for sequencing this species easier to understand. Currently it reads as though the specimen was simply in the freezer and available.

Genome sequence report:
The bare facts of the sample collection, genome sequencing and genome statistics are stated. The data availability information is listed in Table 1 as required. When was the insect collected? Why was it not sexed? Not knowing the sex can make sequencing more difficult in Hymenoptera, as many species employ a haplodiploid sex-determination system. As a result, many hymenopteran genome projects use haploid males, which are easier to assemble 1. I am concerned by the statement that "the assembly generated is extremely fragmented". I would like further information about this. For example, why is the genome extremely fragmented? Is it because of repetitive regions? Failed library preparation? Small insert sizes for the long reads or 10X reads? This is especially concerning when sequencing projects with similar genome coverage achieve much better contiguity. In Figure 1, the insect is out of focus while the collection vial is in focus; I would recommend replacing it with a sharper image of the insect itself.
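Contiguity is usually summarised by the scaffold N50 reported in Table 1. This is the length L such that scaffolds of length L or longer together contain at least half of the assembled sequence. The minimal sketch below illustrates the definition on hypothetical scaffold lengths. It is not the pipeline the authors used.

```python
from typing import Iterable

def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembly span."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths supplied")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length          # cumulative span of the longest sequences so far
        if running >= half:
            return length
    return ordered[-1]  # not reached for non-negative lengths; keeps the return type explicit

# Hypothetical toy assembly: total 300, so the first cumulative sum >= 150 is at length 80.
print(n50([100, 80, 50, 30, 20, 10, 10]))
```

The metric depends only on the length distribution. Many small unplaced scaffolds can therefore hold the N50 down even when the chromosomal scaffolds themselves are long.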

Methods:
What year was the insect collected? Why was it not sexed? How was it identified? I understand that many of these genomes are being published by the DTOL consortium and that there is likely a template for the manuscripts. However, I do not like that the manuscripts are essentially word-for-word copies of one another, with only the species and sample accession replaced (as I observed by going through other submitted articles). Not only is this the case, but it also seems to perpetuate systematic errors. As a result, although the bare minimum of the methods is reported, it is difficult to understand and lacks substance.
Reading the DNA extraction section was confusing. The authors use three different technologies to generate this genome assembly: PacBio HiFi reads, 10X Genomics and Hi-C. Logically, this section would be split into HMW techniques (with HiFi and 10X also separated) and then Hi-C; this would aid readability. The section starts with the insect being dissected, so please state explicitly which tissue was kept back for Hi-C. I know that it was the head because you mention it a few paragraphs later. However, at the start, where you describe the sample dissection, it is unclear which tissue is used for which library preparation. The thorax was disrupted and then DNA was measured from the homogenised sample? To what end? The text then moves on to the HMW DNA extraction; perhaps the sentence about measuring DNA in the homogenised sample was meant to be there? Please also report the DNA extraction outcomes, as these could explain why your genome is "extremely fragmented". Could you explain which DNA was used for which library? Was the HMW DNA cleaned up with 0.8X AMPure beads used for the 10X Chromium sequencing? If so, how was the 50 ng DNA input determined? The input for 10X Chromium sequencing is usually decided by the genome size of the organism. Was the HMW DNA sheared to 12-20 kb used for the PacBio HiFi libraries? Did this work? Please show the library size determined by fragment analysis. Why is the Hi-C library preparation not mentioned here but in the sequencing section instead? How were the cells fixed? Even a brief description of the library preparation would help. The Arima v2.0 kit covers only the Hi-C step. Which kit was used for the library preparation (e.g. KAPA or Swift Biosciences)? These are all important details to include so that the research can be properly interpreted and repeated.
Sequencing: Which versions of the protocols were used? Again, this is a word-for-word copy from previous DTOL genome announcements, and it is still confusing. HiFi and 10X are both listed as following the manufacturers' instructions, which is too ambiguous to be repeatable. Please split this up and state which version of each protocol was used. How were the 10X and Hi-C libraries sequenced: together or separately? How much PhiX was used? How were demultiplexing and basecalling performed?

Genome assembly:
The methods used for the assembly are basic and, again, follow a one-size-fits-all template, which is not usually how genome assembly projects work. Why were the 10X reads used only for polishing the genome? Did you assemble the 10X reads on their own? Did you try other assemblers for the long reads, Canu for example? Perhaps you would obtain a less fragmented genome that could be scaffolded better. Why not also use the 10X reads for scaffolding, with Scaff10X, ARKS or Physlr? You say that you assembled and annotated the mitochondrial genome, yet it is not presented anywhere in the genome report. Perhaps it could be included as a panel in Figure 2. It always strikes me as odd to read a DTOL data output that contains no phylogenetic tree. The BUSCOs identified in your species could easily be used to generate a basic phylogeny with related insects. This would validate the species' phylogenetic position, likely with better accuracy than Arthropoda alone (Figures 3 and 4). The Hi-C contact map in Figure 5 clearly shows 45% of the assembly being placed into 15 linkage groups. Is this what was expected? Have 15 chromosomes been reported for this species before? What is the other 55%? Is it caused by technical issues in the Hi-C library production, or by a large amount of repetitive sequence in the genome? An explanation of this would be useful in the genome report. To be truly reproducible, I would like to see the commands used to assemble and analyse the genome in Table 3.
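A first step towards the BUSCO-based phylogeny suggested above is to find the BUSCO genes that are complete and single-copy in every species being compared. The minimal sketch below assumes the tab-separated `full_table.tsv` layout written by BUSCO. In that layout, the BUSCO ID is in the first column, a status such as Complete, Duplicated, Fragmented or Missing is in the second, and comment lines start with `#`. This should be checked against the BUSCO version actually run. The gene IDs and species labels in the example are hypothetical. Alignment and tree inference would follow with standard tools and are not shown.

```python
import csv
import io
from typing import Dict, Set, TextIO

def single_copy_ids(table: TextIO) -> Set[str]:
    """Collect BUSCO IDs whose status is 'Complete' (single-copy) from a
    full_table.tsv-style stream; lines starting with '#' are skipped."""
    ids: Set[str] = set()
    data_lines = (line for line in table if not line.startswith("#"))
    for row in csv.reader(data_lines, delimiter="\t"):
        if len(row) >= 2 and row[1] == "Complete":
            ids.add(row[0])
    return ids

def shared_single_copy(tables: Dict[str, TextIO]) -> Set[str]:
    """BUSCO IDs that are complete and single-copy in every species supplied."""
    per_species = [single_copy_ids(t) for t in tables.values()]
    return set.intersection(*per_species) if per_species else set()

# Hypothetical toy tables for two species.
example = {
    "species_a": io.StringIO(
        "# hypothetical table\n"
        "gene1\tComplete\tscaf1\n"
        "gene2\tDuplicated\tscaf2\n"
        "gene2\tDuplicated\tscaf3\n"
        "gene3\tComplete\tscaf4\n"
    ),
    "species_b": io.StringIO(
        "gene1\tComplete\tscaf9\n"
        "gene2\tComplete\tscaf7\n"
        "gene3\tFragmented\tscaf5\n"
    ),
}
shared = shared_single_copy(example)
print(sorted(shared))
```

Duplicated genes are excluded because copies that might be paralogues would confound a species tree.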