The genome sequence of the yellow loosestrife bee, Macropis europaea Warncke, 1973 [version 1; peer review: 1 approved with reservations]

We present a genome assembly from an individual male Macropis europaea (the yellow loosestrife bee; Arthropoda; Insecta; Hymenoptera; Melittidae). The genome sequence is 547 megabases in span. The majority of the assembly (61.81%) is scaffolded into 11 chromosomal pseudomolecules. There is an unusually large proportion of satellite repeat, which could not be placed in the assembly. The mitochondrial genome was also assembled and is 19.2 kilobases in length. Falk et al. present a high-quality genome assembly for M. europaea using state-of-the-art sequencing technologies and computational approaches. The report is detailed and easy to follow. Including the information would to


Background
Macropis (Macropis) europaea Warncke (yellow loosestrife bee) is a medium-sized oligolectic species specialised to provision its nest using floral oils and pollen collected almost exclusively from Lysimachia vulgaris (yellow loosestrife) (Primulaceae). The species is endemic to Europe. The genus Macropis Panzer represents the only oil-collecting bees in the Holarctic (Pekkarinen et al., 2003). The oils are mixed with pollen to form the larval food and are also used to give the nest's cells a waterproof coating. This allows the species to nest in damp soils that exclude other ground-nesting bee species. Both sexes have a distinctive leg morphology. Males are easily identified by their swollen hind femora and tibiae, and their yellow face. Females have specialised hairs on the forelegs and feathery hairs on the hind tibia/basitarsus to help carry oils back to the nest site (Michener, 2007). In the UK the species is restricted to wetland sites in southern England and is active from June to September.

Genome sequence report
The genome was sequenced from a single male M. europaea (Figure 1) collected from Wytham Woods, Oxfordshire (biological vice-county: Berkshire), UK (latitude 51.767, longitude -1.311). A total of 30-fold coverage in Pacific Biosciences single-molecule long reads and 55-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 17 missing joins or misjoins, increasing the assembly size by 0.92%, the scaffold number by 4.89% and the scaffold N50 by 9.29%.
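To make the coverage figures concrete, the following minimal Python sketch converts fold coverage into an approximate base count. It assumes that coverage is expressed relative to the 547 Mb assembly span; the report does not state which genome size was used as the denominator, and the function names are illustrative only.

```python
# Sketch only: fold coverage is taken here as sequenced bases / genome span.
# Using the 547 Mb assembly span as the denominator is an assumption.

GENOME_SPAN_BP = 547_000_000  # assembly span quoted in the report


def fold_coverage(total_read_bases: int, genome_span_bp: int) -> float:
    """Mean fold coverage given the total sequenced bases."""
    if genome_span_bp <= 0:
        raise ValueError("genome span must be positive")
    return total_read_bases / genome_span_bp


def bases_for_coverage(fold: float, genome_span_bp: int) -> int:
    """Approximate number of sequenced bases implied by a fold coverage."""
    return round(fold * genome_span_bp)


# Implied data volumes under the stated assumption.
pacbio_bases = bases_for_coverage(30, GENOME_SPAN_BP)  # long reads
tenx_bases = bases_for_coverage(55, GENOME_SPAN_BP)    # read clouds

print(f"PacBio: ~{pacbio_bases / 1e9:.1f} Gb, 10X: ~{tenx_bases / 1e9:.1f} Gb")
```

Under this assumption, 30-fold and 55-fold coverage correspond to roughly 16.4 Gb and 30.1 Gb of sequence respectively.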
The final assembly has a total length of 547 Mb in 2273 sequence scaffolds with a scaffold N50 of 22.7 Mb (Table 1). Of the assembly sequence, 61.81% was assigned to 11 chromosomal-level scaffolds (numbered by sequence length) (Figure 2–Figure 5; Table 2). There is an unusually large proportion of satellite repeat in this assembly, which is unplaceable using the Hi-C map. The assembly has a BUSCO v5.2.2 (Manni et al., 2021) completeness of 97.3% (single 97.1%, duplicated 0.2%) using the hymenoptera_odb10 reference set (n=5991).
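For readers unfamiliar with the summary statistics quoted above, the sketch below shows how a scaffold N50 is conventionally computed and how the BUSCO completeness figure relates to its single-copy and duplicated components. It is an illustration under the standard definitions, not the code used to produce the report; the toy scaffold lengths and function names are hypothetical.

```python
# Sketch only: standard definitions of N50 and BUSCO "complete" percentage.
# The toy lengths below are hypothetical and are not taken from the assembly.


def scaffold_n50(lengths):
    """Length L such that scaffolds of length >= L hold at least half the assembly."""
    if not lengths:
        raise ValueError("no scaffold lengths supplied")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def busco_complete(single_pct, duplicated_pct):
    """Complete BUSCOs are the sum of single-copy and duplicated BUSCOs."""
    return round(single_pct + duplicated_pct, 1)


toy_lengths = [80, 70, 50, 40, 30, 20, 10]  # total 300, so half is 150
toy_n50 = scaffold_n50(toy_lengths)         # 80 + 70 reaches 150

reported_complete = busco_complete(97.1, 0.2)  # values quoted in the report
```

In the toy example the two longest scaffolds already cover half of the total length, so the N50 is the length of the second scaffold. The BUSCO components quoted in the report sum to the stated 97.3% completeness.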

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud sequencing libraries were constructed according to the manufacturers' instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL II and Illumina NovaSeq 6000 instruments. Hi-C data were generated from head tissue of iySphMoni1 using the Arima v2.0 kit and sequenced on an Illumina NovaSeq 6000 instrument.

Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021). Haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). Scaffolding with Hi-C data (Rao et al., 2014) was carried out with SALSA2 (Ghurye et al., 2019). The Hi-C scaffolded assembly was polished with the 10X Genomics Illumina data by aligning to the assembly with longranger align and calling variants with freebayes (Garrison & Marth, 2012). One round of the Illumina polishing was applied. The mitochondrial genome was assembled with MitoHiFi (Uliano-Silva et al., 2021), which performed annotation using MitoFinder (Allio et al., 2020). The assembly was checked for contamination as described previously (Howe et al., 2021). Manual curation (Howe et al., 2021) was performed using HiGlass (Kerpedjiev et al., 2018) and Pretext. The genome was analysed within the BlobToolKit environment (Challis et al., 2020).