The genome sequence of the Australian filarial nematode, Cercopithifilaria johnstoni [version 1; peer review: 1 approved, 2 approved with reservations]

We present a genome assembly and annotation of an individual female Cercopithifilaria johnstoni, a parasitic filarial nematode that is transmitted by hard ticks (Ixodidae) to infect a broad range of native Australian murid and marsupial hosts. The genome sequence is 76.9 Mbp in length and, although in draft form (N50 = 99 kbp, N50(n) = 232), is largely complete, both as judged by universally conserved orthologs (BUSCOs; genome = 94.9%, protein = 96.5%) and relative to related filarial species. These data represent the first genomic resources for the genus Cercopithifilaria, a group of parasites with a broad host range, and form the basis for comparative analysis with the human-infective parasite Onchocerca volvulus; both parasites are responsible for similar eye and skin pathologies in their respective hosts.


Introduction
Cercopithifilaria johnstoni (Mackerras, 1954) is a parasitic filarial nematode transmitted by ixodid ticks to infect a diverse range of native Australian mammalian hosts (Spratt & Haycock, 1988), including monotremes, marsupials, and native rodents. The ability to infect such a broad host range is unusual for a filarial parasite; however, it is yet to be determined whether this reflects permissive infectivity and persistence in diverse hosts or cryptic species diversity among morphologically indistinguishable parasites. Over 30 years ago, investigation of C. johnstoni infection of native hosts and experimentally infected laboratory rats (Rattus norvegicus) revealed that C. johnstoni could cause skin and ocular immunopathologies that appear to be analogous to those seen in humans infected with Onchocerca volvulus (Spratt & Haycock, 1988; Vuong et al., 1993), the causative agent of the neglected tropical disease onchocerciasis. This research prompted the hypothesis that C. johnstoni infection of R. norvegicus could provide an immunologically relevant and experimentally tractable laboratory model of onchocerciasis. Motivated by this hypothesis and by progress in the development of C. johnstoni as a laboratory model, we have generated a draft genome assembly and annotation to understand the basic biology of the parasite. These genomic data will facilitate the investigation of hypotheses relating to host specificity, provide a resource for comparative analysis between related filarial species, and, in particular, be used to characterise the genetic determinants of disease pathology and their relevance to human onchocerciasis.

Genome sequence report
The genome was sequenced from DNA extracted from a single female parasite collected via post-mortem dissection of an Australian bush rat, R. fuscipes. A total of 24,374,948 paired-end reads of 300 bp, representing ~190-fold coverage of the genome, were obtained by Illumina MiSeq sequencing. Trimmed reads (n = 22,065,411) were assembled, which, after contamination and haplotype removal, resulted in an assembly with a total length of 76.9 Mbp in 2,091 scaffold sequences, with a scaffold N50 of 99,003 bp and N50(n) of 232 (Table 1). Compared to other filarial nematodes with assembled genomes, the C. johnstoni assembly ranked 6th of 18 based on both genome contiguity (N50) and completeness (Genome BUSCOs). We note that three assemblies with better genome contiguity and completeness statistics, namely O. volvulus (Cotton et al., 2016), Brugia malayi (Foster et al., 2020), and Loa loa (prjna246086) (Tallon et al., 2014), were all assembled using high-throughput sequencing together with one or more long molecule technologies (i.e., long-read PacBio sequencing and optical mapping) to improve contiguity, whereas a further two assemblies, L. loa (prjna37757) (Desjardins et al., 2013) and O. flexuosa (prjna230512), incorporated long-range mate-pair sequencing libraries for scaffolding. Annotation of the C. johnstoni genome identified 10,565 genes and 11,690 transcripts, broadly consistent with the number of reported annotation features for other filarial nematodes (Table 1; range = 8,140-16,203 for both gene and transcript features). As with the genome statistics described above, the annotation of the predicted proteome is also highly resolved, with 96.5% complete BUSCOs identified (Table 1).
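The ~190-fold coverage figure can be checked with simple arithmetic. The sketch below assumes that the 24,374,948 figure counts read pairs (two 300 bp reads each) and that the final assembly length of 76.9 Mbp is used as the genome size; both are our reading of the text rather than stated definitions, and the function name is illustrative.

```python
# Fold-coverage = total sequenced bases / genome (assembly) length.
# Assumption: the reported read count refers to read pairs, so each
# entry contributes two reads of READ_LENGTH_BP bases.

def fold_coverage(read_pairs: int, read_length_bp: int, genome_length_bp: float) -> float:
    """Return the mean sequencing depth for a paired-end run."""
    total_bases = read_pairs * 2 * read_length_bp
    return total_bases / genome_length_bp


READ_PAIRS = 24_374_948       # reads passing filter, from the text
READ_LENGTH_BP = 300          # nominal read length, from the text
ASSEMBLY_LENGTH_BP = 76.9e6   # final assembly length, from the text

coverage = fold_coverage(READ_PAIRS, READ_LENGTH_BP, ASSEMBLY_LENGTH_BP)
print(f"approximate coverage: {coverage:.0f}-fold")  # approximate coverage: 190-fold
```

Counting each entry as a single 300 bp read instead would give roughly half this value, which is why the read-pair interpretation is assumed here.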
The immunopathology of O. volvulus infection is hypothesised to be driven by the recognition of immunoreactive proteins of Wolbachia (Saint André et al., 2002), a species of intracellular bacteria found in several filarial nematodes where it is thought to play a symbiotic role in host metabolism and/or reproduction (Taylor et al., 2005). The similar pathologies caused by C. johnstoni infection of rats and O. volvulus infection of humans prompted us to test for the presence of Wolbachia in our C. johnstoni assembly. Analysis of raw sequencing reads revealed only 0.38% of reads classified as bacterial, with less than 0.02% attributed to Rickettsiales (a group of obligate intracellular bacteria to which Wolbachia belong). Alignment of C. johnstoni protein-coding sequences to a diverse collection of Wolbachia reference genomes (Lefoulon et al., 2020) revealed 18 candidates. Only two proteins, CJOH_00023800.t1 (blast match to YadA-like family protein) and CJOH_00083160.t1 (blast match to a prophage tail fibre N-terminal domain-containing protein / collagen-like protein), had more bacterial (though not specifically Wolbachia) blast hits than nematode blast hits, whereas the remaining candidates were enriched in proteins that localise to mitochondria and were present in both filarial and non-filarial nematodes. Finally, quantification of nucleotide similarity between Wolbachia and the C. johnstoni genome revealed that, on average, only 1.38% of the Wolbachia genome (at 65.05% nucleotide identity) was represented in sequence matches to the C. johnstoni scaffolds and contigs. Collectively, these results lead us to conclude that Wolbachia is absent from C. johnstoni, and that a Wolbachia-independent mechanism drives immunopathology in C. johnstoni infections.
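To illustrate how a statistic such as "1.38% of the Wolbachia genome at 65.05% nucleotide identity" can be derived from pairwise alignments, the following sketch merges overlapping alignment intervals on a reference genome and reports the covered fraction together with a length-weighted mean identity. This is not the method used in the study, which is not specified here; the tuple layout, function names, and example coordinates are hypothetical.

```python
from typing import List, Tuple

# Hypothetical hit record: (start, end, percent_identity), with 1-based
# inclusive coordinates on the reference (e.g. a Wolbachia genome).
Hit = Tuple[int, int, float]


def covered_fraction(hits: List[Hit], reference_length: int) -> float:
    """Fraction of the reference covered by at least one hit."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted((s, e) for s, e, _ in hits):
        if current_end is None or start > current_end:
            # No overlap with the running interval: bank it and start anew.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlap: extend the running interval if needed.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return covered / reference_length


def mean_identity(hits: List[Hit]) -> float:
    """Alignment-length-weighted mean percent identity (0.0 if no hits)."""
    total_length = sum(e - s + 1 for s, e, _ in hits)
    if total_length == 0:
        return 0.0
    return sum((e - s + 1) * ident for s, e, ident in hits) / total_length


# Made-up example: three hits on a 10,000 bp reference, two overlapping.
example_hits = [(1, 100, 70.0), (51, 150, 60.0), (901, 1000, 65.0)]
print(covered_fraction(example_hits, 10_000))  # 0.025
print(mean_identity(example_hits))             # 65.0
```

Merging intervals before summing avoids counting a reference base twice when several scaffolds match the same region, which would otherwise inflate the apparent coverage.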

Sample collection
As part of a larger program of fieldwork to investigate natural transmission of C. johnstoni in a wild, free-ranging population of Australian bush rats Rattus fuscipes (Figure 1a
All efforts were made to ameliorate any suffering of animals by providing large cages and keeping their habitat and diet as close as possible to those of the wild. The study was also closely monitored by the facility veterinarian. The rats were housed singly in large plastic tubs approximately 0.5 m × 1 m square and 1 m deep, with a hinged mesh lid. The tubs were filled with leaf litter and contained small hollow logs for refuge. Rats were fed a mix of standard rat diet supplemented with meal worms. The adult parasite that was sequenced was recovered post-mortem from a single female rat that was euthanised by CO2 asphyxia on the advice of the facility veterinarian following a short illness of unknown origin.

DNA extraction, library preparation, and sequencing
A single adult female worm (approximately 7 cm in length) was cut into pieces approximately 1 cm long using a sterile scalpel blade before being placed in a lysis solution (lysis buffer and proteinase K solution) for 18 h. Genomic DNA from the worm lysate was extracted using an ISOLATE II Genomic DNA Kit (Bioline, Australia) following the manufacturer's instructions, with one modification: the sample was eluted from the extraction column in 50 µl of extraction buffer, which was then passed through the extraction column a second time to collect additional DNA remaining on the column before further analysis.
Genomic DNA (500 ng in 50 µl) was sheared before sequencing library preparation using a Covaris S220 Focused-ultrasonicator with the following settings, optimised for generating fragments of approximately 400-600 bp: peak incident power = 175 W; duty factor = 5%; cycles per burst = 200; treatment time = 55 s. A DNA sequencing library was prepared from 500 ng DNA using a NEBNext Ultra Library Prep Kit for Illumina, following the manufacturer's instructions. The resulting library was run on a 2% agarose gel, from which a gel cut was made to extract the 500-700 bp fragment fraction, which was subsequently purified using a Promega Gel and PCR clean-up kit (Promega, Australia).
The sequencing library was diluted to 15 pM and spiked with 1% PhiX control DNA (Illumina) before being sequenced on an Illumina MiSeq using Illumina V3 2x301 bp PE sequencing chemistry. In total, 24,374,948 reads (91.16% of total) passed filters and were used for further analysis.

Genome assembly
Before assembly, raw sequencing reads were first visualised for quality and inherent bias using FastQC version 0.11.9. Reads were adapter- and quality-trimmed using Trimmomatic. De novo genome assembly was performed using SPAdes version 3.10.1 (Prjibelski et al., 2020) using default parameters. The raw assembly was decontaminated, first using Redundans (Pryszcz & Gabaldón, 2016) to remove additional haplotypes present in the assembly, followed by BlobTools (Laetsch & Blaxter, 2017) to identify putative bacterial and host contamination present in the assembly (Figure 2). Only scaffolds containing hits to "Nematoda" or "no-hit" (the origin of these sequences is unclear but could potentially be novel nematode sequences) and with a mapped average read depth of 10 or greater were retained. The decontaminated assembly was further scaffolded using OPERA-LG (Gao et al., 2016) to make unique joins that could not previously be made because of the alternative haplotypes present, followed by a second round of Redundans to fill gaps. The iterative improvements to the assembly are documented in Table 2, demonstrating improved contiguity while maintaining and recovering conserved BUSCOs.
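The retention rule described above (a taxonomic hit of "Nematoda" or "no-hit", and a mean mapped read depth of at least 10) can be expressed as a simple filter. The sketch below is illustrative only: the Scaffold record, its field names, and the example values are hypothetical and do not reproduce the BlobTools output format or the study's own code.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Scaffold:
    """Hypothetical per-scaffold summary (not the BlobTools table format)."""
    name: str
    phylum: str        # best taxonomic assignment, or "no-hit"
    mean_depth: float  # mean mapped read depth


KEEP_PHYLA = {"Nematoda", "no-hit"}  # labels retained, as stated in the text
MIN_DEPTH = 10.0                     # minimum mean read depth, as stated in the text


def retain_scaffolds(scaffolds: Iterable[Scaffold]) -> List[Scaffold]:
    """Keep scaffolds that pass both the taxonomy and the depth criterion."""
    return [
        s for s in scaffolds
        if s.phylum in KEEP_PHYLA and s.mean_depth >= MIN_DEPTH
    ]


# Made-up example records.
example = [
    Scaffold("scaffold_1", "Nematoda", 185.0),        # kept
    Scaffold("scaffold_2", "Proteobacteria", 40.0),   # dropped: bacterial hit
    Scaffold("scaffold_3", "no-hit", 9.5),            # dropped: depth below 10
    Scaffold("scaffold_4", "no-hit", 120.0),          # kept
    Scaffold("scaffold_5", "Chordata", 15.0),         # dropped: host hit
]
kept_names = [s.name for s in retain_scaffolds(example)]
print(kept_names)  # ['scaffold_1', 'scaffold_4']
```

Requiring both criteria means that an unclassified scaffold is kept only if its read depth is consistent with the rest of the nematode genome, which guards against low-coverage contaminants that simply lack a database match.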
(HQ184469.1). Reads that mapped were then de novo assembled using Velvet version 1.2.10 (Zerbino & Birney, 2008) using default parameters, with kmer=99 identified as optimal using Velvet-optimiser version 2.2.5. Velvet was unsuccessful in producing a closed mtDNA genome, so an iterative mapping and joining approach was used to manually curate the assembly, resulting in a complete single contig of 13,716 bp. Validation of the assembly was performed by multiple sequence alignment to available filarial mtDNA genomes above using Mesquite version 3.04 (Maddison & Maddison, 2019) and visualised in progressiveMauve (20150213) (Darling et al., 2010).

Genome annotation
The mtDNA genome sequence was initially annotated using MITOS (Bernt et al., 2013). The C. johnstoni annotation was then improved manually by comparing sequence alignments and GFF3 annotation files from C. johnstoni with those of the closely related filarial nematodes L. loa, D. immitis, A. viteae, B. malayi, O. ochengi, O. volvulus, and W. bancrofti.
The nuclear genome assembly was annotated using Braker v2 (Brůna et al., 2021). As no RNA-seq data were available, we generated hints (predicted introns, start and stop codons) for Braker using the ProtHint pipeline; spliced alignments were generated by mapping proteins from the OrthoDB Metazoan protein database, and the resulting evidence (prothint_augustus.gff) was used as an input to Braker.
The final GFF containing both nuclear and mitochondrial genome annotations was converted to EMBL format for submission to ENA using EMBLmyGFF3 (Norling et al., 2018).

Genome and annotation completeness
Genome and annotation completeness were estimated using BUSCO (Benchmarking Universal Single-Copy Orthologues) version 4 (Seppey et al., 2019) with lineage set to nematode_odb9 and mode set to "genome" or "protein" for the assembly or protein-coding genes, respectively, using "Caenorhabditis" as a training species for gene identification. Comparative genome assembly statistics were generated using assembly-stats version 1.0.1. All genomic and proteomic data from available assemblies of related filarial nematode species were obtained from WormBase ParaSite release 16 (Howe et al., 2017).
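For readers unfamiliar with the contiguity statistics reported here, the sketch below shows one common way to compute N50 and N50(n) from a list of scaffold lengths: N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together span at least half of the assembly, and N50(n) is the number of scaffolds in that set. This is a minimal illustration under that definition, not the assembly-stats implementation, and the example lengths are made up.

```python
from typing import Iterable, Tuple


def n50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, N50(n)) for a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running_total = 0
    for count, length in enumerate(ordered, start=1):
        running_total += length
        if running_total >= half_total:
            # 'length' is the shortest sequence needed to reach half the
            # assembly; 'count' is how many sequences were needed.
            return length, count
    raise ValueError("no sequence lengths supplied")


# Made-up example: total = 300, half = 150; 100 + 80 = 180 >= 150.
print(n50([100, 80, 60, 40, 20]))  # (80, 2)
```

Under this definition, the reported values mean that the 232 longest scaffolds, each at least 99,003 bp long, together contain at least half of the 76.9 Mbp assembly.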
The analysis code used in this study is available from GitHub and is archived with Zenodo (Doyle & McKann, 2021).

Data availability
Genomic resources
European Nucleotide Archive: Raw sequence data, genome and annotation are deposited in the ENA. Accession number PRJEB47283; https://identifiers.org/ena.embl:PRJEB47283.
The assembly will also be made available at WormBase ParaSite (https://parasite.wormbase.org/), the primary repository for helminth genomes and annotations.