The genome sequence of the Australian filarial nematode, Cercopithifilaria johnstoni

We present a genome assembly and annotation of an individual female Cercopithifilaria johnstoni, a parasitic filarial nematode that is transmitted by hard ticks (Ixodidae) to infect a broad range of native Australian murid and marsupial hosts. The genome sequence is 76.9 Mbp in length, and although in draft form (N50 = 99 kbp, N50[n] = 232), is largely complete based on universally conserved orthologs (BUSCOs; genome = 94.9%, protein = 96.5%) and relative to other related filarial species. These data represent the first genomic resources for the genus Cercopithifilaria, a group of parasites with a broad host range, and form the basis for comparative analysis with the human-infective parasite, Onchocerca volvulus, both of which are responsible for similar eye and skin pathologies in their respective hosts.


Introduction
Cercopithifilaria johnstoni (Mackerras, 1954) is a parasitic filarial nematode transmitted by ixodid ticks to infect a diverse range of native Australian mammalian hosts (Spratt & Haycock, 1988), including monotremes, marsupials, and native rodents. The ability to infect such a broad host range is unusual for a filarial parasite; however, it is yet to be determined if this reflects permissive infectivity and persistence in diverse hosts or cryptic species diversity among morphologically indistinguishable parasites. Over 30 years ago, investigation of C. johnstoni infection of native hosts and experimentally-infected laboratory rats (Rattus norvegicus) revealed that C. johnstoni could cause skin and ocular immunopathologies that appear to be analogous to those seen in humans infected with Onchocerca volvulus (Spratt & Haycock, 1988; Vuong et al., 1993), the causative agent of the neglected tropical disease onchocerciasis. This research prompted the hypothesis that C. johnstoni infection of R. norvegicus could provide an immunologically relevant and experimentally tractable laboratory model of onchocerciasis. Motivated by this hypothesis and progress in the development of C. johnstoni as a laboratory model, we have generated a draft genome assembly and annotation to understand the basic biology of the parasite. These genomic data will facilitate the investigation of hypotheses relating to host specificity, provide a resource for comparative analysis between related filarial species, and in particular, be used to characterise the genetic determinants of disease pathology and their relevance to human onchocerciasis.

Genome sequence report
The genome was sequenced from DNA extracted from a single female parasite collected via post-mortem dissection of an Australian bush rat, R. fuscipes (Figure 1a). A total of 24,374,948 300 bp paired-end reads representing ~190-fold coverage of the genome were obtained by Illumina MiSeq sequencing. Trimmed reads (n = 22,065,411) were assembled, which, after contamination (Figure 2) and haplotype removal, resulted in an assembly with a total length of 76.9 Mbp in 2,091 scaffold sequences with a scaffold N50 of 99,003 bp and N50(n) of 232 (Table 1). Compared to other filarial nematodes with assembled genomes, the C. johnstoni assembly ranked 6th of 18 based on both genome contiguity (N50) and completeness (Genome BUSCOs); we note that three assemblies with better genome contiguity and completeness statistics, namely O. volvulus (Cotton et al., 2016), Brugia malayi (Foster et al., 2020), and Loa loa (prjna246086) (Tallon et al., 2014), were all assembled using high-throughput sequencing together with one or more long molecule technologies, i.e., long-read PacBio sequencing and optical mapping, to improve contiguity, whereas a further two assemblies, L. loa (prjna37757) (Desjardins et al., 2013) and O. flexuosa (prjna230512), have incorporated long-range mate-pair sequencing libraries for scaffolding. The assembly includes a complete mitochondrial genome for C. johnstoni (contig ID: c_johnstoni_mitochondrial_genome), which we used together with other complete mitochondrial genomes of filarial nematodes to demonstrate the phylogenetic placement of C. johnstoni (Figure 3). These data robustly recapitulate the known phylogeny of filarial nematodes and place C. johnstoni within a monophyletic clade with two rodent-infective parasites, Acanthocheilonema viteae and Litomosoides sigmodontis. Annotation of the C. johnstoni genome identified 10,565 genes and 11,690 transcripts, broadly consistent with the number of reported annotation features for other filarial nematodes (Table 1; range = 8,140-16,203 for both gene and transcript features). Similar to the genome statistics described above, the annotation of the predicted proteome is also highly resolved, with 96.5% complete BUSCOs identified (Table 1). These data demonstrate the utility of using a large collection of diverse metazoan proteins to guide the annotation of a genome in the absence of species-specific data, for example, RNA-seq.
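The ~190-fold coverage reported above can be checked with simple arithmetic. The sketch below assumes that the reported read count refers to read pairs (two 300 bp reads per pair) and uses the final assembly length as the genome size; the function name is illustrative only and is not part of the analysis code for this study.

```python
def fold_coverage(read_pairs: int, read_length_bp: int, genome_size_bp: float) -> float:
    """Mean sequencing depth, assuming each pair contributes two reads of equal length."""
    total_bases = read_pairs * 2 * read_length_bp
    return total_bases / genome_size_bp


# Values taken from the text; interpreting the count as read pairs is an assumption.
raw_coverage = fold_coverage(24_374_948, 300, 76.9e6)
print(f"{raw_coverage:.0f}x")  # prints "190x"
```

Under these assumptions the result agrees with the reported figure; counting each of the 24,374,948 entries as a single 300 bp read would instead give roughly half that depth.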
The immunopathology of O. volvulus infection is hypothesised to be driven by the recognition of immunoreactive proteins of Wolbachia (Saint André et al., 2002), a species of intracellular bacteria found in several filarial nematode species (Figure 3; closed circles) where it is thought to play a symbiotic role in host metabolism and/or reproduction (Taylor et al., 2005). The similar pathologies caused by C. johnstoni infection of rats and O. volvulus infection of humans prompted us to examine the presence of Wolbachia in our C. johnstoni assembly. Analysis of raw sequencing reads revealed only 0.1% of C. johnstoni reads classified as bacterial, with only a single read matching Wolbachia in our custom Kraken database; for context, analysis of O. volvulus raw sequencing reads against the same database revealed, on average, 1.98% of reads were derived from Wolbachia (n = 32 O. volvulus whole-genome sequencing datasets (Choi et al., 2016); range = 0.08-13.26%; average library size = 34 million reads), which is consistent with previous estimates based on mapped reads to the O. volvulus nuclear and Wolbachia genomes (Armoo et al., 2017). Alignment of C. johnstoni protein-coding sequences to a diverse collection of Wolbachia reference genomes (Lefoulon et al., 2020) revealed 18 candidates; only two proteins, CJOH_00023800.t1 (BLAST match to YadA-like family protein) and CJOH_00083160.t1 (BLAST match to a prophage tail fibre N-terminal domain-containing protein / collagen-like protein), were over-represented by bacterial (but not specifically Wolbachia) relative to nematode BLAST hits, whereas the remaining candidates were enriched in proteins that localise to mitochondria and were present in both filarial and non-filarial nematodes. Finally, quantification of nucleotide similarity between Wolbachia and the C. johnstoni genome revealed that, on average, only 1.38% of the Wolbachia genome (at 65.05% nucleotide identity) was represented in sequence matches to the unfiltered C. johnstoni scaffolds and contigs prior to genome improvement. Collectively, we conclude that Wolbachia is absent from C. johnstoni, and that a Wolbachia-independent mechanism drives immunopathology in C. johnstoni infections.

Sample collection
All efforts were made to ameliorate any suffering of animals through providing large cages and keeping their habitat and diet as close as possible to that of the wild. The study was also closely monitored by the facility veterinarian. The rats were housed singly in large plastic tubs approximately 0.5 m × 1 m square and 1 m deep, with a hinged mesh lid. The tubs were filled with leaf litter and contained small hollow logs for refuge. Rats were fed a mix of standard rat diet supplemented with meal worms. The adult parasite that was sequenced was recovered post-mortem from a single female rat that was euthanised by CO2 asphyxia on advice of the facility veterinarian following a short illness of unknown origin.

DNA extraction, library preparation, and sequencing
A single adult female worm (approximately 7 cm in length) was cut into approximately 1 cm length pieces using a sterile scalpel blade before being placed in a lysis solution (lysis buffer and proteinase K solution) for 18 h. Genomic DNA from the worm lysate was extracted using an ISOLATE II Genomic DNA Kit (Bioline, Australia) following the manufacturer's instructions, except for the following modification: the sample was eluted from the extraction column in 50 µl of extraction buffer, which was passed back through the extraction column a second time to collect additional DNA remaining on the column before further analysis.
Genomic DNA (500 ng in 50 µl) was sheared before sequencing library preparation using a Covaris S220 Focused-ultrasonicator with the following settings optimised for generating fragments of approximately 400-600 bp: peak incident power = 175 W; duty factor = 5%; cycles per burst = 200; treatment time = 55 s. A DNA sequencing library was prepared from 500 ng DNA using a NEBNext Ultra Library Prep Kit for Illumina, following the manufacturer's instructions. The resulting library was run on a 2% agarose gel, from which a gel cut was made to extract the 500-700 bp fragment fraction, which was subsequently purified using a Promega Gel and PCR clean-up kit (Promega, Australia).
The sequencing library was diluted to 15 pM and spiked with 1% PhiX control DNA (Illumina) before being sequenced on an Illumina MiSeq using Illumina V3 2x301 bp PE sequencing chemistry. In total, 24,374,948 reads (91.16% of total) passed filters and were used for further analysis.

Genome assembly
Before assembly, raw sequencing reads were first visualised for quality and inherent bias using FastQC version 0.11.9. Reads were adapter- and quality-trimmed using Trimmomatic version 0.32 (Bolger et al., 2014) (CROP:150 SLIDINGWINDOW:10:20 MINLEN:100), after which 22,065,411 paired-end reads were retained for assembly. Genome size was estimated from the trimmed reads using GenomeScope 2.0 (Ranallo-Benavidez et al., 2020), which predicted a length of 63.24 Mbp.
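To illustrate what the three listed trimming steps do to a single read, a simplified Python sketch is given below. It is not Trimmomatic's implementation: the function name and defaults are hypothetical, adapter removal is not modelled, and the window logic is a plain approximation (cut at the start of the first window whose mean quality falls below the threshold), which may differ from the tool's exact cut position.

```python
def trim_read(seq, quals, crop=150, window=10, min_avg_q=20, min_len=100):
    """Apply CROP, a sliding-window quality cut, and MINLEN to one read.

    Returns (sequence, qualities) or None if the read is discarded.
    """
    # CROP: keep at most `crop` bases, counted from the 5' end.
    seq, quals = seq[:crop], quals[:crop]

    # SLIDINGWINDOW (simplified): scan 5' -> 3' and cut at the start of
    # the first window whose mean quality is below `min_avg_q`.
    keep = len(seq)
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg_q:
            keep = start
            break
    seq, quals = seq[:keep], quals[:keep]

    # MINLEN: discard reads that are now shorter than `min_len`.
    if len(seq) < min_len:
        return None
    return seq, quals


# Toy reads: uniformly good quality, and good quality that collapses early.
good = trim_read("A" * 300, [30] * 300)
poor = trim_read("A" * 300, [30] * 50 + [2] * 250)
print(len(good[0]), poor)
```

The steps are applied in the order in which they appear in the parameter string, so cropping to 150 bp happens before the quality scan.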
De novo genome assembly was performed using SPAdes version 3.10.1 (Prjibelski et al., 2020) using default parameters. The raw assembly was decontaminated, first using Redundans (Pryszcz & Gabaldón, 2016) to remove additional haplotypes present in the assembly, followed by BlobTools (Laetsch & Blaxter, 2017) to identify putative bacterial and host contamination present in the assembly (Figure 2). Only scaffolds containing hits to "Nematoda" or "no-hit" (the origin of these sequences is unclear but could potentially be novel nematode sequences) and with a mapped average read depth of 10 or greater were retained. The decontaminated assembly was further scaffolded using OPERA-LG (Gao et al., 2016) to encourage unique joins that could not previously be made due to alternative haplotypes present, followed by a second round of Redundans to fill gaps. The iterative improvements to the assembly are documented in Table 2, demonstrating improved contiguity while maintaining and recovering conserved BUSCOs.
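The retention rule described above (taxonomic assignment of "Nematoda" or "no-hit", and mean read depth of at least 10) can be expressed as a short filter. The sketch below is illustrative only: the Scaffold record, its field names, and the toy scaffold names are hypothetical and do not reflect the BlobTools output format or the analysis code for this study.

```python
from dataclasses import dataclass


@dataclass
class Scaffold:
    name: str          # hypothetical scaffold identifier
    phylum: str        # taxonomic assignment, e.g. "Nematoda" or "no-hit"
    mean_depth: float  # average mapped read depth


KEEP_PHYLA = {"Nematoda", "no-hit"}
MIN_DEPTH = 10


def retain(scaffolds):
    """Keep scaffolds assigned to an accepted phylum with sufficient read depth."""
    return [
        s for s in scaffolds
        if s.phylum in KEEP_PHYLA and s.mean_depth >= MIN_DEPTH
    ]


toy = [
    Scaffold("scaf_1", "Nematoda", 185.0),       # kept
    Scaffold("scaf_2", "Proteobacteria", 40.0),  # removed: putative contaminant
    Scaffold("scaf_3", "no-hit", 12.5),          # kept
    Scaffold("scaf_4", "Nematoda", 3.0),         # removed: depth below 10
]
kept_names = [s.name for s in retain(toy)]
print(kept_names)  # prints ['scaf_1', 'scaf_3']
```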
The final GFF containing both nuclear and mitochondrial genome annotations was converted to EMBL format for submission to ENA using EMBLmyGFF3 (Norling et al., 2018).

Genome and annotation completeness
Genome and annotation completeness was estimated using BUSCO (Benchmarking Universal Single-Copy Orthologues) version 4 (Seppey et al., 2019) with lineage set to nematode_odb9 and mode set to "genome" or "protein" for the assembly or protein-coding genes, respectively, using "Caenorhabditis" as a training species for gene identification. Comparative genome assembly statistics were generated using assembly-stats version 1.0.1. All genomic and proteomic data from available assemblies of related filarial nematode species were obtained from WormBase ParaSite release 16 (Howe et al., 2017).
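For readers unfamiliar with the contiguity statistics reported here, the sketch below shows one common way to compute N50 and N50(n) from a list of scaffold lengths. It is a minimal illustration with a hypothetical function name and toy lengths, not the assembly-stats implementation.

```python
def n50(lengths):
    """Return (N50, N50(n)).

    N50 is the length L such that sequences of length >= L together contain
    at least half of the total assembly; N50(n) is the number of sequences
    needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("no sequence lengths supplied")


# Toy example: total = 290, so half = 145; 80 + 70 = 150 reaches it.
print(n50([80, 70, 50, 40, 30, 20]))  # prints (70, 2)
```

Read in this way, the reported scaffold N50 of 99,003 bp with N50(n) of 232 means that the 232 longest scaffolds, each at least 99,003 bp long, account for at least half of the 76.9 Mbp assembly.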

Phylogenetic analysis
Phylogenetic placement of C. johnstoni was performed by comparing its assembled mitochondrial genome to publicly available mitochondrial genomes of filarial nematodes. Mitochondrial genomes from the following species were downloaded from NCBI:
The analysis code used in this study is available from GitHub and is archived with Zenodo (Doyle & McKann, 2021).

Data availability
Genomic resources
European Nucleotide Archive: Raw sequence data, genome and annotation are deposited in the ENA. Accession number PRJEB47283; https://identifiers.org/ena.embl:PRJEB47283.
The assembly will also be made available at WormBase ParaSite (https://parasite.wormbase.org/), the primary repository for helminth genomes and annotations.

Open Peer Review
The article by McKann et al. reports an annotated draft genome of Cercopithifilaria johnstoni, a filarial nematode parasitising Australian mammals. Despite the fragmented nature of the assembly, the genome is of significant value to the research community because of its taxonomic position, its adaptive evolution to parasitise marsupials and its potential use as a laboratory model system for onchocerciasis. Combined fundamental and applied applications make it a valuable nematode genomic resource.
I was surprised to see genome completeness scores over 94% with an assembly using only short-read sequence data. Could this relate to short introns with less repetitive elements?
As this parasite is a useful model for onchocerciasis, it would be good to show the completeness of gene models specific to parasite-host interactions in Onchocerca and related species.
The mt genome was also assembled and annotated but there was no description of the mt genome herein. A mt phylogenomic tree might provide the reader with more context of the taxonomic position of this parasite.
Did the lack of RNAseq data affect gene model predictions? If not, then the findings herein would be strong support for relying only on amino acid sequence homology for training ab initio gene predictors. It would simplify efforts to complete the genome annotations for some taxa.
Is the rationale for creating the dataset(s) clearly described?

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Parasite genomics and genetics.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Author Response 01 Dec 2021
Stephen Doyle,

Reviewer 3 -Neil Young
The article by McKann et al. reports an annotated draft genome of Cercopithifilaria johnstoni, a filarial nematode parasitising Australian mammals. Despite the fragmented nature of the assembly, the genome is of significant value to the research community because of its taxonomic position, its adaptive evolution to parasitise marsupials and its potential use as a laboratory model system for onchocerciasis. Combined fundamental and applied applications make it a valuable nematode genomic resource.
I was surprised to see genome completeness scores over 94% with an assembly using only short-read sequence data. Could this relate to short introns with less repetitive elements?

Response: We were also pleasantly surprised by the relatively high genome completeness statistics for an Illumina-only assembly. We attribute it in part to the low sequence diversity that comes from sequencing a single, freshly collected parasite at high coverage with relatively long Illumina reads (2x300 bp). The genome at 76 Mb is on the smaller side compared to other nematodes, and while its overall repetitive content was not noticeably different from other filarial nematodes, it must compact its genome content into a smaller space, suggesting it is "less complex" to some degree.

Table 1. Arguably, these data show that the genome and proteome are highly representative based on conserved orthologs "expected" to be present, and relative to closely related species. We agree that to further establish C. johnstoni as a model for onchocerciasis, a better understanding of the genes involved in host-parasite interactions is needed; these data are in fact the focus of a separate follow-up publication. As a Wellcome Open Research Data Note aims to focus specifically on the data themselves and "not… analyses or conclusions", we initially (and again after peer review) decided against presenting these downstream analyses of the genome resources.
The mt genome was also assembled and annotated but there was no description of the mt genome herein. A mt phylogenomic tree might provide the reader with more context of the taxonomic position of this parasite.

Response:
The reviewer is correct: we did not specifically describe the mitochondrial genome. However, we agree that a phylogeny using the mitochondrial genome would illustrate where C. johnstoni is placed relative to other filarial species.
To address this comment, we now include this phylogeny in Figure 3.
Did the lack of RNAseq data affect gene model predictions? If not, then the findings herein would be strong support for relying only on amino acid sequence homology for training ab initio gene predictors. It would simplify efforts to complete the genome annotations for some taxa.

These results suggest, based on a relatively simple metric of the proportion of conserved orthologous genes, that using a large collection of diverse metazoan proteins as hints for Braker2 can improve existing annotations and does a respectable job when compared with a well-curated genome annotation. Therefore, it is likely that this represents a valid approach for annotation of genomes from species where collecting additional species-specific evidence, i.e., RNA-seq, is difficult. This needs further testing, which is outside the scope of this work.
This validation of the approach we describe for annotating the C. johnstoni genome provides further support for the high BUSCO scores we report and for the completeness of the C. johnstoni genome and annotation.

Jane Hodgkinson
Veterinary Parasitology, Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
The authors present a genome for the nematode Cercopithifilaria johnstoni, a parasite of considerable interest in its own right and as a comparator for other filarial nematodes.
In my opinion all the methodologies are appropriate and every attempt has been made to produce a genome assembly of the best quality with the available sequence data. Table 1 clearly identifies that the quality of the assembly of Cercopithifilaria johnstoni, as presented, is comparable with the quality of published genomes for other filaria; indeed it is towards the top end (6/18) in terms of completeness and contiguity.
I have no reservations in recommending this manuscript for indexing in its current form.

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
© 2021 Wasmuth J. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.