Detection of SARS-CoV-2 variant 501Y.V2 in Comoros Islands in January 2021

Background. Genomic data are key to understanding the spread and evolution of the SARS-CoV-2 pandemic and to informing the design and evaluation of interventions. However, SARS-CoV-2 genomic data remain scarce across Africa, with no reports yet from the Indian Ocean islands. Methods. We sequenced the genomes of six SARS-CoV-2-positive samples from the first major infection wave in the Union of Comoros in January 2021 and undertook detailed phylogenetic analysis. Results. All six recovered genomes were classified within the 501Y.V2 variant of concern (also known as lineage B.1.351) and appeared to fall into two sub-clusters, with the most recent common ancestor dated 30th October 2020 (95% credibility interval: 6th September 2020 to 10th December 2020). Comparison of the Comoros genomes with 501Y.V2 genomes from other countries deposited in the GISAID database revealed a close association with viruses identified in France and Mayotte (part of the Comoros archipelago and a French Overseas Department). Conclusions. The recovered genomes, albeit few, confirmed local transmission, probably following multiple introductions of the SARS-CoV-2 501Y.V2 variant of concern, during the Comoros's first major COVID-19 wave. These findings demonstrate the importance of genomic surveillance and have implications for ongoing control strategies on the islands.


Introduction
Although Comoros, an island country in the Indian Ocean, detected its first case of SARS-CoV-2 on 30th April 2020, it did not experience its first major SARS-CoV-2 outbreak until January 2021, i.e., 10 months later [1]. By 28th February 2021, Comoros had recorded 3,571 laboratory-confirmed SARS-CoV-2 infections, 2,748 (77.0%) of which were confirmed after 1st January 2021.
Genomic surveillance has been key to understanding the introduction, spread, and evolution of SARS-CoV-2 in individual countries since its emergence in China in late 2019, and to informing the design and evaluation of interventions [2][3][4]. Towards the end of 2020, three SARS-CoV-2 variants of concern (Alpha, Beta, and Gamma) emerged in widely separated geographical locations. They appeared to be considerably more transmissible than prior SARS-CoV-2 variants, with the potential to facilitate immune escape or cause more severe disease [5][6][7]. The three variants carried several defining amino acid changes, most of them within the immunogenic spike (S) protein [8]. The S protein contains the domain that binds the virus to the human host cell receptor and is a key target for several vaccines [9]. Here, we investigated whether variants of concern played a role in the rising number of SARS-CoV-2 cases in the Union of Comoros in January 2021.

Ethical statement
The SARS-CoV-2 genomes were generated as part of a regional collaborative COVID-19 public health rapid response. The whole genome sequencing study protocol was reviewed and approved by the Scientific and Ethics Review Committee (SERU), Kenya Medical Research Institute (KEMRI), Kenya (SERU #4035). Individual patient consent was not required by the committee for the use of these samples for sequencing as a part of the public health emergency response.

Study site and samples
The samples analysed were collected between 5th and 11th January 2021 in the Union of Comoros, from two islands, Ngazidja and Mohéli (Table 1). A total of 11 positive nasopharyngeal/oropharyngeal swab samples were sent to the KEMRI-Wellcome Trust Research Programme (KWTRP) in Kilifi, Kenya, for genome analysis. KWTRP is one of the 12 designated WHO-AFRO/Africa CDC specialized and regional reference laboratories for SARS-CoV-2 sequencing in Africa [10]. The sample size was determined by the samples made available through the public health response.

Laboratory procedures
On receipt of the samples on 16th and 17th January 2021, viral RNA was extracted using the QIAamp Viral RNA Mini kit (52906, Qiagen, Hilden, Germany) following the manufacturer's instructions and tested using the Sansure Biotech Novel Coronavirus (2019-nCoV) Nucleic Acid Diagnostic real-time RT-PCR commercial kit (S3102E, Sansure Inc., China), which targets the nucleocapsid (N) and ORF1ab regions. Nine of the 11 samples were confirmed as SARS-CoV-2 positive on both gene targets (cycle threshold (Ct) <38.0). We proceeded to sequence the six samples with a Ct value of ≤29.0. Samples with Ct values above 29.0 were excluded because, in our laboratory's experience, such samples frequently fail the quality control steps that precede sequencing owing to their low viral titres. The RNA was first reverse transcribed using the LunaScript® RT SuperMix Kit (E3010, New England Biolabs Inc., Germany) and then amplified using the Q5® Hot Start High-Fidelity 2X Master Mix (NEB M0494; New England Biolabs Inc., Germany) with the ARTIC nCoV-2019 version 3 primers [11]. The resulting amplicons were taken forward for library preparation and sequencing on a MinION (Mk1B) (Oxford Nanopore Technologies, Oxford). The six samples were processed alongside 17 other samples from coastal Kenya, making a batch of 23 samples.
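The two Ct thresholds above act as a simple triage rule: confirm positivity on both targets, then keep only samples likely to yield usable genomes. The sketch below expresses that rule. It is illustrative, not the laboratory's actual workflow. The `Sample` class, the function names and the example Ct values are hypothetical. The text also does not state which target's Ct was compared against 29.0, so the sketch assumes that both targets must pass.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds taken from the text; the variable names are hypothetical.
POSITIVE_CT_CUTOFF = 38.0    # confirmed positive if Ct < 38.0 on both targets
SEQUENCING_CT_CUTOFF = 29.0  # sequence only if Ct <= 29.0


@dataclass
class Sample:
    sample_id: str
    ct_n: Optional[float]        # N-gene Ct; None means the target did not amplify
    ct_orf1ab: Optional[float]   # ORF1ab Ct; None means the target did not amplify


def is_confirmed_positive(sample: Sample) -> bool:
    """True only if both gene targets amplified below the positivity cut-off."""
    cts = (sample.ct_n, sample.ct_orf1ab)
    return all(ct is not None and ct < POSITIVE_CT_CUTOFF for ct in cts)


def select_for_sequencing(samples: List[Sample]) -> List[Sample]:
    """Keep confirmed positives whose higher (worse) Ct is <= the sequencing cut-off.

    Assumption: the text does not say which target's Ct was compared with 29.0,
    so this sketch requires both targets to pass.
    """
    return [
        s for s in samples
        if is_confirmed_positive(s)  # guarantees neither Ct is None below
        and max(s.ct_n, s.ct_orf1ab) <= SEQUENCING_CT_CUTOFF
    ]


if __name__ == "__main__":
    # Hypothetical Ct values for illustration only; not the study's data.
    batch = [
        Sample("S1", 22.5, 23.1),
        Sample("S2", 31.0, 30.2),
        Sample("S3", None, 36.0),
        Sample("S4", 29.0, 28.4),
    ]
    print([s.sample_id for s in select_for_sequencing(batch)])  # prints ['S1', 'S4']
```

The cut-off is inclusive (≤29.0), matching the wording in the text. A sample with a Ct of exactly 29.0 is therefore still sequenced.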

Results
We assembled >80% of the SARS-CoV-2 genome from each of the six sequenced samples (Table 1). The recovered genomes were classified as lineage B.1.351 using the Pangolin toolkit v2.3.0 [13]. The genomes carried six of the eight Beta variant-defining amino acid changes in the S protein (L18F, D80A, D215G, K417N, D614G, and A701V), plus a known three-amino-acid deletion at positions 243-245. The remaining two defining changes (E484K and N501Y) could not be confirmed because they fell within a region that was not sequenced owing to PCR amplicon drop-off. Our findings, which confirmed the presence of the SARS-CoV-2 Beta variant in the Comoros samples, were conveyed to the Union of Comoros authorities on 22nd January 2021 via the WHO-AFRO office to inform public health actions.
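Two checks in this paragraph can be made explicit. The first is what ">80% of the genome" means for a consensus sequence. The second is why an unsequenced region yields "unconfirmed" rather than "absent" calls. The sketch below is a minimal illustration under stated assumptions. Completeness is taken as the fraction of reference positions with an unambiguous A/C/G/T call, with Ns and ambiguity codes counted as missing. Mutation calls are keyed by spike codon number. All function names are hypothetical. The uncovered codons {484, 501} used in the usage note simply mirror the text, since the exact drop-off region is not reported.

```python
from typing import Dict, Iterable, Set, Tuple

# Length of the Wuhan-Hu-1 reference genome NC_045512.2, in nucleotides.
REF_LENGTH = 29903

# The eight Beta spike changes named in the text.
BETA_SPIKE_DEFINING = ["L18F", "D80A", "D215G", "K417N",
                       "E484K", "N501Y", "D614G", "A701V"]


def genome_completeness(consensus: str, ref_length: int = REF_LENGTH) -> float:
    """Fraction of reference positions with an unambiguous base call (A/C/G/T).

    Ns, gaps and IUPAC ambiguity codes are all treated as uncalled.
    """
    called = sum(1 for base in consensus.upper() if base in "ACGT")
    return called / ref_length


def parse_aa_change(change: str) -> Tuple[str, int, str]:
    """Split a change such as 'K417N' into ('K', 417, 'N')."""
    return change[0], int(change[1:-1]), change[-1]


def defining_mutation_status(observed: Set[str],
                             uncovered_codons: Set[int],
                             defining: Iterable[str] = BETA_SPIKE_DEFINING,
                             ) -> Dict[str, str]:
    """Label each defining change 'present', 'absent' or 'unconfirmed'.

    A change is 'unconfirmed' when its codon lies in a region with no coverage
    (e.g. after amplicon drop-off): a missing call there is not evidence of absence.
    """
    status = {}
    for change in defining:
        _, codon, _ = parse_aa_change(change)
        if codon in uncovered_codons:
            status[change] = "unconfirmed"
        elif change in observed:
            status[change] = "present"
        else:
            status[change] = "absent"
    return status
```

For example, `defining_mutation_status({"L18F", "D80A", "D215G", "K417N", "D614G", "A701V"}, {484, 501})` labels the six listed changes as present and E484K and N501Y as unconfirmed. This matches the situation described above.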
The six Union of Comoros sequences differed at only three nucleotide positions: A13192G (1 genome), T23560C (2 genomes), and G27505T (1 genome). Compared with the Wuhan 2019 reference (accession number: NC_045512.2), the Union of Comoros genomes had 21-22 nucleotide substitutions, which translated into 16-17 amino acid changes. A time-scaled maximum clade credibility (MCC) phylogenetic tree of these sequences revealed that the Union of Comoros genomes formed a monophyletic group together with genomes from Mayotte (part of the Comoros archipelago and a French Overseas Department) and France (Figure 1). This group diverged into two sub-clusters, with the most recent common ancestor dated 30th October 2020 (95% credibility interval: 6th September 2020 to 10th December 2020).
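The most recent common ancestor is reported as a calendar date with a 95% credibility interval, but the text does not name the dating software or say how the interval was computed. Bayesian tools that produce MCC trees, such as BEAST, typically log node ages in decimal years. Those ages are then summarised, often with a 95% highest posterior density (HPD) interval. The sketch below assumes that convention. It is a generic illustration, not the authors' pipeline. It computes the shortest interval containing 95% of the posterior tMRCA samples and converts decimal years to dates. All function names are hypothetical.

```python
import math
import statistics
from datetime import date, timedelta
from typing import Sequence, Tuple


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Shortest interval containing `mass` of the posterior samples (HPD)."""
    ordered = sorted(samples)
    n = len(ordered)
    k = math.ceil(mass * n)  # number of samples the interval must contain
    if not 0 < k <= n:
        raise ValueError("need at least one sample and 0 < mass <= 1")
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    widths = [ordered[i + k - 1] - ordered[i] for i in range(n - k + 1)]
    best = min(range(len(widths)), key=widths.__getitem__)
    return ordered[best], ordered[best + k - 1]


def date_to_decimal_year(d: date) -> float:
    """Calendar date -> decimal year (fraction of the year elapsed at 00:00)."""
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days  # handles leap years
    return d.year + (d - start).days / days_in_year


def decimal_year_to_date(x: float) -> date:
    """Decimal year -> nearest calendar date (inverse of date_to_decimal_year)."""
    year = math.floor(x)
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days
    return start + timedelta(days=round((x - year) * days_in_year))


def summarize_tmrca(samples: Sequence[float],
                    mass: float = 0.95) -> Tuple[date, date, date]:
    """Return (median, HPD lower, HPD upper) of tMRCA samples as calendar dates."""
    lower, upper = hpd_interval(samples, mass)
    return (decimal_year_to_date(statistics.median(samples)),
            decimal_year_to_date(lower),
            decimal_year_to_date(upper))
```

An HPD interval can differ from an equal-tailed (2.5%-97.5%) interval when the posterior is skewed. Which of the two underlies the reported interval is an assumption here.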

Discussion
We provide evidence of circulation of the Beta variant during the first major SARS-CoV-2 epidemic peak in the Union of Comoros. The Beta variant was first identified in South Africa and had been reported in 41 countries as of 19th February 2021. Initial data suggested that this variant shows up to a 6-fold reduction in neutralization by post-vaccination sera or by convalescent sera from individuals infected with prior variants [14]. Finding this variant in the Union of Comoros is therefore concerning, since it has the potential to overcome pre-existing immunity derived from natural infection or vaccination.

Understanding the extent of spread of this variant in the Union of Comoros is limited by the low number of cases sequenced. As of 28th February 2021, the number of new cases in the Union of Comoros had declined considerably after peaking in mid-January [1]. Comparison of the Union of Comoros genomes with genomes from across the globe found a close relationship with those from neighbouring Mayotte, a French Overseas Department. Mayotte detected its first case of SARS-CoV-2 on 10th March 2020 and experienced its first major SARS-CoV-2 outbreak from mid-January to mid-March 2021, which overlapped with the Union of Comoros outbreak. As of 23rd April 2021, 721 SARS-CoV-2 genomes from Mayotte were available in the GISAID database, and the majority (52%) were classified

Reviewer report

I only have a few minor comments, as follows:

○ I noticed that there was no mention of the inclusion of negative controls in the sequencing assays, or of how mapped reads in the negative controls, if any, were addressed. This is particularly relevant given the high similarity between the six samples.
○ Including the assembly pipeline version, and indicating which tool was used to generate the consensus sequences (nanopolish or medaka), may be useful for others who may be interested in including these data in future analyses.
○ In Table 1, perhaps just 'genome coverage', with or without the number of Ns, may be more appropriate than 'genome length'.

Is the study design appropriate and is the work technically sound? Yes
If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Yes