The genome, being the molecular architecture of life, encodes its genotypic and phenotypic expression. Evolving SARS-CoV-2 gene variants play a significant role in viral replication, spread, and pathogenicity in the human host. The present study assessed SARS-CoV-2 genomic variability in the African population to understand the epidemiology, the virus-host relationship, and the resultant effects of such mutations. We also identified conserved domains in the SARS-CoV-2 genome that could be exploited as potential targets for vaccine development and/or drug design. We identified the ORF1ab RdRp and RNA primase, S, ORF3, ORF8, and N proteins as SARS-CoV-2 mutational hotspots, whereas the E, M, ORF6, ORF7a/b, and ORF10 proteins were conserved.
Overall, clades GR (1534 sequences, 50.4%) and G (895 sequences, 29.4%), characterized by the spike D614G and the nucleocapsid R203K and G204R variants, were the most prevalent in our study. Previous reports [10, 16] observed clades G and GR mostly in Europe. Our study corroborates the WHO report that most index cases of COVID-19 in Africa came from Europe and North America rather than from Asia, where the outbreak originated. Earlier findings [4, 10] have also indicated the prevalence of the G and GR clades in viral sequences originating from Africa.
The ORF1ab polyprotein, which is cleaved into the nonstructural proteins nsp1-nsp16, is essential for genome replication. RdRp is responsible for viral RNA replication; given this essential role, RdRp would be expected to be well conserved. Interestingly, the present study corroborates earlier findings [11, 14] of recurrent missense mutations in the RdRp region that alter the protein sequence. In particular, the RdRp P4715L mutation (observed in 2787 viral sequences), which lies close to a hydrophobic cleft, has been identified as a potential antiviral drug target. As of January 2, 2021, the P4715L mutation had been observed in 214,154 (92.8%) viral sequences and the T265I mutation in 31,802 (13.8%) viral sequences globally (https://epicov.org/epi3). Owing to its high binding affinity (Kd 21.83 nM) for RdRp, atazanavir has been identified as a potential COVID-19 therapeutic candidate. Interestingly, the RdRp P4715L and S protein D614G mutations co-occurred in the same viral sequences (96.5%), suggesting a synergistic effect of these two hotspot mutations [10, 16].
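The co-occurrence figure reported above can be illustrated with a minimal sketch. It assumes that each viral sequence has already been reduced to a set of amino acid substitution labels; the label format (for example "S:D614G"), the toy records, and the function name co_occurrence are hypothetical and are not taken from the study's pipeline. The denominator used here (sequences carrying the first mutation) is also an assumption, since the text does not state how the 96.5% was computed.

```python
from typing import Dict, Set


def co_occurrence(seq_mutations: Dict[str, Set[str]], first: str, second: str) -> float:
    """Fraction of sequences carrying `first` that also carry `second`.

    Returns 0.0 when no sequence carries `first`.
    """
    carriers = [muts for muts in seq_mutations.values() if first in muts]
    if not carriers:
        return 0.0
    both = sum(1 for muts in carriers if second in muts)
    return both / len(carriers)


# Toy records (hypothetical identifiers, not real accession numbers).
toy_sequences = {
    "seq1": {"S:D614G", "RdRp:P4715L"},
    "seq2": {"S:D614G", "RdRp:P4715L", "N:R203K", "N:G204R"},
    "seq3": {"S:D614G"},
    "seq4": {"N:S194L"},
}

fraction = co_occurrence(toy_sequences, "S:D614G", "RdRp:P4715L")
print(f"{fraction:.1%} of D614G sequences also carry P4715L")
```

Swapping the order of the two arguments changes the denominator, so the two directions of the comparison can give different percentages on real data.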
The 3CLpro enzyme plays a vital role in the SARS-CoV-2 life cycle, in replication, and in processing the carboxyl termini of nsp4 through nsp16. 3CLpro was a candidate antiviral drug target during the outbreaks of Middle East respiratory syndrome coronavirus (MERS-CoV) and SARS-CoV. Molecular docking has identified aliskiren, dipyridamole, mopidamol, and rosuvastatin as potential antiviral candidates due to their relatively high binding energy to the ORF1ab 3CLpro domain. The functional importance and relative conservation of the SARS-CoV-2 3CLpro make it a suitable antiviral target, as reported in previous studies [19, 20].
The S glycoprotein mediates host cell-surface receptor binding via its S1 domain and induces host-membrane fusion through its S2 domain. This suggests an important role in virus-host tropism, transmission, and invasion. Several COVID-19 vaccine candidates that are approved or in clinical trials are inactivated or live-attenuated viruses, or target the SARS-CoV-2 S protein [8, 12]. The novel spike N501Y mutation, detected in 607 viral sequences globally and found in both the 501Y.V2 and SARS-CoV-2 VOC 202012/01 variants, was observed in only 103 viral sequences originating from South Africa and in none of the other African countries studied. To gain cellular entry, the S glycoprotein binds to the human angiotensin-converting enzyme 2 (ACE2) receptor, facilitating human transmission. Hence, variations in this region may have a significant effect on viral fitness through changes in binding affinity for the host ACE2 protein. As a surface protein, the S glycoprotein is constantly under selective pressure to evade the host immune response; this might explain the recurrent mutations observed in this domain, which promote adaptation to the host. Mutations in the SARS-CoV spike S1 domain allow it to bind human ACE2 much more tightly than civet SARS-CoV S1 does, conferring a selective advantage. The present study observed a highly recurrent mutation (D614G) in the spike S1 domain together with a relatively conserved S2 domain, which may imply that virus-host membrane fusion is the central function of the SARS-CoV-2 S glycoprotein. An earlier report established that coronaviruses can achieve receptor-independent entry into host cells. Therefore, priority should be given to understanding how the S2 domain mediates host-cell membrane fusion, as this mechanism offers potential targets for antiviral interventions.
The N phosphoprotein, composed of carboxyl- and N-terminal domains, forms the ribonucleoprotein complex with the viral RNA, which enhances viral genome transcription and facilitates helical nucleocapsid formation and membrane protein interaction during virion assembly [19, 22]. Gene variants in the N domain alter its binding to miRNAs, which might contribute to the pathogenesis and progression of infection in COVID-19 patients. However, despite its ability to elicit an immune response, no N-targeted COVID-19 vaccine has been reported. The N protein S194L variant was predominant in viral samples originating from South Africa (94.1%). Except in viral sequences from Egypt (North Africa), the N protein R203K and G204R mutations occurred together in the same viral sequences, which suggests a synergistic function of these mutations. The most frequently mutated site, S protein D614G, co-evolves with other recurrent mutations (RdRp P4715L and the N protein R203K and G204R mutations). These co-mutations are present in critical protein regions that facilitate ACE2-mediated host entry, RNA replication, and virion assembly, and they might confer higher virus-host transmissibility.
The M protein, which consists of three transmembrane domains, determines the shape of the viral envelope, while the E protein facilitates viral assembly and budding. The interaction of the S glycoprotein with the M protein is necessary to retain the S protein in the endoplasmic reticulum-Golgi intermediate compartment/Golgi complex after membrane fusion and for its integration into new virions. The M protein also binds to the N phosphoprotein to stabilize the nucleocapsid and aid viral assembly. During viral replication, the E protein is upregulated in the infected cell, facilitating viral assembly. The role of the E protein in viral maturation has been demonstrated in E protein knock-out recombinant coronaviruses, which show crippled viral maturation and reduced viral titers. Despite the conservation of the SARS-CoV-2 E and M proteins observed in our study, their small molecular size and poor immunogenicity for humoral responses mean that they are yet to be explored alone as suitable COVID-19 vaccine targets.
There is no substantial report attributing a role to ORF10 in SARS-CoV-2 transmission and pathogenesis. However, the viral ORF3 and ORF10 proteins have been proposed to synergistically attack heme on the 1-β chain of the host's hemoglobin, dissociating the iron to form porphyrin. This would reduce the amount of hemoglobin available to carry oxygen and carbon dioxide, interfere with the heme pathway, and cause extreme poisoning and inflammation of the hepatocytes. Studies on the antiviral mechanism of action of chloroquine (CQ) and hydroxychloroquine (HCQ) indicate that they inhibit the binding of the viral S protein and ORF8 to porphyrin. They also inhibit the viral ORF1ab, ORF3, and ORF10 proteins from attacking heme to form porphyrin, thus easing respiratory distress symptoms. The use of CQ and HCQ as drugs against coronavirus has generated controversy due to their adverse effects on patients. On June 16, 2020, the US Food and Drug Administration withdrew its authorization of CQ and HCQ as therapeutic candidates for coronavirus treatment due to their lack of efficacy and safety concerns (http://www.chinadaily.com.cn/). Hence, the quest for a clinically approved and efficient therapeutic agent is still on, and our study suggests potential targets for drug or vaccine development.
Currently, the Pfizer-BioNTech, Moderna, and AstraZeneca COVID-19 vaccines have been authorized and recommended for use, while Phase 3 clinical trials of the Janssen and Novavax vaccines, among many other COVID-19 vaccines, are planned or in progress. However, with the advent of the SARS-CoV-2 B.1.1.7 variant, which has spread across 33 countries, there has been concern about vaccine efficacy and immune evasion. This calls for continuous surveillance of the SARS-CoV-2 genome through mutational studies. Because our interest was in mutations that affect the protein sequence, synonymous mutations, which do not alter the amino acid residue, were not accounted for in the present study. Moreover, this genomic dataset includes very few viral sequences (< 50) from most (67%) of the African countries sampled (all countries sampled are presented in Table 1 in alphabetical order), while for some African countries no viral sequences are available in recognized public repositories. Therefore, gene variants from some African countries likely remain unsampled. Hence, we encourage support for biomedical researchers and research institutes in developing countries in order to generate extensive genomic resources for understanding viral transmissibility, evolution, and variation in the African region.
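The distinction drawn above between synonymous and protein-altering mutations can be sketched as follows, assuming the standard genetic code and a comparison of one reference codon against one variant codon. The function name classify_codon_change and the example codons are illustrative assumptions and do not represent the procedure used in this study.

```python
# Standard genetic code, built from the conventional TCAG base ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Label a single-codon change by its effect on the encoded residue."""
    # Accept RNA or DNA spelling by mapping U to T.
    ref = ref_codon.upper().replace("U", "T")
    alt = alt_codon.upper().replace("U", "T")
    if ref not in CODON_TABLE or alt not in CODON_TABLE:
        raise ValueError("codons must be three unambiguous nucleotides")
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    if ref_aa == alt_aa:
        return "synonymous"  # same residue: would be excluded here
    if alt_aa == "*":
        return "nonsense"    # premature stop codon
    if ref_aa == "*":
        return "stop-lost"   # stop codon replaced by a residue
    return "missense"        # residue changes


# Illustrative codon pairs (hypothetical inputs).
for ref, alt in [("GAT", "GGT"), ("TTC", "TTT"), ("TGG", "TGA")]:
    print(ref, "->", alt, classify_codon_change(ref, alt))
```

Under this scheme only changes labeled missense, nonsense, or stop-lost would be retained in an analysis restricted to mutations that alter the protein sequence.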