Analysis of codon usage patterns in open reading frame 4 of hepatitis E viruses
Beni-Suef University Journal of Basic and Applied Sciences volume 11, Article number: 65 (2022)
Hepatitis E virus (HEV) is a member of the family Hepeviridae and causes acute infections that result in thousands of deaths worldwide. Its zoonotic nature, together with its capacity for human-to-human transmission, has prompted researchers across the globe to study different aspects of the virus. HEV infection is also associated with a mortality rate of about 30% in pregnant women. The HEV genome is organized into three open reading frames (ORFs): ORF1, ORF2 and ORF3. An additional reading frame encoding the ORF4 protein has recently been discovered and is exclusive to genotype 1 (GT 1) isolates of HEV. ORF4 is suggested to play a crucial role in pregnancy-associated pathology and enhanced replication. Although studies have documented the importance of ORF4, the genetic features of the ORF4 protein genes in terms of compositional patterns have not been elucidated. Because codon usage plays a critical role in establishing the host–pathogen relationship, the present study reports a codon usage analysis (based on nucleotide sequences of HEV ORF4 available in the public database) in three hosts, along with the factors influencing the codon usage patterns of the ORF4 protein genes.
The nucleotide composition analysis indicated that the ORF4 protein genes showed an overrepresentation of the C nucleotide, while A was the least-represented nucleotide, with a random distribution of G and T(U) nucleotides. The relative synonymous codon usage (RSCU) analysis revealed a bias toward C/G-ended codons (over U/A-ended codons) in all three natural HEV-hosts (human, rat and ferret). All the ORF4 genes were rich in GC content. Further, our results showed the occurrence of both coincident and antagonistic codon usage patterns among HEV-hosts. The findings also indicated that both mutational and selection forces influenced the codon usage patterns of the ORF4 protein genes.
To the best of our knowledge, this is the first bioinformatics study evaluating codon usage patterns in the HEV ORF4 protein genes. The findings from this study are expected to improve our understanding of the significant factors involved in the evolutionary changes of ORF4.
Hepatitis E virus (HEV) is a small RNA virus belonging to the Hepeviridae family. Hepatitis E, the disease caused by HEV, is a potentially serious acute illness [1, 2]. HEV is primarily transmitted through contaminated water sources or through the consumption of infected or undercooked meat products derived from animals (swine, deer or wild boar) [3, 4]. HEV contains a positive-sense, single-stranded RNA molecule of approximately 7.2 kb in length, flanked by 5′ and 3′ untranslated regions (UTRs). The genome possesses a 7-methylguanine cap at the 5′ end and a poly(A) tail at the 3′ end and encodes three open reading frames (ORFs), i.e., ORF1, ORF2 and ORF3. ORF1 encodes the largest non-structural polyprotein, which has multifunctional domains required for viral replication [6, 7]. ORF2 codes for the capsid protein. ORF3 encodes a phosphorylated protein with multiple functions [9, 10]. HEV genotype 1 (GT 1) isolates have recently been found to carry an additional reading frame (ORF4), which encodes the ORF4 protein only during ER stress. This newly identified ORF4 is exclusive to HEV GT 1. ORF4 has been demonstrated to play a significant functional role in the replication cycle of GT 1 HEV. Evidence suggests that ORF4 interacts with multiple viral and host proteins to enhance virus replication [11, 12].
The present study analyzed compositional bias in terms of the nucleotide composition and synonymous codon usage patterns of the HEV ORF4 protein genes. The degeneracy of the genetic code allows more than one codon to encode a specific amino acid. Alternative codons encoding the same amino acid are termed synonymous codons. In viruses, the preference for some codons over others has been well documented. This phenomenon is referred to as codon usage bias (CUB) [13, 14]. CUB is considered an important force in the evolution of viral genomes. Factors influencing CUB include mutational pressure, natural selection, G + C content, secondary protein structure, and selective transcription and replication [15,16,17,18]. Previous reports have suggested that natural selection and directional mutation pressure are the two major mechanisms that account for codon usage variation among viral genomes [15, 19,20,21]. However, mutational bias, rather than natural selection, has been found to be the dominant factor affecting codon usage patterns in some RNA viruses [22,23,24,25]. The development of a disease results from a complex interaction among various factors, including the pathogen’s virulence, the host’s defense response and environmental aspects [26, 27]. These factors, together with CUB, decide the outcome of the host–pathogen interaction [28, 29]. Pathogens can adapt better to their hosts and environment through evolutionary changes that are reflected in their CUB patterns. Moreover, the efficiency with which a pathogen infects its host depends significantly on codon optimization, because codon optimization affects the growth of a pathogen in its environment. Similar codon usage patterns between a virus and its hosts may influence the virus’s fitness, its evasion of the host’s immune system and its evolution [30, 31].
Therefore, the study of codon usage in viruses can reveal important information about virus evolution, regulation of gene expression and protein synthesis. Despite the importance of the ORF4 region, its codon usage patterns have not been determined [32, 33]. This investigation was therefore carried out to analyze the codon usage patterns of the HEV ORF4 protein genes.
Codon usage analysis has been carried out extensively for the protein genes of the other reading frames of HEV, i.e., ORF1, ORF2 and ORF3. Baha and colleagues evaluated the codon usage patterns of these ORFs, but the codon compositional constraints in ORF4 have not been analyzed. In this study, we performed a comprehensive analysis of nucleotide composition and synonymous codon usage, based on the nucleotide sequences of the ORF4 protein genes available in NCBI GenBank, to determine the evolutionary factors that could play an important role in shaping the codon usage patterns. To the best of our knowledge, this analysis provides the first insights into the codon usage patterns of the ORF4 protein genes. This study will also shed light on the distinguishing genetic features of HEV present in the ORF4 sequences.
2.1 Sequence data acquisition
Nucleotide sequences of the ORF4 protein genes were retrieved from the GenBank database available at the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov). The sequences were selected and processed according to the following criteria: (A) Sequences from the same or different countries and from varying time intervals were assembled so as to avoid repetition. (B) Sequences were included from different hosts, encompassing human, rat and ferret. (C) Sequences accumulated from GenBank were categorized into different datasets. (D) Three datasets were prepared, one for each host organism (human, rat and ferret). (E) Multiple alignment was carried out for these datasets using the ClustalW algorithm installed in the BioEdit Sequence Alignment Editor 7.2.5. The complete list of the sequences used for the present analysis in the different host organisms is provided in the additional files (Additional files 1–3: Tables S1–S3).
2.2 Nucleotide composition analysis
The following nucleotide composition properties of the ORF4 sequences were calculated using MEGA X (version 10.1.7): (1) overall nucleotide frequencies (A%, C%, T/U% and G%); (2) nucleotide frequencies at the third codon position (A3%, C3%, U3% and G3%); and (3) G + C content at the different codon positions, i.e., the first (GC1), second (GC2) and third synonymous codon positions (GC3). Five non-biased codons were omitted from the nucleotide composition analysis: the three termination codons (UAG, UGA and UAA), as they do not code for any amino acid, and the two codons AUG and UGG, as they are the only codons for Met and Trp, respectively. These five codons therefore do not exhibit any codon bias.
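The quantities listed above can be illustrated with a minimal Python sketch. It is not the MEGA X implementation used in the study; the function name `composition` is hypothetical, and the sketch assumes that the five omitted codons are dropped before every percentage (overall, third position and GC1 to GC3) is computed.

```python
# Codons left out of the analysis, as described in the text above.
EXCLUDED = {"UAA", "UAG", "UGA", "AUG", "UGG"}


def composition(seq):
    """Return nucleotide composition percentages for one coding sequence.

    Assumption: the sequence is in frame and the excluded codons are
    removed before any percentage is calculated.
    """
    seq = seq.upper().replace("T", "U")
    # Split into complete codons and drop the five excluded ones.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    codons = [c for c in codons if c not in EXCLUDED]
    if not codons:
        raise ValueError("no informative codons in sequence")

    def percent(bases, wanted):
        # Share of positions whose base is one of the wanted bases.
        return 100.0 * sum(b in wanted for b in bases) / len(bases)

    all_bases = "".join(codons)
    result = {}
    for base in "ACGU":
        result[base] = percent(all_bases, base)                     # A%, C%, G%, U%
        result[base + "3"] = percent([c[2] for c in codons], base)  # A3%, C3%, ...
    result["GC"] = percent(all_bases, "GC")
    for pos in range(3):
        result[f"GC{pos + 1}"] = percent([c[pos] for c in codons], "GC")
    return result


# Toy sequence (not from the study): the AUG codon is dropped before counting.
toy = composition("GCCGCGAUGUUU")
```

For the toy sequence, three codons remain (GCC, GCG and UUU), so every reported percentage is computed over nine bases or three codon positions.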
2.3 Relative synonymous codon usage (RSCU) analysis
The ratio between the observed and expected usage frequency of a codon is described as the relative synonymous codon usage (RSCU). The expected frequency is the one a codon would have if all synonymous codons for a specific amino acid were used equally. The RSCU index was determined as follows:
$$\mathrm{RSCU}_{ij} = \frac{G_{ij}}{\sum_{j}^{n_i} G_{ij}} \times n_i$$
where RSCU is the relative synonymous codon usage value and G_ij is the observed number of the ith codon for the jth amino acid, which has n_i kinds of synonymous codons. Codons with RSCU values > 1.6 and < 0.6 were considered “overrepresented” and “underrepresented” codons, respectively, whereas codons with an RSCU value of 1 were regarded as unbiased (average-level codons). The mean RSCU values of the ORF4 protein genes were calculated using MEGA X (version 10.1.7), in order to reveal the codon usage patterns without the effect of amino acid composition and sequence length.
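As a minimal sketch of the RSCU definition and thresholds described above (not the MEGA X routine used in the study), the following Python code computes RSCU values for a single in-frame coding sequence. The names `rscu` and `classify`, and the label "intermediate" for values between 0.6 and 1.6, are hypothetical; the standard genetic code is assumed.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, with codons ordered by bases U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous families, excluding stop codons (*), Met (M) and Trp (W).
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa not in "*MW":
        FAMILIES[aa].append(codon)


def rscu(seq):
    """Return {codon: RSCU} for every amino acid present in the sequence."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    values = {}
    for aa, codons in FAMILIES.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent, so RSCU is undefined for its codons
        for c in codons:
            # Observed count divided by the count expected under equal usage.
            values[c] = counts[c] * len(codons) / total
    return values


def classify(value):
    """Apply the 1.6 and 0.6 thresholds given in the text."""
    if value > 1.6:
        return "overrepresented"
    if value < 0.6:
        return "underrepresented"
    return "intermediate"


# Toy sequence (not from the study): three GCC codons and one GCU codon.
toy = rscu("GCCGCCGCCGCU")
```

In the toy sequence, Ala is encoded four times across a four-codon family, so GCC (used three times) has an RSCU of 3.0, GCU has 1.0, and the unused GCA and GCG have 0.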
2.4 Relationship between overall nucleotide composition and nucleotide composition at the 3rd codon position
The correlation between the overall nucleotide contents (A, T, G, C and GC) and their counterparts at the third codon position (A3, T3, G3, C3 and GC3) was assessed to determine whether natural selection and mutation pressure contributed individually or acted together in the evolution of ORF4 in the natural hosts of HEV.
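The text does not state which correlation statistic was used. Assuming a Pearson correlation computed across the sequences of one host, the comparison could be sketched as follows. The function name `pearson` and the input values are hypothetical and serve only as an illustration; they are not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-sequence values (percent), for illustration only.
gc_overall = [62.1, 63.4, 64.0, 63.0]
gc3 = [68.9, 70.2, 71.5, 70.0]
r = pearson(gc_overall, gc3)
```

A coefficient near +1 or -1 for a pair such as GC and GC3 would indicate that third-position composition tracks overall composition, while a value near 0 would indicate little linear relationship.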
3.1 Analysis of nucleotide composition in coding sequences
The nucleotide compositions of the ORF4 protein genes were calculated to analyze the effect of compositional constraints on codon usage. The results of the nucleotide composition analysis are presented in Table 1 and Fig. 1.
Human: The nucleotides C and G were the most abundant in these coding sequences, with averages of 35.597% and 27.966%, respectively, compared with U (21.341%) and A (15.094%). The most frequent nucleotide at the third position was G3S (39.245%), followed by C3S (31.194%), A3S (16.352%) and U3S (13.207%). Thus, nucleotides at the third position of synonymous codons followed the trend G3S > C3S > A3S > U3S. The overall GC content (63.563%) was higher than the AU content (36.441%), which indicated a GC-biased composition. The overall GC content and the GC contents at positions GC1, GC2 and GC3 averaged 63.563%, 52.829%, 67.421% and 70.439%, respectively (Additional file 1: Table S1) (Table 1).
Rat: The nucleotides C and U were the most abundant in these coding sequences, with averages of 29.451% and 27.122%, respectively, compared with G (27.070%) and A (16.356%). The most frequent nucleotide at the third position was G3S (34.782%), followed by C3S (31.754%), U3S (17.003%) and A3S (16.459%). Thus, nucleotides at the third position of synonymous codons followed the trend G3S > C3S > U3S > A3S. The overall GC content (56.498%) was higher than the AU content (43.478%), which indicated a GC-biased composition. The overall GC content and the GC contents at positions GC1, GC2 and GC3 averaged 56.498%, 50.387%, 52.639% and 66.536%, respectively (Additional file 2: Table S2) (Table 1).
Ferret: The nucleotides C and U were the most abundant in these coding sequences, with averages of 28.768% and 27.119%, respectively, compared with G (26.358%) and A (17.753%). The most frequent nucleotide at the third position was G3S (32.717%), followed by C3S (30.597%), U3S (20.706%) and A3S (15.978%). Thus, nucleotides at the third position of synonymous codons followed the trend G3S > C3S > U3S > A3S. The overall GC content (55.126%) was higher than the AU content (44.872%), which indicated a GC-biased composition. The overall GC content and the GC contents at positions GC1, GC2 and GC3 averaged 55.126%, 51.63%, 50.434% and 63.314%, respectively (Additional file 3: Table S3) (Table 1).
Therefore, it could initially be interpreted that nucleotide C was overrepresented, whereas nucleotide A was underrepresented, in the HEV ORF4 protein genes. The nucleotides G and T(U) were distributed randomly. In addition, the GC content (> 50%) was higher than the AU content (< 50%) in the ORF4 protein genes.
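The third-position figures reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the mean values quoted in the text for each host. The helper names `gc3_gap` and `au3` are hypothetical, and the comparison of GC3 against A3 + U3 is an illustration rather than an analysis performed in the study.

```python
# Mean third-position values (percent) as quoted in the text above.
reported = {
    "human":  {"G3": 39.245, "C3": 31.194, "A3": 16.352, "U3": 13.207, "GC3": 70.439},
    "rat":    {"G3": 34.782, "C3": 31.754, "A3": 16.459, "U3": 17.003, "GC3": 66.536},
    "ferret": {"G3": 32.717, "C3": 30.597, "A3": 15.978, "U3": 20.706, "GC3": 63.314},
}


def gc3_gap(values):
    """Difference between G3 + C3 and the reported GC3, to 3 decimals."""
    return round(values["G3"] + values["C3"] - values["GC3"], 3)


def au3(values):
    """Combined A3 + U3 share, to 3 decimals."""
    return round(values["A3"] + values["U3"], 3)


for host, values in reported.items():
    print(host, "GC3 gap:", gc3_gap(values), "A3+U3:", au3(values))
```

For all three hosts the reported GC3 equals G3S + C3S to three decimals, and the combined A3 + U3 share is well below GC3, consistent with the GC-biased third position described above.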
3.2 Analysis of codon usage patterns in coding sequences
RSCU analysis was undertaken to evaluate the codon usage pattern of the ORF4 protein gene sequences. The RSCU values were computed for every codon in each gene sequence to determine the extent to which C-ended codons were preferred. The results are presented in Table 2 and Fig. 2.
Human: Out of 18 preferred codons (UCC, UCA, UCG, AGU, AGC, CCU, CCC, CCA, CCG, ACC, ACG, GCU, GCC, GCG, CAG, UGC, GGC and GGG), 13 were C/G-ending (C-ending: 7; G-ending: 6) and 5 were U/A-ending (U-ending: 3; A-ending: 2) (Additional file 4: Table S4) (Table 2). This indicated a preference for C- and G-ended codons over U- and A-ended codons in the gene sequences. Among these preferred codons, 3 had RSCU values > 1.6, i.e., were overrepresented (CAG, UGC and GGC), while 14 had RSCU values > 0.6 and < 1.6 (UCC, UCA, UCG, AGU, AGC, CCC, CCA, CCG, ACC, ACG, GCU, GCC, GCG and GGG). One underrepresented (RSCU < 0.6) synonymous codon was found (CCU).
Rat: Out of 25 preferred codons (UUU, UUC, UUA, UUG, CUC, CUA, CUG, AUU, AUC, AUA, GUG, UCC, UCG, AGC, CCU, CCG, ACA, ACG, GCC, GCA, UGC, CGC, CGG, AGG and GGC), 17 were C/G-ending (C-ending: 9; G-ending: 8) and 8 were U/A-ending (A-ending: 5; U-ending: 3) (Additional file 5: Table S5) (Table 2). This indicated a preference for C- and G-ended codons over U- and A-ended codons in the gene sequences. Among these preferred codons, 6 had RSCU values > 1.6, i.e., were overrepresented (GUG, AGC, GCC, CGC, AGG and GGC), while 18 had RSCU values > 0.6 and < 1.6 (UUU, UUC, UUG, CUC, CUA, CUG, AUU, AUC, AUA, UCC, UCG, CCU, CCG, ACA, ACG, GCA, UGC and CGG). One underrepresented (RSCU < 0.6) synonymous codon was found (UUA).
Ferret: Out of 22 preferred codons (UUU, UUC, UUG, CUA, CUG, AUU, AUC, AUA, GUC, UCU, UCA, AGC, CCU, CCC, GCC, UAC, CAC, CAG, GAG, CGC, CGG and AGG), 15 were C/G-ending (C-ending: 9; G-ending: 6) and 7 were U/A-ending (U-ending: 4; A-ending: 3) (Additional file 6: Table S6) (Table 2). This indicated a preference for C- and G-ended codons over U- and A-ended codons in the gene sequences. Among these preferred codons, 7 had RSCU values > 1.6, i.e., were overrepresented (UUG, UCU, GCC, CAC, CAG, CGC and AGG), while the remaining 15 had RSCU values > 0.6 and < 1.6 (UUU, UUC, CUA, CUG, AUU, AUC, AUA, GUC, UCA, AGC, CCU, CCC, UAC, GAG and CGG). No underrepresented (RSCU < 0.6) synonymous codon was found.
The overall and host-specific RSCU analyses revealed that C/G-ending codons were preferred over U/A-ending codons in the ORF4 coding sequences across all host organisms. The number of preferred codons in each host followed the order 25 (rat) > 22 (ferret) > 18 (human). The hosts also differed in their overrepresented and underrepresented codons. Thus, our RSCU findings revealed both similarities and discrepancies in the codon usage patterns among HEV-hosts.
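The tally of codon endings reported above can be reproduced directly from a list of preferred codons. As a small illustration, the sketch below counts the third-position bases of the 18 preferred codons listed for the human host in the text; the variable names are hypothetical.

```python
from collections import Counter

# The 18 preferred codons listed for the human host in the text.
human_preferred = [
    "UCC", "UCA", "UCG", "AGU", "AGC", "CCU", "CCC", "CCA", "CCG",
    "ACC", "ACG", "GCU", "GCC", "GCG", "CAG", "UGC", "GGC", "GGG",
]

# Count codons by their third (last) base.
endings = Counter(codon[-1] for codon in human_preferred)
cg_ending = endings["C"] + endings["G"]
ua_ending = endings["U"] + endings["A"]

print(dict(endings))         # counts per ending base
print(cg_ending, ua_ending)  # 13 5
```

The counts (C: 7, G: 6, U: 3, A: 2) agree with the 13 C/G-ending and 5 U/A-ending codons stated for the human dataset.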
3.2.1 Relationship among hosts by comparing codon usage frequency
A specific amino acid is often encoded by more than one codon, and it has been documented that the usage of synonymous codons is not random. Using the RSCU values of the HEV-hosts, we computed the preferred codon frequency for each amino acid. The frequency was determined to analyze the influence of selection pressure from the hosts on the codon usage patterns of HEV. The preferred codons, i.e., those used at a higher frequency than the other synonymous codons for the same amino acid, are listed for the HEV-hosts in Table 3 (Additional files 4–6: Tables S4–S6).
Ten amino acids, Ile (I), Ala (A), Gln (Q), Asn (N), Lys (K), Asp (D), Glu (E), Cys (C), Arg (R) and Gly (G), showed the same preferred codon in all three natural HEV-hosts, i.e., AUU for Ile, GCC for Ala, CAG for Gln, AAC for Asn, AAG for Lys, GAU for Asp, GAG for Glu, UGC for Cys, CGC for Arg and GGC for Gly, which indicated a phenomenon of “mutual codon preference”. These codons (AUU, GCC, CAG, AAC, AAG, GAU, GAG, UGC, CGC and GGC) therefore represent the coincident portion of codon usage, i.e., preferred codons shared by all the natural HEV-hosts. In addition, for some amino acids the preferred codons differed between host organisms. For instance, the three HEV-hosts each used a different preferred codon for Ser (UCG in human, AGC in rat and UCU in ferret).
Moreover, for several amino acids, two hosts shared a preferred codon while the third host differed: human and rat preferred UUC for Phe, whereas ferret preferred UUU; human and ferret preferred UUG for Leu, whereas rat preferred CUG; human and rat preferred GUG for Val, whereas ferret preferred GUC; human and ferret preferred CCC for Pro, whereas rat preferred CCU; human and rat preferred ACG for Thr, whereas ferret preferred ACC; and rat and ferret preferred UAC for Tyr, whereas human preferred UAU.
Our results clearly indicated that codon usage patterns in ORF4 gene sequences showed a mixture of coincidence and antagonism among HEV-hosts.
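The mixture of coincidence and antagonism can be made explicit by grouping amino acids according to how many hosts share a preferred codon. The sketch below uses only the preferred codons named in the text, in the order human, rat, ferret. The function name `sharing` and the labels "coincident", "partly shared" and "host-specific" are hypothetical and not terminology from the study.

```python
from collections import Counter

# Preferred codon per amino acid, ordered (human, rat, ferret), as named in the text.
preferred = {
    "Ile": ("AUU", "AUU", "AUU"), "Ala": ("GCC", "GCC", "GCC"),
    "Gln": ("CAG", "CAG", "CAG"), "Asn": ("AAC", "AAC", "AAC"),
    "Lys": ("AAG", "AAG", "AAG"), "Asp": ("GAU", "GAU", "GAU"),
    "Glu": ("GAG", "GAG", "GAG"), "Cys": ("UGC", "UGC", "UGC"),
    "Arg": ("CGC", "CGC", "CGC"), "Gly": ("GGC", "GGC", "GGC"),
    "Ser": ("UCG", "AGC", "UCU"), "Phe": ("UUC", "UUC", "UUU"),
    "Leu": ("UUG", "CUG", "UUG"), "Val": ("GUG", "GUG", "GUC"),
    "Pro": ("CCC", "CCU", "CCC"), "Thr": ("ACG", "ACG", "ACC"),
    "Tyr": ("UAU", "UAC", "UAC"),
}

LABELS = {1: "coincident", 2: "partly shared", 3: "host-specific"}


def sharing(codons):
    """Label an amino acid by the number of distinct preferred codons."""
    return LABELS[len(set(codons))]


summary = Counter(sharing(codons) for codons in preferred.values())
print(dict(summary))
```

With these inputs, ten amino acids are coincident across all three hosts, six are shared by two hosts only, and one (Ser) has a different preferred codon in each host.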
3.3 Comparative analysis of the RSCU values among hosts
Moreover, the most frequently used codons, the least frequently used codons and the unused codons also showed common attributes and differences in codon usage patterns among HEV-hosts, as presented in Table 4. These observations further emphasized the occurrence of both mutual codon preference and a lack of shared codon preference among the hosts.
3.4 Effect of natural selection in shaping the codon usage patterns in HEV
It has been suggested that the frequencies of nucleotides A and U/T should be equal to those of C and G at the third position of the codon if mutational pressure affects the synonymous codon usage bias. However, we observed large variations in the nucleotide composition of the overall ORF4 gene sequences, as shown in Table 1. This indicated that other mechanisms, including natural selection, influenced the codon usage bias in HEV. These findings therefore suggest that compositional constraints under mutational bias, in combination with natural selection, shaped the codon usage patterns in ORF4 coding sequences across all hosts.
The very high genetic diversity of HEV, together with the lack of an appropriate culture system for its propagation, poses a major challenge to the improvement of treatment methods. HEV has been classified into multiple genotypes and subtypes via nucleotide sequence analysis [39, 40]. Characterizing genetic properties to identify common regions and possible differences between genotypes is expected to contribute to the development of effective preventive measures against HEV infection. Our previous investigations have elucidated the ORF4 protein structure in different host organisms in addition to its role as a probable drug target. In this context, we conducted a bioinformatics study of different HEV ORF4 sequences by analyzing their codon usage patterns in different host organisms, to provide insights into the common attributes and differences in amino acid usage in the virus’s structure. It is hoped that these findings will help to identify and select more efficient and precise approaches for treatment protocols.
The genetic code comprises 64 codons, of which the sense codons fall into 20 distinguishable groups. Each group consists of one to six codons that encode the same amino acid. Thus, each standard amino acid is often encoded by alternative codons belonging to the same group. These alternative codons are termed ‘synonymous’ codons. CUB is a phenomenon wherein one codon is preferred over its synonymous partners [15, 43]. CUB is considered a distinctive property and differs appreciably among genes as well as genomes [36, 44]. Investigations have reported that codon usage patterns in organisms assist in understanding the molecular organization of genomes. Owing to improvements in sequencing technologies, CUB has gained more attention, and codon usage patterns have been studied in several prokaryotic and eukaryotic organisms. As viruses are obligate parasites, they require a set of proteins and enzymes to colonize the host by counteracting the host’s defense mechanism. The establishment of an association between a host and a virus depends on translational accuracy, which is largely affected by synonymous codon usage patterns [45, 48]. Mutational bias and natural selection are the two major forces that govern the overall codon usage variation in genomes. It has been reported that mutation pressure, rather than translational selection, is the primary determinant of codon bias in human RNA viruses. Considering these forces together helps to determine whether the selection of preferred codons has been influenced by mutational pressure or natural selection. Thus, in the present study, we performed a systematic survey of the evolutionary pressures (i.e., mutational bias and natural selection) acting on ORF4 to gain insights into its codon usage patterns. The codon usage patterns of the protein genes of the other reading frames, i.e., ORF1, ORF2 and ORF3, have been elucidated; however, the codon usage patterns of ORF4 remain to be determined.
This study is the first of its kind to describe the codon usage patterns of the ORF4 region of the HEV genome in three different host organisms (human, rat and ferret).
Nucleotide composition constraints affect codon usage patterns, and we therefore performed a nucleotide composition analysis of the HEV ORF4 protein genes. The analysis revealed an overrepresentation of the C nucleotide and an underrepresentation of the A nucleotide in the overall nucleotide composition. This is in agreement with the previous investigation carried out by Baha and colleagues in HEV isolates encompassing different genotypes and hosts, which revealed C as the most-represented nucleotide and A as the least-represented nucleotide. In line with that observation, our nucleotide analysis also showed a random distribution of G and T(U) nucleotides. Our analysis revealed that the ORF4 genes were rich in GC content, which again agrees with the previous report suggesting that all the ORF coding sequences of HEV had a high overall GC content (exceeding 50%). Our compositional analysis revealed a C/G-rich nucleotide pattern in human, while rat and ferret showed C/T(U) richness. These results are in line with the earlier report, in which ORF1 and ORF3 showed C/G-rich composition while ORF2 showed a prevalence of C/T(U) nucleotides. However, the pattern observed in ORF4 differs from that observed in most RNA viruses (HIV, hepatitis C and rubella viruses), which show a high prevalence of A rather than C. This opposite compositional bias could be due to the adaptation of a common ancestor of modern HEV strains to its host (in terms of nucleotide composition) during evolution. The contrast with the majority of RNA viruses is also consistent with the earlier report on the other reading frames (ORF1, ORF2 and ORF3). Thus, the findings from our initial compositional analysis are consistent with the previous report on the codon usage patterns of the HEV ORFs.
Next, we examined the role of selection forces in determining the codon usage patterns of the ORF4 genes. In viruses, it has been suggested that an AU-rich or GC-rich composition correlates with RSCU patterns, such that AU-rich genomes prefer codons ending in A/U and GC-rich genomes prefer codons ending in G/C. This trend supports the influence of mutational pressure. Because the nucleotide compositional bias of ORF4 was in line with its RSCU patterns in human, mutation pressure appears to be a major driving factor in shaping its codon usage pattern in this host. However, in rat and ferret, although these regions had a higher percentage of C and U nucleotides, the RSCU pattern showed a preference for C- and G-ended codons, i.e., the RSCU results were not consistent with the initial nucleotide composition. This suggested the involvement of factors other than nucleotide composition in shaping the synonymous codon usage patterns in these two host organisms. In this context, we observed large variations in the nucleotide composition of the overall ORF4 gene sequences, which indicated that other mechanisms, including natural selection, influenced the codon usage bias in HEV. Thus, it could be interpreted that both mutation and natural selection shaped the codon usage patterns of the ORF4 coding sequences. Our findings are consistent with previous codon usage analyses of HEV that demonstrated the predominance of mutation pressure and natural selection, respectively.
We next analyzed the relationship between the codon usage patterns of ORF4 in its natural hosts. The common attributes and differences among HEV-hosts were examined by computing the frequency of preferred codons for each amino acid using their RSCU values. The number of preferred codons varied among the natural hosts, being highest in rat and lowest in human. Additionally, the number of overrepresented and underrepresented codons also varied between host organisms. This noteworthy variation in preferred codon usage among HEV-hosts implied that the codon usage patterns of ORF4 in different host organisms were subject to different selection pressures. Furthermore, we observed that the frequencies of the most used and least used codons also showed similarities and differences between hosts. Thus, HEV ORF4 showed a mixture of two codon usage patterns: coincidence and antagonism. This is similar to previous studies carried out in other viruses, such as HCV and enterovirus. A recent investigation on HEV has also shown both similarities and discrepancies in the codon usage patterns of the ORF1 Y-domain region, which further substantiates our present findings. It has been proposed that the coincident portions of codon usage assist in the efficient translation of the corresponding amino acids between viruses and their respective hosts [55, 56], whereas the antagonistic portions of codon usage promote the correct folding of viral proteins, even though the translation efficiency of the corresponding amino acids decreases [57,58,59]. Taken together, our findings revealed that none of the hosts showed complete resemblance to, or complete discrepancy from, the other HEV-hosts.
The findings from such bioinformatics codon usage studies can be validated experimentally and could further be used in clinical studies to improve our understanding of HEV biology. Investigations of this type on other viruses could shed new light on their biology.
The present study documents the codon usage analysis of HEV ORF4 for the first time. This bioinformatics approach is expected to strengthen our understanding of the common attributes and differences in the codon usage patterns among ORF4 protein genes. The nucleotide compositional analysis showed an overrepresentation of the C nucleotide and revealed A as the least-represented nucleotide. The synonymous codon usage analysis revealed that the preferred codons mostly ended with C and G nucleotides. Moreover, the codon usage pattern among HEV-hosts was a mixture of coincidence and antagonism. The study suggests that synonymous codon usage in ORF4 is shaped by evolution, perhaps reflecting a dynamic interplay of mutation and selection forces that adjusts its codon usage to different hosts and conditions. Investigation of codon usage patterns is essential for understanding the evolution and efficient expression of viral proteins, so that they generate an efficient immune response. Such codon optimization strategies for preferred codon usage are very useful in vaccine development. The present study is anticipated to increase our knowledge of the mechanisms influencing the codon usage and evolution of ORF4.
Availability of data and materials
HEV: Hepatitis E virus
ORF4: Open reading frame 4
RSCU: Relative synonymous codon usage
Lhomme S, Marion O, Abravanel F, Chapuy-Regaud S, Kamar N, Izopet J (2016) Hepatitis E pathogenesis. Viruses 8(8):212
Kamar N, Izopet J, Pavio N, Aggarwal R, Labrique A, Wedemeyer H, Dalton HR (2017) Hepatitis E virus infection. Nat Rev Dis Primers 3(1):1–6
Galiana C, Fernandez-Barredo S, Garcia A, Gomez MT, Perez-Gracia MT (2008) Occupational exposure to hepatitis E virus (HEV) in swine workers. Am J Trop Med Hyg 78(6):1012–1015
Teixeira J, Mesquita JR, Pereira SS, Oliveira RM, Abreu-Silva J, Rodrigues A, Myrmel M, Stene-Johansen K, Øverbø J, Gonçalves G, Nascimento MS (2017) Prevalence of hepatitis E virus antibodies in workers occupationally exposed to swine in Portugal. Med Microbiol Immunol 206(1):77–81
Tam AW, Smith MM, Guerra ME, Huang CC, Bradley DW, Fry KE, Reyes GR (1991) Hepatitis E virus (HEV): molecular cloning and sequencing of the full-length viral genome. Virology 185(1):120–131
Ansari IH, Nanda SK, Durgapal H, Agrawal S, Mohanty SK, Gupta D, Jameel S, Panda SK (2000) Cloning, sequencing, and expression of the hepatitis E virus (HEV) nonstructural open reading frame 1 (ORF1). J Med Virol 60(3):275–283
Parvez MK (2013) Molecular characterization of hepatitis E virus ORF1 gene supports a papain-like cysteine protease (PCP)-domain activity. Virus Res 178(2):553–556
Mori Y, Matsuura Y (2011) Structure of hepatitis E viral particle. Virus Res 161(1):59–64
Ding Q, Heller B, Capuccino JM, Song B, Nimgaonkar I, Hrebikova G, Contreras JE, Ploss A (2017) Hepatitis E virus ORF3 is a functional ion channel required for release of infectious particles. Proc Natl Acad Sci USA 114(5):1147–1152
He M, Wang M, Huang Y, Peng W, Zheng Z, Xia N, Xu J, Tian D (2016) The ORF3 protein of genotype 1 hepatitis E virus suppresses TLR3-induced NF-κB signaling via TRADD and RIP1. Sci Rep 6(1):1–13
Nair VP, Anang S, Subramani C, Madhvi A, Bakshi K, Srivastava A, Nayak B, Ranjith-Kumar CT, Surjit M (2016) Endoplasmic reticulum stress induced synthesis of a novel viral factor mediates efficient replication of genotype-1 hepatitis E virus. PLoS Pathog 12(4):e1005521
Subramani C, Nair VP, Anang S, Mandal SD, Pareek M, Kaushik N, Srivastava A, Saha S, Nayak B, Ranjith-Kumar CT, Surjit M (2018) Host-virus protein interaction network reveals the involvement of multiple host processes in the life cycle of hepatitis E virus. MSystems 3(1):e00135–e00217
Grantham R, Gautier C, Gouy M, Mercier R, Pave A (1980) Codon catalog usage and the genome hypothesis. Nucl Acids Res 8(1):197–197
Marin A, Bertranpetit J, Oliver JL, Medina JR (1989) Variation in G+C-content and codon choice: differences among synonymous codon groups in vertebrate genes. Nucl Acids Res 17(15):6181–6189
Gu W, Zhou T, Ma J, Sun X, Lu Z (2004) Analysis of synonymous codon usage in SARS Coronavirus and other viruses in the Nidovirales. Virus Res 101(2):155–161
Sharp PM, Li WH (1986) Codon usage in regulatory genes in Escherichia coli does not reflect selection for ‘rare’ codons. Nucl Acids Res 14(19):7737–7749
Duret L, Mouchiroud D (1999) Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc Natl Acad Sci USA 96(8):4482–4487
Van der Linden MG, de Farias ST (2006) Correlation between codon usage and thermostability. Extremophiles 10(5):479–481
Wang M, Zhang J, Zhou JH, Chen HT, Ma LN et al (2011) Analysis of codon usage in bovine viral diarrhea virus. Arch Virol 156(1):153–160
Wang L, Xing H, Yuan Y, Wang X, Saeed M, Tao J, Sun X (2018) Genome-wide analysis of codon usage bias in four sequenced cotton species. PLoS ONE 13(3):e0194372
Wong EH, Smith DK, Rabadan R, Peiris M, Poon LL (2010) Codon usage bias and the evolution of influenza A viruses. BMC Evol Biol 10(1):1–4
Chen Y (2013) A comparison of synonymous codon usage bias patterns in DNA and RNA virus genomes: quantifying the relative importance of mutational pressure and natural selection. Biomed Res Int 2013:406342
Shi SL, Jiang YR, Liu YQ, Xia RX, Qin L (2013) Selective pressure dominates the synonymous codon usage in Parvoviridae. Virus Genes 46(1):10–19
Zhang Z, Dai W, Wang Y, Lu C, Fan H (2013) Analysis of synonymous codon usage patterns in torque teno sus virus 1 (TTSuV1). Arch Virol 158(1):145–154
Zhang Z, Dai W, Dai D (2013) Synonymous codon usage in TTSuV2: analysis and comparison with TTSuV1. PLoS ONE 8:e81469
Cheng YT, Zhang L, He SY (2019) Plant-microbe interactions facing environmental challenge. Cell Host Microbe 26(2):183–192
Thakur MP, Van der Putten WH, Cobben MM, van Kleunen M, Geisen S (2019) Microbial invasions in terrestrial ecosystems. Nat Rev Microbiol 17(10):621–631
Biswas K, Palchoudhury S, Chakraborty P, Bhattacharyya UK, Ghosh DK, Debnath P, Lee RF (2019) Codon usage bias analysis of Citrus tristeza virus: higher codon adaptation to Citrus reticulata host. Viruses 11(4):331
Deng Y, de Lima HF, Kalfon J, Chu D, Von Der Haar T (2020) Hidden patterns of codon usage bias across kingdoms. J R Soc Interface 17(163):20190819
Moratorio G, Iriarte A, Moreno P, Musto H, Cristina J (2013) A detailed comparative analysis on the overall codon usage patterns in West Nile virus. Infect Genet Evol 14:396–400
Shackelton LA, Parrish CR, Holmes EC (2006) Evolutionary basis of codon usage and nucleotide composition bias in vertebrate DNA viruses. J Mol Evol 62(5):551–563
Zoya S, Shama P (2021) Physicochemical attributes of hepatitis E virus ORF4: a general perspective. Indian J Health Sci Care 8(2):110–118
Shafat Z, Tazeen A, Ahmed M, Parvez MK, Parveen S (2021) Understanding Hepatitis E viruses by exploring the structural and functional properties of ORF4. Netw Biol 11(4):274–276
Baha S, Behloul N, Liu Z, Wei W, Shi R, Meng J (2019) Comprehensive analysis of genetic and evolutionary features of the hepatitis E virus. BMC Genomics 20(1):1–16
Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl Acids Symp Ser 41:95–98
Uddin A, Chakraborty S (2019) Codon usage pattern of genes involved in central nervous system. Mol Neurobiol 56(3):1737–1748
Barbhuiya RI, Uddin A, Chakraborty S (2019) Compositional properties and codon usage pattern of mitochondrial ATP gene in different classes of Arthropoda. Genetica 147:231–248
Ikemura T (1985) Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol 2(1):13–34
Primadharsini PP, Nagashima S, Okamoto H (2019) Genetic variability and evolution of hepatitis E virus. Viruses 11(5):456
Oliveira-Filho EF, König M, Thiel HJ (2013) Genetic variability of HEV isolates: inconsistencies of current classification. Vet Microbiol 165(1–2):148–154
Shafat Z, Ahmed A, Parvez MK, Parveen S (2021) Role of ORF4 in Hepatitis E virus regulation: analysis of intrinsically disordered regions. J Proteins Proteomics 12(4):289–306
Shafat Z, Ahmed A, Parvez MK, Parveen S (2021) Sequence to structure analysis of the ORF4 protein from Hepatitis E virus. Bioinformation 17(9):818
Zhou Z, Dang Y, Zhou M, Li L, Yu CH, Fu J, Liu Y (2016) Codon usage is an important determinant of gene expression levels largely through its effects on transcription. Proc Natl Acad Sci USA 113(41):E6117–E6125
Payne BL, Alvarez-Ponce D (2019) Codon usage differences among genes expressed in different tissues of Drosophila melanogaster. Genome Biol Evol 11(4):1054–1065
Roy A, Van Staden J (2019) Comprehensive profiling of codon usage signatures and codon context variations in the genus Ustilago. World J Microbiol Biotechnol 35(8):118
Badet T, Peyraud R, Mbengue M, Navaud O, Derbyshire M, Oliver RP, Raffaele S (2017) Codon optimization underpins generalist parasitism in fungi. Elife 6:e22472
Sur S, Sen A, Bothra AK (2007) Mutational drift prevails over translational efficiency in Frankia nif operons. Indian J Biotechnol 6(3):321–332
Sahoo S, Das SS, Rakshit R (2019) Codon usage pattern and predicted gene expression in Arabidopsis thaliana. Gene X 2:100012
Jenkins GM, Holmes EC (2003) The extent of codon usage bias in human RNA viruses and its evolutionary origin. Virus Res 92(1):1–7
Auewarakul P (2005) Composition bias and genome polarity of RNA viruses. Virus Res 109(1):33–37
Bouquet J, Cherel P, Pavio N (2012) Genetic characterization and codon usage bias of full-length hepatitis E virus sequences shed new lights on genotypic distribution, host restriction and genome evolution. Infect Genet Evol 12(8):1842–1853
Hu JS, Wang QQ, Zhang J, Chen HT, Xu ZW, Zhu L, Ding YZ, Ma LN, Xu K, Gu YX, Liu YS (2011) The characteristic of codon usage pattern and its evolution of hepatitis C virus. Infect Genet Evol 11:2098–2102
Liu YS, Zhou JH, Chen HT, Ma LN, Pejsak Z et al (2011) The characteristics of the synonymous codon usage in enterovirus 71 virus and the effects of host on the virus in codon usage pattern. Infect Genet Evol 11(5):1168–1173
Shafat Z, Ahmed A, Parvez MK, Parveen S (2022) Decoding the codon usage patterns in Y-domain region of Hepatitis E viruses. J Genet Eng Biotechnol 20:56
Kramer EB, Farabaugh PJ (2006) The frequency of translational misreading errors in E. coli is largely determined by tRNA competition. RNA 13(1):87–96
Sharp PM, Li WH (1987) The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications. Nucl Acids Res 15(3):1281–1295
Buhr F, Jha S, Thommen M, Mittelstaet J, Kutz F, Schwalbe H, Rodnina MV, Komar AA (2016) Synonymous codons direct cotranslational folding toward different protein conformations. Mol Cell 61(3):341–351
Jacobs WM, Shakhnovich EI (2017) Evidence of evolutionary selection for cotranslational folding. Proc Natl Acad Sci USA 114(43):11434–11439
Seligmann H, Warthi G (2017) Genetic code optimization for cotranslational protein folding: codon directional asymmetry correlates with antiparallel betasheets, tRNA synthetase classes. Comput Struct Biotechnol J 15:412–424
The authors would like to acknowledge the support of Maulana Azad National Fellowship (MANF), University Grants Commission (UGC), Council of Scientific and Industrial Research (CSIR) (37(1697)17/EMR-II) and Central Council for Research in Unani Medicine (CCRUM), Ministry of Ayurveda, Yoga and Naturopathy, Unani, Siddha and Homeopathy (AYUSH) (F.No.3-63/2019-CCRUM/Tech), Government of India.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Nucleotide composition analysis of HEV host Human in ORF4 coding sequences.
Nucleotide composition analysis of HEV host Rat in ORF4 coding sequences.
Nucleotide composition analysis of HEV host Ferret in ORF4 coding sequences.
RSCU values of the HEV host Human in ORF4 coding sequences.
RSCU values of the HEV host Rat in ORF4 coding sequences.
RSCU values of the HEV host Ferret in ORF4 coding sequences.
Cite this article
Shafat, Z., Ahmed, A., Parvez, M.K. et al. Analysis of codon usage patterns in open reading frame 4 of hepatitis E viruses. Beni-Suef Univ J Basic Appl Sci 11, 65 (2022). https://doi.org/10.1186/s43088-022-00244-w