The results showed that scaffold_3347 (3926 bp) was similar to Panax ginseng AJK30629.1, a protein of 444 amino acids. In parallel, blastN of the de novo assembled scaffolds was carried out against NCBI’s non-redundant nucleotide (NT) database. Scaffold_3346 (1244 bp) showed similarity to the Olea europaea var. sylvestris squalene synthase-like (LOC111412627) mRNA of 1204 bp. The CLC gap-closed de novo assembly was then searched for similarity against the transcriptome data. Of the 3347 scaffolds, three produced significant alignments against the CDS representing the GS-SQS gene. Scaffold_3347 and scaffold_3346 had bit scores of 281 and 171 and E values of 3e-76 and 6e-43, respectively. Although scaffold_3347 did not cover the complete CDS, it covered the middle portion of the CDS.
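The screening of scaffolds by E value and bit score described above can be illustrated with a minimal Python sketch. It assumes the BLAST results were exported in the standard 12-column tabular layout (`-outfmt 6`); the subject name `GS-SQS_CDS`, the third row (`scaffold_0001`), the alignment coordinates, the identity values, and the `1e-10` cutoff are all hypothetical placeholders. Only the two bit scores and E values are taken from the text, and this is not necessarily the procedure used in the study.

```python
import csv
import io

# Column order of BLAST tabular output (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def significant_hits(handle, max_evalue=1e-10):
    """Return (query, bit score, E value) for hits at or below max_evalue,
    sorted by decreasing bit score. The cutoff is an assumed value."""
    reader = csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t")
    hits = []
    for row in reader:
        evalue = float(row["evalue"])
        bits = float(row["bitscore"])
        if evalue <= max_evalue:
            hits.append((row["qseqid"], bits, evalue))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy table: only the E values and bit scores of the first two rows come
# from the text; every other field is a made-up placeholder.
toy_rows = [
    ["scaffold_3347", "GS-SQS_CDS", "95.0", "200", "10", "0",
     "1", "200", "300", "499", "3e-76", "281"],
    ["scaffold_3346", "GS-SQS_CDS", "93.0", "120", "8", "0",
     "1", "120", "600", "719", "6e-43", "171"],
    ["scaffold_0001", "GS-SQS_CDS", "80.0", "30", "6", "0",
     "1", "30", "10", "39", "0.5", "25.0"],
]
toy_table = "\n".join("\t".join(row) for row in toy_rows)

hits = significant_hits(io.StringIO(toy_table))
for query, bits, evalue in hits:
    print(f"{query}\tbit score = {bits:g}\tE value = {evalue:g}")
```

Sorting by bit score places the best-supported scaffold first; the weak placeholder hit is dropped by the E-value cutoff.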
Domain search revealed that the scaffold_3347 sequence representing GS-SQS contained a farnesyl-diphosphate farnesyltransferase domain. A total of twelve conserved sites representing five different genes were found. The largest number of conserved sites fell under the farnesyl-diphosphate farnesyltransferase gene, represented by accession TIGR01559. This family is related to phytoene synthases. The C-terminal predicted transmembrane region, which is absent in archaeal homologs, is not included in this model [23]. The scaffold had three conserved sites for the gene Trans-Isoprenyl Diphosphate Synthases (Trans_IPPS), represented by accession cd00683. Trans_IPPS carries out the head-to-head (HH) (1′-1) condensation. This conserved domain encompasses two genes, viz., squalene synthases and phytoene synthases [23]. The conserved residues mediate binding of prenyl phosphates. Squalene production is a two-step enzymatic reaction. First, squalene synthase forms a stable intermediate, cyclopropylcarbinyl diphosphate, from two molecules of FPP. Squalene is then produced from this intermediate through heterolysis, isomerization, and reduction with NADPH. Phytoene, a precursor of beta-carotene, is produced by phytoene synthase (CrtB) through condensation of two molecules of geranylgeranyl diphosphate. These enzymes, which are widely present across eukaryotes, bacteria, and archaea, are responsible for the biosynthesis of many triterpene and tetraterpene precursors. A chain of these enzymes produces the triterpenes and tetraterpenes in plants. Triterpenoid alkaloids and steroids are further produced from these triterpenes and tetraterpenes. Another two conserved sites belong to squalene/phytoene synthase represented by pfam00494 and a phytoene/squalene synthetase represented by ERG9 COG1562 each.
After the scaffold was aligned with the CDS representing GS-SQS, analysis of the nucleotide composition of the predicted introns revealed an A+T content of more than 63%. Both introns had an AG dinucleotide at the 3′ splice site, which facilitates the second step of splicing. The first intron began with GT, whereas the second intron began with TT. Both introns also had a common conserved branch point, fitting the requirement for the spliceosome to act upon them. To amplify the intronic region, primers were designed from the regions flanking the introns on the scaffold encoding GS-SQS. The housekeeping gene EFTU was successfully amplified from both cDNA and genomic DNA templates. In contrast, these flanking primers amplified the intronic region only when genomic DNA was used as the template and not when cDNA was used, which confirmed the presence of the two introns in the GS-SQS gene of Gymnema sylvestre R. Br.
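The intron checks described above (A+T content and the dinucleotides at the 5′ and 3′ boundaries) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the two toy sequences are invented to mimic a GT...AG and a TT...AG intron; they are not the GS-SQS intron sequences.

```python
from typing import Tuple


def at_content(seq: str) -> float:
    """Return the percentage of A and T bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


def splice_sites(intron: str) -> Tuple[str, str]:
    """Return the first two bases (5' end) and last two bases (3' end)."""
    intron = intron.upper()
    if len(intron) < 4:
        raise ValueError("intron too short to hold both boundary dinucleotides")
    return intron[:2], intron[-2:]


def is_canonical(intron: str) -> bool:
    """True if the intron follows the canonical GT...AG boundary rule."""
    donor, acceptor = splice_sites(intron)
    return donor == "GT" and acceptor == "AG"


# Invented toy introns (NOT the sequences from this study).
toy_introns = {
    "intron_1": "GTAAGTTTATTTCATTAATTTGCAG",
    "intron_2": "TTAAGTATTTTCTAATTTTTGTAG",
}

for name, seq in toy_introns.items():
    donor, acceptor = splice_sites(seq)
    print(
        f"{name}: A+T = {at_content(seq):.1f}%, "
        f"5' = {donor}, 3' = {acceptor}, "
        f"canonical GT-AG = {is_canonical(seq)}"
    )
```

Applied to real intron sequences extracted from the scaffold-to-CDS alignment, the same functions would report whether each intron exceeds a chosen A+T threshold and which boundary dinucleotides it carries.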
Introns, an important feature of eukaryotic genes, are usually non-coding sequences that are removed from pre-mRNA [24]. In general, the boundary sequences of introns are conserved, with GU at the 5′ end and AG at the 3′ end, possibly because these sequences are important for intron splicing in pre-mRNA [25]. Introns are classified into several types. The genes of chloroplasts, mitochondria, and bacteria are reported to have introns [26, 27]. The type I intron is the most frequently occurring type and is reported to be present in the majority of eukaryotic nuclear genes. Because introns are preserved during evolution, they are important in genomic studies [24, 28]. They may also perform cellular functions, such as regulating gene expression and increasing protein diversity through alternative splicing [25, 29]. The sequence of the whole intronic region is not conserved, and mutations therefore accumulate more easily in such regions [30]. Intron size varies widely, from more than dozens of kilobase pairs (kbp) to less than 10 bp. In Arabidopsis, as reported by the Arabidopsis Genome Initiative (2000), the majority of introns are small, with a size of a few hundred bp. The smallest exon in Arabidopsis was found to be 1 bp [31].
Single SQS genes exist in Chlorophytum borivilianum, Euphorbia tirucalli, Euphorbia pekinensis, Lotus japonicus, Oryza sativa, and Taxus cuspidata. Two paralogs exist in Arabidopsis thaliana, Glycyrrhiza glabra, Glycine max, Malus domestica, Nicotiana tabacum, Salvia miltiorrhiza Bunge, and Withania somnifera. In A. thaliana, two SQSs, SQS1 and SQS2, have been reported. SQS1 was broadly expressed in all tissues involved in plant development, whereas SQS2 was expressed mainly in the hypocotyl of seedlings and in the vascular tissue of the cotyledon and leaf petiole. Recombinant SQS2 did not synthesize squalene from FPP even in the presence of NADPH and Mg2+ or Mn2+, whereas an equivalent preparation of SQS1 produced squalene under the same conditions; SQS1 can therefore be regarded as the functional SQS in Arabidopsis thaliana. Three SQS paralogs exist in Panax ginseng [32, 33]. The three P. ginseng SQS genes, SS1, SS2, and SS3, were each able to convert the yeast erg9 mutant to ergosterol prototrophy despite their sequence divergence from yeast. Similarly, the two SQSs of Glycine max, GmSQS1 and GmSQS2, were able to convert the sterol-auxotrophic yeast erg9 mutant to sterol prototrophy. Overexpression of Glycine max GmSQS1 also raised sterol levels in Arabidopsis seeds. Likewise, for W. somnifera, which possesses two SQSs, WsSQS1 and WsSQS2, the cDNAs were investigated, and recombinant expression and preliminary enzyme activity were reported [16, 33]. Overexpression of SQS genes also increased the accumulation of phytosterols and triterpenoid compounds in Bupleurum falcatum, Eleutherococcus senticosus, Panax ginseng, Solanum chacoense, and Withania somnifera [16].