
Rational in silico drug design of HIV-RT inhibitors through G-QSAR and molecular docking study of 4-arylthio and 4-aryloxy-3-iodopyridine-2(1-H)-one derivative



Human immunodeficiency virus infection and acquired immune deficiency syndrome (HIV/AIDS) is a spectrum of conditions caused by infection with the human immunodeficiency virus (HIV). Antiretroviral therapy (ART) against HIV infection offers the promise of controlling disease progression and prolonging the survival of HIV-infected patients. Reverse transcriptase (RT) inhibitors remain the cornerstone of the drug regimen to treat AIDS. In this direction, by using group-based QSAR study (G-QSAR), identification of the structural need for the development of lead structure with reverse transcriptase inhibition on 97 reported structures was carried out. Docking analysis was performed further and suggested the structural properties required for binding affinity with the receptor. The molecules in the data set were fragmented into six (R1, R2, R3, R4, R5, and R6) by applying the fragmentation pattern. Three G-QSAR models were selected based on the statistical significance of the model. The molecular docking study was performed to explain the structural properties required for the design of potent HIV-RT inhibitors.


The statistically validated QSAR models reveal the presence of higher hydrophobic groups containing single-bonded –Br atom, 2 aromatic bonded –NH group with less electronegativity, and entropic interaction fields at R2 essential for better anti-HIV activity. The presence of a lipophilic group at R3, oxygen and sulfur connected with two aromatic bonds at R4, and –CH3 group at R5 was fruitful for reverse transcriptase inhibition. Docking studies of the selected inhibitors with the active site of reverse transcriptase enzyme showed hydrogen bond, Van der Waal’s, charge, aromatic, and π–π interactions with residues present at the active site.


The results of the generated models provide significant site-specific insight into the structural requirements for reverse transcriptase inhibition during the design and development of novel anti-HIV compounds. Molecular docking study revealed the binding interaction between the ligand and the receptor which gave insight towards the structure-based design for the discovery of more potent compounds with better activity against HIV infection.


Human immunodeficiency virus (HIV) attacks the body’s immune system, specifically the CD4 cells (T cells), which help the immune system fight off infections. HIV reduces the number of CD4 cells in the body, making the person more likely to get other infections or infection-related cancers [1, 2]. Over time, HIV can destroy so many of these cells that the body cannot fight off infections, which leads to acquired immunodeficiency syndrome (AIDS), the last stage of HIV infection [3]. HIV/AIDS is one of the world’s most significant public health challenges, particularly in low- and middle-income countries. In 2017, approximately 36.9 million people (35.1 million adults) were living with HIV and 1.8 million people became newly infected, globally. Nearly 1 million people died from AIDS-related illnesses in 2017 [4]. The treatment used for HIV is called antiretroviral therapy (ART) [5, 6]. Various classes of antiretroviral drugs are available on the market, such as entry or fusion inhibitors, nucleoside or non-nucleoside reverse transcriptase inhibitors (NRTI/NNRTI), integrase inhibitors (IN), protease inhibitors (PI), and maturation inhibitors [7]. The resistance of the virus to the available antiretroviral drugs is the biggest challenge for ART, and the discovery of new anti-HIV agents to overcome this resistance is continually required [8]. HIV is a retrovirus: it carries the enzyme reverse transcriptase (RT), which possesses both RNA-dependent DNA polymerase (RDDP) and ribonuclease-H (RNase H) activities. These work in tandem to convert the viral genomic single-stranded RNA to double-stranded DNA by reverse transcription; like retrotransposons (mobile genetic elements), this DNA is then integrated into the DNA of the infected host cell to cause disease [9,10,11]. 
Hence, both nucleoside reverse transcriptase inhibitors (NRTIs) and non-nucleoside reverse transcriptase inhibitors (NNRTIs) are active against HIV: they inhibit reverse transcription, which prevents the formation of DNA from viral RNA (Fig. 1) and therefore its insertion into the host DNA sequence [12, 13].

Fig. 1

Mechanism of reverse transcriptase inhibitors in HIV

Traditional drug development methods are based on random screening, which has the major disadvantages of being lengthy, expensive, and intellectually demanding. To overcome these disadvantages, computer-aided drug development (CADD) processes have emerged in the recent decade [14]. These methods run at a lower cost and give a viable option for the screening of potential drug candidates. A number of methodologies are involved in CADD; among these, the quantitative structure-activity relationship (QSAR) is an efficient chemoinformatic tool that aims to identify and quantify the relationship between molecular structure and certain physicochemical properties for the development of predictive models [15]. 2D and 3D QSAR models each have their own merits and demerits. 2D QSAR models only correlate the physicochemical properties of the molecules with their biological activity; they do not specify the site at which modification is required to enhance activity. For this purpose, 3D-QSAR models are suitable, as they predict the activity of the compounds based on 3D grid points generated around the aligned set of molecules. One of the major limitations of the 3D-QSAR method is its dependency on the molecular alignment and on the conformers chosen for the alignment. This becomes crucial when information about the conformation is absent or when the molecular framework is not rigid. Hence, there is a need for a QSAR method that allows flexibility to study molecular sites of interest and does not depend upon conformational analysis and alignment of the molecules to provide information about the sites and nature of interaction required for activity. Fragment-based G-QSAR has shown promise in current drug discovery and lead optimization. This method involves the calculation and evaluation of descriptors only for the substituent groups or molecular fragments instead of the whole molecule [16]. 
The biggest advantage of this method is it can be applied for both congeneric and non-congeneric series. G-QSAR method is helpful over 2D- and 3D-QSAR methods because this method provides useful information about the significant substitution sites, their chemical nature, and overall interaction that affects the therapeutic activity of the compounds [17, 18]. The focus of the present study was to develop fragment-based G-QSAR model correlating the biological activity of the molecular fragments with the 2D fragment-based descriptors and molecular docking study to interpret the structural requirement and the mechanism of the binding interaction between the ligand and the receptor on a congeneric series of 97 compounds taken from reported literature. The information derived from such predictive models could be utilized for accurate prediction of biological activities (dependent variables) of molecular fragments based on their chemical structure and properties (independent variables) that can be utilized as the building blocks for designing of novel anti-HIV molecules before synthesis and experimental testing [19, 20]. The result of the docking study can improve the binding process of ligands with its receptor and provide insights into the structural features related to the activities of the new drug compounds.


Data set

To perform the present computational study, a data set of 97 compounds with reported IC50 values was taken from the literature [21]. The selected compounds shared the same activity and assay procedure, with significant variations in their structure and potency. The inhibitory potencies (IC50) of the compounds in the data set range from 0.63 to 19.5 nM and were converted to pIC50 using Eq. 1:

$$ {\mathrm{pIC}}_{50}=-{\log}_{10}\left({\mathrm{IC}}_{50}\right) $$
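
As an illustration of Eq. 1, the conversion can be sketched in a few lines of Python. The function name `pic50` and the constant `NANOMOLAR` are hypothetical, and treating the reported potencies as nanomolar concentrations converted to mol/L before taking the logarithm is an assumption of this sketch, not something the text specifies.

```python
import math

NANOMOLAR = 1e-9  # assumed unit factor (nM -> mol/L); check against the assay data


def pic50(ic50_molar: float) -> float:
    """Return pIC50 = -log10(IC50) for an IC50 expressed in mol/L (Eq. 1)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


# End points of the reported potency range, interpreted here as nanomolar.
for ic50_nm in (0.63, 19.5):
    print(f"IC50 = {ic50_nm} nM -> pIC50 = {pic50(ic50_nm * NANOMOLAR):.2f}")
```

Because the transform is a negative logarithm, more potent compounds (lower IC50) receive higher pIC50 values.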

The chemical structure of the congeneric dataset was prepared by using MarvinSketch. The conversion of 2D structures to 3D structures and energy minimization of 3D compounds were performed by the help of force field batch minimization modules of the VLifeMDS software [22]. Energy minimization is performed to optimize the molecules up to their lowest stable state of energy. A template which is the representative of the entire molecules of the dataset under study was drawn with the presence of a dummy atom at the substitution sites. The template has 6 substitution sites depicted as R1–R6 in Fig. 2. The study was performed by using VLife MDS version 4.6 from VLife Sciences Technologies Pvt Ltd, Pune, India.

Fig. 2

Molecular template of 4-arylthio and 4-aryloxy-3-iodopyridine-2(1H)-one derivatives utilized for fragmentation pattern (R1–R6 are substitution sites)

Calculation of descriptors for G-QSAR modeling

This step was performed using the G-QSAR module of VLife MDS. The common scaffold (Fig. 2) was used as a template for the generation of fragment-based G-QSAR models. The optimized molecules were imported into the QSAR worksheet, and their pIC50 values were entered manually, followed by the calculation of various physicochemical descriptors for the different substitution sites of the compounds [23]. A total of 321 physicochemical descriptors, from classes such as individual physicochemical descriptors (e.g., molecular weight, hydrogen bond donors and acceptors), retention index (chi), atomic valence connectivity index (chiv), chain path count, cluster, path cluster, kappa, estate numbers, estate contributors, and information theory index, were calculated for the groups present at each substitution site in each of the 97 molecules presented in Table 1.

Table 1 Chemical structure and observed and predicted activities (pIC50) for the training and test set compounds

Data selection and building of G-QSAR model

The data set was divided into a training set of 75 compounds and a test set of 22 compounds using the sphere exclusion algorithm, so that the activities of the selected test set are uniformly distributed across the activity range of the compounds [19]. Unicolumn statistics were computed for both the training and test sets to check the spread of the data; the results are presented in Table 2. They show that the test set is interpolative, i.e., the activity range of the test set lies within the activity range of the training set. The mean and standard deviation of the training and test sets provide insight into the relative difference of the means and the point density distribution of the two sets. The mean activity of the test set is higher than that of the training set, which indicates that the test set contains relatively more active molecules than inactive ones.

Table 2 Unicolumn statistics of activity (pIC50) for training and test set compounds for G-QSAR
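
The unicolumn check described above can be sketched as follows. This is a minimal illustration, not the VLife MDS implementation; the function names and the pIC50 values are hypothetical placeholders, since the actual activities are listed in Table 1.

```python
from statistics import mean, stdev


def unicolumn_stats(values):
    """Summarise one activity column: mean, max, min, and sample standard deviation."""
    return {
        "mean": mean(values),
        "max": max(values),
        "min": min(values),
        "stdev": stdev(values),
    }


def is_interpolative(train, test):
    """True when the test-set activity range lies inside the training-set range."""
    return min(train) <= min(test) and max(test) <= max(train)


# Hypothetical pIC50 values, for illustration only.
train_pic50 = [7.8, 8.1, 8.4, 8.9, 9.2]
test_pic50 = [8.0, 8.6, 9.0]

print(unicolumn_stats(train_pic50))
print(unicolumn_stats(test_pic50))
print("interpolative:", is_interpolative(train_pic50, test_pic50))
```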

For the building of G-QSAR models, a simulated annealing algorithm (SA) was utilized. After that, multiple G-QSAR models were generated using multiple linear regression (MLR), partial least squares regression (PLSR), and principal component regression (PCR).
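
To make the variable-selection step more concrete, the following is a generic sketch of simulated annealing over descriptor subsets. It is not the VLife MDS algorithm: the function `anneal_subset`, the swap move, the geometric cooling schedule, and the toy scoring function are all assumptions chosen for illustration. In practice, `score` would fit a regression model (MLR, PLS, or PCR) on the chosen descriptors and return a fitness such as q2.

```python
import math
import random


def anneal_subset(n_features, k, score, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Search for a k-feature subset that maximises score(subset)."""
    if not 0 < k < n_features:
        raise ValueError("k must satisfy 0 < k < n_features")
    rng = random.Random(seed)
    current = set(rng.sample(range(n_features), k))
    current_score = score(current)
    best, best_score = set(current), current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Propose swapping one selected feature for one unselected feature.
        out_f = rng.choice(sorted(current))
        in_f = rng.choice(sorted(set(range(n_features)) - current))
        candidate = (current - {out_f}) | {in_f}
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = set(current), current_score
    return best, best_score


# Toy score: each hypothetical descriptor has a fixed "usefulness" weight.
weights = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7]


def toy_score(subset):
    return sum(weights[i] for i in subset)


best, best_score = anneal_subset(len(weights), 3, toy_score)
print(sorted(best), round(best_score, 3))
```

Accepting some worse moves at high temperature lets the search leave local optima, which is the reason for preferring annealing over a purely greedy forward selection.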

Validation of the developed G-QSAR model

For validation of the developed G-QSAR models, the data set is divided into two sets as training and test sets. This division is based on the substitution groups and the inhibition of compounds. The training set is employed to produce the QSAR model, and the test set is used to validate the quality of the developed models [24]. The statistical parameters of the developed models and internal and external validations are adopted for testing the fitness, stability, and predictive ability of the QSAR models [17]. The models are validated by considering many statistical parameters such as the number of compounds in regression (n), the number of variables (k), degree of freedom, squared correlation coefficient (r2), cross-validated correlation coefficient (q2), Fisher’s value (F test), and r2 for the external test set, (pred_r2) for external validation. For the internal predictive ability of the model, leave-one-out (LOO) method is used, showed as the value of q2 (cross-validated explained variance). External validation of the developed QSAR models is performed by measuring the predictive power of the current models on the external test set by calculating the pred_r2 value as given in Eq. 2, which gives the statistical correlation between predicted and actual activities of the test set compounds

$$ \mathrm{pred\_r}^2 = 1 - \frac{\sum_i {\left(y_i - \hat{y}_i\right)}^2}{\sum_i {\left(y_i - y_{\mathrm{mean}}\right)}^2} $$

where \( y_i \), \( \hat{y}_i \), and \( y_{\mathrm{mean}} \) are the actual activity of the ith molecule in the test set, the predicted activity of the ith molecule in the test set, and the average activity of all the molecules in the test set, respectively.
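
A minimal sketch of Eq. 2 in Python is given below. The function name and the numeric values are hypothetical, and the denominator uses the mean of the supplied test-set activities, following the definition stated in the text.

```python
def pred_r2(actual, predicted):
    """External predictivity (Eq. 2): 1 - SS_residual / SS_total over the test set."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    y_mean = sum(actual) / len(actual)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    ss_tot = sum((y - y_mean) ** 2 for y in actual)
    if ss_tot == 0:
        raise ValueError("actual activities must not all be identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical test-set activities and model predictions.
observed = [8.0, 8.6, 9.0, 7.9]
predicted = [8.1, 8.4, 8.9, 8.0]
print(round(pred_r2(observed, predicted), 3))
```

A value of 1 means perfect prediction, 0 means the model does no better than predicting the mean, and negative values are possible for very poor models.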

Internal validation of the developed QSAR models is performed by calculating the q2 value as given in Eq. 3, which gives the statistical correlation between predicted and actual activities of the training set compounds.

$$ {q}^2 = 1 - \frac{\sum_i {\left(y_i - \hat{y}_i\right)}^2}{\sum_i {\left(y_i - y_{\mathrm{mean}}\right)}^2} $$

where \( y_i \), \( \hat{y}_i \), and \( y_{\mathrm{mean}} \) are the actual activity of the ith molecule in the training set, the predicted activity of the ith molecule in the training set, and the average activity of all the molecules in the training set, respectively.
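
The leave-one-out procedure behind Eq. 3 can be sketched as follows. To keep the example short and dependency-free, it assumes a single descriptor and an ordinary least-squares line; the real models use several descriptors, and the function names and data here are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept for one descriptor."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    return slope, y_mean - slope * x_mean


def loo_q2(xs, ys):
    """Leave-one-out q2 (Eq. 3): each compound is predicted by a model fitted without it."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least 3 paired observations")
    y_mean = sum(ys) / len(ys)
    press = 0.0  # predictive residual sum of squares
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and activities for a small training set.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [7.9, 8.2, 8.4, 8.9, 9.1]
print(round(loo_q2(descriptor, activity), 3))
```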

The generated models were considered to have significant predictivity when the squared correlation coefficient (r2) between the physicochemical descriptors (independent variables) and the activity (dependent variable) was over 0.6. A developed model was considered to possess significant internal and external predictivity when the cross-validated correlation coefficient of the leave-one-out method (q2) was > 0.6, the correlation coefficient for the external test set (pred_r2) was > 0.5, and the F test value was high.
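
These acceptance thresholds can be written as a simple check. The sketch below applies them to the r2, q2, and pred_r2 values reported for the three selected models in this article (expressed as fractions rather than percentages); the function name is hypothetical, and the F test is left out because the text gives no numeric cut-off for it.

```python
def passes_criteria(r2, q2, pred_r2, r2_min=0.6, q2_min=0.6, pred_r2_min=0.5):
    """Apply the stated thresholds: r2 > 0.6, q2 > 0.6, pred_r2 > 0.5."""
    return r2 > r2_min and q2 > q2_min and pred_r2 > pred_r2_min


# (r2, q2, pred_r2) as reported for the three selected models.
reported = {
    "model-I (SA-MLR)": (0.7311, 0.6874, 0.7325),
    "model-II (SA-PLS)": (0.8524, 0.6925, 0.7421),
    "model-III (SA-PCR)": (0.7124, 0.6421, 0.7025),
}

for name, (r2, q2, p_r2) in reported.items():
    print(name, "passes" if passes_criteria(r2, q2, p_r2) else "fails")
```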

Molecular docking study

A molecular docking study is a computational approach for searching a ligand that can fit both geometrically and energetically into the binding site of a target to show biological activity [25]. Docking study helps to predict drug/ligand or receptor/protein interactions by identifying the suitable active sites in the protein, getting the best geometry of ligand-receptor complex, and calculating the energy of interaction for different ligands to design more effective ligands with good binding affinity against RT enzyme [26].

Protein preparation

For protein preparation, the three-dimensional crystallographic structure and coordinates of the target protein (PDB-ID: 1S6Q), with a resolution of 3.0 Å, were retrieved from the RCSB PDB database. The protein was prepared by deleting the bound ligand, inserting missing atoms in incomplete residues, deleting alternate conformations, and modeling the missing loop regions with the Biopredicta (homology modeling) module of the VLife MDS software [27]. The final 3D structure of RT was evaluated using the Biopredicta module; the Ramachandran plot showed that 87.95% of the residues lie in the most favored regions (Fig. 3). Before docking, bond orders of the ligands were assigned, hydrogen atoms were added, and water molecules not involved in the interaction were deleted.

Fig. 3

Ramachandran plot of RT enzyme (PDB ID = 1S6Q)

Protein-ligand docking

The molecular docking study was performed using the Biopredicta tool of the VLife MDS software, version 4.6. The optimized and symmetrical crystalline protein structure obtained from the Protein Preparation Wizard was used for the docking study. Energy minimization of the crystal structure was carried out to relieve the steric clashes among the residues caused by the addition of hydrogen atoms [28]. Crystallographic water molecules (water molecules without H bonds) were deleted. Hydrogen atoms corresponding to pH 6.8 were added, considering the appropriate ionization states for both the acidic and basic amino acid residues. A grid box was generated at the centroid of the active site for docking, and the active site was defined with a 10 Å radius around the ligand present in the crystal structure [29]. To test the docking parameters, all compounds were docked into the catalytic pocket of the RT enzyme (PDB-ID: 1S6Q). Finally, the best-docked structure was selected using the dock score function [30].
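
The geometric definition of the active site can be illustrated with a short sketch. It assumes that the grid centre is the centroid of the co-crystallized ligand atoms and that a protein atom belongs to the site when it lies within 10 Å of that centre; the docking software may define the site differently (for example, by distance to any ligand atom). All coordinates and atom labels below are made up.

```python
import math


def centroid(coords):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one coordinate")
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def atoms_within(center, atoms, radius=10.0):
    """Names of atoms whose distance to the centre is at most radius (in Å)."""
    return [name for name, xyz in atoms if math.dist(center, xyz) <= radius]


# Hypothetical ligand atom coordinates and labelled protein atoms (Å).
ligand_atoms = [(10.0, 12.0, 8.0), (11.5, 12.5, 8.5), (12.0, 11.0, 9.0)]
protein_atoms = [
    ("residue_A:CA", (14.0, 13.0, 7.0)),
    ("residue_B:CA", (30.0, 25.0, 20.0)),
    ("residue_C:CA", (9.0, 5.0, 10.0)),
]

site_center = centroid(ligand_atoms)
print(site_center)
print(atoms_within(site_center, protein_atoms))
```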


Development of G-QSAR model

In the present work, effective G-QSAR models for 97 molecules belonging to a congeneric series with variability in biological activity were generated by calculating fragment-based molecular descriptors. Using the VLife MDS software, a total of 984 descriptors were calculated; after removal of invariable columns, 321 descriptors were used to generate the G-QSAR models. The sphere exclusion method with a dissimilarity value of 1 resulted in a training set of 75 and a test set of 22 compounds. The test set was selected such that its maximum pIC50 value was less than or equal to that of the training set and its minimum pIC50 value was greater than or equal to that of the training set, so that the test set lies within the maximum-minimum range of the training set. Multiple G-QSAR models were built using a simulated annealing algorithm (SA) coupled with multiple linear regression (MLR), partial least squares regression (PLS), and principal component regression (PCR). Several models were generated, and the best three were selected based on statistical values such as r2, q2, pred_r2, F test, and standard error. The statistical parameters of each model are shown in Table 3.

Table 3 Statistical parameters of the developed G-QSAR models

The fitness plot between actual and predicted activity for training and test set compounds provides an idea about how well these models are trained and how well they predict the activity of the external test set. Further, the distribution curve of actual and predicted activity for training and test set compounds of the developed models is depicting closeness between the actual and predicted activity of the compounds for training and test sets. Descriptors like SK-hydrophobic area, SaaN-count, Chi-3-cluster, SscHE-index, SaaNE-index, Delta epsilon-C, SsBr count, SaaNH count, smr, SssScount, SaaOcount, SsCH3count, 1-path count, Id-average, Chi v3, and Chi v2 are showing contribution on the respective substitution sites.

Interpretation of model-I (SA-MLR)

The G-QSAR model-I was obtained by using a simulated annealing algorithm (SA) coupled with multiple linear regressions (MLR). The correlation equations between activity (pIC50) and the selected parameters are expressed by Eq. 4;

$$ \begin{aligned} {\mathrm{pIC}}_{50} = {} & 0.4685 + 0.5031\left(\pm 0.0023\right)\ {\mathrm{R}}_2\text{-}\mathrm{SKHydrophobicArea} \\ & - 0.2021\left(\pm 0.0757\right)\ {\mathrm{R}}_2\text{-}\mathrm{SsSHE}\text{-}\mathrm{index} - 0.0381\left(\pm 0.0090\right)\ {\mathrm{R}}_2\text{-}\mathrm{SaaNE}\text{-}\mathrm{index} \\ & + 0.4132\left(\pm 0.0003\right)\ {\mathrm{R}}_4\text{-}\mathrm{H}\text{-}\mathrm{acceptor}\ \mathrm{Count} + 0.3217\left(\pm 0.0020\right)\ {\mathrm{R}}_3\text{-}\mathrm{XlogP} \end{aligned} $$

The generated model-I was statistically significant with predictivity, r2 = 73.11%, internal (q2), and external (pred_r2) validation revealed a predictive power of 68.74% and 73.25% respectively. The fitness plot between actual and predicted activity for training and test set compounds is given in Fig. 4a, which provides an idea about the predictivity of the model for training and test set compounds. Further, the distribution curve of actual and predicted activity for training and test set compounds for model-I are presented in Fig. 5a, b, depicting closeness between the actual and predicted activity of the compounds for training and test set. The contribution of different physiochemical descriptors towards activity is shown in Fig. 6a.
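
Because Eq. 4 is a plain linear model, predicting the activity of a new fragment combination reduces to a weighted sum. The sketch below encodes the intercept and coefficients of Eq. 4; the dictionary keys are hypothetical identifiers for the descriptors, and the descriptor values in the usage example are invented, since real values would come from the descriptor calculation step.

```python
MODEL_I_INTERCEPT = 0.4685
MODEL_I_COEFFS = {
    "R2_SKHydrophobicArea": 0.5031,
    "R2_SsSHE_index": -0.2021,
    "R2_SaaNE_index": -0.0381,
    "R4_H_acceptor_count": 0.4132,
    "R3_XlogP": 0.3217,
}


def predict_pic50(descriptors):
    """Evaluate Eq. 4 for one compound given its fragment descriptor values."""
    missing = set(MODEL_I_COEFFS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return MODEL_I_INTERCEPT + sum(
        coef * descriptors[name] for name, coef in MODEL_I_COEFFS.items()
    )


# Invented descriptor values for a hypothetical compound.
example = {
    "R2_SKHydrophobicArea": 10.0,
    "R2_SsSHE_index": 0.0,
    "R2_SaaNE_index": 4.2,
    "R4_H_acceptor_count": 1.0,
    "R3_XlogP": 2.5,
}
print(round(predict_pic50(example), 3))
```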

Fig. 4

a–c Fitness plot between training and test set compounds for model-I, model-II, and model-III

Fig. 5

a–f Actual vs predicted activity of training and test set compounds for model-I, model-II, and model-III

Fig. 6

a–c Contribution plot of descriptors towards pIC50 for model-I, model-II, and model-III
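
The percentage contributions quoted for model-I in this article (34.03%, − 13.67%, − 2.57%, 27.95%, and 21.76%) agree with each regression coefficient divided by the sum of the absolute coefficients. This normalization is inferred from the published numbers and is an assumption, not a documented VLife MDS formula. A minimal sketch with hypothetical descriptor keys:

```python
MODEL_I_COEFFS = {
    "R2_SKHydrophobicArea": 0.5031,
    "R2_SsSHE_index": -0.2021,
    "R2_SaaNE_index": -0.0381,
    "R4_H_acceptor_count": 0.4132,
    "R3_XlogP": 0.3217,
}


def percent_contributions(coeffs):
    """Signed contribution of each descriptor as a percentage of the total |coefficient|."""
    total = sum(abs(c) for c in coeffs.values())
    if total == 0:
        raise ValueError("at least one coefficient must be non-zero")
    return {name: 100.0 * c / total for name, c in coeffs.items()}


contributions = percent_contributions(MODEL_I_COEFFS)
for name, value in contributions.items():
    print(f"{name}: {value:+.2f}%")
```

The sign of each percentage follows the sign of the coefficient, so positive bars in the contribution plot correspond to descriptors that raise the predicted pIC50.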

Interpretation of model-II (SA-PLS)

Model-II obtained by simulated annealing algorithm (SA) associated with partial least square regression (PLS) expressed as Eq. 5 explains an improved correlation coefficient of r2 = 85.24%, internal (q2) and external (pred_r2) validation predictive power of 69.25% and 74.21% respectively. The higher degree of freedom (89.35), F test (55.037), and low standard error (pred_r2se = 0.3078) value support the robustness of the model.

$$ \begin{aligned} {\mathrm{pIC}}_{50} = {} & 0.4227 - 0.2014\left(\pm 0.0017\right)\ {\mathrm{R}}_1\text{-}\mathrm{Volume} + 0.9541\left(\pm 0.0124\right)\ {\mathrm{R}}_2\text{-}\mathrm{SKHydrophobicArea} \\ & + 0.0417\left(\pm 0.0033\right)\ {\mathrm{R}}_2\text{-}\mathrm{SsBrcount} + 0.0913\left(\pm 0.0002\right)\ {\mathrm{R}}_2\text{-}\mathrm{SaaNHcount} \\ & - 1.6448\left(\pm 0.0051\right)\ {\mathrm{R}}_2\text{-}\mathrm{DeltaEpsilonC} - 0.2695\left(\pm 0.0104\right)\ {\mathrm{R}}_2\text{-}\mathrm{IdAverage} \\ & + 0.1909\left(\pm 0.0057\right)\ {\mathrm{R}}_2\text{-}1\mathrm{PathCount} - 0.1126\left(\pm 0.0033\right)\ {\mathrm{R}}_4\text{-}\mathrm{chiV}3 \\ & - 0.1063\left(\pm 0.0033\right)\ {\mathrm{R}}_4\text{-}\mathrm{chiV}2 + 0.1074\left(\pm 0.0001\right)\ {\mathrm{R}}_3\text{-}\mathrm{slogP} \\ & - 0.1124\left(\pm 0.0048\right)\ {\mathrm{R}}_5\text{-}\mathrm{smr} \end{aligned} $$

The fitness plot (Fig. 4b) and the distribution curves of actual and predicted activity for the training and test sets (Fig. 5c, d) show only a small difference between the actual and predicted values of the compounds, which reflects the high quality of the model. The contribution plot of descriptors towards activity is presented in Fig. 6b.

Interpretation of model-III (SA-PCR)

The third model, model–III, was obtained by a simulated annealing algorithm (SA) associated with principal component regression (PCR) expressed as Eq. 6 explaining the biological activity as a function of some physiochemical descriptors like SKHydrophobicArea, SaaNcount, Chi3Cluster, SsCH3count, SaaScount, and SaaOcount at their respective fragmented substitution sites. The generated model has correlation coefficient r2 = 71.24%, internal (q2) and external (pred_r2) predictive ability of 64.21% and 70.25% respectively.

$$ \begin{aligned} {\mathrm{pIC}}_{50} = {} & 0.3998 + 0.4215\left(\pm 0.0014\right)\ {\mathrm{R}}_2\text{-}\mathrm{SKHydrophobicArea} - 0.1088\left(\pm 0.0022\right)\ {\mathrm{R}}_2\text{-}\mathrm{SaaNcount} \\ & + 0.4545\left(\pm 0.0007\right)\ {\mathrm{R}}_2\text{-}\mathrm{chi}3\mathrm{Cluster} + 0.3214\left(\pm 0.0033\right)\ {\mathrm{R}}_4\text{-}\mathrm{SaaScount} \\ & + 0.1163\left(\pm 0.0012\right)\ {\mathrm{R}}_4\text{-}\mathrm{SaaOcount} + 0.1995\left(\pm 0.0048\right)\ {\mathrm{R}}_5\text{-}{\mathrm{SsCH}}_3\mathrm{count} \end{aligned} $$

The fitness plot and the distribution curve of actual and predictive activity for training and test set (Fig. 4c and Fig. 5e, f) shows a correlation between actual and predicted activity. Contribution curve of descriptors towards activity is shown in Fig. 6c.

Molecular docking

The docking studies were carried out for 97 data set compounds into the catalytic pocket of the prepared target RT enzyme (PDB-ID: 1S6Q) by the V-Life MDS software. This study is useful to identify the binding potency and poses of active molecules that reveal the molecular mechanism of action. The compounds during docking showed several poses, orientation, and configurations. Each configuration is characterized by a combined score of van der Waal’s forces, hydrogen bonding, pi interaction, charge interaction, halogen bond interaction, and salt bridge interaction. The docking scores of the listed compounds are presented in Table 4. The docking study revealed that interactions were dominated by the hydrophobicity or π-aromaticity due to the presence of aromatic and hetero atomic rings significant for stacking interactions. The docking interaction result of compound 51 given in Table 5 reveals that the interactions were dominated in the region of LYS-374, GLN-572, TRP-573, PRO-574, VAL-609, PHE-610, GLU-945, and GLU-948 amino acid residues due to the presence of the active site in the region (Fig. 7a). The 2D-docked pose of compound 51 in the active site of the target receptor is shown in Fig. 7b.

Table 4 Docking score of compounds (Kcal/mol)
Table 5 Docking interaction of compound 51 with the binding pocket of 1S6Q
Fig. 7

a Docking pose of compound 51 into active site of 1S6Q. b 2D-ligand interaction diagram of compound 51 in the binding pocket of 1S6Q


All three models developed during the present study are statistically significant and have good predictivity. Model-I, developed by SA-MLR, is statistically significant and shows good predictivity for both the training and test set compounds. The contribution plot of descriptors towards pIC50 for model-I shows a positive contribution of SKHydrophobicArea (34.03%), indicating that substitution of a more hydrophobic group, such as a substituted aromatic or heterocyclic ring, at R2 increases activity, while the negative contributions of SsSHE-index (− 13.67%) and SaaNE-index (− 2.57%) at the R2 position show that an –NH2 group connected through one bond and an –SH group connected through one single bond are conducive to anti-HIV activity. The positive contributions of H-acceptor count (27.95%) at R4 and XlogP (21.76%) at R3 indicate that the presence of H-bond forming elements such as oxygen and sulfur at R4 and of a more lipophilic group at R3 increases drug activity by increasing binding with the receptor. The G-QSAR equation (model-II) developed using the SA-PLS algorithm shows improved statistical results for both the training and test set compounds. The contribution plot shows the positive contribution of molecular weight (5.25%) at R1, which indicates that the substitution of a bulkier group at R1 increases drug activity. The positive contributions of descriptors such as SKHydrophobicArea (24.89%), SsBrcount (1.08%), SaaNHcount (2.38%), and 1PathCount (4.98%) at R2 show that a more hydrophobic group containing more –Br atoms connected through one single bond, a larger number of –NH groups connected through two aromatic bonds, and a higher first-order path count of fragment R2 increase anti-HIV activity. Descriptors such as DeltaEpsilonC (− 42.91%) and IdAverage (− 7.03%), which show a significant negative contribution towards activity, indicate that decreasing the contribution of the electronegativity and entropic interaction fields of the fragment at R2 increases biological activity. 
The positive contribution of slogP (2.80%) at R3 suggests that the presence of a lipophilic group at R3 enhances inhibitory activity. The atomic valence connectivity indices ChiV3 (− 2.93%) and ChiV2 (− 2.77%) at R4 contribute negatively towards activity, indicating that higher atomic valence connectivity indices of order 3 and 2 for fragment R4 decrease activity. The descriptor smr (− 2.93%) at R5 is deleterious towards activity, showing that the presence of a group with low molar refractivity increases enzyme inhibition. Model-III, developed by SA-PCR, is also significant in predicting the role of descriptors towards activity for both the training and test sets. The contribution plot for descriptors and pIC50 shows positive contributions of SKHydrophobicArea (25.98%) and Chi3Cluster (28.02%) and a negative contribution of SaaNcount (− 6.7%) at R2, meaning that a group with higher hydrophobicity, a higher 3rd order cluster chi index, and fewer nitrogen atoms connected through two aromatic bonds increases anti-HIV activity. The positive contributions of SaaScount (19.81%) and SaaOcount (7.17%) at R4 indicate that the presence of oxygen and sulfur connected through two aromatic bonds at this site increases enzyme inhibition activity. The descriptor SsCH3count (12.29%) at R5 contributes positively towards activity, indicating that the presence of a –CH3 group increases drug activity by increasing its binding efficiency with the target site of the receptor. Molecular docking was performed for the compounds in the data set with the active site of the target RT enzyme to derive the ligand-receptor interaction mechanism. The 2D docked pose of compound 51, together with the generated map of hydrophobic and hydrophilic fields, shows that the benzene ring and the heteroaromatic (pyridine and furan) rings present in the chemical structure are buried in the hydrophobic pocket. 
Further docking analysis shows that the amide group of fragment R2 of the ligand forms hydrogen bonds with the amino (–NH2) and acidic (–COOH) groups of the GLN-572, VAL-609, and PHE-610 amino acid residues present at the active site of the receptor. Hydrophobic interactions were observed between the groups present at R1, R2, R4, R5, and R6 and the LYS-374, GLN-572, TRP-573, PRO-574, VAL-609, PHE-610, GLU-945, and GLU-948 amino acids of the receptor, forming a stabilized complex that suggests strong binding of the inhibitors to the RT enzyme. Residues such as GLN-572, TRP-573, PRO-608, VAL-609, PHE-610, LYS-622, GLU-945, and GLU-948 were involved in van der Waals interactions, and TRP-573 was involved in both aromatic and charge interactions with the ligand molecules, which increases the stabilization of the inhibitor at the active site of the RT enzyme. The results of the G-QSAR and molecular docking study provide a molecular-level understanding from which it can be inferred that the identified compounds are promising and might be potential inhibitors of the RT enzyme.


In the present study, an attempt was made to generate novel fragment-based QSAR (G-QSAR) models for a congeneric series of 97 4-arylthio and 4-aryloxy-3-iodopyridine-2(1H)-one derivatives with known anti-HIV activity. The data set was divided into training and test sets, and three G-QSAR models were developed using a simulated annealing (SA) algorithm coupled with multiple linear regression (MLR), partial least squares regression (PLS), and principal component regression (PCR). All three models (I, II, and III) were statistically significant and provide site-specific clues for the design of new reverse transcriptase inhibitors. Model-II, developed by SA-PLS, was the most significant of the three: its statistical parameters were r2 = 85.24%, q2 = 69.25%, and pred_r2 = 74.21%, and its high degree of freedom (89.35), high F-test value (55.037), and low standard error (pred_r2se = 0.3078) during validation on both training and test set compounds fulfilled the conditions for predictivity and robustness. This model indicates that a more hydrophobic substituent containing a single-bonded –Br atom, a –NH group connected by two aromatic bonds with lower electronegativity, and entropic interaction fields at fragment R2 are essential for better anti-HIV activity. Similarly, the presence of a lipophilic group at R3, oxygen and sulfur connected by two aromatic bonds at R4, and a –CH3 group at R5 increases inhibitory activity by increasing binding efficiency with the reverse transcriptase enzyme. The G-QSAR method allows easy interpretation, unlike the conventional QSAR method, which can only suggest important descriptors and does not indicate the site at which they must be optimized for the further design of new compounds. Molecular docking results for the G-QSAR-generated compounds showed the interactions between the fragment groups reported for these compounds and the residues located at the binding site.
The dock pose of the selected compound reveals that the groups at fragment sites R1, R4, R5, and R6 are responsible for hydrophobic interactions, and the group at R2 is essential for both H-bond and hydrophobic interactions with the amino acid residues at the active site of the target enzyme. The findings obtained from the G-QSAR and docking studies were utilized for designing newer RT-inhibitor anti-HIV agents. It is therefore concluded that molecular manipulations at the appropriate sites suggested by the structure-activity relationship data will prove beneficial both for identifying particular chemical variations at specific substitution sites and for building mathematical models that predict the biological activity of newly designed molecules.
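The validation statistics quoted above (r2, q2, and pred_r2) can be illustrated with a minimal sketch. It assumes the conventional definitions: r2 is the coefficient of determination on the training set, q2 is its leave-one-out cross-validated counterpart, and pred_r2 is computed on the external test set relative to the training-set mean. The descriptor matrix, activities, the 70/27 split, and the helper names (`fit_mlr`, `q2_loo`, `pred_r2`) are hypothetical; plain multiple linear regression on synthetic data stands in for the study's SA-PLS models, and this is not the software or procedure used in the study.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply a fitted coefficient vector (intercept first) to descriptor rows."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r2(X, y):
    """Coefficient of determination on the training set."""
    resid = y - predict(fit_mlr(X, y), X)
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def q2_loo(X, y):
    """Leave-one-out q2: each compound is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def pred_r2(X_train, y_train, X_test, y_test):
    """External predictivity: test-set errors relative to the training-set mean."""
    resid = y_test - predict(fit_mlr(X_train, y_train), X_test)
    return 1.0 - np.sum(resid ** 2) / np.sum((y_test - y_train.mean()) ** 2)

# Synthetic stand-in data (assumption): 97 "compounds", 3 fragment descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(97, 3))
y = X @ np.array([1.0, -0.5, 0.8]) + rng.normal(scale=0.3, size=97)

# Arbitrary 70/27 split into training and test sets (assumption).
X_train, y_train = X[:70], y[:70]
X_test, y_test = X[70:], y[70:]

r2_train = r2(X_train, y_train)
q2_value = q2_loo(X_train, y_train)
pred_value = pred_r2(X_train, y_train, X_test, y_test)
print(f"r2 = {r2_train:.4f}, q2 = {q2_value:.4f}, pred_r2 = {pred_value:.4f}")
```

For a least-squares fit, q2 cannot exceed r2, because each leave-one-out residual is at least as large in magnitude as the corresponding fitted residual. A large gap between the two is the usual sign of overfitting, which is why the internal (q2) and external (pred_r2) values are reported alongside r2.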

Availability of data and materials

All data generated or analyzed during this study are included in this published article.



Abbreviations

G-QSAR: Group-based quantitative structure activity relationship

SA: Simulated annealing

MLR: Multiple linear regression

PLS: Partial least squares

PCR: Principal component regression

RMSD: Root mean square deviation

RMSE: Root mean square error

SD: Standard deviation

r2: Coefficient of determination

q2: Cross-validated squared correlation coefficient

pred_r2: External predictivity

F-test: Fisher test




The authors are thankful to V-Life Science Technologies Pvt. Ltd for providing the software for the study.


The present research work was not funded by any funding agencies.

Author information




SKS and DP designed the research work. DP performed the whole research work, including the computational studies. SKS and AM analyzed the data. All authors contributed equally to writing the paper. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Debadash Panigrahi.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Panigrahi, D., Mishra, A. & Sahu, S.K. Rational in silico drug design of HIV-RT inhibitors through G-QSAR and molecular docking study of 4-arylthio and 4-aryloxy-3-iodopyridine-2(1-H)-one derivative. Beni-Suef Univ J Basic Appl Sci 9, 48 (2020).



  • Quantitative structure activity relationship
  • G-QSAR
  • Antiretroviral therapy (ART)
  • Anti-HIV
  • Reverse transcriptase inhibitor
  • Molecular docking