Molecular docking and QSAR theoretical model for prediction of phthalazinone derivatives as new class of potent dengue virus inhibitors
Beni-Suef University Journal of Basic and Applied Sciences volume 9, Article number: 50 (2020)
Abstract
Background
Dengue fever is a major public health concern in many tropical and subtropical regions. The improvement of existing agents that can inhibit the dengue virus is therefore of utmost importance. In this work, a QSAR study was carried out on 25 phthalazinone derivatives that have been reported to possess excellent dengue virus inhibitory activity. The molecules were optimised using density functional theory (DFT) at the B3LYP/6-31G* level of theory. A multiple linear regression (MLR) model was built using the genetic function approximation (GFA) in the Materials Studio software package. In addition, molecular docking simulations were carried out between the dengue virus serotype 2 protease (PDB code: 6MO1) and selected phthalazinone derivatives (compounds 1, 2, 7, 11, and 21).
Results
The model was robust, as evidenced by its validation parameters: the predicted R^{2}_{pred.}, the adjusted R^{2}_{adj.}, the cross-validated Q^{2}_{CV}, and the regression coefficient R^{2} (R^{2}_{pred.} = 0.71922, R^{2}_{adj.} = 0.939699, Q^{2}_{CV} = 0.905909, R^{2} = 0.955567). The molecular docking study gave the binding affinities of the selected compounds (1, 2, 7, 11, and 21), which correlate well with their respective pIC_{50} values. The binding affinities were − 8.7, − 8.8, − 8.7, − 8.3, and − 8.9 kcal/mol, respectively; compound 21, with a binding affinity of − 8.9 kcal/mol, had the best binding free energy with the protease among the compounds considered.
Conclusion
The MLR-GFA model, together with the molecular docking analysis, has provided valuable and in-depth understanding for the development of novel chemical compounds with enhanced inhibitory potential against dengue virus serotype 2 (DNV2). The developed model can therefore be applied to predict the antidengue activity of new chemical compounds that fall within its applicability domain.
Background
Dengue infection is a mosquito-borne infection caused by dengue virus (DNV), a member of the Flavivirus genus found predominantly in tropical and subtropical areas around the world [1]. DNV is transmitted to humans by infected female mosquitoes of the Aedes genus, specifically Aedes aegypti and Aedes albopictus [2, 3].
The virus is classified into four distinct but closely related serotypes (DNV1, DNV2, DNV3, and DNV4). Infection with one serotype confers lifelong immunity to that serotype; however, secondary infection by a different serotype can increase the risk of developing severe dengue, because cross-immunity to the other serotypes is only partial and temporary [4].
Dengue virus serotype 2 (DNV2) is responsible for most infections and accounts for the highest death rate, and it is hence considered the most virulent of the four serotypes [5]. The risk of developing dengue shock syndrome (DSS) and dengue hemorrhagic fever (DHF) is associated with infection by multiple serotypes as a result of antibody-dependent enhancement [6].
The dengue virus has seven nonstructural (NS) proteins: NS1, NS2A, NS2B, NS3, NS4A, NS4B, and NS5. The Flavivirus proteases are evolutionarily conserved and exceedingly stable. NS3 contains an N-terminal serine protease domain that is inactive on its own but becomes active upon complexation with NS2B. This protease can adopt either an open or a closed conformation. In the closed, catalytically active state, NS2B wraps completely around NS3 and becomes a component of the active site. In the open, inactive conformation, NS2B is only partially bound to NS3 and lies far from the active site. The highly conserved Flavivirus NS2B/NS3 protease is necessary for viral replication and is hence a druggable target [7].
In recent times, dengue infection has been reported in the Caribbean region, South America, and Europe [8]. DNV infection thus constitutes a serious global threat. Approximately 40 to 100 million people are infected by DNV annually, and more than 50% of the world population is at high risk of infection by this virus [8]. In some persons, the infection can progress to the more severe stages known as DHF and DSS [9,10,11], which are potentially fatal, with a case fatality of around 2.5% of some 500,000 clinical cases [9].
Despite these fatal consequences of DNV infection and the possibility of imminent outbreaks, there are no antiviral drugs to prevent or treat DNV infections [12,13,14]. The problem is worsened by the continuing spread of these viruses to diverse geographical regions, as predicted more than 20 years ago [15]. The currently licensed dengue vaccine, Dengvaxia, has raised concerns about its efficacy and about an increased risk of severe disease in seronegative persons, observed during clinical trials [16].
The antidengue potential of synthetic compounds and medicinal plants has been described in the literature [17, 18].
Screening chemical libraries of such compounds is a practical step towards the design of potent drug candidates against these viruses.
Quantitative structure-activity relationship (QSAR) modelling is essential in drug development: its models are mathematical equations that correlate the response of chemicals (i.e., biological activity) with their structural and physicochemical information, expressed as numerical quantities called descriptors [19]. QSAR studies aim to develop such correlation models between chemical response and chemical information data using a statistical approach.
To be reliable, QSAR models are subjected to various validation tests that check the consistency of the developed correlation. After its development, a QSAR model is usually verified with multiple statistical validation strategies that estimate its predictive strength and stability [19, 20].
QSAR analysis is an effective approach for improving lead compounds and designing new drugs with desired properties. It is also used to predict the biological activity of compounds from their molecular descriptors through an appropriate mathematical model.
The goal of this investigation is to obtain a model that reproduces the activity of the selected dataset and that can predict new compounds with improved activity against dengue viral replication.
A better understanding of the structural requirements for designing effective and specific inhibitors of the flaviviral protease would contribute to the development of targeted therapies for infections caused by these viruses.
Methods
Data collection
The dataset used in this study consisted of phthalazinone derivatives reported in the literature to possess experimentally determined antidengue activity [21]. It is recommended that the biological activity values, such as IC_{50} and EC_{50}, used to build a QSAR model be obtained from the same species using the same or closely related procedures [22, 23].
Molecular geometry optimization
The two-dimensional (2D) structures of the compounds presented in Tables 1, 2, 3 and 4 were drawn using the ChemDraw software [24]. The 2D structures were converted to three-dimensional (3D) structures using the Spartan 14 V1.1.4 package by Wavefunction. The 3D structures were geometrically optimized by energy minimization. The structures were first minimized with a molecular mechanics force field to remove the strain energy of the molecular conformations. Density functional theory (DFT) was then employed, using Becke's three-parameter exchange functional (B3) together with the Lee, Yang, and Parr correlation functional (LYP), termed the B3LYP hybrid functional, for thorough geometry optimization of the structures. The Spartan files of all the optimized molecules were then saved in SD file format, which is one of the input formats readable by the PaDEL-Descriptor software [25].
Biological activities (pIC_{50})
The biological activities of the phthalazinone derivatives against cytoplasmic DNV RNA replication, measured as IC_{50} (μM), were converted to the logarithmic scale (pIC_{50}) using Eq. (1) to increase the linearity of the activity values and bring them closer to a normal distribution:
\[ \mathrm{pIC}_{50} = -\log_{10}\left(\mathrm{IC}_{50} \times 10^{-6}\right) \qquad (1) \]
The structures and the biological activities of these compounds are presented in Fig. 1a–d and Tables 1, 2, 3 and 4, respectively.
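The conversion can be illustrated with a minimal Python sketch, assuming the usual form pIC_{50} = −log10(IC_{50} × 10^{−6}) for an IC_{50} expressed in μM. The function name and the input value are hypothetical and chosen only for illustration; they are not taken from the tables.

```python
import math


def pic50_from_ic50_um(ic50_um: float) -> float:
    """Convert an IC50 given in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be a positive concentration")
    # 1 uM = 1e-6 mol/L, so the factor 1e-6 converts the unit before taking the log
    return -math.log10(ic50_um * 1e-6)


# Hypothetical input: an IC50 of 0.13 uM
print(round(pic50_from_ic50_um(0.13), 3))  # -> 6.886
```

Lower IC_{50} values map to higher pIC_{50} values, so a more potent compound has a larger pIC_{50}.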
Molecular descriptor generation
Molecular descriptors, which are numerical values describing the properties of a molecule, were determined. Descriptors for all 25 phthalazinone derivatives were calculated using PaDEL-Descriptor software V2.20. In total, about 1870 molecular descriptors were calculated and combined with those obtained from the 3D structures by the Spartan program.
Splitting of dataset into modeling train and external evaluation test sets
To build the QSAR model, the dataset of chemical compounds was divided into a train set and a test set in the ratio 80:20. The train set, which contains 80% of the compounds, was used to build the QSAR model. The test set, which constitutes the remaining 20%, was not involved in building the model and was used to assess its predictive quality [26].
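An 80:20 division of 25 compounds gives 20 train and 5 test compounds. The sketch below shows one way to produce such a split in Python. It assumes a seeded random shuffle, which is only an illustration: the text does not state which splitting algorithm was used, and the function name and seed are hypothetical.

```python
import random


def split_dataset(compound_ids, train_fraction=0.8, seed=0):
    """Randomly split compound identifiers into train and test lists."""
    ids = list(compound_ids)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return sorted(ids[:n_train]), sorted(ids[n_train:])


# Compounds are numbered 1..25 as in the dataset
train_ids, test_ids = split_dataset(range(1, 26))
print(len(train_ids), len(test_ids))  # -> 20 5
```

Other schemes, such as structure-based or activity-ranked selection, would replace the shuffle while keeping the same train and test sizes.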
MLR-GFA model building
The genetic function approximation (GFA) technique in Materials Studio 8.0 was used to build the models based on multiple linear regression (MLR). MLR establishes a linear relationship between a dependent variable Y (pIC_{50}) and independent variables X (molecular descriptors). The model is fitted such that the sum of the squared differences between the experimental and predicted pIC_{50} values is minimized. In regression analysis, the conditional mean of the dependent variable Y (pIC_{50}) depends on X (descriptors). MLR extends this idea to several independent variables, and the regression equation takes the form:
\[ Y = k_{1}x_{1} + k_{2}x_{2} + k_{3}x_{3} + \dots + C \qquad (2) \]
where Y is the dependent variable, the k's are the regression coefficients for the corresponding x's (independent variables), and C is the intercept or regression constant.
The GFA scores each candidate model with a fitness function known as the lack of fit (LOF). The LOF does not by itself designate the best equation; rather, it estimates the quality of the models generated by the algorithm, which helps in deciding which model to adopt. Model quality is inversely related to the LOF value, which is computed using the expression:
\[ \mathrm{LOF} = \frac{\mathrm{LSE}}{\left(1 - \frac{c + d\,p}{M}\right)^{2}} \qquad (3) \]
In this equation, LSE is the least-squares error of the model, c is the number of descriptors in the model, d is the smoothing parameter (which has a default value of 1.0), p is the sum of all descriptors, and M is the total number of compounds involved in the model building [27].
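The role of the penalty term can be seen in a short Python sketch. It assumes the Friedman form LOF = LSE / (1 − (c + d·p)/M)², which matches the symbols defined above; the function name and the numerical inputs are hypothetical and are not the values of the reported model.

```python
def lack_of_fit(lse: float, c: int, d: float, p: int, m: int) -> float:
    """Friedman lack-of-fit score: least-squares error inflated by a complexity penalty."""
    denominator = 1.0 - (c + d * p) / m
    if denominator <= 0:
        raise ValueError("model is too complex for the number of compounds")
    return lse / denominator ** 2


# Hypothetical inputs: LSE = 1.0, five descriptors, default smoothing, 20 compounds
print(lack_of_fit(lse=1.0, c=5, d=1.0, p=5, m=20))  # -> 4.0
```

For a fixed least-squares error, adding descriptors shrinks the denominator and raises the LOF, which is how the score discourages overfitted models.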
Model quality assessment
The predictive capacity and robustness of the developed model were appraised internally and externally using statistical parameters such as R^{2} (squared correlation coefficient), Q^{2}_{CV} (cross-validation coefficient), R^{2}_{pred.} (external test set correlation coefficient), and \( {}^{c}R_{p}^{2} \) (coefficient of determination for Y-randomization). These statistical validation parameters were compared with the minimum values suggested for a generally satisfactory QSAR model [28], presented in Table 5.
Validation of the QSAR model
The validation of a QSAR model is mainly accomplished using the chemical compounds employed in model development. It comprises estimating the activity of the studied compounds and then computing validation parameters that verify the accuracy of the model's predictions. Internal validation is well suited to judging the quality and goodness-of-fit of the model. Internal validation, which is regularly used to select the better model among competing models, was done using the data from which the models were built. The following internal validation parameters were calculated:
The cross-validated squared correlation coefficient (R^{2}_{CV} or Q^{2}):
\[ Q^{2} = 1 - \frac{\sum \left(Y_{\mathrm{obs}.} - Y_{\mathrm{pred}.}\right)^{2}}{\sum \left(Y_{\mathrm{obs}.} - {\overline{Y}}_{\mathrm{obs}.}\right)^{2}} \qquad (4) \]
where Y_{obs.} and Y_{pred.} are the experimental and predicted response values, respectively, and \( {\overline{Y}}_{\mathrm{obs}.} \) is the average experimental biological activity of the train set. A satisfactory predictive model should have a Q^{2} value greater than 0.5 [29].
Another important parameter is R^{2}, the determination coefficient, which is the square of the correlation coefficient between the observed and predicted response values of the training set compounds. It is the most widely used parameter and may be computed from the following expression:
\[ R^{2} = 1 - \frac{\sum \left(Y_{\mathrm{obs}.} - Y_{\mathrm{pred}.}\right)^{2}}{\sum \left(Y_{\mathrm{obs}.} - \overline{Y}\right)^{2}} \qquad (5) \]
where Y_{obs.} and Y_{pred.} are the experimental and predicted response values of biological activity, respectively, and \( \overline{Y} \) is the average response value of the training set. R^{2} measures how much of the variation in the activity of the molecules used to build the model is explained by the model. A perfect model has an R^{2} value of unity (1), and the fit quality declines as the value deviates from unity. A good model is expected to have an R^{2} value greater than or equal to the threshold of 0.5 [30].
R^{2}_{adj}, known as the explained variance, is an adjusted form of the determination coefficient that accounts for the effect of new explanatory variables by incorporating the degrees of freedom of the model [30]. R^{2}_{adj} reflects the explained variance better than R^{2}, since including either relevant or irrelevant independent variables in a multiple regression never decreases the R^{2} value [31]. It may be computed with the following expression:
\[ R^{2}_{adj} = 1 - \left(1 - R^{2}\right)\frac{N - 1}{N - p - 1} \qquad (6) \]
where N is the number of molecules in the data, R^{2} is the determination coefficient, p is the number of descriptors in the model, and N − 1 − p is the number of degrees of freedom [31].
The most essential step in assessing the generated model is external validation. Usually, before a QSAR model is generated, the whole dataset is divided into train and test sets using one of several algorithms. The test set compounds are not involved in training the QSAR model and are therefore used in the external validation procedure, in which their biological activities are predicted to determine the predictive power of the model. The most commonly used parameter for evaluating predictive performance is the squared correlation coefficient for the test set (R^{2}_{pred.}), evaluated with the following expression:
\[ R^{2}_{pred.} = 1 - \frac{\sum \left(Y_{\mathrm{obs.(test)}} - Y_{\mathrm{pred.(test)}}\right)^{2}}{\sum \left(Y_{\mathrm{obs.(test)}} - {\overline{Y}}_{\mathrm{(train)}}\right)^{2}} \qquad (7) \]
where Y_{obs.(test)} and Y_{pred.(test)} are the observed and predicted biological responses of the test set, and Ȳ_{(train)} is the average biological response of the train set. The R^{2}_{pred.} value varies from 0 to unity (1), and it is recommended that it should not be less than 0.6 [32, 33].
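The fit statistics above can be sketched in a few lines of Python. The functions assume the standard forms R² = 1 − SS_res/SS_tot, R²_adj = 1 − (1 − R²)(N − 1)/(N − p − 1), and an R²_pred that uses the train set mean in the denominator; the function names are hypothetical. Q² follows the same formula as R² when the predictions come from leave-one-out refits.

```python
def r_squared(y_obs, y_pred):
    """Determination coefficient: 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """R2 corrected for the number of descriptors p given n molecules."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def r_squared_pred(y_obs_test, y_pred_test, mean_train):
    """External R2: test residuals scaled by deviation from the train set mean."""
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs_test, y_pred_test))
    ss_ref = sum((o - mean_train) ** 2 for o in y_obs_test)
    return 1.0 - ss_res / ss_ref


# Inputs quoted in the text: R2 = 0.955567 with 20 train compounds and 5 descriptors
print(round(adjusted_r_squared(0.955567, 20, 5), 4))  # -> 0.9397
```

With the reported R² = 0.955567, N = 20, and p = 5, the adjusted value agrees with the reported R^{2}_{adj.} = 0.939699 to within rounding.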
Statistical Y-scrambling evaluation
In this evaluation, random MLR models are created by randomly shuffling the dependent variable while keeping the independent variables unchanged. The new QSAR models are expected to have considerably lower R^{2} and Q^{2} values over numerous trials, which confirms that the developed QSAR model is robust. A further important parameter, \( {}^{c}R_{p}^{2} \), is also considered; it should exceed 0.5 for the model to pass this test, as recommended [29]:
\[ {}^{c}R_{p}^{2} = R \times \sqrt{R^{2} - {\overline{R}}_{\mathrm{r}}^{2}} \qquad (8) \]
where R^{2} is the squared correlation coefficient of the non-randomized model and \( {\overline{R}}_{\mathrm{r}}^{2} \) is the average squared correlation coefficient of all the randomized models.
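A small Python sketch of the scrambling procedure is given below. To stay self-contained it uses a one-descriptor model, for which R² equals the squared Pearson correlation; this is a simplification of the five-descriptor MLR model. It assumes the form cRp² = R·sqrt(R² − mean randomized R²), and the data, function names, trial count, and seed are all hypothetical.

```python
import math
import random


def pearson_r2(x, y):
    """Squared Pearson correlation, i.e. R2 of a one-descriptor linear model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def mean_scrambled_r2(x, y, n_trials=200, seed=0):
    """Average R2 over models rebuilt on randomly shuffled activities."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        shuffled = list(y)
        rng.shuffle(shuffled)  # descriptors stay fixed, only activities move
        total += pearson_r2(x, shuffled)
    return total / n_trials


def c_rp2(r2, mean_r2_random):
    """cRp2 = R * sqrt(R2 - mean randomized R2); clipped at zero if negative."""
    return math.sqrt(r2) * math.sqrt(max(r2 - mean_r2_random, 0.0))


# Hypothetical descriptor and activity values for ten compounds
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
activity = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2, 8.8, 10.1]

r2_model = pearson_r2(descriptor, activity)
r2_random = mean_scrambled_r2(descriptor, activity)
print(r2_model > r2_random, c_rp2(r2_model, r2_random) > 0.5)
```

A model that owes its fit to chance correlation would show randomized R² values close to the original R², which drives cRp² towards zero.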
Evaluation of the applicability domain of the model
The built QSAR model was also appraised using the applicability domain (AD) method to show that the model is robust and reliable for predicting the pIC_{50} of compounds [34]. The leverage method was used to define the applicability domain of the model [28]. The leverage of a given chemical compound, h_{i}, is defined by Eq. 9:
\[ h_{i} = z_{i}\left(Z^{T}Z\right)^{-1}z_{i}^{T} \qquad (9) \]
where z_{i} is the descriptor row vector of compound i, Z is the descriptor matrix, and Z^{T} is the transpose of Z. The standardized residual (SDR) was obtained as follows:
\[ \mathrm{SDR} = \frac{y - \hat{y}}{\sqrt{\dfrac{\sum \left(y - \hat{y}\right)^{2}}{m}}} \qquad (10) \]
where y and \( \hat{y} \) are the experimental and predicted activity values for either dataset, respectively, and m is the number of molecules in the set under consideration. The model AD is defined by the boundaries 0 < h_{i} < h* and − 3 < SDR < 3, where h* is the warning leverage.
The warning leverage (h*) is also the boundary for X outliers and is defined as:
\[ h^{*} = \frac{3\left(g + 1\right)}{n} \qquad (11) \]
where g is the number of descriptors in the model and n is the number of compounds in the train set used to build the model. The AD was evaluated graphically with a plot of SDR versus leverage h_{i}, known as the Williams plot [35].
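The domain boundaries can be checked with a short Python sketch. It assumes the common forms h* = 3(g + 1)/n for the warning leverage and SDR = residual divided by the root mean squared residual; the function names and the residual and leverage values used in the tests are hypothetical.

```python
import math


def warning_leverage(g: int, n: int) -> float:
    """Warning leverage h* = 3(g + 1)/n for g descriptors and n train compounds."""
    return 3 * (g + 1) / n


def standardized_residuals(y_obs, y_pred):
    """Residuals divided by their root mean square over the m compounds."""
    residuals = [o - p for o, p in zip(y_obs, y_pred)]
    rms = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return [r / rms for r in residuals]


def inside_domain(h: float, sdr: float, h_star: float) -> bool:
    """A compound is inside the AD when 0 < h < h* and -3 < SDR < 3."""
    return 0 < h < h_star and -3 < sdr < 3


# Five descriptors and 20 train compounds, as in the model described in the text
print(warning_leverage(5, 20))  # -> 0.9
```

For five descriptors and 20 train compounds this gives h* = 0.9, the value quoted for the Williams plot in the Discussion.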
Multicollinearity test
The degree of intercorrelation (multicollinearity) among the descriptors in the best descriptor combination reported by GFA was assessed by calculating the variance inflation factor (VIF) for each descriptor:
\[ \mathrm{VIF} = \frac{1}{1 - R_{ij}^{2}} \qquad (12) \]
where R^{2}_{ij} is the correlation coefficient of the multiple regression between descriptor i and the remaining j descriptors in the model [36].
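As a quick numerical illustration, the Python sketch below evaluates VIF = 1/(1 − R²) for a given R² of one descriptor regressed on the others. The function name and the example R² values are hypothetical.

```python
def variance_inflation_factor(r2: float) -> float:
    """VIF for a descriptor whose regression on the other descriptors has the given R2."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R2 must lie in [0, 1)")
    return 1.0 / (1.0 - r2)


# Hypothetical R2 values: an uncorrelated, a moderately and a strongly correlated descriptor
for r2 in (0.0, 0.5, 0.9):
    print(r2, round(variance_inflation_factor(r2), 2))
```

A VIF of 1 means the descriptor is uncorrelated with the rest, and an R² of 0.9 already corresponds to a VIF of about 10, the limit applied to the model descriptors in the Discussion.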
Mean effect
The mean effect (ME) of each descriptor was calculated as:
\[ \mathrm{ME}_{j} = \frac{b_{j}\sum_{i=1}^{n} d_{ij}}{\sum_{j}^{m}\left(b_{j}\sum_{i=1}^{n} d_{ij}\right)} \qquad (13) \]
where ME_{j} is the mean effect of the molecular descriptor j, b_{j} is the coefficient of descriptor j, d_{ij} is the value of descriptor j for molecule i, n is the number of molecules in the train set, and m is the total number of descriptors in the model. The ME values indicate the relative importance of a descriptor compared with the other descriptors in the model. The sign of the ME shows the direction in which the model estimates change as the descriptor value changes.
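The normalization in the mean effect can be illustrated with a minimal Python sketch. It assumes that each descriptor's contribution is its coefficient times the sum of its values over the molecules, divided by the total over all descriptors; the function name, coefficients, and descriptor values are hypothetical.

```python
def mean_effects(coefficients, descriptor_rows):
    """Mean effect of each descriptor: b_j * sum_i d_ij, normalized over all descriptors."""
    n_desc = len(coefficients)
    # Sum each descriptor column over all molecules (rows)
    column_sums = [sum(row[j] for row in descriptor_rows) for j in range(n_desc)]
    contributions = [b * s for b, s in zip(coefficients, column_sums)]
    total = sum(contributions)
    return [c / total for c in contributions]


# Hypothetical model with two descriptors and two molecules
effects = mean_effects([2.0, -1.0], [[1.0, 1.0], [2.0, 1.0]])
print(effects)  # -> [1.5, -0.5]
```

By construction the mean effects sum to one, so a single value can exceed 1 when other descriptors contribute with the opposite sign.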
Molecular docking studies
To gain a detailed understanding of the interaction of the compounds with the DNV2 NS2B-NS3 protease, molecular docking was performed with AutoDock Vina as implemented in the PyRx software tool. The binding energies were determined with AutoDock Vina in PyRx, and the docked compounds were analyzed visually with the Discovery Studio visualization software. The crystal structure of the DNV protease was obtained from the Protein Data Bank (PDB code 6MO1). All heteroatoms associated with the receptor were removed from the three-dimensional structure of the DNV2 NS2B-NS3 receptor (Fig. 2a), and the structure was minimized, protonated, and saved in PDBQT format. The 3D structures of the optimized compounds were also converted to PDBQT format with the aid of AutoDock 4.2. The protein-ligand interactions were analyzed and visualized with the Discovery Studio visualization software [37].
Results
QSAR model quality
Using genetic algorithm-based selection of the descriptors, a multiple linear regression model containing five (5) descriptors was developed. The selected MLR-GFA model is represented by Eq. (14).
From this model (Eq. (14)), the five (5) most significant descriptors are ATS6e, AATSC6m, GATS2v, VR1_Dzv, and SpMax3_Bhv.
The plots of predicted pIC_{50} against experimental pIC_{50} values are displayed in Figs. 3 and 4, which show close agreement between predicted and experimental activity for both the train set and the test set.
Table 6 gives the experimental and predicted values for the train and test sets; the residuals are small, which indicates the good predictive strength of the model.
Table 7 provides the internal validation parameters of the model obtained from the Materials Studio package.
The result of the Y-randomization test is shown in Table 8; its parameters indicate a robust model.
Figure 5 depicts the domain within which the model can predict the biological activity (pIC_{50}) and shows the absence of outliers and influential compounds.
Table 9 describes the descriptors in the model and their quality in terms of intercorrelation and degree of contribution to the model.
Docking simulation studies
Table 10 summarizes the docking results, presenting the binding scores, the interacting protease residues with their interaction distances, and the interaction types. Figure 2a, b shows the prepared structure of the target (NS2B-NS3) and the 3D structure of the prepared ligand 21. Figure 6a–d shows the 2D interaction types of ligands 1, 2, 7, and 11 with different amino acids in the active site of the protease. Figure 7a, b shows the 2D interaction types and the H-bond interactions between ligand 21 and the target.
Discussion
GA-MLR model (QSAR)
The GFA model was built from the 20 train set compounds (of 25) and contained 5 descriptors. The model was then used to predict the biological activity values of both the train and test sets, reported in Table 6.
The MLR-GFA procedure produced three models; model M1 was selected as the best on the basis of its statistical significance, with the following parameters: LOF = 0.088598, R^{2}_{adj} = 0.939699, R^{2}_{pred} = 0.71922, \( {}^{c}R_{p}^{2} \) = 0.749517, Q^{2}_{CV} = 0.905909, and R^{2} = 0.955567. The statistical significance of the model was judged against the validation standards listed in Table 5. On the basis of these parameters, model M1 satisfied all the requirements for a satisfactory QSAR model, which indicates that it has good predictive capability. Five descriptors were selected to construct the linear model, which was able to predict the pIC_{50} values of all the selected compounds using the MLR-GFA statistical method.
The predicted pIC_{50} values for the training and test sets were plotted against the experimental pIC_{50} values, as shown in Fig. 4. Figure 3 likewise shows that the calculated pIC_{50} values agree closely with the experimental values, indicating small prediction errors. The good correlation between the experimental and estimated pIC_{50} of the train set compounds in Fig. 3 is reflected in the correlation value (R^{2} = 0.955) in Table 7, which meets the validation threshold suggested in Table 5 and indicates the robustness of the built model [26, 34].
Y-randomization test
The outcome of the Y-scrambling test is given in Table 8, in which the values of R^{2} and Q^{2} are within the recommended statistical limits. In addition, the value obtained for \( {}^{c}R_{p}^{2} \) exceeded the recommended 0.5, which supports the robustness of the model.
Applicability domain
The applicability domain evaluation is shown in Fig. 5 as a Williams plot of the dataset, in which the standardized residuals of the train and test sets are plotted against their respective leverage values. No response outlier was identified, as all the data points lie within the ± 3 limit. However, one compound (compound 1) exceeded the warning leverage (h* = 0.9). Closer assessment revealed that it was not an actual outlier, since disregarding this compound did not improve the statistical parameters or the predictive strength of the model, and it was therefore retained. A further reason for not eliminating this compound from the test set was to avoid an excessively narrow range of endpoint values.
Variance inflation factor
Table 9 lists the descriptors in the model together with their descriptions, classes, and related statistical parameters, including the VIF. For all five descriptors, the VIF values were less than 10, indicating that the descriptors were not strongly intercorrelated and that the model is statistically stable [19, 38,39,40].
Descriptors interpretation and mean effect
To gain insight into the factors responsible for the biological activity of the compounds, the descriptors in the model and their individual contributions need to be interpreted. The molecular descriptors in model M1 are ATS6e, AATSC6m, GATS2v, VR1_Dzv, and SpMax3_Bhv, with mean effect values of 0.186, 0.023, − 0.313, − 0.060, and 1.165, respectively, as given in Table 9.
ATS6e is the Broto-Moreau autocorrelation of lag 6, weighted by Sanderson electronegativities. It measures the strength of the relationship between the relative electronegativities of pairs of atoms in a molecule that are separated by 6 bonds, and it has a positive correlation coefficient. An increase in its value therefore favors an increase in the antidengue activity of the compounds. This observation also suggests that the electronegativity of the atoms making up the compound has a substantial effect on the activity (pIC_{50}).
AATSC6m is the averaged and centered Broto-Moreau autocorrelation of lag 6, weighted by mass. It measures the strength of the relationship between the relative atomic masses of atom pairs in a molecule separated by 6 bonds, and it has a positive correlation coefficient. An increase in its value would therefore lead to an increase in pIC_{50} as well. The Broto-Moreau autocorrelation descriptors (ATS) are given by
\[ \mathrm{ATS}_{d} = \sum_{i=1}^{n}\sum_{j=1}^{n}\delta_{ij}\,w_{i}\,w_{j} \qquad (15) \]
where n is the number of atoms, δ_{ij} is the Kronecker delta function (δ_{ij} = 1 if d_{ij} = d, and zero otherwise), d is the considered topological distance (the lag in the autocorrelation), and w_{i} and w_{j} are the normalized atomic properties of atoms i and j, respectively. The normalized van der Waals volume, atomic mass, or electronegativity can be used as the atomic property.
GATS2v is the Geary autocorrelation of lag 2, weighted by van der Waals volumes. It has a negative mean effect, which suggests that it contributes negatively to antidengue activity. It is evaluated in the same way as the ATS but with the introduction of the Geary coefficient; it measures the strength of the relationship between the van der Waals volumes of pairs of atoms in a molecule that are two bonds apart.
The Geary autocorrelation descriptors are given by
\[ \mathrm{GATS}_{d} = \frac{\dfrac{1}{2\Delta}\sum_{i=1}^{n}\sum_{j=1}^{n}\delta_{ij}\left(w_{i} - w_{j}\right)^{2}}{\dfrac{1}{n - 1}\sum_{i=1}^{n}\left(w_{i} - \bar{w}\right)^{2}} \qquad (16) \]
where \( \bar{w} \) is the average value of the considered property for the molecule and Δ is the number of vertex pairs at a distance equal to d.
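Both autocorrelation descriptors can be sketched in Python for a toy molecule. The code assumes the double-sum Broto-Moreau form and the Geary coefficient defined with the number Δ of atom pairs at lag d; some descriptor packages count each atom pair only once in the Broto-Moreau sum, so values may differ by a factor of two. The four-atom chain, the property values, and the function names are hypothetical.

```python
def moreau_broto(dist, w, d):
    """Sum of w_i * w_j over all ordered atom pairs at topological distance d."""
    n = len(w)
    return sum(w[i] * w[j] for i in range(n) for j in range(n) if dist[i][j] == d)


def geary(dist, w, d):
    """Geary coefficient at lag d: pair variance relative to the overall variance."""
    n = len(w)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if dist[i][j] == d]
    if not pairs:
        return 0.0  # no atom pairs at this lag
    mean_w = sum(w) / n
    # (1 / 2*Delta) * ordered double sum equals the mean over unordered pairs
    numerator = sum((w[i] - w[j]) ** 2 for i, j in pairs) / len(pairs)
    denominator = sum((x - mean_w) ** 2 for x in w) / (n - 1)
    return numerator / denominator


# Hypothetical linear chain of four atoms; dist[i][j] is the number of bonds between i and j
chain_dist = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
weights = [1.0, 2.0, 3.0, 4.0]  # hypothetical atomic property values

print(moreau_broto(chain_dist, weights, 2))  # -> 22.0
print(round(geary(chain_dist, weights, 2), 3))  # -> 2.4
```

A Geary value above 1 indicates that atoms at the chosen lag differ more in the property than atoms do on average across the molecule.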
VR1_Dzv is the Randic-like eigenvector-based index from the Barysz matrix, weighted by van der Waals volumes. It is negatively correlated with the antidengue activity, meaning that a decrease in its value enhances the activity of the compounds. It is based on the coefficients of the eigenvector associated with the largest negative eigenvalue of the distance matrix of a molecule.
SpMax3_Bhv is the largest absolute eigenvalue of the Burden modified matrix (n = 3), weighted by relative van der Waals volumes. It is obtained from a modified connectivity matrix whose diagonal elements are replaced by the relative van der Waals volumes of the atoms in the molecule. It also quantifies the topology of the chemical structure on the basis of the connectivity of its atoms. This descriptor made the largest contribution to the inhibitory activity, as shown by its positive mean effect value, which underlines its significance in the model.
A positive mean effect indicates that an increase in the value of the descriptor will lead to an increase in the DNV2 inhibitory activity (pIC_{50}), while a negative mean effect indicates a negative influence, so that a decrease in the descriptor value will enhance the activity (pIC_{50}).
The docking studies
In this study, the molecular docking of the phthalazinone derivatives with the NS2B-NS3 protease (PDB code: 6MO1) was investigated using AutoDock Vina in PyRx for the energy grid calculations and Discovery Studio for visual analysis of the docking poses, to gain a detailed understanding of the interaction of the inhibitors (compounds 1, 2, 7, 11, and 21) with the DNV2 protease. The compounds docked well with the target, and the binding affinities (− 8.7, − 8.8, − 8.7, − 8.3, and − 8.9 kcal/mol) of the five ligands are in close agreement with their respective pIC_{50} values (6.193, 6.21, 6.00, 4.64, and 6.886). The interacting amino acids, the interaction distances, and the binding free energies (ΔG) between the compounds and the NS2B-NS3 protease are shown in Table 10. Compound 21 had the best binding interaction of the five. Compounds 1, 2, 7, and 21 were selected for docking because of their high experimental activity (pIC_{50}), while compound 11, which has a low pIC_{50}, was included as a basis for comparing binding affinity against the most active compounds. In all the docked compounds, the binding energy corresponds with the inhibitory activity, which shows that these compounds have great potential. Table 10 shows that compounds 1, 2, 7, 11, and 21 form conventional hydrogen bonds with the following residues: (GLN64, LEU1018: 2.22 Å, 2.33 Å), (TYR1023, GLN64, GLU66, LEU1018: 2.81 Å, 2.19 Å, 2.54 Å, 2.33 Å), (GLN64, GLU66: 2.31 Å, 2.34 Å), (ARG55, GLN64, ALA1108: 2.16 Å, 2.97 Å, 2.81 Å), and (ARG55, TYR1023, GLN64, LEU101: 2.64 Å, 2.88 Å, 2.24 Å, 2.49 Å), respectively.
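The stated agreement between binding affinity and pIC_{50} can be checked with a simple Pearson correlation. The Python sketch below uses the five affinity and pIC_{50} values quoted in the text for compounds 1, 2, 7, 11, and 21; the function name is hypothetical, and with only five points, one of which (compound 11) lies far from the rest, the coefficient should be read as indicative only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Values quoted in the text for compounds 1, 2, 7, 11 and 21
binding_affinity = [-8.7, -8.8, -8.7, -8.3, -8.9]  # kcal/mol
pic50 = [6.193, 6.21, 6.00, 4.64, 6.886]

r = pearson_r(binding_affinity, pic50)
print(f"Pearson r = {r:.2f}")
```

A coefficient close to −1 means that more negative binding energies go with higher pIC_{50} values, which is the direction of agreement described above.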
Table 10 also shows that an increase in the number and type of halogen bond interactions lowers the binding free energy, which indicates a higher degree of spontaneity of the interaction. This is supported by compound 11, which lacks a halogen and binds most weakly even though it forms conventional hydrogen bonds, as compound 1 does; the high binding affinity of compounds 1, 2, 7, and 21 could therefore be attributed to fluorine. Compounds 2, 7, and 21 also form key conventional hydrogen bonds through the carbonyl of the carbamate core and the fluorine of the phenyl ring on the phthalazinone core, both of which act as hydrogen-bond acceptors; this underlines the relevance of these functional groups. Figure 7 depicts the hydrogen bond interactions formed between compound 21 and the target. The hydrogen bond interactions of ligand 21, which has the highest binding affinity (− 8.9 kcal/mol) and forms the most stable complex, suggest that its observed biological activity is not obtained by chance; it can therefore be used as a model compound for improving the antidengue activity of the phthalazinone derivatives.
Conclusion
The present study aimed to produce a highly predictive MLR-GFA model capable of revealing the structural requirements for the experimental pIC_{50} of phthalazinone derivatives against dengue virus. The results from the acceptably validated model showed that the pIC_{50} of the studied molecules against dengue virus is determined by the descriptors ATS6e, AATSC6m, GATS2v, VR1_Dzv, and SpMax3_Bhv. The molecular docking simulation revealed that, among the studied compounds, compound 21 had the best binding energy (− 8.9 kcal/mol) and that the binding energies of all the studied compounds correspond with their dengue virus inhibitory activity (pIC_{50}). The most common interactions formed between the studied compounds and the amino acids of the receptor are hydrogen bonds and hydrophobic interactions, and the presence of fluorine has a significant effect, as observed in the compounds with better activity (pIC_{50}). The information provided by the QSAR model may simplify the further design of novel and highly potent dengue virus inhibitors. The studies also revealed that the compounds docked well with the target, suggesting that the ligands are promising candidates for the treatment of DNV2 infection.
Availability of data and materials
Not applicable.
Abbreviations
DNV: Dengue virus
QSAR: Quantitative structure-activity relationship
GFA: Genetic function approximation
MLR: Multiple linear regression
NS: Non-structural protein
PDB: Protein Data Bank
References
 1.
Swain SS, Dudey D (2013) Antidengue medicinal plants: a minireview. Res Rev J Pharmacogn Phytochem 1(2):5–9
 2.
Beatty ME, Stone A, Fitzsimons WD, Hanna JN, Lam SK, Vong S (2010) Best practices in dengue surveillance: a report from the AsiaPacific and Americas Dengue Prevention Boards. PLoS Negl Trop Dis 4(11):890
 3.
Fatima Z, Idrees M, Bajwa MA, Tahir Z, Ullah O, Zia MQ (2011) Serotype and genotype analysis of dengue virus by sequencing followed by phylogenetic analysis using samples from three mini outbreaks20072009 in Pakistan. BMC Microbiol 11(1):200
 4.
Guzman MG, Alvarez M, Halstead SB (2013) Secondary infection as a risk factor for dengue hemorrhagic fever/dengue shock syndrome: a historical perspective and role of antibodydependent enhancement of infection. Arch Virol 158:1445–1459
 5.
Balasubramanian A, Teramoto T, Kulkarni AA, Bhattacharjee AK, Padmanabhan R (2017) Antiviral activities of selected antimalarials against dengue virus type 2 and Zika virus. Antiviral Research 137:141–150
 6.
Stephenson JR (2005) Understanding dengue pathogenesis: implications for vaccine design. Bull World Health Organization 83:308–314
 7.
Yao Y, Huo T, Lin Y, Nie S, Wu F, Hua Y, Wu J, Kneubehl AR, Vogt MB, RicoHesse R, Song Y (2019) Discovery, Xray crystallography and antiviral activity of allosteric inhibitors of flavivirus NS2BNS3 protease. J Am Chem Soc 141(17):6832–6836
 8.
Toepak EP, Tambunan USF (2017) In silico design of fragmentbased drug targeting host processing αglucosidase i for dengue fever. Mater Sci Eng. 172:01201
 9.
Nitsche C, Holloway S, Schirmeister T, Klein CD (2014) Biochemistry and medicinal chemistry of the dengue virus protease. Chem Rev 114(22):11348–11381
 10.
Stevens AJ, Gahan ME, Mahalingam S, Keller PA (2009) The medicinal chemistry of dengue fever. J Med Chem 52(24):7911–7926
 11.
Lim SP, Wang QY, Noble CG, Chen YL, Dong H, Zou B, Yokokawa F, Nilar S, Smith P, Beer D, Lescar J, Shi PY (2013) Ten years of dengue drug discovery: Progress and prospects. Antiviral Res 100(2):500–519
 12.
Nguyen TTH, Lee S, Wang HK, Chen HY, Wu YT, Lin SC, Kim DW, Kim D (2013) In Vitro Evaluation of Novel Inhibitors against the NS2B and NS3 Protease of Dengue Fever Virus Type 4. Molecules 18:15600–15612
 13.
Bhatt S, Gething PW, Brady OJ, Messina JP, Farlow AW, Moyes CL, Drake JM, Blanton RE, Silva LK, VG VGM (2008) Genetic ancestry and income are associated with dengue hemorrhagic fever in a highly admixed population. Eur J Hum Gen 16(6):762–765
 14.
Guzman A, Isturiz RE (2010) Update on the global spread of dengue. Int J Antimicrob Agents 36:40–42
 15.
De PM, Zanotto A, Gould EA, Gao GF, Harvey PH, Holmes EC (1996) Population dynamics of flaviviruses revealed by molecular phylogenies. Proc Natl Acad Sci USA 93(2):548–553
 16.
Vannice KS, WilderSmith A, Barrett ADT, Carrijo K, Cavaleri M, de Silva A, Durbin AP, Endy T, Harris E, Innis BL, Katzelnick LC, Smith PG, Sun W, Thomas SJ, Hombach J (2018) Clinical development and regulatory points for consideration for secondgeneration liveattenuated dengue vaccines. Vaccine 36:3411–3417
 17.
Joubert J, Foxen EB, Malan SF (2018) Microwave optimized synthesis of N(adamantan1 yl)4[(adamantan1yl)sulfamoyl] benzamide and its derivatives for antidengue virus activity. Molecules 23(7):1678
 18.
Kadir SLA, Yakoob H, Zulkifli RM (2013) Potential antidengue medicinal plants: a review. J Nat Med 67:677–689
 19.
Roy K, Kar S, Das RN (2015) A primer on QSAR/QSPR modeling: fundamental concepts. Springer
 20.
Dastmalchi S, HamzehMivehroud M, Sokouti B (2018) Quantitative structureactivity relationship: a practical approach, vol 53. CRC Press
 21.
Lu D, Liu J, Zhang Y, Liu F, Zeng L, Peng R, Zuo J (2018) Discovery and optimization of phthalazinone derivatives as a new class of potent dengue virus inhibitors. Eur J Med Chem 145:328–337
 22.
Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, Consonni V (2014) QSAR modeling: where have you been? Where are you going to? J Med Chem 57(12):4977–5010
 23.
Dearden JC, Cronin MT, Kaiser KL (2009) How not to develop a quantitative structureactivity or structureproperty relationship (QSAR/QSPR). SAR QSAR Environ Res 20(34):241–266
 24.
Li Z, Wan H, Shi Y, Ouyang P (2004) Personal experience with four kinds of chemical structure drawing software: a review on ChemDraw, ChemWindow, ISIS/Draw, and ChemSketch. J Chem Info Comput Sci 44(5):1886–1890
 25.
Hehre WJ, Huang WW (1995) Chemistry with Computation: An introduction to SPARTAN. Wavefunction, Inc
 26.
Tropsha A (2010) Best practices for QSAR model development, validation, and exploitation. Mol Inform 29(6–7):476–488
 27.
Friedman JH (1991) Multivariate adaptive regression splines. The annals of statistics.:1–67
 28.
Veerasamy R, Rajak H, Jain A, Sivadasan S, Varghese CP, Agrawal RK (2011) Validation of QSAR modelsstrategies and importance. Int J Drug Des Discov. 3:511–519
 29.
Golbraikh A, Tropsha A (2002) Beware of q2! Journal of molecular graphics and modelling. 20(4):269–276
 30.
Larose S, Frédéric G, Michel B (2002) Attachment, social support, and loneliness in young adulthood: A test of two models. Personal Soc Psychol Bull. 28(5):684–693
 31.
Brandon K, Aline O (2015) Comprehensive R archive network (CRAN)
 32.
Roy PP, Leonard JT, Roy K (2008) Exploring the impact of the size of training sets for the development of predictive QSAR models. Chemometr Intell Lab Syst 90(1):31–42
 33.
Roy K, Ambure P (2016) The double crossvalidation software tool for MLR QSAR model development. Chemometr Intelligent Lab Syst 159:108–126
 34.
Tropsha A, Gramatica P, Gombar VK (2003) Validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb Sci 22:69–77
 35.
Dimitrov S, Dimitrova N, Parkerton T, Comber M, Bonnell M, Mekenyan O (2005) Baseline model for identifying the bioaccumulation potential of chemicals. SAR QSAR Environ Res 16(6):531–554
 36.
Weintrop D, Beheshti E, Horn M, Orton K, Jona K, Trouille L, Wilensky U (2016) Defining computational thinking for mathematics and science classrooms. J Sci Educ Technol 25(1):127–147
 37.
Olasupo SB, Uzairu A, Sagagis BS (2017) Density functional theory (B3LYP/631G*) study of the toxicity of polychlorinated dibenzofurans. Int J Comput Theor. Chem 5:4–24
 38.
Ameji JP, Uzairu A, Samuel H, Oluwaseye A, Samuel AN, Chinweuba OC (2015) Insilico prediction of octanolair partition coefficient of some persistent organic pollutants through QSPR modelling. J Comput Methods Mol Design 5:46–60
 39.
Shapiro S, Guggenheim B (2008) Inhibition of oral bacteria by phenolic compounds. Part 1. QSAR analysis using molecular connectivity. Quantitative Structure Activity Relationships 17(04):327–337
 40.
Jaiswal M, Khadikar PV, Scozzafava A, Supuran CT (2004) Carbonic anhydrase inhibitors: the first QSAR study on inhibition of tumorassociated isoenzyme IX with aromatic and heterocyclic sulfonamides. Bioorganic Med Chem Letters 14(12):3283–3290
Acknowledgements
The authors gratefully acknowledge the technical effort of Dr. David Ebuka of the Department of Chemistry, Ahmadu Bello University, Zaria.
Funding
Not applicable.
Author information
Contributions
SNA designed and wrote the manuscript; GAS, PAM, and AI supervised and carried out the statistical analysis. All authors read and approved the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Adawara, S.N., Shallangwa, G.A., Mamza, P.A. et al. Molecular docking and QSAR theoretical model for prediction of phthalazinone derivatives as new class of potent dengue virus inhibitors. Beni-Suef Univ J Basic Appl Sci 9, 50 (2020). https://doi.org/10.1186/s43088-020-00073-9
DOI: https://doi.org/10.1186/s43088-020-00073-9
Keywords
 Dengue
 QSAR
 GFA-MLR, NS
 Phthalazinones
 Descriptors
 Validation
 Docking