In silico studies of novel pyrazole-furan and pyrazole-pyrrole carboxamide as fungicides against Sclerotinia sclerotiorum

Pyrazole-furan and pyrazole-pyrrole moiety are among the molecular structures that were found to have an extensive range of applications in the field of medicine and agrochemical due to their wide spectrum of biological activities. These include antimicrobial activity, anti-glaucoma activity, ocular hypertension activity, and antifungal activity. An in silico study was carried out on 37 compounds of pyrazole-furan and pyrazole-pyrrole carboxamide derivatives against Sclerotinia sclerotiorum. Using Spartan 14 software, optimization of the compounds was performed at the DFT/B3LYP/6-31G* quantum mechanical method. PaDEL descriptor software was used to calculate the molecular descriptors, and a Generic Function Approximation (GFA) was employed to generate the model. Out of four models generated, model 1 was found to be the optimal and has the following statistical parameters; R2 = 0.83485, R2adj = 0.793563, cross-validated R2 = 0.74037, and external R2 = 0.58479. Molecular docking study was carried out between the antifungal compounds, and the binding site of S. sclerotiorum (PDB CODE 2X2S) in which compound 7 was identified to have the highest binding energy of − 7.5kcal/mol. This compound “7” has a strong affinity with the macromolecular target point of the S. sclerotiorum (2x2s), producing H-bond and as well as the hydrophobic interaction at target point of the amino acid residue. Considering compound 7 as our scaffold, four (4) more potent compounds (7a, 7b, 7c, and 7d) were designed using optimization method of structure-based designed which have the following docking score, − 7.7, − 7.8, − 7.7, and − 7.7kcal/mol. Statistical analyses including variance inflation factor (VIF), mean effect (ME), and applicability domain were conducted on the model. Considering an interpretation of the descriptors given in the discussion, the QSAR model provided an idea of ligand-based design while the molecular docking gave an insight on structure-based design of the new compounds with better activity against S. sclerotiorum in which four (4) compounds 7a, 7b, 7c, and 7d were designed and discovered to be of high quality and have greater binding affinity compared to the one obtained from the literature (compound 7).

The quantitative analysis of the structure-activity relationship (QSAR) is among the major efficient methods to optimize the main compounds and design novel compounds. QSAR is used to predict bioactivity of the compounds such as toxicity, carcinogenicity, or mutagenicity, depending on the structural characteristics of the molecules and the actual mathematical models. Nowadays, one can easily and accurately calculate quantum chemical parameters of the compounds due to fast development in computer technology as well as theoretical quantum chemical study which helped in predicting the new compounds with better activity than the existing ones. This quantum chemical calculation is extensively applied while forming the QSAR models [13]. Molecular docking helped to investigate the capacity of the prepared compounds toward interaction with the protein residue of the target organism and to also predict the preferred orientation of the molecules.
The objective of the current work is to build a new model that can predict the activity of chemical products with much better activity against S. sclerotiorum using Genetic Function Approximation (GFA) and molecular docking techniques.

Dataset
In this work, a series of 37 compounds were used to generate the relationship between chemical traces of the compound and its antifungal activity. These 37 compounds of novel pyrazole-furan and pyrazolepyrrole carboxamide were obtained from the previous work [24]. The logarithm of the measured EC 50 (μM) against antifungal activity given by pEC 50 (p EC 50 = − log 1/EC 50 ) was taken as a dependent variable; therefore, the data was linearly correlated with the independent variables (descriptors) [10]. The dataset was represented in Table 1 and Fig. 1.

Optimization
The dataset was optimized at a level of density function theory (DFT) by applying Becke's three-parameter read-Yang-Parr hybrid (B3LYP) function together with a "6-31G *" basis set of Spartan14 software [6]. Graphical-userinterface of Spartan14 software was used to draw the 2D molecular structures of the dataset which were later exported in the form of 3D. The optimized structures were then taken to PaDEL descriptor software to generate the quantum molecular descriptors [25].

Molecular descriptors calculations
Molecular descriptors are the properties of the molecule in numerical/mathematical values. PaDEL descriptor software was used to further calculate additional energy of those low-energy conformers, where a total of 1875 descriptors were calculated [1].

Data division
To get a validated model, the dataset was divided into training and test sets (3:1). Following the Kennard-Stone algorithm method, the division was performed in such a way that the compounds forming the training set (70% of the data) and the test set (30% of the data) were shared within an entire descriptive space filled by the complete dataset [7].

Model building and validation
The generated molecular descriptors were taken for regression analysis, with experimental activities as dependent variables and the molecular descriptors served as independent variables. Using the Genetic Function Approximation method (GFA) incorporated in the Material Studio 2017 software [9], the training set compounds were utilized to develop the QSAR model. Four QSAR models were built, and the best model was chosen according to the one with the lowest score of lack of fit (LOF) given as follows: where SSE represents the sum of squares of errors, d is a smoothing parameter defined by the user, c is the number of terms a model possessed in addition to the constant term, M gives the number of samples present in the training set, and p is the overall number of descriptors present in all terms of the model excluding the constant term [11].

Internal validation
The generated model was validated internally by the following parameters: a. The correlation coefficient (R 2 ): explain the division of overall variation ascribed to the built model. The accepted value of R 2 ranges from 0.5 to < 1 and the more the value of R 2 approaches 1.0 the better the model. Though there are other analyses that the model must pass before we can consider it a good model, being the most common internal validation pointer, R 2 is expressed as follows: where Y expt , Y predt , and Y train represent the experimental, predictive, and average activities of the training set [3].
b. Adjusted R 2 : The value of R 2 is inconsistent to evaluate the power of the built model; thus, R 2 is adjusted to restore and stabilize the model. This adjusted R 2 is defined in Eq. iii as: where p presents the number of descriptors that constituted the model, while n gives the number of training set compounds [14].
c. Cross-validated R 2 : The validity of the models was identified by a cross-validation test measured by predictive Q 2 cv . For a leave one out (LOO) crossvalidation, a data point is eliminated (left-out) in the set and the model is readjusted and then compared the predicted value of the eliminated data point to its real value. This is repeated until each data removed. We can then calculate the value of Q 2 cv using the sum of the squares of these elimination residues as in the below equation: where Y expt , Y predt , and Y train represent the experimental, predictive, and average activities of the training set [2].

External validation
The prediction capacity of the model was examined by an external validation through the ability of the model to predict the activity values of the test set compounds as well as its application in the calculating the predicted value of R 2 pred according to the equation below: where Y predt and Y expt are the test set's experimental and predicted activities while Y train gives the average activity of the training set [5].
2.6 Statistical analysis of the descriptors 2.6.1 Variance inflation factor (VIF) It was defined as the measure of multicollinearity amongst the independent variables (i.e., descriptors). It quantifies the extent of correlation between one predictor and the other predictors in a model.
where R 2 gives multiple correlation coefficient between the variables within the model. If the VIF is equal to 1, it means there is no inter-correlation in each variable, and if it ranges from 1 to 5, then it is said to be suitable and acceptable. But if the VIF turns out to be greater than 10, it indicates the instability of the model and needs to be reexamined [16,19].

Mean effect (ME)
The average effect (mean effect) correlates the effect or influence of given molecular descriptors to the activity of the compounds that made up the model. The sign of descriptors shows the direction of their deviation toward the activity of compounds. That is to say, an increase or decrease in the value of the descriptors will improve the activity of the compounds. The mean effect is defined by the following:  To evaluate the significance of the model, the ME of all the descriptors was calculated [11].

Applicability domain
To confirm the reliability of the model and to examine the outliers as well as the influential compounds, it was very important to evaluate its domain of applicability. It aimed to predict the uncertainty of a compound depends on its similarities to the compounds used in building the model and also the distance between the training and test sets of the compounds. This could be achieved by employing William's plot which was plotted using standardized residuals versus the leverages. The leverages for a particular chemical compound was given as: where h i = leverage for a particular compound and Z i = matrix i of the training set. Z = nxk descriptor-matrix for the training set compounds. Z T = transpose of the Z matrix. The warning leverage (h*) that is the boundary for usual values of Z outliers is given by; where n = number of compounds in the training set and p is the number of descriptors present in the model [15].

Ligand and receptor preparation
From the RCSBPDB (www.rcsb.org), the PDB format of the receptor was successfully downloaded. This was then taken to the discovery studio for an appropriate  preparation whereby all the residues associated with the downloaded receptor (such as a ligand, water molecules, and other traces) were removed. The ligands (the optimized compounds) which were in the SDF file were transformed into a PDB file format. The prepared structure of S. sclerotiorum and prepared compounds were docked using Autodock Vina 4.2 [22]. Discovery Studio Visualizer was also used to visualize the docking results. Figures 2  and 3 showed the prepared receptor and ligand [15].

Optimization method of structure-based design
This refers to the optimization of known molecules through evaluating its proposed analogs within the binding cavity [17]. Discovery studio was used to visualize the receptor-ligand interactions in which different interactions such as H-bond and hydrophobic interaction formed between compound 7 and the receptor (PDB ID: 2x2s) were studied. Based on the knowledge of this interaction, the designed compounds were proposed in which they were drawn, optimized, converted to PDB, and later docked with the receptors to record their potency.

QSAR model
QSAR examination was carried out to relate the structureactivity relationship of the novel pyrazole-furan and pyrazole-pyrrole carboxamide derivatives as potent inhibitors of S. sclerotiorum. Five descriptors were utilized in the construction of the QSAR model to predict the activities of the inhibitory compounds based on the Genetic Function Approximation (GFA). The first model (vi) was chosen as the optimal model due to its statistical significance.
All the validation parameters that signify the stability, robustness, and the prediction capability of the model were presented in Table 2. The names, symbols, and classes of the five (5) selected descriptors that made up the model are presented in Table 3.
From the Table 2 of validation parameters, the highly calculated R 2 value (0.835) for the predicted activities indicated that the model is internally good. Tables 4 and 5 represent the external validation and the calculation of predicted R 2 of the best-chosen model.

Interpretation of descriptors
ATSc5 and ATSm1 are 2D auto-correlation descriptors built by Todeschini and Consonni and defined by Moreau-Broto autocorrelation as ATS autocorrelation descriptor, weighted by charges (ATSc5), and weighted    by scaled atomic mass (ATSm1) [21]. It is given by the equation below: where w = any atomic property, A = number of atoms, d = topological distance, ∂ ij = Kronecker delta, m B = mth order binary spare matrix, and w = A-dimensional vector of atomic properties. ATSc5 and ATSm1 consist of negative mean effects which imply that they harm the predictive activity of the model, which means that for any increase in these descriptors there will be a decrease in the activity of the compounds. The 2D descriptor "nBondsM" defined as the dimensionless total number of bonds that have bond order greater than one and is encoded to an unsaturation. It referred to as the number of non-kekulized structures. It has a positive mean effect which signified an increase in it, will enhance the activity of the chemical compound. A simple cluster descriptor MDEC-13 (defined as molecular distance edge between all primary and tertiary carbons) is a 2D descriptor with a positive mean effect which signifies an increase of a simple cluster to a compound increases the activity of that compound. An E-State descriptor "ndsN" defined as Count of atom-type E-State: =N-is a 2D descriptor with positive mean effects which indicated that an increase in these descriptors enhances the activity of the compounds. The experimental, predictive, and residual activities for both training and test sets are shown in Table 6. The residual value is the difference between the predicted and actual activities.

Pearson's correlation
To evaluate the relationships between each of the descriptors used in the model, Pearson's correlation was carried out on the descriptors of the built model and the results were presented in Table 7. The results show that the descriptors are significantly intercorrelated because none of their correlation coefficients are up to 0.5, and this indicates the robustness as well as the stability of the model. The Variance Inflation Factor (VIF) values for each of the five descriptors were not up to 2, which indicated that the descriptors and the model are stable and accepted. Figure 5 presents a graph of observed activity versus standardized residual and shows a random dispersion at the baseline where the standardized residual is zero. This shows no systematic error occurred in the built model. Table 8 showed the standard regression coefficients "bj," the values of mean effect (MF) and confidence interval (p values). These give vital information on the impact and contribution of the descriptors toward the built model. The individual capability and inducing power of the selected descriptors toward the activity of the compounds depend on their values,   signs, and their mean effects as well. The p values of the five descriptors (at 95% c.l.) that made up the model are all < 0.05; this implies that there is a significant relationship among the descriptors (as contrary to the null hypothesis) and the inhibitory concentration of the compounds.

Applicability domain
A graph of leverages for each compound of dataset versus their standardized residuals (in terms of William's plot) was plotted to discover the outliers and the chemical influential values of the model. The domain of applicability was established within a box at ± 2.5 limits for the residuals and leverage threshold h * (where h* calculated to be 0.67). The result shows that except one (1) compound from the test set (with EC 50 of 1.91275), all the molecules in the dataset are within the box of the applicability domain of the model. This may be characterized by differences in chemical structures considering the rest of the compounds highlighted in the dataset. Figure 6 shows William's plot.

Docking studies
A molecular docking study was performed between the ligands (compounds) and the target site of S. Sclerotiorum to investigate the binding affinity of the molecules with the macromolecular active site of the fungus. All the ligands show an interaction with the active site of the S. sclerotiorum, which is to say they inhibit the activity of the fungus. Some ligands show high binding energy that varies from -6.9 to -7.5 kcal/mol as presented in Table 8. However, compound 7 shows the highest docking score of -7.5kcal/mol. Compound 7 with the lowest binding free energy, possessed an interaction mode with H-bond of TYR63 and 2.86621 bond length and hydrophobic interaction of TYR107 and PRO101. The interaction between the compound with the highest docking score and the binding pocket of the receptor is shown in Fig. 7 while Fig. 8 is the 2D hydrogen bond interaction of compound 7 with the receptor. Table 9 shows the binding affinity,     hydrogen bond, and hydrophobic interaction of the high docking score compounds.

Design
In this study, an optimization method of structure-based drug design approach was employed to design new antifungal compounds with a better activity using compound 7 as a scaffold (which has binding free energy/ docking score of − 7.5kJ/mol). On the bases of the interaction between the compound and the receptor, four (4) derivatives of the compound were designed through structural modification of the compound as understood from its docking analysis. All the proposed compounds were docked using autodock vina and their binding free energies were recorded. All the designed compounds were found to be more potent than the scaffold (compound 7) with docking scores of -7.7, -7.8, -7.7, and also -7.7 kJ/mol as represented in Table 10. The name, structure, and binding affinity of the designed compounds are represented in Table 10 above.

Conclusion
This research involves a QSAR and molecular docking studies on 37 compounds of pyrazole-furan and pyrazole-pyrrole carboxamide derivatives against S. sclerotiorum. Density function theory (DFT) was used for molecules optimization, while Generic Function Approximation (GFA) was employed in generating the built model. Out of four models built, the first model was identified to be the optimal with the following statistical parameters; R 2 = 0.83485, R 2 adj = 0.793563, crossvalidated R 2 = 0.74037, and external R 2 = 0.58479. A decrease in negative descriptors (like ATSc5 and ATSm1) and an increase in positive descriptors (like nBondsM, MDEC-13, and ndsN) will improve the activity of the compounds against S. sclerotiorum. According to the docking scores, most of the ligands (compounds) show good inhibitory activity against S. sclerotiorum. However, ligand 6 showed a higher binding affinity of -7.5 kcal/ mol. This compound has a strong affinity with the macromolecular target point of the S. sclerotiorum (2x2s), producing hydrogen bond as well as the hydrophobic interaction at the target point of amino acid residue. QSAR model gave an idea of ligand-based design while the molecular docking gave an insight on structure-based design of the more potent compounds against S. sclerotiorum in which four (4) compounds 7a, 7b, 7c, and 7d were designed and discovered to be of high quality and have greater binding affinity compared to the one obtained from the literature (compound 7).