QSAR and molecular docking studies of novel 2,5-distributed-1,3,4-thiadiazole derivatives containing 5-phenyl-2-furan as fungicides against Phythophthora infestans

The 1,3,4-thiadiazoles are among the structural moieties that were found to be of utmost importance in the fields of pharmacy and agrochemicals because of their widespread biological activity that includes anti-tumor, antibacterial, anti-inflammatory, antihypertensive, anti-tuberculosis, anticonvulsant, and antimicrobial, among others. QSAR and molecular docking studies were carried out on thirty-two (32) derivatives of 2,5-disubstituted-1,3,4-thiadiazoles for their antifungal activities toward Phytophthora infestans. Using the “graphical user interface” of Spartan14 software, the structure of the compounds of the dataset is drawn and then optimized at DFT/B3LYP/6-31G* quantum mechanical method of the software. Molecular descriptors of the optimized compounds were calculated and later on divided into the training set and test sets (at a ratio of 3:1). The training set was used for model generation and the test set was for external validation of the generated model. Four models were generated by the employment of genetic function approximation (GFA) in which the optimal model (4) turned out to have the following statistical parameters: R2 = 0.798318, R2adj = 0.750864, cross-validation R2(Q2cv) = 0.662654, and external validation R2pred = 0.624008. On the molecular docking study of thiadiazole compounds with the target protein of Phytophthora infestans effector site (PDB ID: 2NAR ), compound 13 shows the highest binding affinity with − 9.3 kcal/mol docking score and composes hydrophobic as well as H-bond interactions with the target protein (2NAR). The result of the QSAR study signifies the stability and robustness of the built model by considering the validation parameters and this gave an idea of template/ligand-based design while the molecular docking study revealed the binding interaction between the ligand and the protein site which gave an insight toward an “optimization method” of the structure-based design for the discovery of more potent compounds with better activity against Phytophthora infestans using the approach of computer-aided drug design (CADD) in plant pathology.


Background
Phytophthora infestans (also called potato blight) may be the most destructive of all plant pathogens that excessively damage potato/Irish potato leading to famines and immigration in the nineteenth century [13,14]. Some of the signs and symptoms of this disease can be seen as a white color in potato. P. infestans generate some sporanges on the steam and leaves of potato [15]. The sporanges always displayed at the lower superficies of the leaves. However, as in the case of tuber blight, the white hypha usually appears at the superficies of the tuber [11]. In normal circumstances, P. infestans perfect its life on potato or tomato leaves at approximately 5 days [22]. The sporanges formed at the surface of foliage thereby dispersing through plants at over 10°C (50°F) temperature and humidity of above 75-80% in two or more days. Sometimes the spores are washed away by the rain which gets into the soil and infect the earlystage tubers; and also, these spores can make long distances into the air which can easily get into another host. The premature levels of the blight may have disappeared. Some of the symptoms involve dark blotches displayed at the extreme end of the leaf and on the plant's stem. A grey/dark patch developed on the affected tuber which covered the skin and rapidly decomposed it to an unpleasant odor. And apparently, healthy tubers may later become rotten while stored. According to the FOA report, the most thrusting/attacking problem in the third world apart from poverty must be food shortages. Farmers in Africa are encountering distinct limitations in food production as well as cash crops. Some of those limitations include damages from diseases and pests like fungi. In the search for food and the fight for human survival, the Irish potato has a significant role to play in food supply and, therefore, has been an instrument in addressing the issues of food insecurity, due to its performance in a given area and in a given time. This potato blight cause excessive economic loose, the annual economic loose caused by P. infestans in the developing countries begins to approach a $3-billion mark [5]. Due to its rapid adaptation to the various management skills (such as genetic resistivity), control of this plant pathogen is really challenging [10]. And this makes the synthesis of novel compounds that will inhibit the dangerous P. infestans to be among the most considerable in the field of agrochemicals. Some of these researches include computational studies. The 1,3,4-thiadiazoles derivatives are among the structural moieties that were found to be of utmost importance in the fields of pharmacy and agrochemicals for their widespread biological activity such as anti-tumor [28], antibacterial [25], anti-inflammatory [19], antihypertensive [30], antituberculosis [23], anticonvulsant [18], and antimicrobial [2], among others. Furthermore, reports identify that compounds containing furan are intensively bioactive. Several researches on the derivatives of furan such as "pyrazole and triazole [6], diacylhydrazine derivatives [7]" containing 5-phenyl-2-furan moiety were carried out in which there appeared to have extensive biological activities including fungicidal and insecticidal activities, among others.
The Quantitative structure-activity relationship (QSAR) study aims to develop correlation models considering the activity of compounds and other chemical information in a statistical approach [16,27] which will lead us to the design of new compounds. While molecular docking study is "a way of predicting the favorable orientation of one molecule to another when reacted to produce a stable complex", it will also lead us to the design of more potent compounds.
Our aim in this research work is to predict highly active compounds by the employment of Genetic function approximation (GFA) and perform a molecular docking study between the 1,3,4-thiadiazole compounds and the 2NAR protein of P. infestans to predict their stable molecular orientation.

Dataset
Thirty-two derivatives of 2,5-disubstituted-1,3,4-thiadiazole derivatives containing 5-phenyl-2-furan used in this work were taken from the literature [8]. The activity of the compounds were reported in EC50 (g/L) values, which were converted to pEC50 (pEC50 = − log1/EC50). Presented in Fig. 1 and Table 1 are the molecular structures and their corresponding activities found in the dataset.

Molecular structure optimization
The structures of the compounds were optimized at the "Density function theory (DFT)" level, "Becke's threeparameter Lee-Yang-Parr hybrid functional (B3LYP)" version together with the "6-31G*" basis set of Spartan14 [4]. In this process, all the molecular structures were drawn in the graphical user interface of Spartan14 software. The energies of the drawn molecules were minimized using Molecular Mechanics Force Field (MMFF) calculation [3].

Molecular descriptor calculations
Molecular descriptors are the properties of the molecule in numerical/mathematical values. PaDEL descriptor software was used to further calculate additional energy of those low-energy conformers, where a total of 1875 descriptors were calculated.

Dataset splitting
Using Kennard-Stone algorithm technics, the dataset of 32 compounds was split into two: the training set and the test set (70% of the training and 30% to the test set) which is found in DatasetDivision GUI 1.2 software. In this technic, the training and the test set were used for model development and its validation (externally) [12].

Model building
The training set in the dataset was used for model generation through the employment of the GFA method available in the material studio. The regression analysis occurs by considering the inhibition concentration (pEC 50 ) as a dependent variable while the chosen descriptors served as independent variables.

Internal validation
Internal validation of 22 compounds of the training set took place in the software (Material studio) used for building the model. The validation parameters are as follows:

Cross-validation
This parameter was used to determine the ability of the QSAR model in predicting the activities of newly designed compounds. This indicates the stability of the built model.
where Y exp is the "observed/experimental activity", Y pred is the "predicted activity", and Y is the "mean value of the observed activity".

Friedman's lack of fit (LOF)
The parameter describes the measure of the fitness of the model and it is given by equation. ii below: C is the "number of terms in the model", d is the "user-defined smoothing parameter", P is "the total number of descriptors in the model", and M is "the number of molecules in the training set".
The regression model is given by the straight line graphs' equation, "(Y = mx + c)", where Y is the predicted activity (pEC 50 ), D is the corresponding coefficients, x is the independent variable, and c is the regression constant [17].

The correlation coefficient (R 2 )
This is another parameter used to assess the model. The closer the value of R 2 to 1.0, the better the model generated. R 2 is expressed as: The value of R 2 changes instantly with an increase in descriptors; therefore, the reliability of R 2 in measuring the stability of a given model is very minimal. Thus, R 2 has to be adjusted in order to have a fit and strong model. The following equation define adjusted R 2 as [1]: where P is the number of independent variables possessed by the model and n is the number of training sets' compounds [21].

External validation
The model generated was further validated with the test set of the dataset in order to measure its level of competence in predicting the activity of new compounds. This was done by evaluating the values of the square of the calculated R 2 of the test set values. The closer the R 2 is to 1.0, the better the robustness, fitness, and the prediction capacity of the model as well. Though sometimes R 2 value does not matter if the model fails other statistical analyses such as variance inflation factor (VIF) and mean effect, among others. The coefficient of determination R 2 pred is given by the following equation: where Y pred test and Y exp test are the values of predicted and experimental activities for the test set and Y train is the average activity for the training sets' values [3].

Statistical analysis of the descriptors 2.8.1 Variance inflation factor (VIF)
VIF is defined as the measure of multicollinearity amongst the independent variables (i.e., descriptors). It quantifies the extent of correlation between one predictor and the other predictors in a model.
where R 2 gives multiple correlation coefficient between the variables within the model. If the VIF is equal to 1, it means there is no intercorrelation in each variable, and if it ranges from 1 to 5, then it is said to be suitable and acceptable. But if the VIF turns out to be greater than 10, this indicates the instability of the model and need to be reexamined ( [20,26].

Mean effect (ME)
The average effect (mean effect) correlates the effect or influence of given molecular descriptors to the activities of the compounds that made up the model. The descriptor signs show the direction of their deviation toward the activity of the compounds. That is to say, an increase or decrease in the value of the descriptors will improve the activity of the compounds. The mean effect is defined by the following: where B j and D j are the j-descriptor coefficient in the model and the values of each descriptor in training set, while m and n stand for the number of molecular descriptors as well as the number of molecules in a training set.
To evaluate the significance of the model, the mean effect of each descriptor was calculated [9].

Applicability domain
To confirm the reliability of the model and to examine the outliers as well as the influential compounds, it is very important to evaluate the applicability domain of the built model. Its aim at predicting the uncertainty of a compound depends on its similarities to the compounds used in building the model and also the distance between the training and test set of the compounds. This can be achieved by employing William's plot which was plotted using standardized residuals versus the leverages. The leverages for a particular chemical compound is given as follows: where h i is the leverage for a particular compound and Z i is the matrix i of training set. Z is the nxk descriptormatrix for a training set compound. Z T is the transpose of the Z matrix. The warning leverage (h*) that is the boundary for normal values of Z outliers is given by; Where n is the number of molecules in the training set whereas p gives the amount of descriptors presence in the built model [17].

Molecular docking studies
With the aid of Autodock Vina of Pyrex software and Discovery Studio, a molecular docking study was performed between 2,5-disubstituted-1,3,4-thiadiazole derivatives and P. infestans effector target site to examine the interaction between the binding pocket of the effector and the compounds (i.e., the ligands). A highly resolute crystal structure of P. infestans was downloaded successfully from the protein databank (PDB Code: 2NAR). The downloaded substrate was carefully prepared using Discovery Studio which was later transported to the Pyrex for the docking calculation. With the aid of Spartan14 version 1.1.4, the optimized compounds of 2,5-disubstituted-1,3,4-thiadiazole derivatives (the ligands) were converted to PDB files [24]. The prepared structure of P. infestans effector site and prepared ligands were docked using Autodock Vina 4.2 [29]. Discovery Studio Visualizer was also used to visualize the docking results (Fig. 2).

Model building and validation
Below is the equation of the best-chosen model (4).   Table 4 and Table 5 presented the external validation and calculation of predicted R 2 of the chosen model.

Statistical analyses of the descriptors
The following are the different analyses: Pearson's correlation, standard regression coefficients, standardized predicted activity against experimental activity, standardized residual against experimental activity (pEC50), and William's plot.

The results of the docking study
The results can be seen in the receptor-ligand interaction, H-bond interactions, and hydrophobic and electrostatic interactions.

QSAR model
The best QSAR model was generated using the GFA method. Four descriptors were used in building the model where four different models were generated and model 4 found to be the best following the statistical parameters. All the values obtained matches the minimum value for evaluating the QSAR model. These values signify that there is a high correlation between the predicted and experimental activity (pEC50, Fig. 3). Internal and external validations, as well as the other statistical analysis, made the model 4 to stand fit, reliable, and highly predictive.
From Tables 2 and 4, the R 2 values of 0.79831800 (internal) and 0.624008 (external) indicate a strong relationship between the experimental and predicted activities. Additionally, the inhibition activities of the compounds increase by the addition of all the descriptors in the best chosen model.

Interpretation of descriptors
The 2D molecular descriptors AMR and SCH-7 defined as "Molar refractivity" and "Simple chain, order 7" are the first and second highest contributors toward the generation of the selected model with a positive mean effect of 0.52115 and 0.4413. Thus, the addition of those descriptors will significantly enhance the antifungal activity of the compound. 3D descriptors WG.unity and Wnu2.eneg with the mean effect of 0.01001 and 0.02754 have a low effect on the model therefore their increase will have no much significant on the activity of the compound. They are defined as "Non-directional WHIM, weighted by unit weights" and "Directional WHIM, weighted by Mulliken atomic electronegativities". Model 4 was examined as the optimal model considering the descriptors from test set compounds of the dataset.
The experimental activity, predictive activity, and residual values of the compounds are given in Table 6. The residual value is defined as the difference between the experimental and predicted activities. The lower residual values between the experimental and predicted activities indicate the high predictive power of the model.

Statistical analysis of descriptors
Pearson's correlation (Table 7) was performed between the descriptors of the chosen model in order to evaluate the relationship between each of the descriptors. The result of the correlation showed no intercorrelation among the descriptors with a correlation coefficient of less than 0.5, which signified that the descriptors used in the model were good enough. The VIF values are within the range of 1 to 5 which indicated that the descriptors and model are suitable and acceptable. Table 8 showed the standard regression coefficients "bj", the values of mean effect (MF), and confidence interval (p values). These give vital information on the effect and contribution of the descriptors toward the built model. The individual capability and inducing power of the selected descriptors toward the activity of the compounds depend on their values, signs, and their mean effects as well. The p values of the four descriptors (at a confidence limit of 95%) that made up the model are all less than 0.05; this implies that there is a significant relationship among the descriptors (as contrary to the null hypothesis) and the inhibitory concentration of the compounds. Figure 4 which presented a graph of observed activity versus standardized residual shows a random dispersion at the baseline where the standardized residual is zero. Therefore, no systematic error occurred in the built model.
The graph of standardized residuals versus leverages (for all the training set and test set) termed the William's     hydrophobic interaction mode of TYR87 (4.7572Å), TYR71, LEU52, TYR87 (4.88051Å), and ALA69 residues. Figure 6 showed a receptor-ligand interaction while Fig. 7 is the 2D structure which shows that H-bond interaction exists between the receptor and the compound 13 which has a better binding affinity and showed a better interaction with the macromolecular target site of the residue when compared with other compounds as well.

Conclusion
This research involves a QSAR and molecular docking studies on 32 compounds of 2,5-disubstituted-1,3,4-thiadiazole derivatives against P. infestans effector site. After using DFT to optimize the compounds, GFA was used to generate the built model. Among the four generated models, the fourth model was found to be the optimal, having appreciable statistical parameters with R 2 = 0.798318, R 2 adj = 0.750864, cross-validation R 2 (Q 2 cv ) = 0.662654, and external validation R 2 pred = 0.624008. Descriptors AMR and SCH-7 were the first and second highest contributors toward the generation of the selected model, and thus, their increase will increase the activity of the compound while WG.unity and Wnu2.eneg have a low effect on the model, therefore, their increase will have no much significance on the activity of the compound against P. infestans.
According to the docking scores, almost all the ligands (compounds) showed high binding affinity/strong inhibition activity against P. infestans effector site. However, ligands 11,13,14,15,17,24,26, and 30 showed higher binding affinity ranging from − 8.9 to − 9.3 kcal/mol. With ligand 13 having the highest binding energy of − 9.3 kcal/mol. This compound [13] was able to strongly dock at the binding pocket of the P. infestans effector site (2NAR) producing an H-bond as well as hydrophobic interaction with the target site. The generated QSAR model provides a worthy idea on ligand-based design whereas the molecular docking analysis suggested an approach toward the structure-based design of novel and more potent compounds against P. infestans.