Computer-aided molecular modeling studies of some 2, 3-dihydro-[1, 4] dioxino [2, 3-f] quinazoline derivatives as EGFRWT inhibitors

Quinazoline are known to possess different biological activities which among is anti-cancer most especially NSCLC. Epidermal growth factor receptor (EGFR) belongs to the receptor tyrosine kinases (RTKs) family, which is known to be one of the most important therapeutic targets for the treatment of cancer most especially NSCLC. QSAR modeling was performed to develop a model with high predictive power on some non-small cell lung cancer agents (NSCLC) (EGFRWT inhibitors). The EGFRWT inhibitors were optimized using density functional theory (DFT) method utilizing B3LYP/6-31G* level of theory. Genetic function algorithm (GFA) was used to build five models. Out of these five models, the studied one was selected and reported because of its fitness statistically with the following validation parameters: R2trng = 0.9459, R2adj = 0.9311, Q2cv = 0.8947, R2test = 0.7008, and LOF = 0.1195. The selected model was further subjected to other validation test such as VIF and Y-scrambling test applicability domain and found to be statistically significant. The kind of interactions between five most active EGFRWT inhibitors and EGFRWT enzyme were explored via molecular docking. Molecule 4 was ranked top in comparison to other ligands because it has the highest docking score of − 8.3 kcal/mol. The pharmacokinetics studies indicated that these molecules have good absorption, low toxicity level, and permeability properties because none of them violate the Lipinski’s rule of five. A model with a very high predictive power on some EGFRWT inhibitors was developed using QSAR model. The model was validated and found to have good internal and external assessment parameters: R2 of 0.9459, R2adj of 0.9311, Qcv2 of 0.8947, R2test of 0.7008, and LOF of 0.1195. The nature of interaction of these molecules with their target protein was explored via molecular docking and found molecule 4 to have the highest docking score of − 8.3 kcal/mol among co-ligands. Pharmacokinetics studies revealed that these molecules have good absorption, low toxicity level, and permeability properties. These findings proposed a way for designing potent EGFRWT inhibitors against their target enzyme.


Background
Epidermal growth factor receptor (EGFR) from the receptor tyrosine kinases (RTKs) family, is known to be one of the most useful therapeutic targets for the mitigation of cancer most especially NSCLC. It plays a vital role in the regulation of cancer cell survival, migration, growth, proliferation, and differentiation [25,27].
The most common and deadly type of all cancers around the globe is lung cancer which account for 25% of the cancer deaths every year ( [6,19]. Among the types of lung cancer with about 1.5 million patients and less than 20% survival rate is NSCLC [20].
In order to mitigate the problem of NSCLC, several medications were developed for up to 3 different generations. The first generation was designed for the treatment of EGFR L858R mutations, examples were gefitinib and erlotinib [16,23]. The second generation was designed to treat EGFR T790M mutations examples are afatinib, dacomitinib and neratinib. The second generation drugs share a common structural features of quinazoline pharmacophore and acrylamide structure [18]. While in the case of the third generation, they were developed to treat EGFR T790M/L790M double mutations example AZD9291 [7].
QSAR is a computational chemistry method which correlate experimental activities (response variable) and physicochemical properties (molecular descriptors) of a compound quantitatively [11]. Furthermore, QSAR technique of computer-aided drug design plays a crucial role in physical, analytical, pharmaceutical, medicinal, organic chemistry, biochemistry, toxicology, chemical engineering, environmental sciences, and nanotechnology [2]. Another computational chemistry technique used to explore the interaction between 3D structures of a ligand and a receptor is molecular docking and contributes in virtual screening of library of compounds in computeraided drug design. Pharmacokinetics played vital role in drug research and development by predicting ADME and drug likeness properties of drugs in hit-to-lead and lead-optimization campaigns [9].
The main aim of this work is to generate a valid QSAR model with a very high predictive power on some EGFR WT inhibitors using QSAR technique, study the nature of interactions between the EGFR WT inhibitors and EGFR enzyme via docking, and also to predict pharmacokinetics properties of these EGFR WT inhibitors.

Dataset collection
Thirty nine (39) sets of EGFR WT inhibitors with their corresponding inhibitory activities (IC 50 ) in nanomolar were retrieved from the work of [17] and used in this research. The inhibitory activities (IC 50 ) of these molecules were then converted to their corresponding negative logarithms (pIC 50 ) using Eq. 1 [3].

Structure generation and stable geometry calculations
The initial step in any QSAR modeling study after data collection is drawing of the structures of the studied molecules. For this reason, the structures of all the studied molecules were generated utilizing the ChemDraw software [10]. After structure generation of the studied molecules, constraint in the structures was reduced via energy minimizing before finding the most stable structures of the studied molecules on potential energy surface using the Spartan 14 software. DFT at B3LYP/6-311G* level of theory was used in finding the most stable structures of all the studied molecules on global minima on the potential energy surface (PES) [15].

1D, 2D
, and 3D descriptors generation, data pretreatment, and dataset splitting For the generation of the independent variables (descriptors), the most stable structures obtained in section 2.2 above were saved in a file format (SDF) that has been recognized by the software used in generation of descriptors, PaDEL descriptor tool kit [26]. The dataset was pre-treated manually to eliminate redundant and constant descriptors. After pre-treating the data, the Data division software was further used in dividing the data into training set and test set utilizing the Kennard-Stone algorithm [14]. The model building/ training set were used for the generation of the models, and the validation/test set were used for assessing the generated models [11].

Model development
The models were developed utilizing the genetic function approximation (GFA) method with the actual pIC 50 as the response variable and the descriptors as independent variables. In the case of variable selection, GFA selects most highly correlated descriptors to develop so many models which is one of the distinct characteristic of GFA.

Validation of the selected model
The most widely used assessment terms for QSAR models are the following; square correlation coefficient of the training set (R 2 training ), adjusted R 2 (R 2 adj ), crossvalidation coefficient (Q cv 2 ), and square correlation coefficient of the test set (R 2 test ). The high value of these parameters appears to be necessary but not enough [21].
In-view of this, the inter-correlation between descriptors can be detected using their variation inflation factors (VIF), to see whether these descriptors are highly correlated with one another or not. If the computed VIF values is up to 1 it means there is no inter-correlation between the descriptors; if it falls between 1-5, the model can be accepted, and if it is higher than 10, the model cannot be accepted. It can be calculated using the equation below: where R 2 is the correlation coefficient of the selected model [5].
The evaluation of significance and contribution of each descriptor to the selected model is performed using the value of mean effect of each descriptor. The mean effect is defined by the equation below: where MF j is the mean effect of a descriptor j in a model, β j is the coefficient of the descriptor J in that model and d ij is the value of the descriptor in the data matrix for each molecule in the model building set, m is the number of descriptor that appear in the model and n is the number of molecules in the model building set [4] To assure the robustness of a QSAR model and that the model was not obtained by chance correlation Y-Scrambling test was perform. It is done by reshuffling the actual activities and keeping the descriptors unchanged to generate new QSAR models for several trials, the new built QSAR models were anticipated to give low Q 2 and R 2 value. The validation parameter for this test is cR p (cR 2 p > 0.5) [12].

Applicability domain
A QSAR model is considered valid and void, if it is subjected to the applicability domain (AD) and found that the model can make good prediction of new activities of the training and test molecules. As such, the model is subjected to AD to find out whether there are influential or outliers molecules in the studied ones [22]. One of the methods used in assessing the AD is leverage approach and is given as h i : where the training set matrix I is given by x i , n × k descriptor matrix of the training set is represented by X, and X T is the transpose matrix X used in generating the model. The threshold for the value of X is the warning threshold (h*) which is presented in the equation below: where the number of chemicals of the model building set is given by q, and the number of the descriptors in the model under evaluation is represented by x.

Molecular docking
A Dell Latitude E6520 computer system, with the following specification: Intel ® Core™ i7 Dual CPU,M330 @2.75 GHz 2.75 GHz, 8 GB of RAM was utilized to explore the nature of interactions between the active site of EGFR enzyme and five most active EGFR WT inhibitors (ligands) with the help of the Pyrex virtual screening software, Chimera, PyMOL, and Discovery studio.
Before the docking analysis, ligands were prepared from the optimized structures in section 2.2 above and saved in pdb file format using Spartan'14 [1]. The 3D structure of EGFR enzyme was downloaded from the protein data bank (with pdb ID: 4zau). The enzyme was prepared with the help of Discovery Studio Visualizer for the docking analysis; in the course of the preparation, hydrogen was added. Water molecule, heteroatoms, and co-ligands were eliminated from the crystal structure saved in pdb file.
The docking of the ligands to the active site of EGFR enzyme was achieved with the help of the Pyrex software using Autodock vina [11]. After successful docking protocol, re-formation of the complexes (ligand-receptor) for further investigation was also achieved utilizing the Chimera software. Discovery studio visualizer and PyMOL were used to investigate the interactions of the complexes.

Pharmacokinetics
Pharmacokinetics studies of five (5) most active compounds among the data set was carried out using Swis-sADME a free web tool used in evaluating ADME and drug-likeness properties of small molecules [8]. The Lipinski's rule of five is useful at pre-clinical stage of drug discovery which state that if any chemical violate more than 2 of these criteria (molecular weight ˂ 500, number of hydrogen bond donors ≤ 5, number of hydrogen bond acceptors ≤ 10, calculated Log p ≤ 5, and polar surface area (PSA) ˂ 140 Å 2 ), the chemical is said to be impermeable or badly absorbed [13].

QSAR modeling
The results of the QSAR modeling are presented in Ta

Molecular docking
The results of the molecular docking are presented in Table 5 and Figs. 4 and 5.

Pharmacokinetics studies
The results of the Pharmacokinetic studies are presented in Table 6 and Fig. 6.

QSAR modeling
The studied model was selected and reported because it is statistically fit with the following assessment parameters as compared to other models built: R 2 of 0.9459, R 2 adj of 0.9311, Q cv 2 of 0.8947, R 2 test of 0.7008, and LOF of 0.1195. The selected model was found to have passed the minimum recommended values for the validation of a good QSAR models as reported by [24].
The details of the descriptors that appear in the selected model are presented in Table 1. The positive coefficient of ATSC1p, GATS1s, RDF65e, and P1e descriptors indicate the positive correlation of this descriptor to the inhibitory activities of EGFR WT inhibitors that is the more you have these types of descriptors the more the inhibitory activity of the EGFR WT inhibitors against EGFR enzyme. In-view of the other descriptors with negative coefficients (GATS8s and SpMin8_Bhm) signifies the negative correlation of the descriptors to the inhibitory activities of the EGFR WT inhibitors. The lesser the number of these descriptors in the structures of EGFR WT inhibitors the more the action of EGFR WT inhibitors against EGFR enzyme.

Interpretations of the descriptors in the best model
ATSC1p is a Centered Broto-Moreau autocorrelationlag 1/weighted by polarizabilities. This is computed by  changing properties of the atom (w) with it equidistant values (w'), which is gotten by subtracting the standard value w of the compound from each w' value: It was shown that if properties are equidistant only, then all autocorrelation descriptors are considered to be orthogonal, thus given the fitness of the succeeding statistics. The H-filled molecular graph presents the compound, which gives the properties of the compound and the sequential number of the vertices. The pairs of atoms that enter the summation were found using topological distance matrix.
GATS1s and GATS8s are Geary autocorrelation -lag 1 and 8/weighted by I-state. This descriptor measures the quality of the link between atomic charges of two atoms 8 bonds apart.
SpMin8_Bhm is the smallest absolute eigenvalue of Burden modified matrix -n 8/weighted by relative mass. SpMin is the minimum eigenvalue, called leading eigenvalue or spectral radius. This kind of function was called by Ivanciuc matrix spectrum operators (Eriksson et al., 2003). This eigenvalue has been suggested as an index of molecular branching, the smallest values corresponding to chain graphs.
RDF65e is radial distribution function -065/weighted by relative Sanderson electronegativities, and this descriptor is based on the way atoms are arranged in the regular representation of a compound and develops a radial distribution function code (RDF code) that presents certain features in common with the 3D-MORSE code.  It also explains the steric hindrance of a molecule. The RDF descriptor also provides information about ring types, atom types, the bond lengths, and planar and nonplanar systems.
P1e is the 1st component shape directional WHIM index/weighted by relative Sanderson electronegativities.
The scatter plot of predicted activities of both the test and training sets against the Actual pIC 50 is shown in Fig. 1. It can be seen from the plot that the values were plotted around the straight line which shows the significance of the selected model. Also scatter plot of Actual pIC 50 against the residuals of both the training and test is also shown (Fig. 2). The irregular appearance of these residuals on either side of zero on the plot shows the nonexistence of methodological error in the selected model.
The difference between the actual and the predicted activities in Table 2 is termed residual. The low residual values noted in the table verified the reliability of the selected model.
The correlation matrix of the descriptors that appear in the reported model was carried out (Table 3) and the descriptors were found to be orthogonal meaning no correlation exists between them. This indicates that the physicochemical parameters (descriptors) used in developing the reported model were of good quality. The calculated VIF values for the descriptors in the model building set of selected model were obtained to be less than 5 (Table 3) indicating the fitness of the selected model, and the descriptors were independent of one another. The MF value (Table 4) shows the contribution of a descriptor in comparison to other descriptors in the   Table 4 shows that ten (10) random models were generated; the R 2 and Q 2 values for these ten random models were found to be low. This verified the robustness of the selected model and that the model was not obtained by chance correlation.
The Williams plot presented in Fig. 3 identified five (5) influential compounds from which were all in the validation set. It is essential to understand that these molecules with leverage value higher than the threshold h*(h* = 0.72) are not considered when designing new EGFR WT inhibitors. These molecules might be dissimilar/structurally different from the molecules used to generate the model and, thus may have different mechanism of action.

Molecular docking
Molecular docking on the EGFR enzyme and five most active 2, 3-dihydro- [1,4] dioxino [2, 3-f] quinazoline derivatives (ligands) (EGFR WT inhibitors) were studied (Table 5). From Table 5, we can see that molecule 4 has the highest docking score of − 8.3 kcal/mol which might be as a result of hydrophobic interactions formed with LEU718, LEU792, LYS745, LEU788, VAL726, ALA743, and LEU844 amino acid residues in the active site of EGFR enzyme. Hydrogen attach to one of the nitrogen of the quinazoline moiety (molecule 4) formed hydrogen bond in the active site of EGFR enzyme with THR790 and GLN791 amino acid residues with bond lengths of 2.9464 Å and 2.3643 Å. The most active molecule (3) has a docking score of − 8.0 kcal/mol. The hydrogen attach to one of the nitrogen of the quinazoline moiety (ligand 3) formed hydrogen bond interaction with MET793 amino acid residue of bond length of 2.4928 Å. And also, it formed hydrophobic interaction with amino acid residues LYS745, VAL726MET766, LEU718, and ALA743 of the EGFR enzyme. The 2D and 3D structures of complex 3 and 4 are shown in Figs. 4 and 5.

Pharmacokinetics studies
The results of the pharmacokinetics studies of the most active compounds are shown in Table 6. From the table it can be see that none of the molecules violate any of the criteria stated; it means there is a high tendency that  all of these molecules might be pharmacologically active. In a null shell, these molecules are said to have good absorption, low toxicity level, orally bioavailable, and permeable. The bioavailability radar gives an overview of the drug-likeness of a molecule (Fig. 6). The region painted pink indicates the range for each features.

Conclusion
A mode with a very high predictive power on 39 nonsmall cell lung cancer agents (NSCLC) (EGFR WT inhibitors) was developed using QSAR. The reported model was selected because it is statistically fit with the following assessment parameters as compared to other models built: R 2 of 0.9459, R 2 adj of 0.9311, Q cv 2 of 0.8947, R 2 test of 0.7008, and LOF of 0.1195. The reported model was further subjected to other assessments such as applicability domain, Y-scrambling, and VIF and found to be statistically significant. Molecular docking was used to explore the kind of interactions between five most active EGFR WT inhibitors and EGFR enzyme. Molecule 4 has the highest docking score of − 8.3 kcal/mol among coligands. This might be a result of hydrophobic interactions formed with LEU718, LEU792, LYS745, LEU788, VAL726, ALA743, and LEU844 amino acid residues in the active site of EGFR enzyme. Hydrogen attach to one of the nitrogen of the quinazoline moiety (molecule 4) formed hydrogen bond in the active site of EGFR enzyme with THR790 and GLN791 amino acid residues with bond lengths of 2.9464 Å and 2.3643 Å. The pharmacokinetics studies indicated that these molecules have good absorption, low toxicity level, and permeability properties. The results of this study give room for designing new potent EGFR WT inhibitors against their target enzyme.