In silico studies of some 2-anilinopyrimidine derivatives as anti-triple-negative breast cancer agents

Breast cancer is a major global health problem and the second leading cause of cancer-related death among women. An estimated 1 to 1.3 million cases of breast cancer are detected globally each year. Triple-negative breast cancers (TNBCs) are characterized by the lack of human epidermal growth factor receptor 2 (HER2), estrogen receptor (ER), and progesterone receptor (PR). TNBCs frequently metastasize to the central nervous system and lungs. Such metastatic behavior, together with the lack of improved inhibitor compounds, gives patients with TNBC a shorter life expectancy than patients with non-TNBC. The purpose of this research was to explore the anti-proliferative activities of 2-anilinopyrimidine derivatives against the triple-negative breast cancer cell line MDA-MB-468 via in silico studies (QSAR and molecular docking), in order to further design and develop new anti-breast cancer drugs with high potency and low toxicity. The quantitative structure–activity relationship (QSAR) model predicts the bioactivities of the compounds, and the molecular docking studies examine the interaction between the derivatives (ligands) and the thyroid hormone receptor (TRβ1). Model 4 was chosen as the best model from the statistical assessment: R2 = 0.8760, R2adj = 0.8451, Q2 = 0.6141, and R2pred = 0.5390. From the external validation of the QSAR model, the mean effect of the model parameters indicates that decreasing VR1_Dzv and MOMI-R and increasing SpMin1_Bh and C3SP3 would increase the anti-proliferative activities (pIC50) of the compounds. The molecular docking studies revealed that ligands 15 and 18 had the highest docking scores of −7.3 and −7.4 kcal/mol with the thyroid hormone receptor (TRβ1). These ligands had better docking scores than the standard anti-breast cancer drug gefitinib (−5.3 kcal/mol).
The results indicate that model 4 can be used in developing new 2-anilinopyrimidine derivatives with better predicted anti-breast cancer activity. The docking results indicate that some of the 2-anilinopyrimidine derivatives bind tightly to the receptor (TRβ1) and stabilize it, as is evident from the receptor–ligand interactions, and these compounds are promising inhibitors of TRβ1. These findings could help pharmaceutical researchers in designing and developing new anti-triple-negative breast cancer drugs.


Background
After cardiovascular diseases, cancer is the second deadliest disease affecting human health [1]. Worldwide, cancer is one of the seven main causes of death and affects around 14 million people every year. Lifestyle changes, especially in developing countries where almost 82% of the world's population lives, have increased the risk of cancer, owing to factors such as lack of exercise and smoking, in addition to hereditary variation [2]. Breast cancer is the most common form of cancer globally and the second leading cause of cancer-related death among women. An estimated 1 to 1.3 million cases of breast cancer are detected globally each year [3].
Triple-negative breast cancers (TNBCs) are aggressive mammary tumors characterized by the lack of human epidermal growth factor receptor 2 (HER2), estrogen receptor (ER), and progesterone receptor (PR) [4]. TNBCs metastasize to the central nervous system and lungs more frequently than non-TNBCs, which usually metastasize to the bone. Such metastatic behavior, together with the lack of improved inhibitor compounds, gives patients with TNBC a shorter life expectancy than patients with non-TNBC [3].
Recently, a novel series of 2-anilinopyrimidines was reported by Jo et al. [4] as inhibitors of the MDA-MB-468 cell line. There is also evidence that reduced thyroid hormone receptor expression and/or alterations in thyroid hormone receptor genes occur frequently in cancer [5], suggesting that the native receptors could act as tumor suppressors and that loss of expression of this receptor could give a selective advantage for cell transformation and tumor progression [6].
Conventional drug development takes a long time and considerable effort, and so cannot keep up with the urgent need for a comprehensive treatment. Computer-aided drug design has been very successful in designing novel drugs with high effectiveness and better potency against diseases. The aim of this study is to explore the anti-proliferative activities of 2-anilinopyrimidine derivatives against the triple-negative breast cancer cell line MDA-MB-468 via in silico studies (QSAR and molecular docking) that can be used to further develop anti-breast cancer drug candidates.

Computational information

Hardware
The computer used in this research was a 7th-generation HP Pavilion with an Intel(R) Core i7-7500U processor and 12.00 GB of RAM, running the Windows 10 operating system.

Software
The software used to carry out this research includes Spartan'14 (

Bioactivities
The anti-proliferative activities of the 2-anilinopyrimidine derivatives were measured as 50% inhibitory concentrations (IC50); the IC50 value of a chemical compound is defined as the concentration of the compound required to decrease the viability of a given cell line by 50%. The IC50 values were converted to pIC50 values on a logarithmic scale to reduce the skew in the activities. The anti-proliferative activities (IC50, in micromolar, μM) and the pIC50 values of the derivatives are shown in Table 1. The logarithmic scale is given as follows:

$$pIC_{50} = -\log_{10}\left(IC_{50} \times 10^{-6}\right)$$
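The conversion from IC50 (in μM) to pIC50 can be illustrated with a short Python sketch. The function name and the example concentrations are hypothetical and are not taken from Table 1; only the formula pIC50 = −log10(IC50 × 10^−6) comes from the text.

```python
import math


def pic50_from_ic50_um(ic50_um: float) -> float:
    """Convert an IC50 given in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be a positive concentration")
    # The factor 1e-6 converts micromolar to molar before taking the logarithm.
    return -math.log10(ic50_um * 1e-6)


# Hypothetical IC50 values in micromolar (illustrative only, not from Table 1).
example_ic50_um = [0.5, 1.0, 10.0]
example_pic50 = [round(pic50_from_ic50_um(v), 3) for v in example_ic50_um]
print(example_pic50)
```

A tenfold decrease in IC50 raises pIC50 by one unit, which is why the logarithmic scale compresses the skew in the raw activities.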

Geometry optimization
Geometry optimization aims to obtain a geometric structure that is closer to the actual geometry of the molecule [2]. The derivative compounds were sketched in 2D in ChemDraw (V12.0.2) and converted in Spartan'14 (V1.1.4) software. Density functional theory (DFT) with the B3LYP functional and the 6-311G basis set was used for the geometry optimization of the compounds [7][8][9]. The parent compound is shown in Fig. 1.

Molecular descriptor
The Pharmaceutical Data Exploration Laboratory (PaDEL) software V2.20 was used to calculate molecular descriptors for the 30 optimized 2-anilinopyrimidine derivatives [10].

Pretreatment and division of data set
The descriptors obtained from the PaDEL software were pretreated using the Data Pre-treatment GUI 1.2 software to remove constant-valued and unwanted descriptors [9,11]. The Kennard-Stone algorithm [12] was used to divide the derivatives into a training set of 21 compounds, used to build the model, and a test set of 9 compounds.
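The Kennard-Stone split can be sketched as follows. This is a minimal illustration of the usual max-min selection rule on Euclidean distances, not the software actually used in the study; the function name and the toy two-descriptor data are hypothetical, and in practice descriptors are usually scaled before distances are computed.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone max-min rule."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of points")
    dist = [[math.dist(a, b) for b in points] for a in points]
    # Seed the selection with the two most distant points.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # Add the candidate whose nearest already-selected point is farthest away.
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical descriptor values for six compounds (not data from the paper).
toy_descriptors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (2.5, 2.5), (0.5, 0.5)]
train_idx = kennard_stone(toy_descriptors, 4)
test_idx = [k for k in range(len(toy_descriptors)) if k not in train_idx]
print("train:", train_idx, "test:", test_idx)
```

For the data set described here, the same call with `n_select=21` on the 30 pretreated compounds would return the training indices, with the remaining 9 forming the test set.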

Model building and model validation
The model was built and internally validated on the training set (twenty-one compounds) in Materials Studio version 8 using the genetic function approximation (GFA) technique. The models obtained were evaluated using the Friedman lack-of-fit (LOF) formula [13]:

$$\mathrm{LOF} = \frac{\mathrm{SEE}}{\left(1 - \dfrac{C + d \times p}{M}\right)^{2}}$$

where SEE is the standard error of estimation. A low SEE implies a better model. SEE is expressed as follows:

$$\mathrm{SEE} = \sqrt{\frac{\sum \left(Y_{exp} - Y_{pred}\right)^{2}}{M - p - 1}}$$

C is the number of terms in the model, p is the total number of descriptors in the model, M is the number of compounds in the training set, and d is a user-defined smoothing parameter [14]. The model is verified using the squared correlation coefficient (R2). The R2 value should be close to 1 for an effective model. R2 is given as follows:

$$R^{2} = 1 - \frac{\sum \left(Y_{exp} - Y_{pred}\right)^{2}}{\sum \left(Y_{exp} - \bar{Y}_{training}\right)^{2}}$$

where Y_exp and Y_pred are the experimental (anti-proliferative) and predicted activities of the training set and \bar{Y}_training is the mean experimental activity of the training set [15]. The R2 value increases as the number of descriptors increases; thus, R2 alone is not a reliable measure of the model's strength. R2 is therefore adjusted to obtain a robust measure, which is given as follows:

$$R^{2}_{adj} = 1 - \left(1 - R^{2}\right)\frac{n - 1}{n - p - 1}$$

where p is the number of descriptors in the model and n is the number of compounds in the training set. The stability of the model was assessed using the cross-validation coefficient (Q2cv), given as:

$$Q^{2}_{cv} = 1 - \frac{\sum \left(Y_{pred} - Y_{exp}\right)^{2}}{\sum \left(Y_{exp} - \bar{Y}_{training}\right)^{2}}$$

where \bar{Y}_training, Y_exp, and Y_pred are the mean experimental activity (pIC50), the experimental activities (pIC50), and the cross-validated predicted activities (pIC50) of the training set, respectively [16].
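The internal validation statistics named above can be sketched in Python. This is a minimal illustration that assumes the standard definitions of R2, adjusted R2, cross-validated Q2, and SEE; the function names and toy activities are hypothetical, the cross-validated predictions are taken as given inputs rather than refitted, and the implementation in Materials Studio may differ in detail.

```python
import math


def adjusted_r2(r2, n_compounds, n_descriptors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_compounds - 1) / (n_compounds - n_descriptors - 1)


def regression_stats(y_exp, y_pred, y_cv, n_descriptors):
    """Return R2, adjusted R2, cross-validated Q2 and SEE for a training set.

    y_pred: activities predicted by the fitted model.
    y_cv:   activities predicted for each compound during cross-validation.
    """
    n = len(y_exp)
    dof = n - n_descriptors - 1
    if dof <= 0:
        raise ValueError("need more compounds than descriptors + 1")
    y_mean = sum(y_exp) / n
    ss_tot = sum((y - y_mean) ** 2 for y in y_exp)
    ss_res = sum((y - yp) ** 2 for y, yp in zip(y_exp, y_pred))
    press = sum((y - yc) ** 2 for y, yc in zip(y_exp, y_cv))
    r2 = 1 - ss_res / ss_tot
    return {
        "r2": r2,
        "r2_adj": adjusted_r2(r2, n, n_descriptors),
        "q2_cv": 1 - press / ss_tot,
        "see": math.sqrt(ss_res / dof),
    }


# Hypothetical pIC50 values (illustrative only, not from Table 5).
toy_exp = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5]
toy_pred = [5.1, 5.4, 6.1, 6.4, 7.1, 7.4]
toy_cv = [5.2, 5.3, 6.2, 6.3, 7.2, 7.3]
toy_stats = regression_stats(toy_exp, toy_pred, toy_cv, n_descriptors=1)
print({name: round(value, 4) for name, value in toy_stats.items()})
```

Because cross-validated residuals are normally larger than fitted residuals, Q2 is expected to fall below R2, as it does for model 4.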

QSAR modeling evaluation
The generated models were assessed using statistical parameters such as the cross-validation test, R2, Fisher's test, and the predicted R2 (R2pred).

Mean effect
The mean effect describes the relative impact of each descriptor on the compound activities predicted by the model. The sign attached to a descriptor shows the direction in which the activity of the compounds changes when the descriptor value increases or decreases. It is defined as follows:

$$MF_{j} = \frac{B_{j} \sum_{i=1}^{n} D_{ij}}{\sum_{j=1}^{m} \left(B_{j} \sum_{i=1}^{n} D_{ij}\right)}$$

where MF_j is the mean effect of descriptor j, m is the total number of descriptors in the model, B_j is the coefficient of descriptor j, n is the total number of molecules in the training set, and D_ij is the value of descriptor j for molecule i in the training set descriptor matrix [17].
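A minimal Python sketch of the mean effect calculation follows, assuming the usual definition in which each descriptor's coefficient is multiplied by the sum of its values over the training set and then normalized over all descriptors. The function name, coefficients, and descriptor values are hypothetical.

```python
def mean_effects(coefficients, descriptor_matrix):
    """Mean effect of each descriptor: B_j * sum_i D_ij, normalized over all j.

    coefficients:      model coefficient B_j for each of the m descriptors.
    descriptor_matrix: n rows (training compounds) by m columns (descriptors).
    """
    m = len(coefficients)
    column_sums = [sum(row[j] for row in descriptor_matrix) for j in range(m)]
    weighted = [b * s for b, s in zip(coefficients, column_sums)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("weighted descriptor sums cancel; mean effect undefined")
    return [w / total for w in weighted]


# Hypothetical two-descriptor model with three training compounds.
toy_coefficients = [2.0, -1.0]
toy_matrix = [[1.0, 2.0], [3.0, 1.0], [2.0, 3.0]]
toy_mf = mean_effects(toy_coefficients, toy_matrix)
print(toy_mf)
```

The mean effects sum to 1 by construction, so their signs and relative magnitudes, rather than their absolute values, carry the interpretation.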

Variance inflation factor (VIF)
The VIF measures the extent of correlation between one descriptor and the other descriptors in a model. It is evaluated as follows:

$$VIF = \frac{1}{1 - R^{2}}$$

where R2 is the squared correlation coefficient of the regression of one descriptor on the remaining descriptors in the model. The higher the value, the greater the correlation between the descriptors, and the harder it becomes to determine the contribution of an individual descriptor accurately. Values of 1-7 are sometimes regarded as moderate and indicate a robust model, while values of 10 and above show that the correlation between the descriptors is very high and that the model is therefore very unstable.
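The VIF of each descriptor can be computed by regressing it on the others, as in the NumPy sketch below. It assumes the standard definition VIF = 1/(1 − R2) with an intercept in each auxiliary regression; the function name and toy matrices are hypothetical and do not reproduce Table 7.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of every column of X (rows = compounds, columns = descriptors)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        ss_tot = float(((y - y.mean()) ** 2).sum())
        if ss_tot == 0.0:
            raise ValueError(f"descriptor {j} is constant; remove it before computing VIF")
        # Regress descriptor j on the remaining descriptors plus an intercept.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - float(resid @ resid) / ss_tot
        vifs.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return vifs


# Hypothetical descriptor matrices (not data from the paper).
toy_independent = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
toy_collinear = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.1], [4.0, 3.9], [5.0, 5.1]]
print(variance_inflation_factors(toy_independent))  # uncorrelated descriptors
print(variance_inflation_factors(toy_collinear))    # strongly correlated descriptors
```

Uncorrelated descriptors give a VIF close to 1, whereas nearly collinear descriptors give values far above 10.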

QSAR applicability domain of the model
The goal of applicability domain methods is to estimate the reliability of each generated model [19] by assessing the uncertainty in the prediction for a compound on the basis of its similarity to the compounds used in building the model and of the distance between the training and test sets. The leverage is used to define the applicability domain of the generated models [20]. It is formulated as follows:

$$h_{i} = x_{i}\left(X^{T}X\right)^{-1}x_{i}^{T}$$

where X is the n × k matrix of training set descriptors, X^T is the transpose of X, and x_i is the descriptor row vector of compound i. The warning leverage (h*) is the threshold used to check for outliers. It is written as follows:

$$h^{*} = \frac{3\left(p + 1\right)}{n}$$

where n is the total number of compounds in the training set and p is the total number of descriptors in the generated model. The Williams plot is generated by plotting the standardized residuals against the leverages of both the training and test sets. Compounds whose leverage falls below the warning leverage on the plot lie within the applicability domain of the model. The reliability of the QSAR model was assessed using the minimum accepted values shown in Table 3 [21].
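The leverage calculation can be sketched with NumPy as follows. The sketch assumes the standard definitions h_i = x_i (X^T X)^−1 x_i^T and h* = 3(p + 1)/n, with X holding only descriptor columns as described in the text; the function names and toy matrix are hypothetical.

```python
import numpy as np


def leverages(X_train, X_query):
    """Leverage of each row of X_query relative to the training descriptor matrix."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    xtx_inv = np.linalg.inv(X_train.T @ X_train)
    # Row-wise quadratic form x_i (X^T X)^-1 x_i^T.
    return np.einsum("ij,jk,ik->i", X_query, xtx_inv, X_query)


def warning_leverage(n_train, n_descriptors):
    """Warning leverage h* = 3 (p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_train


# Hypothetical training descriptors (5 compounds, 2 descriptors).
toy_train = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
h_train = leverages(toy_train, toy_train)
h_far = leverages(toy_train, [[5.0, 0.0]])[0]
print(np.round(h_train, 4), round(float(h_far), 4))

# With the 21 training compounds and 4 descriptors of model 4:
print(round(warning_leverage(21, 4), 3))
```

A query compound far from the training descriptors, such as the last toy point, receives a leverage well above the threshold and would be flagged as outside the applicability domain.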

Molecular docking
Molecular docking studies were carried out between the 2-anilinopyrimidine derivatives (ligands) and the thyroid hormone receptor (TRβ1). The receptor structure was obtained from the Protein Data Bank (PDB code: 1Y0X). The docking scores of the ligand-receptor complexes were obtained with AutoDock Vina in the PyRx software [11]. The detailed interactions between the ligands and the receptor were visualized using Discovery Studio Visualizer.

QSAR of 2-anilinopyrimidine derivatives
Four QSAR models were generated using the genetic function approximation (GFA) technique to predict the anti-proliferative activities. Model 4 passed the internal validation test and met the minimum requirements for QSAR modeling, as shown in Table 2. Tables 3 and 4 show the calculation of the external validation of the QSAR model using the model parameters of model 4. The external validation coefficient (R2pred) was calculated as 0.5390, which also conforms to the minimum required value for QSAR modeling. The meaning of each model parameter used in validating model 4 is given in Table 6.
The experimental, predicted, and residual values of the 2-anilinopyrimidine derivatives are shown in Table 5.
The residual values were obtained as the difference between the experimental and calculated activities. All the derivative compounds had low residual values, indicating the effectiveness of QSAR model 4. Table 6 defines and classifies the four model parameters (descriptors) that were used in building QSAR model 4 and in evaluating the strength of the model externally. Table 7 shows the statistical evaluation (VIF, mean effect, and P values) of the model parameters. The VIF shows the degree of collinearity between the descriptors, and it was calculated using the following equation:

$$VIF = \frac{1}{1 - R^{2}}$$

The mean effect shows the contribution of each descriptor to the built model, and the sign of each value shows whether the descriptor makes a negative or positive contribution to the model. The P values evaluate the statistical significance of the model parameters. Figure 2 shows a straight-line graph of the calculated (predicted) activities against the experimental activities of the 2-anilinopyrimidine derivatives tabulated in Table 5. The experimental and predicted activities showed a good relationship, as seen in the graph. Figure 4 is a graph of the standardized residuals against the leverage values, known as the Williams plot. The plot was used to assess the uncertainty arising from the similarity of the derivative compounds to those used in building the model. Compounds whose leverage falls below the warning leverage tend to be structurally similar. The warning leverage was calculated to be h* = 0.714, with p = 4 descriptors and n = 21 training compounds, using the formula:

$$h^{*} = \frac{3\left(p + 1\right)}{n}$$

Molecular docking studies
A summary of the docking results for some of the 2-anilinopyrimidine derivatives is given in Table 8. The docking scores were obtained using the PyRx software, while the interactions between the receptor and the ligands in the complexes, including hydrophobic interactions, hydrogen bonds, and the bonding distances, were visualized using the Discovery Studio software.
The hydrogen-bond and hydrophobic interactions between the 2-anilinopyrimidine derivatives (ligands) and the active pocket of the TRβ1 receptor for complexes 15 and 18 are shown in 2D format in Figs. 6 and 7, while Fig. 8 shows the same interactions in 3D format.

QSAR of 2-anilinopyrimidine
QSAR modeling was used to relate quantitatively the structures of the 2-anilinopyrimidine derivatives to their anti-proliferative activities. The robustness of the QSAR models was assessed by the fit to the training set and by the predicted pIC50 of the test set. Four QSAR models were generated using the genetic function approximation (GFA) technique to predict the anti-proliferative activities. Model 4 passed the internal validation with a squared correlation coefficient (R2) of 0.8760, an adjusted squared correlation coefficient (R2adj) of 0.8451, a cross-validation coefficient (Q2) of 0.6141, and an external validation coefficient (R2pred) of 0.5390. All the values obtained were in accordance with the minimum values proposed for the evaluation of a QSAR model, as shown in Table 2. The obtained values (R2, R2adj, Q2, and R2pred) indicate a high correlation between the predicted pIC50 and the experimental pIC50 of the data set.
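As a consistency check on the reported statistics, the adjusted R2 can be recomputed from R2. The worked calculation below assumes the standard definition of the adjusted coefficient, with n = 21 training compounds and p = 4 descriptors.

```latex
R^{2}_{adj} = 1 - \left(1 - R^{2}\right)\frac{n - 1}{n - p - 1}
            = 1 - \left(1 - 0.8760\right)\frac{21 - 1}{21 - 4 - 1}
            = 1 - 0.1240 \times 1.25
            = 0.8450
```

This agrees with the reported value of 0.8451 to within the rounding of R2.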

External validation of QSAR model 4
Model 4 was verified as the best model using the descriptor values of the test set compounds. Tables 3 and 4 show how the external validation was carried out using these descriptor values. The experimental, predicted, and residual values of the 2-anilinopyrimidine derivatives are shown in Table 5.
The low residual values between the experimental (anti-proliferative) activities and the predicted activities show the high performance of the model. Table 6 gives the definitions of the descriptors (model parameters). The mean effect results (Table 7) show the degree of impact of each descriptor on the model. The values and signs of the coefficients show that decreasing MOMI-R and then VR1_Dzv (negative descriptors) would increase the anti-proliferative activities of the derivative compounds, while increasing SpMin1_Bh followed by C3SP3 (positive descriptors) would also increase the anti-proliferative activities of the 2-anilinopyrimidine derivatives. The variance inflation factor (VIF) showed that there is little inter-correlation between the descriptors, making the model stable. The null hypothesis states that there is no significant relationship between the bioactivity and the model parameters of the constructed model (p > 0.05). At the 95% confidence level, the P values of the model parameters were below 0.05. Therefore, the null hypothesis is rejected and the alternative hypothesis is accepted, as shown in Table 7. Figure 2 shows the plot of predicted activity (pIC50) versus experimental activity (pIC50) for both the test set and the training set. The plot shows that the predicted activities were in good agreement with the experimental values, as shown in Table 2, confirming the effectiveness and strength of the built model.
Figure 3 Table 8. The docked complexes were examined visually by evaluating the hydrogen-bond interactions, hydrogen-bond lengths, and hydrophobic interactions. Compound 15 showed conventional hydrogen bonds with the backbone of ARG429 (2.50 Å) and two hydrogen bonds with GLU311 (2.7609 Å and 2.1551 Å). In addition, VAL458 showed a carbon-hydrogen interaction with the compound at a distance of 3.3765 Å.
Also, the delocalized pi electrons of the benzene ring interact with the alkyl groups of ILE303 (5.4379 Å), LYS306 (5.04683 Å), and ARG383 (5.3858 Å), and form three interactions with PRO384 (5.1107 Å, 4.7845 Å, and 4.7531 Å), giving hydrophobic interactions.
Both compounds were adequately docked, and their orientations are similar in some instances, which supports the quality of the docking results. Both compounds showed the same hydrogen-bond and hydrophobic interactions with the amino acid residues of the receptor, at different distances. The ligands had better docking scores than the standard drug gefitinib (−5.3 kcal/mol). The interactions of the compounds with the receptor indicate their ability to inhibit the TRβ1 receptor. Figures 6 and 7 give the detailed binding interactions of the receptor with ligands 15 and 18, while Fig. 8 shows in 3D how ligands 15 and 18 bind firmly to the active site of the protein receptor to form complexes.

Conclusion
The QSAR and molecular docking studies indicate that 2-anilinopyrimidine derivatives are promising anti-cancer drug candidates against the MDA-MB-468 cell line. The studies were carried out to predict the activities of the derivatives from their experimental activities and to examine the interaction between the derivatives (ligands) and the thyroid hormone receptor (TRβ1). The coefficients and mean effect values of QSAR model 4 indicate that increasing the SpMin1_Bhs and C3SP3 descriptors would increase the anti-proliferative activities of the derivatives, while decreasing the VR1_Dzv and MOMI-R descriptors would also increase the activities of the 2-anilinopyrimidine derivatives as anti-breast cancer agents. The robustness, applicability, and predictive capacity of the generated model were analyzed by both internal and external validation tests, and the results conform to the minimum recommended values. This indicates that model 4 can be used in developing new 2-anilinopyrimidine derivatives with better anti-breast cancer activity. The molecular docking results showed that compounds 15 and 18 had the highest docking scores, −7.4 and −7.3 kcal/mol, which are better than that of the standard drug gefitinib. The studies indicate that some of the 2-anilinopyrimidine derivatives bind tightly to the receptor (TRβ1) and stabilize it, as is evident from the receptor-ligand interactions. These compounds are promising inhibitors of TRβ1. This research could help pharmaceutical researchers in designing and developing new anti-triple-negative breast cancer drugs.