Computer-aided molecular modeling and structural analysis of the human centromere protein–HIKM complex
Beni-Suef University Journal of Basic and Applied Sciences volume 11, Article number: 101 (2022)
Protein–peptide and protein–protein interactions play an essential role in many functional and structural aspects of cellular organization. While cryo-EM and X-ray crystallography provide the most complete structural characterization, most biological interactions occur in biomolecular complexes that are not amenable to direct experimental analysis. Computational docking approaches are therefore necessary. These start from the structures of the component proteins and predict the structure of their complex, preferably with an accuracy close to that of complex structures determined by X-ray crystallography.
To guarantee faithful chromosome segregation, the kinetochore (a multi-subunit protein complex) must assemble properly at the centromere during cell division. The CENP-HIKM complex is an important member of the inner kinetochore, and defects in any of its subunits lead to kinetochore dysfunction and, eventually, chromosome mis-segregation and cell death. Previous studies seeking to understand the assembly of CENP-HIKM and the mechanism by which it promotes kinetochore function have reconstituted the complex from different organisms, including fungi and yeast. Here, we present a detailed computational model of the physical interactions between the components of the human CENP-HIKM complex, and we validate each modeled structure against orthologs with existing crystal structures in the Protein Data Bank.
Results from this study substantiate the existing hypothesis that the human CENP-HIK complex shares a similar architecture with its fungal and yeast orthologs, and likewise validate the binding mode of CENP-M to the C-terminus of the human CENP-I based on existing experimental reports.
Although molecular and cell biology have made huge advances in delivering powerful methods for the discovery and identification of protein–protein interactions and their subcellular localization, only structural biology can give definite answers about interaction mechanisms, by uncovering high-resolution atomistic structures of the underlying complexes. Determining the structure of such biomolecular interactions, however, can be costly, laborious and time-consuming. The widening gap between the number of determined 3D structures and the number of known sequences shows that high-throughput structural biology remains a fantasy, and the gap is wider still when only the available structures of biomolecular complexes are considered. By contrast, computational structural biology has the potential to generate high-resolution models of protein–protein interactions.
The timely and accurate segregation of chromosomes in meiosis and mitosis is crucial for organismal and cellular viability. Sister chromatids produced through DNA replication maintain strong cohesion during mitosis until a bioriented arrangement is formed on the mitotic spindle. The loss of sister chromatid cohesion during the transition from metaphase to anaphase allows the sister chromatids to separate into genetically identical daughter cells. The attachment of sister chromatids to microtubules is mediated by the kinetochores. Kinetochores are established on a part of the centromere (specialized chromatin) whose major hallmark is the presence of CENP-A (a variant of histone H3). At low resolution, kinetochores appear as a laminar structure, with the ends of the microtubules connected to the outer plate and dense centromeric chromatin adjacent to the inner plate. The outer kinetochore plate hosts the KMN network (Knl1, Mis12 and Ndc80 complexes), an assembly of ten protein subunits that acts as a microtubule receptor [10, 11]. The inner kinetochore, on the other hand, hosts the CCAN (constitutive centromere-associated network), a complex of sixteen different centromeric proteins (CENPs), most of which were originally identified in the vertebrate CENP-A interactome.
The sixteen CCAN proteins of vertebrates are grouped into several sub-complexes, including CENP-LN, CENP-C, CENP-OPQUR, CENP-HIKM and CENP-TWSX. Orthologs of most of these sub-complexes have been recognized in fungi and yeast. As a substitute for canonical histone H3, CENP-A accumulates in centromeric nucleosomes and initiates CCAN assembly by binding to CENP-C and CENP-LN. Several studies have also established the crucial role of the CCAN in mediating outer kinetochore assembly [20, 21]. CENP-T and CENP-C function as the structural platform for the outer kinetochore through direct interaction with the NDC80 and MIS12 complexes.
Many CCAN components are held in place by an intricate protein–protein interaction network [23, 24]. However, exactly how these interactions assemble the CCAN is not yet completely understood. As core CCAN subunits, CENP-H (Mcm16/Fta3), CENP-I (Ctf3/Mis6) and CENP-K (Mcm22/Sim4) assemble into a ternary complex and are crucial for kinetochore integrity. Chromosome congression is compromised upon the loss of any of these proteins, and their localization to the centromere has been shown to be mutually dependent. In vitro reconstitution has shown that CENP-M, another CCAN subunit, forms a stable complex with CENP-HIK via an interaction with the CENP-I C-terminus. This interaction is essential for chromosome alignment and for the localization of CENP-IM to the centromere. Although low-resolution electron microscopy analyses have shown the overall organization of CENP-HIKM, the specific molecular basis of the complex assembly remains largely uncharacterized.
Homology modeling has grown into a crucial structural biology technique, contributing significantly to narrowing the gap between experimentally determined structures and known protein sequences. Fully automated tools and workflows have streamlined and simplified homology modeling, allowing non-experts to generate highly reliable protein models and giving easy access to the results, interpretation and visualization of those models. The role of homology modeling is even greater in the characterization of protein–protein interactions, given the multiplicity of binding modes and protein partners. Methods for predicting protein–protein complexes fall into two major categories: free docking, in which binding modes are sampled on the basis of the physicochemical and structural complementarity of the proteins, without any knowledge of experimentally determined similar complexes; and template-based (comparative, or homology) docking, which relies solely on the structures of similar complexes, used as templates.
With reference to the existing structures of the CENP-HIK complex from yeast and fungi, we have predicted in this study an organizational model of the human CENP-HIKM complex using extensive computational approaches. Our results are also highly consistent with experimental studies of inter-subunit interactions in the published literature. Additionally, the individual models of the CENP-HIKM subunits reported in this study show great similarity to the models in the recently released AlphaFold protein structure database (https://www.alphafold.ebi.ac.uk/), which further supports their reliability (Additional file 1: Figure S1).
2.1 Reference sequence and structure retrieval
For structural validation, the modeled protein complex (hsCENP-HIKM) was compared with structural orthologs of known three-dimensional structure. The reference sequences and structures were retrieved from the NCBI (National Center for Biotechnology Information) database and the PDB (Protein Data Bank). The PDB codes 5Z08 and 6YPC, which correspond to the crystal structures of the fungal (Thielavia terrestris) kinetochore CENP-HIK ternary complex subunits and the yeast (Saccharomyces cerevisiae) kinetochore CENP-HIKTW subunits, respectively, were used to retrieve the corresponding structures from the Protein Data Bank. The crystal structure of human CENP-M was retrieved with the PDB code 4P0T. The PDB code of each structure was submitted to the NCBI database to obtain the corresponding amino acid sequences, while the full-length sequence of each subunit of human CENP-HIK was retrieved using its accession number: Q9H3R5, Q92674 and Q9BS16.
2.2 3D structural modeling of the human CENP-HIK
High-quality 3D structural models of hsCENP-H, -I and -K were predicted individually using the RaptorX Contact tool. RaptorX Contact predicts contacts by integrating sequence conservation and evolutionary coupling information in an ultra-deep neural network formed by two residual neural networks. The first residual network performs a series of transformations of one-dimensional sequential features, while the second performs transformations of two-dimensional pairwise information, which includes the pairwise potential, the output of the first residual network and evolutionary coupling information. Through the use of these very deep residual networks, RaptorX Contact accurately models patterns of contact occurrence and complex sequence–structure relationships. RaptorX is reported to outperform other predictive tools, especially for proteins that have no close homologs in the PDB or that have very little evolutionary information (i.e., a highly sparse sequence profile). The tool uses deep convolutional neural fields (DeepCNF), a powerful deep learning model, for the prediction of disordered regions, secondary structure and solvent accessibility. In addition to modeling complex sequence–structure relationships through a deep hierarchical architecture, DeepCNF also models the interdependencies between adjacent property labels.
2.3 Structural refinement and model quality evaluation
Following the modeling of each protein subunit (hsCENP-H, -I and -K), structural refinement was carried out using GalaxyRefine, which is based on a refinement method that was successfully tested in CASP10. In this method, side chains are first rebuilt and the overall structure is then relaxed through molecular dynamics simulation. According to the CASP10 assessment, this approach showed the best performance in improving local structure quality. Both the local and global structure quality of the RaptorX Contact-generated models improved upon refinement with this method. The quality of each refined model was assessed using the ProSA-web program, which uses a z-score for structural analysis. Additional model quality assessment was conducted using the PROCHECK suite, which performs a comprehensive stereochemistry check on protein structures. Its output consists of several plots in PostScript format and a detailed residue-by-residue listing. This highlights regions of the protein structure that might require further investigation and gives an evaluation of the overall structural quality in comparison with well-refined structures.
2.4 Structural alignment and visualization
To evaluate the degree of structural similarity between each human model and its corresponding orthologs, we carried out structural alignment using the Alignment/Superposition plugin of the PyMOL molecular visualizer. PyMOL is a cross-platform molecular graphics tool that is widely used for the 3D visualization of trajectories, surfaces, electron densities, small molecules, nucleic acids and proteins. The tool is also used for movie making, molecule editing and ray tracing. PyMOL is Python-based and comes with many plugin tools that facilitate the 3D visualization of macromolecules, as performed in this study.
2.5 Validation of residue conservation
Specific residues of CENP-H and CENP-M have been reported in different studies to be essential for intermolecular interactions with residues of other subunits in the complex. We validated the evolutionary conservation of these residues using ConSurf. ConSurf is widely used to detect functional regions of macromolecules by analyzing the evolutionary dynamics of nucleic acid and amino acid substitutions among homologous sequences. The tool estimates the evolutionary rate of each nucleotide or amino acid position and maps these rates onto the structure or sequence of the query macromolecule. Slowly evolving regions on the surface of the query macromolecule are often essential for function, so ConSurf analysis can highlight important regions within the query macromolecule.
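As a rough illustration of what a per-residue "degree of conservation" means, the sketch below scores each column of a small alignment by normalized Shannon entropy. This is a deliberately simpler stand-in and not ConSurf's own method, which estimates evolutionary rates from homologous sequences; the toy alignment and the function name are hypothetical.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Return one score per alignment column: 1.0 = invariant, lower = more variable.

    Assumption: conservation is approximated as 1 - H / log2(20), where H is the
    Shannon entropy of the residues in the column (gaps are ignored).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_entropy = math.log2(20)  # 20 standard amino acids
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        if not residues:
            scores.append(0.0)  # column of gaps only: treat as uninformative
            continue
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in Counter(residues).values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Hypothetical five-column alignment of four orthologous fragments.
toy_alignment = ["LKVAL", "LKIAL", "LRVGL", "LKVSL"]
scores = column_conservation(toy_alignment)
```

Columns 1 and 5 of the toy alignment are invariant and receive the maximum score, whereas column 4 (A, A, G, S) is the most variable and scores lowest.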
2.6 Protein–protein docking
Molecular docking, used to predict the binding modes and the organization of the members of the CENP-HIKM complex, was carried out using ClusPro, a widely used protein–protein docking tool. ClusPro performs three computational steps: rigid-body docking that samples billions of conformations; RMSD (root-mean-square deviation)-based clustering of the lowest-energy structures to find the largest clusters, which represent the most likely models of the complex; and refinement of selected structures by energy minimization. For rigid-body docking, ClusPro employs PIPER, a docking algorithm based on the Fast Fourier Transform (FFT) correlation technique. The FFT technique has brought significant progress to rigid-body protein–protein docking. In this method, one protein (the receptor) is placed at the origin of the coordinate system on a fixed grid and the other protein (the ligand) on a movable grid, and the interaction energy is expressed as a correlation function. The numerical efficiency of the method comes from the fact that such energy functions can be evaluated quickly with FFTs, which allows a very large number of conformations of the interacting proteins to be sampled and the energy to be evaluated at each grid point. As a result, an FFT-based approach allows proteins to be docked without prior knowledge of the structure of their complex.
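The idea of expressing a docking score as a correlation function can be sketched with NumPy as follows. This is a minimal illustration under strong assumptions: a single shape-overlap term on binary grids, cyclic translations only, and no rotational sampling or realistic energy terms, so it is not PIPER's scoring function. The grids and function names are hypothetical.

```python
import numpy as np


def fft_translation_scan(receptor_grid, ligand_grid):
    """Score every cyclic translation t at once: score[t] = sum_x R[x] * L[x + t].

    Uses the cross-correlation theorem, so the full scan costs a few FFTs
    instead of one explicit sum per translation.
    """
    r_hat = np.fft.fftn(receptor_grid)
    l_hat = np.fft.fftn(ligand_grid)
    return np.fft.ifftn(np.conj(r_hat) * l_hat).real


def best_translation(scores):
    """Grid offset (as a tuple of ints) with the highest correlation score."""
    return tuple(int(i) for i in np.unravel_index(np.argmax(scores), scores.shape))


# Hypothetical receptor: a small solid block on a 16 x 16 x 16 grid.
receptor = np.zeros((16, 16, 16))
receptor[2:6, 3:6, 4:9] = 1.0

# Hypothetical ligand: the same shape displaced by a known offset, so the
# scan should recover exactly that offset.
true_offset = (3, 5, 2)
ligand = np.roll(receptor, shift=true_offset, axis=(0, 1, 2))

scores = fft_translation_scan(receptor, ligand)
found_offset = best_translation(scores)
```

In a real rigid-body search, a scan of this kind would be repeated for many ligand orientations, with the score built from several energy terms rather than a single overlap count.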
2.7 Normal mode analysis and molecular dynamics simulation
The normal mode dynamics of the hypothetical hsCENP-HIKM complex were assessed using the iMOD and DynaMut tools. This analysis was aimed at determining the stability of the docked complex and at exploring the dynamics of the protein–protein interactions. iMOD analyzes the conformational flexibility of nucleic acid and protein structures using normal mode analysis in internal coordinates. Using the dihedral angles as variables reduces the non-physical distortions and the computational cost of classical Cartesian normal mode analysis approaches. The framework operates at various coarse-grained levels and supports normal mode analysis-based conformational studies, including pathway exploration, vibrational analysis and Monte Carlo simulations. iMOD normal mode analysis also serves as a reasonable alternative to atomistic simulation. The stiffness of motion is reported as an eigenvalue, and the covariance matrix, deformability and elastic network model are also calculated. DynaMut, on the other hand, implements normal mode analysis using two different methods, ENCoM and Bio3D, giving simple and rapid access to informative and efficient analysis of protein motion.
Furthermore, each component of the CENP-HIKM sub-complex was subjected to molecular dynamics simulation using GROMACS version 2019.2. First, energy minimization in vacuum was performed for 5000 steps using the steepest descent algorithm. The individual structures were then solvated in a triclinic box with the SPC (simple point charge) water model. Sodium and chloride ions were subsequently added to bring each system to a salt concentration of 0.15 M. The systems were equilibrated in the NVT and NPT ensembles at a temperature of 300 K. Each simulation was run for 100 ns, followed by post-simulation analyses that included RMSD (root-mean-square deviation), Rg (radius of gyration), SASA (solvent-accessible surface area) and PCA (principal component analysis) calculations.
2.8 In silico mutagenesis
To assess the consistency of the predicted organization of CENP-HIKM with experimental reports from previous studies, in silico mutants of CENP-H and CENP-M were designed using the backbone-dependent Dunbrack rotamer library as implemented in Chimera. The backbone-dependent rotamer library consists of rotamer frequencies, mean dihedral angles and variances as a function of the backbone dihedral angles. Structure prediction and design methods that use backbone flexibility benefit strongly from smoothly varying probabilities and angles. A newer version of the backbone-dependent rotamer library uses adaptive kernel density estimation for the rotamer frequencies and adaptive kernel regression for the mean dihedral angles and variances. This design allows rotamer probabilities, mean angles and variances to be estimated as continuous and smooth functions of phi and psi. For the non-rotameric degrees of freedom of amides, carboxylates and aromatic side chains, a continuous probability density was modeled as a function of the backbone dihedrals and the rotamers of the remaining degrees of freedom.
2.9 Binding free energy prediction
The binding free energy of the wild-type protein complex and the change in binding free energy upon mutation were predicted using several tools: BeAtMuSiC, mCSM-PPI2, mmCSM-PPI, MutaBind2 and HawkDock. BeAtMuSiC is a coarse-grained predictor of changes in binding free energy caused by point mutations. Its algorithm relies on a set of statistical potentials derived from known protein structures and combines the effect of the mutation on the strength of the protein–protein interactions at the interface with its effect on the overall stability of the complex. mCSM-PPI2 is a machine learning tool developed to predict the effects of missense mutations on the binding affinity of protein–protein interactions. It uses graph-based structural signatures to model the effects of variations on the inter-residue interaction network, together with evolutionary information, complex network metrics and energetic terms, to produce an optimized predictor. mmCSM-PPI is a scalable machine learning tool for assessing changes in protein–protein binding affinity caused by single and multiple missense mutations. It uses well-established graph-based signatures to capture the geometrical and physicochemical properties of the wild-type residues and integrates them with substitution scores and dynamics terms from normal mode analysis. MutaBind2 estimates changes in protein–protein binding affinity caused by single- and multiple-site mutations, making its predictions from the structure of the protein–protein complex. MutaBind2 uses molecular mechanics force fields, statistical potentials and fast side-chain optimization algorithms, combined through a random forest method. The training sets used to develop its multiple- and single-mutation models consist of 1707 multiple mutations from 120 protein complexes and 4191 single mutations from 265 protein complexes, respectively. HawkDock was developed for the prediction and analysis of protein–protein interactions and integrates the ATTRACT docking algorithm, the HawkRank scoring function and MM/GBSA free energy decomposition analysis. MM/GBSA is integrated into HawkDock both to identify important residues in the protein–protein binding interface and to re-rank the models.
2.10 Interatomic interaction analysis
The non-covalent interactions between subunits of the CENP-HIKM complex were analyzed using Arpeggio. The program is implemented in Python and calculates interatomic interactions within and between proteins, and between proteins and DNA or small-molecule ligands. The interactions analyzed in this study include van der Waals interactions, hydrogen bonds and hydrophobic interactions.
3.1 Modeling of the human CENP-H, CENP-I and CENP-K
Structural models were generated using the amino acid sequence of each protein as input, as described in the methods. The 3D structure prediction method employed by RaptorX Contact is distinctive in that it predicts all contacts of a protein simultaneously, which makes it easier to model high-order residue correlations. The output provides five models ranked by estimated root-mean-square deviation (RMSD). The estimated RMSD is the predicted average deviation, in Å, of a 3D model from its experimental structure. The smaller the estimated RMSD, the more likely the 3D model is to be of good quality. The estimated RMSD values of the top-ranked models for CENP-H, -I and -K (Fig. 1) are 5.7546 Å, 13.445 Å and 5.7311 Å, respectively. All generated models share high similarity with the PDB structures of their respective orthologs (Additional file 1: Figures S2 and S3) and were therefore selected for structural refinement.
3.2 Model quality evaluation
Following the structural refinement of the three models with GalaxyRefine, we evaluated the quality of each model. The z-score of each model, an indication of overall model quality, was obtained using ProSA-web (Additional file 1: Figure S4). hsCENP-H, hsCENP-K and hsCENP-I produced z-scores of -3.17, -4.66 and -7.38, respectively, indicating that all three models fall within the range of z-scores typical of structures determined by nuclear magnetic resonance (NMR), as shown in Additional file 1: Figure S4.
The PROCHECK suite was used to assess the stereochemical quality of the models by analyzing the overall structural geometry and the geometry of individual residues. The Ramachandran plot for each modeled protein showed that over 92% of the residues were located in the most favored region, with an average of 5.4% of the residues in the allowed region and less than 0.4% in the disallowed region (Additional file 1: Figure S5). Based on an analysis of 118 structures with a resolution of at least 2.0 Å and an R-factor no greater than 20%, a good-quality model is expected to have more than 90% of its residues in the most favored region.
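The Ramachandran statistics above rest on the backbone dihedral angles φ and ψ of each residue, where φ is defined by the atoms C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below shows one common way to compute a dihedral angle from four points; it is an illustration only and does not reproduce PROCHECK's region definitions. The coordinates are hypothetical test points, not atoms from the models.

```python
import numpy as np


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points, in (-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)       # unit vector along the central bond
    v = b0 - np.dot(b0, b1) * b1       # component of b0 perpendicular to b1
    w = b2 - np.dot(b2, b1) * b1       # component of b2 perpendicular to b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


# Hypothetical points: the first three are fixed, the fourth is moved to
# produce a cis (0 degrees), perpendicular and trans (180 degrees) geometry.
a, b, c = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
cis_angle = dihedral_degrees(a, b, c, (1.0, 0.0, 1.0))
perpendicular_angle = dihedral_degrees(a, b, c, (0.0, 1.0, 1.0))
trans_angle = dihedral_degrees(a, b, c, (-1.0, 0.0, 1.0))
```

Applying such a function to the appropriate backbone atoms of every residue gives the (φ, ψ) pairs that are then binned into the favored, allowed and disallowed regions of the plot.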
3.3 Structural alignment
Previous studies have reported a high degree of sequence conservation among the subunits of the CENP-HIK complex across different organisms [14, 51]. The human model of each subunit is therefore expected to display a high level of structural similarity to the reference structures from fungi and yeast; the structural alignments in Fig. 2 were used to assess this similarity and to further validate the reliability and quality of the models.
3.4 Computational validation of residue conservation
Details of the CENP-HK binding interface at the C-terminus were revealed by the crystal structure of the fungal HIK complex (5Z08). The side chains of ILE-205, ILE-211 and LEU-219 of thCENP-H were shown to insert into a hydrophobic pocket of thCENP-K that is surrounded by several residues, including LEU-177, TRP-179, PHE-180, HIS-184, ILE-270 and PHE-300. At the CENP-HI interface, thCENP-H uses its contacting helix (HH2) to interact with the HEAT repeat of ctCENP-INT. A salt bridge was reported between ARG-220 of thCENP-H and GLU-86 of ctCENP-INT, while LEU-224 was reported to insert into a hydrophobic pocket of ctCENP-INT (surrounded by LEU-89, VAL-126 and VAL-130). Alignment of the amino acid sequences of CENP-H orthologs (T. terrestris, G. gallus, O. aries, R. norvegicus, M. musculus and H. sapiens) revealed a high degree of conservation of the CENP-K- and CENP-I-binding residues, which in human correspond to LEU-219, VAL-225, LEU-233, LYS-234 and LEU-238. Using ConSurf, we validated the degree of conservation of these residues in the hsCENP-H model. The output showed that all five residues (LEU-219, VAL-225, LEU-233, LYS-234 and LEU-238) are conserved, to varying degrees (Additional file 1: Figure S6).
In a similar study of human CENP-M (PDB 4P0T), conserved surface residues were identified as being involved in the interaction with the C-terminus of hsCENP-I, an interaction that stabilizes hsCENP-I and is required for unimpaired kinetochore localization. Using the same computational approach, we found that the reported surface residues of hsCENP-M are likewise conserved, to varying degrees (Additional file 1: Figure S6B), which supports the experimental reports from the previous studies.
3.5 Protein–protein docking study
With the hsCENP-M crystal structure (PDB 4P0T) available and high-quality models generated for each component of the hsCENP-HIK complex, we proceeded to dock the subunits. In the model of Hu et al., biochemical analysis and structures revealed that thCENP-K and thCENP-H form a heterodimer via interactions at both their N-termini and their C-termini. ctCENP-INT is integrated into the complex through its interaction with the thCENP-H C-terminus, resulting in a ternary complex in which thCENP-H is sandwiched between ctCENP-INT and thCENP-K. That study also reported that this architecture is conserved in the human HIK complex. Stepwise docking of the generated models of hsCENP-H, -I and -K produced an architecture similar to that reported experimentally in the literature, suggesting structural conservation across species (Fig. 3).
In a similar study, Basilico et al. reported the structural organization of the hsCENP-HIKM complex, using a computational model to represent full-length hsCENP-I because no full-length ortholog structure existed. Consistent with existing literature reports, our molecular docking output also showed that hsCENP-M binds to the C-terminus of the full-length hsCENP-I model (Fig. 3) in an arrangement that resembles the importin-β/Ran complex, as reported by Basilico et al. The α-solenoid fold of importin-β is consistently reported to be a high-confidence template for structural modeling of hsCENP-I (Fig. 4).
3.6 Normal mode analysis
The quality and stability of the hypothetical hsCENP-HIKM model were evaluated through the iMOD-estimated elastic network map, deformability, covariance map, eigenvalue and B-factor (Fig. 5). The main-chain deformability is a measure of the capability of the molecule to deform at each of its residues. The B-factor (a crystallographic atomic displacement parameter) is reported for most X-ray crystal structures of proteins and is directly related to the fluctuations caused by motion or static disorder in the structure. The B-factor thus provides an averaged root mean square (RMS) measure of atomic displacement. Motion stiffness is represented by the eigenvalue associated with each normal mode; its value is directly related to the energy required to deform the structure. The green and red bars show the cumulative and individual variances, respectively, while the covariance matrix indicates the coupling between pairs of residues, i.e., whether they experience anti-correlated, uncorrelated or correlated motions (colored in blue, white and green, respectively). The elastic network model defines which pairs of atoms are connected by springs. Each dot in the graph represents one spring between the corresponding pair of atoms, and the dots are colored according to stiffness: darker gray indicates stiffer springs and lighter gray more flexible ones. Figure 5 shows an average root mean square in the B-factor and an insignificant hinge. The high eigenvalue (1.375238e-06) indicates a low chance of deformation, while the elasticity and correlation maps also support the high quality of the hypothetical protein complex model (Fig. 5).
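The relationship between an elastic network of springs and the eigenvalues of the normal modes can be sketched as follows. Note the substitution: iMOD works in internal (dihedral) coordinates, whereas this minimal example uses a Cartesian anisotropic network model, which is simpler to write down. The cutoff, spring constant and helix-like toy coordinates are assumptions chosen for illustration, not values from this study.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Hessian of a Cartesian elastic network: one spring per pair within `cutoff`."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = float(d @ d)
            if dist2 > cutoff ** 2:
                continue  # no spring between distant nodes
            block = -gamma * np.outer(d, d) / dist2
            hessian[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hessian[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            hessian[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hessian[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hessian


# Hypothetical toy structure: 10 C-alpha-like nodes on an ideal helix
# (2.3 A radius, 100 degrees and 1.5 A rise per node).
t = np.arange(10)
toy_coords = np.column_stack([
    2.3 * np.cos(np.radians(100.0 * t)),
    2.3 * np.sin(np.radians(100.0 * t)),
    1.5 * t,
])

# Eigenvalues in ascending order; the first six correspond to rigid-body
# translation and rotation and are numerically zero.
eigenvalues = np.linalg.eigvalsh(anm_hessian(toy_coords))
nontrivial = eigenvalues[6:]
```

Each non-trivial eigenvalue is proportional to the energy needed to deform the network along the corresponding mode, so low eigenvalues mark the softest collective motions.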
To delineate the stability dynamics of the individual components of the CENP-HIKM sub-complex of the CCAN, we performed a 100 ns molecular dynamics simulation of each, followed by several post-simulation analyses. The stability profile of each component was first assessed through RMSD calculation (Fig. 6A). The RMSD is a measure of the average distance between the backbone atoms of superimposed macromolecules. The stability of a protein can therefore be inferred from its degree of deviation, as a lower degree of deviation signifies a higher level of protein stability.
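For readers unfamiliar with the calculation, the RMSD after optimal superposition can be sketched with the Kabsch algorithm as below. This is a minimal NumPy illustration, assuming equal weights and matched atom order; it is not the analysis code used for Fig. 6A, and the coordinates are hypothetical.

```python
import numpy as np


def superposed_rmsd(mobile, reference):
    """RMSD between two matched coordinate sets after optimal rigid superposition."""
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(reference, dtype=float)
    P = P - P.mean(axis=0)                  # remove translation
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)       # SVD of the covariance matrix
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0   # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T               # optimal rotation
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Hypothetical five-atom reference "backbone".
reference = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0],
                      [0.0, 1.5, 1.0], [0.5, 0.2, 2.0]])

# The same structure rotated by 40 degrees about z and translated:
# a rigid motion, so the RMSD should vanish.
angle = np.radians(40.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
moved = reference @ rot_z.T + np.array([3.0, -2.0, 1.0])
rmsd_moved = superposed_rmsd(moved, reference)

# A genuine conformational change: one atom displaced by 1 A.
distorted = reference.copy()
distorted[4] += np.array([0.0, 0.0, 1.0])
rmsd_distorted = superposed_rmsd(distorted, reference)
```

Because rigid-body motion is removed before the deviation is measured, only internal conformational change contributes to the value, which is why a low, flat RMSD trace is read as structural stability.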
The Rg (radius of gyration) describes the distribution of the atoms of a protein around its axis. Distance and radius of gyration calculations are among the most significant and widely used indicators for predicting structural activity. Protein compactness is directly related to the folding rate of the protein, and it can be monitored through the radius of gyration. For each component of the CENP-HIKM complex, the degree of compactness was assessed by calculating its radius of gyration (Fig. 6B). Additionally, the solvent accessibility and the degree of mobility of the individual components were evaluated by calculating their SASA (solvent-accessible surface area) (Fig. 7A) and by PCA (principal component analysis) (Fig. 7B), both of which are also key indicators of protein stability. Taken together, the post-simulation analyses over the 100 ns simulation period show that CENP-I is the least stable of the four components of the sub-complex (Figs. 7 and 8).
A similar analysis was conducted using DynaMut. The DynaMut normal mode analysis protocol is based on the Bio3D package and uses its default C-alpha force field. The DynaMut-calculated deformation energy gives an estimate of the local flexibility of the protein complex, while the atomic fluctuation shows the amplitude of the absolute atomic motion. The predominantly blue coloration of the 3D structure of the protein complex in Fig. 8A and B denotes a high level of structural stability. All calculations were performed over the first ten non-trivial modes of the protein complex. The DynaMut output also includes the flexibility trajectory of the protein complex based on normal mode analysis (Fig. 8C) and a correlation map that reveals the anti-correlated and correlated regions of the protein complex, colored in blue and red, respectively (Additional file 1: Figure S7). A 3D animation was also generated to simulate the motion of the protein complex (Additional file 2: Figure S8).
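A minimal sketch of a radius of gyration calculation is given below, assuming the common mass-weighted definition about the center of mass. The coordinates and the default of equal masses are hypothetical, and this is not the analysis code used for Fig. 6B.

```python
import numpy as np


def radius_of_gyration(coords, masses=None):
    """Mass-weighted root-mean-square distance of atoms from their center of mass."""
    coords = np.asarray(coords, dtype=float)
    if masses is None:
        masses = np.ones(len(coords))   # assumption: equal masses if none are given
    masses = np.asarray(masses, dtype=float)
    center = np.average(coords, axis=0, weights=masses)
    sq_dist = ((coords - center) ** 2).sum(axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=masses)))


# Hypothetical four-atom arrangement and an expanded copy of it.
compact = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]])
extended = 2.0 * compact

rg_compact = radius_of_gyration(compact)
rg_extended = radius_of_gyration(extended)
```

Doubling every distance from the center doubles the radius of gyration, which is the sense in which a rising Rg along a trajectory indicates loss of compactness.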
3.7 In silico mutagenesis and binding free energy prediction
Following the experimental mutational analyses of previous studies [14, 38], we designed in silico mutants of hsCENP-H and hsCENP-M to validate the predicted interactions between the subunits of the hypothetical hsCENP-HIKM complex (Additional file 1: Figures S9 and S10). To validate the predicted interface interactions between the C-terminus of hsCENP-H and the other subunits (the C-terminus of hsCENP-K and the N-terminus of hsCENP-I), Hu et al. constructed several mutants of the protein (L219A, V225A, L233A, K234A and L238A) based on residue conservation. The mutated residues correspond to ILE-205, ILE-211, LEU-219, ARG-220 and LEU-224, respectively, in thCENP-H. A dramatic reduction in binding affinity was recorded when each residue was mutated to alanine, indicating that these residues are essential for the protein–protein interactions of the complex. In a similar study, Basilico et al. designed mutants of hsCENP-M (L94A and L163E) based on residue conservation analysis; mutating these residues to alanine and glutamate, respectively, affected the interaction of the protein with the C-terminus of hsCENP-I (Table 1).
Having designed the in silico mutants of these proteins in line with reports from the existing literature, we predicted the binding free energy changes using several predictive tools as reported in the Materials and Methods section. The predicted loss of binding affinity caused by these mutations shows the consistency of our computational model with experimental reports (Tables 2, 3, 4 and 5).
The binding free energies of the wild-type and mutant complexes were further estimated using the MM/GBSA approach, which calculates ΔΔGbind based on molecular dynamics simulation of the protein–protein complex. The prediction, which was obtained using HawkDock, is intermediate in both accuracy and computational effort between strict alchemical perturbation and empirical scoring methods. The output revealed the total binding energy scores on a per-residue basis for both wild-type and mutant complexes (Tables 6 and 7). The detailed contribution of each residue in the complex can be found in Additional file 3: Tables S1–S6.
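As a rough illustration of how per-residue energy contributions relate to a change in binding free energy, the sketch below sums per-residue terms for a wild-type and a mutant complex and takes their difference. The residue labels, energy values and function names are hypothetical and are not taken from Tables 6 and 7; the sign convention (a positive ΔΔGbind meaning weaker binding in the mutant) is also an assumption, since individual prediction tools may report the opposite sign.

```python
def binding_energy(per_residue):
    """Sum per-residue contributions (kcal/mol) into a total binding energy."""
    return sum(per_residue.values())


def ddg_bind(wild_type, mutant):
    """Return ddG_bind = dG_bind(mutant) - dG_bind(wild type).

    With this (assumed) convention a positive value means the mutation
    weakens binding, because the mutant binding energy is less negative.
    """
    return binding_energy(mutant) - binding_energy(wild_type)


# Hypothetical per-residue energies, for illustration only.
wt = {"LEU219": -3.0, "VAL225": -2.0, "LYS234": -1.5}
mut = {"ALA219": -0.5, "VAL225": -2.0, "LYS234": -1.5}

change = ddg_bind(wt, mut)  # positive here: the mutant binds less tightly
```

Keeping the per-residue terms in a mapping makes it straightforward to see which positions account for most of the difference between the two complexes.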
3.8 Interatomic interaction analysis
Protein–protein interactions are essential for regular biological processes and for the regulation of cellular reactions that affect the function and expression of genes. Several studies have elucidated the role of protein–protein complex interface residues in conferring specificity and stability. Interface residues of proteins are known to interact with main chain and side chain atoms of their interacting partners. However, the impact and relative contribution of inter-protein interactions involving interface residues, as compared to intra-protein interactions in protein–protein complexes, are unclear. In order to ensure that essential interactions involved in the binding affinity and stability of the hypothetical hsCENP-HIKM complex are not overlooked, we report the observed changes in interatomic interactions of the wild-type and mutant protein complex subunits (Tables 8 and 9). A comparative study of the wild-type and mutant protein complexes showed that both inter- and intra-model interactions contributed to the stability of the complex (Additional file 1: Figures S11–S13). Upon the mutation of each residue, a dramatic loss of specific interatomic interactions (van der Waals interactions, hydrogen bond interactions and hydrophobic interactions) was observed, which presumably underlies the reported reduction in the experimental and predicted binding affinity of the mutants.
The cell is a busy environment in which thousands of molecules constantly interact and, through information exchange, define the cellular metabolic state. Among all contributors to cellular homeostasis, proteins are both the most active and the most abundant; therefore, understanding their interactions and delineating their information-sharing mechanism is essential for a detailed comprehension of cellular functionality. This further provides a first step toward the rational development of therapeutic agents against many incapacitating or deadly diseases. Despite the advances in structure determination through experimental methods, most of the known protein–protein interactions still have no atomic structure. NMR spectroscopy and X-ray crystallography, both of which are high-resolution techniques, struggle with high-throughput demand, while low-resolution methods like small-angle X-ray scattering and cryo-electron microscopy provide excessively coarse data. The development of molecular docking or computational structure prediction was first aimed at complementing experimental results but has since developed into a lively and independent research field.
Elucidating the organization and structural architecture of the CCAN is crucial for understanding the functionality and assembly of the kinetochore. CENP-H, CENP-I, CENP-K and CENP-M, among other subunits of the CCAN, have previously been reported to form a stable complex based on reconstitution experiments and proteomic analyses [23, 58]. Our study for the first time presents a computationally modeled high-quality structure of the human CENP-HIKM complex (Fig. 4) alongside a detailed report of the inter- and intra-residue interactions. A previously reported computational model of the hsCENP-I suggests that it adopts an α-solenoid fold that resembles the fold of importin-β [59, 60]. The hsCENP-I N-terminal domain (composed of residues 57–281) was also reported to be sufficient for the binding of the hsCENP-H and hsCENP-K, while the hsCENP-M binds to the C-terminal domain. Contiguity between CENP-H, -I, and -K was hypothesized on the basis of proteomic analysis of precipitates from cell lysates, two-hybrid interaction data, and phenotypic similarities resulting from the depletion of individual subunits. Additional analyses suggest that the revealed complex interaction represents the evolutionarily conserved assembly mechanism of the CENP-HIK complex.
Structures of biologically essential proteins are consistently in high demand, especially those of large proteins and of proteins that are members of complex systems. It is, however, not always feasible, for numerous reasons, to experimentally generate high-resolution structures using NMR, cryo-electron microscopy or X-ray crystallography. Among the numerous challenges are the poor diffraction of crystals, high aggregation and low stability of proteins. In such situations, in silico molecular modeling can provide a high-quality alternative to experimental structure determination. The de novo prediction of protein structures from amino acid sequences alone has been shown to be one of the most challenging problems in computational biology. Recent advances in the field have revealed that a few accurately predicted long-range contacts may permit correct topology-level structural modeling, and that direct coupling analysis (DCA) of multiple sequence alignments may generate an appreciable number of long-range native contacts for protein–protein interactions and for proteins with a large number of homologous sequences [64, 65]. We have therefore employed contact prediction and contact-assisted protein folding in the modeling of the 3D structure of each subunit of the hsCENP-HIK complex (Fig. 1, Additional file 1: Figures S2 and S3).
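The notion of a long-range contact can be made concrete with a small sketch. The code below derives a contact set from C-alpha coordinates and filters it by sequence separation. The 8 Å distance cutoff and the separation threshold of 24 residues are commonly used conventions adopted here as assumptions, not parameters reported in this study, and the hairpin-like coordinates are hypothetical.

```python
import math


def contact_map(ca_coords, cutoff=8.0):
    """Return the set of residue index pairs (i, j), i < j, whose
    C-alpha atoms lie within `cutoff` angstroms of each other."""
    contacts = set()
    for i in range(len(ca_coords)):
        for j in range(i + 1, len(ca_coords)):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts.add((i, j))
    return contacts


def long_range(contacts, min_separation=24):
    """Keep only contacts between residues far apart in sequence."""
    return {(i, j) for (i, j) in contacts if j - i >= min_separation}


# Hypothetical 30-residue hairpin: residues 0-14 run along the x-axis,
# residues 15-29 run back along a parallel line 5 angstroms away.
coords = [(3.8 * i, 0.0, 0.0) for i in range(15)]
coords += [(3.8 * (29 - i), 5.0, 0.0) for i in range(15, 30)]

contacts = contact_map(coords)
lr_contacts = long_range(contacts)
```

Here sequence neighbors such as (0, 1) are in contact but are not long-range, whereas the pair (0, 29) is close in space despite being far apart in sequence. Contacts of the latter kind are the ones that constrain the overall topology of a fold.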
Significant progress has been made toward the generation of potential protein–protein interaction networks with the use of mass spectrometry, yeast two-hybrid assays and high-throughput proteomics studies. Atomic-level details obtained by X-ray crystallography are frequently required for the mechanistic interpretation of observed interactions. However, most biologically relevant interactions occur in transient protein complexes, which makes the experimental determination of their structures difficult, even when the structures of the interacting partners are known. Computational docking approaches have therefore been designed for the structural prediction of protein complexes with an accuracy similar to that provided by X-ray crystallography [69, 70]. A substantial number of models with well-defined atomic positions are usually generated by protein–protein docking protocols, but the currently available scoring functions possess low predictive accuracy for reliable discrimination of models, and most often, models closest to the native structure are not easily detected solely through computational tools. Our near-native model selection in this study was therefore guided by the architectural similarity of each generated model with the fungal and yeast orthologs of the protein complex, previously reported to be evolutionarily conserved (Fig. 3).
The main cellular functions such as DNA replication, transcription, translation, protein folding and turnover are directed by large macromolecular complexes such as proteasomes, chaperonins, ribosomes and polymerases. The mechanism of action of these macromolecules is often dynamic and requires collective and large conformational changes. Normal mode analysis is an approach that can be used to describe the accessible flexible states of a protein around an equilibrium position based on the physics of small oscillations. When a macromolecule in a minimum energy conformation is perturbed slightly, a restoring force acts to return the system to its state of equilibrium. The vibrational energy in the system is divided equally, so that all vibrational modes have equal energy and the average amplitude of oscillation for any given mode scales as the inverse of its frequency. Thus, higher frequency modes, for which displacements are energetically more costly, typically describe fast, small-amplitude local movements involving relatively few atoms, while lower frequency modes describe slow displacements and large-scale changes in conformation involving a larger number of atoms. Coarse-grained models combined with normal mode analysis have proven to be a popular and powerful substitute for the simulation of the collective motion of macromolecular complexes at extended timescales. In addition to the conformational sampling and motion dynamics visualization (Additional file 1: Figure S7 and Additional file 2: Figure S8), the normal mode analysis result also suggests that the hypothetical protein model assumes a stable conformation (Fig. 5). Although the molecular dynamics simulation analysis (Figs. 6 and 7) showed that the CENP-I component of the sub-complex displayed a high degree of instability, based on the consistency in the stability profile of the other components of the sub-complex (CENP-H, CENP-K and CENP-M), we hypothesize that their interaction with CENP-I generally increases its stability profile, hence stabilizing the entire complex as demonstrated via the normal mode analysis.
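The inverse scaling of amplitude with frequency follows from the standard harmonic approximation combined with classical equipartition. The short derivation below is a textbook sketch rather than a description of the specific servers used in this study; the symbols (Hessian H, mass matrix M, mode vectors v_k, mass-weighted normal coordinates q_k, angular frequencies ω_k) are notation introduced here for illustration.

```latex
% Harmonic expansion of the potential energy around the minimum x_0
V(\mathbf{x}) \approx V(\mathbf{x}_0)
  + \tfrac{1}{2}\,\Delta\mathbf{x}^{\mathsf{T}}\,\mathbf{H}\,\Delta\mathbf{x},
\qquad \Delta\mathbf{x} = \mathbf{x} - \mathbf{x}_0

% Normal modes are solutions of the generalized eigenvalue problem
\mathbf{H}\,\mathbf{v}_k = \omega_k^{2}\,\mathbf{M}\,\mathbf{v}_k

% Equipartition: each mode carries the same average potential energy
\tfrac{1}{2}\,\omega_k^{2}\,\langle q_k^{2}\rangle = \tfrac{1}{2}\,k_{B}T
\quad\Longrightarrow\quad
\langle q_k^{2}\rangle = \frac{k_{B}T}{\omega_k^{2}},
\qquad
\sqrt{\langle q_k^{2}\rangle} \propto \frac{1}{\omega_k}
```

The last line shows why the lowest-frequency modes dominate the large-scale collective motions: at a given temperature, their root-mean-square amplitude is the largest.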
An essential prerequisite for a regular biological function is the ability of a protein to establish highly selective interactions with its macromolecular partner. Sequence mutations that change protein interactions may lead to a complete functional abolishment or result in a significant perturbation. A feasible method to evaluate the mutational effect on the binding affinity of proteins is to experimentally quantify it. However, while site-directed mutagenesis methodologies are fast and inexpensive, FRET (fluorescence resonance energy transfer), isothermal titration calorimetry, surface plasmon resonance and other methods used for binding affinity measurements can be costly and time-consuming. We have therefore directed computational approaches toward the prediction of binding affinity changes upon mutation (Tables 2, 3, 4, 5, 6 and 7; Additional file 3: Tables S1–S6), which have shown great consistency with results from earlier reported experimental mutagenesis studies. Our interatomic interaction visualization study also provided insights into the molecular nature of the studied interactions and into the functional and structural impact of each mutation (Tables 8 and 9, Additional file 1: Figures S11–S13).
With the aid of extensive computational approaches and following experimentally validated site-directed mutagenesis from literature reports, we have designed a hypothetical model of the hsCENP-HIKM complex. Structurally refined models of each subunit were individually docked to generate a hypothetical complex, which was subjected to several in silico protocols, including normal mode analysis, in silico mutagenesis, binding free energy prediction upon mutation, and analysis of the non-covalent interactions, in an attempt to validate the reliability of the model. Knowledge of the hsCENP-HIKM architecture and the surface residues at the interaction site as presented in this study may provide more insight into the mechanisms of abnormal interactions in disease states, through the comprehension of simple molecular recognition mechanisms. Such information may present future therapeutic potential for the rational development of drugs that regulate or mimic the effects of protein–protein interactions.
Availability of data and materials
Abbreviations
CCAN: Constitutive Centromere-Associated Network
FFT: Fast Fourier transform
FRET: Fluorescence resonance energy transfer
KMN: Knl1, Mis12 and Ndc80 complexes
MM/GBSA: Molecular mechanics/generalized Born and surface area continuum solvation
NMA: Normal mode analysis
PCA: Principal component analysis
PDB: Protein data bank
Rg: Radius of gyration
SASA: Solvent-accessible surface area
SPC: Simple point charge
References
Vidal M, Cusick ME, Barabási A-L (2011) Interactome networks and human disease. Cell 144(6):986–998
Von Eichborn J, Günther S, Preissner R (2010) Structural features and evolution of protein-protein interactions. Genome Inform Ser 22:1–10
Kolodny R et al (2013) On the universe of protein folds. Annu Rev Biophys 42(1):559–582
Schlick T et al (2021) Biomolecular modeling and simulation: a prospering multidisciplinary field. Annu Rev Biophys 50:267–301
Rodrigues JP, Bonvin AM (2014) Integrative computational modeling of protein interactions. FEBS J 281(8):1988–2003
Klare K et al (2015) CENP-C is a blueprint for constitutive centromere–associated network assembly within human kinetochores. J Cell Biol 210(1):11–22
Cheerambathur DK, Desai A (2014) Linked in: formation and regulation of microtubule attachments during chromosome segregation. Curr Opin Cell Biol 26:113–122
Fukagawa T, Earnshaw WC (2014) The centromere: chromatin foundation for the kinetochore machinery. Dev Cell 30(5):496–508
Cheeseman IM (2014) The kinetochore. Cold Spring Harb Perspect Biol 6(7):a015826
Cheeseman IM et al (2006) The conserved KMN network constitutes the core microtubule-binding site of the kinetochore. Cell 127(5):983–997
DeLuca JG et al (2006) Kinetochore microtubule dynamics and attachment stability are regulated by Hec1. Cell 127(5):969–982
Izuta H et al (2006) Comprehensive analysis of the ICEN (Interphase Centromere Complex) components enriched in the CENP-A chromatin of human cells. Genes Cells 11(6):673–684
Okada M et al (2006) The CENP-H–I complex is required for the efficient incorporation of newly synthesized CENP-A into centromeres. Nat Cell Biol 8(5):446–457
Hu L et al (2019) Structural analysis of fungal CENP-H/I/K homologs reveals a conserved assembly mechanism underlying proper chromosome alignment. Nucleic Acids Res 47(1):468–479
Amano M et al (2009) The CENP-S complex is essential for the stable assembly of outer kinetochore structure. J Cell Biol 186(2):173–182
Westermann S, Schleiffer A (2013) Family matters: structural and functional conservation of centromere-associated proteins from yeast to humans. Trends Cell Biol 23(6):260–269
Black BE et al (2007) An epigenetic mark generated by the incorporation of CENP-A into centromeric nucleosomes. Proc Natl Acad Sci 104(12):5008–5013
Falk S et al (2015) CENP-C reshapes and stabilizes CENP-A nucleosomes at the centromere. Science 348(6235):699–703
Chittori S et al (2018) Structural mechanisms of centromeric nucleosome recognition by the kinetochore protein CENP-N. Science 359(6373):339–343
Musacchio A, Desai A (2017) A molecular view of kinetochore assembly and function. Biology 6(1):5
McKinley KL, Cheeseman IM (2016) The molecular basis for centromere identity and function. Nat Rev Mol Cell Biol 17(1):16–29
Jeganathan S et al (2016) Molecular basis of outer kinetochore assembly on CENP-T. Elife 5:e21007
Petrovic A et al (2016) Structure of the MIS12 complex and molecular basis of its interaction with CENP-C at human kinetochores. Cell 167(4):1028-1040.e15
Hara M, Fukagawa T (2018) Kinetochore assembly and disassembly during mitotic entry and exit. Curr Opin Cell Biol 52:73–81
Pentakota S et al (2017) Decoding the centromeric nucleosome through CENP-N. Elife 6:e33442
Kim S, Yu H (2015) Multiple assembly mechanisms anchor the KMN spindle checkpoint platform at human mitotic kinetochores. J Cell Biol 208(2):181–196
McKinley KL et al (2015) The CENP-LN complex forms a critical node in an integrated meshwork of interactions at the centromere-kinetochore interface. Mol Cell 60(6):886–898
Dequeker C et al (2022) From complete cross-docking to partners identification and binding sites predictions. PLoS Comput Biol 18(1):e1009825
Waterhouse A et al (2018) SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res 46(W1):W296–W303
Singh A et al (2020) Application of docking methodologies to modeled proteins. Proteins Struct Funct Bioinform 88(9):1180–1188
Wheeler DL et al (2007) Database resources of the national center for biotechnology information. Nucleic Acids Res 36(suppl_1):D13–D21
Berman HM et al (2002) The protein data bank. Acta Crystallogr D Biol Crystallogr 58(6):899–907
Wang S et al (2017) Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput Biol 13(1):e1005324
Heo L, Park H, Seok C (2013) GalaxyRefine: protein structure refinement driven by side-chain repacking. Nucleic Acids Res 41(W1):W384–W388
Wiederstein M, Sippl MJ (2007) ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res 35(suppl_2):W407–W410
Laskowski RA et al (1996) AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR. J Biomol NMR 8(4):477–486
Yuan S, Chan HS, Hu Z (2017) Using PyMOL as a platform for computational drug design. Wiley Interdiscip Rev Comput Mol Sci 7(2):e1298
Basilico F et al (2014) The pseudo GTPase CENP-M drives human kinetochore assembly. Elife 3:e02978
Ashkenazy H et al (2016) ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res 44(W1):W344–W350
Kozakov D et al (2017) The ClusPro web server for protein–protein docking. Nat Protoc 12(2):255–278
López-Blanco JR, Garzón JI, Chacón P (2011) iMod: multipurpose normal mode analysis in internal coordinates. Bioinformatics 27(20):2843–2850
Rodrigues CH, Pires DE, Ascher DB (2018) DynaMut: predicting the impact of mutations on protein conformation, flexibility and stability. Nucleic Acids Res 46(W1):W350–W355
Kohnke B, Kutzner C, Grubmüller H (2020) A GPU-accelerated fast multipole method for GROMACS: performance and accuracy. J Chem Theory Comput 16(11):6938–6949
Shapovalov MV, Dunbrack RL Jr (2011) A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure 19(6):844–858
Dehouck Y et al (2013) BeAtMuSiC: prediction of changes in protein–protein binding affinity on mutations. Nucleic Acids Res 41(W1):W333–W339
Rodrigues CH et al (2019) mCSM-PPI2: predicting the effects of mutations on protein–protein interactions. Nucleic Acids Res 47(W1):W338–W344
Rodrigues CH, Pires DE, Ascher DB (2021) mmCSM-PPI: predicting the effects of multiple point mutations on protein–protein interactions. Nucleic Acids Res 49(W1):W417–W424
Li M et al (2016) MutaBind estimates and interprets the effects of sequence variants on protein–protein interactions. Nucleic Acids Res 44(W1):W494–W501
Weng G et al (2019) HawkDock: a web server to predict and analyze the protein–protein complex based on computational docking and MM/GBSA. Nucleic Acids Res 47(W1):W322–W330
Jubb HC et al (2017) Arpeggio: a web server for calculating and visualising interatomic interactions in protein structures. J Mol Biol 429(3):365–371
Zhang Z, Bellini D, Barford D (2020) Crystal structure of the Cenp-HIKHead-TW sub-module of the inner kinetochore CCAN complex. Nucleic Acids Res 48(19):11172–11184
Odiba AS et al (2022) A new variant of mutational and polymorphic signatures in the ERG11 gene of fluconazole-resistant Candida albicans. Infect Drug Resist 15:3111
Idris MO et al (2021) Computer-aided screening for potential TMPRSS2 inhibitors: a combination of pharmacophore modeling, molecular docking and molecular dynamics simulation approaches. J Biomol Struct Dyn 39(15):5638–5656
Durojaye OA et al (2022) Identification of a potential mRNA-based vaccine candidate against the SARS-CoV-2 spike glycoprotein: a reverse vaccinology approach. ChemistrySelect 7(7):e202103903
Mosca R, Céol A, Aloy P (2013) Interactome3D: adding structural details to protein networks. Nat Methods 10(1):47–53
Vangone A, Cavallo L, Oliva R (2013) Using a consensus approach based on the conservation of inter-residue contacts to rank CAPRI models. Proteins Struct Funct Bioinform 81(12):2210–2220
Karaca E, Bonvin AM (2013) Advances in integrative modeling of biomolecular complexes. Methods 59(3):372–381
Nishino T et al (2012) CENP-TWSX forms a unique centromeric chromatin structure with a histone-like fold. Cell 148(3):487–501
Cingolani G et al (1999) Structure of importin-β bound to the IBB domain of importin-α. Nature 399(6733):221–229
Vetter IR et al (1999) Structural view of the Ran–importin β interaction at 2.3 Å resolution. Cell 97(5):635–646
Measday V et al (2002) Ctf3p, the Mis6 budding yeast homolog, interacts with Mcm22p and Mcm16p at the yeast outer kinetochore. Genes Dev 16(1):101–113
McPherson A (2004) Introduction to protein crystallization. Methods 34(3):254–265
Kim DE et al (2014) One contact for every twelve residues allows robust and accurate topology-level protein structure modeling. Proteins Struct Funct Bioinform 82:208–218
De Juan D, Pazos F, Valencia A (2013) Emerging methods in protein co-evolution. Nat Rev Genet 14(4):249–261
Weigt M et al (2009) Identification of direct residue contacts in protein–protein interaction by message passing. Proc Natl Acad Sci 106(1):67–72
Ito T et al (2001) A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci 98(8):4569–4574
Ho Y et al (2002) Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415(6868):180–183
Ewing RM et al (2007) Large-scale mapping of human protein–protein interactions by mass spectrometry. Mol Syst Biol 3(1):89
Smith GR, Sternberg MJ (2002) Prediction of protein–protein interactions by docking methods. Curr Opin Struct Biol 12(1):28–35
Ritchie DW (2008) Recent progress and future directions in protein-protein docking. Curr Protein Pept Sci 9(1):1–15
López-Blanco JR et al (2014) iMODS: internal coordinates normal mode analysis server. Nucleic Acids Res 42(W1):W271–W276
Mahajan S, Sanejouand Y-H (2015) On the relationship between low-frequency normal modes and the large-scale conformational changes of proteins. Arch Biochem Biophys 567:59–65
Bauer JA, Pavlović J, Bauerová-Hlinková V (2019) Normal mode analysis as a routine part of a structural investigation. Molecules 24(18):3293
Stefl S et al (2013) Molecular mechanisms of disease-causing missense mutations. J Mol Biol 425(21):3919–3936
Wainreb G et al (2011) Protein stability: a single recorded mutation aids in predicting the effects of other mutations in the same amino acid site. Bioinformatics 27(23):3286–3292
The authors received no funding for this project from any organization.
Ethics approval and consent to participate
Consent for publication
The authors declare no competing interests.
Cite this article
Uzoeto, H.O., Cosmas, S., Ajima, J.N. et al. Computer-aided molecular modeling and structural analysis of the human centromere protein–HIKM complex. Beni-Suef Univ J Basic Appl Sci 11, 101 (2022). https://doi.org/10.1186/s43088-022-00285-1