The functional characterization of a protein sequence is one of the most frequent problems in biology. This task is facilitated by an accurate 3D structure of the protein under study.
In the absence of an experimentally determined structure, comparative (homology) modeling can sometimes provide a useful 3D model for a protein (the target) that is related to at least one known protein structure (the template).
Despite progress in ab initio protein structure prediction, comparative modeling remains the only method that can reliably predict the 3D structure of a protein with an accuracy comparable to that of an experimentally determined structure.
The 3D structures of proteins from the same family are more conserved than their primary sequences. Therefore, if similarity between two proteins is detectable at the sequence level, structural similarity can usually be assumed.
Moreover, proteins that share low or even non-detectable sequence similarity often have similar structures.
Several computer programs and web servers automate the comparative modeling process (e.g., the SWISS-MODEL server). These construct an atomic-resolution model of a protein from its amino acid sequence.
The quality of the model depends on the quality of the sequence alignment and of the template structure.
COMPARATIVE MODELING STEPS -
1) SEARCHING FOR RELATED PROTEIN STRUCTURES
Comparative modeling usually starts by searching the Protein Data Bank (PDB) of known protein structures, using the target sequence as the query. This search is usually done by comparing the target sequence with the sequence of each structure in the database. A variety of sequence-sequence comparison methods can be used. Frequently, the availability of many sequences related to the target or to potential templates allows more sensitive searching with sequence profile methods and hidden Markov models (HMMs). A good starting point for template searches is the many database search servers on the Internet.
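As a rough illustration of comparing the target with every sequence in a database, the sketch below ranks candidate templates by the fraction of shared k-mers (Jaccard similarity). This is a toy stand-in, not how search servers query the PDB; real searches rely on alignment-based tools, profiles, or HMMs. The function names, the `k=3` default, and the example sequences are all hypothetical.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_templates(target, database, k=3):
    """Rank database entries by k-mer Jaccard similarity to the target.

    `database` maps a (hypothetical) template identifier to its sequence.
    """
    target_kmers = kmers(target, k)
    ranked = []
    for name, seq in database.items():
        seq_kmers = kmers(seq, k)
        union = target_kmers | seq_kmers
        score = len(target_kmers & seq_kmers) / len(union) if union else 0.0
        ranked.append((name, score))
    # Highest similarity first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Made-up sequences, for illustration only.
database = {
    "tmplA": "MKTAYIAKQRQISFVKSHFSRQ",
    "tmplB": "MKTAYIAKQRQISFVKAHFSRQ",
    "tmplC": "GSSGSSGLVPRGSHMASMTGGQ",
}
hits = rank_templates("MKTAYIAKQRQISFVKSHFSRQ", database)
for name, score in hits:
    print(f"{name}\t{score:.2f}")
```

The ranked list would then feed the template selection step; a k-mer score only suggests candidates and says nothing about alignment quality.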
2) SELECTING TEMPLATES
Once a list of potential templates has been obtained using search methods, it is necessary to select one or more templates that are appropriate for the particular modeling problem. Factors that are taken into account when selecting templates:
o The quality of the template increases with the overall sequence similarity to the target and decreases with the number and length of gaps in alignment.
§ The simplest template selection rule is to select the structure with the highest sequence similarity to the target sequence.
o The family of proteins that includes the target and templates can frequently be organized into subfamilies.
§ The construction of a multiple alignment and a phylogenetic tree can help in selecting templates from the subfamily that is closest to the target sequence.
o The similarity between the “environment” of the template and that in which the target needs to be modeled should be considered.
§ The term “environment” includes factors like solvent, pH, ligands and quaternary interactions.
o The quality of the experimentally determined structure is another important factor in template selection.
§ The resolution and R-factor of a crystallographic structure, and the number of restraints per residue for an NMR structure, can indicate the accuracy of the structure.
§ This information can generally be obtained from the PDB template files or from the article describing the structure determination.
§ For example, if two templates have comparable sequence similarity to the target, the one determined at the highest resolution should generally be used.
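Two of the selection rules above can be combined into a small helper. This sketch keeps the templates whose sequence identity is within a tolerance of the best one, then picks the one with the best (numerically lowest) resolution. The `Template` fields, the 2-point tolerance used to decide that identities are "comparable", and the example values are assumptions for illustration; gaps, subfamily membership, and environment are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Template:
    name: str          # hypothetical identifier
    identity: float    # percent sequence identity to the target
    resolution: float  # crystallographic resolution in angstroms (lower is better)


def select_template(templates, tolerance=2.0):
    """Pick the best-resolved template among those with comparable identity.

    `tolerance` (in percentage points) is an assumed cutoff for treating
    two sequence identities as comparable.
    """
    if not templates:
        raise ValueError("no candidate templates")
    best_identity = max(t.identity for t in templates)
    comparable = [t for t in templates if best_identity - t.identity <= tolerance]
    return min(comparable, key=lambda t: t.resolution)


# Made-up candidates, for illustration only.
candidates = [
    Template("tmplA", identity=45.0, resolution=2.8),
    Template("tmplB", identity=44.0, resolution=1.6),
    Template("tmplC", identity=30.0, resolution=1.0),
]
chosen = select_template(candidates)
print(chosen.name)
```

Here `tmplC` has the best resolution but is excluded because its identity is far below the best candidate, so the choice falls between `tmplA` and `tmplB`.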
3) TARGET-TEMPLATE ALIGNMENT
To build a model, all comparative modeling programs depend on a list of assumed structural equivalences between the target and template residues. This list is defined by the alignment of the target and template sequences. Search methods tend to be tuned for the detection of remote relationships rather than for optimal alignments. Therefore, once the templates are selected, a dedicated alignment method should be used to align them with the target sequence. The alignment is relatively simple to obtain when the target-template sequence identity is above 40%.
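To make the idea of residue equivalences concrete, here is a minimal sketch of a global (Needleman-Wunsch) alignment with a linear gap penalty, followed by a percent-identity calculation. The scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are arbitrary assumptions; real tools use substitution matrices and affine gap penalties. Identity is computed here over all alignment columns, which is only one of several conventions in use.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical residue pairs as a percentage of all alignment columns."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a) if aligned_a else 0.0


# Made-up target and template fragments, for illustration only.
target_aln, template_aln = global_align("ACDEFG", "ACDFG")
print(target_aln)
print(template_aln)
print(f"{percent_identity(target_aln, template_aln):.1f}% identity")
```

Each column of the result pairs a target residue with a template residue or with a gap, which is the list of equivalences that a model-building program consumes.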
4) MODEL BUILDING
Once the initial target-template alignment is built, a variety of methods can be used to construct a 3D model for the target protein.
The original and still widely used method is modeling by rigid body assembly.
This method constructs the model from a few core regions and from loops and side chains, which are obtained by dissecting related structures.
Loops are generally modeled by one of two approaches:
o Ab initio loop prediction is based on a conformational search, or an enumeration of conformations, in a given environment, guided by a scoring or energy function.
o The database approach consists of finding a segment of main chain that fits the two stem regions of a loop.
§ Whether a given loop prediction is correct can be estimated from its root-mean-square deviation (RMSD), which should be less than 2 Å.
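The RMSD criterion can be sketched as follows. The code assumes that the predicted loop is compared with a reference conformation (the text does not say what the prediction is compared against), that both coordinate sets list the same atoms in the same order, and that they are already in the same frame, so no superposition is performed. The coordinates are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized sets of
    (x, y, z) coordinates, without any superposition."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical loop atoms in angstroms: the prediction is shifted 1 Å along x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0)]
predicted = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (4.0, 0.5, 0.0)]

value = rmsd(predicted, reference)
print(f"RMSD = {value:.2f} Å, below 2 Å cutoff: {value < 2.0}")
```

In practice the two structures are usually superimposed first (for example on the stem residues), which this sketch leaves out.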
5) MODEL EVALUATION
After a model has been built, it is important to check for possible errors.
The quality of a model can be approximately predicted from the sequence similarity between the target and template. Sequence identity above 30% is a relatively good predictor of the expected accuracy of a model. However, other factors, including the environment, can strongly influence the accuracy of a model.
For example, some calcium-binding proteins undergo large conformational changes when bound to calcium. If a calcium-free template is used to model the calcium-bound state of a target, the model is likely to be incorrect irrespective of the target-template similarity.
If the target-template sequence identity falls below 30%, the sequence identity becomes significantly less reliable as a measure of expected accuracy of a single model.
COMMON USES OF COMPARATIVE PROTEIN STRUCTURE MODELS
1) Designing (site-directed) mutants to test hypotheses about function.
2) Identifying active sites and binding sites.
3) Searching for ligands of a given binding site.
4) Designing and improving ligands of a given binding site.
5) Protein-protein docking simulations.
6) Testing a given sequence-structure alignment.